
DOI: 10.1002/cem.2902

THE CHEMOMETRICS COLUMN

Statistical experimental design

MASTERING A PHD

In Mastering Your PhD,¹ Chapter 5 is entitled Good Experimental Design. The recommendations are as follows:

How can you be sure you're designing the best experiment possible? In order to assist you in designing good experiments, we suggest you follow these three steps:

Define your objective(s). What is it that you are trying to test in this particular experiment (which question are you trying to answer, which hypothesis are you trying to prove)?
Plan your strategy. How will you achieve this objective? What is the size and scope of your experiment, and how many times will you try to repeat it?
Experimental details. Sketch out the details of your experiment. Which tools and equipment do you need? How much time will your experiment take (one hour, one day, one month)?

A Summer Program for Science Teachers² recommends as follows:

Develop your own problems. What do you want to find out?
Formulate your hypothesis to be tested. Based on previous knowledge and information, what educated guess do you want to test?
Design a procedure and list the materials needed.
Make measurements, or observations or a model.
Collect and analyze the data.
Draw a conclusion with respect to the hypothesis.

Physical scientists have for generations viewed this as a typical recommended approach to experimental design: thinking about hypotheses and testing them out in what they feel is a systematic logical way. In the above scenarios, however, statistics has a limited role to play, and nothing in these recommendations suggests that one needs to understand statistical principles before performing experiments.

ONE VARIABLE AT A TIME

Further poorly conceived recommendations are available on the web, often engrained in formalised courses, for example,

When designing experiments identify all of the potential variables in the system, control them, and vary only one variable at a time.³

a controlled experiment is one in which a hypothesis is tested by the manipulation of a single variable. The change that results from the manipulated variable establishes causality through directional cause and effect.⁴

In science it is important to change one variable at a time so you may see which variable actually changed the data. If you change multiple and the data changes, you wouldn't know which variable it was. So it is better to do one.⁵

There should be only one manipulated variable within a scientific experiment so that the experimenter can be certain it is this variable which causes a pattern in the resultant data, if any exists at all. If there were multiple variables, then the experimenter would not know what accounted for the results of the experiment.⁶

Such is conventional wisdom for many physical scientists. Attitudes change only slowly, taking decades or even centuries. There are still people who believe in the flat Earth, and Darwinism definitely remains controversial for some.

Journal of Chemometrics. 2017;e2902. wileyonlinelibrary.com/journal/cem Copyright 2017 John Wiley & Sons, Ltd.
https://doi.org/10.1002/cem.2902

STATISTICAL THINKING

However, in the past few decades, some scientists are beginning to realise that there are alternatives, and adopt statistical thinking. Statistics does not come naturally to many physical scientists, who are often used to exact laws established many generations ago. Whether learning about the hydrogen lines in a spectrum or Newton's laws of motion, students are taught that the more accurate their observations, the better; if their results are inaccurate, they are bad. Errors are considered bad things and to be avoided. Experiments are designed so that they do indeed only study one variable or factor. To work out how mass, force, and speed are related, we try to remove all other external factors and design our apparatus just to study one effect. An experiment with extraneous or interfering factors is considered a bad one, and experimentalists under such situations go to great lengths to eliminate them.

UK AGRICULTURAL STATISTICS

However, statistical thinking about rational designs of experiments started to develop formally around a century ago, largely out of necessity, primarily economic. In the UK, agricultural statistics was a very important growth point in the 1920s and 1930s. Prior to this, much agricultural land was owned by large landowners. Labour was very cheap, so they could live luxurious lives on the back of their tenant farmers or labourers. After the First World War, there was rapid social change, and the price of labour increased. Many large estates were broken up and sold off, and many former labourers who had fought in the First World War were permitted to purchase smallholdings. This highlighted the inherent inefficiency of 19th century agricultural techniques. Landowners wanting to make their land more efficient, or new managed farms, needed new methods to improve their productivity. Rothamsted Research Station in the UK was the centre of agricultural research and hired a stellar group of statisticians active in the 1920s and 1930s, of whom R.A. Fisher was the central figure. Almost single-handed, they developed the discipline of formal experimental design, of which Fisher's 1935 book⁷ summarises many of the concepts we now regard as essential statistical thinking. Some principles put together by Fisher had been reported but not widely acknowledged in the 18th and 19th century, for example, James Lind's work on scurvy⁸ and Charles Peirce's discussion of randomised trials,⁹ but nothing mainstream. Fisher pulled this thinking together and formalised it.

MANUFACTURING INDUSTRY

Until the 1940s, statistical design was primarily applied to the agricultural industry, although the principles were potentially widespread. During and after the war, there was a particular need in the manufacturing industry, especially chemicals. Hence, a new generation of statisticians worked closely with industrialists. G.E.P. Box, one of the best known, was employed by ICI in 1948 and also married one of Fisher's daughters. His first major work in this area was within a book edited by O.L. Davies.¹⁰ Hence, what started off in agriculture moved to chemistry and also engineering via chemical engineering.

ANALYTICAL CHEMISTS

For several decades, though, such methods were thought to be part of applied science and primarily the province of statisticians. Only in the 1970s did laboratory and academic scientists start to realise the widespread possibilities of statistical experimental design. Before then, there was often a significant gap between applied science in industry and pure science in universities, and very little cross-fertilisation. Books and papers on experimental design were especially published in the analytical chemistry literature, as many analytical chemists obtained funding from and collaboration with industry. High Performance Liquid Chromatography (HPLC) optimisation was a prominent early application area.

COMPUTERS

From the 1970s, there was also another significant revolution. One limitation prior to that was computing power. The early workers in this area had to calculate everything manually or using slide rules or logarithm tables, and, at a later date, calculators. It has been estimated that for a paper published by R.A. Fisher in 1921, with 15 tables, if it took 1 minute to calculate every number, it would take 8 months of 12-hour days just to generate the numbers in the tables (perhaps he had assistants whose job was to calculate numbers for him).¹¹ So many of the earlier papers in experimental design were quite theoretical, but the applicability was limited by calculating power. Hence, some early designs were purposely developed so that it was easy to determine effects using simple computations. Yates' algorithm was an example. As computing power became more widespread, this was less of a limitation. Initially, people would use programming languages such as FORTRAN and might type in code manually, for example, to compute matrix inverses (either published in journals or in books such as Numerical Recipes). As time evolved, these types of calculations became even easier, for example, via spreadsheets or using MATLAB. Hence, the original limitations disappeared, and there was almost complete flexibility using any meaningful statistical design, resulting in a third revolution, and a broader acceptance throughout science and engineering.
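
To give a sense of how little arithmetic Yates' algorithm needs, the following is a minimal sketch for a full two-level factorial design. It is an illustration and is not taken from the paper: the function name `yates`, the factor labels A, B, and C, and the eight response values are all hypothetical, and the runs are assumed to be listed in standard order, with the first factor alternating fastest. Effects are expressed here as the average change in response on moving a factor from its low to its high level.

```python
from typing import Dict, List


def yates(responses: List[float], factors: str) -> Dict[str, float]:
    """Estimate effects of a full two-level factorial by Yates' algorithm.

    `responses` must be in standard order, with the first factor
    alternating fastest: (1), a, b, ab, c, ac, bc, abc, ...
    """
    k = len(factors)
    n = 2 ** k
    if len(responses) != n:
        raise ValueError(f"expected {n} responses for {k} factors")

    column = list(responses)
    # Each pass needs only additions and subtractions of adjacent pairs,
    # which is why the method suited hand calculation.
    for _ in range(k):
        sums = [column[i] + column[i + 1] for i in range(0, n, 2)]
        diffs = [column[i + 1] - column[i] for i in range(0, n, 2)]
        column = sums + diffs

    effects = {"mean": column[0] / n}
    for run in range(1, n):
        # Bit j of the run index is set when factor j is at its high level,
        # so the same bits name the effect stored in that position.
        label = "".join(f for j, f in enumerate(factors) if (run >> j) & 1)
        effects[label] = column[run] / (n / 2)
    return effects


# Hypothetical responses for a 2^3 design in factors A, B, C (made-up numbers).
responses = [6.5, 7.5, 9.5, 16.5, 6.5, 7.5, 9.5, 16.5]
effects = yates(responses, "ABC")
for name, value in effects.items():
    print(f"{name:>4}: {value:g}")
```

For these made-up responses the sketch gives a mean of 10, main effects of 4 for A and 6 for B, an AB interaction of 3, and zero for every effect involving C. The AB entry is nonzero because, in this invented data set, the change produced by A depends on the level of B.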

In certain areas of science, the use of statistical experimental design is gradually becoming accepted. Synthetic chemists exposed to process chemistry are increasingly accepting the need for formal designs. Their interest is almost exclusively in optimisation, but the need is simple to justify by showing that false optima can be obtained, because of interactions between factors, if one factor is altered at a time. As an example, consider a reaction that is pH dependent. At one pH, a compound may be unreactive, hence changing the temperature will make very little difference until extremes are reached (e.g., when the compound may evaporate), and over a significant temperature region, the reactivity is virtually flat. However, if one changes the pH, and the compound is now reactive, the reaction rate may change with temperature. Thus, the temperature profile differs according to pH and as such, we cannot study the effects of pH and temperature independently. Statistically, it is said that pH and temperature interact.

Unfortunately, in different areas of science, there are different needs. A synthetic chemist may not be interested in accurate estimates of effects, or in blocking or randomisation; he or she just wants to find an optimum. Yet there are several diverse reasons why systematic or statistical designs are useful.

Discovering interactions among factors.
Screening many factors and deciding which are important ones.
Establishing and maintaining quality control.
Optimising a process, including evolutionary operations.
Saving time.
Reducing uncertainty of quantitative estimates.

FORMALISED RULES

Statistical experimental designs involve following formalised rules. In future articles, we will discuss the basis of these rules. Unfortunately, in different areas, people tend to have quite diverse expectations as to what is necessary to understand about statistical designs before using them.

Many organic chemists, for example, after being convinced that interactions can have an influence over the optimum they find, just want a few recipes, such as a factorial design, and leave it to someone else or to a software package to calculate the optimum or the importance of an individual factor. They have no interest in mathematical equations or probabilities. This author has met an industrial guru, whose original background was organic chemistry, who had no interest in matrices or regression or ANOVA, and would prefer to pay for a hugely expensive package rather than spend a few hours learning about the statistical basis of designs but nevertheless was widely consulted by his colleagues before optimising a process, and widely advocated the use of experimental designs.

Yet most analytical and physical chemists approach this quite differently, often starting with concepts of design matrices, degrees of freedom, orthogonality, regression, and so on. Their view is that you cannot understand the basis of statistical designs without some statistics.

Finally, statisticians and many engineers will delve deeper into the statistics, building on concepts like distributions, estimation, regression, and so on. The mathematical basis of multivariate pattern recognition is not so far away from experimental design; after all, when more than one factor is involved, the data are multivariate. The principles of multivariate regression are the same whether used in multivariate calibration or in analysing designed experiments.

In these articles, we will take the point of view of a numerate chemist or a starting statistician. This author believes that it is important to understand the basis of designs rather than use them as recipes. As a chemist by training, he has been involved in laboratory classes and has always emphasised how essential it is to understand the basis of a practical experiment, rather than blindly follow a cookbook. Why is a particular solvent used? Why do we use a specific pH? Why do we use a 1 L volumetric flask? Why is the solvent refluxed? Why do we put an alkyl chain on one of the reactants? All good laboratory-based chemists try to understand the practical basis of why they are assembling their apparatus. This distinguishes a professional chemist who may have a master's degree or a doctorate from a technician who primarily follows instructions. So it is this author's view that anyone using statistical designs should likewise have some insight into their theoretical basis, maybe not at the level of an advanced mathematically based statistician, but at least know what a design matrix is, how to calculate interaction effects, and what rotatability is. We will work on these concepts in future articles.

HISTORIC DEVELOPMENT

In summary, the use of statistical experimental design has progressed over the last century. It started in agriculture, spread to the chemical industry, and then started to enter into the mainstream. Particularly important over the past few decades has been the opening up of rapid and easy-to-use computing power, especially spreadsheets, so it is not necessary to spend several days or even months calculating coefficients, and one does not need to be a specialist statistician. However, having some idea of the statistical principles behind designs is important: as an example, a chemist in a laboratory should understand the mechanism of reactions they are performing and know the structures of the
compounds they are making but does not normally need to understand the full quantum chemical description of what is happening.

There still of course are many misconceptions about experimental design, some still taught formally as we have seen. Statistical experimental design also differs from traditional design of experiments: for example, if we want to make a new compound, how can we choose the reactants or series of steps? Statistics is of little use here, and we should not muddle the terminologies up. These articles will focus on the statistical aspects.

Richard G. Brereton
School of Chemistry, University of Bristol, Cantock's Close, Bristol BS8 1TS, UK
Email: r.g.brereton@bris.ac.uk

REFERENCES

1. Gosling P, Noordam B. Mastering Your PhD. Berlin: Springer; 2006.
2. Garcia MAF. Summer research program for science teachers: how to design an experiment, high school for leadership and public service; 1999, http://www.scienceteacherprogram.org/chemistry/Garcia99.html, accessed 8 November, 2016
3. Experimental Design Considerations, webGuru guide for undergraduate research, National Science Foundation Division of Undergraduate Education's Educational Materials Development Program, http://www.webguru.neu.edu/lab/research/experimentaldesignconsiderations, accessed 8 November, 2016
4. Reference An IAC Publishing Labs Company, 2016 https://www.reference.com/science/experimentcalledonlyonevariablechangedtime48fcf9608b13f75d#, accessed 8 November, 2016
5. Gutierrez P. Why it is important in science to change only one variable at a time; 2014, https://prezi.com/hddbqxr0xjk4/whyisitimportantinsciencetochangeonlyonevariablea/, accessed 8 November, 2016
6. Science homework help, enotes, http://www.enotes.com/homeworkhelp/whenconductinganexperimentwhyimportanttest535858, accessed 8 November, 2016
7. Fisher RA. The Design of Experiments. New York: Hafner; 1935.
8. Lind J. A Treatise of the Scurvy in Three Parts. Edinburgh: Sands, Murray and Cochran; 1753.
9. Peirce C. Illustrations of the Logic of Science (series). In: Popular Science Monthly. Vol. 12-13; 1877-1878.
10. Davies OL (Ed). Statistical Methods in Research and Production. London: Longman; 1956.
11. Fisher RA. Studies in crop variation. I. An examination of the yield of dressed grain from Broadbalk. J Agric Sci. 1921;11:107-135.

How to cite this article: Brereton RG. Statistical experimental design. Journal of Chemometrics. 2017;e2902. https://doi.org/10.1002/cem.2902
