Introduction
All rights reserved. May not be reproduced in any form without permission from the publisher, except fair uses permitted under U.S. or applicable copyright law.
At its simplest level, statistics involves the description and summary of events. How
many home runs did Babe Ruth hit? What is the average rainfall in Seattle? But
from a scientific point of view, it has come to mean much more. Broadly defined, it is
the science, technology and art of extracting information from observational data, with
an emphasis on solving real-world problems. As Stigler (1986, p. 1) has so eloquently
put it:
Modern statistics provides a quantitative technology for empirical science; it is a
logic and methodology for the measurement of uncertainty and for examination
of the consequences of that uncertainty in the planning and interpretation of
experimentation and observation.
The logic and associated technology behind modern statistical methods pervade all
of the sciences, from astronomy and physics to psychology, business, manufacturing,
sociology, economics, agriculture, education, and medicine. They affect your life.
To help elucidate the types of problems addressed in this book, consider an
experiment aimed at investigating the effects of ozone on weight gain in rats (Doksum
and Sievers, 1976). The experimental group consisted of 22 seventy-day-old rats kept
in an ozone environment for 7 days. A control group of 23 rats, of the same age,
was kept in an ozone-free environment. The results of this experiment are shown
in table 1.1.
Copyright @ 2009. Oxford University Press.
What about using the average to reflect the weight gain for the typical rat? Are there
other methods for summarizing the data that might have practical value when
characterizing the differences between the groups? The answers to these problems
are nontrivial. The purpose of this book is to introduce the basic tools for answering
these questions.
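To make the question about the average concrete, here is a minimal sketch in Python (chosen only for illustration). The weight gains are invented and are not the values from table 1.1, and the median is used merely as one example of an alternative summary. The helper name summarize is hypothetical.

```python
import statistics

# Hypothetical weight gains in grams, invented for illustration only;
# these are NOT the values reported in table 1.1.
gains = [10.0, 12.0, 14.0, 15.0, 17.0, 18.0, 20.0]

# Same data, but with the largest value replaced by one extreme observation.
gains_with_outlier = gains[:-1] + [200.0]


def summarize(values):
    """Return the mean and the median of a list of numbers."""
    return statistics.fmean(values), statistics.median(values)


mean_a, median_a = summarize(gains)
mean_b, median_b = summarize(gains_with_outlier)

print(f"without outlier: mean={mean_a:.2f}, median={median_a:.2f}")
print(f"with outlier:    mean={mean_b:.2f}, median={median_b:.2f}")
```

In this made-up data a single extreme value pulls the mean far from the bulk of the observations while the median is unchanged, which is one reason the choice of summary is not a trivial matter.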
The mathematical foundations of the statistical methods described in this book were
developed about two hundred years ago. Of particular importance was the work of Pierre-
Simon Laplace (1749–1827) and Carl Friedrich Gauss (1777–1855). Approximately a
century ago, major advances began to appear that dominate how researchers analyze data
today. Especially important was the work of Karl Pearson (1857–1936), Jerzy Neyman
(1894–1981), Egon Pearson (1895–1980), and Sir Ronald Fisher (1890–1962). During
the 1950s, there was some evidence that the methods routinely used today serve us quite
well in our attempts to understand data, but in the 1960s it became evident that serious
practical problems needed attention. Indeed, since 1960, three major insights have revealed
conditions where methods routinely used today can be highly unsatisfactory. Although
the many new tools for dealing with known problems go beyond the scope of this book,
it is essential that a foundation be laid for appreciating modern advances and insights,
and so one motivation for this book is to accomplish this goal.
This book does not describe the mathematical underpinnings of routinely used
statistical techniques, but rather the concepts and principles that are used. Generally,
the essence of statistical reasoning can be understood with little training in mathematics
beyond basic high-school algebra. However, if you put enough simple pieces together,
the picture can seem rather fuzzy and complex, and it is easy to lose track of where we
are going when the individual pieces are being explained. Accordingly, it might help to
provide a brief overview of what is covered in this book.
One key idea behind most statistical methods is the distinction between a sample of
participants or objects and a population. A population of participants or objects consists
of all those participants or objects that are relevant in a particular study. In the weight-
gain experiment with rats, there are millions of rats we could use if only we had the
resources. To be concrete, suppose there are a billion rats and we want to know the
average weight gain if all one billion were exposed to ozone. Then these one billion rats
compose the population of rats we wish to study. The average gain for these rats is called
the population mean. In a similar manner, there is an average weight gain for all the rats
if they are raised in an ozone-free environment instead. This is the population mean for
rats raised in an ozone-free environment. The obvious problem is that it is impractical
to measure all one billion rats. In the experiment, only 22 rats were exposed to ozone.
These 22 rats are an example of what is called a sample.
EBSCO : eBook Academic Collection (EBSCOhost) - printed on 3/10/2019 3:47 AM via UNIVERSITA
DEGLI STUDI DI BRESCIA
AN: 277660 ; Wilcox, Rand R..; Basic Statistics : Understanding Conventional Methods and
Modern Insights
Account: s5715924.main.ehost
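The relation between a population mean and the mean of a sample can be sketched with a small simulation. This is only an illustration under stated assumptions: the population is simulated (100,000 values standing in for the one billion rats), the mean of 11 and standard deviation of 19 are arbitrary, and the function names simulate_population and sample_mean are hypothetical. Python is used purely for convenience.

```python
import random
import statistics


def simulate_population(size, mean, sd, seed):
    """Generate a hypothetical population of weight gains (made-up values)."""
    rng = random.Random(seed)
    return [rng.gauss(mean, sd) for _ in range(size)]


def sample_mean(population, n, seed):
    """Average of n units drawn at random, without replacement."""
    rng = random.Random(seed)
    return statistics.fmean(rng.sample(population, n))


# Simulated stand-in for the full population; the parameters are arbitrary.
population = simulate_population(100_000, mean=11.0, sd=19.0, seed=1)
population_mean = statistics.fmean(population)

# A sample of 22, matching the size of the ozone group described in the text.
estimate = sample_mean(population, n=22, seed=2)

print(f"population mean:    {population_mean:.2f}")
print(f"sample mean (n=22): {estimate:.2f}")
```

In practice the population mean is unknown, so only the second number would be available. Changing the seed passed to sample_mean shows how much the sample mean can vary from one sample of 22 to the next.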
The main focus of this book is not experimental design, but it is worthwhile
mentioning the difference between the issues covered in this book and those covered in a
course on design. As a simple illustration, imagine you are interested in factors that affect health.
In North America, where fat accounts for a third of the calories consumed, the death
rate from heart disease is 20 times higher than in rural China where the typical diet is
closer to 10% fat. What are we to make of this? Should we eliminate as much fat from
our diet as possible? Are all fats bad? Could it be that some are beneficial? This purely
descriptive study does not address these issues in an adequate manner. This is not to
say that descriptive studies have no merit, only that resolving important issues can be
difficult or impossible without good experimental design. For example, heart disease is
relatively rare in Mediterranean countries where fat intake can approach 40% of calories.
One distinguishing feature between the American diet and the Mediterranean diet is
the type of fat consumed. So one possibility is that the amount of fat in a diet, without
regard to the type of fat, might be a poor gauge of nutritional quality. Note, however,
that in the observational study just described, nothing has been done to control other
It might help to comment on the goals of this book versus the general goal of teaching
statistics books have given the impression that all major advances ceased circa 1955.
This is not remotely true. Indeed, major improvements have emerged, some of which
are briefly indicated here.
As is probably evident, a key component to getting the most accurate and useful
information from data is software. There are now several popular computer programs for
analyzing data. Perhaps the most important thing to keep in mind is that the choice of
software can be crucial, particularly when the goal is to apply new and improved methods
developed during the last half century. Presumably no software package is best, based
on all of the criteria that might be used to judge them, but the following comments
might help.
Excellent software
The software R is one of the two best software packages available. Moreover, it is free and
available at http://cran.R-project.org. All modern methods developed in recent years,
as well as all classic techniques, are easily applied. One feature that makes R highly
valuable from a research perspective is that a group of academics do an excellent job
of constantly adding and updating routines aimed at applying modern techniques.
A wide range of modern methods can be applied using the basic package. And many
specialized methods are available via packages available at the R web site. A library
of R functions especially designed for applying the newest methods for comparing
groups and studying associations is available at www-rcf.usc.edu/~rwilcox/.¹ Although
not the focus here, occasionally the name of some of these functions will be mentioned
when illustrating some of the important features of modern methods. (Unless stated
otherwise, whenever the name of an R function is supplied, it is a function that belongs
to the two files Rallfunv1-v7 and Rallfunv2-v7, which can be downloaded from the site
just mentioned.)
S-PLUS is another excellent software package. It is nearly identical to R and the
basic commands are the same. One of the main differences is cost: S-PLUS can be
very expensive. There are a few differences from R, but generally they are minor and
of little importance when applying the methods covered in this book. (The R functions
mentioned in this book are available as S-PLUS functions, which are stored in the files
allfunv1-v7 and allfunv2-v7 and which can be downloaded in the same manner as the
files Rallfunv1-v7 and Rallfunv2-v7.)
1. Details and illustrations of how this software is used can be found in Wilcox (2003, 2005).
routines in their package, but this has not been done as yet for some of the methods to
be described.
Good software
Minitab is fairly simple to use and provides a reasonable degree of flexibility when
analyzing data. All of the standard methods developed prior to the year 1960 are
readily available. Many modern methods could be run in Minitab, but doing so is
not straightforward. As with SAS, special Minitab code is needed, and writing this code
would take some effort. Moreover, certain modern methods that are readily applied
with R cannot be easily done in Minitab even if an investigator were willing to write the
appropriate code.
Unsatisfactory software
SPSS is certainly one of the most popular and frequently used software packages. Part of
its appeal is ease of use. When handling complex data sets, it is one of the best packages
available and it contains all of the classic methods for analyzing data. But in terms
of providing access to the many new and improved methods for comparing groups and
studying associations, which have appeared during the last half-century, it must be given
a poor rating. An additional concern is that it has less flexibility than R and S-PLUS.
That is, it is a relatively simple matter for statisticians to create specialized R and S-PLUS
code that provides non-statisticians with easy access to modern methods. Some modern
methods can be applied with SPSS, but often this task is difficult. However, SPSS 16
has added the ability to access R, which might increase its flexibility considerably. Also,
zumastat.com has software that provides access to a large number of R functions aimed
at applying the modern methods mentioned in this book plus many other methods
covered in more advanced courses. (On the zumastat web page, click on robust statistics
to get more information.)
The software EXCEL is relatively easy to use and provides some flexibility, but
generally modern methods are not readily applied. A recent review by McCullough and
Wilson (2005) concludes that this software package is not maintained in an adequate
manner. (For a more detailed description of some problems with this software, see
Heiser, 2006.) Even if EXCEL functions were available for all modern methods that
might be used, features noted by McCullough and Wilson suggest that EXCEL should
not be used.