Multivariate statistical methods have become commonplace in the Earth Sciences.
What was once an exclusive area of activity is now within the reach of Everyman,
owing to the ubiquity of mini-computers and the ready availability of software
for doing the computing. In the days when one was required to do one's own
programming, it was necessary to acquire considerable proficiency in linear algebra
and one or more programming languages. Today, the vast majority of the people
who use multivariate methods to analyse geological data have little or no idea
of the matrix operations underlying a particular method, nor, for that matter, what
the program is actually supposed to be doing. This situation can be both good
and bad. It does no harm if everything goes according to plan: the program
being used is competently constructed (which, alas, is far from generally the
case) and the data under examination show no strong deviations from standard
statistical theory. It is bad if the data do not fit the theoretical requirements
of a particular method and even worse if the method of computation used is
inappropriate. It is an inescapable and sad fact of life that much geological and
biological material deviates in some manner or other from the theoretical require-
ments of a multivariate statistical procedure. The immediate relevance of this obser-
vation is that there are many sources of error in doing an analysis of geological data
by means of standard statistical software.
The spread of multivariate statistics in the Natural Sciences has, therefore, taken
place at a cost: the risk of doing something quite wrong and yet never knowing
that a mistake may have been committed, or worse, that a blunder is even possible.
Books on multivariate statistics aimed at all levels of sophistication abound, from
abstruse algebraically loaded treatises, through practically oriented texts, to volumes
of computing recipes such as are profusely available for biologists. The special
justification for this book is that it is concerned with the elementary consideration
of the special types of multidimensional problems that occur in Geology and which
are never, or are only summarily, considered in other places (e.g., textbooks dealing
with multivariate statistics) and which cannot always be correctly analysed by com-
mercially available software. There is an attached compact disk of compiled
programs and trial data for doing the most commonly occurring computations,
the files for running Graph Server and a file summarizing the steps involved in
activating the various routines, but we lay no claim to perfection or to elegance
in the appearance of the computational output. There will be a website at the State
University of New York at Stony Brook for updates and corrections to the contents
of the CD, thanks to the generosity of Professor F. James Rohlf. In two recent
multivariate texts of RAR, the accompanying programs are written in copyrighted
code, which means that the user is required to acquire the means of accessing this
code. We have desisted from this practice, since it would defeat the main practical
purpose of the book. We supply our own compiled FORTRAN and C programs
with instructions for entering data; each method is illustrated by one or more sets
of observations typical for the class of problem treated with an emphasis on the
peculiarities of geological data. The programs grew out of
our own research commitments. Note, however, that it is not our intention to provide a
self-contained, hierarchical system such as that offered by F. James Rohlf's NTSYSpc.
The compiled programs serve mainly a didactic purpose: a simple means of illus-
trating the ideas expressed in the text. The important aspect of graphical presen-
tation has been the province of Enrico Savazzi, whose new language Graph
Server (GS) for displaying plots forms an integral part of the enterprise.
We have not provided more than a few introductory notes on the elements of
matrix algebra, deeming that the basic manipulations required for using
the primer can most effectively be introduced at the appropriate points in
the text. References to introductory manuals presenting the elements of linear
algebra are provided wherever we have thought it necessary.
The idea for writing this introduction for geologists stems from RAR's more than 35 years
of experience in teaching statistics to geologists and biologists in many parts
of the world. This accumulated experience has convinced us that no matter how well
people seem to be grasping a course in multivariate applications, and despite a maxi-
mum of teaching effort, the number of the participants who will really stay with the
subject is small indeed. This is no reflection on the value of the discipline, but rather
an indication of the difficulty experienced by the tyro in understanding what can
be asked of statistical methods and how powerful multivariate methods are when
applied in an appropriate manner. Frustrations occasioned by the incorrect use
of techniques, including the "loyalty syndrome" with respect to a particular one (e.g.
Correspondence Analysis among francophones), are a major source of disaffection.
The main reason for this unfortunate situation is that when the average student
has been cast out to swim on his own, he will drown unless he has a lifebuoy with
him. Only time can tell if this modest text is that lifesaver.
We wish to make it clear that the primer is not concerned with multivariate
modelling of geological processes, regionalized variables, etc. Cluster analysis as
a specially delimited topic is likewise not taken up, notwithstanding that some
of the techniques that have come to be associated with the concept of "clustering"
are used: for example, the minimum spanning tree, similarity coefficients
and Q-mode latent roots and vectors. (An easy introduction to cluster analysis
is available in the book by Everitt (1974).) The primer is solely concerned with the simple
application of standard methods of multivariate statistical analysis to geological
data in the form of arrays of measurements and compositional data (e.g. chemical
determinations). Consequently, we have not taken up advanced special methods of
multivariate analysis such as M-estimators and robust methods, nor such interesting
though relatively difficult procedures as the generalization of "biplots", notwith-
standing that biplots are doubtless destined to play an increasingly important role
in the future (cf. Gower and Hand, 1996). The full implementation of that subject
is, moreover, still in the relatively early phases of development. Where desirable,
we provide references to more advanced statistical texts.
We are grateful to Professor John C. Davis, Geological Survey of Kansas and
University of Kansas, for valuable and well-reasoned criticism of the first version
of the text from the geological standpoint. Professor John Aitchison, Department
of Statistics, University of Glasgow, is thanked for helpful advice on aspects of
the analysis of compositions in connexion with a very early version of some of
the chapters of the text as well as the current one. Dr. Allan Gordon, Department
of Mathematics and Statistics, University of St. Andrews, kindly read the entire
manuscript and furnished us with many thoughtful suggestions for improvements.
Dr. Vera Pawlowsky-Glahn, Universidad Politécnica de Cataluña, Barcelona, like-
wise read the entire text from the geomathematician's viewpoint and provided
us with valuable suggestions and advice. We are thankful to Professor Leslie F.
Marcus, Queens College, New York, and Professor F. James Rohlf, Department of
Ecology and Evolution, State University of New York at Stony Brook, for answers
to various questions. Professor Rohlf and SUNY are also thanked for
making forthcoming updates available via the Internet. The updates will be made
available by F(ile) T(ransfer) P(rotocol) at
We are well aware that some will react against the mode of presentation we adopt
here, regarding it as "condescending" or "non-academic" (because of the use of the
first person rather than "the present authors" or the like, or the occasional use
of phrases more appropriate to the spoken language than the written). This has been
done with a definite purpose in mind. Multivariate geostatistics is not generally per-
ceived as being an enthralling subject. A heavy "nuts-and-bolts" mode of presen-
tation would have done little to improve matters.
The compact disk accompanying the primer contains the computer programs and
teaching sets of data in two sub-directories, an HTML file explaining their use (being
a summary of what is said in the main text), the files needed for using Graph Server
and some files containing general information in a separate directory.
Finally, we wish to stress that this is a text for the IBM Personal Computer system
(and clones). We have no plans for releasing a version for Macintosh machines.
However, any Mac-adept with programming skills should be able to produce his
own set of programs from the information provided in the text.

Uppsala and Stockholm, June 1999

You might also like