Advisors:
P. Bickel, P. Diggle, S. Fienberg, K. Krickeberg,
I. Olkin, N. Wermuth, S. Zeger
Exploring Multivariate
Data with the
Forward Search
Springer
Anthony C. Atkinson, Marco Riani and Andrea Cerioli

Anthony C. Atkinson
Department of Statistics
The London School of Economics
London WC2A 2AE
UK
a.c.atkinson@lse.ac.uk

Marco Riani and Andrea Cerioli
Dipartimento di Economia, Sezione di Statistica e Informatica
Università di Parma
Via Kennedy 6
43100 Parma
Italy
mriani@unipr.it
andrea.cerioli@unipr.it
1. Multivariate analysis. I. Riani, Marco. II. Cerioli, Andrea. III. Title. IV. Series.
QA278.A85 2003
519.5'35-dc22 2003058614
ISBN 978-1-4419-2353-0 ISBN 978-0-387-21840-3 (eBook)
DOI 10.1007/978-0-387-21840-3
Printed on acid-free paper.
© 2004 Springer Science+Business Media New York
Originally published by Springer-Verlag New York, Inc. in 2004
Softcover reprint of the hardcover 1st edition 2004
All rights reserved. This work may not be translated or copied in whole or in part without the
written permission of the publisher (Springer Science+Business Media, LLC),
except for brief excerpts in connection with reviews or scholarly analysis. Use
in connection with any form of information storage and retrieval, electronic adaptation, computer
software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if
they are not identified as such, is not to be taken as an expression of opinion as to whether or not
they are subject to proprietary rights.
9 8 7 6 5 4 3 2 1 SPIN 10949610
To the memory of
Iris Atkinson
an enthusiastic grower of bearded irises
Our Thanks
As with our first book, the writing of this book and the research on which
it is based, have been both complicated and enriched by the fact that the
authors are separated by half of Europe. Our travel has been supported
by grants from the Italian Ministry for Scientific Research, by the Depart-
ment of Economics of the University of Parma and by the Staff Research
Fund of the London School of Economics. We are grateful to our numerous
colleagues for their help in many ways. In England we especially thank Dr
Martin Knott at the London School of Economics, who has been a steadfast
source of help with both statistics and computing. Kjell Konis, currently at
x Preface
Anthony Atkinson
a.c.atkinson@lse.ac.uk
http://stats.lse.ac.uk/atkinson/
London, England
Marco Riani
mriani@unipr.it
http://www.riani.it
http://economia.unipr.it/docenti/riani
http://stat.econ.unipr.it/riani
Andrea Cerioli
andrea.cerioli@unipr.it
http://economia.unipr.it/docenti/cerioli
http://stat.econ.unipr.it/cerioli
Parma, Italy
June 2003
Multivariate Data
Y is the n × v matrix of observations with ith row y_i, the jth element of which is y_{ij}; where essential, the jth column of Y is y_{cj}.
X is n × p, the matrix of explanatory variables in regression.
J is an n × 1 vector of ones, even though it is a capital letter.
q(i) is a vector, usually n × 1, of zeroes, with ith element equal to one: q_j(i) = 0, j ≠ i, q_i(i) = 1.
Statistical Operations
E is expectation, a Roman letter distinct from the matrix of residuals E.
var is the variance, v × v for multivariate data, and
cov the covariance.
Matrix Operations
tr, the trace of a matrix.
diag, a diagonal matrix.
||y_i − y_j|| = {Σ_{k=1}^{v} (y_{ik} − y_{jk})²}^{0.5}, the Euclidean distance between y_i and y_j.
L_{y_i} = ||y_i|| = {Σ_{k=1}^{v} y_{ik}²}^{0.5}, the length of the vector y_i.
xviii Notation
Univariate Regression
E(y) = Xβ, where β is p × 1.
E(y_i) = x_i^T β.
β̂ = (X^T X)^{-1} X^T y, the least squares estimator.
e = y − ŷ = y − Xβ̂ and
s² = e^T e/(n − p).
S(β̂) = (y − Xβ̂)^T (y − Xβ̂), the residual sum of squares.
S_0 = (y − ȳJ)^T (y − ȳJ), the corrected sum of squares of the data.
R² = R²_{y|X} = {S_0 − S(β̂)}/S_0, the squared multiple correlation coefficient.
Estimation for Multivariate Data
If there is no matrix of explanatory variables,
E(y_i) = μ and E(Y) = Jμ^T.
μ̂ = ȳ = Y^T J/n, the v × 1 vector of estimated means.
Ŷ = Jμ̂^T = JJ^T Y/n, the n × v matrix of fitted values.
E = Y − Ŷ = Y − JJ^T Y/n, the n × v matrix of residuals.
Σ is the v × v population covariance matrix of Y.
S(μ̂) = Σ_{i=1}^{n} (y_i − μ̂)(y_i − μ̂)^T = E^T E is the v × v matrix of residual sums of squares and products of the data.
Σ̂ = S(μ̂)/n is the maximum likelihood estimator of Σ, with diagonal elements σ̂_j² and off-diagonal elements σ̂_{jk}.
Σ̂_u = S(μ̂)/(n − 1) is the unbiased estimator of Σ.
Multivariate Regression
E(Y) = XB, where B (capital beta) is p × v, with ith row β_i.
E(y_i) = B^T x_i and E(y_{ij}) = x_i^T β_{cj}, where β_{cj} is the jth column of B.
B̂ = (X^T X)^{-1} X^T Y, the p × v matrix least squares estimator when there is a matrix X of explanatory variables.
Ordinary Kriging
s_0 is a prediction site, i.e. the value y(s_0) is to be predicted from y.
h is a 2 × 1 vector giving the spatial lag between two sites, s and t.
N(h) is the number of sites at lag h within S.
c(h) is the covariogram, while
2ν(h) is the variogram (ν(h) is called the semivariogram). Both c(h) and 2ν(h) may depend upon a parameter vector θ.
2ν̂(h) is a robust estimate of 2ν(h).
C is the n × n covariance matrix of y.
T is the n × n matrix of variogram values of y.
c is the n × 1 vector of covariances between y(s_0) and y_i.
ν is the n × 1 vector of variogram values between y(s_0) and y_i.
ŷ(s_0|S) is the ordinary kriging predictor at site s_0, computed through η, the n × 1 vector of ordinary kriging weights.
σ²(s_0|S) is the mean-squared prediction error associated with ŷ(s_0|S).
ν* and η* have the same meaning as ν and η, but they are defined under a measurement error model; ŷ*(s_0|S) is the corresponding ordinary kriging predictor.
S(i) is network S with the ith location removed.
e_{i,S(i)} is the standardized prediction residual at site s_i, based on the n − 1 observations from S(i).
C(i) is the (n − 1) × (n − 1) covariance matrix of Y(i). c(i) is the (n − 1) × 1 vector of covariances between y_i and Y(i).
J(i) is vector J with the ith entry removed, i.e. an (n − 1) × 1 vector of ones.
e_{i,S(m)} is the standardized prediction residual at site s_i at step m of the forward search. Here m_0 ≤ m ≤ n − 1 because the last step of the search is uninformative for prediction purposes.
Spatial Autoregression
X and β are the same as in univariate regression.
W is the n × n weight matrix defining the neighbourhood structure in S.
ρ is the spatial interaction measure between neighbouring sites.
σ² is the variance of the independent disturbance terms ε_i, i = 1, …, n.
Σ is the n × n covariance matrix of y.
Ω is σ²Σ^{-1}.
β̂, σ̂² and ρ̂ are the maximum likelihood estimators of β, σ² and ρ.
e = σ̂^{-1}(I_n − ρ̂W)(y − Xβ̂), the n × 1 vector of standardized regression residuals.
B_t is a block of b_t contiguous sites.
n* is the number of such blocks. Usually,
b_t = b, for t = 1, …, n*.
These departures are not exclusive; for example the data may need trans-
formation whilst consisting of three groups together with some outliers.
But it is useful to consider the first three departures on their own.
To explore the structure of our data we shall make much use of Ma-
halanobis distances. Although tests on large distances are sometimes sug-
gested to test for outliers, we make particular use of plots. Our argument
2 1. Examples of Multivariate Data
is that these distances from all n observations can fail to reveal some of
the departures listed above when all n observations are used to estimate
the parameters needed to calculate the distances. We will show that extra
information can be obtained when outliers are present by calculating all n
distances using parameter estimates from a subset of m observations which
exclude the outliers. To start our argument it is helpful to look at the case
of a simple sample, that is v = 1. This seemingly trivial instance is sur-
prisingly helpful, not only in describing the problems of outlier detection
using Mahalanobis distances but also in giving an informal description of
the forward search.
For univariate observations the Mahalanobis distance reduces to the
scaled residual
d_i = e_i/s = (y_i − ȳ)/s   and   s² = Σ_{i=1}^{n} (y_i − ȳ)²/(n − 1),   (1.1)
where ȳ = Σ_{i=1}^{n} y_i/n. The squared distances d_i² have a scaled beta distribution which is asymptotically chi squared on 1 degree of freedom. (We devote the first part of Chapter 2 to a discussion of such distributional results). We can use a probability plot of the n values of d_i² to check the distribution of the distances and so the adequacy of the model. Alternatively, we can plot d_i against the normal distribution.
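As an illustrative sketch (ours, not code from the book), the scaled residuals of (1.1) and the coordinates of the normal probability plot can be computed with the Python standard library; the function names here are our own inventions.

```python
from statistics import NormalDist, mean, stdev

def scaled_residuals(y):
    """d_i = (y_i - ybar)/s, as in (1.1); stdev uses the n-1 divisor."""
    ybar, s = mean(y), stdev(y)
    return [(yi - ybar) / s for yi in y]

def normal_qq_pairs(y):
    """(theoretical normal quantile, ordered d_i) pairs for a QQ plot."""
    d = sorted(scaled_residuals(y))
    n = len(d)
    nd = NormalDist()
    # plotting positions (i - 0.5)/n against the ordered distances
    return [(nd.inv_cdf((i - 0.5) / n), di) for i, di in enumerate(d, 1)]
```

Plotting the pairs should give a roughly diagonal line when the sample is normal.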
Now suppose that there is a single outlier, formed by adding an amount sΔ to observation ℓ which has the value ȳ, with s as in (1.1). This changed observation will affect both the value of ȳ and of s². Then for this observation

d_ℓ = (1 − 1/n)Δ / {1 + Δ²/n}^{1/2}.   (1.2)
If observation ℓ has residual e_ℓ in the original sample, a slightly more complicated formula results, which has a similar structure as a function of Δ.
The relationship in (1.2) shows that, for large n and moderate Δ, the value of d_ℓ will be near to Δ since the single observation will not have a large effect on the estimate of the variance. But, for moderate n, the value of d_ℓ is not so large; for Δ = 3 and n = 10, 50 and 100 we obtain values of 1.959, 2.706 and 2.845 for d_ℓ. An outlier test based on the approximate normal distribution of d_ℓ would fail to detect anything strange about this observation. Of course, with n = 100, a value of 3 is not particularly large.
However, for all n, information about the outlying nature of observation ℓ can be obtained by plotting the values of d_i. In particular, although the value of d_ℓ may not be especially large, the values of the other distances will be shrunk by the inflated estimate of the variance in (1.2).
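The numbers quoted above are easy to reproduce. The sketch below (ours, not the authors') evaluates (1.2) directly and checks it against a distance computed from an explicitly contaminated sample.

```python
import math
from statistics import mean, stdev

def d_ell(delta, n):
    """Distance of the modified observation, formula (1.2)."""
    return (1 - 1 / n) * delta / math.sqrt(1 + delta ** 2 / n)

# Values quoted in the text for Delta = 3
for n in (10, 50, 100):
    print(n, round(d_ell(3, n), 3))   # 1.959, 2.706, 2.845 as in the text

# Empirical check with n = 5: contaminate an observation equal to ybar
y = [1.0, 2.0, 3.0, 4.0, 5.0]         # ybar = 3, so contaminate y[2]
delta = 3.0
y[2] += stdev(y) * delta
d = (y[2] - mean(y)) / stdev(y)       # recomputed scaled residual
assert abs(d - d_ell(delta, 5)) < 1e-9
```

The exact agreement in the final assertion holds because (1.2) is derived for precisely this contamination scheme.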
A powerful method of detecting single outliers (Cook and Weisberg 1982, Atkinson 1985) is the deletion of single observations. If in our simple example y_ℓ is removed from the estimation of the mean and variance, the
1.2 A Sketch of the Forward Search 3
distance for that observation will have the value Δ. For all other observations y_i the distances from deletion of the observation will still include the effect of y_ℓ in estimation of the parameters. The corresponding distances will thus not be much changed by the deletion of each y_i in turn and this deletion procedure will lead to clear identification of the single outlier.
Now suppose there are k ≥ 2 outliers formed in much the same way as we formed y_ℓ. The outliers will then form a small cluster consisting of observations ℓ_1 up to ℓ_k. Single deletion in turn of each of these k observations will leave the estimates of the mean and variance affected by the other k − 1 outliers so that the outlying units may not have particularly large distances. This hiding of the effect of one outlier by another is called "masking". This masking effect can be broken and the k outliers revealed if we delete all k outliers and then calculate the parameter estimates. However we have first to determine which set of k observations to delete; there are n!/{k!(n − k)!} possibilities, which can be a large number, even for moderate sized samples; with 100 observations and five outliers there are 75,287,520 possibilities. The presence of masking may make it impossible to reduce the number by a series of univariate deletions. An example of the failure of such a "backwards" method for a binomial model is in Atkinson and Riani (2000, §6.16.2). In addition, the exact number of outliers is not known; there might be six outliers or four.
Of course, with a univariate sample it is not hard to determine which observations to delete as they are ordered on a line; the ordering of the observations is not changed by the estimates of the mean and variance. But this is no longer so with multivariate observations. The problem with v ≥ 2 is that the ordering of the data depends upon the parameter estimates. Observations with the same Mahalanobis distance d_i when v ≥ 2 lie on the same ellipsoid, centered at the origin, the shape of which is determined by the estimated covariance matrix. Outliers may cause the shape of this ellipsoid to change, making it perhaps more or less spherical. The result is that, as the estimated covariance matrix changes, so does the ordering of the units by their Mahalanobis distances. The forward search overcomes this problem by finding outlier free subsets of the data from which parameters and distances can be estimated and the observations ordered.
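In outline, and purely as our own NumPy-based illustration (not the authors' software), a forward search of this kind can be sketched as follows. The starting subset is chosen crudely from the units closest to the coordinatewise median, and units may interchange as the subset grows.

```python
import numpy as np

def forward_search(Y, m0=None):
    """Order the n units of Y by when they join the subset used to
    estimate the mean and covariance matrix.  A simplified sketch of a
    forward search, not the authors' exact algorithm."""
    n, v = Y.shape
    m0 = m0 or v + 3
    # crude robust start: units closest to the coordinatewise median
    d0 = np.abs(Y - np.median(Y, axis=0)).sum(axis=1)
    subset = np.argsort(d0)[:m0]
    entry_order = list(subset)
    for m in range(m0, n):
        mu = Y[subset].mean(axis=0)
        Sinv = np.linalg.inv(np.cov(Y[subset].T))
        R = Y - mu
        d2 = np.einsum('ij,jk,ik->i', R, Sinv, R)  # squared Mahalanobis distances
        subset = np.argsort(d2)[:m + 1]            # grow the subset by one unit
        for i in subset:
            if i not in entry_order:
                entry_order.append(i)
    return entry_order
```

On clean data with a few planted outliers, the outliers are the last units to join, since the subset estimates remain uncontaminated for most of the search.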
either cluster. Then units from both clusters would join the subset during the search. Even if there were no very large distances, the presence of the two clusters would be revealed by a shortage of very small distances. As we see in a multivariate example in §3.4, rerunning the search from a different starting point would reveal the clusters.
This brief introduction is complemented by the Exercises in §3.6. In the
solutions we give examples of forward plots of Mahalanobis distances and
of aspects of the estimated covariance matrix for a sample from a bivariate
normal distribution, both in its original state and after contamination by
a variety of patterns of outliers.
Our purpose is not only to identify outlying observations and clusters, groups and subpopulations, but also to determine what effect they have both on the model to be fitted and on the conclusions that can be drawn
from the data. An example is the need for power transformation of the data
to improve normality. The estimated transformation can be very much af-
fected by a few outliers. These need to be identified so that a transformation
can be found that is suitable for the bulk of the data.
The data were collected to determine the variability in size and shape
of heads of young men in order to help in the design of a new protection
mask for the Swiss army. Because of the variations in human heads, it was
clear that one mask could not be satisfactory for all soldiers. The aim was
to find a few typical head sizes and shapes which, it was hoped, would
make it possible to provide satisfactory masks for all soldiers. If the data
have a multivariate normal distribution, the techniques of multivariate data
analysis can be used to determine the best few standard types.
Accordingly we start with two plots to check whether the data are approximately normal. Figure 1.1 is the scatterplot matrix for the six variables, that is the matrix of scatterplots for all pairs of variables. The data do seem to have the elliptical contours which would be expected from the pairwise bivariate normal distributions. However, it is hard to tell by visual inspection whether the scattering of the more remote points is what is usually found in the tails of normal distributions.
A conventional way to try to answer this question is to look at a plot of the n Mahalanobis distances, the analogue of the residuals in regression. We derive the scaled beta distribution of the squared distances in §2.6. But, asymptotically, the squared distances have a χ² distribution on v degrees of freedom, where v is the dimension of the measurements, here six. We could look at a QQ plot of the ordered squared distances against the percentage points of χ²_6, but this plot is sparse for large values. Instead we look at the plot for the distances, with percentage points that are the square roots of the percentage points of the chi-squared distribution. The resulting plot is Figure 1.2. In the absence of random fluctuations the calculated distances should fall on the diagonal line given in the figure. They seem to do so, as far as the unaided eye can tell, although the smaller distances perhaps lie slightly, but systematically, off the line.
We use simulation to provide a guide as to what kind of fluctuations are to be expected in such plots. In Figure 1.2 we also include a 90% envelope formed from 99 simulations of samples of 200 six-dimensional normal random variables. The Mahalanobis distances are calculated for each sample and ordered. The ends of the point-wise confidence intervals in the plot are the fifth and 95th largest value of each simulated order statistic of the Mahalanobis distances. The figure shows that the smallest and largest observations all lie on or within this narrow envelope. There is no evidence, when all 200 observations are fitted, of any departure from the multivariate normal distribution of these measurements. Similar conclusions are provided by the plot of the squared distances, which, however, due to the effect of squaring, has a much more spread out upper tail and a more compact lower tail.
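The envelope just described can be reproduced in outline as follows. This is our own sketch of the recipe, using estimated means and covariances as in the text; the function and parameter names are ours.

```python
import numpy as np

def md_envelope(n=200, v=6, nsim=99, seed=0):
    """90% pointwise envelope for the ordered Mahalanobis distances of an
    n x v normal sample, following the recipe in the text (our sketch)."""
    rng = np.random.default_rng(seed)
    sims = np.empty((nsim, n))
    for s in range(nsim):
        Y = rng.standard_normal((n, v))
        mu = Y.mean(axis=0)
        Sinv = np.linalg.inv(np.cov(Y.T))
        R = Y - mu
        d = np.sqrt(np.einsum('ij,jk,ik->i', R, Sinv, R))
        sims[s] = np.sort(d)           # ordered distances for this sample
    sims.sort(axis=0)                  # sort each order statistic over samples
    # indices assume nsim = 99: sims[4] is the 95th largest (lower end),
    # sims[94] the 5th largest (upper end)
    return sims[4], sims[94]
```

Plotting the observed ordered distances between these two curves gives the band shown in Figure 1.2.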
It seems as if the data do indeed have a multivariate normal distribution.
However, it is possible to introduce outliers into a multivariate data set in
such a way that they are not particularly outlying in any of the two dimen-
sional projections onto the co-ordinate axes which form the scatterplots
FIGURE 1.1. Swiss heads: scatterplot matrix of the six measurements on 200 heads
of Figure 1.1. If there were one such outlier, it would have a significantly
large Mahalanobis distance in the QQ plot of Figure 1.2. But, if there were
several outliers close together, they might not show in the final plot of Ma-
halanobis distances because of their combined effect on estimation of the
mean and covariance matrix of the fitted distribution. Such observations
can however be detected from a forward plot of Mahalanobis distances cal-
culated for each subset size during the forward search. It may seem unlikely
that observations of this kind will occur in a well established subject like
the measurement of heads, where each number is well understood. However,
data entry and editing can produce bizarre errors.
One way of detecting the presence of outliers is to look at the distance
of the next observation to join the subset. Figure 1.3 is a plot of these dis-
tances, that is, at each m, of the minimum distance amongst the observa-
tions not in the subset. This is usually the next unit to join the subset. If the
1.4 Swiss Heads 9
FIGURE 1.2. Swiss heads: QQ plot of the ordered Mahalanobis distances against the square root of the percentage points of χ²_6 with 90% envelope from 99 simulations. There do not seem to be any outliers when all observations are fitted
distance is large, then an outlier is being introduced into the subset. Once an outlier has been introduced, it may distort the estimates of the mean and covariance matrix in such a way that other similar outliers no longer seem remote. Thus, the introduction of a cluster of outliers will be heralded by a spike in the plot, which will decline thereafter. Something of this behaviour is visible in Figure 1.3. Initially the plot is virtually horizontal as observations from the centre of a multivariate normal distribution join the subset. The slope of the curve increases gently as more remote observations are introduced. Towards the end there are several larger jumps. The penultimate jump is due to the introduction of observation 104. Once it has been introduced, observation 111 is equally remote. Otherwise, Figure 1.3 does not reveal the introduction of a cluster of outliers. The generally smooth shape of the curve indicates that the forward search has ordered the observations from those at the centre of the multivariate distribution to those most remote. We pursue this line of analysis in Chapter 3 where we look at forward plots of the distances for individual units.
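A minimal version of such a forward plot of minimum distances (our sketch, not the authors' software) records, at each subset size m, the smallest Mahalanobis distance among the units not yet in the subset.

```python
import numpy as np

def min_distance_trace(Y, m0=6):
    """At each subset size m, the minimum Mahalanobis distance among the
    units not in the subset -- the quantity plotted in a forward plot of
    minimum distances.  A simplified sketch, not the authors' software."""
    n, v = Y.shape
    d0 = np.abs(Y - np.median(Y, axis=0)).sum(axis=1)
    subset = np.argsort(d0)[:m0]          # crude robust starting subset
    trace = []
    for m in range(m0, n):
        mu = Y[subset].mean(axis=0)
        Sinv = np.linalg.inv(np.cov(Y[subset].T))
        R = Y - mu
        d = np.sqrt(np.einsum('ij,jk,ik->i', R, Sinv, R))
        outside = np.setdiff1d(np.arange(n), subset)
        trace.append(d[outside].min())    # nearest unit not yet in the subset
        subset = np.argsort(d)[:m + 1]    # grow the subset by one unit
    return np.array(trace)
```

A planted cluster of outliers shows up as the spike at the end of the trace, as in Figure 1.3.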
We now return to the purpose for which the data were collected. The idea
is to identify typical heads which could be used to decide which protection
masks should be manufactured. It would be most convenient if there were
just a range of standard sizes from small to large, so that only one number
were needed to specify a mask. They could then be selected in the way in
FIGURE 1.3. Swiss heads: forward plot of minimum distances of units not in the
subset. There may be a few outliers entering at the end of the search
which cheap shoes are bought solely by size, as are shirts in the United
Kingdom. If the authorities are unlucky, it may be necessary to specify two
or more dimensions. Expensive shoes are bought both by size and width
and shirts in the United States by both collar size and sleeve length. Both
pairs of measurements define a bivariate distribution of satisfactory items.
One could think of further variables which are necessary for a comfortable
shirt, for example chest size. However it may be that once the variation of chest size with collar size and sleeve length has been accounted for, there is no further appreciable variation in chest size. The same may be true
with head sizes. Are all six variables really needed to explain the observed
variation or are there a few combinations of the measurements which ex-
plain nearly all differences between people? If the data are multivariate
normal, appropriate linear combinations are simply found by the methods
of principal components analysis, described in Chapter 5. In the absence of
multivariate normality, it is much harder to discover suitable combinations
of the variables and to determine their importance.
easy to suggest reasons for any patterns which we find. Of course, it may be
too easy to explain random fluctuations and we need statistical procedures
to help us assess facile explanations of any seeming structure.
The data in Table A.2 are women's athletic records for 55 countries. The
variables are times for the following distances:
The data were given by Johnson and Wichern (1997, pp. 44-45) and
are taken from a handbook prepared for the 1984 Olympic games in Los
Angeles. They therefore come from an interesting period in the history of
women's athletics, when there were, now authenticated, allegations about
the treatment of female athletes, especially in communist countries, with
male sex hormones. One aspect of the analysis is therefore to see what
evidence the data provide; perhaps these countries will appear as outliers.
Table 1.1 gives the minimum and maximum national records for each
race and the country concerned. Also given is the ratio of these times.
Both the minimum and maximum times are interesting. Apart from the
USA, the other countries with minimum times were all communist countries
at the time, which have all since disappeared as legal entities, which has
not been the fate of all communist countries. The German Democratic
Republic (GDR) is now part of the Federal Republic of Germany (FRG),
Czechoslovakia (CZ) has split into two countries and the USSR into Russia,
the Ukraine and many more.
The maximum times belong to two island nations in the Pacific (Cook
Islands, CI, and Western Samoa, WS). Figure 1.4 is a scatterplot matrix
of the data, in which both the times for the Cook Islands and Western
FIGURE 1.4. Track records: scatterplot matrix of the national records for women from 55 countries for seven races. The results for the Cook Islands (CI) and Western Samoa (WS) are labelled
Samoa have been labelled. The difference in the pattern of times for the two
countries is interesting. The times for the Cook Islands are always amongst
the largest, lying towards the end of the major axis of the elliptical clouds
of points. However, the very large times for Western Samoa for the last
three races cause the country to lie away from the scatter in many of the
bivariate plots.
Considering the minima and maxima is looking at each variable indi-
vidually, that is at one-dimensional projections along the coordinate axes.
The scatterplot matrix shows bivariate projections onto the same axes. We
now see what multivariate techniques reveal. Figure 1.5 is the QQ plot of
Mahalanobis distances at the end of the search, which shows three outliers.
The largest is, indeed, Western Samoa (55), with the next largest being
North Korea (the Democratic People's Republic of North Korea, 33). The
1.5 National Track Records for Women 13
third outlier is Mauritius (36), another island. There also seem to be some
other indications of non-normality, with several other observations lying
on, or outside, the simulation envelope. More detailed information can be
obtained from forward plots of Mahalanobis distances during the search.
Figure 1.6, like Figure 1.3, is a forward plot of the minimum Mahalanobis
distance among the units not included in the subset, which is large when
outliers join. The two plots are quite different. For most of the search the
values in Figure 1.6 oscillate between four and six, but at the end of the
search there is a very large jump upwards, much larger than those at the
end of the search in Figure 1.3. The last increase, at m = 54, is for Western Samoa, which joins the subset at m = n = 55. The preceding jumps
are indeed for Mauritius and North Korea which come in at steps 54 and
53. Because these distances are for units not included in the subset, they
are larger than those in the QQ plot of Figure 1.5 where all observations
are used in estimation of the parameters. However, these three extreme
observations are clear from either plot.
It is informative to refer these indications of outliers back to the data.
Accordingly, Figure 1.7 is a scatterplot matrix of the data, minus Western
Samoa, on which the results for the Cook Islands (CI), North Korea (DRK)
and Mauritius (M) have been labelled. The general performance of North
Korean athletes is such as to lie within the bulk of the data, but the plot
FIGURE 1.6. Track records: forward plot of minimum distances of units not in the subset (horizontal axis: subset size m)
shows that there is, in particular, a high value for y2 which causes the observation to fall away from the general distribution. This is particularly clear in the panel plotting y2 against y3. The values for Mauritius also stand away from the generally normal distribution of observations, in part because of the large time for y7 compared to some of the other times. It is important to be clear that the Cook Islands do not show as an outlier because the times, although they are all large, fit in the general correlated multivariate normal distribution - there is no combination of egregiously high and low times for this country.
A feature of Figure 1.6 is the peak at m = 40, which may be an indication of the beginning of a cluster of slightly outlying observations. A further indication of some undetected structure comes from the QQ plot of Figure 1.5 which indicates general non-normality. We investigate these indications in Chapter 3 using forward plots of individual Mahalanobis distances for each country. As a result there seem to be about a dozen countries which do not fit the general model of multivariate normality.
It makes little sense to remove 12 observations out of 55. Instead we should look for a model which includes all observations, or perhaps all except one or two outliers, such as Western Samoa. One possibility is to transform the data. For example, taking the reciprocals of the times would give average speeds, which seems as logical a measure of performance as times. In some examples, such as the univariate data on survival times of animals (Box and Cox 1964), often known as the "Poison Data",
FIGURE 1.7. Track records with Western Samoa removed: scatterplot matrix of the national records. The results for the Cook Islands (CI), North Korea (DRK) and Mauritius (M) are labelled
which were analysed using the forward search by Atkinson and Riani (2000, pp. 95-8), a simpler model is indeed obtained by use of the reciprocal transformation. For such a transformation to be possible the data have to be
non-negative, which they are here. If a transformation is useful, the data
will have skew marginal distributions. Information on the correct transfor-
mation is stronger if the data cover several cycles, that is powers of ten.
However, the ratios of the value of the maximum time to the minimum
time for each race in Table 1.1 suggest that little information will be avail-
able on the correct transformation of the shorter races. It may well be
that the marathon times are the only ones for which a transformation will
have any effect on our analysis. We accordingly analyse the reciprocals of
these record times in Chapter 3 and see, in Chapter 4, whether we can
improve on this particular transformation of the data. The importance of
TABLE 1.2. The municipalities in Emilia-Romagna with the largest and smallest
populations
Municipality Unit Population
Bologna 6 404,378
Modena 159 176,990
Parma 210 170,520
Ferrara 68 138,015
Ravenna 292 135,844
Reggio nell'Emilia 329 132,030
Rimini 121 127,960
These 28 variables were selected from 50 available. The first 13 are demographic variables, the next three, y14–y16, measure housing quality, the succeeding seven, y17–y23, are measures of individual income and wealth and the last five, y24–y28, relate to industrial production. The definitions (and translations from Italian) are not all unambiguous. For example, y16 is intended to distinguish dwellings with central heating from those without, whether the heating to the radiators is provided centrally to a block of flats or provided by individual households in the flats. It is also not clear whether variables like y25 refer to all employed in the municipality, wherever they live, or to all residents of the municipality wherever they work. However, such details would be more important if comparisons were to be made with similar data from other countries. Provided that the same rules were applied in collecting the data and calculating the indices in all 341 municipalities (which may be a strong assumption) comparisons between the municipalities are possible and interpretable in terms of these variables.
We start with a forward search through all the data. The forward plot
of the minimum Mahalanobis distance amongst units not included in the
subset is in Figure 1.11. The very sharp spike at the end of the search is
caused by units 245 and 277 which are the two smallest municipalities in
the region. The magnitude of these distances is surprising; with 341 units it
might be expected that an individual unit would have only a small effect.
However, there are 28 variables, so that there may be directions in the
space of these variables which are very sparsely filled, hence making such
changes possible.
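The kind of forward plot just described can be sketched in a few lines. The fragment below is a minimal illustration only, not the algorithm of the book (which starts from a robustly chosen subset found with bivariate boxplots and uses unbiased estimators); the function name and the crude median-based start are our own assumptions.

```python
import numpy as np

def forward_search_min_md(Y, m0=None):
    """Forward search sketch: grow a subset one unit at a time, at each step
    recording the smallest Mahalanobis distance among units not yet included."""
    n, v = Y.shape
    m0 = m0 or v + 1
    # crude start: the m0 units closest to the coordinatewise median
    d0 = np.abs(Y - np.median(Y, axis=0)).sum(axis=1)
    subset = np.argsort(d0)[:m0]
    mins = []
    for m in range(m0, n):
        mu = Y[subset].mean(axis=0)
        Sinv = np.linalg.inv(np.cov(Y[subset], rowvar=False))
        resid = Y - mu
        d2 = np.einsum('ij,jk,ik->i', resid, Sinv, resid)
        outside = np.setdiff1d(np.arange(n), subset)
        mins.append(np.sqrt(d2[outside].min()))
        # next subset: the m + 1 units with smallest distances, which lets
        # units leave as well as enter (interchanges)
        subset = np.argsort(d2)[:m + 1]
    return np.array(mins)
```

With a gross outlier planted in simulated data, the monitored minimum distance shows the characteristic spike at the very end of the search.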
The very large distances associated with units 245 and 277 are so large
as to make it hard to identify any further appreciable distances in the
rest of the plot. If these two units are removed from the plot, the rest of
Figure 1.11 becomes as shown in Figure 1.12. It is now clear that a number
of other, lesser, outliers are also entering at the end of the search. These
are all listed in Table 1.3, starting with unit 277 (Zerba), the last to enter
the subset. The table therefore works downwards from the most outlying
community. The last community to be listed is the first to enter after the
local minimum in distances in Figure 1.12 at m = 324.
The results of Table 1.3 show that the search has found five (out of ten)
of the communities with populations less than 1,000. No cities have been
found to be outlying and, indeed, all the communities at the end of the
search are fairly small, although the last entry in the table, Bellaria-Igea
Marina, has a population of over 12,000 and one other has a population
of over 15,000. So the search seems not merely to have found the smallest
communities. In Chapter 3 we look for ways in which these units are outlying. Part of the challenge is that, with 28 variables, we cannot, as we did
in the two previous examples, study the scatterplot matrix and highlight
units to see in what way they are outlying. For an understanding of the
general structure of the data we need to estimate any patterns once the
outliers have been detected. For example, many of the outlying commu-
nities fall in the mountainous area in Figure 1.10, as do many other poor
communities. One possible approach, which we do not explore, is to see
whether elevation above sea level can be used to predict the properties of
the communities. Instead, in Chapter 4, we investigate transformation of
1.6 Municipalities in Emilia-Romagna 21
of men, in baroque poses, pour molten metal into moulds. The data, and
a reproduction of the bank note, are given by Flury and Riedwyl (1988,
pp. 4-8). The analysis of these data forms the central example of their book.
The six variables are measurements of the size of the bank notes:
Flury and Riedwyl (1988, p. 4) illustrate where on the bank note the
measurements are taken: the first three are measurements of paper size
and the fourth and fifth measurements from the edge of the paper to the
printed area. Only y6 is solely on the printed area: it measures the diagonal
distance across the frame of the central illustration.
A scatterplot matrix of the data is given in Figure 1.13 in which the
two groups have different symbols. This plot already shows much of the
structure. One feature is that y6 seems to provide an almost complete
separation between the two groups. Complete separation occurs in several
of the scatterplots, particularly clearly in that of y5 against y6. It seems to
be easier for forgers to get the paper size right - y1 to y3 - than to get the
image size correct. If the distance from the top of the image, y5, is correct,
as is shown by the overlap of the first scatterplots in the row for y5, the
distance from the bottom to the image, y4, is then wrong. As the plots
of y6 show, the forgeries tend to have too small an image. Being slightly
too small is a feature of forgeries of bronze statues made from moulds
taken from the original statue, because the bronze shrinks on cooling after
casting. However, it is not clear what causes the shrinkage in the banknotes.
A second feature of the data is that the group of forgeries does indeed not
look homogeneous. Particularly in the plot of y4 against y6, it looks as if
the forgeries may split cleanly into two groups.
We begin our forward analysis with a search which treats the observa-
tions as if they came from a single population. In previous examples we have
looked at plots of Mahalanobis distances. Instead we here start by monitor-
ing the parameter estimates that go into the calculation of the distances.
Figure 1.14 is a forward plot of the estimates of the individual elements of
the 6 x 6 covariance matrix from a search that starts with an initial subset
containing units from both groups. This plot is extremely stable, giving no
indication of the presence of the two groups.
Although this search fails to reveal the two groups it does reveal many
outliers. Their existence is exhibited in Figure 1.15 by the forward plot
of the minimum Mahalanobis distance amongst observations not in the
subset. This suggests that there is a well defined cluster of outliers, the
FIGURE 1.13. Swiss bank notes: scatterplot matrix of the six measurements on
200 notes. The filled circles are units in Group 1, the notes believed genuine
first of which enters at m = 180, making 21 in all, the first sharp increase
in Figure 1.15 occurring at m = 179. That there is a cluster of outliers is
indicated by the partial decline of these distances as similar observations
enter. At the very end of the search the larger values for the distances are
for more remote observations. It is surprising that the parameter estimates
in Figure 1.14 are so stable. However, the plot does show an increase in
elements (4,6) and (6,6) from around m = 180. We have already seen in the
scatterplot that the structure of the second group is most clearly displayed
in the plot of y4 against y6. The last observations to enter are indeed many
of those most remote in the second group, which are most extreme in y6
and so will affect these two elements of the variance-covariance matrix.
This search fails to reveal the two groups because it starts with a subset
containing observations from both groups. The fitted model is thus centred
between the two groups. We now see what happens if we start the search
1.7 Swiss Bank Notes 25
FIGURE 1.14. Swiss bank notes: forward plot of elements of the covariance ma-
trix. A stable plot which does not suggest the presence of two groups
FIGURE 1.15. Swiss bank notes: forward plot of minimum distances of units not
in the subset, left-hand panel, and a zoom taken in the last 30 steps
in one or other of the two groups. First we describe the results from a
forward search starting with the first 20 observations on supposedly genuine
notes. Figure 1.16 shows that, until m = 100, the elements of the variance-
covariance matrix are small, as the data are being fitted to a single group.
When m = 101 the search will have reached a point at which at least one
observation from Group 2 has to join the subset. The figure clearly shows
the effect of the mingling of the two groups. As soon as units from the
FIGURE 1.16. Swiss bank notes starting with the first 20 observations on genuine
notes: forward plot of elements of the covariance matrix. A plot which, unlike
Figure 1.14, clearly suggests the presence of two groups
second group enter, there is a rapid increase in the values of the elements,
especially those associated with variables 4 and 6. The increase is rapid
because many of the units from Group 1 leave the subset as an appreciable
number from Group 2 enter. This interchange occurs because, once the
model is fitted to units from both groups, many units in Group 1 now seem
to be outliers when judged by the common mean and variance-covariance
matrix.
Similar remarks apply to the forward plot of minimum Mahalanobis
distances in Figure 1.17. This shows that, just before m = 100, the last
few observations to join the subset are remote, as judged by the variance-
covariance matrix plotted in Figure 1.16. But, once units from Group 2,
the forgeries, enter the subset, and some from Group 1 leave, these, and
other, units are no longer so remote as measured using the larger elements
shown in Figure 1.16. The distances accordingly decrease. They only in-
crease again towards the end of the search as, in the main, the outliers from
Group 2 enter. The last third of Figure 1.17 is identical to the last third of
Figure 1.15, an indication of the stability of the end of the forward search
to very different starting points.
The next two plots we consider come from the complementary start of
the search solely with units from Group 2, which is less concentrated than
FIGURE 1.17. Swiss bank notes starting with the first 20 observations on genuine
notes: forward plot of minimum distances of units not in the subset. The two
groups are evident
Group 1. We can therefore expect that the evidence for two groups will be
slightly weaker in these new plots than it was in those we have just seen.
The forward plot of the elements of the variance-covariance matrix, Figure 1.18, is similar to that when the start of the search was in Group 1,
Figure 1.16, but reflects the more dispersed nature of Group 2, which is
evident in Figure 1.13. Initially the elements involving variables 4 and 6 are
larger in magnitude than they were before. The effect of including outliers
from Group 2 and observations from Group 1 also has a more gradual effect
than it did before. The forward plot of minimum Mahalanobis distances,
Figure 1.19, shows a sharp peak at m = 84, just before the first outlier from
Group 2 enters the subset. As several similar observations successively en-
ter, the distance to the next unit rapidly decreases. The search finishes as
before.
The forward plots of minimum Mahalanobis distances lead to the iden-
tification of 21 outliers. In order of entry from m = 180, these are units
50, 5, 13, 70, 194, 111, 168, 116, 138, 187, 148, 192, 162, 182, 160, 180,
161, 1, 167, 171 and 40. Of these 21 observations only 6 come from Group
1. Figure 1.20 repeats the scatterplot matrix of Figure 1.13 with these 21
outliers marked. Several of the Group 1 outliers, 1, 13, 40 and 50, show on
the plots of the first three variables. Unit 5 has a very low value of y5.
With the possible exception of unit 1, these all seem like clear outliers from
FIGURE 1.18. Swiss bank notes starting with units in Group 2 (the forgeries):
forward plot of elements of the covariance matrix. The presence of two groups is
evident
FIGURE 1.19. Swiss bank notes starting with units in Group 2 (the forgeries):
forward plot of minimum distances of units not in the subset
FIGURE 1.20. Swiss bank notes: scatterplot matrix of the six measurements on
200 notes. The last 21 units to enter the forward search
Unlike the other chapters in the book, this chapter contains little data
analysis. The emphasis is on theory and on the description of the search. In
the first half of the chapter we provide distributional results on estimation,
testing and on the distribution of quantities such as squared Mahalanobis
distances from samples of size n. The second half of the chapter focuses on
the forward search.
We start in §2.1 by recalling the univariate normal distribution. Sections
2.2 and 2.3 outline estimation and hypothesis testing for the multivariate
normal distribution. As we indicated in Chapter 1, forward plots of Mahalanobis distances are one of our major tools. Since the distribution theory
for these distances seems, to us, not to be clear in the literature, we devote
§2.4 to §2.6 to deriving the distribution using results on the deletion of
observations. As a pendant, in §§2.7 and 2.8, we derive this distribution
first using an often quoted result of Wilks and then for regression with a
multivariate response. The subject of the following three sections is also regression. In §2.9 we introduce added variables which provide useful results
for tests for transformations. These results are applied in §2.10 to the mean
shift outlier model to provide an alternative derivation of deletion results
which is useful in the analysis of spatial data, Chapter 8. This part of the
chapter closes in §2.11 where we outline seemingly unrelated regression, a
simplification of the results for multivariate regression when each model
contains the same explanatory variables.
A general discussion of the forward search is in §2.12. Three aspects
of the search require special attention: how to start, how to progress and
what to monitor. These three are treated in detail in §§2.13 and 2.14. The
32 2. Multivariate Data and the Forward Search
the sample mean. The sum of squares about the sample mean is

    S(\hat{\mu}) = \sum_{i=1}^{n} (y_i - \hat{\mu})^2 = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - n\bar{y}^2,

with

    s^2 = S(\hat{\mu})/(n - 1),    \hat{\sigma}^2 = S(\hat{\mu})/n.
2.1 The Univariate Normal Distribution 33
Obviously

    s^2 = \frac{n}{n-1} \hat{\sigma}^2.
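A quick numerical check of the relation between the two estimators, and of the identity for S(μ̂) above (an arbitrary simulated sample):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=3.0, scale=1.5, size=10)
n = len(y)
S = np.sum((y - y.mean())**2)
# the identity: sum of squares about the mean equals sum y^2 - n ybar^2
assert np.isclose(S, np.sum(y**2) - n * y.mean()**2)
s2 = S / (n - 1)
sigma2_hat = S / n
print(np.isclose(s2, n / (n - 1) * sigma2_hat))  # True
```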
An alternative way of writing the distribution of the y_i is

    y_i = \mu + \epsilon_i,    (2.1)

where now the independent errors \epsilon_i \sim N(0, \sigma^2). These errors are estimated
by the least squares residuals

    e_i = y_i - \hat{\mu} = y_i - \bar{y},    (2.2)

which, since the e_i sum to zero, have variance

    var(e_i) = \sigma^2 (n-1)/n.    (2.3)
Writing the squared scaled residuals as

    d_i^2 = e_i^2/s^2,    (2.4)

it follows that

    \sum_{i=1}^{n} d_i^2 = \sum_{i=1}^{n} e_i^2/s^2 = n - 1,

so the d_i^2 must have a distribution with a limited range. The results of
Cook and Weisberg (1982, p. 19) show that this distribution is, in fact,
a scaled beta. In §2.6 we obtain the related result for the multivariate
Mahalanobis distance. But a couple of preliminary distributional results
for the univariate case are helpful.
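The sum-to-(n - 1) identity, and the bounded support it forces on each squared scaled residual, can be checked numerically (an arbitrary simulated sample):

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(size=25)
n = len(y)
e = y - y.mean()
d2 = e**2 / (e @ e / (n - 1))       # squared scaled residuals e_i^2 / s^2
print(round(d2.sum(), 8))           # 24.0, i.e. n - 1
# each d_i^2 is bounded by (n-1)^2/n, the upper end of the scaled beta support
print(d2.max() <= (n - 1)**2 / n)   # True
```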
If both \mu and \sigma^2 are known,

    d_i^2(\mu, \sigma^2) = (y_i - \mu)^2/\sigma^2 \sim \chi^2_1.    (2.5)

If \mu is known and \sigma^2 is estimated, independently of y_i, by s^2_\nu on \nu degrees of freedom,

    (y_i - \mu)^2/s^2_\nu \sim F_{1,\nu}.    (2.6)
The maximum likelihood estimator of the vector \mu is now

    \hat{\mu} = \sum_{i=1}^{n} y_i/n = \bar{y}.
Alternatively, if J is an n x 1 vector of ones and Y, as before, n x v,

    \hat{\mu} = Y^T J/n.    (2.9)

The matrix of least squares residuals is

    E = Y - \hat{Y} = Y - J\hat{\mu}^T = Y - JJ^T Y/n = (I - H)Y = CY,

where H = JJ^T/n, so that

    C = I - H = I - JJ^T/n.    (2.13)

As a result

    CJ = J - JJ^T J/n = J - J = 0,
2.2.3 Estimation of \Sigma

The maximum likelihood estimator of \Sigma is (Exercise 2.7)

    \hat{\Sigma} = S(\hat{\mu})/n,    (2.15)
    D\mu = c.    (2.18)

Let the null hypothesis (2.18) be that \mu = \mu_0. The residuals under this
hypothesis are E_0, yielding via (2.12) a maximum likelihood estimator \hat{\Sigma}_0
of \Sigma. The maximised loglikelihood (2.19) becomes
    (2.23)

where

    \hat{\Sigma}_u = \frac{1}{\nu} \sum_{l=1}^{g} \nu_l \hat{\Sigma}_{ul}

and

    \gamma = 1 - \frac{2v^2 + 3v - 1}{6(v+1)(g-1)} \left( \sum_{l=1}^{g} \frac{1}{\nu_l} - \frac{1}{\nu} \right),    (2.24)

with

    \nu_l = n_l - 1  and  \nu = \sum_{l=1}^{g} \nu_l = n - g
2.4 The Mahalanobis Distance 39
for a model in which only a constant \mu is fitted to each mean. Further
degrees of freedom are lost if the covariance matrices are calculated from
residuals from regression (§2.8).
With this result on the test of equality of covariance matrices we have
the results we need on estimation of \mu and \Sigma and for testing hypotheses
about their values. All are based on aggregate statistics summed over the
data. One use of the forward search is to see how these quantities vary
as we increase the number of observations in the subset. We shall look
at forward plots of several test statistics, particularly in Chapter 4 on
transformations. But now we consider some statistical properties of the
Mahalanobis distances for individual observations.
If both \mu and \Sigma are known, the squared Mahalanobis distance of y_i is

    d_i^2(\mu, \Sigma) = (y_i - \mu)^T \Sigma^{-1} (y_i - \mu) \sim \chi^2_v.    (2.25)

If \mu is known and \Sigma is estimated, independently of y_i, by \hat{\Sigma}_\nu on \nu degrees of freedom, the distance

    d_i^2(\mu, \hat{\Sigma}_\nu) = (y_i - \mu)^T \hat{\Sigma}_\nu^{-1} (y_i - \mu)    (2.26)

has Hotelling's T^2 distribution, a scaled F (2.27). Our main interest is however in the squared distance

    d_i^2 = (y_i - \hat{\mu})^T \hat{\Sigma}_u^{-1} (y_i - \hat{\mu}),    (2.28)

or its square root d_i, in which both the mean and variance are estimated.
As was argued above for the squared scaled residual (2.4), the distribution
of this squared distance is affected by the lack of independence between y_i
and the estimators of \mu and \Sigma.
We obtain the distribution of the squared Mahalanobis distance in two
steps using the deletion of observations. If \mu and \Sigma are estimated with observation i deleted, the results of (2.26) and (2.27) indicate that the deletion
distance will follow an F distribution. We find an expression for the squared
distance (2.28) as a function of this deletion distance and then rewrite the
F distribution as a scaled beta to obtain the required distribution. We start
with deletion results.
    d_{(i)}^2 = (y_i - \hat{\mu}_{(i)})^T \hat{\Sigma}_{u(i)}^{-1} (y_i - \hat{\mu}_{(i)}) = e_{(i)}^T \hat{\Sigma}_{u(i)}^{-1} e_{(i)}.    (2.29)
Deletion of y_i gives \hat{\mu}_{(i)} = (n\hat{\mu} - y_i)/(n - 1), so that the deletion residual is

    e_{(i)} = y_i - \hat{\mu}_{(i)} = n e_i/(n - 1).    (2.31)
2.5 Some Deletion Results 41
We also need the residual for any other observation y_l when y_i is excluded. In a similar manner to (2.30),

    e_{l(i)} = e_l + e_i/(n - 1).

When observation i is deleted, the residuals change and one term is lost
from the sum. Then

    S(\hat{\mu})_{jk(i)} = \sum_{l \neq i} e_{lj(i)} e_{lk(i)} = \sum_{l \neq i} \{e_{lj} + e_{ij}/(n-1)\}\{e_{lk} + e_{ik}/(n-1)\} = \sum_{l=1}^{n} e_{lj} e_{lk} - n e_{ij} e_{ik}/(n-1),
or, in matrix form,

    S(\hat{\mu})_{(i)} = S(\hat{\mu}) - n e_i e_i^T/(n - 1).    (2.35)

The unbiased deletion estimator of \Sigma is then

    \hat{\Sigma}_{u(i)} = S(\hat{\mu})_{(i)}/(n - 2).    (2.36)

To invert S(\hat{\mu})_{(i)} we need a result on the inverse of the difference of two matrices. Let

    B = (C - xx^T)^{-1},    (2.37)

so that

    BC - Bxx^T = I.    (2.38)

Postmultiplication of (2.38) by C^{-1} and x, followed by rearrangement, leads
to

    Bx = C^{-1}x/(1 - x^T C^{-1} x).    (2.39)

Substitution for Bx in (2.38), together with postmultiplication of both sides
by C^{-1}, leads to the desired result

    (C - xx^T)^{-1} = C^{-1} + C^{-1} xx^T C^{-1}/(1 - x^T C^{-1} x).    (2.40)
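The rank-one inverse (2.40) is easy to check numerically; the matrices here are arbitrary, chosen only so that C - xx^T remains invertible:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(4, 4))
C = A @ A.T + 5 * np.eye(4)                  # positive definite C
x = np.array([0.3, -0.1, 0.2, 0.05])         # small x, so 1 - x'C^{-1}x > 0
Cinv = np.linalg.inv(C)
lhs = np.linalg.inv(C - np.outer(x, x))
rhs = Cinv + np.outer(Cinv @ x, x @ Cinv) / (1 - x @ Cinv @ x)
print(np.allclose(lhs, rhs))  # True
```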
It is convenient to write g_i = e_i^T S^{-1}(\hat{\mu}) e_i and a = n/(n - 1). Application of (2.40) then gives

    S^{-1}(\hat{\mu})_{(i)} = S^{-1}(\hat{\mu}) + a S^{-1}(\hat{\mu}) e_i e_i^T S^{-1}(\hat{\mu})/(1 - a g_i).    (2.44)

Then

    e_i^T S^{-1}(\hat{\mu})_{(i)} e_i = g_i/(1 - a g_i).    (2.45)
2.6 Distribution of the Squared Mahalanobis Distance 43
Finally we combine the definition of d_{(i)}^2 (2.29) with that of \hat{\Sigma}_{u(i)}, the
unbiased deletion estimator of \Sigma, to obtain

    d_{(i)}^2 = \frac{(n-2)n^2}{(n-1)^2} e_i^T S^{-1}(\hat{\mu})_{(i)} e_i = \frac{(n-2)n^2}{(n-1)^3} \frac{d_i^2}{1 - n d_i^2/(n-1)^2}.    (2.46)
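Relationship (2.46) can be verified numerically: the deletion distance computed directly, by refitting without observation i, agrees with the value obtained from the full-sample distance. A sketch with simulated data (the index i is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n, v, i = 30, 3, 7
Y = rng.normal(size=(n, v))

# full-sample squared distance with the unbiased estimators
mu = Y.mean(axis=0)
Su = np.cov(Y, rowvar=False)                 # divisor n - 1
e = Y[i] - mu
d2 = e @ np.linalg.inv(Su) @ e

# direct squared deletion distance (2.29)
Yd = np.delete(Y, i, axis=0)
ed = Y[i] - Yd.mean(axis=0)
Sud = np.cov(Yd, rowvar=False)               # divisor n - 2
d2_del = ed @ np.linalg.inv(Sud) @ ed

# the same quantity from (2.46)
d2_from_formula = ((n - 2) * n**2 / (n - 1)**3) * d2 / (1 - n * d2 / (n - 1)**2)
print(np.isclose(d2_del, d2_from_formula))   # True
```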
The inversion of this relationship provides an expression for d_i^2 as a function
of the squared deletion distance

    d_i^2 = \frac{(n-1)^3 d_{(i)}^2}{(n-2)n^2 + n(n-1) d_{(i)}^2}.    (2.47)
With \nu = n - 2,

    d_i^2(\mu, \hat{\Sigma}_\nu) \sim \frac{v(n-2)}{n-v-1} F_{v,n-v-1}.    (2.48)

But the squared deletion distance is a quadratic form in y_i - \bar{y}_{(i)} whereas
in (2.26) we have a quadratic in y_i - \mu. As in (2.3), the variance of
y_i - \bar{y}_{(i)} is n/(n-1) times that of y_i - \mu. The distribution of the deletion
Mahalanobis distance is then given by

    d_{(i)}^2 \sim \frac{n}{(n-1)} \frac{v(n-2)}{(n-v-1)} F_{v,n-v-1}.    (2.49)
We now write the F distribution as a ratio of chi-squared random variables,

    F_{v,n-v-1} = \frac{\chi^2_v/v}{\chi^2_{n-v-1}/(n-v-1)},
where, again, the two chi-squared variables are independent. It then follows
from (2.47) that the distribution of the Mahalanobis distance is given by

    d_i^2 \sim \frac{(n-1)^2}{n} \frac{\chi^2_v}{\chi^2_v + \chi^2_{n-v-1}}.    (2.51)
We now apply two standard distributional results. The first is that

    \chi^2_v = Gamma(v/2, 1/2).

The second is that if X_1 and X_2 are independently Gamma(p, \lambda) and
Gamma(q, \lambda), then X_1/(X_1 + X_2) \sim Beta(p, q). We finally obtain

    d_i^2 \sim \frac{(n-1)^2}{n} Beta\left(\frac{v}{2}, \frac{n-v-1}{2}\right).    (2.52)
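The scaled beta result (2.52) can be checked by simulation: the Monte Carlo mean of d_1^2 should be close to v(n-1)/n, the mean of the scaled beta, and no draw can exceed the upper support point (n-1)^2/n. A sketch:

```python
import numpy as np

rng = np.random.default_rng(4)
n, v, reps = 20, 2, 4000
d2 = np.empty(reps)
for r in range(reps):
    Y = rng.normal(size=(n, v))
    e = Y[0] - Y.mean(axis=0)
    d2[r] = e @ np.linalg.inv(np.cov(Y, rowvar=False)) @ e
scale = (n - 1)**2 / n                       # upper support point
target_mean = v * (n - 1) / n                # mean of the scaled beta
print(abs(d2.mean() - target_mean) < 0.15)   # True for this many replications
print(d2.max() <= scale)                     # True: the support is bounded
```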
TABLE 2.1. Summary of distributional results for the squared Mahalanobis distances used in this book; y_i is a v x 1 vector of responses
from N_v(\mu, \Sigma), x_i is a p x 1 vector of regression variables and B is a p x v matrix of regression parameters

Reference | \mu | \Sigma | Distribution
(2.25) | known | known | \chi^2_v
(2.26) and (2.27) | known | unknown, estimated independently of y_i on \nu degrees of freedom | T^2(v, \nu) = \frac{\nu v}{\nu - v + 1} F_{v,\nu-v+1}
Exercise 2.4 | known | unknown (v = 1), \sigma^2 estimated by s^2 | (n-1) Beta(\frac{1}{2}, \frac{n-1}{2})
(2.49) (deletion distance) | unknown, estimated by \bar{y}_{(i)} | unknown, estimated by \hat{\Sigma}_{u(i)} | \frac{n v(n-2)}{(n-1)(n-v-1)} F_{v,n-v-1}
(2.52) and Exercise 2.5 | unknown, estimated by \hat{\mu} | unknown, estimated by \hat{\Sigma}_u | \frac{(n-1)^2}{n} Beta(\frac{v}{2}, \frac{n-v-1}{2}); for v = 1 the distribution of the squared scaled residual (2.4)
(2.79), \mu = B^T x_i | unknown, estimated by \hat{B}^T x_i | unknown, estimated by \hat{\Sigma}_u | (n-p)(1-h_i) Beta(\frac{v}{2}, \frac{n-v-p}{2})
where here xT is one of the rows of X and it is assumed that (XT x)- 1
exists (Rao 1973, p. 32). We now apply this relationship to the matrix of
residuals.
We recall (2.35):

    S(\hat{\mu})_{(i)} = S(\hat{\mu}) - n e_i e_i^T/(n - 1).

Next we recall that if the random variable X \sim Beta(\alpha, \beta), then 1 - X \sim Beta(\beta, \alpha).
Invoking the notation of (2.42), we obtain the distribution
which is (2.52).
2.8 Regression
In many of the examples in this book the data, perhaps after transformation and the removal of outliers, follow the multivariate normal distribution
(2.7) in which each observation y_{ij} on the jth response has mean \mu_j. However, in some examples, the mean has a regression structure. The simplest,
considered in this section, is when the regressors for each of the v responses
are the same. Then (2.14) becomes

    E(Y) = XB,    (2.57)

with X an n x p matrix of regressors and B a p x v matrix of parameters.
A complication remains in the estimation
of \Sigma. This is the subject of the next section. Here, with a common X, each
estimate \hat{\beta}_j is found from univariate regression of y_{cj} on X, that is

    \hat{\beta}_j = (X^T X)^{-1} X^T y_{cj}.    (2.59)
so called because the matrix of fitted values \hat{Y} = HY. The ith diagonal
element of H is

    h_i = x_i^T (X^T X)^{-1} x_i.    (2.64)
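The leverages (2.64) are simply the diagonal of the hat matrix; a small numerical illustration (the design matrix here is arbitrary) confirms two standard properties, that the h_i sum to p and lie in (0, 1]:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 12, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix: fitted values are H y
h = np.diag(H)                          # leverages h_i = x_i'(X'X)^{-1}x_i
print(np.isclose(h.sum(), p))           # True: trace(H) = p
print(np.all((h > 0) & (h <= 1)))       # True
```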
The theorems relating to the Wishart distribution of the matrix of the sum
of squares and products are analogous to those in §2.2.2 with the matrix
C (2.13) replaced by the symmetric idempotent matrix I - H.
The maximum likelihood estimator of \Sigma is basically unchanged,

    \hat{\Sigma} = S(\hat{\beta})/n,    (2.65)
For a model in which only the mean is fitted, h_i = 1/n and (2.67) reduces
to (2.35). The unbiased deletion estimator of \Sigma for the regression model is

    \hat{\Sigma}_{u(i)} = S(\hat{\beta})_{(i)}/(n - p - 1).

The standard result for the deletion estimator \hat{\beta}_{(i)} in regression (for example (2.94) or Atkinson and Riani 2000, p. 23) shows that

    y_i - \hat{\beta}_{(i)}^T x_i = e_i/(1 - h_i).    (2.69)

In the deletion distance \Sigma is estimated with n - p - 1 degrees of freedom.
Then the distance with known mean d_i^2(\mu, \hat{\Sigma}_{n-p-1}) in (2.26) has a scaled
F distribution

    d_i^2(\mu, \hat{\Sigma}_{n-p-1}) \sim \frac{v(n-p-1)}{n-v-p} F_{v,n-v-p}.    (2.70)
The next stage in the argument is to find the relationship between the
distance d_i^2 and the deletion distance. As before, let C = E^T E. Then

    \hat{\Sigma}_u = C/(n - p)

and the squared Mahalanobis distance is

    d_i^2 = (n - p) e_i^T C^{-1} e_i.    (2.72)

If now we write

    e_i^T C^{-1} e_i = d_i^2/(n - p)  as  g_i  and  a = 1/(1 - h_i),    (2.73)

application of (2.44) leads to

    e_i^T C_{(i)}^{-1} e_i = \frac{g_i}{1 - a g_i} = \frac{d_i^2/(n-p)}{1 - d_i^2/\{(n-p)(1-h_i)\}}.    (2.74)

The combination of this result with the definition of the unbiased deletion
estimator \hat{\Sigma}_{u(i)} in (2.36) together with the residuals e_i/(1 - h_i) (2.69) yields
the required relationship

    d_{(i)}^2 = \frac{(n-p-1)}{(1-h_i)^2 (n-p)} \frac{d_i^2}{1 - d_i^2/\{(n-p)(1-h_i)\}},    (2.75)
2.9 Added Variables in Regression 49
which reduces to (2.46) when the linear model contains just a mean, that is
when p = 1 and h_i = 1/n. The inversion of this relationship again provides
an expression for d_i^2 as a function of the squared deletion distance

    d_i^2 = \frac{(n-p)(1-h_i)^2 d_{(i)}^2}{(n-p-1) + (1-h_i) d_{(i)}^2}.    (2.76)

From (2.70),

    d_{(i)}^2 \sim \frac{n-p-1}{1-h_i} \frac{\chi^2_v}{\chi^2_{n-v-p}}.    (2.77)
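The regression version of the deletion relationship, (2.75), can be checked numerically in the same way as before, by refitting without observation i; the data and the index i below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(8)
n, p, v, i = 25, 2, 2, 3
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = rng.normal(size=(n, v))

Bhat = np.linalg.lstsq(X, Y, rcond=None)[0]
E = Y - X @ Bhat
C = E.T @ E
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)
d2 = (n - p) * E[i] @ np.linalg.inv(C) @ E[i]        # (2.72)

# direct deletion distance: refit without observation i
Xd, Yd = np.delete(X, i, 0), np.delete(Y, i, 0)
Bd = np.linalg.lstsq(Xd, Yd, rcond=None)[0]
Ed = Yd - Xd @ Bd
Sud = Ed.T @ Ed / (n - p - 1)
ed = Y[i] - Bd.T @ X[i]
d2_del = ed @ np.linalg.inv(Sud) @ ed

# the same quantity from (2.75)
rhs = ((n - p - 1) / ((1 - h[i])**2 * (n - p))) * d2 / (1 - d2 / ((n - p) * (1 - h[i])))
print(np.isclose(d2_del, rhs))  # True
```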
It now follows from (2.76) that the distribution of the Mahalanobis distance
is given by

    d_i^2 \sim (n-p)(1-h_i) \frac{\chi^2_v}{\chi^2_v + \chi^2_{n-v-p}}.    (2.78)

Finally, we again use the relationship between beta and gamma random
variables employed in §2.6 to obtain

    d_i^2 \sim (n-p)(1-h_i) Beta\left(\frac{v}{2}, \frac{n-v-p}{2}\right),    (2.79)

so that the range of support of the distribution of d_i^2 depends upon h_i. For
some balanced experimental designs, such as two-level factorials, all h_i are
equal when all n observations are included in the subset and so \sum h_i = p,
when each h_i = p/n. Then the distribution (2.79) reduces to

    d_i^2 \sim \frac{(n-p)^2}{n} Beta\left(\frac{v}{2}, \frac{n-v-p}{2}\right).
The least squares estimates satisfy the normal equations

    X^T X \hat{\beta} + X^T w \hat{\gamma} = X^T y    (2.82)

and

    w^T X \hat{\beta} + w^T w \hat{\gamma} = w^T y.    (2.83)

If the model without w can be fitted, (X^T X)^{-1} exists and (2.82) yields

    \hat{\beta} = (X^T X)^{-1} X^T (y - w\hat{\gamma}).    (2.84)

Substitution in (2.83) then gives

    \hat{\gamma} = \frac{w^T (I - H) y}{w^T (I - H) w}.    (2.85)
2.10 The Mean Shift Outlier Model 51
Calculation of the test statistic also requires s_w^2, the residual mean square
estimate of \sigma^2 from regression on X and w, given by (Atkinson and Riani
2000, eq. 2.28)
(2.40). In this section we sketch how the mean shift outlier model can be
used to obtain deletion results for the more general case of regression, using
the relationships for added variables derived in the previous section. The
standard results for deletion in regression are summarized, for example, by
Atkinson and Riani (2000, §2.3).
Formally the model is similar to that of (2.81). We write

    E(y) = X\beta + q(i)\phi,    (2.91)

where the n x 1 vector q(i) is all zeroes apart from a single one in the
ith position and \phi is a scalar parameter. Observation i therefore has its
own parameter and, when the model is fitted, the residual for observation
i will be zero; fitting (2.91) thus yields the same residual sum of squares as
deleting observation i and refitting.
To show this equivalence requires some properties of q(i). Since it is a
vector with one nonzero element equal to one, it extracts elements from
vectors and matrices, for example:

    q(i)^T y = y_i  and  q(i)^T X = x_i^T.    (2.93)
If the parameter estimate in the mean shift outlier model is denoted \hat{\beta}_q, it
follows from (2.84) that

    \hat{\beta}_q = \hat{\beta} - (X^T X)^{-1} x_i e_i/(1 - h_i).    (2.94)

Comparison of (2.94) with standard deletion results shows that \hat{\beta}_q = \hat{\beta}_{(i)},
confirming the equivalence of deletion and a single mean shift outlier.
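The equivalence is easy to demonstrate numerically: augmenting the design matrix with the dummy column q(i) gives exactly the estimates obtained by deleting observation i and refitting. A small illustration with arbitrary simulated data (lstsq is ordinary least squares):

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, i = 15, 2, 4
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

q = np.zeros(n)
q[i] = 1.0  # the mean shift dummy for observation i
beta_q = np.linalg.lstsq(np.column_stack([X, q]), y, rcond=None)[0][:p]
beta_del = np.linalg.lstsq(np.delete(X, i, 0), np.delete(y, i), rcond=None)[0]
print(np.allclose(beta_q, beta_del))  # True: beta_hat_q = beta_hat_(i)
```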
The expression for the change in residual sum of squares comes from
(2.89). If the new estimate of \sigma^2 is s_q^2 we have immediately that

    (n - p - 1) s_q^2 = (n - p) s^2 - e_i^2/(1 - h_i).

For the simultaneous deletion of m observations the model generalizes to

    E(y) = X\beta + Q\phi,

with Q a matrix that has a single one in each of its columns, which are
otherwise zero, and m rows with one nonzero element. These m entries specify
the observations that are to have individual parameters or, equivalently,
are to be deleted.
Each response is now regressed on its own set of explanatory variables,

    E(y_{cj}) = X_j \beta_j,    (2.96)

where y_{cj} is the n x 1 vector of responses (jth column of matrix Y). Here X_j
is an n x p matrix of regression variables, as was X in (2.57), but now those
specifically for the jth response, and \beta_j is a p x 1 vector of parameters. In our
applications we do not need the more general theory in which the number
of parameters p_j in the model depends upon the particular response. The
extension of the theory to this case is straightforward, but is not considered
here.
Because the explanatory variables are no longer the same for all responses,
the simplification of the regression in §2.8 no longer holds: the covariance \Sigma
between the v responses has to be allowed for in estimation and independent
least squares is replaced by generalized least squares. The model for all n
observations can be written in the standard form of (2.57) by stacking the
equations under each other. In this form the model is that for a vector of
nv observations on a heteroscedastic univariate response variable and the
vector of parameters \beta is of dimension pv x 1. If we let \Psi be the nv x nv
covariance matrix of the observations, generalized least squares yields the
parameter estimator

    \hat{\beta} = (X^T \Psi^{-1} X)^{-1} X^T \Psi^{-1} y,    (2.97)

with covariance matrix

    var(\hat{\beta}) = (X^T \Psi^{-1} X)^{-1}.    (2.98)
(2.98)
In all there are nv observations. When the data are stacked the covariance
matrix \Psi is block diagonal with n blocks of the v x v matrix \Sigma. As a result
of the block diagonal structure the calculation of the parameters \hat{\beta} can be
achieved without inversion of an nv x nv matrix.
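The stacked GLS calculation can be sketched as follows. This is an illustration only, with \Sigma treated as known; we stack by response rather than by observation, so the covariance of the stacked errors takes the equivalent Kronecker form \Sigma \otimes I_n, whose inverse needs only the v x v inverse of \Sigma. All names are ours.

```python
import numpy as np

rng = np.random.default_rng(7)
n, v, p = 40, 2, 2
Sigma = np.array([[1.0, 0.6], [0.6, 2.0]])   # assumed known here
Xs = [np.column_stack([np.ones(n), rng.normal(size=n)]) for _ in range(v)]
betas = [np.array([1.0, -1.0]), np.array([0.5, 2.0])]
Err = rng.multivariate_normal(np.zeros(v), Sigma, size=n)
Y = np.column_stack([Xs[j] @ betas[j] + Err[:, j] for j in range(v)])

# stack the equations: X is block diagonal over responses, y is the
# columns of Y placed under each other
X = np.zeros((n * v, p * v))
for j in range(v):
    X[j * n:(j + 1) * n, j * p:(j + 1) * p] = Xs[j]
y = Y.T.reshape(-1)
# only Sigma (v x v) is inverted; the nv x nv inverse follows by structure
Psi_inv = np.kron(np.linalg.inv(Sigma), np.eye(n))
beta_gls = np.linalg.solve(X.T @ Psi_inv @ X, X.T @ Psi_inv @ y)
print(beta_gls)
```

In earnest code one would never form the nv x nv matrix at all, instead working block by block; the dense Kronecker product is used here only to keep the sketch short.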
There are p* = vp parameters to be estimated. Let X* be the n x p*
matrix of explanatory variables formed by copying each column of the X_j
in order - first all the elements of the first column of each X_j, then all the
second columns and so on up to the last columns of each X_j. The elements
of X* are x*_{ij}. If \beta* is the p* x 1 vector of parameters, calculation of the
least squares estimates requires the elements \sigma^{jk} of \Sigma^{-1}. Let J be
a vector of ones of dimension n x 1. The generalized least squares equations
then involve sums of the form

    \sum_{i=1}^{n} x^*_{ij} \sigma^{jk} x^*_{ik}  and  \sum_{i=1}^{n} \sum_{k=1}^{v} x^*_{ij} \sigma^{jk} y_{ik}.    (2.100)
1. Obtain \hat{\Sigma}, an estimate of \Sigma, from the independent regressions as in
(2.61), but with y_{cj} regressed on X_j.
Much of the emphasis so far in this chapter has been on the distribution of the statistics we have calculated, particularly the Mahalanobis distances. However, such results are not available for the seemingly unrelated
regression procedure of this section. Under the assumption of normally distributed errors, the estimate of \beta from generalized least squares in (2.97)
has the normal distribution. But with \Psi estimated from the data, the distribution is not readily determined. If the exact distribution is important,
recourse may be had to simulation. But we use the asymptotic results
which apply when \Psi is known.

2.12 The Forward Search 55
that such an event is unusual, only occurring when the search includes one
unit that belongs to a cluster of outliers. At the next step the remaining
outliers in the cluster seem less outlying and so several may be included at
once. Of course, several other units then have to leave the subset.
Remark 1: The search starts with a robustified estimator of \mu and \Sigma
found by use of a bivariate boxplot. Let this estimator of \mu be \hat{\mu}_0^* and let
the estimator at the end of the search be \hat{\mu}_n^* = \hat{\mu}. In the absence of outliers
and systematic departures from the model

    E(\hat{\mu}_0^*) = E(\hat{\mu}) = \mu;

that is, both parameter estimates are unbiased estimators of the same quantity. The same property holds for the sequence of estimates \hat{\mu}_m^* produced
in the forward search. Therefore, in the absence of outliers, we expect estimates of the mean to remain sensibly constant during the forward search.
However, because of the way in which we select the observations for inclusion in the subset, those with smaller Mahalanobis distances will be
selected first. As a result the estimate of \Sigma, unlike that of \mu, will increase
during the forward search. Therefore, unless outliers are present, the distances d_{im}^* will trend steadily downwards during the search. The use of the
scaled distances defined in (2.104) overcomes this tendency. A comparison
of plots of scaled and unscaled distances is in Figure 2.5.
Remark 2: Now suppose there are k outliers. Starting from a clean subset,
the forward procedure will include these towards the end of the search,
usually in the last k steps. Until these outliers are included, we expect
that the conditions of Remark 1 will hold and that plots of Mahalanobis
distances will remain reasonably smooth until the outliers are incorporated
in the subset used for fitting. The forward plot of scaled distances for
the data on municipalities in Emilia-Romagna, Figure 3.24, is a dramatic
example in which the pattern is initially stable, but changes appreciably at
the end of the search when the two gross outliers enter.
Remark 3: If there are indications that the data should be transformed,
it is important to remember that outliers in one transformed scale may
not be outliers in another scale. If the data are analyzed using the wrong
transformation, the k outliers may enter the search well before the end.
The search avoids the initial inclusion of outliers and provides a natural
ordering of the data according to the specified null model. In our approach
we use a robust starting point combined with unbiased estimators during
the search that are multiples of the maximum likelihood estimators. The es-
timators are therefore fully efficient for the multivariate normal model. The
zero breakdown point of these estimators is an advantage for the forward
search. The introduction of atypical influential observations is signalled by
sharp changes in the curves that monitor Mahalanobis distances and test
statistics at every step. In this context, the robustness of the method does
not derive from the choice of a particular estimator with a high breakdown
point, but from the progressive inclusion of units into a subset which, in the
first steps, is outlier free. As a result of the forward search, the observations
are ordered according to the specified null model and it becomes clear how
many of them are compatible with a particular specification. Our approach
enables us to analyze the inferential effect of the atypical units ("outliers")
on the results of statistical analyses.
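The progressive-inclusion idea just described can be sketched in a few lines of code. This is an illustrative reconstruction of ours, not the authors' software: the crude median-based start and the use of numpy are our own choices.

```python
import numpy as np

def forward_search(Y, m0=None):
    """Illustrative forward search: grow the fitting subset one unit at a
    time, always keeping the units with the smallest Mahalanobis distances.
    Returns, for each unit, the subset size at which it (last) entered."""
    n, v = Y.shape
    m0 = 2 * v if m0 is None else m0
    # crude robust start (for illustration only): the m0 units closest to
    # the componentwise median
    med = np.median(Y, axis=0)
    subset = np.sort(np.argsort(((Y - med) ** 2).sum(axis=1))[:m0])
    entry_step = np.full(n, -1)
    entry_step[subset] = m0
    for m in range(m0, n):
        mu = Y[subset].mean(axis=0)
        Sigma = np.cov(Y[subset], rowvar=False)
        diff = Y - mu
        d2 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(Sigma), diff)
        # next subset: the m + 1 units with smallest squared distances
        new_subset = np.sort(np.argsort(d2)[:m + 1])
        entry_step[np.setdiff1d(new_subset, subset)] = m + 1
        subset = new_subset
    return entry_step
```

On data containing gross outliers, the outliers receive the largest entry steps, which is what produces the sharp changes at the end of the forward plots.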
Remark 4: The procedure is not sensitive to the method used to select
an initial subset; even if outliers are included at the start they are often
removed in the first few steps. For example, two forms of robust bivariate
boxplot are described in the next section, either of which can be used to
provide an initial subset. For speed of calculation we use the less robust.
Although the first steps of the search may depend on which of the two
methods is used to find the initial subset, the later stages are independent
of it. What is important in the procedure is that the initial subset is either
free of outliers or breaks the masking of outliers which are masked in the
complete set of n observations. The removal of outliers is visible in some
searches where there are sometimes numerous interchanges in the first few
steps. Examples in which the search recovers from a start that contains
outliers include Exercise 3.4 and Figure 7.20. An example for spatial data
in which the search recovers from a start that is not very robust is given
by Cerioli and Riani (1999).
FIGURE 2.1. Logged babyfood data, y1 and y2: the first three convex hulls containing respectively 7, 7 and 5 points. Panel (d) shows the B-spline fitted to the 50% hull of five points and the robust centre, marked with +, almost coincident with an observation
Note that the fitted spline contains only seven data points, since one
observation, with coordinates 5.15 and 5.47, lies inside the 50% hull, but
outside the spline.
Step 2 The Robust Centroid. We find a robust bivariate centroid
using the componentwise arithmetic means of the observations inside the
inner region defined by the fitted spline. In this way we exploit both the ef-
ficiency properties of the arithmetic mean and the natural trimming offered
by the hulls. This mean of the values of logged y1 and logged y2 is marked
with a cross in Panel (d) of Figure 2.1. This cross gives the appearance of
being near the centre of the nearly elliptical spline.
A useful requirement of estimators of location is affine invariance (for
example Woodruff and Rocke 1994) ensuring that different rescalings of
the individual variables leave the estimator of location unchanged. If we
require such a property of our estimator we need to take the mean of the
observations over the convex hull, rather than over the fitted B-spline.
References to other ways of finding robust bivariate centres are given at
the end of the chapter.
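As an illustration of hull peeling and the robust centroid, the following sketch peels convex hulls directly rather than fitting a B-spline, so it is a simplified version of the procedure described above; scipy is assumed to be available.

```python
import numpy as np
from scipy.spatial import ConvexHull

def peeled_centroid(xy, target=0.5):
    """Peel convex hulls until at most a fraction `target` of the points
    remains, then return the componentwise mean of the survivors."""
    pts = np.asarray(xy, dtype=float)
    n = len(pts)
    while len(pts) > target * n:
        hull = ConvexHull(pts)
        keep = np.setdiff1d(np.arange(len(pts)), hull.vertices)
        if len(keep) < 3:          # too few points left to peel further
            break
        pts = pts[keep]
    return pts.mean(axis=0)
```

Because remote units sit on the outermost hulls, they are trimmed before the mean is taken, combining the efficiency of the arithmetic mean with the natural trimming offered by the hulls.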
FIGURE 2.2. Logged babyfood data, y1 and y2: scaling the convex hull. The resulting 99% hull indicates four outliers
FIGURE 2.3. Untransformed babyfood data, y1 and y2: four convex hulls have
to be peeled to obtain the 50% hull as opposed to three for the transformed data
in Figure 2.1
data. Our description follows Riani and Zani (1997) who use a version of
the "quelplot" of Goldberg and Iglewicz (1992).
The robust centroid of the ellipse is found as the componentwise median
of the two variables in the scatterplot. Let this be μ̃. The shape of the
contours is based on a covariance matrix in which the univariate medians
are used, but which is otherwise calculated in the usual way. That is, the
mean in (2.10) is replaced by μ̃, to give a 2 × 2 matrix with elements
proportional to

    S_jk(μ̃) = Σ_{i=1}^n (y_ij − μ̃_j)(y_ik − μ̃_k).
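This median-centred matrix can be sketched directly (an illustration of ours, not the book's code):

```python
import numpy as np

def median_centered_cov(Y):
    """Cross-product matrix S_jk centred at the componentwise median
    instead of the mean, as used for the robust contours."""
    Z = Y - np.median(Y, axis=0)
    return Z.T @ Z
```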
FIGURE 2.4. Untransformed babyfood data, y1 and y2: the scaled 99% convex
hull indicates seven outliers
The relationship between θ and the scaling c follows from the approximate
F distribution for Mahalanobis distances defined in (2.102). For example,
a value of θ = 1 corresponds to the 61.8% point of the F distribution
on 2 and 25 degrees of freedom. Usually we have to use smaller values to
obtain a sufficiently small value for m0. An example showing the variation
of m0 with θ is in §4.2.
The value of m0 is not critical. It should be small enough so that the
initial subset contains no masked outliers, but large enough that the initial
stages of the search are fairly stable, apart from any initial interchanges.
For examples in which we are fitting multivariate models without any
structure in the means, a value around 2v is often suitable. The procedure is
generally robust to the choice of the value of m0 and allows us to start
with a somewhat larger subset if the percentage of contamination permits.
Since the method does not involve complicated iterative procedures, there
is no computational burden in finding the starting point. As the size of the
initial subset can easily be increased or decreased by changing the value
of θ, we usually try several values and check whether the last third or so
of each search from the various starting points is the same. As we have
seen, it is often towards the end of the search that we obtain information
about unsuspected structure and outliers for observations basically from
a single normal population. However, if there are several populations, as
in the Swiss bank note data, the earlier parts of the search are also infor-
mative. For example, Figure 3.30 will show the effect of two populations
around m = 100 when n = 200. Larger initial subsets than 2v are required
for models in which there are more than v parameters to be estimated, for
example when we are determining transformations.
We find the initial subset from the intersection in all v(v − 1)/2 bivariate
scatterplots and v univariate boxplots of units within the contour specified
by θ. This subset will exclude any observations which are outlying in one
or two dimensions. However it will not exclude observations that are not
outlying in one or two dimensions but are outlying in three or more.
Although it is not difficult to construct such observations, they seem to be
rare in practice. Any problem they might cause can be simply reduced by
decreasing the value of θ. However, in general, even if one or two have been
included in the initial subset, they are detected in the early stages of the
search, their large Mahalanobis distance causing them to leave the subset.
We do indeed sometimes observe several interchanges in the first two or
three steps of the search. All that we require is that the construction of the
initial subset reveals outliers which are masked in the whole data set. They
do not need to be excluded from the initial subset, merely to be unmasked
in it.
are scaled by the square root of the estimated covariance matrix. If we had
independent observations with constant variance σ², Σ̂ would be a diagonal
matrix and

    d*_im ∝ ( |Σ̂_um| / |Σ̂_un| )^{1/2v},   (2.104)
FIGURE 2.5. Swiss bank notes, starting with the first 20 observations on genuine notes: forward plots of Mahalanobis distances: upper panel, scaled and, lower panel, unscaled
are somewhat less stable than scaled residuals in regression. This is not
surprising since the regression structure means that the residuals fluctuate
much less than those here from a structureless sample. We discuss the upper
panel in some detail in Chapter 3 as Figure 3.30.
and

    u_[m+1],m−1, …, u_[n],m−1 ∉ S_*^(m).

To move to the new subset S_*^(m+1) we form the n distances d²_im and
order them. It is not certain that all the units which were in S_*^(m) will be
in S_*^(m+1). There are three cases which need to be distinguished:

1. Normal Progression. If u_[m+1]m ∉ S_*^(m), the next unit to join will
be u_[m+1]m, with distance d_[m+1]m;
2. Inversion. Now suppose that u_[m+1]m ∈ S_*^(m). Then u_[m+1]m will
remain in the subset. But there must be a unit, say u_NEW ∉ S_*^(m), for
which

    d_NEW,m ≤ d_[m]m.

This unit will join the subset while u_[m+1]m will remain in the subset.
The minimum distance among units not in the subset will obviously
be d_NEW,m ≤ d_[m]m;
3. Interchange. An interchange occurs when two or more new units
enter the subset, when one or more must leave. Instead of the one new
unit u_NEW when inversion occurs we have a set S_NEW, containing
at least two members, such that

    i ∈ S_NEW if d_i,m ≤ d_[m+1]m and i ∉ S_*^(m).
Then the minimum distance among units not in the subset can be
written as

    d_NEW,m = min d_i,m,  i ∈ S_NEW.

To obtain an upper bound for this distance let the number of units
in S_NEW be n_NEW (≥ 2). Then

    d_NEW,m ≤ d_[m−n_NEW+2]m.

We also monitor

    max d_i,m,  i ∈ S_*^(m),
the largest distance among units in the subset. For normal progression this
will be d_[m]m. As we have seen above, for inversion the distance is d_[m+1]m;
it will be larger than this when an interchange occurs. The forward plot
of this largest distance will show a peak when the first outlier is included.
The peak is therefore one step later than it is for the preceding plot of the
smallest distance not in the subset. The largest distance is monitored up
to step n.
In general, when there is one outlier, the size of the peak in the plot
of the largest distance amongst units in the subset is smaller than that
in the plot of the smallest distance among units not in the subset. This
arises because d²_im for units not belonging to the subset has an unbounded
distribution, whereas that for the maximum over units in the subset is the
maximum of m scaled beta distributions.
"Gap" Plot. The forward plots of the minimum and maximum dis-
tances trend upwards, which can sometimes obscure interpretation. In the
gap plot we look at the difference of the two preceding quantities, that is

    min_{i ∉ S_*^(m)} d_i,m − max_{i ∈ S_*^(m)} d_i,m,   (2.105)

where both distances are calculated using the same subset of size m. If
there is an inversion, an upper bound on the value is

    d_[m]m − d_[m+1]m,

the negative of the value for normal progression. The bound is even more
negative if there is an interchange, the magnitude depending on the value
of n_NEW. We plot both the true difference (2.105), which can be negative,
and the difference in order statistics

    d_[m+1]m − d_[m]m,   (2.106)

which is always positive.
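Both versions of the gap can be computed directly from the squared distances at step m; the helper below is a sketch in our own notation, not the book's code.

```python
import numpy as np

def gap_quantities(d2, in_subset):
    """Return (true gap, order-statistic gap) for squared distances d2
    and a boolean mask marking the m units currently in the subset."""
    d = np.sqrt(np.asarray(d2, dtype=float))
    in_subset = np.asarray(in_subset, dtype=bool)
    m = int(in_subset.sum())
    d_sorted = np.sort(d)
    true_gap = d[~in_subset].min() - d[in_subset].max()   # can be negative
    order_gap = d_sorted[m] - d_sorted[m - 1]             # always >= 0
    return true_gap, order_gap
```

In normal progression the two coincide; with an inversion or interchange the true gap turns negative while the order-statistic gap stays non-negative.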
FIGURE 2.6. Swiss bank notes starting with the first 20 observations on genuine bank notes: forward plots of, dotted line, minimum distances of units not in the subset and, solid line, of the ordered distance d_[m+1]m. There is an indication of appreciable interchange around m = 100
As an example of these plots, Figure 2.6 shows the forward plot of the
minimum distance amongst units not in the subset for the Swiss bank note
data, which we have seen in Figure 1.17, together with the forward plot of
d_[m+1]m. These two are the same for much of the search, but different at the
beginning and around m = 100, both regions in which some interchanges
are occurring. The earlier one is associated with instability at the beginning
of the search. The later reflects the interchanges which occur as units from
the group of forgeries start to enter the subset, with the appreciable change
in covariance matrix and distances that we saw in Figures 1.16 and 2.5.
Covariance Matrix. The estimate of the covariance matrix Σ̂ does not
remain constant during the forward search as observations are sequentially
selected that have small Mahalanobis distances. To see how the variance is
increasing we can look at forward plots of the ratios defined in (2.107) and
(2.108).
where e_[k](b) is the kth ordered squared residual. In order to allow for
estimation of the parameters of the linear model the median is taken as

    S_h(b) = Σ_{i=1}^h e_[i](b),   (2.110)

for some h with [(n + p + 1)/2] ≤ h < n. The rate of convergence of LTS
estimates is n^{−1/2} as opposed to n^{−1/3} for LMS. But, for the moderate sized
datasets of the size considered in Atkinson and Riani (2000), the largest
having 200 observations, there seems to be little difference in the abilities
of the two methods to detect outliers and so to provide a clean starting
point for the forward search.
    m(k) = #( S_*1^(k) ∩ … ∩ S_*v^(k) ).   (2.111)
We start with k = m0 and increase k until the first time when there are
at least m0 common units in the intersection. These units form the initial
subset.
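Ignoring for the moment the complication of interchanges, the rule can be sketched as follows (an illustrative helper of our own):

```python
def initial_intersection(orders, m0):
    """orders[j] lists the unit indices in the order they enter the search
    on the j-th response. Increase k from m0 until at least m0 units are
    common to the first k entrants of every search; return those units."""
    n = len(orders[0])
    k = m0
    common = set(orders[0][:m0])
    while k <= n:
        common = set(orders[0][:k])
        for o in orders[1:]:
            common &= set(o[:k])
        if len(common) >= m0:
            break
        k += 1
    return sorted(common)
```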
The v forward searches order the observations by their closeness to the
fitted univariate models. If there were no interchanges during the search, we
would have a single list of the order in which observations on each response
enter the subset and S_*j^(m) would consist of the first m units on this list.
However, when there is an interchange, some units leave S_*j^(m), and it is
not true that

    S_*j^(m) ⊂ S_*j^(m+1).

The lists of units used in (2.111) to calculate m(k) therefore need to include
information on units which leave the subsets as the search progresses in
addition to those which enter.
Once we have an initial subset of m0 units, the search progresses much
as it did in the absence of regression in §2.12. Given S_*^(m), individual
regressions are fitted to the v responses. From the parameter estimates we
can calculate the n × v matrix of residuals with elements e_ijm and so the
set of n squared Mahalanobis distances d²_im. The search moves to subset
size m + 1 by selecting the m + 1 units with the smallest squared Mahalanobis
distances, the units being chosen by ordering the squared distances d²_im,
i = 1, …, n.
2.17 Exercises
In all exercises y is a v-variate random variable with E(y) = μ and cov(y) = Σ.
Unless otherwise stated, the normality of y may also be assumed.
Exercise 2.1 Show, without assuming normality, that E{(y − μ)^T Σ^{−1} (y − μ)} = v.
Exercise 2.2 Show that the variance of the residual e_i (2.1) is as given in
§2.1.2. What distributional assumptions did you make?
Exercise 2.3 The distribution of the sum of squares and products matrix
S(μ̂) (2.10) depends on the projection matrix C (2.13). Show that C is
symmetric and idempotent and prove the result claimed at the end of §2.2.2.
Exercise 2.4 When μ is known, the squared Mahalanobis distance d²_i(μ, Σ̂)
is defined in (2.26). Derive the distribution of this quantity when v = 1 and
Σ̂ = s².
Exercise 2.5 Find the distribution of the scaled squared residual about the
mean, which is called d²_i in (2.4).
    μ̂ = ȳ = ( Σ_{i=1}^n y_i1/n, …, Σ_{i=1}^n y_iv/n )^T

and

    Σ̂ = S(μ̂)/n;

3) the maximised multivariate normal loglikelihood is given by (equation 2.19)

    −(n/2) log |2πΣ̂| − nv/2.
Exercise 2.8 Find the form of the matrix D when the test of equality of
the v means is formulated as Dμ = c (equation 2.18). What is the row rank
of D?
Exercise 2.9 In order to test H0 : μ = μ0 versus H1 : μ ≠ μ0 the usual
test statistic is

    T² = n(ȳ − μ0)^T Σ̂_u^{−1} (ȳ − μ0).

The quantity T² has Hotelling's T² distribution with dimension v and degrees
of freedom n − 1. We reject H0 if T² ≥ T²_{α,v,n−1} and accept H0 otherwise.
Show the connection between T² and the corresponding likelihood
ratio test (equation 2.20).
Exercise 2.14 The explanatory variables in the first 16 rows of the babyfood
data have coded levels of 1 and −1. The experimental design is a 2^{5−1}
fractional factorial. If x5 is omitted, the design is a full 2^4 factorial. Suppose
a first-order model, including a constant term, is fitted to the results of a
full 2^k factorial experiment. Calculate the values of the leverage measures
h_i (equation 2.64) and confirm that the value of the sum of the leverage
measures Σ_{i=1}^n h_i agrees with the result you found in Exercise 2.13.
How does your answer change when some interaction terms of the form
x_i x_j are included in the model?
Exercise 2.15 Figure 1.16 in Chapter 1 is a forward plot, for the Swiss
bank note data, of the elements of the estimated covariance matrix for a
search starting from 20 observations on genuine notes. The left panel of
Figure 2.7 is a forward plot of the determinant of this matrix. The right
panel shows the trace. Relate these two figures to one another and give
reasons for the difference between the two panels of Figure 2.7. What different
features of the data are revealed by the two panels?
FIGURE 2.7. Swiss bank notes starting with the first 20 observations on genuine bank notes: forward plots of the estimated covariance matrix; left panel, the determinant and, right panel, the trace
2.18 Solutions

Exercise 2.1

    E{(y − μ)^T Σ^{−1} (y − μ)} = E[tr{Σ^{−1} (y − μ)(y − μ)^T}] = tr(Σ^{−1}Σ) = tr I_v = v.
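The expectation can also be checked by simulation (a check of our own; normal draws are used only for convenience, since the identity does not require normality):

```python
import numpy as np

rng = np.random.default_rng(42)
v = 3
A = rng.normal(size=(v, v))
Sigma = A @ A.T + v * np.eye(v)          # an arbitrary SPD covariance
mu = np.array([1.0, -2.0, 0.5])
L = np.linalg.cholesky(Sigma)
y = mu + rng.normal(size=(200_000, v)) @ L.T
diff = y - mu
d2 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(Sigma), diff)
mean_d2 = d2.mean()                      # sample mean should be close to v
```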
Exercise 2.2

    var(y_i − ȳ) = var(y_i) + var(ȳ) − 2 cov(y_i, ȳ)
                = σ² + σ²/n − (2/n) cov(y_i, Σ_{j=1}^n y_j)
                = σ² + σ²/n − 2σ²/n
                = {(n − 1)/n} σ².
Exercise 2.3
A matrix C is symmetric when C = C^T. Note that

    S(μ̂) = y^T C y.
Exercise 2.4
We have to find the distribution of d²_i(μ, s²) = (y_i − μ)²/s², which can be
rewritten as

    (y_i − μ)²/s² ~ (n − 1) χ²_1 / (χ²_1 + χ²_{n−1}).   (2.112)

Given that the χ²_1 in the denominator is independent of the χ²_{n−1}, the
resulting distribution is Beta, that is

    d²_i(μ, s²)/(n − 1) ~ Beta{1/2, (n − 1)/2}.
Exercise 2.5
Equation (2.4) can be rewritten as

    d²_i = (y_i − ȳ)²/s².

Now, given that (n − 1)s² can be decomposed as the sum of two quantities,

    (n − 1)s² = Σ_{i=1}^n (y_i − ȳ)² = {n/(n − 1)}(y_i − ȳ)² + Σ_{j≠i} (y_j − ȳ_(i))²,

we obtain

    d²_i ~ {(n − 1)²/n} χ²_1 / (χ²_1 + χ²_{n−2}).   (2.113)
We now have to prove that the χ²_1 which appears both in the numerator
and the denominator,

    {n/(n − 1)} (y_i − ȳ)²/σ² ~ χ²_1,

is independent of the χ²_{n−2}.
The proof we give has two steps. First we write the x2 variables as idem-
potent quadratic forms. Then, we show that the product of the matrices
of the two quadratic forms is equal to zero so we conclude that the two
random variables are independent. The numerator of equation (2.113) can
be rewritten as

    {n/(n − 1)}(y_i − ȳ)² = y^T Q_1 y,

where y = (y_1, …, y_n)^T, Q_1 = {n/(n − 1)} (I_n − JJ^T/n) q(i) q(i)^T (I_n − JJ^T/n),
and q(i) is a vector which has a 1 in the ith position and 0 elsewhere:
q(i) = (0, …, 0, 1, 0, …, 0)^T. Q_1 is symmetric and idempotent with trace
(rank) equal to 1. On the other hand,

    Σ_{j≠i} (y_j − ȳ_(i))² = y^T Q_2 y,
Exercise 2.6
We now require the distribution of
(2.115)
where
Since
From (2.95)
    (n − p)s² = (n − p − 1)s²_(i) + e²_i/(1 − h_i).

The residual sum of squares (n − p − 1)s²_(i) ~ σ²χ²_{n−p−1}, independently of
y_i and of β̂_(i). But, from (2.94),
Exercise 2. 7
Since the y_i's are independent (because they arise from a random sample),
the likelihood function (joint density) Lik(μ, Σ; y) is the product of the
densities of the y_i's:

    Lik(μ, Σ; y) = Π_{i=1}^n f(y_i | μ, Σ)
                = Π_{i=1}^n (2π)^{−v/2} |Σ|^{−1/2} exp{−(y_i − μ)^T Σ^{−1} (y_i − μ)/2}.
To find the maximum likelihood estimator for μ we differentiate L(μ, Σ; y)
in (2.121) with respect to μ and set the resulting expression equal to 0,
which gives

    μ̂ = ȳ.

It is clear that μ̂ = ȳ maximizes log L(μ, Σ; y) with respect to μ because
the last term in (2.121) is ≤ 0 and the term vanishes for μ̂ = ȳ. Before
differentiating log L(μ, Σ; y) to find Σ̂, we substitute μ = ȳ in (2.121) and
rewrite log |Σ| in terms of Σ^{−1} to obtain
(2.122)
    ∂ tr(Σ^{−1}B)/∂Σ^{−1} = B + B^T − diag(B)
and that
We obtain
    ∂L(μ̂, Σ; y)/∂Σ^{−1} = nΣ − (n/2) diag(Σ) − S(μ̂) + (1/2) diag S(μ̂) = 0,   (2.123)

whence

    Σ̂ − (1/2) diag(Σ̂) = (1/n){S(μ̂) − (1/2) diag S(μ̂)}

or

    Σ̂ = S(μ̂)/n.

Note that we solved (2.123) for Σ rather than Σ^{−1}, even though we differentiated
with respect to Σ^{−1}. Otherwise we would have obtained {S(μ̂)/n}^{−1}
as the maximum likelihood estimator for Σ^{−1}. We have exploited the
property of invariance of maximum likelihood estimators.
For part 3) of the exercise, we have from equation (2.122) that the loglikelihood
maximized with respect to μ̂ and Σ̂ is

    L(μ̂, Σ̂; y) = −nv log √(2π) + (n/2) log |Σ̂^{−1}| − (1/2) tr(Σ̂^{−1} S(μ̂))
               = −nv log √(2π) + (n/2) log |Σ̂^{−1}| − nv/2
               = −(nv/2) log(2π) − (n/2) log |Σ̂| − nv/2
               = −(n/2) log |2πΣ̂| − nv/2.
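The chain of equalities can be verified numerically: at the maximum likelihood estimates the quadratic term tr(Σ̂^{−1}S(μ̂)) collapses to nv, and the directly evaluated loglikelihood matches the closed form. (A check of our own, with arbitrary synthetic data.)

```python
import numpy as np

rng = np.random.default_rng(1)
n, v = 40, 3
Y = rng.normal(size=(n, v)) @ np.array([[2.0, 0.0, 0.0],
                                        [1.0, 1.0, 0.0],
                                        [0.0, 0.5, 1.0]]).T
Z = Y - Y.mean(axis=0)                 # residuals about mu_hat = ybar
Sigma_hat = Z.T @ Z / n                # MLE of Sigma
quad = np.einsum('ij,jk,ik->i', Z, np.linalg.inv(Sigma_hat), Z).sum()
logdet = np.linalg.slogdet(Sigma_hat)[1]
loglik = -n * v / 2 * np.log(2 * np.pi) - n / 2 * logdet - quad / 2
closed_form = -n / 2 * np.linalg.slogdet(2 * np.pi * Sigma_hat)[1] - n * v / 2
```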
Exercise 2.8
In order to test the hypothesis of equality of means, the matrix D and the
vectors c and μ are

    D = ( 1 −1  0 …  0  0
          0  1 −1 …  0  0
          ⋮
          0  0  0 …  1 −1 ),

c = (0, …, 0)^T and μ = (μ_1, …, μ_v)^T. The first row of the matrix D
imposes the constraint μ_1 − μ_2 = 0, the second μ_2 − μ_3 = 0, …, the last
μ_{v−1} − μ_v = 0. D in this case has dimension (v − 1) × v and has full row
rank.
Exercise 2.9
We start by rewriting the expression which defines the likelihood ratio test
(2.20). Now, since |A + bb^T| = |A|(1 + b^T A^{−1} b), we can rewrite the former
equation in terms of T². This implies that the likelihood ratio test is a
monotone function of Hotelling's T² statistic.
Exercise 2.10
We have a random sample of size n_l from each of N_v(μ_l, Σ_l), l = 1, 2, …, g.
The likelihood function is

    Lik(μ_1, …, μ_g, Σ_1, …, Σ_g; y) = Π_{l=1}^g Lik(μ_l, Σ_l; y ∈ group l)
    = (2π)^{−nv/2} Π_{l=1}^g |Σ_l|^{−n_l/2} exp[−(1/2) tr Σ_l^{−1} {n_l Σ̂_l + n_l(ȳ_l − μ_l)(ȳ_l − μ_l)^T}].

The maximum likelihood estimate of μ_l (the v × 1 vector of means for group l)
is ȳ_l under both H0 and H1 because there is no restriction on the population
means. The maximum likelihood estimate of Σ_l is Σ_{l=1}^g n_l Σ̂_l/n under H0,
where n = Σ_{l=1}^g n_l. Under the alternative H1, the maximum likelihood
estimate of Σ_l is Σ̂_l. So the maximized likelihood in the two cases is:

    max_{H1} Lik = (2π)^{−nv/2} Π_{l=1}^g |Σ̂_l|^{−n_l/2} exp(−nv/2),

    max_{H0} Lik = (2π)^{−nv/2} | Σ_{l=1}^g n_l Σ̂_l/n |^{−n/2} exp(−nv/2).

Taking logarithms under H1,

    L_1 = max_{H1} log Lik = −(nv/2) log 2π − (1/2) Σ_{l=1}^g n_l log |Σ̂_l| − nv/2,
Exercise 2.11
We must show that the product of (C − xx^T) with the right hand side
of (2.40) gives the identity matrix:

    (C − xx^T) { C^{−1} + C^{−1} xx^T C^{−1} / (1 − x^T C^{−1} x) } = I.
Exercise 2.12
We start with m = n. From (2.28),

    d²_i = (y_i − μ̂)^T Σ̂_u^{−1} (y_i − μ̂).

Then

    Σ_{i=1}^n d²_i = (n − 1) tr Σ_{i=1}^n (y_i − μ̂)^T S(μ̂)^{−1} (y_i − μ̂)
                  = (n − 1) tr Σ_{i=1}^n S(μ̂)^{−1} (y_i − μ̂)(y_i − μ̂)^T
                  = (n − 1) tr I_v = (n − 1)v.
units must have values less than the average; the maximum number of units
with zero distances is n − m. In this case, Σ_{i=1}^n d²_im ≥ (m − 1)v. The step
of the search going from S_*^(m) to S_*^(m+1) will then destroy this structure
and the sum of all the distances will increase.
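A quick numerical confirmation of the identity Σ d²_i = (n − 1)v (our own check):

```python
import numpy as np

rng = np.random.default_rng(3)
n, v = 30, 4
Y = rng.normal(size=(n, v))
Z = Y - Y.mean(axis=0)
Sigma_u = Z.T @ Z / (n - 1)            # unbiased covariance estimate
d2 = np.einsum('ij,jk,ik->i', Z, np.linalg.inv(Sigma_u), Z)
total = d2.sum()                       # equals (n - 1) v exactly
```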
Exercise 2.13
(a) H^T = {X(X^T X)^{−1} X^T}^T = X(X^T X)^{−1} X^T = H.
(b) HH = X(X^T X)^{−1} X^T X(X^T X)^{−1} X^T = X(X^T X)^{−1} X^T.
(c) tr H = Σ_{i=1}^n h_i = tr {X(X^T X)^{−1} X^T} = tr {(X^T X)(X^T X)^{−1}} = tr I_p = p.
When X contains only the constant term, that is X = J, h_i = 1/n and
tr H = 1.
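The three properties of H can be checked numerically for any full-rank design matrix (our own check):

```python
import numpy as np

rng = np.random.default_rng(7)
X = np.column_stack([np.ones(12), rng.normal(size=(12, 2))])   # n = 12, p = 3
H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix
```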
Exercise 2.14
There are p = k + 1 columns in X: the column for the constant term, which
is a vector of ones, and k columns, one for each variable in the model,
which contain 2^{k−1} entries of +1 and the same number of −1 entries. The
columns are mutually orthogonal, so

    X^T X = diag(n, …, n) and (X^T X)^{−1} = diag(1/n, …, 1/n),

where n = 2^k. The hat matrix H = X(X^T X)^{−1} X^T is n × n. The leverage
measures h_i are the diagonal terms of H:

    h_i = x_i^T (X^T X)^{−1} x_i = (1/n) Σ_{j=1}^p x²_ij = p/n.
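For instance, in a full 2^3 factorial with a constant term (p = 4, n = 8) every leverage equals p/n = 0.5 (our own check):

```python
import numpy as np
from itertools import product

levels = np.array(list(product([-1.0, 1.0], repeat=3)))   # full 2^3 design
X = np.column_stack([np.ones(len(levels)), levels])       # add constant term
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)             # leverages
```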
Exercise 2.15
The trace of the estimated covariance matrix is a function only of the vari-
ances of the variables; the determinant also includes the correlations. The
right panel of Figure 2. 7 shows that the inclusion of units from the sec-
ond group causes an appreciable increase in the variances of the variables
(signalled by a sudden change of slope in the trace). The left panel of
Figure 2.7 shows that the increase of the variances due to the initial inclusion
of the units from the group of forgeries is partially counterbalanced by the
increase in the covariances. Due to this compensation, the overall effect on
the determinant seems to be negligible compared to that on the variances
(see the left panel of Figure 2.7). This conclusion is in agreement with
what we had already seen in Figure 1.16. This figure showed that around
m = 105 to m = 110 there was not only a big increase in the variances of
variables 4, 5, 6, but also an increase in absolute values of the covariances
between variables 6 and 4, and between variables 6 and 5.
3
Data from One Multivariate
Distribution
the data. The outer ellipse is the same ellipse scaled using θ = 0.92, which
corresponds to a theoretical value of 60% of the data in a single boxplot.
Since the content of the ellipses is similar, they are hard to distinguish in
the plot, even though we have used different types of line for the two of
them. There are exactly 25 observations inside all the 60% ellipses; these
defined the initial subset for the search in the first chapter.
FIGURE 3.1. Swiss heads: scatterplot matrix of the six measurements on 200 heads. The outer (dotted) ellipse for which θ = 0.92 gives a starting point with m0 = 25
Figure 3.2 replots Figure 3.1, except that the coefficient for the outer
threshold is now θ = 4.71. This larger ellipse gives some indication of
whether there will be any outliers, in so far as bivariate plots are enough
to establish this. Units 111 and 104 were the last to enter the search in
3.1 Swiss Heads 91
Chapter 1. They are the two highlighted points on the scatterplot matrix of
Figure 1.1. In particular they have the two very large, and almost identical,
values of Y4 visible in the univariate boxplot in Figure 3.2 and lie far from
the ellipse in some of the panels.
FIGURE 3.2. Swiss heads: scatterplot matrix of the six measurements on 200 heads. The outer (dotted) ellipse for which θ = 4.71 indicates some potential outliers
FIGURE 3.3. Swiss heads: forward plots of, dotted line, maximum distances of units in the subset and, continuous line, the ordered distance d_[m]m. In normal progression the two are identical. To be compared with Figure 1.3
Figure 3.3 is similar to Figure 1.3, except that the maximum distances are
smaller than the minimum distances among units not in the subset, in line
with the argument of §2.14. A small difference is that the plots differ by
one in the value of m at which events occur. An outlier outside the subset
at stage m in Figure 1.3 will give the largest distance at stage m + 1 in
Figure 3.3, which looks at units within the subset.
A third related plot is the gap plot in Figure 3.4, which indicates inver-
sions and interchanges, that is whether the difference between the minimum
distance of units not in the subset is less than the maximum distance of
units in the subset. A series of interchanges (§2.14) can indicate the pres-
ence of a group of similar outliers. Once one or two have been included
in the subset, they may so alter the parameter estimates that the rest of
the group no longer seem remote and many or all will enter at the next
step. There are two curves on the plot. The upper shows the gap between
the m + 1st and mth ordered Mahalanobis distances at a subset size of
m, regardless of which observations are in the subset. This quantity can-
not be negative. The second curve is the difference between the minimum
Mahalanobis distance amongst the units not in the subset and the max-
imum distance amongst those in the subset. It is therefore the difference
between the curve in Figure 1.3 and the upper curve in Figure 3.3. Usually,
these two differences are the same as the ordering of distances is the same,
whether or not the constraint of belonging to the subset is considered. But
FIGURE 3.4. Swiss heads: gap plot. Forward plots of, solid line, the difference d_[m+1]m − d_[m]m and, dotted line, the difference between the minimum distance of units not in the subset and the maximum distance of units in the subset. In normal progression the two are identical
FIGURE 3.5. Swiss heads: forward plot of elements of the estimated correlation
matrix
any particularly large distances. As we saw in Chapter 1, the last two units
to join are 104 and 111, which we showed on Figure 1.1. Their distances
are largest at m = 198, just before the first of them joins the subset.
The two horizontal lines superimposed on Figure 3.6 are the square roots
of the 2½% and 97½% points of the χ²_6 distribution. These show that, unlike
residuals from normal theory regression models with a single response,
Mahalanobis distances do not necessarily have small values: gaps at the bottom
of the plot are not surprising. This point is reinforced by Figure 3.7 which
gives the density functions and 2½% and 97½% points of χ² distributions
with degrees of freedom from one to six. We can therefore expect that
95% of the observed distances should, at the end of the search, lie within
these regions. In Figure 3.6 the lower line is at 1.112 = √1.237. In fact
the boundaries at the end of the plot suggest a very slight skewness in the
distribution of distances - there are slightly too many large ones and too
few small ones. This might indicate that we need to transform the data.
Although the scatterplots in Figure 1.1 do appear elliptical, we return to
this question at the end of the section when we look at robust boxplots in
Figure 3.11.
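The two percentage points can be recovered from any χ² routine; with scipy (assumed available):

```python
from scipy.stats import chi2

# 2.5% and 97.5% points of chi-squared on 6 degrees of freedom; the
# horizontal lines in Figure 3.6 are their square roots
lo, hi = chi2.ppf([0.025, 0.975], 6)
root_lo = lo ** 0.5        # approximately 1.112, since lo is about 1.237
```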
Interpretation of Figure 3.6 can be aided by the use of simulation. For
example, the forward plot of Mahalanobis distances, Figure 3.6, had some
structure including the diagonal band of decreases in distance when a unit
entered the subset. To see whether this is indeed a feature which can be
expected with data from a multivariate normal distribution, we simulated
200 observations from a six-dimensional multivariate normal distribution.
FIGURE 3.7. Densities and 2½% and 97½% points of χ² distributions with degrees of freedom from one to six
FIGURE 3.8. Swiss heads: forward plot of simulated scaled Mahalanobis distances
to be compared with Figure 3.6
FIGURE 3.9. Swiss heads: boxplots of six variables with univariate outliers la-
belled
Since the Mahalanobis distances are invariant to the mean and covariance
matrix, we do not need to estimate the parameters of the distribution.
Furthermore, we can take the six dimensions as independent, so we only
need to sample six times from a univariate standard normal distribution
for each observation. The resulting forward plot of scaled Mahalanobis
distances for one simulation is shown in Figure 3.8. This is remarkably
similar to Figure 3.6. Not only does the diagonal band again show clearly,
dividing the units which are in the subset from those which are not , but
there are a few units with large distances and a similar gap at the bottom
of the plot, indicating the absence of very small distances.
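The reference simulation is simple to reproduce (our own sketch): because of the invariance, independent standard normals suffice, and the sum of the squared estimated distances is exactly (n − 1)v.

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.standard_normal((200, 6))       # 200 units, v = 6, standard normal
Z = Y - Y.mean(axis=0)
S = Z.T @ Z / (200 - 1)
d2 = np.einsum('ij,jk,ik->i', Z, np.linalg.inv(S), Z)
frac = np.mean((d2 > 1.237) & (d2 < 14.449))   # cf. the chi-squared_6 bounds
```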
Finally, we consider the outliers, if any. The last 20 observations to enter
the forward search, from m = 181 onwards, are units 179, 153, 29, 95, 158,
13, 33, 159, 3, 100, 125, 147, 57, 194, 133, 10, 80, 99, 104 and 111. This is
an ordering of the data in increasing distance from the fitted multivariate
normal distribution. To see whether any of these observations are also re-
mote in the univariate distributions of each variable we give the boxplots
for the six variables in Figure 3.9.
All units lying outside the whiskers have been labelled. There is not
much relationship between the univariate outliers and the list of the last
20 units. The last two to enter, units 111 and 104, have large values of
FIGURE 3.10. Swiss heads: scatterplot matrix of the six measurements on 200 heads. The last three units to enter the search are, from the end, 111, 104 and 99
y4, more remote from this distribution than any other variables, but they
are not outlying in any other marginal distribution. Unit 99, which enters
when m = 198, is not outlying in any boxplots while unit 147, which has
the only outlying value of y2, enters eighth from the end of the search. These
univariate boxplots do not seem, for these data, to give a clear idea as to
which are the more remote units.
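The univariate flagging rule behind boxplots such as Figure 3.9 can be sketched as follows, assuming the usual convention described later in the chapter (whiskers extending up to 1.5 interquartile ranges from the box); the data here are illustrative, not the Swiss heads measurements:

```python
import numpy as np

def boxplot_outliers(y):
    """Indices of values beyond the 1.5 * IQR whisker limits."""
    q1, q3 = np.percentile(y, [25, 75])
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return np.flatnonzero((y < lo) | (y > hi))

y = np.array([5.0, 5.1, 4.9, 5.2, 5.0, 4.8, 5.1, 9.0])   # one gross value
print(boxplot_outliers(y))
```

As the discussion above shows, such one-variable-at-a-time flagging need not agree with the multivariate ordering produced by the search.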
Finally we return to bivariate plots. Figure 3.10 has three units high-
lighted. It shows that units 104 and 111 are indeed outlying in y4, but
are not particularly remote in the other variables. Unit 99 has one of the
largest values of y2 and although it has no other extreme values, is often
towards the edge of the point cloud. The conclusion seems to be that it is
an extreme, but not anomalous, unit.
FIGURE 3.11. Swiss heads: scatterplot matrix of the six measurements on 200
heads with virtually elliptical robust contours
FIGURE 3.12. Track records: forward plot of scaled Mahalanobis distances. There are three clear outliers towards the end of the search and a further cluster around m = 30
normality provides a useful model and that there may be two people for
whom the measurement of y4 is incorrect. If it is not possible to check
these readings for transcription or other errors and then, if necessary, by
remeasuring the individuals, further readings, just on this one variable,
should be enough to determine accurately the population distribution of
y4 so as to decide whether some errors have been made.
FIGURE 3.13. Track records, simulated data: forward plot of scaled Mahalanobis distances. To be compared with Figure 3.12
of the search, being Mauritius (36), another island. A third outlier is North
Korea (the Democratic People's Republic of Korea, 33).
For comparison, a forward plot of simulated scaled Mahalanobis distances
is in Figure 3.13. There are some similarities - if the two outliers at the
beginning of the search in Figure 3.12 are ignored, the largest distances in
both plots are of a similar order at the beginning of the search and both
plots show the diagonal band of decreases as each non-outlying unit joins
the subset. However, there are some very important differences.
For the simulated data the distances rapidly decrease to around six.
In contrast, the distances for the real data have appreciably higher
values. Indeed, it looks as if at around m = 30 there are three groups
of observations: the two outliers, a group of 10 observations rather distant
from the rest and then the bulk of the data. The gap between this apparent
group and the central observations decreases after m = 40. This is the point
at which there is a local maximum in the forward plot of the minimum
distances among the observations not included in the subset in Figure 1.6.
Comparison with the same plot for the simulated data in Figure 3.14 shows
that this maximum, with a value just larger than six, is indeed large enough
to suggest a departure from multivariate normality.
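The quantity being monitored here, the minimum Mahalanobis distance among units not yet in the subset, can be sketched schematically (a deliberately simplified illustration on synthetic data, not the authors' algorithm, which chooses its starting subset robustly):

```python
import numpy as np

def forward_step(Y, subset):
    """One (simplified) step: fit to the subset, compute all distances,
    record the minimum among units outside, grow to the m+1 closest."""
    m = len(subset)
    Ys = Y[subset]
    resid = Y - Ys.mean(axis=0)
    d2 = np.einsum('ij,jk,ik->i', resid,
                   np.linalg.inv(np.cov(Ys, rowvar=False)), resid)
    outside = np.setdiff1d(np.arange(len(Y)), subset)
    d_min = np.sqrt(d2[outside].min())     # the monitored quantity
    return np.argsort(d2)[:m + 1], d_min

rng = np.random.default_rng(1)
Y = rng.standard_normal((50, 3))
subset = np.arange(10)                     # a crude starting subset
mins = []
for _ in range(10, 50):
    subset, d_min = forward_step(Y, subset)
    mins.append(d_min)
print(len(subset), round(max(mins), 2))
```

A peak in the sequence of minimum distances, like the local maximum near m = 40 above, signals that the next unit to enter is remote from the subset fitted so far.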
From largest distance downward in the centre of Figure 3.12 the coun-
tries concerned are: the Dominican Republic (16), Papua-New Guinea (41),
Guatemala (23), Turkey (52), North Korea (33), the Philippines (42), In-
donesia (26), Burma (7), the Cook Islands (12) and Argentina (1). These
FIGURE 3.14. Track records, simulated data: forward plot of minimum distances of units not in the subset. To be compared with Figure 1.6
FIGURE 3.15. Track records with Western Samoa removed: scatterplot matrix of the national records. The results for nine countries plus Mauritius are labelled
FIGURE 3.16. Track records: forward plot of maximum distance of units in the subset and ordered distance d[m]m. The two curves overlap
FIGURE 3.17. Track records: gap plot. Forward plots of, solid line, the difference d[m+1]m - d[m]m and, dotted line, the difference between the minimum distances of units not in the subset and the maximum distance of units in the subset. There is a notable decline after m = 40
3.2 National Track Records for Women
FIGURE 3.18. Track records: forward plot of the elements of the estimated correlation matrix. There is one change before m = 40 and others at the end of the search
TABLE 3.1. National track records for women: the units with the twelve largest
Mahalanobis distances at two points in the search

Rank    1   2   3   4   5   6   7   8   9  10  11  12
m = 40  55  36  16  41  23  52  33  42  26   7  12   1
m = n   55  33  36  12  13  51  35  16  52  25   7  14
among the twelve most outlying at the end of the search. Such hiding of
outliers when all observations are fitted is an example of masking. It would
be difficult to detect the group of outliers at m = 40 using the backwards
deletion of observations starting from the information at the end of the
search provided by the QQ plot in Figure 1.5.
Figure 3.19 is a zoom of part of Figure 3.12 which shows visually the ef-
fect of masking. If outliers remain outliers and central observations remain
near the centre of the distribution throughout the search, the plot of Ma-
halanobis distances will not include many trajectories of individual units
which cross each other. Figure 3.19 does show several trajectories crossing.
After m = 43 the distances for several of the previously outlying units
steadily decrease, crossing the trajectories of other units, which increase
towards the end, in line with our discussion of Table 3.1.
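Masking can be demonstrated on synthetic data (an illustration of the general phenomenon, not the track records themselves): a tight cluster of outliers inflates the full-sample covariance estimate, so distances computed from the full fit understate how remote the outliers are, while a fit to the clean units exposes them.

```python
import numpy as np

def sq_mahalanobis(Y, mean, cov):
    resid = Y - mean
    return np.einsum('ij,jk,ik->i', resid, np.linalg.inv(cov), resid)

rng = np.random.default_rng(2)
clean = rng.standard_normal((45, 2))
outliers = 0.2 * rng.standard_normal((5, 2)) + 8.0   # remote tight cluster
Y = np.vstack([clean, outliers])

# Distances of the outliers from the full (contaminated) fit ...
d2_full = sq_mahalanobis(Y, Y.mean(axis=0), np.cov(Y, rowvar=False))
# ... and from a fit to the clean units only.
d2_clean = sq_mahalanobis(Y, clean.mean(axis=0), np.cov(clean, rowvar=False))

print(round(d2_full[45:].mean(), 1), round(d2_clean[45:].mean(), 1))
```

Because the squared distances from the full fit must sum to (n - 1) times the dimension, a sizeable group of outliers can never look very remote under that fit; this is exactly the masking that backwards deletion from the full fit fails to break.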
If it is correct to fit a single multivariate normal distribution to the
data, the bivariate distributions should have elliptical contours, perhaps
FIGURE 3.19. Track records: forward plot of scaled Mahalanobis distances. Detail of Figure 3.12 exemplifying masking
with a few outliers. This seemed to be the case for the data on Swiss
heads in Figure 3.11. Figure 3.20 shows similar robust contours fitted to
the scatterplot matrix of Figure 1.4. These nominally contain 99% of the
data. This new plot is slightly strange. The upper 3 × 3 submatrix for the
shorter races contains curves which are pretty much elliptical, as does the
lower 4 × 4 submatrix for the longer races. But the off-diagonal plots of
observations from both sets of responses appear very non-elliptical.
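A nominal 99% contour of the kind drawn in these scatterplot matrices can be sketched for one bivariate panel as the set of points at squared Mahalanobis distance equal to the 0.99 quantile of χ² on 2 degrees of freedom (the sketch below uses a classical location/scatter estimate; the book's plots use robust estimates):

```python
import numpy as np
from scipy.stats import chi2

def ellipse_99(mean, cov, n_points=100):
    """Boundary points of the nominal 99% contour of a bivariate fit."""
    radius = np.sqrt(chi2.ppf(0.99, df=2))
    theta = np.linspace(0.0, 2.0 * np.pi, n_points)
    circle = np.column_stack([np.cos(theta), np.sin(theta)])
    L = np.linalg.cholesky(cov)            # cov = L @ L.T
    return mean + radius * circle @ L.T

mean = np.array([0.0, 0.0])
cov = np.array([[2.0, 1.0], [1.0, 2.0]])
boundary = ellipse_99(mean, cov)
# Every boundary point is at squared Mahalanobis distance chi2(2, 0.99).
```

Mapping the unit circle through a Cholesky factor of the covariance is what makes the contour an ellipse; markedly non-elliptical point clouds, as in the off-diagonal panels here, are evidence against a single bivariate normal fit.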
In our analysis of these data in Chapter 1 we mentioned that there might
be reasons for looking at speed as the response, rather than time: that
is analysing 1/y rather than y. Figure 3.21 shows the robust contours,
now applied to the data after this reciprocal transformation. Comparison
with the plot for the untransformed data suggests some improvement: in
particular, the bottom row of the matrix contains a set of appreciably more
elliptical contours. This is in agreement with the discussion of Table 1.1,
as it is these longer times about which we would expect to have more
information for transformation.
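The reciprocal transformation itself is a one-liner; the sketch below (on invented record times, not the actual data) shows how analysing speed 1/y rather than time y removes the long right tail produced by slow records:

```python
import numpy as np

times = np.array([9000.0, 9200.0, 9400.0, 9600.0, 12000.0])  # one slow record
speeds = 1.0 / times

def skew(x):
    z = (x - x.mean()) / x.std()
    return (z ** 3).mean()

# The times are strongly right-skewed; after the reciprocal the
# skewness changes sign, so the long right tail is gone.
print(round(skew(times), 2), round(skew(speeds), 2))
```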
If the data have a more nearly multivariate normal distribution after
this transformation, the forward search may be more stable, showing, for
example, fewer outliers. Figure 3.22 is the forward plot of standardised Ma-
halanobis distances for the reciprocal data for a search starting from units
in an intersection of robust contours. It is quite different from the plot
from the untransformed data, Figure 3.12, and does indeed show many
fewer outliers. The distances overall for units not in the subset are smaller
FIGURE 3.20. Track records: scatterplot matrix with robust contours
and the group of outliers associated with developing countries with long
times for the marathon is no longer evident. The four outliers are, in
decreasing order at the end of the search, North Korea (33), Western Samoa
(55), Mauritius (36) and Czechoslovakia (14), which has not previously
appeared as outlying.
Comparison of Figure 3.22 with the simulated distances for these data
in Figure 3.13 shows how similar the two plots are, apart from the four
potential outliers. Both show the diagonal band of decreases in distance
as units join the subset. The distances in Figure 3.22 are, apart from the
outliers, rather smaller than those in the simulated data. This is an effect
of standardisation by an estimated covariance matrix inflated by outliers,
which makes non-outlying distances too small.

The four outliers show up clearly in the forward plots associated with
individual distances. Only one is given here, that of the gap plot in
Figure 3.23, which is stable until almost the end of the search, when there are
four larger values associated with the four outliers.
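The gap statistic plotted in Figure 3.23 can be sketched as follows (synthetic data; the solid-line quantity d[m+1]m - d[m]m is the difference between consecutive ordered distances computed from the current subset's fit):

```python
import numpy as np

def gap(Y, subset):
    """d[m+1]m - d[m]m: gap between consecutive ordered distances."""
    Ys = Y[subset]
    resid = Y - Ys.mean(axis=0)
    d2 = np.einsum('ij,jk,ik->i', resid,
                   np.linalg.inv(np.cov(Ys, rowvar=False)), resid)
    d = np.sqrt(np.sort(d2))
    m = len(subset)
    return d[m] - d[m - 1]

rng = np.random.default_rng(3)
Y = np.vstack([rng.standard_normal((40, 2)),
               rng.standard_normal((2, 2)) + 10.0])   # two remote units
# Large gap: the next unit to enter is far from the fitted subset.
print(round(gap(Y, np.arange(40)), 1))
```

A stable gap indicates a homogeneous sample; isolated large values near the end of the search, as here, correspond to a few outliers about to enter.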
FIGURE 3.22. Reciprocal track records: forward plot of scaled Mahalanobis dis-
tances. Comparison with Figure 3.19 shows a stable pattern of outliers
FIGURE 3.23. Reciprocal track records: gap plot. Forward plots of, solid line, the difference d[m+1]m - d[m]m and, dotted line, the difference between the minimum distances of units not in the subset and the maximum distance of units in the subset. The four outliers are evident
forward search. In this section we first look at the forward plot of scaled
Mahalanobis distances and then explore the properties of the outlying units.
Part of the challenge is that there are too many variables for the scatterplot
matrix of all observations to be decipherable.
The scaled Mahalanobis distances from a search through all the data are
plotted in Figure 3.24. This shows the two clear outliers, as well as the
diagonal band associated with the change in distance when a unit enters
the subset. The two outliers, units 245 (Cerignale) and 277 (Zerba), were
identified in Chapter 1 as small and poor mountain communities. What
is surprising is the large change in the distance of unit 277 when it joins
the subset in the final step; this unit, which as we shall see is outlying
in almost all variables, must be having an appreciable influence on the
estimated parameters of the model. We now look for ways in which the
units are outlying.
With the moderate numbers of observations in the previous two exam-
ples, we could study the scatterplot matrix and highlight units to see in
what way they were outlying. This is not practicable with 28 variables. Even
if we take groups of 7 variables, so that it is possible to see the structure
of the scatterplots, we shall miss a large number of pairwise scatterplots.
We therefore start with a univariate analysis of our outlying observations.
We first looked at 28 boxplots, six of which are shown in Figure 3.25.
In this form of boxplot the central box covers the interquartile range, with
the median denoted by the central white stripe. The whiskers can extend
up to 1.5 interquartile ranges from the central box. They terminate at the
3.3 Municipalities in Emilia-Romagna
FIGURE 3.25 (boxplots) and FIGURE 3.26 (scatterplots): panels include pop.inf (y1), birth, cars, luxury, artisan (y27) and entrepreneur (y28)
all the other communities show as outliers from the general clusters of
points. The most striking is unit 310, Casina, which has an amazingly
high birth rate, y12, about twice the usual value. This is hard to explain,
if, indeed, the number is correct. One possibility is that births have been
credited to both father and mother. Another might be the presence of a
maternity hospital, although this is not in fact the case. A more prosaic
explanation is that there is a transcription error; the value for y12 should
be 7.10 not 17.10. Apart from this one anomalous value, the measurements
for Casina fall in the centre of the observed values.
One unit, 238, which came in when m = 326, is not a univariate outlier
in any of our boxplots. However it shows up as a bivariate outlier in, for
example, the plot of y1 against y18. Car ownership is a little too high for a
community with such a low birth rate which, in many cases, is associated
with an impoverished and aging rural population. Projection of this obser-
wealth indexes. These municipalities often stand out in bivariate plots, such
as those contrasting aging and unemployment, aging and housing and aging
and wealth, in Zani (1996, Chapter 7).
The additional communities in Table 1.3 are scattered throughout the
regional map. For these villages outlyingness is mainly attributed to local
instances, rather than to general socio-economic causes, as seemed to be
the case for unit 238 (Calendasco). Goro (70), which is the third last to
enter (and so is the most remote after the two extreme outliers) shows
up as an outlier in many of the scatterplots in Figure 3.26. It is a non-
touristic seaside community on the once malarial marshes near the mouth
of the River Po. In some of the scatterplots it is close to unit 245, but it
has an average birth rate and number of young children. It also, as the
boxplot in Figure 3.25 shows, has a high number of luxury cars. Recording
or definition errors may also, as we have suggested for unit 310 (Casina),
generate outliers. As a further example, the last community reported in
Table 1.3, Bellaria-Igea Marina (88), a seaside resort, has a relatively high
unemployment rate. However, it is well-known that many people working in
tourism ( for example in hotels and restaurants or on beaches) are wrongly
registered as unemployed in censuses and other surveys.
This further discussion is a reminder that we may have several kinds
of outlier. In particular, atypical combinations of positive and negative
characteristics will lead to communities being detected as outlying. For
understanding the general structure of the data we are also concerned with
estimation of patterns once the outliers have been detected. For example,
many of the outlying communities shown shaded on the map of Figure 3.27
fall in the mountainous areas of the region, as do many other poor com-
munities. We return to these data in Chapter 4, where our focus is on
simultaneous transformation of the 28 variables.
FIGURE 3.28. Swiss bank notes: forward plot of scaled Mahalanobis distances from the search starting with a subset of units from both groups. There seem to be many outliers, but the group structure is not clear
FIGURE 3.29. Swiss bank notes: gap plot. Forward plots of, solid line, the difference d[m+1]m - d[m]m and, dotted line, the difference between the minimum distances of units not in the subset and the maximum distance of units in the subset
FIGURE 3.30. Swiss bank notes, starting with the first 20 observations on genuine
notes: forward plot of scaled Mahalanobis distances. The three groups are evident
FIGURE 3.31. Swiss bank notes, starting with the first 20 observations on genuine
notes: gap plot. A series of interchanges starts at m = 106
3.4 Swiss Bank Notes
When m = 101 the search has reached a point at which at least one
observation from Group 2 has to join the subset. In fact, as the figure
shows, due to the presence of outliers, this inclusion of units from both
groups starts a little earlier. From m = 95 the distances to the group of
outliers decrease rapidly, as remote observations from Group 1, the genuine
notes, join the subset. Around m = 105 we can see that many of these
former outliers are joining the subset (their distances decrease), while many
of the units formerly in the subset leave (their distances increase). The
crossing of two bands of outliers, seen here between m = 105 and m = 115,
is typical of distances in a forward search when one multivariate normal
distribution is fitted to data that contain two or more groups of appreciable
size. Once the subset contains units from both groups, the search continues
in a way similar and then identical to that of the earlier search. The right
hand thirds of Figures 3.28 and 3.30 show identical patterns of Mahalanobis
distances, the only difference being the vertical scale of the plots.
The structure of the data is also exposed in the gap plot for the search
starting in the first group. Figure 3.31 shows that the gap is steady until
m is near 100, when a few outlying observations enter. But, most informa-
tively, at m = 105, there is a large gap between the two plots, indicating
that there is an interchange of units at m = 106. This happens a little after
m = 100 because a few units from the second group need to be in the sub-
set before there is an appreciable change in the parameter estimates. The
interchanging of units persists for another ten steps of the forward search.
The last third of the search is the same as that in Figure 3.29 which started
from both groups.
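In terms of subset membership, an interchange is simply a non-empty set difference between consecutive subsets; a minimal sketch (with hypothetical subsets, not the bank note units):

```python
def interchange(subset_m, subset_m1):
    """Units that left and units that entered between consecutive steps."""
    left = sorted(set(subset_m) - set(subset_m1))
    entered = sorted(set(subset_m1) - set(subset_m))
    return left, entered

# Hypothetical consecutive subsets around an interchange: three units
# enter while two leave, so the subset grows by one overall.
left, entered = interchange([1, 2, 3, 4, 5], [1, 2, 3, 8, 9, 10])
print(left, entered)
```

In an ordinary step the set of leavers is empty and exactly one unit enters; several steps with non-empty leaver sets, as around m = 106 here, mark the interchange.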
Very detailed information about the interchange and of the group mem-
bership of each unit comes from looking at forward plots of individual
distances for the one or more units entering the subset at each stage of
the search. Figure 3.32 shows a series of such plots. The first panel plots
all 98 distances for the units in the subset at m = 98. This distribution
of distances is used to judge the behaviour of successive units as they are
included in the subset. The second panel of the figure shows the percentage
points of the estimated distribution of these 98 distances. The plotted per-
centage points are at 2.5%, 5%, 12.5%, 25% and 50% and the symmetrical
upper points of the empirical distribution. Superimposed upon this distri-
bution is the forward plot of the distance for the unit which enters when
m = 99, which is unit 125. This is just outside the distribution at m = 99,
and so is the appropriate unit to enter. But the shape of the trace of the
Mahalanobis distance is very different from units already in the subset: it
is initially too high, becoming atypically low later in the search. This is the
first unit from the second group to enter the subset. Units 104 (m = 100)
and 127 (m = 101) show a similar transition from high to low. However,
when m = 102, unit 70 has a rather different profile, being much smaller
around m = 70 to 90, and much higher after m = 100. This unit does not
behave in the same way as the three earlier ones to join the subset.
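The reference bands described here can be sketched by taking empirical percentage points of the subset distances (the distances below are simulated stand-ins, not the bank note values):

```python
import numpy as np

rng = np.random.default_rng(4)
distances = np.sqrt(rng.chisquare(df=6, size=98))   # stand-in distances

# Percentage points at 2.5%, 5%, 12.5%, 25%, 50% and the symmetric
# upper points, as in the text.
probs = [2.5, 5, 12.5, 25, 50, 75, 87.5, 95, 97.5]
bands = np.percentile(distances, probs)

# A unit's trajectory is judged against these bands step by step; a value
# above the uppermost band lies outside the reference distribution.
print(np.round(bands, 2))
```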
FIGURE 3.32. Swiss bank notes, starting with the first 20 observations on genuine notes: nine panel plot of forward Mahalanobis distances for specific units, starting from m = 98
The next three units to join, 129, 107 and 105, all show the pattern
associated with units from Group 2. The interchange starts at m = 106
when three units from Group 2 enter, all with profiles which are initially
very high and finally very low. This pattern continues in Figure 3.33. All
nine panels on the plot show at least two, but mostly three or more units
from Group 2 entering the subset at each stage of the forward search. For
the units to enter together, they have to have Mahalanobis distances which
are very close to one another at the value of m for which they enter. In
the last panel of Figure 3.33 this is well inside the reference distribution
of units from Group 1, many of which will have left the subset during the
interchanges.
At each stage one or two or more units which are already in the subset
have to leave. As the parameter estimates change with the inclusion of
large numbers of units from Group 2, some units from Group 1 become
increasingly outlying. However, as Figure 3.34 shows, they do have finally
to rejoin the subset, since all units are included by the end of the search.
FIGURE 3.33. Swiss bank notes, starting with the first 20 observations on genuine notes: nine panel plot of forward Mahalanobis distances for specific units, starting from m = 107. Reference distribution: m = 98
The first panel of the figure shows four units joining, three from Group 2
and one from Group 1. However the succeeding panels show the pattern
more clearly. When m = 117 and 118 the units included are from Group
1 and lie well within the distribution, as does unit 17 which joins when
m = 121. The rest of the units, with profiles which are initially high and
finally low, clearly belong to Group 2.
These three figures show very clearly the amount of information on group
membership which can be found by looking at individual profiles against a
background of profiles from a known group. We find this a very powerful
technique in cluster analysis as we judge units against the clusters we are
developing. Further examples are in Chapter 7. But, for the moment, we
see what happens for a search starting with members of the second group.
The forward plot of scaled Mahalanobis distances, Figure 3.35, is broadly
similar to that when we started with Group 1, Figure 3.30, but the differ-
ences are informative. Before the interchange of units in the subset, the two
groups seem more clearly separated than when we started from Group 1,
FIGURE 3.34. Swiss bank notes, starting with the first 20 observations on genuine notes: nine panel plot of forward Mahalanobis distances for specific units, starting from m = 116. Reference distribution: m = 98
FIGURE 3.35. Swiss bank notes, starting with units in Group 2 (the forgeries): forward plot of scaled Mahalanobis distances. Compare with Figure 3.30
These are all units from Group 2 and so are the first five to enter. The first
four to enter have profiles similar to those in the calibrating distribution,
although they are a little extreme and indeed should be: otherwise they
would have entered the subset earlier. The first unit that looks rather dif-
ferent is unit 125, entering when m = 85, which is the outlier causing the
spike at m = 84 in Figure 3.36. It is also the first unit to join Group 1 in
Figure 3.32. The three remaining units to enter, in the bottom panels of
Figure 3.37, are all from Group 1 and have similar profiles: they are initially
too high, but finally have increasingly small distances in the panels from
left to right in the last row of the figure. These profiles are reminiscent of
those for the Group 2 units in Figure 3.32 when starting from Group 1
because the units are initially remote from the subset used in fitting, but
finally have small distances from the model fitted to most of the data. Unit
125, the outlier in Figure 3.37, has a profile somewhere between those for
units in Group 2, too high initially, and those in Group 1 being finally too
high.
The interchanges start in Figure 3.38 when m = 97. All units joining are
from Group 1 and show similar profiles. Figure 3.39 is similar, although
interchanges do not occur at all steps of the forward search. However, in
the final panel, when m = 106, five units from Group 1 enter the subset.
FIGURE 3.36. Swiss bank notes starting with units in Group 2 (the forgeries):
gap plot. Figures 3.38 and 3.39 show a series of interchanges starting at m = 97
The final plot of this series, Figure 3.40, shows further interchanges and
the inclusion of units from Group 1. It is not until m = 114 that another
unit from Group 2 is included. This interesting panel, in the centre of the
bottom row of the figure, shows the inclusion of two units: unit 2, from
Group 1, has a profile which starts high before merging into the general
distribution of distances. On the other hand, unit 128, from Group 2, has a
distribution which is within the Group 2 distribution throughout. Finally,
at m = 114, the distances for the two units are very close.

These series of plots of distances for individual units against a calibratory
background provide a powerful tool for assigning units to groups and we
make appreciable use of them in Chapter 7 on cluster analysis. But, to
conclude this chapter, we analyse each group individually.
Figure 3.41 is a forward plot of the scaled Mahalanobis distances for
the units only in Group 1. This seems to be a well-behaved plot: the last
five units to enter are, from last backwards, 1, 40, 70, 71 and 5. Of these
1 and 40 seem to be outlying for much of the search. The forward plot
of the minimum Mahalanobis distance among the units not in the subset,
Figure 3.42, has the largest distances at the end, suggesting four or five
potential outliers. There are no sharp spikes in the curve, so we seem to have
ordered the units satisfactorily. There is thus no evidence to suggest that
the data are more complicated than a sample from one multivariate normal
distribution with, perhaps, a few outliers. This suggestion is supported by
Figure 3.43 which shows that the elements of the estimated covariance
FIGURE 3.37. Swiss bank notes starting with units in Group 2 (the forgeries): nine panel plot of forward Mahalanobis distances for specific units, starting from m = 80
matrix are stable from m = 38 until almost the end of the search: the last
five units to enter do cause a noticeable change in some of the variances
and covariances. The visually most obvious involve y4, y5 and y6.
We now conclude our analysis of the genuine notes by looking once more
at the data. Figure 3.44 is the set of univariate boxplots for each variable,
with all univariate outliers marked. Units 1 and 5 are each outlying in
two boxplots and units 40, 70 and 71 in one each. These are all the units
identified by the boxplots as outlying and are the last five to enter the
forward search. However appreciation of the relative importance of these
units is partly clarified by the scatterplot matrix of Figure 3.45 in which
the five units are highlighted.
For example unit 1 has high values of y2 and y3 and lies at the upper ex-
treme of the major axis of the bivariate scatter; unit 40 is likewise extreme,
but with the opposite sign, in the plot of y1 and y3. But the bivariate scat-
terplots fail to show any clustering of outliers, a conclusion which agrees
with the structure of the forward plots for Group 1.
FIGURE 3.38. Swiss bank notes starting with units in Group 2 (the forgeries):
nine panel plot of forward Mahalanobis distances for specific units, starting from
m = 89. Reference distribution: m = 80
FIGURE 3.39. Swiss bank notes starting with units in Group 2 (the forgeries): nine panel plot of forward Mahalanobis distances for specific units, starting from m = 98. Reference distribution: m = 80
enter. The first panel shows the distances up to m = 80 which are used to
define the reference distribution. The next four panels show units entering
which, although extreme, have profiles similar to those in the reference
set. The outlier, unit 125, enters when m = 85. Its distance is too large
throughout, but it has a similar profile to the other units in the central
group. Thereafter the units forming the second cluster start to enter. The
three panels in the bottom row of the plot show units 111, 194 and 168.
These profiles, high for most of the search, are clearly different from those
in the central group. The profiles for the remaining 12 units of this cluster,
which we do not show, are similar although gradually larger at the end of
the search: the forward plot of all distances, Figure 3.46, shows that finally
none of the distances is very large.
This analysis clarifies details of the structure that was found in Chapter 1.
We again conclude by looking at plots of the data. The univariate boxplots,
Figure 3.50, do not reveal a cluster of outliers, although they do show that
there are some present.
FIGURE 3.40. Swiss bank notes starting with units in Group 2 (the forgeries):
nine panel plot of forward Mahalanobis distances for specific units, starting from
m = 107. Reference distribution: m = 80
The scatterplot matrix of Figure 3.51 clearly shows the second cluster, and the separation is clearest in the plot of y4 against y6.
In passing we note that the overwritten labels in the boxplots occur because of the rounding of the data, which is clear in the panels of the scatterplot matrix, perhaps most clearly in that of y2 against y3. It is also clear in the plot of y4 against y6. As Figure 3.52 shows, the structure of all 200 observations is revealed by this one scatterplot. An interesting question is whether the other four measurements help in deciding whether a new note is genuine or forged. We consider such questions in Chapter 6 on discriminant analysis.
3.4 Swiss Bank Notes 129
FIGURE 3.41. Swiss bank notes, units in Group 1 (genuine notes): forward plot
of scaled Mahalanobis distances; a single multivariate population with perhaps
five outliers
FIGURE 3.42. Swiss bank notes, units in Group 1 (genuine notes): forward plot
of maximum distances of units in the subset. There is some evidence of outliers
FIGURE 3.43. Swiss bank notes, units in Group 1 (genuine notes): forward plot
of elements of estimated covariance matrix. The effect of the five outliers can be
seen towards the end of the search
FIGURE 3.44. Swiss bank notes, units in Group 1 (genuine notes): boxplots
showing univariate outliers
FIGURE 3.45. Swiss bank notes, units in Group 1 (genuine notes): scatterplot
matrix showing the five univariate outliers of Figure 3.44
FIGURE 3.46. Swiss bank notes, units in Group 2 (the forgeries): forward plot
of scaled Mahalanobis distances. The structure of two groups and an outlier is
clear around m = 70
FIGURE 3.47. Swiss bank notes, units in Group 2 (the forgeries): forward plot
of minimum distances of units not in the subset. The peak is at m = 84
FIGURE 3.48. Swiss bank notes, units in Group 2 (the forgeries): forward plot
of elements of estimated covariance matrix. The variances of Y4 and Y6, together
with their covariance, change appreciably at the end of the search
FIGURE 3.49. Swiss bank notes, units in Group 2 (the forgeries): nine panel plot
of forward Mahalanobis distances for specific units, starting from m = 80
FIGURE 3.50. Swiss bank notes, units in Group 2 (the forgeries): boxplots show-
ing univariate outliers
FIGURE 3.51. Swiss bank notes, units in Group 2 (the forgeries): scatterplot
matrix. The separation of the two groups is particularly clear in the panel for Y4
and Y6
FIGURE 3.52. Swiss bank notes. Scatterplot of y6 against y4 which reveals most
of the structure of all 200 observations: there are three groups and an outlier
from Group 1, the crosses. Unit 125, the open circle, lies within Group 2 for
these variables
a third group of 15 units and the main second group with one outlier. The
scatterplots showed no evidence of non-elliptical contours, so that there
was no obvious reason to consider transformation of these data.
3.6 Exercises
Exercise 3.1 Table 3.3 gives 20 simulated observations Y from a bivariate normal population. What do you expect will be the form of the forward plots of the determinant |Σ̂_m|, of the maximum Mahalanobis distance within the subset, of the minimum distance outside the subset and of the gap plot (equation 2.105 or 2.106)? What do you expect from the forward plot of scaled Mahalanobis distances?
Exercise 3.2 A constant value of six is added to the last three rows of the matrix Y in Table 3.3. What do you expect for the form of the five forward plots you described in Exercise 3.1? What is the effect of the inclusion of the three contaminated units on the covariance matrix?
Exercise 3.3 If instead the constant value of six is added to the last six rows of the matrix Y in Table 3.3, what are the forms of the five forward plots?
Exercise 3.4 As in Exercise 3.3 the constant value six is added to the last six rows of the matrix Y in Table 3.3. Describe the form of the five forward plots when now the initial subset contains four good units (say for example
FIGURE 3.53. Exercise 3.7: two sets of simulated bivariate data. Which of the
two has to be transformed to achieve bivariate normality?
11, 12, 13 and 14) and two contaminated units (say for example 15 and
16).
Exercise 3.5 Again, as in Exercise 3.3, we add six to the last six rows of the matrix Y in Table 3.3, but now start the search with a subset formed only by the contaminated units. Describe the form of the five forward plots.
Exercise 3.6 Track records: can you assess from Figures 3.19 and 3.22
which units enter the search in the last three steps? What is their order of
entry?
Exercise 3.9 The QQ plot of Figure 1.5 shows three large outliers. The next nine distances lie outside or on the simulation envelope. Determine whether these, together with one of the three clear outliers, form the group of ten developing countries identified as being rather different from the rest in §1.3. Discuss the implications of your finding for masking.
FIGURE 3.54. Exercise 3.1, simulated uncontaminated data from a bivariate nor-
mal distribution: forward plots of Wilks' ratio, maximum and minimum distances
and of the gap. The scales of the y axes are the same as those in Figure 3.56
3.7 Solutions
Exercise 3.1
To plot determinants we use Wilks' ratio defined as |Σ̂_m|/|Σ̂_n| (equation 2.107), which is one at the end of the search. When there are no outliers we expect that the curve which monitors this ratio will increase steadily from 0 to 1 throughout the forward search. Similarly, we expect that the curves of the maximum (mth) and minimum ((m+1)th) Mahalanobis distances will increase slightly with small random fluctuations during the search. Figure 3.54 shows that this is the case.
When there are no outliers we expect that all scaled Mahalanobis dis-
tances fall within the theoretical horizontal asymptotic confidence bands
throughout the forward search (see Figure 3.55).
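The three monitored curves of this solution are straightforward to compute. Below is a minimal Python sketch (ours, not the authors' software): it grows a subset from a crudely chosen central core and records Wilks' ratio together with the maximum distance inside and the minimum distance outside the subset; for uncontaminated data all three drift up smoothly, as in Figure 3.54.

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=20)
n, v = Y.shape

def mahalanobis_sq(Y, mu, Sigma):
    # Squared Mahalanobis distances of every row of Y from (mu, Sigma).
    R = Y - mu
    return np.einsum('ij,jk,ik->i', R, np.linalg.inv(Sigma), R)

# Crude starting subset: the v+1 units closest to the coordinatewise
# median (the book instead starts from robustly centred ellipses).
d0 = mahalanobis_sq(Y, np.median(Y, axis=0), np.cov(Y, rowvar=False))
subset = list(np.argsort(d0)[: v + 1])

Sigma_n = np.cov(Y, rowvar=False)  # full-sample estimate for Wilks' ratio
for m in range(v + 1, n):
    mu_m = Y[subset].mean(axis=0)
    Sigma_m = np.cov(Y[subset], rowvar=False)
    d = mahalanobis_sq(Y, mu_m, Sigma_m)
    wilks = np.linalg.det(Sigma_m) / np.linalg.det(Sigma_n)
    d_max_in = np.sqrt(max(d[i] for i in subset))      # mth distance
    d_min_out = np.sqrt(min(d[i] for i in range(n) if i not in subset))
    # Grow the subset: keep the m+1 units with smallest distances.
    subset = list(np.argsort(d)[: len(subset) + 1])
```

The ordering step at the end of the loop is what makes this a forward search: units may leave the subset again if an interchange occurs.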
FIGURE 3.55. Exercise 3.1, simulated uncontaminated data from a bivariate normal distribution: monitoring of scaled Mahalanobis distances. The two horizontal lines are the 2.5 and 97.5 percentiles of the χ₂² distribution. The scale of the y axis is the same as that in Figure 3.57
FIGURE 3.56. Exercise 3.2, simulated data from a bivariate normal distribu-
tion with three contaminated units: forward plots of Wilks' ratio, maximum and
minimum distances and of the gap
FIGURE 3.57. Exercise 3.2, simulated data from a bivariate normal distribution
with three contaminated units: forward plot of scaled Mahalanobis distances
expect there will be a decrease due to masking after the jump. Figure 3.56
shows all these properties.
In the plot of all scaled Mahalanobis distances we expect that the curves
associated with the 3 outliers will be remote from the rest up to step
m = 17. In the final step we expect them to have Mahalanobis distances
comparable with the other units (see Figure 3.57).
The effect of the three outliers is to increase considerably all the entries of the variance covariance matrix. The plot which monitors the elements of the covariance matrix (not given here) shows a big change in the slope of the curves for all three elements, that is for σ̂₁², σ̂₂² and σ̂₁₂, when m = 18.
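The inflation of the covariance entries by the shifted cluster is easy to reproduce numerically. The following Python sketch (our illustration, using freshly simulated data in place of Table 3.3, which is not reproduced here) adds six to the last three rows of a bivariate sample, as in Exercise 3.2, and compares the two estimated covariance matrices.

```python
import numpy as np

rng = np.random.default_rng(1)
Y = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.3], [0.3, 1.0]], size=20)
Yc = Y.copy()
Yc[-3:] += 6.0  # add the constant six to the last three rows

S_clean = np.cov(Y, rowvar=False)
S_cont = np.cov(Yc, rowvar=False)

# The shifted cluster sits far out along the (1, 1) direction, so both
# variances and the covariance are inflated.
print("clean:\n", S_clean.round(2))
print("contaminated:\n", S_cont.round(2))
```

With only three contaminated units out of twenty, the shift dominates the spread in every entry of the matrix.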
Exercise 3.3
With six outliers we expect a change in the slope of the curve which moni-
tors the Wilks' ratio when m = 15 and a jump in the curve of the maximum
(mth) Mahalanobis distance at the same point. At step m = n- 6 = 14
we expect a jump in the plots of the minimum distance and in the gap
plot. Figure 3.58 confirms our expectations. Since these six units form a cluster of outliers, we expect a considerable decrease after the jump due to masking.
In the forward plot of scaled Mahalanobis distances we expect that the
curves associated with the six outliers will be remote up to m = 14. With
this high level of contamination (30%), we expect that the outliers in the
FIGURE 3.58. Exercise 3.3, simulated data from a bivariate normal distribu-
tion with six contaminated units: forward plots of Wilks' ratio, maximum and
minimum distances and of the gap. Note the masking effect after the peak
final steps will have Mahalanobis distances in agreement with those of the
other units (see Figure 3.59).
Exercise 3.4
If the percentage of contaminated units in the initial subset is not too high
the outliers are usually removed in the initial steps of the forward search,
so that the final part of the search is exactly equal to the one we obtain
starting from an initial subset free from contamination. Figure 3.60 shows
that initially there is considerable overlap among the curves. The first steps are very active, the contaminated units being removed from the subset. They are clearly distinct from the majority of the data in the central part of the search. Comparison of Figure 3.60 with Figure 3.59 indicates correctly that the order of entry in the central and final parts of the search (from m = 10 onwards) is exactly the same.
Exercise 3.5
When we start the search in the group of contaminated units the two groups
are clearly visible in the forward plot of scaled distances, Figure 3.61, until
some uncontaminated units enter the subset. Generally, after the initial
inclusion of units from the other group, we will observe an interchange
because the centroid of the fitted observations will lie in between the two
groups. Figure 3.62 shows that this interchange occurs around steps 8-12.
FIGURE 3.59. Exercise 3.3, simulated data from a bivariate normal distribution
with six contaminated units: forward plot of scaled Mahalanobis distances. Note
the masking at the end of the search
FIGURE 3.60. Exercise 3.4, simulated data from a bivariate normal distribution
with six contaminated units starting with an initial subset containing both good
units and outliers: forward plot of scaled Mahalanobis distances. The outliers are
immediately removed from the subset in the first step of the search
FIGURE 3.61. Exercise 3.5, simulated data from a bivariate normal distribution with six contaminated units, starting in the group of contaminated units: forward plot of scaled Mahalanobis distances. The continuous lines are the "good" units, the dotted lines are for the outliers. The two groups are clear at the beginning of the search. From m = 5 to m = 10 the contaminated units have increasing Mahalanobis distances while the good units show a clearly decreasing pattern
Exercise 3.6
If there are no interchanges the unit which enters the subset in the last step of the forward search is the one which has the largest Mahalanobis distance when m = n - 1. Similarly, the unit which enters the subset in step (n - 1) of the forward search is the one which has the Mahalanobis distance ranked (n - 1) when m = n - 1. So it is clear from Figure 3.19 that, in steps n - 2, n - 1 and n the units which join the subset are 36, 33 and 55. Applying a similar reasoning it is clear from Figure 3.22 that starting with n - 2 the order of entry of the units is: 36, 55 and 33.
Exercise 3.7
The left panel of Figure 3.53 shows that the data have a symmetrical ellip-
tical shape. On the other hand, the data in the right panel show a differing
FIGURE 3.62. Exercise 3.5, simulated data from a bivariate normal distribution with six contaminated units starting in the group of contaminated units: forward plots of Wilks' ratio, maximum and minimum distances and of the gap. Interchanges are evident in the plots of the distances around m = 10, after the peaks at m = 6 or 7
spread of the data for one variable at various values of the other variable.
It is clear that in this case the variability increases with the mean for both
variables. This is a typical shape of data which have to be transformed to
achieve approximate normality.
The 250 points in the left panel of Figure 3.53 were simulated from a bivariate normal distribution with μ = (6, 6)^T and covariance matrix

Σ = ( ·   2.6 )
    ( 2.6   2 )

The data in the right panel are a multiple of the exponential of those in the left panel. The pattern in the right-hand panel of Figure 3.53 is very similar to the babyfood data which will be analyzed in Chapter 4.
The information provided by plots of the likelihood ratio test for the null hypothesis of no transformation during the forward search is different for the two sets of data. That from the left-hand panel of Figure 3.63 yields a horizontal line centred around E(χ₂²) = 2 and below the critical rejection points. The data in the right-hand panel of Figure 3.63 show an increasing curve always well above the critical percentage points of the χ₂² distribution. Figure 3.63 shows the monitoring of the likelihood ratio test in both cases.
FIGURE 3.63. Data from Figure 3.53: likelihood ratio tests for the hypothesis of
no transformation. The left and right panels are for the data in the respective
panels of Figure 3.53
Exercise 3.8
The Mahalanobis distance is invariant under linear transformations so the simulation of the distances does not require the sample estimates of the values of μ and Σ. In other words, it is possible to use the standard normal distribution to generate samples from the multivariate distribution N_v(0, I).
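This invariance is easy to verify numerically. The Python sketch below (our illustration, not part of the book) applies an arbitrary nonsingular affine transformation to a sample from N_3(0, I) and checks that the Mahalanobis distances, computed from the sample mean and covariance in each case, coincide.

```python
import numpy as np

rng = np.random.default_rng(2)
n, v = 50, 3
Z = rng.standard_normal((n, v))          # sample from N_3(0, I)

A = np.array([[2.0, 0.5, 0.0],
              [0.0, 1.0, 0.3],
              [0.0, 0.0, 1.5]])          # any nonsingular matrix
b = np.array([10.0, -4.0, 6.0])
Y = Z @ A.T + b                          # affinely transformed sample

def mahalanobis_sq(X):
    # Squared distances from the sample mean and covariance of X.
    mu = X.mean(axis=0)
    S = np.cov(X, rowvar=False)
    R = X - mu
    return np.einsum('ij,jk,ik->i', R, np.linalg.inv(S), R)

# Since mean and covariance transform consistently under Y = Z A^T + b,
# the distances are identical: affine invariance.
assert np.allclose(mahalanobis_sq(Z), mahalanobis_sq(Y))
```

The algebra behind the check: the residuals transform as R_Y = R_Z A^T and the covariance as A S A^T, so A and A^T cancel inside the quadratic form.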
Exercise 3.9
As Figure 3.64 shows, beyond units 33, 36 and 55, the next nine distances which lie outside the simulation envelope are associated with units 12, 13, 51, 35, 16, 52, 25, 7 and 14. This list is very different from the group of ten developing countries identified as being rather different from the rest in §1.3. As Figure 3.13 showed, during the final steps of the search the Mahalanobis distances of this group of units seem to decrease substantially. The conclusion is that working backwards it is impossible to detect the group of units found in §1.3.
4.1 Background
The analysis of data is often improved by using a transformation of the re-
sponse, rather than the original response itself. There are physical reasons
why a transformation might be expected to be helpful in some examples.
If the data arise from a counting process, they often have a Poisson distri-
bution and the square root transformation will provide observations with
an approximately constant variance, independent of the mean. Similarly,
concentrations are nonnegative variables and so cannot strictly be subject
to additive errors of constant variance. Such effects are most noticeable
if there are observations both close to, and far from, zero as they are for
the viscosity measurements of the babyfood data introduced in §2.13.2. In
this chapter we analyze such data using the multivariate version of the
parametric family of power transformations introduced by Box and Cox
(1964).
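The variance-stabilizing effect of the square root on Poisson counts is easily checked numerically: by the delta method, Var(√X) is approximately 1/4 whatever the mean, once the mean is large. A quick Python illustration (ours, not from the book):

```python
import numpy as np

rng = np.random.default_rng(3)
low = rng.poisson(4.0, size=100_000)     # counts with a small mean
high = rng.poisson(100.0, size=100_000)  # counts with a large mean

# On the raw scale the variance tracks the mean (about 4 vs about 100)...
print(low.var(), high.var())
# ...while on the square-root scale both variances are of the same order,
# close to the stabilized value 1/4 for the large mean.
print(np.sqrt(low).var(), np.sqrt(high).var())
```

The raw variances differ by a factor of about 25; after the square root they differ only modestly, which is the sense in which the transformation makes the variance approximately independent of the mean.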
We begin in §4.2 with another look at the babyfood data, comparing
the forward plots of Mahalanobis distances from transformed and untrans-
formed analyses. The theory for transformation of the response in univari-
ate regression is in §4.3.1, with the extension to multivariate data in §4.3.2.
Before we consider the forward search, we describe relatively easily calcu-
lated score tests in §4.4. Graphical procedures, including fan plots of score
tests, are in §4.5.
The procedure for finding multivariate transformations with the forward
search is described in §4.6. This is exemplified in three relatively straight-
152 4. Multivariate Transformations to Normality
forward examples. For the babyfood data in §4.7 we show that the log
transformation is most satisfactory, a conclusion we can demonstrate is
unaffected by the influence of individual observations. In our analysis of
the data on Swiss heads in §4.8 we show the strong effect of just two units
on the estimated transformation. Our third introductory example, in §4.9,
is of data on horse mussels. As a result of our transformation we are able
to separate a set of six observations from the majority of the data.
The major example of the chapter is an analysis of the data on municipalities in Emilia-Romagna in §4.10. Amongst other features, we find that, with 341 units, we need a finer grid of values of the transformation parameters than previously. We also need to replace zeroes in the data with minimum values. The forward search enables us to monitor the influential impact of such changes. The analysis is complicated because there are 28 responses: it is not possible to have all outliers in all responses entering at the end of the search. We follow this in §4.11 with a shorter analysis of the transformation of the data on national track records for women. Again, our transformation leads to a simplified structure for the data.
An example of transformations when there is a regression structure is in
§4.12. The data used come from an experiment on dyestuff manufacture. In
§4.13 we show how the forward search can be used to provide information
about the variables to be included in the linear model for the babyfood
data. The chapter concludes with suggestions for further reading.
TABLE 4.1. Babyfood data: minimum and maximum value of each response and
their ratio
Figure 4.2 shows the same plot for the data after logarithms have been
taken of all four responses. The observations now appear much more nor-
mal: the univariate boxplots seem symmetrical and the robust contours,
which are plausibly elliptical, include a higher proportion of the observations. Those observations that are outside the outer contour are much less remote than they were in Figure 4.1.
The beneficial effect of the transformation in moving the data closer to normality can also be seen in forward plots of Mahalanobis distances. Figure 4.3, the plot of scaled distances for the original data, is quite irregular with several large outliers for much of the search. By comparison, the plot in Figure 4.4, for the logarithmically transformed data, is stable, the pattern of distances not changing much as the search progresses.
There are three further informative contrasts between these two figures. The first is that Mahalanobis distances are scale free: they consist of residuals divided by an estimated standard deviation. The plots of Figures 4.3 and 4.4 are on the same vertical scale, showing that the large distances for the untransformed data are much greater, until the final step of the search, than those for the transformed data. This is a reflection of the difference in the presence and structure of outliers between the untransformed and transformed data which we saw in the scatterplot matrices. The second contrast between the two figures is less evident. Figure 4.4 for the transformed data starts with m0 = 11, found from the intersection of units inside ellipses for which θ (2.103) is one. This ellipse for the transformed data is shown in Figure 4.5 together with the 50% ellipse passing through the median Mahalanobis distance. For n = 27 the F value (2.102) corresponding to θ = 1 gives an ellipse with an asymptotic content of 61.8%. The figure shows that the two ellipses are close together, an indication of normality of the data. However the ellipses in Figure 4.6 for the untransformed data are far apart. The inner one again contains exactly 50% of the data. But the outer ellipse, again for θ = 1, is much larger, because the estimate of Σ is inflated by the presence of outliers, even though the ellipse has a robust centre. As a result of the skewed distribution of the Mahalanobis distances the ellipse contains appreciably more of the observations than that in Figure 4.5. If we use the same value of θ for the starting point of the untransformed data as
FIGURE 4.1. Babyfood data: scatterplot matrix with asymptotic 50% and 99%
spline curves. There are many outliers and the data are far from multivariate
normality
FIGURE 4.2. Logged babyfood data: scatterplot matrix with asymptotic 50%
and 99% spline curves. The data are much closer to multivariate normality after
transformation than the untransformed data in Figure 4.1
In Figures 4.3 and 4.4 we fitted a constant μ_j to each response. We return later in this chapter, in §4.7, to an analysis in which we fit a model in the explanatory variables. In this example, although not in all, introduction of a linear model does not affect our choice of transformation.
FIGURE 4.3. Babyfood data: forward plot of scaled Mahalanobis distances. There
are some large distances for much of the search, which are masked at the end
other transformations, such as the square root or the reciprocal, would give even better properties. In order to test whether this is so, we embed the various transformations in the single parametric family due to Box and Cox (1964). An advantage of this embedding is that standard methods of inference are then available for the choice of the best transformation.
In the next section we present a short description of methods for transfor-
mation of the response in univariate regression; the multivariate extension
is in §4.3.2. Unfortunately the estimated transformation and related test
statistics may be sensitive to the presence of one, or several, outliers. Ex-
amples for univariate data are in Chapter 4 of Atkinson and Riani (2000).
We use the forward search to see how estimates and test statistics evolve
as we move through the ordered data. Since observations that appear as
outlying in untransformed data may not be outlying once the data have
been transformed, and vice versa, we employ the forward search on data
subject to several transformations, as well as on untransformed data.
FIGURE 4.4. Log transformed Babyfood data: forward plot of scaled Maha-
lanobis distances. Compare with Figure 4.3
FIGURE 4.5. Logged babyfood data: scatterplot matrix with fitted ellipses. The inner ellipse contains 50% of the data, the outer, for θ = 1, asymptotically contains 61.8%. For θ = 1, m0 = 11
Once a value of .X has been decided upon, the analysis is the same as
that using the simple power transformation
FIGURE 4.6. Babyfood data: scatterplot matrix with fitted ellipses. The inner ellipse contains 50% of the data, the outer, for θ = 1, asymptotically contains 61.8%. Outliers cause the ellipses to be far apart, with the inner ellipse virtually indistinguishable from the central cluster of data points. Here, for θ = 1, m0 = 19
z(λ) = y(λ)/J^{1/n},     (4.4)
for which the Jacobian is one. The likelihood is therefore now
a standard normal theory likelihood for the response z(λ). For the power transformation (4.3),
so that
log J = (λ - 1) Σ log y_i = n(λ - 1) log ẏ,
where ẏ is the geometric mean of the observations.
(4.9)
Although the two statistics have the same properties for testing the value of λ, we sometimes prefer to plot the asymptotically normal T_N: including the sign of the difference λ̂ - λ₀ gives an indication of the direction of any departure from the hypothesised value.
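The normalized transformation z(λ) = y(λ)/J^{1/n}, with log J = (λ - 1) Σ log y_i, is simple to code, and maximizing the resulting profile log-likelihood over a grid gives a numerical version of this analysis. The Python sketch below is our own illustration, not the book's software; for lognormal data the maximizing value of λ should be close to zero, the log transformation.

```python
import numpy as np

def boxcox_z(y, lam):
    # Normalized Box-Cox transformation z(lam) = y(lam) / J^(1/n),
    # where J^(1/n) = gdot**(lam - 1) and gdot is the geometric mean.
    gdot = np.exp(np.mean(np.log(y)))
    if abs(lam) < 1e-12:
        return gdot * np.log(y)
    # expm1 keeps (y**lam - 1) accurate for lam close to zero.
    return np.expm1(lam * np.log(y)) / (lam * gdot ** (lam - 1.0))

def profile_loglik(y, lam):
    # Up to a constant, -(n/2) log of the ML variance of z(lam).
    z = boxcox_z(y, lam)
    return -0.5 * len(y) * np.log(np.var(z))

rng = np.random.default_rng(4)
y = np.exp(rng.normal(2.0, 0.5, size=200))   # lognormal: true lam = 0

lams = np.linspace(-1.0, 1.0, 41)
best = lams[np.argmax([profile_loglik(y, l) for l in lams])]
# best should be near 0, pointing to the log transformation
```

Because z(λ) has Jacobian one, the profile log-likelihoods at different values of λ are directly comparable, which is what makes the grid maximization legitimate.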
4.3 Power Transformations to Approximate Normality 161
const - n log|Σ̂(λ)| - Σ_{i=1}^n e_i(λ)^T Σ̂(λ)^{-1} e_i(λ).     (4.12)

In (4.12) the parameter estimates μ̂_i(λ) and Σ̂(λ) are found for fixed λ and e_i(λ) is the v × 1 vector of residuals for observation i for the same value of λ. As in (4.8) for univariate transformations, it makes no difference in likelihood ratio tests for the value of λ whether we use the maximum likelihood estimator of Σ, or the unbiased estimator Σ̂_u. Suppose the maximum likelihood estimator is used, so
(4.13)
(4.15)
These are the two most frequently occurring values in the analysis of data: either no transformation, the starting point for most analyses, or the log transformation. For other values of λ the constructed variables are found by evaluation of (4.20). Because T_p(λ) is the t test for regression on -w(λ), large positive values of the statistic mean that λ₀ is too low and that a higher value should be considered.
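For a single response the approximate score test can be sketched as follows (our Python illustration, not the book's code): the constructed variable is the derivative of the normalized z(λ) with respect to λ, obtained here by central differences instead of the closed form in (4.20), which is not reproduced here, and T_p(λ₀) is the t statistic for regression of z(λ₀) on -w(λ₀), with only a constant term in the null model.

```python
import numpy as np

def boxcox_z(y, lam):
    # Normalized Box-Cox transformation z(lam) = y(lam) / J^(1/n).
    gdot = np.exp(np.mean(np.log(y)))
    if abs(lam) < 1e-12:
        return gdot * np.log(y)
    return np.expm1(lam * np.log(y)) / (lam * gdot ** (lam - 1.0))

def score_test(y, lam0, h=1e-5):
    # Constructed variable w = dz/dlam at lam0, by central differences.
    z = boxcox_z(y, lam0)
    w = (boxcox_z(y, lam0 + h) - boxcox_z(y, lam0 - h)) / (2.0 * h)
    # Regress z(lam0) on an intercept and -w(lam0); the t statistic of
    # the -w coefficient plays the role of Tp(lam0).
    X = np.column_stack([np.ones_like(y), -w])
    beta = np.linalg.lstsq(X, z, rcond=None)[0]
    resid = z - X @ beta
    n, p = X.shape
    s2 = resid @ resid / (n - p)
    var_beta = s2 * np.linalg.inv(X.T @ X)[1, 1]
    return beta[1] / np.sqrt(var_beta)

rng = np.random.default_rng(5)
y = np.exp(rng.normal(1.0, 0.4, size=100))  # the log scale is correct

t_at_1 = score_test(y, 1.0)   # lam0 = 1 is far from the log scale
t_at_0 = score_test(y, 0.0)   # lam0 = 0 is the correct transformation
```

Here |t_at_1| should greatly exceed |t_at_0|: the hypothesis of no transformation is rejected while the log transformation is not.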
When the Box-Cox transformation is applied to multivariate data, linearization of the transformation (4.10) leads to the nv values of the v constructed variables
w_ij(λ₀)
(4.22)
where γ_j is a scalar. Using the approximate score test to determine whether λ₀ is the correct transformation of the v responses is equivalent to testing that the v parameters γ_j, j = 1, ..., v are all zero. Even if the matrix of explanatory variables X is the same for all responses, so that the regression model is (2.57), (4.22) shows that inclusion of the constructed variables means that the variables for regression are no longer the same for all responses. As a result, the simplification of the regression in §2.11 no longer holds and the covariance Σ between the v responses has to be allowed for in estimation. Under the alternative hypothesis, although not under the null, independent least squares is replaced by seemingly unrelated regression described in §2.11.
To test the hypothesis λ = λ₀, we can use an approximate score test. As in the likelihood ratio test given by (4.15), this is a function of the ratio of determinants of estimated covariance matrices. Under the hypothesised transformation the estimated covariance matrix is Σ̂(λ₀). Regression on the constructed variables w_j(λ₀) in (4.22) gives a vector of parameter estimates γ̂ and an estimated covariance matrix from the residuals which we call Σ̂(γ̂). The approximate score test is then
keep their values in λ₀. To form the likelihood ratio test we also require the estimator λ̂_0j found by maximization only over λ_j. More explicitly, we can write λ̂_0j = (λ_0(j) : λ̂_j). Then the version, for multivariate data, of the signed square root likelihood ratio test defined in (4.9) is
(4.24)
We produce v fan plots, one for each variable, by letting λ_j, j = 1, ..., v take each of the five standard values. Alternatively, particularly if numerical maximization of the likelihood is time consuming, we could produce the fan plot from the signed square root of the score test in (4.23).
4.7 Babyfood Data
Our preliminary analysis of the babyfood data indicated that the log trans-
formation greatly improved the normality of the data. We reached this con-
clusion without fitting a linear model. We now fit a model and see what
transformation is needed, using the forward search to detect the effect of
individual observations.
There are five explanatory variables. In their analysis of the data Box and Draper (1987, page 572) find a linear model with terms in x2, x3 and x5 as well as, surprisingly, the interaction x3x4 in the absence of x4. This model was suggested for all four responses. It is generally agreed that such models, violating a marginality constraint, are undesirable: if the variables in this model are rescaled, the model will apparently change, a term in x4 appearing. Suggestions for determining the importance of this interaction are on p. 574 of Box and Draper (1987), together with an analysis of the fitted coefficients in the model, which suggests that perhaps a common model can be fitted to all four responses.
For the present we choose a linear model with all five first-order terms as
well as the interaction of x3 and x4, estimating the parameters separately
for each of the four responses. Our purpose at this point is to demonstrate
methods for choosing a multivariate transformation. We discuss the use of
the forward search in selecting this model in §4.13.
The first step in §4.6 is a forward search through the untransformed
data, estimating λ at each step. The resulting forward plot of the four
elements of λ̂ is in Figure 4.7. It is clear from this stable forward plot of
values close to zero that we should try the logarithmic transformation in
Step 2. The high significance of the transformation is shown in the forward
plot of the likelihood ratio test for λ0 = 1 in Figure 4.8. This increases
steadily throughout the search; even initially the values are significant when
compared with χ²4.
FIGURE 4.7. Babyfood data: forward plot of the four elements of the maximum
likelihood estimate λ̂. The log transformation is indicated for all responses and
there are no obvious outliers
FIGURE 4.8. Babyfood data: forward plot of the likelihood ratio test for the
hypothesis of no transformation. The horizontal lines are the 95% and 99% points
of χ²4
FIGURE 4.9. Log transformed babyfood data: forward plot of the four elements
of the maximum likelihood estimate λ̂. The log transformation, or a slightly lower
value of λ, is indicated for all responses from this search on the logged data
FIGURE 4.10. Babyfood data: profile loglikelihoods for the four transformation
parameters when m = 26. Search on logged data
FIGURE 4.11. Logged babyfood data: forward plot of the likelihood ratio test for
the hypothesis of the log transformation. The horizontal lines are the 95% and
99% points of χ²4: this transformation is supported
FIGURE 4.12. Babyfood data: fan plot of signed square root likelihood ratio tests
that all responses should have the same transformation. The incomplete curves
result from numerical problems with the convergence of parameter estimates for
data which Figure 4.1 shows are far from normal

4.8 Swiss Heads
untransformed data, so that the order of entry of the units is the same as in
earlier chapters. In particular, units 104 and 111 are the last two to enter.
The figure shows the enormous impact these two observations have on the
evidence for transforming the data. At m = 198 the value of the statistic
is 7.218, only slightly above the expected value for a χ²6 random variable.
This rises to 15.84 after the two outliers have entered, a value above the
95% point of the distribution, which is included in the plot. Without the
information provided by the forward plot it would be easy to be misled into
believing that the data need transformation.
Evidence for a transformation is provided by a skewed distribution. The
only skewed distribution in the scatterplots of the data such as those in
Figure 3.10 is the marginal distribution of y4, caused by the outlying values
of units 104 and 111. To test whether all the evidence for the transformation
is due to y4 we repeat the calculation of the likelihood ratio, but now
only testing whether λ4 = 1. The other five values of λ are kept at one,
both in the null parameter vector λ0 and in the m.l.e. λ̂. The search is
therefore the same as before, but now gives rise to Figure 4.14, showing
a test statistic to be compared with χ²1. It is now even clearer that all
evidence for transformation of y4 is provided by the inclusion of units 104
and 111. At the end of the search the test statistic has a value of 8.789,
compared with 15.84 in Figure 4.13 for transforming all six variables. The
difference, 7.05, is not significant for a χ²5 random variable, so that the
FIGURE 4.13. Swiss heads: forward plot of the likelihood ratio test for the hy-
pothesis of no transformation. The horizontal line is the 95% point of χ²6. The
last two units to enter provide all the evidence for a transformation
evidence of the tests at the end of the search is that λ4 ≠ 1, whereas the
values of λ for all other variables are equal to one.
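A single-parameter likelihood ratio test of this kind, λj tested against one with the other elements of λ held at one, can be sketched as follows. This is our own minimal Python version, with a coarse grid in place of the book's numerical maximization; the function names are ours.

```python
import numpy as np

def _bc(y, lam):
    """Normalized Box-Cox transform of one positive column."""
    gm = np.exp(np.mean(np.log(y)))
    if lam == 0:
        return gm * np.log(y)
    return (y ** lam - 1.0) / (lam * gm ** (lam - 1.0))

def neg2_profile(Y, lam):
    """-2 x profile log-likelihood (up to a constant): n log|Sigma_hat(lam)|."""
    n = Y.shape[0]
    Z = np.column_stack([_bc(Y[:, j], lam[j]) for j in range(Y.shape[1])])
    Z = Z - Z.mean(axis=0)
    return n * np.linalg.slogdet(Z.T @ Z / n)[1]

def lr_one_component(Y, j, lam0j=1.0):
    """LR test that lam_j = lam0j with all other elements of lambda held
    at one, maximizing over lam_j on a coarse grid; the statistic is
    compared with chi-squared on one degree of freedom."""
    v = Y.shape[1]
    null = np.ones(v)
    null[j] = lam0j
    t_null = neg2_profile(Y, null)
    grid = np.linspace(-2.0, 2.0, 81)
    vals = []
    for g in grid:
        lam = np.ones(v)
        lam[j] = g
        vals.append(neg2_profile(Y, lam))
    best = grid[int(np.argmin(vals))]
    lam = np.ones(v)
    lam[j] = best
    return t_null - neg2_profile(Y, lam), best
```

With several columns of positive near-normal data and one lognormal column, only the lognormal column produces a large statistic, with the maximizing value of λj close to zero.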
The forward plot of the maximum likelihood estimate of λ4, when all
other variables remain untransformed, is less revealing than the plots of
the estimates for the babyfood data in Figures 4.7 and 4.9. The estimate
is poorly defined until m = 140, being larger than one. It then gradually
drifts down to around 0.5, before suddenly decreasing in the last two steps
and becoming -1.108 at the end of the search. This evolution needs to be
judged against tests of the parameter value.
We use the fan plot based on the signed square root of the multivariate
likelihood ratio test which was defined in (4.24) as a confirmatory test
of the value of λ. Since there are six transformation parameters, the fan
plot in Figure 4.15 shows the result of 30 forward searches with λ0(j) = 1.
In all panels, except that for y4, there is no evidence of any need for a
transformation. In fact, any of the five transformations would be acceptable
for y1, y2, y3 and y5 since all statistics lie within the 99% interval for the
asymptotic normal distribution throughout the searches. For y6 the statistic
for no transformation is closest to zero, with the reciprocal and reciprocal
square root being rejected at the end of the search. The transformation for
y4 is more interesting and the panel is reproduced enlarged in Figure 4.16.
The main feature of the fan plot for y4 in Figure 4.16 is the effect of the
two outlying units which enter in the last two steps of the search when λ =
1 or 0.5. Before the entry of the outliers, at m = 198, all transformations
FIGURE 4.14. Swiss heads: forward plot of the likelihood ratio test for the hy-
pothesis of no transformation of y4. The horizontal lines are the 95% and 99%
points of χ²1. The last two units to enter provide all the evidence for this trans-
formation
are acceptable. But at the end of the search, the hypothesis of no transfor-
mation is clearly rejected with a value for the statistic of -2.965. Although
the outliers also enter at the end of the search when λ = 0.5, these two
observations are less outlying on other scales and so enter earlier in some
other searches. For example, for λ = -1, there are two downward jumps
in the value of the statistic, caused by these outliers entering when m =
187 and 193. For the other transformations the outliers enter later in the
search, their effect being apparent on all curves. The value of m for entry
of the two outliers into the subset is given in Table 4.2 for each search. As
the transformation moves from the reciprocal to no transformation, there
is a smooth increase in the value of m at which the observations enter.
FIGURE 4.15. Swiss heads: fan plots when λ0 = 1. Only y4 shows a need for any
transformation. See also Figure 4.16
TABLE 4.3. Swiss heads: minimum and maximum value of each response and
their ratio
This example clearly shows the effect of the two outliers on the estimated
transformation. In this case unthinking reliance on the aggregate statistics
when all observations are fitted would lead to a model which was inap-
propriate for most of the data. Although here the two outliers are readily
detected by using simple tools, such as scatterplots, our analysis quanti-
fies the effect of the two observations on a specific aspect of parameter
estimation.
Apart from the transformation of y4 these data provide no evidence that
any other variables need transformation. This is typical of data in which the
observations have a narrow range, are far from the origin and have roughly
symmetrical distributions.
FIGURE 4.16. Swiss heads: fan plot for y4 when λ0 = 1. The effect of the two
outliers can be seen at different points in the search for different transformations
(detail of Figure 4.15)
y1: shell length, mm
y2: shell width, mm
y3: shell height, mm
y4: shell mass, grams
y5: muscle mass, grams.
The data were introduced by Cook and Weisberg (1994, p. 161) who treat
them as regression with muscle mass, the edible portion of the mussel, as
response. They focus on independent transformations of the response and
of one of the explanatory variables which we now call y4 . In Atkinson and
Riani (2000, p. 116), the focus is on the joint transformation of the response
and one explanatory variable in the regression model. Here we see whether
multivariate normality can be obtained by joint transformation of all five
variables.
We begin by looking at the data. Figure 4.17 is the scatterplot matrix
of the data with superimposed robust contours. Several of the contours are
not elliptical and many cover the data poorly: for example in the plots of y5
against y1 or y3 the scatter of points is decidedly curved, lying to one side
of the contour. It therefore seems that there is plenty for a transformation
to do in achieving scatterplots with elliptical contours.
We start with a forward search with untransformed data. Figure 4.18
is the forward plot of the resulting likelihood ratio test, to be compared
with χ²5. The value at the end of the search is 160.56 and the statistic is
significant throughout the range shown in the figure. The data need to be
transformed.
To obtain a first idea of a better transformation consider the forward
plot of the estimates of λ in Figure 4.19. These estimates all trend down at
the end of the search, indicating the continuing introduction of units which
are further and further from the untransformed multivariate normal model.
The jump at m = 71 is caused by the introduction of unit 78. The general
shape of the curves towards the end of the search suggests we might try λ =
(1, 0.5, 1, 0, 1/3)T. Although 1/3 is not one of the standard five values we
4.9 Horse Mussels 177
FIGURE 4.17. Horse mussels: scatterplot matrix with superimposed robust con-
tours. These non-elliptical curves indicate that transformation might be beneficial
FIGURE 4.18. Horse mussels: forward search on untransformed data. Likelihood
ratio test for the hypothesis of no transformation. The horizontal lines are the
95% and 99% points of χ²5. The data need to be transformed
FIGURE 4.19. Horse mussels: forward search on untransformed data. The five
elements of the estimate λ̂, suggesting that at least three variables should be
transformed
FIGURE 4.20. Horse mussels: forward search with λR = (1, 0.5, 1, 0, 1/3)T.
Likelihood ratio test for λR, which is rejected during the latter part of the search
FIGURE 4.21. Horse mussels: forward search with λR = (1, 0.5, 1, 0, 1/3)T. The
five elements of the estimate λ̂. Some outliers are entering towards the end of the
search
FIGURE 4.22. Horse mussels: forward search with λR = (0.5, 0, 0.5, 0, 0)T.
Likelihood ratio test for λR, which is only rejected at the end of the search
FIGURE 4.23. Horse mussels: forward search with λR = (0.5, 0, 0.5, 0, 0)T. The
five elements of the estimate λ̂. The estimates are stable until the outliers enter
at the end of the search
FIGURE 4.24. Horse mussels: fan plots for six values of λ including 1/3, with λ0
= (0.5, 0, 0.5, 0, 0)T. The outliers have most effect on the transformation of y5
the presence of these two observations. They however lie outside the range
plotted in the figure.
The comparison of forward plots of scaled Mahalanobis distances also
shows how the transformation has improved normality for most units and
made the outliers more remote. Figure 4.26 shows that, for the untrans-
formed data, there is a large number of outliers without any apparent
structure, which do not reduce in number as the search progresses. Due
to these outliers, the distances for the central units are too small and are
in the lower half of the theoretical distribution. After transformation, Fig-
ure 4.27, there is a much clearer group of six outliers, two of which, units 8
and 48, are particularly remote in the scatter plots in Figure 4.25. The same
structure is clear in the forward plots of the minimum distance amongst
units not in the subset. Figure 4.28, for the original data, shows an irregular
increase in distance as the search progresses. On the other hand, the left-
hand panel of Figure 4.29 for the transformed data initially shows a set of
distances which only increase very gradually, with few fluctuations. At the
end of the search the distances are large as the six outliers enter, although,
due to masking, the distance for the last observation is the same as that
of the preceding observation. The right-hand panel of Figure 4.29 is the
gap plot, showing the difference between the smallest squared Mahalanobis
distance for units not in the subset and the largest distance for units within
the subset. This shows a first peak at m = 76, just before the first outlier
enters the subset at m = 77. There is a second peak at m = 80 as the
FIGURE 4.25. Horse mussels: scatterplot matrix of transformed data, λ = (0.5,
0, 0.5, 0, 0)T, with superimposed robust contours. These elliptical curves, to be
compared with those of Figure 4.17, show the effect of transformation
last two, more extreme, outliers enter with some masking. Other forward
plots, not reproduced here, likewise show the effect of the transformation
in partitioning the data into a large central part and half a dozen outliers.
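The gap quantity just described can be computed directly. The sketch below is based on our reading of the text; `gap_statistic` is our name, not the book's.

```python
import numpy as np

def gap_statistic(Y, subset):
    """Gap-plot quantity: the smallest squared Mahalanobis distance
    among units outside the subset minus the largest squared distance
    among units inside it, both computed from the subset fit."""
    mask = np.zeros(len(Y), dtype=bool)
    mask[subset] = True
    mu = Y[mask].mean(axis=0)
    Sinv = np.linalg.inv(np.cov(Y[mask], rowvar=False))
    D = Y - mu
    d2 = np.einsum('ij,jk,ik->i', D, Sinv, D)   # squared distances, all units
    return d2[~mask].min() - d2[mask].max()
```

A positive gap signals that the nearest excluded unit lies further out than anything already included, as happens just before an outlier enters the subset.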
It is now time to interpret the transformation. The first three measure-
ments are length, the fourth and fifth mass, which is related to volume and
so to the cube of length. The log transformation of mass leads to responses
with dimension approximately the logarithm of length. It might be hoped
that the first three variables should also be logged, to give a dimensionally
homogeneous model. However the model with logarithms of all five vari-
ables is not acceptable, as is indicated by the fan plots of Figure 4.24. In our
final model one of the variables is logged, but the square root is taken of the
other two. Since all three measurements are of length, the same transfor-
mation might have been anticipated for all variables. In fact, our discussion
of the fan plots of Figure 4.24 suggested that the 1/3 transformation was
FIGURE 4.26. Horse mussels; untransformed data. Forward plot of scaled Ma-
halanobis distances, showing many outliers
FIGURE 4.27. Horse mussels; transformed data, λ = (0.5, 0, 0.5, 0, 0)T. Forward
plot of scaled Mahalanobis distances, showing six well-separated outliers
acceptable for the first three responses. Figure 4.30 is the forward plot of
the likelihood ratio test for the null value λ = (1/3, 1/3, 1/3, 0, 0)T. It is
similar in shape to Figure 4.22 for λ = (0.5, 0, 0.5, 0, 0)T, but the values
throughout are higher, although not significantly so. Although the trans-
formation with three values of 1/3 is therefore also acceptable, that with
the log of y2 and the square root of the other two variables is statistically
FIGURE 4.28. Horse mussels: untransformed data. Forward plots of Mahalanobis
distances. Left-hand panel, minimum distance among units not in the subset and,
right-hand panel, gap plot
FIGURE 4.29. Horse mussels: transformed data, λ = (0.5, 0, 0.5, 0, 0)T. Forward
plots of Mahalanobis distances. Left-hand panel, minimum distance among units
not in the subset and, right-hand panel, gap plot. The effect of the outliers is
now apparent at the end of the search
preferable. Although the three variables are all lengths, there is no reason
for them to be subject to the same transformation, particularly if the mussel
shells change shape as they grow; the shapes of the three distributions
of length would then not be the same.
This is a canonical example of our approach to finding a multivariate
transformation in the potential presence of outliers and influential observa-
tions. We start with a search with untransformed data and use information
from the forward plots of the estimated transformation parameters to sug-
gest a transformation, which we then use in a second forward search, repeat-
ing the analysis until we find an acceptable transformation. In this example
FIGURE 4.30. Horse mussels: forward search with λ = (1/3, 1/3, 1/3, 0, 0)T.
Likelihood ratio test for this value, which is only rejected at the end of the search.
Compare with Figure 4.22
only three searches were needed in all to find a transformation which was
stable for nearly all the search, any changes being at the end where the
outliers entered and the likelihood ratio test became highly significant. We
can use the same procedure for the methods in our next chapters to find
transformations for principal components analysis, discriminant analysis
and cluster analysis. But now we conclude this chapter with some exam-
ples of transformations of data from single populations that show some
special features.
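The search loop underlying this procedure can itself be sketched in a few lines. This is a deliberately minimal version, with assumed simplifications: a crude median-based starting subset instead of the book's robustly chosen one, and no special handling of ties or unit interchange.

```python
import numpy as np

def forward_search_order(Y, m0):
    """Minimal forward search sketch: grow a subset one unit at a time,
    refitting mean and covariance at each step, and record the subset
    size at which each unit first enters."""
    n, v = Y.shape
    # crude start: the m0 units closest to the coordinatewise median
    d0 = np.abs(Y - np.median(Y, axis=0)).sum(axis=1)
    subset = np.argsort(d0)[:m0]
    entry = {int(i): m0 for i in subset}
    for m in range(m0, n):
        mu = Y[subset].mean(axis=0)
        S = np.cov(Y[subset], rowvar=False)
        D = Y - mu
        d2 = np.einsum('ij,jk,ik->i', D, np.linalg.inv(S), D)
        subset = np.argsort(d2)[:m + 1]     # the m+1 closest units
        for i in subset:
            entry.setdefault(int(i), m + 1) # record first entry only
    return entry
```

Planted outliers should then be the last units to enter, reproducing the behaviour exploited throughout this chapter.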
λ = 0.5 for all variables. This hypothesis is rejected for m greater than
almost exactly 200, an improvement on the results for untransformed data,
but one that can be further improved. As a result of the forward plot of
the elements of λ̂ from this search we try
that is, the square root for six variables, the logarithm for two and the
reciprocal square root for one. The forward plots of the parameter estimates
from this search are more stable than those for the search on untransformed
data in Figure 4.31.
4.10 Municipalities in Emilia-Romagna 189
The forward plot of λ̂ is changed from the earlier plot in Figure 4.31. The
upper panel of Figure 4.33 again shows forward plots of six estimates. The
major improvement over the earlier plot is that now virtually all change
245, 264, 261, 239, 246 and 250. These observations cause the estimated
transformation to move steadily from 0 to 1. Such behaviour would be
undetectable by the backwards deletion of outliers. Even if observations
277 and 310 were correctly identified as the pair needing deletion, their
removal would only cause the estimated value of λ1 to move closer to one.
As the right-hand panel of Figure 4.36 shows, many of these observations
correspond to the smallest values of y 1 . The six smallest values at the
bottom of the boxplot are for units 260, 188, 245, 277, 261 and 246.
FIGURE 4.36. Emilia-Romagna data, demographic variables: forward search with
λR2 = (0, 0.25, 0, 0.5, 0.5, 0, 0, 0.5, 0.25)T. Left-hand panel, the last 22 values of
λ̂1; right-hand panel, boxplot of y1: small values of y1 influence λ̂1
so that y16 and y23 should both be cubed. These are surprising transfor-
mations since customarily we find that -1 ≤ λ ≤ 1.
Usually we are transforming variables with a long right-hand tail, which
require values of λ < 1 to give symmetry and approximate normality. But
the histogram of y16 in the left-hand panel of Figure 4.37 shows that this
variable has a long left-hand tail. The variable itself is the percentage of
occupied houses with a fixed heating system. We could, with equal logic,
consider 100 - y16, the percentage of occupied houses without a fixed heating
system. As the right-hand panel of Figure 4.37 shows, this new variable has
a long right-hand tail, of the kind that requires a value of λ less than one
for transformation. The plot of y23 is similarly skewed to the left and we
also work with 100 - y23, the percentage of inhabitants not filing an income
tax return.
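The effect of reflecting a left-skewed percentage is easy to check numerically. The sketch below uses illustrative simulated percentages, not the Emilia-Romagna values.

```python
import numpy as np

def sample_skewness(x):
    """Moment-based sample skewness."""
    z = np.asarray(x, dtype=float)
    z = z - z.mean()
    return (z ** 3).mean() / (z ** 2).mean() ** 1.5

rng = np.random.default_rng(0)
# a percentage with a long left-hand tail, qualitatively like y16
y16_like = np.clip(100.0 - rng.gamma(shape=2.0, scale=8.0, size=500), 0.5, 99.5)
# reflecting gives a long right-hand tail: a candidate for lambda < 1
reflected = 100.0 - y16_like
```

The skewness of the original variable is negative and that of the reflected variable positive, which is exactly why the reflection brings the estimated transformation back into the customary range.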
With these two modified variables, our analysis based on forward plots
of the estimates λ̂j now leads to
causing this value to be rejected. The last six units to enter the search,
working backwards, are 70, 277, 250, 264, 191 and 117. The large jumps
upwards in the value of the statistic are associated with units 277, 191 and
117, which have the three smallest values of y14. The large effects of these
three units are shown in the right-hand panel of the figure.
We now look at a few fan plots of expansions of the signed square root of
the likelihood ratio test for λR2, which are presented in the four panels of
Figure 4.39. The first panel, for y14, shows that our chosen value of zero is
acceptable almost to the end of the search, but that the last six observations
to enter cause this value to be rejected. The effect of these units on the
likelihood ratio test for λR2 has already been illustrated in the right-hand
panel of Figure 4.38. These units also cause 0.5 to become an acceptable
value for the transformation parameter. The panel for y16 shows that either
1/3 or 1/4 is acceptable, but that 0.5 is not. The inferences about the value
of λ16 are unaffected by the outliers so evident in the panel for transforming
y14.
The two remaining panels of Figure 4.39 show that some transformations
are sharply defined, others not. The top right-hand panel for y20 indicates
clearly that -0.5 is the only correct transformation whereas the last panel,
for y22, shows that any value between 0 and 0.5 gives a satisfactory trans-
formation. The plot for y21, which is not given, shows that 0.25 is the only
acceptable value of λ for this response; even the nearby value of 1/3 gives
a test statistic which occasionally wanders out of the acceptance region.
The plot for y21, which we do not give, does not show any effect of the
replacement of 11 zeroes by the minimum observation which was noted in
Table 4.4.
FIGURE 4.39. Emilia-Romagna data, modified wealth variables: fan plots con-
firming λ = (0, 1, 0.25, 1, 1, 0.5, -0.5, 0.25, 0.25, -1)T. Only the panels for y14, y16,
y20 and y22 are shown; in the upper row λ5 = (-1, -0.5, 0, 0.5, 1), in the lower
row λ5 = (0, 0.25, 1/3, 0.5)
The major conclusion from this analysis is that, on replacing y16 and y23
by 100 - y16 and 100 - y23, we have obtained stable transformations within
the customary range of minus one to one. In addition, the finer grid of λ
values is needed for some of this group of responses as well as for some of
the demographic variables. The only evidence for the effect of outliers is on
the transformation of y14 and it is with this that we end our analysis.
The first panel of Figure 4.39 showed that 0.5 was an acceptable value
for λ14 at the end of the search. If we replace the first element of λR2
by this value of 0.5, the forward plot of the likelihood ratio test for this
new hypothesis when it is used as the basis of the search is as given in
Figure 4.40, which is quite different from that in Figure 4.38. Now the
value of the statistic is roughly double what it was during the search with
two peaks on the boundary of significance. There is also a non-significant
increase at the end of the search. This plot shows how the effect of the
outliers can be masked by the choice of just one of the ten elements of λR.
There are several new features to the analysis of these data. One is that,
as Table 4.4 showed, there were 92 zero values for y26 which had to be
replaced by the minimum value. There were also three zeroes in y25. We
need to determine whether these two variables are having an effect on any
inferences drawn from the data.
We start our analysis by replacing these zeroes and performing the cus-
tomary analysis of looking at maximum likelihood estimates of the param-
eters during the forward search, combined with likelihood ratio tests, to
obtain the estimated transformation
is shown in the right-hand panel of Figure 4.42 in which the value of 0 for
λ7 in λR is replaced by -1. This is the value of λ̂7 at the beginning of the
search plotted in Figure 4.41. The effect on the likelihood ratio test is to
remove the peak at m = 200, but to produce a more gradual increase of
the statistic to significance towards the end of the search.
FIGURE 4.44. Emilia-Romagna data, modified work variables: fan plots confirm-
ing λ = (0.25, 0, 2, -1, 0, 0, 1.5, 1, 1)T. Only the panels for the modified variables
y25 and y26 are shown
We now briefly consider four panels of the fan plots for these nine vari-
ables. The expansions are around λR. The first panel in Figure 4.43 is for
y6, showing that neither 0 nor 0.5 are acceptable values for λ6, but that
the finer grid of values is needed. The second panel is for y7, which shows
a trend which is to be expected from the argument in the previous para-
graph. Initially the value of 0 is rejected, although it is acceptable for much
of the search and is preferred towards the end. The plots of the estimates
of the transformation parameters in Figure 4.41 also indicate that λ8 and
λ9 behave in a way which requires further investigation. The bottom left
panel of Figure 4.43 shows that 2 is the best value for λ8. A value of one
is unacceptable only at the end of the search, so it is not certain that this
variable has to be transformed; the evidence may be solely due to the pres-
ence of outliers. The plot for y9, the bottom right panel of Figure 4.43,
shows that -1 is the best value for λ9 until close to the end of the search,
when it is rejected due to a series of upward jumps in the test statistic.
These are caused by the introduction of units 264, 252 and 165, the three
units with the smallest values for y9. This is a measure of unemployment
which is low in these small rural communities, but not zero.
We have not, so far, discussed y25 and y26. These are the two variables in which zeroes had to be replaced. Both are percentages of employees working in factories of a particular size; in the case of y26, in factories with more than fifty employees. It is not surprising that there were 92 zero entries for y26. In analysing the wealth variables we replaced two variables by their values subtracted from 100, in order to obtain variables for which the estimated transformation parameters lay between -1 and one. Here the estimated transformations are 1.5 and 0.5, rather than the values of three we obtained for the two wealth variables. However, if we replace y25 by 100 - y25, with a similar transformation for y26, we obtain variables for which zeroes no longer have to be replaced.
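The reflection just described can be sketched in a few lines. The variable name y26 and the 0-100 percentage scale follow the text; the values and the Box-Cox helper are illustrative stand-ins, not the book's own code.

```python
import numpy as np

def reflect_percentage(y):
    """Replace a percentage variable y by 100 - y, as done for y25 and y26,
    so that zero entries become 100 and no ad hoc replacement of zeros is
    needed before transformation."""
    return 100.0 - np.asarray(y, dtype=float)

def box_cox(y, lam):
    """Plain Box-Cox transform; requires strictly positive data."""
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("Box-Cox needs positive data; reflect the variable first")
    return np.log(y) if lam == 0 else (y ** lam - 1.0) / lam

# y26 has many exact zeros, so it cannot be logged or raised to a negative
# power directly; after reflection every value is strictly positive.
y26 = np.array([0.0, 12.5, 33.0, 80.0])      # illustrative values only
z = box_cox(reflect_percentage(y26), 1.5)    # transform on the reflected scale
```

The point of the reflection is purely to move the mass of exact zeros away from the boundary of the parameter space of the power transformation.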
Once this replacement has been made we obtain 0 and 1.5 for the parameter estimates instead of 1.5 and 0.5, the other values of λR remaining as
200 4. Multivariate Transformations to Normality
they were. Most of the previous plots also remain as they were. For example, the forward plot of the likelihood ratio test for the new λR is similar to the left-hand panel of Figure 4.42, with a significant peak around m = 200. The fan plots for λ25 and λ26 are in Figure 4.44. The left-hand panel of the figure shows that 0 is a satisfactory transformation for y25, which is the value we had already found. However, the right-hand panel of Figure 4.44 shows that, although 1.5 is close to the maximum likelihood estimate of λ26, one is also an acceptable value. It is this value that we use for our further analysis.
TABLE 4.5. Estimated transformations λC1 and λC2 for the Emilia-Romagna variables (λC2 blank where unchanged from λC1)

Variable    λC1      λC2
Demographic variables
y1          0.00     0.5
y2          0.25
y3          0.00
y4          0.50     1
y5          0.50     0.25
y10         0.00
y11         0.00
y12         0.50     0.25
y13         0.25     0.5
Wealth variables
y14         0.00     0.25
y15         1.00     0.5
y16         0.25     0.5
y17         1.00
y18         1.00
y19         0.50
y20        -0.50    -1/3
y21         0.25
y22         0.25
y23        -1.00
Work variables
y6          0.25
y7          0.00
y8          2.00     1
y9         -1.00     0
y24         0.00
y25         0.00     0.25
y26         1.00
y27         1.00
y28         1.00
for λ14. The effect of the entry of outliers is apparent in the figure, for example around m = 280, but they do not affect this choice of value for the parameter. A finer grid is also used in the final panel of Figure 4.45 for y25, again a work variable. From the upper panel of Figure 4.44 it seemed that 0 was a satisfactory value for λ25. However, the panel in Figure 4.45
FIGURE 4.45. Modified Emilia-Romagna data: fan plots confirming λC1 in Table 4.5. Only the panels for y8, y9, y14 and y25 are shown
indicates that 0.25 is a better value. This panel makes the general point that with 341 units, a finer grid of values of λ is necessary; neither 0 nor 0.5 is satisfactory as a value for this transformation.

As a result of the fan plots of Figure 4.45, and those for the remaining 24 variables, we obtain a series of slight adjustments of our vector of estimates that are listed as λC2 in Table 4.5. It is these values that we use for our remaining analyses. That it has taken appreciable effort and several pages to reach this vector of estimates is a reflection of the complexity of trying to find a satisfactory transformation simultaneously for 28 variables. With the univariate transformations described in Atkinson and Riani (2000, Chapter 4) it was possible for the forward search to achieve an ordering of the data in which any outliers influential for the transformation entered at the end of the search. But with multivariate data, the influential observations for one transformation may be different from those for another. The final ordering of the observations will be such that outliers enter at the end, as we shall see from forward plots of Mahalanobis distances. But, as the panels of Figure 4.45 and other plots show, the values of the estimates of the transformation parameters and the test statistics may vary during the search as observations important for transformation of a particular response enter the subset.

The last sixteen observations to enter the forward search are:
277 70 239 245 260 250 310 264 188 133 238 194 252 278 315 327.
4.10 Municipalities in Emilia-Romagna 203
Comparison of these results with those in Table 1.3 for the search on untransformed data shows that the seven most outlying communities are the same in both searches, although, apart from unit 277, Zerba, they enter in a different order. The four communities in Table 1.3 that do not appear are units 2, 30, 88 and 149, those with the largest populations in the table. How many of these sixteen communities can be thought of as outlying can be determined from plots of Mahalanobis distances.
Figure 4.46 is a forward plot of the minimum Mahalanobis distance amongst the observations not in the subset. This clearly shows the remoteness of the last unit, as well as indicating two other groups of outliers. The plot is similar in general form to the same plot for the untransformed data in Figure 1.12, except that it is smoother and the outliers are perhaps more separated at the end of the search. Figure 4.47 is a forward plot of scaled Mahalanobis distances, to be compared with Figure 3.24. In both plots the obvious feature is the effect of Zerba. However, the effect of the transformation has been to make unit 245 appreciably less remote.
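The quantity being monitored can be sketched as follows. This is a minimal full-rank version using plain subset estimates, so the exact values will differ from the book's; the data and subset below are illustrative.

```python
import numpy as np

def min_mahalanobis_outside(Y, subset):
    """Minimum Mahalanobis distance, among the units not in the current
    subset, from the mean and covariance estimated on the subset.
    Plotting this against the subset size m gives a forward plot of the
    kind shown in Figure 4.46."""
    Y = np.asarray(Y, dtype=float)
    S = Y[list(subset)]
    mu = S.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(S, rowvar=False))
    outside = np.setdiff1d(np.arange(len(Y)), list(subset))
    diffs = Y[outside] - mu
    d2 = np.einsum('ij,jk,ik->i', diffs, inv_cov, diffs)
    return np.sqrt(d2.min())

# A remote unit left outside the subset produces a large value,
# signalling a potential outlier about to enter.
rng = np.random.default_rng(0)
Y = rng.standard_normal((30, 2))
Y[29] = [10.0, 10.0]                        # one planted outlier
d = min_mahalanobis_outside(Y, range(29))   # only unit 29 is outside
```

In the forward search this statistic is recomputed at every step, and a sharp rise flags the entry of a group of outliers.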
Figure 4.48 repeats Figure 4.47, but cut to ignore the curve for Zerba. It is now evident that there is a set of five units: 70, 239, 245, 260 and 250, which form a clear group especially visible around m = 300. Perhaps also there is a slight grouping of units 310, 264 and 188 visible around m = 320.

We now consider the location of these units. Figure 4.49 shows the location of the last 21 communities to enter the forward search using transformed data. Comparison with Figure 3.27 shows that, as a result of the
transformation, all except two of the last 21 are poor rural communities in the Apennines. The two that are not are units 310 (Casina) and 70 (Goro), which we discussed at the end of §1.6. The transformation has led to a sharpening of the separation of units; in Figure 3.27, six of the last 16 units to enter were on the plain rather than in the mountains.
for λj = -3 lies near the centre of the plot, that for y3 is near the upper end of the central region; a transformation of -4 for this variable is only just acceptable at the end of the search and is beyond the 1% point of the distribution for some preceding steps. Only for the transformation of y5 is there evidence of change of the score statistic at the end of the search as outliers enter. The conclusion from this figure is again that a common value of -3 is acceptable for all transformation parameters, a conclusion which is hardly affected by any outliers.
We could have started our analysis of transformations of these data by finding maximum likelihood estimates of the individual transformation parameters, as we did, for example, in Figure 4.7 for the babyfood data. For the present data this would require a seven-dimensional optimization at each stage of the forward search. A numerically simpler alternative is to use the score test described in §4.4. The test statistic TSC is defined in (4.23). We might hope that this will have a chi-squared distribution on seven degrees of freedom. However we shall see that this is not the case at all steps of the search.
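The likelihood computations underlying this comparison can be sketched with a one-dimensional grid for a common λ. This is an illustrative stand-in using the normalized Box-Cox transformation and a grid search, not the book's routines, and it computes a likelihood ratio rather than the score statistic TSC.

```python
import numpy as np

def znorm(y, lam):
    """Normalized Box-Cox transformation z(lambda): dividing by the Jacobian
    makes loglikelihoods comparable across values of lambda."""
    y = np.asarray(y, dtype=float)
    gm = np.exp(np.mean(np.log(y)))                 # geometric mean of y
    if lam == 0:
        return gm * np.log(y)
    return (y ** lam - 1.0) / (lam * gm ** (lam - 1.0))

def profile_loglik(Y, lams):
    """Multivariate normal profile loglikelihood -n/2 log|Sigma_hat| of the
    transformed data, one lambda per column of Y."""
    Z = np.column_stack([znorm(Y[:, j], l) for j, l in enumerate(lams)])
    cov = np.cov(Z, rowvar=False, bias=True)
    _, logdet = np.linalg.slogdet(np.atleast_2d(cov))
    return -0.5 * len(Z) * logdet

def lr_common_lambda(Y, lam0, grid):
    """Likelihood ratio statistic for H0: all lambda_j = lam0, maximizing
    over a single common lambda on a grid instead of a full p-dimensional
    numerical search."""
    p = Y.shape[1]
    best = max(profile_loglik(Y, [l] * p) for l in grid)
    return 2.0 * (best - profile_loglik(Y, [lam0] * p))
```

Replacing the p-dimensional optimization by a common-λ grid is exactly the kind of saving that makes repeating the calculation at every step of the search feasible.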
Two versions of this statistic are possible. They differ in the estimated covariance matrix Σ̂(λ̂), which can either be calculated from the residuals of independent regressions or iterated from this starting point using SUR regression. For this example the statistics are close in value throughout the search. Figure 4.52 is a forward plot of the values of the non-iterated test
4.11 National Track Records for Women 207
FIGURE 4.50. Track records: forward plots of likelihood ratio statistics for testing a common transformation for all races. Upper two panels, λ = -1 and -2; lower two panels, λ = -3 and -4
for the hypothesis that all λj = -3. It seems as if the statistic is significant, which we do not necessarily expect since the fan plots in Figure 4.51 supported a common value of minus three for all transformations. But the null distributions of score tests for transformations are not always close to the asymptotic null distributions, here chi-squared. We have therefore added to the plot the results of a simulation of 1,000 test statistics. These show that the distribution has longer upper tails than the chi-squared both at the beginning of the search and at the end. This phenomenon has been analysed for the score test Tp(λ0) for transformations of univariate data by Atkinson and Riani (2002b). The longer-tailed distribution at the beginning of the search in the univariate case is due to the statistic having approximately a t distribution, rather than a normal one. The longer tails at the end of the search, which they call a "trumpet effect", are due to the presence of the observations in the constructed variables on which the response is regressed. A similar logic applies here. The simulations show that the score test is not quite significant at the end of the search and that a common value of -3 is indeed appropriate for all responses.
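The simulation device can be sketched generically: simulate the statistic under the null many times and read off empirical quantiles. The statistic and sample size below are placeholders, not those of the track records example.

```python
import numpy as np

def simulation_envelope(stat_fn, n, nsim=1000, levels=(0.05, 0.99), seed=0):
    """Empirical envelope for a test statistic: compute stat_fn on nsim
    datasets simulated under the null and return the requested quantiles.
    Comparing an observed statistic with this envelope, rather than with
    the asymptotic chi-squared bands, is what rescues the score test in
    Figure 4.52 from a false signal of significance."""
    rng = np.random.default_rng(seed)
    stats = np.array([stat_fn(rng.standard_normal(n)) for _ in range(nsim)])
    return np.quantile(stats, levels)

# For a statistic that really is chi-squared on 7 df under the null,
# the envelope reproduces the asymptotic percentage points.
lo, hi = simulation_envelope(lambda x: float((x ** 2).sum()), n=7, nsim=2000)
```

In the forward setting the same recipe is applied at each subset size m, giving the dotted envelope curves of Figure 4.52.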
We now compare plots of the observations transformed with λ = -3 to those we have already seen for no transformation and for the reciprocal. The forward plot of the scaled Mahalanobis distances in Figure 4.53 shows more regular behaviour than that for the reciprocal transformation in Figure 3.22. The last four observations to enter, from the last, are North
FIGURE 4.51. Track records: fan plots confirming that all elements of λ equal minus three. Only the panels for y2, y3, y5 and y7 are shown
Korea (33), Czechoslovakia (14), Western Samoa (55), and Mauritius (36), the same as for λ = -1 but in a different order. These four observations have clearly large Mahalanobis distances at the end of the search.
The scatterplot matrix of the transformed data with superimposed robust contours, Figure 4.54, is to be compared with Figure 3.20 for the untransformed data and Figure 3.21 for the reciprocals of the data. The improvement in normality from this further transformation is shown both by the increasing ellipticity of the contours and by the proportion of each panel enclosed by the outer contour. The scale of the panels is set to include all the data and the contour should asymptotically include 99% of the data if normality holds.

Finally we look at two plots which focus on the behaviour of the last four units to enter the search. These have been labelled on the scatterplot matrix of Figure 4.55 and highlighted in the parallel coordinate plot of Figure 4.56. In interpreting these plots we are looking at a response which is speed cubed, called z, so that large values are desirable. North Korea (33) is outlying because of its poor performance in the two shortest races compared with relatively good performance in middle distance races. It shows as a bivariate outlier on, for example, the plot of z2 against z3. Czechoslovakia (14) has relatively poor performance in longer races, compared with its world record for the 400m race (z3). It is outlying in the plot of z3 against z5. Western Samoa (55) has a particularly low value of z5, although all its speeds are low. It plots at the bottom left-hand corner of all panels,
4.12 Dyestuff Data 209
FIGURE 4.52. Track records: forward plot of the non-iterated score test that all λj = -3. The dotted lines of the simulation envelope show the value is not significant, even though it seems to be when judged by the continuous lines of the asymptotic χ²₇ distribution
FIGURE 4.53. Track records: forward plot of scaled Mahalanobis distances when all λj = -3. To be compared with Figure 3.22 for the reciprocal transformation
responses - strength, hue and brightness. Box and Draper find that only three of the six explanatory variables have a significant effect on the three responses. Their plots of residuals (p. 123) arguably indicate that y2 should be transformed, but no transformation of either y1 or y3 is suggested by their univariate analyses of each response separately.

We start our multivariate analysis with a forward search on untransformed data. Throughout we use the three-variable model (x1, x4 and x6) used by Box and Draper, the evidence for which is not affected by the transformations we consider. The results of our forward search are in the left-hand panel of Figure 4.57, which shows the evolution of the estimates of the transformation parameters. The values of λ̂1 and λ̂3 oscillate around one, which is also the starting value for λ̂2. However, as the search progresses the estimate of λ2 decreases to around one half. We therefore repeat the search with λ2 = 0.5, obtaining the right-hand panel of Figure 4.57; as before, λ̂1 and λ̂3 oscillate around one. But now λ̂2 fluctuates around a half from the start of the search. We seem to have found a satisfactory transformation and take

λR = (1, 0.5, 1)T.
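A per-response version of this estimation can be sketched as a grid search over the univariate Box-Cox profile loglikelihood. The grid and data below are illustrative, and the book instead monitors these estimates along the forward search rather than computing them once on the full sample.

```python
import numpy as np

def boxcox_loglik(y, lam):
    """Univariate profile loglikelihood of the normalized Box-Cox
    transformation z(lambda)."""
    y = np.asarray(y, dtype=float)
    gm = np.exp(np.mean(np.log(y)))
    if abs(lam) < 1e-12:
        z = gm * np.log(y)
    else:
        z = (y ** lam - 1.0) / (lam * gm ** (lam - 1.0))
    return -0.5 * len(y) * np.log(z.var())

def mle_per_response(Y, grid):
    """Grid maximum likelihood estimate of lambda for each response
    separately, one estimate per column of Y."""
    return [max(grid, key=lambda l: boxcox_loglik(Y[:, j], l))
            for j in range(Y.shape[1])]

grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
rng = np.random.default_rng(3)
Y = np.column_stack([np.exp(rng.standard_normal(300)),
                     np.exp(2.0 * rng.standard_normal(300))])
est = mle_per_response(Y, grid)   # both columns are lognormal
```

Running this on growing subsets ordered by the search would trace out curves like those in the panels of Figure 4.57.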
We have already seen a wide variety of forward plots in the analysis
of transformations of the earlier examples in this chapter, so do not give
further examples here. For example, the forward plot of the likelihood ratio
FIGURE 4.54. Transformed track records with all λj = -3: scatterplot matrix with asymptotic 50% and 99% spline curves. The contours are more elliptical than those shown in Figure 3.21 for the reciprocally transformed data
test for λR lies well within the 95% point of the chi-squared distribution on three degrees of freedom throughout the search: there are no outliers, although there is a peak around m = 35, indicating a compromise in the ordering of the units between the three responses. The forward plots of minimum and maximum Mahalanobis distances likewise increase steadily, with no particular peaks to suggest any omitted structure.

One plot we do give is of profile loglikelihoods, together with asymptotic 95% confidence intervals, for the three parameters, when m = 54. The left and right-hand panels of Figure 4.58 indicate that the values of one for λ1 and λ3 are less precisely determined than the value of 0.5 for λ2 in the central panel. Similar information comes from the three panels of fan plots in Figure 4.59, where only 0.5 is acceptable for λ2 in the central panel, whereas several values are acceptable for the other two parameters.
FIGURE 4.55. Transformed track records with all λj = -3: scatterplot matrix with the last four units to enter highlighted and labelled; 33 - DNK, 14 - CZ, 55 - WS, 36 - MA

FIGURE 4.56. Transformed and standardized track records with all λj = -3: parallel coordinate plot with the last four units to enter labelled; 33 - DNK, 14 - CZ, 55 - WS, 36 - MA
the choice between these models can be made in the knowledge that the
correctly transformed data do not contain any outlying observations. In
this way our analysis sharpens the information that comes from the residual
plots when m = n shown by Box and Draper (1987, p. 123).
FIGURE 4.58. Dyestuff data: profile loglikelihoods for the three transformation parameters when m = 54. Search with λR = (1, 0.5, 1)T. The value of λ2 is well determined
FIGURE 4.59. Dyestuff data: fan plots for individual transformations confirming λR = (1, 0.5, 1)T
FIGURE 4.60. Dyestuff data: fan plots for an overall transformation. Only 0.5 is admissible
FIGURE 4.61. Logged babyfood data: forward plots of t statistics for the terms of the linear model. Left-hand panel y1, right-hand panel y2. The values of the statistics shrink as the estimate of σ² increases with m
of Figure 4.61 illustrate the point. The overall impression is that the t values start large and decrease. The left-hand panel of the plot, for y1, shows that, at the end of the search, the significant variables, in order, are x3, x5, x3x4 and x2. For y2 the right-hand panel shows that the order of variables changes slightly to x3, x5, x2 and then the interaction x3x4. A
FIGURE 4.62. Logged babyfood data: forward plot of added variable t statistics for the terms of the linear model for y1. Six separate forward searches were needed to produce this plot
puzzling feature of these plots is the behaviour of the statistic for x2: for y1 it starts large and positive, but finishes significantly negative, whereas, for y2, it is negative throughout. It is hard to obtain much information on the effect of individual observations on the values of this and other t statistics.

We now contrast these plots with those produced by the use of added variables when we delete each x in turn. The method was described in §2.9. The plot for y1 is in Figure 4.62. The significant variables in these plots now, as we would expect, have t statistics which diverge steadily from zero as the sample size increases. Of course, the curve for each t statistic finishes with the same value as in Figure 4.61. But now we can see that only the last few observations make x2 and the x3x4 interaction significant; there is nothing else noteworthy about the behaviour of x2, which initially is non-significant. That the curve for x3 starts late in the plot is a reminder that we have a separate search for each added variable. The design points chosen by the search using the other variables were singular when x3 was included until m = 18.
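The added variable computation itself is standard and can be sketched as follows; it reproduces the full-model t statistic for the candidate term, although the forward version recomputes it on each subset along a separate search.

```python
import numpy as np

def _residuals(a, X):
    """Residuals from least squares regression of a on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, a, rcond=None)
    return a - X @ coef

def added_variable_t(y, X, w):
    """Added variable t statistic for candidate column w given the model
    matrix X (including the intercept): regress both y and w on X, then
    test the slope of the y-residuals on the w-residuals.  This equals
    the usual t statistic for w's coefficient in the full model [X, w]."""
    y, w = np.asarray(y, dtype=float), np.asarray(w, dtype=float)
    ey, ew = _residuals(y, X), _residuals(w, X)
    gamma = (ew @ ey) / (ew @ ew)            # added variable slope
    resid = ey - gamma * ew                  # full-model residuals
    df = len(y) - X.shape[1] - 1             # full-model residual df
    s2 = (resid @ resid) / df
    return gamma / np.sqrt(s2 / (ew @ ew))
```

Because y enters only through its residuals from X, monitoring this statistic along a search ordered without w avoids the shrinking-t artefact of Figure 4.61.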
Figure 4.63 for y2 is similar, although now the significance of both x4 and the x3x4 interaction changes towards the end of the search. The absence of sharp jumps in these plots shows that no individual observation is having a large effect on the significance of the variables. Forward plots of residuals for the individual variables provide a useful supplement to the Mahalanobis distances.
4.13 Babyfood Data and Variable Selection 217
FIGURE 4.63. Logged babyfood data: forward plot of added variable t statistics for the terms of the linear model for y2
FIGURE 4.64. Logged babyfood data: forward plots of added variable t statistics for the terms of the 16-term linear model including all two factor interactions. Left-hand panel y1, right-hand panel y2. Few of the variables are significant
The forward plots of the added variable t tests for this example show that there is no evidence of masking or other difficulties in model selection. Our recommendation for a model would be to drop x1 from the model used in this section, since it is not significant, but to keep x4 because of its presence in the x3x4 interaction. A more complicated example in which the forward added variable plots reveal masking by outliers of the importance of explanatory variables is given by Atkinson and Riani (2002a).
TABLE 4.6. Swiss bank notes: minimum, maximum and ratio of maximum to minimum value for each variable and each group of notes

Genuine notes
      Minimum   Maximum   Ratio
y1    213.8     215.9     1.010
y2    129.0     131.0     1.016
y3    129.0     131.1     1.016
y4      7.2      10.4     1.444
y5      7.7      11.7     1.520
y6    139.6     142.4     1.020

Forged notes
      Minimum   Maximum   Ratio
y1    213.9     216.3     1.011
y2    129.6     130.8     1.009
y3    129.3     131.1     1.014
y4      7.4      12.7     1.716
y5      9.1      12.3     1.352
y6    137.8     140.6     1.020
4.15 Exercises
Exercise 4.1 Derive the expression for w(λ), equation (4.17).
Exercise 4.2 Swiss heads. Do you expect that units 104 and 111 have a stronger or weaker effect on the univariate forward test for transformation of variable y4 on its own than on the multivariate test? Discuss the power of univariate or multivariate tests in the presence of univariate or multivariate outliers.
Exercise 4.3 Swiss bank notes. a) Table 4.6 gives the minimum, maximum and ratio of the maximum to minimum for each variable and each group of notes. A priori, what can you say about the need for transforming the data? b) Does the group of genuine notes (observations 1-100) need transformation? Try a common transformation for all variables. c) Would you expect the evidence for transformation to increase or decrease when the two groups of notes are considered together? What do you expect from the plot monitoring the likelihood ratio test for the common transformation λ = 1 using: I) an unconstrained search; II) a search which starts in the group of genuine notes, and III) a search which starts in the group of forged notes?
Exercise 4.4 The analysis of the data on national track records for women in §4.11 showed that evidence for the common transformation λ = -3 is spread throughout the data. Describe what you think will be the shape of the
Exercise 4.5 The last four observations to enter the forward search for the transformed data on national track records for women are, from the last, North Korea (33), Czechoslovakia (14), Western Samoa (55) and Mauritius (36). Their profiles are shown in the parallel coordinate plot, Figure 4.56. What effect do you expect the introduction of these four units to have on the forward plots of (a) the maximum Mahalanobis distance among those in the subset and (b) the minimum distance among those not in the subset? You should consider the possibility of masking.
Exercise 4.6 Emilia-Romagna data. Figure 4.32 shows the profile loglikelihoods at m = 331 for the 9 demographic variables. In this step the maximum likelihood estimate of the transformation parameter for variables y10 and y11 is very close to zero. From this figure, what are your expectations about the fan plots of the expansion of the signed square root of the likelihood ratio test for λR2 for variables y10 and y11 around λ = (-0.5, 0, 0.25, 0.5)T?
Exercise 4.7 Dyestuff data. Using the library of routines for the forward search mentioned in the Preface, plot the fan plots for the three responses separately. What are your conclusions?
The left and right-hand panels of Figure 4.58 indicated that the values of one for λ1 and λ3 were less precisely determined than the value of 0.5 for λ2. What do you expect from the likelihood ratio tests for the null hypothesis λ = (1, 1, 1)T against the unrestricted alternative (λ1, λ2, λ3)T and the restricted alternative λ = (1, λ2, 1)T?
4.16 Solutions

Exercise 4.1

Exercise 4.2
Observations 104 and 111 are basically univariate outliers, not multivariate
atypical observations. We expect that marginal univariate tests are more
powerful when there are univariate outliers. Figure 4.65 shows the plot of
the signed square root likelihood ratio test when y4 is considered on its own. If we compare this figure with Figure 4.16 we see that the jump caused by the inclusion of units 104 and 111 is much stronger in Figure 4.65 than in Figure 4.16. As we expected, univariate tests are more appropriate in the presence of univariate outliers.
Exercise 4.3
a) Table 4.6 shows that for y1, y2, y3 and y6 the values of the ratio in both groups are very close to one. This implies that we expect to see very flat profile likelihoods for the transformation parameters and that these variables will not need any transformation. Also, because of the flatness in the likelihood, we expect that while the most remote units may change the maximum likelihood estimate of the transformation parameter, they will not cause significant alterations in the value of the likelihood ratio test. For variables y4 and y5, the ratio is around 1.5, which is still small, so we also expect that these variables will not need transformation.
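The diagnostic in a) can be sketched directly from the definition used in Table 4.6; the numbers below are the y1 and y4 rows of the genuine-notes half of the table.

```python
import numpy as np

def spread_ratios(Y):
    """Ratio of the maximum to the minimum of each column.  A ratio close
    to one means the Box-Cox loglikelihood is nearly flat, so no choice
    of lambda can make much difference to that variable."""
    Y = np.asarray(Y, dtype=float)
    return Y.max(axis=0) / Y.min(axis=0)

# Minimum and maximum of y1 and y4 for the genuine notes (Table 4.6).
Y = np.array([[213.8, 7.2],
              [215.9, 10.4]])
r = spread_ratios(Y)   # approximately [1.010, 1.444]
```

The power transformation y**λ is close to linear over an interval where max/min is near one, which is why such variables are effectively untransformable.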
b) Figure 4.66 is a forward plot of λ̂ when testing the common transformation λ = 1 for the group of genuine notes. It shows values near one for the last 20 steps of the search. Figure 4.67, the forward plot of the likelihood ratio test statistic for the hypothesis H0 : λ = 1, has small values when compared with χ²₁, confirming that one is an acceptable value. But a wide range of values is also acceptable for λ. The two panels of Figure 4.68 are twice the profile loglikelihoods for λ at m = 90 and m = 100. These
FIGURE 4.66. Genuine Swiss bank notes: forward plot of the common maximum likelihood estimate of λ when testing λ = 1
FIGURE 4.67. Genuine Swiss bank notes: forward plot of the likelihood ratio test statistic for the hypothesis H0 : λ = 1, to be compared with χ²₁. The figure shows that one is an acceptable value
FIGURE 4.68. Genuine Swiss bank notes: profile loglikelihood for the transformation parameter when testing λ = 1 when m = n - 10 = 90 (left panel) and m = n = 100 (right panel). The outer vertical lines define the 95% confidence interval. The central vertical line gives the maximum likelihood estimate of λ
virtually parabolic curves are, from Figure 4.66, centred near one. But the 95% confidence intervals, which are the points where the curves have decreased by 3.84/2, are (-0.29, 1.96) at m = 90 and (0.24, 1.88) at m = 100. They cover a wide range of values of λ. Not only is the transformation not well defined, but there is no evidence of any effect of the last observations on this inference. So the outliers detected in §3.4 do not affect this transformation and there is no reason not to analyse the data in the original scale.
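The interval construction used here, a drop of 3.84/2 from the maximum of the profile loglikelihood, can be sketched on a grid. This is a generic univariate version with illustrative data, not the book's common-λ computation for the six bank note variables.

```python
import numpy as np

def lambda_confint(y, grid):
    """Approximate 95% interval for lambda: the grid values whose profile
    loglikelihood lies within 3.84/2 (half the 95% point of chi-squared
    on 1 df) of the maximum."""
    y = np.asarray(y, dtype=float)
    gm = np.exp(np.mean(np.log(y)))
    def loglik(lam):
        if abs(lam) < 1e-12:
            z = gm * np.log(y)
        else:
            z = (y ** lam - 1.0) / (lam * gm ** (lam - 1.0))
        return -0.5 * len(y) * np.log(z.var())
    grid = np.asarray(grid, dtype=float)
    vals = np.array([loglik(l) for l in grid])
    keep = grid[vals >= vals.max() - 3.84 / 2.0]
    return float(keep.min()), float(keep.max())

rng = np.random.default_rng(5)
y = np.exp(0.5 * rng.standard_normal(100))      # true lambda is 0
lo, hi = lambda_confint(y, np.arange(-2.0, 2.25, 0.25))
```

A wide interval, like the ones quoted above, is exactly what a nearly parabolic but shallow profile curve produces.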
c) The evidence for transformation will increase when the two groups of notes are considered jointly, because the values of the ratio of the maximum to the minimum for each variable will generally increase (see Table 4.6). Now consider the plot monitoring the likelihood ratio test for the common transformation λ = 1. If we start our search with units coming from both groups (unconstrained search) we expect many fluctuations throughout the search. If we start in the group of genuine notes or in the group of forgeries, we expect to see a jump in the values of the likelihood ratio after step m = 100. Finally, irrespective of the starting point used in the search, we expect that the final part of the plot will be the same. Since the ratios of maximum to minimum are not high for each variable, we do not expect significant values of the likelihood ratio test in the final part of the search. Figure 4.69, which gives the likelihood ratio test for the hypothesis of the common transformation (λ = 1) using an unconstrained search (left panel), a search starting in the group of genuine notes (centre panel) and a search starting in the group of forgeries (right panel), shows all these aspects.
FIGURE 4.69. Swiss bank notes: forward likelihood ratio test for the hypothesis of a common transformation λ = 1. The left panel gives an unconstrained search starting with units belonging to both groups. The centre panel gives a search starting with the first 20 genuine notes and the right panel a search starting with the first 20 forged notes
FIGURE 4.70. Track records: forward plots of maximum likelihood estimates of λ when testing λ = -1 (left panel) and λ = -3 (right panel)
Exercise 4.4
(a) When the search uses reciprocal data, although the evidence for the common transformation λ = -3 is spread throughout the data, the forward plot of the maximum likelihood estimate of λ will lie around -1 in the central part of the search and then trend downwards steadily towards -3. In a search using a value of λ close to the true value we expect to see fluctuations around the true value throughout the search.
The two searches for λ = -1 and λ = -3 are given in Figure 4.70. The left-hand panel, in agreement with our expectation, shows the estimate declining steadily to -3. Initially those observations are included which most closely agree with the reciprocal transformation, although even then the highest value is a little less than -1. The right-hand panel, as expected, shows fluctuations around -3 throughout.
FIGURE 4.71. Transformed track record data: forward plots of maximum (left panel) and minimum (right panel) Mahalanobis distances. There is no indication of masking at the end of the search. This implies that the profiles of the y values for the 4 units which enter the search in the last four steps should be dissimilar
Exercise 4.5
Figure 4.56 showed that the four profiles are dissimilar and so we do not expect any masking. In other words, we expect that the plots of minimum and maximum Mahalanobis distances of the transformed data will show a constant increase rather than a peak and a sudden decrease due to masking in the final four steps. The two panels of Figure 4.71, which show the monitoring of maximum and minimum distances, confirm our expectations.
Exercise 4.6
Figure 4.32 shows that the profile likelihood surface for y10 is sharply peaked around 0 and the value 0.25 seems to be outside the confidence interval, so we expect that in the expansion 0 is the only acceptable value. The profile likelihood for y11 is much less sharply peaked and the confidence interval covers 0.25, so we expect in the expansion to see that both 0 and 0.25 are equally plausible throughout the search. Figure 4.72 confirms our expectations.
Exercise 4. 7
The three resulting fan plots of the score test are gathered together in Figure 4.73. For the first response y1, strength, there is a jump in four out of five score statistics at the end of the search, due to the inclusion of observation 1, the smallest observation. The effect is largest on the reciprocal transformation and negligible on the acceptable λ values of 1 and 0.5. The conclusion is that y1 does not require transformation. For y2, hue, the structure of the plot is similar, except that the square root transformation is indicated. The large increases in the statistics for λ = 0, -0.5 and -1
4.16 Solutions 227
at the end of the search are caused by inclusion of the two smallest observations. Only λ = 0.5 is acceptable throughout the search. The plots for brightness, y3, are devoid of sudden jumps, all observations indicating no need for transformation.
Given that all evidence for transformation seems to be due to y2, we expect the forward curve associated with the likelihood ratio test for the null hypothesis λ = (1, 1, 1)^T against the unrestricted alternative (λ1, λ2, λ3)^T to be very close to the curve which has the restricted alternative λ = (1, λ2, 1)^T.
Figure 4.74 shows forward plots of the two likelihood ratio tests from searches on untransformed data. The upper curve is the likelihood ratio for testing λ = (1, 1, 1), against an unrestricted alternative. The plot also shows the 95% and 99% points of the asymptotic chi-squared distribution on three degrees of freedom: the hypothesis of no transformation is clearly rejected. Since the search is on untransformed data, the initial part of the search includes observations which support the null hypothesis. The lower curve is again for testing the hypothesis of no transformation, but with the alternative λ = (1, λ2, 1), so that only transformation of y2 is considered. The two tests are virtually indistinguishable, showing that all the evidence for transformation is in y2. The general shape of the curves shows that this conclusion does not depend on one or a few observations.
FIGURE 4.73. Dyestuff data. Fan plots of score statistics Tp(λ0) for marginal power transformation of each response. Top panel y1, bottom panel y3. Individual searches for each λ. Only y2 needs transforming
FIGURE 4.74. Dyestuff data. Forward plots of likelihood ratio tests for the null hypothesis λ = (1, 1, 1) against (continuous line) the unrestricted alternative (λ1, λ2, λ3) and (dotted line) the restricted alternative λ = (1, λ2, 1). The horizontal lines are the 95% and the 99% points of the chi-squared distribution on three degrees of freedom. All evidence for a transformation is provided by y2
5
Principal Components Analysis
5.1 Background
Principal components analysis is a way of reducing the number of variables
in the model. It may be that some of the variables are highly correlated
with each other, so that not all are needed for a description of the subject
of study; perhaps a few linear combinations of the variables would suffice.
Other variables may be unrelated to any features of interest. The data on
communities in Emilia-Romagna offer many such possibilities. In Chapter 4 we arbitrarily divided the variables into three groups. But do we need all nine demographic variables in order to describe the variation in the communities, or would a few variables suffice, or a few combinations of variables? Then the other variables would be contributing nothing but noise to the measurements.
If we are dealing with normally distributed random variables, any linear combinations that we take will themselves be normally distributed. One consequence of the transformations to normality of the previous chapter is that, once the data have been transformed to approximate normality, we can use the methods of principal components analysis to reduce the dimension of the problem, if this is possible.
Principal components analysis has also sometimes been suggested as a method of outlier detection. However, if there are outliers or unidentified subsets, these may influence the estimation of the principal components, which are functions solely of the estimates of the mean and covariance matrix of the data. What is needed for outlier detection is a form of principal components analysis that is robust to the presence of such observations.
The centred data matrix can be written

    Ỹ = Y - Ȳ = Y - JJ^T Y/n = (I - H)Y,    (5.1)-(5.4)

where H = JJ^T/n and J is the n × 1 vector of ones. The spectral representation of the estimated covariance matrix is written as

    Σ̂_u = G L G^T,    (5.5)

where G contains the eigenvectors and L the eigenvalues. The more principal components are included, the greater the percentage of total variation in the data that is explained.
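The quantities above can be sketched numerically. The example below is our own illustration in Python with simulated data (not from the book): it centres a data matrix with H = JJ^T/n, forms the estimated covariance matrix, takes its spectral decomposition and reports the percentage of total variation explained by each component.

```python
import numpy as np

rng = np.random.default_rng(0)
n, v = 50, 4
Y = rng.normal(size=(n, v)) @ rng.normal(size=(v, v))  # correlated columns

J = np.ones((n, 1))
H = J @ J.T / n                      # projection onto the column of ones
Yc = (np.eye(n) - H) @ Y             # centred matrix (I - H)Y

Sigma_u = Yc.T @ Yc / (n - 1)        # estimated covariance matrix
l, G = np.linalg.eigh(Sigma_u)       # spectral decomposition (ascending)
l, G = l[::-1], G[:, ::-1]           # reorder so that l1 >= l2 >= ...

# Percentage of total variation explained by each component
pct = 100 * l / l.sum()
print(np.round(pct, 2))
```

The eigenvalues sum to the trace of the covariance matrix, so the percentages sum to 100, in agreement with (5.9) below.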
There is usually also no obvious hypothesis that can be tested about the values of the population eigenvalues λ1, ..., λv. In particular, to test that λv = 0 is to test that the data lie in a v - 1 dimensional subspace, which is a rather special structure. If the principal components are effectively
explaining the general structure of relationships between the variables, the
fitted multivariate normal distribution will have contours that are sensibly
ellipsoidal. For those variables which are independent, but with differing
variances, the principal components will be the variables themselves, maybe
with small contributions from other variables arising from sampling error.
But, if the independent variables have similar variances, perhaps due to
scaling, the contours will be roughly spherical and the components will
be poorly defined, with each variable explaining a similar amount of the
total variation. Under such conditions, the principal components are not
achieving any simplification in the structure of the data. It has therefore
been suggested (for example by Mardia, Kent, and Bibby 1979, p. 235 and
by Flury 1997, p. 622) to test for sphericity, that is for equality of the last k
eigenvalues. Even if the test is not significant, so that the last k eigenvalues
can all be taken as being the same, this does not mean that the variables
concerned can automatically be dropped from the analysis. Failure to reject
the hypothesis of equality only indicates that no further structure can be extracted by the use of principal components.
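The sphericity test mentioned above can be sketched as follows. We use one common large-sample form of the statistic for equality of the last k eigenvalues (the exact multiplying constants vary between texts, so this is an illustration only), with invented eigenvalues.

```python
import math

def sphericity_stat(eigvals, k, n):
    """Test that the smallest k eigenvalues are equal.

    One common large-sample form: W = n * k * log(arithmetic mean /
    geometric mean) of the last k eigenvalues, referred to chi-squared
    on (k + 2)(k - 1)/2 degrees of freedom. Constants vary between texts.
    """
    tail = sorted(eigvals)[:k]                 # smallest k eigenvalues
    arith = sum(tail) / k
    geom = math.exp(sum(math.log(l) for l in tail) / k)
    return n * k * math.log(arith / geom), (k + 2) * (k - 1) // 2

W, df = sphericity_stat([5.0, 2.0, 0.31, 0.30, 0.29], k=3, n=100)
print(round(W, 3), df)   # small W: little evidence against equality
```

With nearly equal trailing eigenvalues the arithmetic and geometric means almost coincide, so W is close to zero and the hypothesis of sphericity is not rejected.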
The sum of the eigenvalues equals the trace of the estimated covariance matrix:

    l1 + l2 + ... + lv = tr Σ̂_u.    (5.9)
(5.10)
Some numerical values are given in Table 5.1. Our hope is that the variance explained by the first few components will be greater than those in the table.
(5.13)
5.3 Monitoring the Forward Search 235
TABLE 5.1. Halving rule for percentage of variance explained by successive principal components

    Component                    Dimension of Data v
    Number r        2       3       4       5       6       7       8
    1           66.67   57.14   53.33   51.61   50.79   50.39   50.20
    2           33.33   28.57   26.67   25.81   25.40   25.20   25.10
    3                   14.29   13.33   12.90   12.70   12.60   12.55
    4                            6.67    6.45    6.35    6.30    6.27
    5                                    3.23    3.17    3.15    3.14
    6                                            1.59    1.57    1.57
    7                                                    0.79    0.78
    8                                                            0.39
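Table 5.1 is reproduced by normalizing the halving pattern, under which component r carries weight 2^(-r), over the v components, giving 100·2^(-r)/(1 - 2^(-v)) percent; this closed form is our reading of the table rather than a formula stated in the text.

```python
def halving_pct(r, v):
    """Percentage for component r when successive eigenvalues halve."""
    return 100 * 2.0**(-r) / (1 - 2.0**(-v))

# Reproduce the rows of Table 5.1
for v in range(2, 9):
    row = [round(halving_pct(r, v), 2) for r in range(1, v + 1)]
    print(v, row)
```

For example, with v = 2 the two components explain 66.67% and 33.33%, matching the first column of the table.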
the score for the ith observation on the jth component. If the original
data are normally distributed, the transformed data and the scores of the
observations on each principal component will also be normally distributed.
Forward plots of the scores for units once they have joined the subset are
informative about the presence and effect of clusters and outliers.
(5.16)
estimated as
(5.17)
The singular value decomposition of the centred matrix is

    Ỹ = √(n-1) U L^{1/2} G^T,    (5.18)
where r is the rank of the matrix Ỹ, L^{1/2} = diag(l1^{1/2}, ..., lr^{1/2}) is the diagonal matrix containing the square roots of the r non-zero eigenvalues (in non-increasing order) of the matrices (Ỹ^T Ỹ)/(n - 1) = Σ̂_u or (Ỹ Ỹ^T)/(n - 1). U = (u1, ..., ur) is the orthonormal matrix containing the r normalized eigenvectors of Ỹ Ỹ^T (U^T U = I_r). G = (g1, ..., gr) is the orthonormal
5.4 The Biplot and the Singular Value Decomposition 237
matrix containing the r normalized eigenvectors of Ỹ^T Ỹ. The best rank two approximation of Ỹ is

    Ỹ ≈ Σ_{i=1}^{2} √(n-1) li^{1/2} ui gi^T
      = √(n-1) (u1 u2) diag(l1^{1/2}, l2^{1/2}) (g1 g2)^T
      = √(n-1) U_(2) L_(2)^{1/2} G_(2)^T.    (5.20)
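The relation in (5.18) can be checked numerically against the spectral decomposition of Σ̂_u. The sketch below uses simulated data and assumes the form Ỹ = √(n-1) U L^{1/2} G^T suggested by the surrounding text.

```python
import numpy as np

rng = np.random.default_rng(1)
n, v = 40, 5
Y = rng.normal(size=(n, v))
Yc = Y - Y.mean(axis=0)                     # centred matrix

# Thin SVD: Yc = U diag(d) Vt, so that l_i = d_i^2 / (n - 1)
U, d, Vt = np.linalg.svd(Yc, full_matrices=False)
L_half = np.diag(d) / np.sqrt(n - 1)        # L^{1/2}, non-increasing order

# Yc = sqrt(n-1) * U L^{1/2} G^T with G = Vt.T
recon = np.sqrt(n - 1) * U @ L_half @ Vt
print(np.allclose(recon, Yc))

# The squared singular values over (n-1) are the eigenvalues of Sigma_u
Sigma_u = Yc.T @ Yc / (n - 1)
eigvals = np.sort(np.linalg.eigvalsh(Sigma_u))[::-1]
print(np.allclose(eigvals, (d**2) / (n - 1)))
```

Truncating U, L and G to their first two columns gives the rank two approximation of (5.20).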
(5.21)
where usually 0 ≤ a ≤ 1 and 0 ≤ α ≤ 1. The biplot consists of plotting the n + v two-dimensional vectors which form the rows of A and B. In the
biplot the ith row (i = 1, ..., n) of the n × 2 matrix

    A = (a1, a2, ..., an)^T

is represented by a point. The jth row (j = 1, ..., v) of the v × 2 matrix

    B = (b1, b2, ..., bv)^T
is represented as an arrow from the origin to the point with coordinates b_j = (b_{j1}, b_{j2})^T. While every row of the n × 2 matrix A is associated with a row of Ỹ, every row of the v × 2 matrix B is associated with a column of Ỹ. If the scale of the two sets of coordinates is not compatible we can introduce an arbitrary multiplier which adjusts all the variables by the same amount, or we can use two scales.
The most popular choice is a = 0 and α = 1. In this case A = √(n-1) U_(2) = Ỹ G_(2) L_(2)^{-1/2} contains the first two principal components scaled to have unit variance. It is easy to show that in this case the Euclidean distances between the points in the biplot (rows of the matrix A) are the best rank two approximations of the Mahalanobis distances between the corresponding rows of the centred matrix Ỹ (Exercise 5.13). On the other hand, the v arrows associated with the v rows of B = G_(2) L_(2)^{1/2} will provide the best two dimensional approximation of the elements of the covariance matrix Σ̂_u. In other words, the lengths b_j^T b_j of the vectors b_j (element j,j of BB^T) are the best rank two approximations of the variances of the variables (s_j^2) and, similarly, the cosines of the angles between the b_j represent correlations between the variables. Finally, if principal components analysis is applied to standardized variables, the length of the jth arrow is equal to the percentage of variance of the jth variable explained by the first two principal components (Exercise 5.13).
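The distance property of the choice a = 0, α = 1 can be verified directly. The sketch below (our own illustration on simulated data) keeps all components, for which Euclidean and Mahalanobis distances agree exactly; restricting A and B to two columns gives the rank two approximation used in the biplot.

```python
import numpy as np

rng = np.random.default_rng(2)
n, v = 30, 3
Yc = rng.normal(size=(n, v))
Yc -= Yc.mean(axis=0)                       # centred matrix

U, dvals, Vt = np.linalg.svd(Yc, full_matrices=False)
A = np.sqrt(n - 1) * U                      # scores scaled to unit variance
B = Vt.T @ np.diag(dvals / np.sqrt(n - 1))  # arrows G L^{1/2}

Sigma_inv = np.linalg.inv(Yc.T @ Yc / (n - 1))

# Mahalanobis vs Euclidean distance for an arbitrary pair of units
i, j = 0, 1
diff = Yc[i] - Yc[j]
d_mahal = np.sqrt(diff @ Sigma_inv @ diff)
d_eucl = np.linalg.norm(A[i] - A[j])
print(np.isclose(d_mahal, d_eucl))

# B B^T reproduces the covariance matrix when all components are kept
print(np.allclose(B @ B.T, Yc.T @ Yc / (n - 1)))
```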
When a = 0 and α = 0 the approximation becomes

    A = √(n-1) U_(2) L_(2)^{1/2} = Ỹ G_(2),    B^T = G_(2)^T.    (5.23)

In this last case the ith row of A simply contains the principal component scores for unit i and the jth row of B contains the elements of the first two eigenvectors for the jth variable (g_{j1} and g_{j2}). With this choice of a and α the properties of the rows of the matrices A and B separately will change. In this case the distance between two points in the biplot (rows of matrix
5.5 Swiss Heads 239
(5.24)
Since the length of the projection of a vector a onto a vector b is given by a^T b/||b|| (see Exercise 5.2), it follows that y_ij is represented as the length of the projection of a_i onto b_j, multiplied by the scalar ||b_j||/√(n-1), which does not depend on i. If the vectors a_i and b_j are nearly orthogonal the value of y_ij will be approximately zero. Conversely, observations for which y_ij is very far from zero will have a_i lying in a similar direction to b_j. The relative positions of the points defined by a_i and b_j will therefore give information about which observations take large, average and small values on each variable.
The biplot works well if the percentage of variance explained by the first two principal components is high and the data do not contain outliers. However, if influential observations are present they may influence the correlations between variables. The biplot will then give misleading information and will lead to wrong inferences. As we see in §5.6, it is highly informative to draw the biplot in selected steps of the forward search. In this way we can easily monitor how the angles and the relative lengths of the arrows are modified when a group of outliers is introduced into the subset.
FIGURE 5.1. Swiss heads: forward plot of the percentage of variance explained
by the six principal components: the first two components explain only around
63% of the total variation
standardised variables. This plot establishes two things. The first is that the first principal component explains only 42.7% of the total variation in the data at the end of the search. The next component explains a further 20.4%, making just over 63% in all. The remaining four components are similar in the amount they explain, around 10% each. Two components are needed to give a not very complete representation of the structure. The second point is that the percentages shown in the plot are stable, unaffected by any outliers. In particular, units 104 and 111, entering at the end of the search, have little effect on the percentages explained.
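The kind of monitoring shown in Figure 5.1 can be caricatured in a few lines. This is only an illustration: units are ordered once by Mahalanobis distance from the full-sample mean, whereas the real forward search refits the subset and reorders the units at every step.

```python
import numpy as np

rng = np.random.default_rng(3)
n, v = 100, 4
Y = rng.normal(size=(n, v))
Y[:3] += 6                         # a few artificial outliers

# Crude ordering by Mahalanobis distance from the overall mean
Yc = Y - Y.mean(axis=0)
S_inv = np.linalg.inv(np.cov(Y, rowvar=False))
d2 = np.einsum('ij,jk,ik->i', Yc, S_inv, Yc)
order = np.argsort(d2)

# Percentage explained by each component as the subset grows
for m in (50, 80, 100):
    sub = Y[order[:m]]
    l = np.sort(np.linalg.eigvalsh(np.cov(sub, rowvar=False)))[::-1]
    print(m, np.round(100 * l / l.sum(), 1))
```

Stable percentages across m, as in the Swiss heads data, indicate that the structure is not driven by the units entering last.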
As well as the percentage of variation explained by the principal compo-
nents, a global property, it is interesting to see how the scores of individual
units change during the search. Figure 5.2 gives the scores for the first
component for units included in the subset. These seem like a sample from
a normal distribution, as they should if the data are normally distributed,
and are stable as the search progresses. The same is true for Figure 5.3
which shows the scores for the second component. The unit with the most
positive score on the first component is 159, which has an outlying value of y1 in the boxplot of Figure 3.9. Otherwise extreme values in the original variables do not seem extreme on the principal components.
This example does not show the forward search leading to new discoveries
about the structure of the data. What the forward search has achieved is to
show how conclusions from an analysis of all the data, such as that in Flury
FIGURE 5.2. Swiss heads: forward plot of scores for included units on the first principal component
FIGURE 5.3. Swiss heads: forward plot of scores for included units on the second principal component
y1: density
y2: fat content, grams/litre
y3: protein content, grams/litre
y4: casein content, grams/litre
y5: cheese dry substance measured in the factory, grams/litre
y6: cheese dry substance measured in the laboratory, grams/litre
y7: milk dry substance, grams/litre
y8: cheese produced, grams/litre.
Although Daudin, Duby, and Trecourt (1988) state that there are 85 observations, their Table 1 contains 86 rows, as row 63 is repeated as row 64. Atkinson (1994) analysed these 86 rows.
A scatterplot matrix of the data is in Figure 5.4. The panel for y5 and y6 shows clearly that one unit is remote in this bivariate projection. Otherwise several panels show a strong rising diagonal structure which we can hope will be well explained by a single principal component. There are also several plots with an almost circular scatter, which will not be well explained. But, before we try principal components, we consider transformation of the data to multivariate normality.
Figure 5.5 is a forward plot of the likelihood ratio test for the hypothesis of no transformation, to be compared with the chi-squared distribution on eight degrees of freedom. Up to m = 80 there is no evidence of any need for a transformation. The next four units to enter make the test significant and then there is a final peak caused by the last unit to enter. The gap plot, Figure 5.6, illuminates this structure: there is a sharp peak at m = 81, followed by a decline until the last unit enters. This structure suggests that the four units entering towards the end form a cluster. This suggestion is strengthened by the two scatterplots of Figure 5.7, which are details of Figure 5.4 with a few points marked. Unit 69 is the last to enter and is the clear outlier already mentioned. The group of four units are numbers 1, 2, 41 and 44, which are particularly evident in the right-hand panel of the figure.
5.6 Milk Data 243
FIGURE 5.4. Milk data: scatterplot matrix of the eight measurements on 85 milk
samples
FIGURE 5.5. Milk data: forward plot of the likelihood ratio statistic for testing transformation of the data. To be compared with the chi-squared distribution on eight degrees of freedom
FIGURE 5.6. Milk data: gap plot - forward plot of differences in Mahalanobis distances. One outlier and a cluster of four observations are indicated
FIGURE 5.7. Milk data: scatterplots of y6 against y3 and y6 against y4, showing the single outlier and the cluster indicated in Figure 5.6
FIGURE 5.8. Milk data: forward plot of scaled Mahalanobis distances. The single outlier enters in the last step of the search; the cluster of four observations labelled in Figure 5.7 enters immediately before it
not transform. One focus of the analysis is the effect of the five units we
have identified on the principal components.
Figure 5.9 is a forward plot of the percentage of variance explained by
each principal component. For the ten steps of the search before the last
the first component explains about 71% of the variance, with the second
component explaining a further 13%. The remaining components make
only a small contribution. The power of the first component fulfills our
"0
Q)
c::
'(ij
Ci.
X
Q)
....
0
lii
>
'ö
;!?.
0
"' ----··-··--·--·--··----······-·--··-··-·-·-·--···-.........-·-··-····-··--·-·········--....--..
-------------------- ------------ ··-.....................____.....................__ _.....
------------------------- --------------- _____________
________________ .,.
-----------------------------
/
/
0
-----~;
40 50 60 70 80
Subset size m
FIGURE 5.9. Milk data: forward plot of the percentage of variance explained by the eight principal components: after the cluster of six units enters, the first component explains around 71% of the total variation
expectations from the initial inspection of Figure 5.4. However, the forward structure of the plot does not particularly reflect the structure we have already found.
Figure 5.9 shows that, in the last step of the search, there is a decline in the percentage of variation explained by the first component. This is caused by the entry of unit 69. However, there is no indication of any effect of the previous four observations. Instead, there is an increase from m = 71 to 76 as units 11, 76, 15, 14, 12 and 13 successively enter. These six units are shown highlighted on the scatterplot matrix of Figure 5.10. In general they form a cluster of low values in these scatterplots, although in some, such as the left-hand panel of Figure 5.7, they are joined by some other units. However, most importantly, they extend the major axis of the cluster of points in those plots with a strong diagonal pattern. Their effect is to increase the variance explained by the first principal component. The effect of the introduction of these units is similar to that of "good" leverage points in regression. The units are remote in the space of the observations, but they reinforce the model already fitted to the data.
We can augment the forward plot of the percentage of variation explained by the components, Figure 5.9, by looking at scree plots for particular values of m. Four such plots are shown in Figure 5.11. These are just alternative representations of the information in Figure 5.9, to which we have added curves using the values from the halving rule in Table 5.1. The panel for m = 70 seems to follow this rule rather well, whereas those for m = 76
FIGURE 5.10. Milk data, omitting observation 69: scatterplot matrix of the observations. The second cluster, of six observations, is highlighted
and 84 show that much more of the total variance is explained by the first component after the cluster of six units has entered. The plot for the end of the search, m = 85, shows that the effect of including unit 69 is to increase appreciably the contribution of the smaller principal components. The contribution of the first component is, of course, correspondingly reduced.
We now consider the effect of individual units on the composition of the principal components. The left-hand panel of Figure 5.12 shows a forward plot of the correlations between the variables and the first principal component. At the beginning of the search this is a combination of the mean of four variables (3, 4, 5 and 6) and a less equal combination of the remaining four. However, by the end of the search, the weight of the eight variables is more nearly equal. This component represents the general positive correlation of all variables. It is little changed by the effect of the last observation.
FIGURE 5.11. Milk data: scree plots at four values of m. The superimposed
curves are for the halving rule, Table 5.1
"=! 3 . 4. s. 6
a)
()
0.. 0
~
-=
:5 0
<D
-~ 7,
<J) 8·"
c "<t
0 1
~ 0 2 ·"\
~
0
() ""
0
0
0
40 50 60 70 80 40 50 60 70 80
Subset size m Subset size m
Component two, in the right-hand panel of the figure, has some positive correlations and some negative. It is a contrast between the mean of the same four variables (3, 4, 5 and 6) and again a looser, positively weighted combination of the remaining four variables. Compared with the left-hand panel, the plots in the right-hand panel fluctuate more, showing a greater
FIGURE 5.15. Milk data: forward plot of scores for included units on the first
eigenvector. The group of six outliers highlighted in Figure 5.10 have extreme
negative values
of the other five units. The effect of the introduction of units 11-15 is to shrink the range of variation of the scores of the units already included in the subset.
We conclude this section with a comparison of the biplot at two steps of the forward search. Figure 5.16 shows the biplot with a = 0 and α = 1 when m = 70 (left-hand panel) and m = 76 (right-hand panel). When m = 70 the
FIGURE 5.16. Milk data: biplot when m = 70 (left-hand panel) and m = 76 (right-hand panel), after the introduction of units 11, 12, 13, 14, 15 and 76. The first component is on the horizontal axis and the second on the vertical axis
length of the arrow associated with the first variable (y1) is much shorter than those for the other variables and has an orientation towards the second quadrant. After the inclusion of units 11, 12, 13, 14, 15 and 76 the length of this arrow seems similar to that of the other variables and it now points towards the first quadrant. This is in agreement with what we observed in Figure 5.12 about the evolution of the curve for the first variable. In both panels of Figure 5.12 we can observe that, in going from m = 70 to m = 76, the correlation between the components and y1 increases in absolute value and that the sign of the correlation with the second principal component changes. Finally, the left-hand panel of Figure 5.16 indicates that variables 2, 7 and 8 are almost orthogonal to variables 3, 4, 5 and 6 and only lightly correlated with the horizontal axis, that is the first principal component. After the inclusion of the six units the angle between the arrows for variables 2, 7 and 8 and the first component is considerably reduced. This is in agreement with what we have already seen in the left-hand panel of Figure 5.12, that is the increase of the correlation between the first principal component and variables 2, 7 and 8. All units that enter have an extreme negative value for the first principal component together with different signs on the second component. Figure 5.12 also shows that, throughout the search, apart from the end, y2 is the variable with the highest correlation with the second principal component. If we project the six units on the direction of the arrow for y2 we can see that, while units 12, 13 and 14 will have strongly negative values, the values for units 11, 15 and 76 will be only slightly negative. This is, of course, in agreement with what can be seen in the second column of the scatterplot
matrix of Figure 5.10. Units 12, 13 and 14 are associated with the cluster of 3 points having by far the smallest values for y2, whereas units 11, 15 and 76 (the three other points highlighted in this figure) all have small, but not particularly low, values for y2.
Our analysis shows that the forward search can illuminate not only the structure of the data, but also the effect of individual units on the structure of the principal components. Most importantly, units which are important for the principal components are not necessarily those important for determining other aspects of the structure, such as the presence of outliers or the need for a transformation. In this case our analysis has also revealed a cluster of six units, five of which intriguingly have consecutive numbers from 11 to 15. Regrettably, the data description in Daudin, Duby, and Trecourt (1988) does not say anything about the numbering of the units. It is tempting to think that these units, with lower values of the variables, correspond to a different breed of cow, or a different geographical location. As so often, the forward search is rich in suggesting further questions about the structure of the data.
5.7 Quality of Life
In the two preceding examples we had samples that were close to those from a single multivariate normal distribution, perhaps with a few outliers. We now consider an example in which the data need to be transformed to achieve normality. We compare analyses of the untransformed data with those after transformation and show in what ways multivariate normality improves the principal components analysis.
Since 1990, the Italian financial journal Il Sole - 24 Ore has promoted a survey on the quality of life in Italian provinces. Provinces are aggregations of municipalities, but at a finer level than regions. For instance, the Emilia-Romagna region discussed in Chapter 1 is currently made up of nine provinces: Piacenza, Parma, Reggio nell'Emilia, Modena, Bologna, Forlì, Ferrara, Ravenna and Rimini. This survey on quality of life is not restricted to Emilia-Romagna, however, but covers all 103 provinces of Italy. It is conducted yearly. We use the data published in 2001, which mainly refers to the previous year.
The survey is intended to provide a synthetic measure of "quality of life", which is then used to rank the provinces. We are not greatly interested in such a ranking, which relies on questionable social premises. Instead, we look at the variables collected and show the effect of transformations. In the 2001 survey there were 36 responses dealing with 6 different aspects of quality of life. The complete data set can be found in the web site of the book. Specifically, the areas of interest were:
• welfare
• wealth and work
• services and environment
• crime
• population
• leisure.
Data are not provided on the original variables listed above, but rather
on a scaled version of them. Scaling is performed by dividing each response
value by the maximum reading for that response. Of course, this procedure
is dramatically affected by outliers. It is likely to produce highly skewed
distributions which will be shrunk towards the origin compared with those
for variables without outliers. We might expect that our robust approach
to multivariate transformations will greatly improve the performance and
usefulness of principal components analysis in this application.
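The fragility of scaling by the maximum is easy to demonstrate with made-up numbers: a single outlier becomes the divisor and squashes every other scaled value towards the origin.

```python
# Illustration (invented values, not the survey data) of scaling each
# response by its maximum reading.
values = [10, 12, 11, 13, 12, 11, 14]

scaled_clean = [x / max(values) for x in values]

with_outlier = values + [100]                 # one extreme reading
scaled_outlier = [x / max(with_outlier) for x in with_outlier]

print([round(s, 2) for s in scaled_clean])    # spread between 0.7 and 1
print([round(s, 2) for s in scaled_outlier])  # bulk squashed near 0.1
```

The clean scaling leaves the values spread over the upper part of the unit interval; with the outlier present, the bulk of the distribution is shrunk towards zero and becomes strongly skewed.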
The data are in Table A.9, the scatterplot matrix for which is given
in Figure 5.17. The main features of the data are an upward trend in all
variables and a dispersion which appears to increase with the magnitude of
the observations. We expect that power transformations of the variables will
be appropriate. An interesting question is how the transformation affects
the principal components.
Before plunging into principal components analysis, we indulge in a little data analysis of the sort exemplified in earlier chapters. Figure 5.18 is a forward plot of the scaled Mahalanobis distances. There seem to be several outliers and the distribution of distances at the beginning of the plot is longer tailed than comparable plots such as Figure 3.6 or Figure 3.13. The two panels of Figure 5.19 confirm this impression. The left-hand panel of the figure shows the trace of the estimated covariance matrix, which increases at m = 95 and again at m = 99. There seem to be four serious outliers and four lesser ones. This impression is confirmed by the right-hand panel of Figure 5.19, which is the minimum distance amongst units not in the subset. It confirms the four large outliers, so clear in the forward plot of Mahalanobis distances, Figure 5.18, as well as showing an upward trend
FIGURE 5.17. Quality of life: scatterplot matrix of the six indices from 103 Italian provinces
for the lesser outliers. We now see how these observations influence the
principal components analysis.
The left-hand panel of Figure 5.20 shows the percentage of the variance
explained by each principal component. For most of the plot the first com-
ponent explains around 50% and the second around 20%. But, in the last
eight steps of the search the variance explained by the first component
drops to being only a little above 40% while that explained by the third
increases from 10% to almost 20%. The right-hand panel of Figure 5.20
shows the elements of the first eigenvector. This is a remarkably stable
mean of all six variables; only the weight for y 2 changes, first increasing
and then decreasing slightly in the last eight steps. The scores for the first
principal component in Figure 5.21 belie all this seeming stability. They
shrink towards the end of the search and some of the outliers visible in Fig-
ure 5.18 have extreme scores. More importantly, for the use to which such
FIGURE 5.18. Quality of life: forward plot of scaled Mahalanobis distances. There
seem to be several outliers
FIGURE 5.19. Quality of life: left-hand panel, forward plot of trace of the es-
timated covariance matrix; right-hand panel, forward plot of minimum Maha-
lanobis distance among units not in the subset
an analysis will be put, the rankings of many towns change markedly in the
last six or eight steps. According to these scores the half dozen best places
are 36 (Trieste), 17 (Milan), 27 (Verona), 42 (Bologna), 65 (Rome) and
12 (Genoa). Apart from Rome, all these towns are located in the northern
part of Italy.
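At any single subset size, the percentages monitored in forward plots such as Figure 5.20 are just the eigenvalues of the covariance matrix fitted to the current subset, scaled to sum to 100. A minimal sketch on illustrative data with one dominant direction (not the six quality-of-life indices):

```python
import numpy as np

def explained_percentages(Y):
    """Percentage of total variance explained by each principal component:
    the eigenvalues of the covariance matrix, sorted downwards and scaled
    to sum to 100."""
    l = np.sort(np.linalg.eigvalsh(np.cov(Y, rowvar=False)))[::-1]
    return 100 * l / l.sum()

rng = np.random.default_rng(1)
# illustrative data with one dominant direction
z = rng.normal(size=(103, 1))
Y = z @ rng.normal(size=(1, 6)) + 0.5 * rng.normal(size=(103, 6))
pct = explained_percentages(Y)
```

Repeating this calculation for each subset size of the search produces the trajectories plotted in the left-hand panel of Figure 5.20.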
We now transform the data. The procedure of Chapter 4 leads to the
transformation $\lambda = (-0.5, 0, 0, 0.5, 0.5, 0)^T$. The least well established of
FIGURE 5.20. Quality of life: left-hand panel, forward plot of percentage of total
variation explained by each principal component; right-hand panel, forward plot
of elements of the first eigenvector
FIGURE 5.21. Quality of life: score of individual units on the first principal
component. There is an apparent lack of stability, with the ranking of many
towns changing in the last steps of the search
FIGURE 5.22. Quality of life: boxplots of the six indices on the original scale
FIGURE 5.23. Transformed quality of life data: boxplots of the six indices. Com-
parison with Figure 5.22 shows the removal of numerous outliers and an increase
in symmetry
FIGURE 5.24. Transformed quality of life data: forward plot of scaled Maha-
lanobis distances on the same scale as Figure 5.18
FIGURE 5.25. Transformed quality of life data: left-hand panel, forward plot of
percentage of total variation explained by each principal component; right-hand
panel, forward plot of correlations between first principal component and the
variables. To be compared with Figure 5.20
FIGURE 5.26. Transformed quality of life data: score of individual units on the
first principal component. There is greater stability than in Figure 5.21
stable orderings throughout the plot. Now the best half dozen towns are
42 (Bologna), 12 (Genoa), 17 (Milan), 36 (Trieste), 46 (Rimini) and 65
(Rome). However the same two towns of the South, 89 (Crotone) and 90
(Vibo Valentia), are still ranked worst on this first component.
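The per-variable transformation with the vector $\lambda$ estimated above can be applied directly; a minimal sketch, using illustrative positive data rather than the quality-of-life indices:

```python
import numpy as np

def box_cox(y, lam):
    """Box-Cox transform of a positive vector: (y**lam - 1)/lam, with the
    log transform as the lam -> 0 limit."""
    y = np.asarray(y, dtype=float)
    return np.log(y) if lam == 0 else (y ** lam - 1) / lam

def transform_columns(Y, lams):
    """Apply a possibly different Box-Cox transform to each column."""
    return np.column_stack([box_cox(Y[:, j], lam)
                            for j, lam in enumerate(lams)])

lams = (-0.5, 0, 0, 0.5, 0.5, 0)       # the vector estimated in the text
rng = np.random.default_rng(2)
Y = rng.lognormal(size=(103, 6))       # illustrative positive data
Z = transform_columns(Y, lams)
```

Each column is transformed with its own $\lambda_j$, so the reciprocal square root, log and square root transformations of the six indices are all handled by the same function.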
The stability of the scores after transformation is an important improve-
ment which comes from the greater normality of the data. A second con-
                            m = 93                          m = 103
                First  Second  Third  Total    First  Second  Third  Total
Untransformed    50.8    20.1   10.8   81.7     41.8    18.8   17.2   77.8
FIGURE 5.27. Swiss bank notes: forward plot of variance explained by the six
principal components. The effect of the entry of the second group is manifest
shortly after m = 100
FIGURE 5.28. Swiss bank notes: forward plots of correlations of principal compo-
nents with variables. Left-hand panel, first component; right-hand panel, second
component
FIGURE 5.29. Swiss bank notes: scores, after inclusion, of the first one hundred
units on the first principal component
group. Comparison of the scores from the two plots towards the end of the
search shows that the first component achieves almost complete separation
of the two groups. But Figure 5.30 shows that the separation decreases
slightly from m = 180 as the third group enters.
5.8.2 Forgeries Alone
When we fit just the forgeries on their own, the 15 observations of Group 3
enter from m = 86 onwards. The left-hand panel of Figure 5.31 shows the
percentage of the variance explained by the eigenvectors; that for the first
eigenvector decreases from m = 86 whereas that explained by the third
and fourth increases, first one and then the other. The right-hand panel of
Figure 5.31 shows the correlations of the variables with the first principal
component. This is stable in four variables but the coefficient of $y_6$ decreases
markedly from m = 86, the sign changing from positive to negative before
the search ends. The coefficient for $y_4$ becomes more negative over the same
period. This reflects the outlying nature of the last 15 observations in $y_4$
and $y_6$, which we highlighted in the scatterplot matrix of Figure 3.51. The
correlations with the second component are not particularly interesting,
but a similar sensitivity from m = 86 of other responses is seen in the two
panels of Figure 5.32 for higher components. The left-hand panel, for the
third principal component, shows how $y_1$ is eliminated from this component,
while the weight for $y_5$ increases. Both respond appreciably to the
FIGURE 5.30. Swiss bank notes: scores, after inclusion, of the second one hundred
units on the first principal component
FIGURE 5.31. Swiss bank notes - forgeries: left-hand panel, forward plot of
percentage of total variance explained by each principal component; right-hand
panel, forward plot of correlations of variables with the first principal component
entry of the last, outlying, observation. The correlations with the fourth
component in the right-hand panel of Figure 5.32 show both some further
trends from m = 86 and some jumps at the very end of the search. These
plots show how changes in the structure of the data during the forward
search can simultaneously affect several components.
5.9 Municipalities in Emilia-Romagna 265
FIGURE 5.32. Swiss bank notes: forward plots of correlations of principal compo-
nents with variables. Left-hand panel, third component; right-hand panel, fourth
component
although the last fifty or so units to enter produce a slight trend. Their
influence is particularly evident in the panel of Figure 5.34 for demographic
variables, for which many outlying municipalities have extreme readings as
is shown in the first two rows of the scatterplot matrix in Figure 3.26. An
additional feature of the second component is the steady increase in the
trajectory of its correlation with y 9 , the unemployment rate, shown in the
lower right panel. Although not of great relevance for the interpretation
of this component (the correlation ranges between -0.2 and +0.2), it is
interesting to investigate why the correlation changes its sign along the
search. Figure 5.36 shows the boxplots of the distribution of y 9 for all
communities and then separately for the first 241 municipalities to enter
the search and for those entering in the last 100 steps; communities with
higher unemployment tend to enter later. As we know, many of these are
the poorer and more remote communities.
Figure 5.34 also provides guidance in interpreting the meaning of the
second principal component. This component is also positively correlated
with indicators of population youth ($y_1$ and, to a lesser extent, $y_{10}$, $y_{12}$
and $y_{13}$). This is the reason why the forward plots related to demographic
variables decrease towards the end of the search, due to the inclusion in the
last steps of several small and aging communities located in the mountains
(the last 14 units which enter are the same as those reported in §4.10.4).
However, in contrast to the interpretation of the first component, economic
conditions seem to be generally poor for municipalities with high scores
FIGURE 5.36. Transformed Emilia-Romagna data: boxplots of the distribution
of $y_9$, the unemployment rate, for, left-hand panel, all units, centre panel, the
units included when m = 241 and, right-hand panel, the last 100 units to enter
the search
both of them the absolute values of the correlations are fairly high in the
last 100 steps of the search.
Before we leave Figures 5.34 and 5.35 we point out the special effect of the
final few outliers which had such a dramatic effect in figures like 4.46. These
effects are all small, but sharp, changes in some correlations, for example for
$y_{28}$ in the work variables of component one and $y_{23}$ in the wealth variables
of component two in Figure 5.34. There is also some effect on these two
variables in the third and fourth components of Figure 5.35. The effects
are however slight, showing the stability of the principal components to
the presence of a few outliers with a data set of this size.
Individual scores on the first two components are monitored in Fig-
ures 5.37 and 5.38. Both forward plots show firm patterns along the search,
which are not influenced by the final inclusion of outliers. As was illustrated
at the end of §5.7 in our analysis of data on the quality of life, such sta-
bility is one beneficial consequence of transforming data to approximate
normality.
We first consider Figure 5.37. Although there are many extreme negative
scores, projecting the 28 variables on the first principal component does
not seem to reveal any particular cluster of outliers. In fact, we already
noted that it is the multivariate combination of several extreme responses
that produces the impressive plots of Mahalanobis distances in Figure 4.47
and Figure 4.48. The first principal component explains 38% of the total
variation in the last step of the search, which is certainly a good result
starting from 28 responses, but we could not expect this projection to
be fully representative of the multivariate structure of the data. Highly
negative scores tend to be less so towards the end of the search, when
an increasing number of aging and poor municipalities are included in the
fitted subset.
Projection onto the second principal component (Figure 5.38) shows, on
the contrary, a marked negative outlier, corresponding to Bologna (unit
6), the largest and richest (according to the available data) municipality in
Emilia-Romagna, but also a community with an aging population. Hence,
the outlyingness of Bologna is not unexpected in this plot, in view of our
interpretation of the second principal component. There also seems to be
a cluster of a few less extreme outliers on the second principal component,
including the towns of Parma (unit 210), Modena (159), Piacenza (262)
and Ferrara (68), and the municipalities of Casalecchio di Reno (11, in the
suburbs of Bologna) and Porretta Terme (49, a touristic and spa resort in
the Apennines). This cluster is particularly clear from m = 280 onwards.
We end our principal components analysis of municipalities in Emilia-
Romagna by looking at Figure 5.39, the scatterplot of individual scores on
the first two components computed at step m = 321 before the inclusion of
the outliers. The last units to enter are numbered in the plot, which shows
that many of them have low scores on the first component whilst their
scores on the second component are unremarkable. The plot shows even
more clearly the outlying nature of Bologna and the other six communities
mentioned in the last paragraph as having low scores on the second com-
ponent in Figure 5.38. This plot shows how principal components analysis,
combined with the forward search, can reveal the structure of the data.
In many examples the presence of outliers obscures the structure when all
observations are fitted. Plots, as here, of quantities calculated earlier in
the search are more informative about outliers and, more importantly, the
structure of the majority of the data.
5.11 Exercises
Exercise 5.1 Show that the cosine of the angle $\theta$ between vectors $x =
(x_1, x_2)^T$ and $y = (y_1, y_2)^T$ in Figure 5.40 is given by
$$\cos(\theta) = \frac{x_1 y_1 + x_2 y_2}{\|x\| \, \|y\|}. \qquad (5.25)$$
Exercise 5.2 Given two vectors $x$ and $y$, find an expression for the vector
$\hat{x}$ which represents the projection of $x$ onto $y$ (see Figure 5.41). What is the
expression which defines the length of $\hat{x}$? Show that the vector $\hat{x}$ minimizes
the function $\|x - \hat{x}\|^2$.
Exercise 5.3 Given a set of independent vectors $(y_1, y_2, \ldots, y_k)$, find a set
of mutually orthonormal vectors $(z_1, z_2, \ldots, z_k)$ with the same linear span.
Prove that the expression for the vector $\hat{y}$ which represents the projection
of $y_k$ on the linear span of $y_1, y_2, \ldots, y_{k-1}$ is given by $\hat{y} = Z Z^T y_k$, where
$Z = (z_1, z_2, \ldots, z_{k-1})$.
Exercise 5.4 Let the squared distance of a point y from the origin be given
by
(5.26)
FIGURE 5.41. The projection of vector x on y
(5.27)
(5.28)
Give the equation of the ellipse in canonical form, the equation of the
straight lines which define the major and minor axes of the ellipse, and
calculate the lengths of the semiaxes.
If a point on the ellipse in canonical form has a value of 0.3 for $z_1$, the
first canonical coordinate, find its $z_2$ coordinate. Hence find the coordinates
of the point on the original ellipse.
$$\max_{x \neq 0} \frac{x^T B x}{x^T x} = \lambda_1, \quad \text{attained when } x = \gamma_1 \qquad (5.29)$$
$$\min_{x \neq 0} \frac{x^T B x}{x^T x} = \lambda_p, \quad \text{attained when } x = \gamma_p. \qquad (5.30)$$
$$\max_{x \neq 0,\; x \perp \gamma_1, \ldots, \gamma_k} \frac{x^T B x}{x^T x} = \lambda_{k+1}, \quad \text{attained when } x = \gamma_{k+1}, \; k = 1, \ldots, p-1. \qquad (5.31)$$
(5.32)
(5.33)
where $G = (g_1, \ldots, g_r)$ is the matrix which contains the eigenvectors corre-
sponding to the $r$ biggest eigenvalues of $\hat{\Sigma}_u = \tilde{Y}^T \tilde{Y}/(n-1)$.
b) Show that the sum of squares of the errors is given by
$$\sum_{i=1}^{n} (\tilde{y}_i - \hat{a}_i)^T (\tilde{y}_i - \hat{a}_i) = (n-1) \sum_{i=r+1}^{v} l_i, \qquad (5.34)$$
where $l_{r+1} \geq l_{r+2} \geq \cdots \geq l_v$ are the $(v-r)$ smallest eigenvalues of $\hat{\Sigma}_u$.
Note that $\hat{a}_i = G G^T \tilde{y}_i$ is the projection of the $\tilde{y}_i$ into the space spanned
by $g_1, \ldots, g_r$ and $\sum \hat{a}_i^T \hat{a}_i$ is the sum of the squared lengths of the projected
deviations. So, what are the geometrical interpretations of principal com-
ponent analysis?
c) What are the coefficients of the best approximating plane $g_1, \ldots, g_r$?
d) Why in equation (5.32), without loss of generality, can we consider
vectors $\tilde{y}_i$ instead of $y_i$? What is the geometrical interpretation of this as-
pect?
Exercise 5.8 Given the matrix
$$Y = \begin{pmatrix} 10 & 1.5 & \cdot \\ 6 & 0.5 & \cdot \\ 5 & 2 & \cdot \end{pmatrix},$$
what can you say, a priori, about the correlations between variables and
principal components using standardized and unstandardized variables? What
are your expectations about the lengths and the orientation of the arrows
which represent the variables in the biplot using both standardized and un-
standardized variables?
Using a computer program calculate the matrices $A$ and $B$ which form
the basis for the construction of the biplot using unstandardized variables
when $a = 0$ and $a = 1$.
Exercise 5.9 Let $Z_k = (z_{c_1}, \ldots, z_{c_k})$ be the matrix containing the first
$k$ columns of the matrix $\tilde{Y} G$. Show that the percentage of variance of the
variable $\tilde{y}_{c_j}$ explained by the first $k$ principal components ($R^2_{\tilde{y}_{c_j}|Z_k}$) can be
partitioned as $R^2_{\tilde{y}_{c_j}|Z_k} = r^2_{\tilde{y}_{c_j}|z_{c_1}} + r^2_{\tilde{y}_{c_j}|z_{c_2}} + \cdots + r^2_{\tilde{y}_{c_j}|z_{c_k}}$. Show that
$$R^2_{\tilde{y}_{c_j}|Z_k} = \sum_{i=1}^{k} \frac{g_{ji}^2 \, l_i}{s_j^2}. \qquad (5.35)$$
Exercise 5.14 How can you interpret the distance between units $i$ and $j$,
the lengths of the arrows associated with vectors $b_j$, and the cosine of the
angles between vectors $b_j$ in the biplot when $a = 0$ and $a = 0$?
5.12 Solutions
Exercise 5.1
We start by noticing that from Figure 5.40, by definition, $\cos(\theta_1) = x_1/\|x\|$
and $\cos(\theta_2) = y_1/\|y\|$, $\sin(\theta_1) = x_2/\|x\|$ and $\sin(\theta_2) = y_2/\|y\|$. Now
$$\cos(\theta) = \cos(\theta_2 - \theta_1) = \frac{y_1}{\|y\|}\frac{x_1}{\|x\|} + \frac{y_2}{\|y\|}\frac{x_2}{\|x\|} = \frac{x_1 y_1 + x_2 y_2}{\|x\| \, \|y\|} = \frac{x^T y}{\|x\| \, \|y\|}. \qquad (5.37)$$
Since $\cos(90°) = \cos(270°) = 0$ and $\cos(\theta) = 0$ only if $x^T y = 0$, $x$ and $y$
are perpendicular when $x^T y = 0$.
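A quick numerical check of (5.25), with hand-picked vectors:

```python
import numpy as np

def cos_angle(x, y):
    """cos(theta) = x^T y / (||x|| ||y||), equation (5.25)."""
    return x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

c = cos_angle(np.array([1.0, 0.0]), np.array([1.0, 1.0]))      # 45 degrees
perp = cos_angle(np.array([1.0, 2.0]), np.array([-2.0, 1.0]))  # x^T y = 0
```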
Exercise 5.2
If $\theta$ is the angle between $x$ and $y$ (see Figure 5.42), the length of the
projection is given by
$$\|x\| \, |\cos\theta| = \|x\| \, \frac{|x^T y|}{\|x\| \, \|y\|} = \frac{|x^T y|}{\|y\|}. \qquad (5.38)$$
(5.40)
Another way to derive the expression for the vector $\hat{x}$ comes from noticing
that vectors $x - \hat{x} = x - ty$ and $\hat{x} = ty$ are orthogonal (see Figure 5.42).
$$(x - \hat{x})^T \hat{x} = (x - ty)^T ty = (x - ty)^T y = 0. \qquad (5.41)$$
(5.42)
Differentiating with respect to $t$ we obtain
Exercise 5.3
Starting from a generic set of vectors $y_1, \ldots, y_k$, a set of orthogonal vectors
$u_1, \ldots, u_k$ which span the same linear space can be constructed sequentially
as follows (Gram-Schmidt orthogonalization process):
$$u_k = y_k - \sum_{j=1}^{k-1} (y_k^T z_j) z_j, \qquad (5.43)$$
where $z_j = u_j/\|u_j\|$.
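The sequential construction of (5.43) can be sketched directly, normalizing each residual to obtain the orthonormal $z_j$ (assuming the columns are linearly independent):

```python
import numpy as np

def gram_schmidt(Y):
    """Orthonormalize the columns of Y sequentially, as in (5.43): subtract
    from each y_k its projections (y_k^T z_j) z_j on the vectors already
    found, then normalize the residual."""
    Z = []
    for y in Y.T:                       # iterate over columns
        u = y - sum((y @ z) * z for z in Z)
        Z.append(u / np.linalg.norm(u))
    return np.column_stack(Z)

rng = np.random.default_rng(3)
Y = rng.normal(size=(5, 3))             # independent columns
Z = gram_schmidt(Y)
```

The columns of the result are mutually orthonormal and span the same space, so $Z Z^T Y = Y$, which is the projection property used in Exercise 5.3.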
$$y^T (\lambda_1 \gamma_1 \gamma_1^T + \lambda_2 \gamma_2 \gamma_2^T) y = \lambda_1 (y^T \gamma_1)^2 + \lambda_2 (y^T \gamma_2)^2.$$
FIGURE 5.43. Points with constant distance from the origin ($p = 2$, $1 \leq \lambda_1 < \lambda_2$)
Exercise 5.5
The purpose of this exercise is to apply in practice what we have learnt in
Exercise 5.4. Equation (5.28) can be written as a quadratic form with matrix
$$A = \Gamma \Lambda \Gamma^T = \begin{pmatrix} 0.615 & 0.788 \\ -0.788 & 0.615 \end{pmatrix} \begin{pmatrix} 1.44 & 0 \\ 0 & 5.56 \end{pmatrix} \begin{pmatrix} 0.615 & -0.788 \\ 0.788 & 0.615 \end{pmatrix}.$$
From the results of Exercise 5.4, the equation of the ellipse in canonical
form is
$$\lambda_1 z_1^2 + \lambda_2 z_2^2 = 1, \quad \text{that is} \quad 1.44 z_1^2 + 5.56 z_2^2 = 1,$$
where $z = (z_1, z_2)^T = \Gamma^T(y - \mu)$, $z_1 = (y - \mu)^T \gamma_1 = 0.615(y_1 - 1.5) -
0.788(y_2 - 1)$ and $z_2 = (y - \mu)^T \gamma_2 = 0.788(y_1 - 1.5) + 0.615(y_2 - 1)$. The
lengths of the semiaxes are
$$1/\sqrt{\lambda_1} = 0.834, \qquad 1/\sqrt{\lambda_2} = 0.424.$$
The equation of the major axis (the one associated with $\lambda_1$), remember-
ing equation (5.45), can be found by putting $z_2 = 0$.
Similarly, the equation of the straight line which defines the minor axis
can be obtained by putting $z_1 = 0$ and is given by
$$y_2 = \frac{0.615}{0.788}(y_1 - 1.5) + 1 = 0.78(y_1 - 1.5) + 1. \qquad (5.47)$$
Finally, the $z_2$ coordinate of the point (say $A'$) which has $z_1 = 0.3$ is given
by
$$z_2 = \pm\sqrt{\frac{1 - 1.44 \times 0.3^2}{5.56}} = \pm 0.3956.$$
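The numbers in this solution are easy to verify numerically from the eigendecomposition:

```python
import numpy as np

Gam = np.array([[0.615, 0.788],
                [-0.788, 0.615]])          # columns gamma_1, gamma_2
Lam = np.diag([1.44, 5.56])
A = Gam @ Lam @ Gam.T                      # matrix of the quadratic form

semi = 1 / np.sqrt(np.diag(Lam))           # lengths of the semiaxes
z1 = 0.3
z2 = np.sqrt((1 - 1.44 * z1 ** 2) / 5.56)  # z_2 on the canonical ellipse
```

Recovering the eigenvalues 1.44 and 5.56 from the reassembled matrix, up to the three-decimal rounding of the printed eigenvectors, confirms the decomposition.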
Exercise 5.6
For a fixed $x_1 \neq 0$, $x_1^T B x_1/(x_1^T x_1)$ has a constant value of $x^T B x$ where $x =
x_1/\sqrt{x_1^T x_1}$ has unit length. This implies that, without loss of generality,
we can prove the result for any normalized vector $x^T x = 1$. Now, let $\Gamma$
be the orthogonal matrix whose columns are the eigenvectors $\gamma_1, \ldots, \gamma_p$ of
matrix $B$ and $\Lambda$ the diagonal matrix with the eigenvalues along the main
diagonal. Finally, let $y = \Gamma^T x$. Note that $x \neq 0$ implies $y \neq 0$. Now, using
the spectral decomposition of the matrix $B$, the quadratic form $x^T B x$
can be written
$$x^T B x = x^T \Gamma \Lambda \Gamma^T x = y^T \Lambda y = \sum_{i=1}^{p} \lambda_i y_i^2 \leq \lambda_1 \sum_{i=1}^{p} y_i^2 = \lambda_1.$$
In order to prove that the maximum value is attained when $x = \gamma_1$, note
that setting $x = \gamma_1$ gives $y = \Gamma^T \gamma_1 = (1, 0, \ldots, 0)^T$, since
$$\gamma_k^T \gamma_1 = \begin{cases} 1, & k = 1 \\ 0, & k \neq 1. \end{cases} \qquad (5.48)$$
Then
$$x^T B x = \gamma_1^T B \gamma_1 = \gamma_1^T \Gamma \Lambda \Gamma^T \gamma_1 = y^T \Lambda y = (1, 0, \ldots, 0) \begin{pmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_p \end{pmatrix} \begin{pmatrix} 1 \\ \vdots \\ 0 \end{pmatrix} = \lambda_1.$$
To prove the final part of the exercise, note that when $x$ is perpendicular
to the first $k$ eigenvectors $\gamma_i$, the vector $y$ becomes
$$y = \Gamma^T x = \begin{pmatrix} \gamma_1^T x \\ \vdots \\ \gamma_k^T x \\ \gamma_{k+1}^T x \\ \vdots \\ \gamma_p^T x \end{pmatrix} = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ y_{k+1} \\ \vdots \\ y_p \end{pmatrix}. \qquad (5.49)$$
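The extremal property of the Rayleigh quotient proved here can be checked numerically; the symmetric matrix below is an arbitrary example, not one from the text:

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.normal(size=(4, 4))
B = M @ M.T                             # an arbitrary symmetric matrix
lam, Gam = np.linalg.eigh(B)            # eigenvalues in ascending order

def rayleigh(B, x):
    """The quadratic form x^T B x / x^T x."""
    return (x @ B @ x) / (x @ x)

r_max = rayleigh(B, Gam[:, -1])         # attained at the leading eigenvector
r_rand = max(rayleigh(B, rng.normal(size=4)) for _ in range(1000))
```

No random direction exceeds the largest eigenvalue, while the leading eigenvector attains it exactly, as in (5.29).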
Exercise 5.7
a) Consider a set of orthonormal vectors $U = (u_1, \ldots, u_k)$ and, for fixed $\tilde{y}_i$,
consider the approximation given by an arbitrary vector $U b_i$. We start by
noticing that
$$\tilde{y}_i - U b_i = \tilde{y}_i - U U^T \tilde{y}_i + U U^T \tilde{y}_i - U b_i = (I - U U^T)\tilde{y}_i + U(U^T \tilde{y}_i - b_i). \qquad (5.50)$$
Note that the cross product vanishes because $(I - U U^T)U = U - U U^T U =
U - U = 0$. The final term in equation (5.51) is positive unless $b_i$ is chosen
so that $b_i = U^T \tilde{y}_i$. With this choice of $b_i$, $U b_i = U U^T \tilde{y}_i$ is the projection
of $\tilde{y}_i$ on the plane spanned by the orthonormal vectors $u_1, \ldots, u_k$ (see Ex-
ercise 5.3). In other words, for fixed $U$, the vector $\tilde{y}_i$ is best approximated
by its projection onto the space spanned by $u_1, \ldots, u_r$. When $a_i$ is chosen
as $U U^T \tilde{y}_i$, the sum of the $nv$ squared errors becomes
$$\sum_{i=1}^{n} \sum_{j=1}^{v} (\tilde{y}_{ij} - a_{ij})^2 = \sum_{i=1}^{n} (\tilde{y}_i - U U^T \tilde{y}_i)^T (\tilde{y}_i - U U^T \tilde{y}_i) \qquad (5.52)$$
$$= \sum_{i=1}^{n} \tilde{y}_i^T \tilde{y}_i + \sum_{i=1}^{n} \tilde{y}_i^T U U^T \tilde{y}_i - 2\sum_{i=1}^{n} \tilde{y}_i^T U U^T \tilde{y}_i = \sum_{i=1}^{n} \tilde{y}_i^T \tilde{y}_i - \sum_{i=1}^{n} \tilde{y}_i^T U U^T \tilde{y}_i. \qquad (5.53)$$
The first term in equation (5.53) does not depend on $U$, therefore the
sum of squares of the errors can be minimized by maximizing the last term
in the former equation. From the geometrical point of view
$$\sum_{i=1}^{n} \tilde{y}_i^T U U^T \tilde{y}_i = \sum_{i=1}^{n} \|U U^T \tilde{y}_i\|^2 \qquad (5.54)$$
and
$$\sum_{i=1}^{n} \tilde{y}_i^T U U^T \tilde{y}_i = \mathrm{tr} \sum_{i=1}^{n} U^T \tilde{y}_i \tilde{y}_i^T U = (n-1) \sum_{j=1}^{k} u_j^T \hat{\Sigma}_u u_j.$$
From Exercise 5.6, we know that the quadratic form $u_1^T \hat{\Sigma}_u u_1$ is maximized
when $u_1 = g_1$, where $g_1$ is the first eigenvector corresponding to the first
eigenvalue of matrix $\hat{\Sigma}_u$. For $u_2$ perpendicular to $u_1$, $u_2^T \hat{\Sigma}_u u_2$ is maximized
by $g_2$. In $r$ dimensions $U = (u_1, \ldots, u_r) = (g_1, \ldots, g_r)$ and $a_i = G G^T \tilde{y}_i$.
Consequently $\hat{A}^T_{(v \times n)} = G G^T (\tilde{y}_1, \ldots, \tilde{y}_n)$. In other words, the $r$ dimensional
plane which minimizes the sum of squares of the distances between the
observations $\tilde{y}_i$ and the plane is determined by $g_1, \ldots, g_r$, a new basis.
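Part a) can be verified numerically: projecting the centred data onto the $r$ leading eigenvectors of the sample covariance matrix attains the error bound $(n-1)\sum_{i>r} l_i$ of (5.34). A sketch on arbitrary simulated data:

```python
import numpy as np

rng = np.random.default_rng(5)
Y = rng.normal(size=(50, 6))
Yt = Y - Y.mean(axis=0)                 # deviations from the mean
S = Yt.T @ Yt / (len(Y) - 1)            # the sample covariance matrix
l, G = np.linalg.eigh(S)
l, G = l[::-1], G[:, ::-1]              # eigenvalues in descending order

r = 2
Gr = G[:, :r]
A = Yt @ Gr @ Gr.T                      # projections a_i = G G^T y_i
sse = ((Yt - A) ** 2).sum()
bound = (len(Y) - 1) * l[r:].sum()      # (n-1) * sum of smallest eigenvalues
```

The sum of squared errors of the rank-$r$ projection equals the bound exactly, illustrating that no better $r$-dimensional plane exists.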
b) In this part of the exercise we derive the error bound for the sum of
squares of the approximation. We start by noticing that when $u_i = g_i$,
$$\sum_{i=1}^{n} \sum_{j=1}^{v} (\tilde{y}_{ij} - \hat{a}_{ij})^2 = (n-1) \sum_{i=r+1}^{v} l_i. \qquad (5.55)$$
(5.56)
So, the $i$th element (column) of $\hat{A}^T$ can be written
(5.57)
Exercise 5.8
The purpose of this exercise is to show in geometric terms why we must
make a preliminary standardization of the data when the variables have
different magnitudes.
$Y$ can be written in terms of deviations from the mean as $\tilde{Y} = (\tilde{y}_{c_1}, \ldots,
\tilde{y}_{c_v})$, where $\tilde{y}_{c_j}$ is the $n \times 1$ vector forming the $j$th column of $\tilde{Y}$.
As we have seen from Exercise 5.7, the best $r$ dimensional approximation
$A$ of $\tilde{Y}$ minimizes the equation
$$\|\tilde{Y} - A\|^2 = \sum_{i=1}^{n} \sum_{j=1}^{v} (\tilde{y}_{ij} - a_{ij})^2, \qquad (5.61)$$
where $a_{c_j}$ is the $j$th column of the $n \times v$ matrix $A$. As we have seen in Ex-
ercise 5.7 if, for example, $A$ is of rank 1, $a_{c_j}$ is the $j$th column of the matrix
$\tilde{Y} g_1 g_1^T$. In this case the $j$th column of $\tilde{Y}$ is approximated by a multiple $g_{j1}$
$(j = 1, \ldots, v)$ of the $n$ dimensional vector (line) $\tilde{Y} g_1$ which represents the
first principal component. This implies that the first principal component
$\tilde{Y} g_1$ minimizes the sum of the squared distances $L_j^2$ from the deviation
TABLE 5.3. Centred matrix $\tilde{Y}$ and squared lengths of the vectors associated with
the three variables

                 $\tilde{y}_{c_1}$   $\tilde{y}_{c_2}$   $\tilde{y}_{c_3}$
                   3      0.17    -1.33
                  -1     -0.83     1.67
                  -2      0.67    -0.33
Squared length    14      1.17     4.67
TABLE 5.4. Correlation between variables and first principal component using
unstandardized and standardized variables

Variable    Unstandardized    Standardized
number      variables         variables
1             0.983             0.619
2             0.183             0.786
3            -0.752             1
vector $\tilde{y}_{c_j} = y_{c_j} - \bar{y}_j 1$ to a line and so on. Naturally, the longer devia-
tion vectors $\tilde{y}_{c_j} = (\tilde{y}_{c_j 1}, \ldots, \tilde{y}_{c_j n})^T$ (those with larger $s_j$) have the most
effect on the minimization of $\sum L_j^2$. Table 5.3 gives the centred matrix $\tilde{Y}$
and the squared lengths of the vectors associated with the columns of $\tilde{Y}$.
Note that the length of $\tilde{y}_{c_1}$ is much greater than that of the vectors as-
sociated with the second and third columns of $\tilde{Y}$. This implies that the
first column of $\tilde{Y}$ will exert a great influence on the minimization of equa-
tion (5.62). Figure 5.45, which represents the 3 vectors associated with the
3 columns of the matrix $\tilde{Y}$ and the line associated with the first principal
component, shows that, in this example, the first principal component is
highly attracted by the first variable $\tilde{y}_{c_1}$. In other terms, the angle between
$\tilde{y}_{c_1}$ and the vector which represents the first principal component is very
small. Table 5.4, which gives the correlations between the variables and the
first principal component using standardized and unstandardized variables,
shows that using unstandardized variables the correlation with variable 1
is much greater than those of the other variables. Finally, the ordering of
the magnitudes of the correlations between the first principal component
and the variables exactly matches the ordering of their lengths.
On the other hand, if the variables are standardized, they have equal
lengths and exert equal influence in the minimization of $\sum_{j=1}^{v} L_j^2$. Fig-
ure 5.46, which represents (using the same 3 dimensional point of view as
Figure 5.45) the 3 vectors associated with the 3 columns of $\tilde{Y}$ and the line
associated with the first principal component, shows that the first principal
component is virtually equidistant from the three vectors.
Let us now see what we can say a priori about the biplot. The biplot
for the unstandardized variables will show one arrow (the one for variable
1) much longer than the others. When the variables are standardized, the
matrix which contains all the standardized variables will have rank 2. This
implies that the biplot for standardized variables will give a perfect rep-
resentation of the original rank 2 matrix. In this case, the length of the
arrows will be exactly the same. Finally, one of the arrows in the biplot
will be parallel to one of the axes because the matrix $\tilde{Y}$ has rank 2.
Figure 5.47, which shows the biplot for unstandardized (left panel) and
standardized data (right panel) when $a = 0$ and $a = 1$, illustrates graphi-
cally all the concepts just described.
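The dominance effect described in this solution is easy to reproduce; the data below are illustrative, with one variable on a much larger scale than the others, and are not the matrix of the exercise:

```python
import numpy as np

def first_pc_correlations(Y):
    """Correlation of each column of Y with the scores on the first
    principal component of the centred data."""
    Yt = Y - Y.mean(axis=0)
    l, G = np.linalg.eigh(np.cov(Yt, rowvar=False))
    pc1 = Yt @ G[:, -1]                 # scores on the first component
    return np.array([np.corrcoef(pc1, Yt[:, j])[0, 1]
                     for j in range(Y.shape[1])])

rng = np.random.default_rng(6)
# one variable on a much larger scale than the others
Y = rng.normal(size=(30, 3)) * np.array([10.0, 1.0, 1.0])
r_unstd = np.abs(first_pc_correlations(Y))
r_std = np.abs(first_pc_correlations(Y / Y.std(axis=0)))
```

Without standardization the first principal component is almost perfectly correlated with the large-scale variable, exactly the attraction visible in Figure 5.45.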
$$U = \begin{pmatrix} -0.8165 & 0.004683 & 0.5774 \\ 0.4042 & -0.7094 & 0.5774 \\ 0.4123 & 0.7048 & 0.5774 \end{pmatrix}, \qquad L = \begin{pmatrix} 8.1044 & 0 & 0 \\ 0 & 1.8123 & 0 \\ 0 & 0 & 0 \end{pmatrix},$$
$$G = \begin{pmatrix} -0.9136 & -0.3603 & -0.1883 \\ -0.0492 & 0.5577 & -0.8286 \\ 0.4026 & -0.7477 & -0.5273 \end{pmatrix}.$$
When $a = 0$ and $a = 1$,
$$A = \sqrt{n-1}\, U_{(2)} = \sqrt{2} \begin{pmatrix} -0.8165 & 0.004683 \\ 0.4042 & -0.7094 \\ 0.4123 & 0.7048 \end{pmatrix}. \qquad (5.63)$$
Exercise 5.9
In the multiple regression model $y = X\beta + \epsilon$, $R^2_{y|X}$ is defined as
In this exercise we have to find the expression which defines the percentage
of variance of the $j$th variable (dependent variable) extracted by the first
$k$ principal components (explanatory variables). In this case $y$ corresponds
to the $j$th column of the matrix $\tilde{Y}$, while the matrix $X$ corresponds to the
first $k$ columns of the matrix $\tilde{Y}G$, say $Z_k = (z_{c_1}, \ldots, z_{c_k})$.
In this case $(X^T X) = (n-1)\mathrm{cov}(Z_k) = (n-1)\mathrm{diag}(s_{z_1}^2, \ldots, s_{z_k}^2) =
(n-1)\mathrm{diag}(l_1, \ldots, l_k)$, and $X^T y = (n-1)\mathrm{cov}(Z_k, \tilde{y}_{c_j})$. Now, given that
$\mathrm{cov}(Z_k)$ is diagonal, and that $\tilde{Y}$ and $\tilde{Y}G$ have zero mean, equation (5.65)
can be rewritten as
Since $\mathrm{cov}(\tilde{y}_{c_j}, z_{c_i}) = g_{ji} l_i$, where $g_{ji}$ is the $j$th element of the $i$th eigen-
vector, and $s_{z_i}^2 = l_i$,
$$R^2_{\tilde{y}_{c_j}|Z_k} = \sum_{i=1}^{k} \frac{g_{ji}^2 \, l_i}{s_j^2},$$
which for standardized variables ($s_j = 1$) reduces to $\sum_{i=1}^{k} g_{ji}^2 l_i$.
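The partition (5.35) can be checked against a direct regression computation; the data are arbitrary simulated values:

```python
import numpy as np

rng = np.random.default_rng(7)
Y = rng.normal(size=(40, 5))
Yt = Y - Y.mean(axis=0)
l, G = np.linalg.eigh(np.cov(Yt, rowvar=False))
l, G = l[::-1], G[:, ::-1]              # eigenvalues in descending order
Z = Yt @ G                              # principal component scores

j, k = 2, 3                             # variable j, first k components
s2 = Yt[:, j].var(ddof=1)
# the partition: R^2 = sum_i g_ji^2 l_i / s_j^2
r2_formula = sum(G[j, i] ** 2 * l[i] for i in range(k)) / s2
# direct R^2 from regressing y_cj on the k mutually uncorrelated scores
r2_direct = sum(np.corrcoef(Yt[:, j], Z[:, i])[0, 1] ** 2 for i in range(k))
```

Because the scores are mutually uncorrelated, the regression $R^2$ is just the sum of the squared marginal correlations, and the two quantities agree exactly.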
Exercise 5.10
The Euclidean distance between rows $i$ and $j$ of the matrix $\tilde{Y}$ is defined as
$$d_{ij}^2 = (\tilde{y}_i - \tilde{y}_j)^T (\tilde{y}_i - \tilde{y}_j) = \{\tilde{Y}^T q_{(i)} - \tilde{Y}^T q_{(j)}\}^T \{\tilde{Y}^T q_{(i)} - \tilde{Y}^T q_{(j)}\} = \{q_{(i)} - q_{(j)}\}^T \tilde{Y} \tilde{Y}^T \{q_{(i)} - q_{(j)}\}. \qquad (5.66)$$
Given that the Mahalanobis distance between rows $i$ and $j$ of the matrix
$\tilde{Y}$ is defined as
(5.67)
it follows from equation (5.66) that equation (5.67) can be rewritten as
(5.68)
Exercise 5.11
If $Z = \tilde{Y}G$, where $G$ is an orthogonal matrix such that $G^T G = G G^T = I$,
the distance between rows $i$ and $j$ of the matrix $Z$ can be written as
This implies that the Euclidean distance between two units in the space of
the principal components (if all components are considered) is equal to the
Euclidean distance in the original space. Note that if not all components are
considered $(g_1, \ldots, g_r)^T (g_1, \ldots, g_r) = I_r$, but $(g_1, \ldots, g_r)(g_1, \ldots, g_r)^T \neq I_v$.
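The invariance of interpoint Euclidean distances under the orthogonal rotation $Z = \tilde{Y}G$ can be checked directly:

```python
import numpy as np

def dist2(M):
    """Matrix of pairwise squared Euclidean distances between rows."""
    d = M[:, None, :] - M[None, :, :]
    return (d ** 2).sum(axis=2)

rng = np.random.default_rng(8)
Y = rng.normal(size=(20, 4))
Yt = Y - Y.mean(axis=0)
_, G = np.linalg.eigh(np.cov(Yt, rowvar=False))
Z = Yt @ G                              # rotate onto all the components
d_orig, d_pc = dist2(Yt), dist2(Z)
```

With all components retained the two distance matrices are identical; keeping only $r < v$ components would make the distances in the score space approximations from below.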
Exercise 5.12
The matrix which contains the standardized principal components can be
written as $U^* = \tilde{Y} G L^{-1/2}$. The squared distance between row $i$ and row $j$
of $U^*$ can then be written as:
Exercise 5.13
When $a = 0$ and $a = 1$, $A = \sqrt{n-1}\, U_{(2)}$ and $B = G_{(2)} L_{(2)}^{1/2}$. Using the
result in Exercise 5.10, the distance between rows $i$ and $j$ of $A$ can be
written as
This implies that the matrix $G_{(2)} L_{(2)}^{-1} G_{(2)}^T$ in equation (5.69) can be rec-
ognized as the best rank two approximation of matrix $\hat{\Sigma}_u^{-1}$. In conclusion,
the Euclidean distance between two points in the biplot (rows of matrix
$A$), when $a = 0$ and $a = 1$, can be interpreted as the best rank two ap-
proximation of the Mahalanobis distance between the corresponding rows
in the original space.
Let us now check what interpretation we can give to the lengths and
cosines of the arrows associated with the $v$ rows of the matrix $B = G_{(2)} L_{(2)}^{1/2}$.
We must determine to what extent the $(i,j)$th element of $B B^T$ (scalar
product $b_i^T b_j$) approximates the $(i,j)$th element ($s_{ij}$) of the sample covari-
ance matrix $\hat{\Sigma}_u$. In this case $B B^T = G_{(2)} L_{(2)} G_{(2)}^T$. Now, given that $\hat{\Sigma}_u$
can be decomposed as $G L G^T$, it is easy to recognize that $G_{(2)} L_{(2)} G_{(2)}^T$
is the best rank two approximation of the matrix $\hat{\Sigma}_u$. This implies that
the lengths $b_j^T b_j$ of the arrows (diagonal elements of $B B^T$) are the best
rank two approximations of $s_j^2$. Similarly, the cosine of the angle between
two arrows $b_i^T b_j/(\|b_i\| \, \|b_j\|)$ can be interpreted as the best rank two ap-
proximation of the correlation coefficient between the two corresponding
variables, that is $s_{ij}/(s_i s_j)$. In this case the $j$th diagonal element of the
matrix $B B^T$ is
(5.70)
From equation (5.35) of Exercise 5.9, it follows that if the variables have
been standardized, the squared length of vector (arrow) $b_j$ is equal to the
percentage of variance of the $j$th variable explained by the first two prin-
cipal components.
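The rank-two property used here, that $G_{(2)} L_{(2)} G_{(2)}^T$ built from the two leading eigenpairs is the best rank two approximation of the covariance matrix, can be illustrated numerically on arbitrary data:

```python
import numpy as np

rng = np.random.default_rng(9)
Y = rng.normal(size=(60, 4)) @ rng.normal(size=(4, 4))
S = np.cov(Y - Y.mean(axis=0), rowvar=False)   # decomposes as G L G^T
l, G = np.linalg.eigh(S)
l, G = l[::-1], G[:, ::-1]              # eigenvalues in descending order

B = G[:, :2] * np.sqrt(l[:2])           # B = G_(2) L_(2)^{1/2}
S2 = B @ B.T                            # rank two approximation of S
err_best = np.linalg.norm(S - S2)
# any other pair of eigenvectors does worse, e.g. the trailing pair
B_bad = G[:, -2:] * np.sqrt(l[-2:])
err_bad = np.linalg.norm(S - B_bad @ B_bad.T)
```

The diagonal of $BB^T$ gives the squared arrow lengths, which approximate the variances $s_j^2$ as described in the text.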
Exercise 5.14
When $a = 0$ and $a = 0$, $A = \tilde{Y} G_{(2)} = \sqrt{n-1}\, U_{(2)} L_{(2)}^{1/2}$, and $B = G_{(2)}$. The
distance between rows $i$ and $j$ of the matrix $A$ can be written as
The matrix $(n-1) U_{(2)} L_{(2)} U_{(2)}^T$ is the best rank two approximation of
$\tilde{Y}\tilde{Y}^T = (n-1) U L U^T$. This implies that when $a = 0$ and $a = 0$, the dis-
tance between two units in the biplot can be interpreted as the best rank
two approximation of the Euclidean distance in the original $p$ dimensional
space.
However for $B$, $B B^T = G_{(2)} G_{(2)}^T$ is not the best rank two approximation
of $\hat{\Sigma}_u$.
6
Discriminant Analysis
6.1 Background
In discriminant analysis the multivariate observations are divided into g
groups the membership of which is assumed known without error. The
purpose of the analysis is to develop a rule for the allocation of a new
observation of unknown origin to the most likely group. For example, in
the case of the Swiss bank notes there are two groups, genuine notes and
forgeries . The purpose of the analysis would be to develop a rule for deter-
mining whether or not a new note was genuine.
We start in §6.2 with an outline of some theory for discriminant analysis.
The assumptions are not only that group membership is known, but also
that the observations have a multivariate normal distribution. In the more
general case the observations in each group have both a distinct mean and a
distinct covariance matrix. Application of maximum likelihood theory leads
to a classification rule which, in the space of the variables, has quadratic
boundaries. The more usual case is that of linear discriminant analysis
which arises when the groups have the same covariance matrix although,
of course, differing means.
Mention of the Swiss bank note data as an example was not fortuitous.
Although these data were believed to have two groups, our analysis has
shown that there are three groups and at least one misclassified note. Use
of such data as a training set on the assumption that all observations are
correctly categorised will not lead to optimal discrimination and may, of
course, lead to a very poor rule. Accordingly, we use the forward search to
see how the behaviour of the allocation rule changes as we add observations
to those used in discrimination.
With one group of observations we have seen how the search progresses,
ordering all observations by their Mahalanobis distances. In §6.3 we extend
the forward search to ordering and including units from several popula-
tions. We then, in §6.4, describe the properties of the analysis which it is
informative to monitor during the forward search. These again include Ma-
halanobis distances as well as the probabilities of correct classification of
units and, for linear discriminant analysis, the composition of the planes
dividing the groups. The final theoretical material is in §6.5 where we ex-
tend the material on multivariate Box-Cox transformations of Chapter 4
to discriminant analysis.
Our analyses of data start in §6.6 where we present a first analysis of
data on irises popularised by Fisher. Although the groups have differing
variances, we begin with a linear discriminant analysis. In the following
section we compare linear and quadratic discriminant analyses on some
data on electrodes where the two groups have very different variances.
We return to the iris data in §6.8 where we transform the data to obtain
more nearly equal variances in all groups. Despite the strong evidence for a
transformation, the performance of the linear discriminant analysis is little
affected by the transformation. This group of analyses concludes in §6.9
with the investigation of the effect of the three groups of the Swiss bank
note data on two group discriminant analysis.
The second half of the chapter covers two related analyses. In §6.10 we
analyse a set of simulated data. The data are more complicated than those
analysed earlier and are designed to have a structure similar to data on
muscular dystrophy that are analysed in the succeeding section. Both sets of
data require transformation: in the case of the data on muscular dystrophy
the transformation increases the discriminatory power of easily measured
variables compared with those that are more difficult to measure. We use
the analysis of the simulated data to highlight ways in which a diagnostic
analysis, starting from a fit to all the data, can fail when a complicated
structure of outliers is present in the data. The chapter concludes with
comments on the literature and suggestions for further reading.
Let π_l denote the prior probability of an individual coming from pop-
ulation or group P_l, l = 1, ..., g, where g is the number of populations
considered. If we indicate by f(y|l) the density of the distribution of the
observations for population l, then the posterior probability that unit i
belongs to population l after observing y_i is:
Following the Bayes rule, we choose the population with maximum pos-
terior probability p(l|y_i). If we assume that P_l is a multivariate normal
population with mean μ_l and dispersion matrix Σ_l, the log of the numera-
tor of equation (6.1) becomes

−(v/2) log 2π − (1/2) log|Σ_l| − (1/2)(y_i − μ_l)^T Σ_l^{-1} (y_i − μ_l) + log π_l.   (6.2)
We allocate the unit to that population for which the posterior probability
is highest.

−(v/2) log 2π − (1/2) log|Σ_l| − (1/2)(y_i − μ_l)^T Σ_l^{-1} (y_i − μ_l)   (6.3)

−log|Σ_1| − (y_i − μ_1)^T Σ_1^{-1} (y_i − μ_1) > −log|Σ_2| − (y_i − μ_2)^T Σ_2^{-1} (y_i − μ_2),
(6.4)
a quadratic in y. The effect on the allocation rule of unequal prior proba-
bilities π_1 and π_2 is to change this boundary by a constant amount.
It is convenient to rewrite (6.4) in terms of squared population Maha-
lanobis distances d_l^2. The observation is allocated to population 1 if
(6.5)
(6.6)
that is, Yi is allocated to the group for which its Mahalanobis distance is
least.
It is informative to rewrite (6.6) as
These estimates are then used in place of the known values in the quadratic
discrimination rule (6.3).
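As an illustration, the quadratic rule can be sketched by computing the log numerator of (6.1) as in (6.2) for each group and allocating by the largest value. The group parameters below are hypothetical, and the code is an illustrative sketch, not the authors' implementation:

```python
import numpy as np

def quadratic_scores(y, means, covs, priors):
    """Log numerator of the posterior, as in (6.2), dropping the constant
    -(v/2) log 2*pi which is common to all groups."""
    scores = []
    for mu, Sigma, pi in zip(means, covs, priors):
        d = y - mu
        sign, logdet = np.linalg.slogdet(Sigma)
        maha2 = d @ np.linalg.solve(Sigma, d)  # squared Mahalanobis distance
        scores.append(-0.5 * logdet - 0.5 * maha2 + np.log(pi))
    return np.array(scores)

# Two hypothetical normal populations with different dispersion matrices
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
covs = [np.eye(2), 4.0 * np.eye(2)]
priors = [0.5, 0.5]

# Allocate a unit to the group with the largest score
group = int(np.argmax(quadratic_scores(np.array([0.5, 0.2]), means, covs, priors)))
```

With equal priors and equal covariance matrices the rule reduces to allocating by smallest Mahalanobis distance, as in (6.6).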
Since in this chapter the n sample members are divided into g groups it
is sometimes convenient to relabel the y values. Let y_il = (y_i1l, ..., y_ivl)^T
be the v × 1 vector containing the readings for unit i belonging to group l,
i = 1, ..., n_l and l = 1, ..., g.
If the hypothesis of equality among covariances is true, that is Σ_1 = Σ_2 =
... = Σ_g = Σ, the training sets (n_1, ..., n_g) are pooled for estimation of Σ
to give an overall training set of size n = Σ_{l=1}^g n_l. The estimated within
groups covariance matrix is
(6.9)
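A minimal sketch of the pooled estimate (the grouping and data below are hypothetical; this is an illustration of the estimator in (6.9), not the authors' code):

```python
import numpy as np

def pooled_within_cov(groups):
    """Pooled within-groups covariance: sum each group's centred
    cross-product matrix and divide by n - g, as in (6.9)."""
    g = len(groups)
    n = sum(len(Y) for Y in groups)
    S = sum((len(Y) - 1) * np.cov(Y, rowvar=False, ddof=1) for Y in groups)
    return S / (n - g)

# Hypothetical training sets: two groups with equal dispersion, shifted means
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 3))
B = rng.standard_normal((20, 3)) + 5.0
Sw = pooled_within_cov([A, B])
```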
6.2 An Outline of Discriminant Analysis

6.2.5 Canonical Variates
In contrast to the probabilistic approach of §6.2.1, Fisher (1936) tackled
discrimination from a purely data-based standpoint. He supposed that one
was presented with g independent random samples, of sizes n_1, n_2, ..., n_g
from g multivariate populations and that a method of best distinguishing
among these samples was required. The only assumption he made was
that the dispersion matrices of these populations were equal; otherwise the
populations were completely unspecified. With this assumption the data
can be summarized by computing the sample mean vectors ȳ_l, l = 1, ..., g
and the pooled within-sample covariance matrix Σ̂_w in (6.9).
Fisher then looked for the linear combination z_il = a^T y_il, with a =
(a_1, ..., a_v)^T, that gave maximum separation of the group means, when
measured relative to the within group variance of the data. This is possible
since, if we specify the vector a, we convert each v-variate observation
y_il = (y_i1l, ..., y_ivl)^T into a univariate observation z_il.
Given that the total sum of squares of the z_il can be partitioned into the
sum of between groups (SSB) and within groups (SSW) components,
If we use the unbiased estimators of the within groups and between groups
variances, we can rewrite the former ratio as:
F = {SSB(a)/(g − 1)} / {SSW(a)/(n − g)}.
The larger the value of this ratio, the more variability there is between
groups rather than within groups. The notation SSB(a) and SSW(a) em-
phasizes that the choice of a determines the value of F; different choices of
the coefficients a = (a_1, ..., a_v)^T yield different values for the two sums of
squares and hence different values of F. The best choice of a will clearly be
the one which yields the largest F value. With this choice of a the resulting
values z_il will yield the one-dimensional projection of the sample points that
shows up differences among groups as much as possible.
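Fisher's construction can be sketched as an eigenproblem: the maximizing a is the leading eigenvector of SW^{-1} SB. The code below is an illustrative implementation with simulated groups (all names are ours), not the book's software:

```python
import numpy as np

def first_canonical_variate(groups):
    """Coefficients a maximizing the between/within ratio SSB(a)/SSW(a):
    the leading eigenvector of SW^{-1} SB, a sketch of Fisher's construction."""
    all_y = np.vstack(groups)
    grand_mean = all_y.mean(axis=0)
    v = all_y.shape[1]
    SB = np.zeros((v, v))
    SW = np.zeros((v, v))
    for Y in groups:
        m = Y.mean(axis=0)
        d = (m - grand_mean)[:, None]
        SB += len(Y) * (d @ d.T)         # between-groups sums of products
        Yc = Y - m
        SW += Yc.T @ Yc                  # within-groups sums of products
    vals, vecs = np.linalg.eig(np.linalg.solve(SW, SB))
    a = np.real(vecs[:, np.argmax(np.real(vals))])
    return a / np.linalg.norm(a)

# Two hypothetical groups separated along the first coordinate only
rng = np.random.default_rng(4)
g1 = rng.standard_normal((50, 3))
g2 = rng.standard_normal((50, 3)) + np.array([10.0, 0.0, 0.0])
a = first_canonical_variate([g1, g2])
```

With the groups separated only along the first coordinate, the estimated direction a points almost entirely along that axis.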
To analyse this rule we need the unbiased between groups estimator of
the covariance matrix
With this normalization the canonical variables are not only arranged to
be uncorrelated within groups and between groups (and consequently over
the whole sample) but also share the property of having equal variance (Exer-
cise 6.7).
Once the linear discriminant function (first canonical variable) has been
calculated, an observation y_i can be allocated to one of the g populations on
the basis of its "discriminant score" a^T y_i. The sample means have scores
a^T ȳ_l = z̄_l. Then y_i is allocated to population j, the population whose mean
score is closest:

|a^T y_i − a^T ȳ_j| < |a^T y_i − a^T ȳ_l|   for l ≠ j = 1, ..., g.
It is possible to show (Exercise 6.4) that when there are only two groups
the first and unique canonical eigenvector of the matrix Σ̂_W^{-1} Σ̂_B is given by
allocate y to P1 if (6.14)
and to P2 otherwise.
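For two groups the resulting linear rule can be sketched directly: the direction is proportional to SW^{-1}(ȳ_1 − ȳ_2), and the boundary lies midway between the two mean scores. This is an illustrative sketch with hypothetical data, not the authors' implementation of (6.14):

```python
import numpy as np

def two_group_rule(Y1, Y2):
    """Sample linear rule for two groups: a is proportional to
    SW^{-1}(ybar_1 - ybar_2); allocate to group 1 when the score a^T y
    lies on group 1's side of the midpoint between the mean scores."""
    m1, m2 = Y1.mean(axis=0), Y2.mean(axis=0)
    SW = ((len(Y1) - 1) * np.cov(Y1, rowvar=False)
          + (len(Y2) - 1) * np.cov(Y2, rowvar=False))
    a = np.linalg.solve(SW, m1 - m2)
    cut = a @ (m1 + m2) / 2.0
    return lambda y: 1 if a @ y > cut else 2

# Hypothetical training sets for the two groups
rng = np.random.default_rng(5)
Y1 = rng.standard_normal((40, 2))
Y2 = rng.standard_normal((40, 2)) + np.array([4.0, 0.0])
rule = two_group_rule(Y1, Y2)
```

Because SW is positive definite, each group's sample mean always falls on its own side of the boundary.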
The allocation rule given by (6.14) is exactly the same as the sample
maximum likelihood rule for two groups from the multivariate normal dis-
tribution with the same covariance matrix given in (6.11). However, the
justifications for this rule are quite different in the two cases. In (6.11)
there is an explicit assumption of multivariate normality, whereas in (6.14)
we have merely sought a sensible rule based on a linear function of y. Thus
we might hope that this rule will be appropriate for populations where
the hypothesis of multivariate normality is not exactly satisfied. However,
Fisher based his rule solely on the first two moments of the data. Results
on the characterization of distributions show that these are the sufficient
statistics for members of the elliptical family, provided any other parame-
ters are known. For example, if the family is multivariate t, the degrees of
freedom would need to be known. Since the normal distribution is a mem-
ber of this family, the relationship is perhaps not so surprising. Details of
the elliptical family are given, amongst others, by Muirhead (1982, p. 34).
For g ≥ 3 groups the allocation rule based on the first canonical variate
and the sample maximum likelihood rule for multivariate normal popu-
lations with the same covariance matrix will not be the same unless the
sample means are collinear.
over-optimistic assessment of the success rate of the allocation rule and
may give misleading results unless sample sizes are very large.
A reliable estimate of the error rate will only be obtained if the data used
in the assessment of the rule are different from the data that are used in the
formulation of the rule. This is essentially the principle of cross validation.
The simplest implementation of this principle is to split each training set
randomly into two portions and then to use one portion of each training set
for estimation of the allocation rule itself and the other portion to assess its
performance by finding the proportion of individuals misallocated by the
rule. This approach is known as sample splitting. The main drawback of this
approach is that unless initial sample sizes are very large, the estimation
of the allocation rule and the assessment of its performance will be based
on small samples and hence will be subject to large sampling fluctuations.
In addition, we have to remember that any future allocations will be made
according to a rule based on the whole of the training set not just on
a random portion of them. Thus, the rule whose performance is being
assessed by sample splitting is not the rule that will be used in the future.
In order to overcome the problems associated with the two previous
methods the leave one out method has been suggested. A review is given
by Krzanowski and Hand (1997). The technique consists of determining
the allocation rule using the sample data minus one observation, and then
using the consequent rule to classify the omitted observation. Repeating
this procedure by omitting each of the units in the training sets in turn
yields, as estimates of the error rates, the proportion of misclassified ob-
servations in the training sets. The problem of all these approaches is that
they may produce biased estimates if multiple outliers are present in the
data. We therefore monitor the misclassification rate of all units throughout
the forward search, that is
Thus, in the constrained search, the subset in every step of the forward
search must contain proportions of units which agree, as closely as possible,
with the proportions in the overall sample.
(6.15)
on the included units for each m. We then generate forward plots of the
quantities customarily calculated when all the data are fitted, that is when
m = n . In this section we describe the most informative of these quantities.
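The leave one out idea described above can be sketched generically. The nearest-mean rule below is a deliberately simple stand-in for a discriminant rule, and all names and data are hypothetical:

```python
import numpy as np

def loo_error_rate(X, labels, fit_rule):
    """Leave-one-out estimate of the misclassification rate: refit the
    allocation rule without unit i, then classify the omitted unit i."""
    n = len(X)
    wrong = 0
    for i in range(n):
        keep = np.arange(n) != i
        rule = fit_rule(X[keep], labels[keep])
        wrong += int(rule(X[i]) != labels[i])
    return wrong / n

def nearest_mean_rule(X, labels):
    """A simple stand-in allocation rule: allocate to the nearest group mean."""
    classes = np.unique(labels)
    means = np.array([X[labels == c].mean(axis=0) for c in classes])
    return lambda y: classes[np.argmin(((means - y) ** 2).sum(axis=1))]

# Two well-separated hypothetical groups
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 1.0, (25, 2)), rng.normal(4.0, 1.0, (25, 2))])
labels = np.repeat([0, 1], 25)
rate = loo_error_rate(X, labels, nearest_mean_rule)
```

Unlike resubstitution, each unit is classified by a rule that never saw it, so the estimate is not biased towards optimism.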
Outliers and influential observations can be detected by simple graphical
displays of statistics involved in the forward search. It is extremely useful
to monitor particular Mahalanobis distances such as

d_[m_l],   m = m_0, ..., n;   l = 1, ..., g   (6.16)

and

d_[m_l+1],   m = m_0, ..., n − 1;   l = 1, ..., g.   (6.17)
Statistics in equations (6.16) and (6.17) respectively refer, for each group,
to the maximum Mahalanobis distance in the subset and the minimum
Mahalanobis distance among the units not belonging to the subset.
If the dispersion among the groups is markedly different, the curve of
d_[m_t+1] never overlaps that of d_[m_l+1], l ≠ t = 1, ..., g. Moreover, these
curves give, for each group, a series of outlier tests comparing the observa-
tion about to be introduced with those already in. If one or more atypical
observations are present in the data, the plot of d_[m_l+1] must show a peak
in the step prior to the inclusion of the first outlier. On the contrary, the
plot which monitors d_[m_l] shows a sharp increase when the first outlier joins
the subset. This curve may also show a subsequent decrease due to the masking
effect.
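The two monitored statistics can be sketched for a single group as follows (simulated data with one planted outlier; an illustration, not the authors' implementation):

```python
import numpy as np

def monitored_distances(Y, in_subset, mean, cov):
    """For one group: the maximum Mahalanobis distance of units in the
    subset and the minimum distance of units outside it, the two
    quantities monitored in (6.16) and (6.17)."""
    inv = np.linalg.inv(cov)
    D = Y - mean
    d = np.sqrt(np.einsum('ij,jk,ik->i', D, inv, D))
    return d[in_subset].max(), d[~in_subset].min()

# Hypothetical group: 20 clean units in the subset, one outlier outside it
rng = np.random.default_rng(3)
Y = np.vstack([rng.standard_normal((20, 2)), [[10.0, 10.0]]])
in_subset = np.arange(len(Y)) < 20
mean = Y[in_subset].mean(axis=0)
cov = np.cov(Y[in_subset], rowvar=False)
d_max_in, d_min_out = monitored_distances(Y, in_subset, mean, cov)
```

With the outlier still excluded, the minimum distance outside the subset greatly exceeds the maximum inside, which is exactly the peak described above.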
The details of these curves depend importantly on whether the search is
constrained or unconstrained and on what departures, if any, are present.
The search progressively includes units with small Mahalanobis distances.
First suppose that there are no outliers and that quadratic discriminant
analysis is used if appropriate. Then, with a constrained search, the number
of units in the subsets will be balanced and, for example, for each group,
the minimum distance of units not in that group will be similar for all
groups. The progress and output of the search will be similar in all groups.
Now suppose one group contains a set of k outliers. In the unconstrained
search these k units will enter last, so that, before they enter, the proportion
of units from this group in the subset will be lower than the ratio R_l.
Alternatively, if a balanced search is used, these units will be forced to
enter to keep the ratio close to R_l. However, before they enter, the minimum
distance within group l of units not in the subset will be larger than for
outlier free groups.
The same effect is seen if linear discriminant analysis is used when the
variances of the groups are different. Then, in an unconstrained search,
the units of a group with small variance will tend to be included by the
unconstrained search before those from a group with larger variance. These
effects are illustrated in our analysis of the electrode data, for example in
Figures 6.8 and 6.9.
6.5 Transformations to Normality in Discriminant Analysis
m = m_0, ..., n;   (6.18)

L(λ) = · · ·   (6.21)

where z_il = (z_i1l, ..., z_ivl)^T is the v × 1 vector which denotes the trans-
formed data for unit i coming from group l, and μ_l(λ) and Σ_l(λ) are respectively
the mean vector and the covariance matrix for population l. Substituting
the maximum likelihood estimates μ̂_l(λ) and Σ̂_l(λ) for given λ into equa-
tion (6.21), twice the profile loglikelihood can be written as:

(6.24)
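Assuming the standard form of the profile loglikelihood for a common λ across groups (each group contributing −n_l log|Σ̂_l(λ)|, plus the Jacobian term 2(λ − 1) Σ log y), the comparison of candidate values of λ can be sketched as follows; the data are simulated and the function names are ours:

```python
import numpy as np

def box_cox(Y, lam):
    """Box-Cox transformation of positive data; lambda = 0 gives the log."""
    Y = np.asarray(Y, dtype=float)
    return np.log(Y) if lam == 0 else (Y ** lam - 1.0) / lam

def twice_profile_loglik(groups, lam):
    """Twice the profile loglikelihood for a common lambda, up to an
    additive constant: each group contributes -n_l log|Sigma_hat_l(lam)|,
    plus the Jacobian term 2(lambda - 1) sum(log y). A sketch of the
    quantity maximized over lambda."""
    total = 0.0
    for Y in groups:
        Z = box_cox(Y, lam)
        Zc = Z - Z.mean(axis=0)
        sign, logdet = np.linalg.slogdet(Zc.T @ Zc / len(Y))
        total += -len(Y) * logdet + 2.0 * (lam - 1.0) * np.log(Y).sum()
    return total

# Hypothetical lognormal groups: the log transformation (lambda = 0)
# should beat no transformation (lambda = 1)
rng = np.random.default_rng(6)
groups = [np.exp(rng.standard_normal((40, 2))),
          np.exp(rng.standard_normal((40, 2)) + 0.5)]
better = twice_profile_loglik(groups, 0.0) > twice_profile_loglik(groups, 1.0)
```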
6.6 Iris Data

1. Iris setosa
2. Iris versicolor
3. Iris virginica.

y1: sepal length
y2: sepal width
y3: petal length
y4: petal width.
The data have been much analysed; they were published by Anderson
(1935) from measurements taken on plants in the Gaspé Peninsula, Quebec.
The three species are blue-flowered water-loving irises, or flags, similar
to the European yellow flag. Iris versicolor is the emblematic flower of
Quebec province. The data were analysed by Fisher (1936) as an example
of discriminant analysis and are often known as "Fisher's Iris Data". They
are in Table A.10 and are also given, for example, by Krzanowski (2000,
pp. 46-47) and by Mardia, Kent, and Bibby (1979, pp. 6-7).
Although the data are frequently taken as a standard example for dis-
criminant analysis, they have several interesting features. They are often
analysed on the original scale (Venables and Ripley 1994, p. 307) but some-
times logs are taken (Venables and Ripley 1994, p. 316). It is customary to
use linear discriminant analysis which assumes that the three groups have
equal covariance matrices, but there is strong evidence that this is not the
case, for example from the test for equality of variances (2.23).
Figure 6.1 is a scatterplot matrix of the four variables, plotted with a
symbol for each of the three species. The plot of y3 against y4, that is petal
length and petal width, shows that one species (Iris setosa) is completely
separated from the other two. The robust bivariate boxplots in this panel
of the figure enable us to see that there is also good separation, in these
two dimensions, between the other two species. We may suspect that dis-
crimination will not be very much affected by whether we use the original
or transformed data. That the variances of the measurements in the three
groups are not the same seems evident from the plot and is emphasized by
the three univariate boxplots in panels (3,3) and (4,4) of the plot, which
summarize the values of y separated by group. The bivariate boxplot for
y3 and y4 clearly shows that the variability of the three groups increases
with the size of the measurements on petals.
In our first analysis we use linear discriminant analysis and a common
covariance matrix for all groups. We compare linear and quadratic discrim-
inant analyses for our second example, the electrodes data which are the
subject of §6.7. Because of the differing variances in the three groups of the
iris data we use a constrained forward search. With the same number of
individuals in each species, this means that, when m is a multiple of three,
the subset will contain equal numbers of observations from each species.
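One step of such a constrained search can be sketched as follows: among the most under-represented groups, the unit outside the subset with the smallest distance is added. This is a simplified illustration with made-up distances and group labels, not the authors' algorithm:

```python
import numpy as np

def next_unit_constrained(d, group, in_subset, target_props):
    """One step of a balanced (constrained) forward search: among the
    most under-represented group(s), add the unit outside the subset
    with the smallest Mahalanobis distance. A simplified sketch."""
    m = in_subset.sum()
    counts = np.bincount(group[in_subset], minlength=len(target_props))
    deficit = np.asarray(target_props) * (m + 1) - counts
    most_needed = np.flatnonzero(deficit == deficit.max())
    candidates = np.flatnonzero(np.isin(group, most_needed) & ~in_subset)
    return candidates[np.argmin(d[candidates])]

# Hypothetical state: group 0 is over-represented, so the next unit must
# come from group 1 even though a group-0 unit has a smaller distance
d = np.array([0.1, 0.2, 0.05, 0.3, 0.5, 0.4])
group = np.array([0, 0, 0, 1, 1, 1])
in_subset = np.array([True, True, False, True, False, False])
nxt = int(next_unit_constrained(d, group, in_subset, [0.5, 0.5]))
```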
FIGURE 6.1. Iris data: scatterplot matrix with univariate and bivariate boxplots
Figure 6.2 shows two plots of Mahalanobis distances from this search.
The first panel shows the maximum Mahalanobis distances of those units
which belong to the subset for each group. The second panel shows the
minimum distance for each group of those units not in the subset - apart
from the constraint caused by the need to keep group sizes equal, these
would be the next units to be included in the subset. The first thing to
notice is the very different sizes of the distances for the groups. If a different
covariance matrix were used for each group, we know from Exercise 2.12
that the distances for the m_l units in the subset for the lth group would
sum to v(m_l − 1). The pattern shown here is further evidence that the
covariance matrices of the three groups are not equal. More important,
however, is the behaviour of the distances for Group 1. The observations
entering Group 1 from 137 onwards (and which give large distances in the
FIGURE 6.2. Iris data. Maximum Mahalanobis distances, for the three groups,
of units belonging to the subset and minimum distances for those units not be-
longing: reading upwards in the left half of the plots, Groups 1, 2 and 3
"" ! "1
"'
~;~(:
++f +
+
+..., + <D
~:~
.c
:y
.c
f ~ "'
.!!!
s
:2 • 6 ,.:~. llf:l.
l "' ·~ 66 o 0 15 o 10 ~ .., /f
""
6o&f'M
Jl " oOao o :i< a..
6 ~" §a~oo o 33
es o o "' "'0 ~fA.O34 \,
"2 0
"80 .e ~'Th~ ~
2 .0 3.0 4 .0 2 .0 3 .0 4 .0 2 .0 3.0 4 .0
FIGURE 6.3. Iris data. Three scatterplots of pairs of variables showing, for Group
1 (diamonds), the last units tobe included in the forward search. Theseare the
observations yielding the !arge Mahalanobis distances in Figure 6.2
right-hand panel from 133 on as the next to enter) are 33, 34, 15, 16 and
42. If these were outliers and the covariances were estimated independently
for each group, the effect of these additions on the Mahalanobis distances
would rapidly die down; inclusion of these units in the estimation of the
covariance matrix would lead to masking. But here, a succession of outliers
enters from one group only and so has only a partial effect on the common
covariance matrix. They therefore remain visible in the plot.
Figure 6.3 shows scatterplots including the numbers of the units which
give the large increases in Mahalanobis distances for Group 1. The last of all
to enter is unit 42, very much an outlier from Group 1. The other four units
form a cluster at the other end of the group. It may seem surprising that
these units appear so outlying. This is caused by the common covariance
matrix which is being fitted to the three groups. If these five units are
excluded, the bivariate scatters of Group 1 are more like those of the other
groups. This reconciliation is strongest in Panel 1 of the figure. However,
as the bivariate boxplots of Figure 6.1 emphasize, the orientation of the
bivariate distribution, particularly for y2 and y4 in Group 1, is quite different
from that of the readings in the other two groups. The oblique common
covariance matrix increases the outlyingness of these five observations.
The discussion of Figures 6.2 and 6.3 shows the similarities and differ-
ences in the forward search and the Mahalanobis distances when there are
several groups fitted simultaneously rather than the one fitted distribution
in the previous five chapters. Now we turn to the main feature of discrim-
inant analysis, which is the assessment of the probabilities of group mem-
bership and the establishment of boundaries between the groups. What
we are particularly interested in, of course, is how these properties change
during the search and what such changes, if any, reveal about the structure
of the data.
We use the forward search to monitor the evolution of the posterior
probabilities as observations are included in the subset. We can then both
detect influential observations and determine the effect of each unit on the
posterior probabilities, so monitoring the performance of the allocation rule.
The forward search is on the Mahalanobis distances, which we showed in
equation (6.15) are strongly linked to changes in the posterior probabilities.
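The posterior probabilities being monitored can be computed stably on the log scale. A sketch under a common covariance matrix (the three group means below are hypothetical, merely echoing a three-group setting like the iris data):

```python
import numpy as np

def lda_posteriors(y, means, pooled_cov, priors):
    """Posterior probabilities that unit y belongs to each group, under a
    common covariance matrix, computed via the log-sum-exp trick."""
    inv = np.linalg.inv(pooled_cov)
    logp = np.array([-0.5 * (y - m) @ inv @ (y - m) + np.log(p)
                     for m, p in zip(means, priors)])
    w = np.exp(logp - logp.max())   # subtract max for numerical stability
    return w / w.sum()

# Hypothetical three-group setting with equal priors
means = [np.array([0.0, 0.0]), np.array([3.0, 0.0]), np.array([6.0, 0.0])]
pooled_cov = np.eye(2)
priors = [1.0 / 3.0] * 3
post = lda_posteriors(np.array([0.2, -0.1]), means, pooled_cov, priors)
```

Recomputing these probabilities at each step m of the search gives forward plots of the kind shown in Figure 6.4.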
Figure 6.4 monitors the calculated posterior probabilities that observa-
tions in Groups 2 and 3 belong to those groups. The plots start with a
subset size of 102, out of the total 150 observations. During this period
only four units from Group 2 ever have posterior probabilities less than
0.6: two, units 71 and 84, are finally misclassified. For Group 3 only unit
134 is misclassified. In the earlier part of the search shown in the left-hand
panel most units have a classification probability close to one; some of these
values decrease slightly towards the end of the search as more extreme ob-
servations enter the subset and the distinction between groups becomes
slightly blurred. This effect is much more noticeable in the upper right-
hand part of the right-hand panel of the figure. The difference between
the two panels in this respect is caused by the differing variances of the
observations in the two groups to which we are fitting the same covariance
matrix. We do not show the plot for Group 1 as all units in every step of the
search are correctly classified with posterior probabilities of at least 0.99.
As with most analyses, the pattern in Figure 6.4 is stable to the contour
of the robust boxplot used to choose the initial subset.
In line with our contention of the importance and usefulness of returning
from the forward analysis to further inspection of the data, we now give
interpretations of these findings in the space of the original data. Figure 6.5
shows the scatterplot for sepal width and petal width for units in Groups
2 and 3. Those which were sometimes misclassified are represented by filled
symbols. For two of the three units (71, 84 and 134) which are misclassified
FIGURE 6.4. Iris data. Posterior probabilities, as a function of subset size, that
observations in Groups 2 and 3 respectively belong to those groups
at the end of the forward search, there are units in the other group which
have identical observed values for these two variables. More precisely unit
71 (coordinates 3.2 and 1.8) presents the same values as unit 126 and unit
134 (2.8 and 1.5) overlaps with unit 55. Examination of the scatterplot ma-
trix with brushing shows that units 134 and 55 are very close to each other
in all bivariate scatterplots. On the left side of Figure 6.5, unit 69 (coordi-
nates 2.2 and 1.5) overlaps with unit 120. Among the units of Group 3, 120
is, apart from 134, the one which shows the smallest posterior probability
(0.779) in the last step of the forward search.
The discriminant line dividing the two groups is not shown in Figure 6.5,
but passes close to units 73 (2.5, 1.5), 78 (3, 1.7) and 139 (3, 1.8). As Fig-
ure 6.4 shows, the posterior probability of observation 73 fluctuates appre-
ciably even though this unit is always categorised correctly from m = 136
onwards. In all steps of the forward search, unit 78 always has a posterior
probability around 0.65. Unit 139 is, apart from 134, the one in Group 3
showing the smallest posterior probability in almost all steps of the forward
search.
The two remaining crosses in Figure 6.5 which appear close to triangles
refer to units 135 (2.6 and 1.4) and 130 (3.0 and 1.6). Unit 135 is the last of
the third group to be included in the forward search: it has a final posterior
probability of 0.934. During the forward search unit 130 generally shows a
posterior probability around 0.95 (the final value is 0.896).
There remains unit 84 (2.7, 1.6), the third to be misclassified at the end
of the forward search. It is included when m = 142. Thereafter the posterior
probability that this unit belongs to Group 2 tends generally to increase.
Its final posterior probability is 0.143. An analysis of the scatterplot matrix
reveals that, in almost all the bivariate plots, this unit is surrounded by
some observations belonging to Group 3.
FIGURE 6.5. Iris data. Scatterplot of sepal width against petal width showing,
by filled symbols, the units sometimes misclassified. Triangles are for Group 2,
crosses and diamonds for Group 3
Our analysis of the iris data shows, we believe, that the forward search
technique in discriminant analysis is an extremely useful tool. As a result
we can:
1. Highlight the units which are always classified correctly with high
posterior probability in each step of the search. These can be sepa-
rated from those units which are declared correctly only when they
are included in the allocation rule;
2. See the evolution of the degree of separation or overlapping among
the groups as the subset size increases and determine the relationship
with those units which have a posterior probability close to 0.5;
3. Monitor the stability of the allocation rule with respect to different
sample sizes;
4. Determine the influence of observations by separating the units with
the biggest Mahalanobis distances into two groups: those which have
an effect on the posterior probabilities and those which leave them
unaltered.
In our example, monitoring the posterior probabilities enables us to dis-
tinguish the units whose posterior probabilities tended to increase as the
sample size grew (e.g. units 69 and 73), those whose posterior probability
was close to 0.5 (e.g. units 78, 71 and 134) and those which were always
completely misclassified (e.g. unit 84). If we have to classify a new unit we
can monitor its posterior probability at each step of the forward search. In
this way we can have an idea about the stability of the associated alloca-
tion and therefore which and how many observations are responsible for its
allocation to a particular group.
6.7 Electrodes Data
The main purpose of this section is to investigate the contrasting proper-
ties of linear and quadratic discriminant analysis. We also study the effect
of using a balanced as opposed to an unbalanced forward search. For our
comparisons to be effective we again need a set of data in which the groups
have differing variances. For this purpose we use data from an unpublished
University of Berne Ph.D. thesis by Kreuter. The data, given and described
by Flury and Riedwyl (1988, pp. 128-132), are measurements from two ma-
chines manufacturing supposedly identical electrodes. We give the numbers
in Table A.11.
The electrodes are shaped rather like nipples. There are five measure-
ments on each: y1, y2 and y5 are diameters, while y3 and y4 are lengths
(there is a trivial misprint in Flury and Riedwyl's description of the data)
and fifty electrodes from each machine have been measured. For reasons of
commercial secrecy, the data have been transformed by subtracting con-
stants from the variables. Flury and Riedwyl comment that this shift in
location does not affect discrimination. Whilst this is true in a limited way,
the subtraction of these unknown quantities means that it is not at all
straightforward to investigate power transformations of the data to achieve
homogeneity of variance. The difficulty arises because estimation of the
subtracted constants leads to a distribution where the range of the data
depends on the parameter values. The resulting likelihood is unbounded
when it is estimated that the smallest observed values of each Yi have been
added to the original readings. There may also be local maxima of the
likelihood. A fuller discussion of the difficulties of the Box and Cox family
when shift parameters have to be estimated is given by Atkinson, Pericchi,
and Smith (1991).
Figure 6.6 shows the scatterplot matrix of the data with superimposed
robust boxplots. The data have been slightly jittered for this plot as there
is appreciable overlap of values in some of the variables. The univariate
boxplots on the diagonal of the figure show that some variances are larger
for Group one, others for Group two. Furthermore, the readings with the
larger means do not always have the larger variances, so that power trans-
formations would be unlikely to yield constancy of variance even if the data
did not have a shifted location. The bivariate boxplots in the figure show
that not only are the variances of the variables of differing magnitudes, but
also the covariances in the two groups sometimes differ. For example, in the
panel for y3 and y5, the major axes of the covariance matrices are virtually
FIGURE 6.6. Electrodes data: scatterplot matrix with univariate and bivariate
boxplots (data have been slightly jittered). Units in Group one are represented
by circles
orthogonal. The plot also shows an almost clear separation of the groups
on y4. A question of interest will therefore be whether the remaining four
variables provide any extra information for discrimination.
We start with linear discriminant analysis and initially compare the bal-
anced and unbalanced searches. Figure 6.7 shows forward plots of the poste-
rior probabilities that units in Group 1 belong to that group: the probabili-
ties for the balanced search are in the upper panel, those for the unbalanced
search in the lower. The most obvious difference in the two panels is that
units 6 and 8 are correctly classified around 15 steps earlier in the balanced
search than in the unbalanced one. The behaviour of the other appreciable
outlier, unit 9, is broadly similar in both panels. This unit is visible in the
plot for y 4 and y5 as a circle on the edge of a duster of triangles. Whichever
6.7 Electrodes Data 319
[Figures 6.7-6.9 appear here: forward plots against subset size m, with panels labelled (a) and (b).]
selection are plotted in Figures 6.8, for the balanced search, and 6.9, for
the unbalanced one. The left-hand panel of Figure 6.8 shows the maximum
distance, at each step of the search, of units belonging to the subset; the
right-hand panel shows the minimum distance for units not belonging. The
two plots are similar: for most of the search the distances for units in Group
1 are larger, reflecting the more dispersed nature of this group, which gives
rise to larger distances when a common covariance matrix is used.
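The common-covariance distances that drive these plots can be sketched in a few lines. The Python/NumPy fragment below is our own illustration, not the authors' code; the function name `pooled_mahalanobis` and the simulated two-group data are assumptions made only for this sketch:

```python
import numpy as np

def pooled_mahalanobis(Y, groups):
    # Squared Mahalanobis distance of each unit from its own group mean,
    # using the pooled (common) covariance estimate of linear
    # discriminant analysis.
    Y = np.asarray(Y, float)
    labels = np.unique(groups)
    n, v = Y.shape
    W = np.zeros((v, v))          # within-group sums of cross-products
    means = {}
    for l in labels:
        Yl = Y[groups == l]
        means[l] = Yl.mean(axis=0)
        R = Yl - means[l]
        W += R.T @ R
    S = W / (n - len(labels))     # pooled covariance estimate
    Sinv = np.linalg.inv(S)
    d2 = np.empty(n)
    for i in range(n):
        r = Y[i] - means[groups[i]]
        d2[i] = r @ Sinv @ r
    return d2

# two simulated groups, the second deliberately more dispersed
rng = np.random.default_rng(0)
Y = np.vstack([rng.normal(0, 1, (50, 5)), rng.normal(1, 2, (50, 5))])
groups = np.array([1] * 50 + [2] * 50)
d2 = pooled_mahalanobis(Y, groups)
```

With a common covariance matrix the more dispersed group shows the larger distances, which is the behaviour described above for Group 1.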
In the absence of any constraint on the entry of units, those with smaller
Mahalanobis distances enter earlier. Of course, the entry of units alters the
estimates of the mean and variance and so changes the relative distances
of the units. But, from the discussion of Figure 6.8, we might expect that
units from Group 2 would enter earlier in the absence of the constraint. The
right-hand panel of Figure 6.9 shows that this is so, since the dotted line
associated with the minimum Mahalanobis distance for Group 2 terminates
when m = 94. From m = 96 only units from Group 1 enter the subset.
Figure 6.9 also shows the resulting effect on the distances in the two groups.
Whether these are the maximum of those inside or the minimum of those
outside, they are now much more equal. A consequence of the later entry
of some units from Group 1 has already been seen in the lower panel of
Figure 6.7 where the delayed entry of units 6 and 8 causes them to be
misclassified until much later in the search.
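The two kinds of forward-search step can be sketched as follows. The helper names `next_subset_unbalanced` and `next_subset_balanced` are hypothetical, and this is an illustrative Python version of the idea only, not the authors' implementation:

```python
import numpy as np

def next_subset_unbalanced(d2, m_new):
    # Unconstrained step: the new subset holds the m_new units with the
    # smallest squared Mahalanobis distances, whatever their group.
    return np.sort(np.argsort(d2)[:m_new])

def next_subset_balanced(d2, groups, m_new, proportions):
    # Balanced step: group sizes in the subset are kept (approximately)
    # proportional to the group sizes in the full sample; within each
    # group the units with the smallest distances are taken.
    labels = list(proportions)
    raw = {l: m_new * proportions[l] for l in labels}
    alloc = {l: int(raw[l]) for l in labels}
    rem = m_new - sum(alloc.values())
    # hand out any remaining places by largest fractional remainder
    for l in sorted(labels, key=lambda l: raw[l] - alloc[l], reverse=True)[:rem]:
        alloc[l] += 1
    chosen = []
    for l in labels:
        idx = np.where(groups == l)[0]
        chosen.extend(idx[np.argsort(d2[idx])[:alloc[l]]])
    return np.sort(np.array(chosen))

rng = np.random.default_rng(0)
d2 = rng.random(100)
groups = np.array([1] * 60 + [2] * 40)
sub_u = next_subset_unbalanced(d2, 50)
sub_b = next_subset_balanced(d2, groups, 50, {1: 0.6, 2: 0.4})
```

In the balanced subset the group proportions are held at those of the full sample, so units of one group cannot crowd out the other even when their distances are systematically smaller.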
It is not necessary to choose between the two searches; our purpose is
to explore ways in which individual observations affect discrimination. But
we do need to choose between linear and quadratic discriminant analysis
when finally establishing a discrimination procedure.
The posterior probabilities for units in Group one of belonging to that
group are plotted in Figure 6.10. Those for the balanced search are in the
[Figure 6.10 appears here: forward plots of posterior probabilities against subset size m.]
upper panel and those for the unbalanced search are in the lower panel.
As in Figure 6.7, several units are only correctly classified later in the
search when the search is unbalanced. However, the important comparison
between Figures 6.7 and 6.10 is the much higher misclassification rate for
quadratic discrimination until, particularly for the unbalanced search, six
or seven observations from the end of the search. The effect is caused by the
increased number of parameters to be estimated when a different covariance
matrix is fitted to each group. Although observations included in the subset
will be better fitted by the individual matrices, the variance of prediction
for observations outside the subset is increased by the increased number of
parameters.
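The contrast between the two procedures comes down to which covariance matrices enter the normal densities. The Python sketch below is our own, with made-up names: passing the pooled matrix for every group gives the linear posteriors, while individual matrices give the quadratic ones.

```python
import numpy as np

def posteriors(Y, means, covs, priors):
    # Posterior group probabilities under multivariate normal models.
    # Same (pooled) covariance for every group -> linear discrimination;
    # an individual covariance per group -> quadratic discrimination.
    Y = np.asarray(Y, float)
    logd = []
    for mu, S, p in zip(means, covs, priors):
        Sinv = np.linalg.inv(S)
        _, logdet = np.linalg.slogdet(S)
        R = Y - mu
        d2 = np.einsum('ij,jk,ik->i', R, Sinv, R)   # quadratic forms
        logd.append(np.log(p) - 0.5 * (logdet + d2))
    L = np.array(logd).T
    L -= L.max(axis=1, keepdims=True)   # stabilise the exponentials
    P = np.exp(L)
    return P / P.sum(axis=1, keepdims=True)

# tiny illustration with two well-separated groups and equal priors
mu1, mu2 = np.zeros(2), np.array([4.0, 4.0])
S = np.eye(2)
P = posteriors(np.array([[0.0, 0.0], [4.0, 4.0]]),
               [mu1, mu2], [S, S], [0.5, 0.5])
```

The quadratic version simply replaces `[S, S]` by the two separate estimated matrices; the extra parameters this introduces are exactly what inflates the prediction variance discussed above.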
We saw in Figure 6.6 that there is good, although not perfect, separation
between the two groups from the values of y4. We now turn to the
question as to whether any extra discrimination is achieved by including
the remaining four variables.
Figure 6.11 is a forward plot of the elements of the standardized canonical
eigenvector of the plane separating the two populations. For this we have
reverted to linear discriminant analysis. The left-hand plot is for a balanced
search and the right-hand panel is for an unbalanced one. The two panels
are very similar.
In both the panels there is a large contribution, as we would expect, from
y4. The next largest contribution, around -0.5, is from y1. The panel for
y1 and y4 in the scatterplot of Figure 6.6 shows that a straight line can
[Figure 6.11 appears here: forward plots of the elements of the standardized canonical eigenvector; left panel, balanced search; right panel, unbalanced search.]

FIGURE 6.12. Electrodes data: estimated densities of the two populations for y2
using means and variances based on S^(89) (panels on the left) and on S^(100) (panels
on the right). Panels on top show the estimated densities in the interval μ̂2 ± 3σ̂2.
Panels at the bottom show the estimated densities in the range of variation of
the data
completely separate the two groups. The other panels in the column of the
plot for y4 also show virtually complete separation, except for unit 9 which
is sometimes surrounded by units from Group 2.
FIGURE 6.13. Electrodes data: estimated densities of the two populations for
variables y1, y3, y4 and y5 using means and variances based on S^(n). The greatest
separation by far is in y4
FIGURE 6.14. Iris data: forward plots from linear discriminant analysis. Upper
panel, transformation parameters; lower panel, likelihood ratio test for the
hypothesis of no transformation
FIGURE 6.15. Iris data: profile loglikelihood surfaces for the four transformation
parameters at the end of the search
FIGURE 6.16. Iris data: forward plots of likelihood ratio tests for
transformation. Left panel, H0: λ = (0.5, 0.5, 0.5, 0.5)^T; right panel,
H0: λ = (-0.5, 0.5, 0.5, 0.5)^T. Both hypotheses are acceptable
6.8 Transformed Iris Data 327
FIGURE 6.17. Iris data: scatterplot matrix with univariate and bivariate boxplots
(after transforming the data with all λj = 0.5). Comparison with Figure 6.1 shows
that the effect of transformation in equalizing variances is most evident in the
bivariate boxplot for variables 3 and 4; the univariate boxplots for variables 3
and 4 now show spreads closer to those of variables 1 and 2
We now analyse the data with a constrained forward search. The for-
ward plot of the Mahalanobis distances is in Figure 6.18. Comparison with
Figure 6.2 shows the effect of the transformation. Because the three groups
now have more nearly equal variances, both the maximum distances for
units within the group and the minimum to those outside the group are
more nearly equal. This is shown by the increased closeness of the dis-
tances for the three groups. A second effect of the transformation is that
the outliers in Group 1 are now much less severe.
The scatterplot of Figure 6.17 and the forward plots of Figure 6.18 are
both indications that the data after transformation more nearly satisfy the
[Figure 6.18 appears here: forward plots of Mahalanobis distances against subset size m, panels (a) and (b).]
FIGURE 6.19. Swiss bank notes: linear discriminant analysis. Posterior probability,
as a function of subset size, that, upper panel, observations in Group 1
belong to that group and, lower panel, that observations in Group 2 belong to
Group 2
what happens when the number of groups is larger than specified. For this
purpose we look once again at the data on Swiss bank notes. Our earlier
analysis showed that there was one observation from Group 1, unit 70,
which should have been classified as a forgery, and that the second group,
the forgeries, could be split into two groups, the smaller containing 15 units.
We see how the forward search applied to discriminant analysis reveals this
structure.
We start with linear discriminant analysis and, as throughout this section,
a balanced search. The uneventful upper panel of Figure 6.19 shows
that all units in Group 1 are correctly classified throughout, except for unit
70 which is, correctly we believe, classified as a forgery towards the end of
the search. The lower panel of the figure, for Group 2, is more eventful:
all units are eventually classified as belonging to Group 2, but two are
consistently misclassified until, in the case of unit 116, m = 171.
Figure 6.19 does not reveal the two groups of forgeries. These are clearly
revealed in Figure 6.20, which is a forward plot of maximum and minimum
Mahalanobis distances for the two groups. The two panels are similar, with
the behaviour of the distances in the two groups being quite different. The
lower curve is for Group 1. This increases towards the end of the search as
more remote units enter, with unit 70 being the last to enter. The curves
for Group 2 are more dramatic and clearly show the entry of the third
group of observations. The peak in the curve has its maximum at m = 173.
[Figure 6.20 appears here: panels (a) and (b).]
FIGURE 6.20. Swiss bank notes: linear discriminant analysis, forward plots of
Mahalanobis distances. Left panel, maximum distance for units in the subset;
right panel, minimum distance of units not in the subset. Solid line, Group 1
In Figure 3.47, for units in Group 2, the peak was at m = 84, before the
group of 15 outliers started to enter. But now units from Groups 1 and 2
are entering alternately, with the first of the group of outliers being less
remote than the others. After the peak in the panels of Figure 6.20, there
is a decline as similar units enter and influence the estimated covariance
matrix. However, since a common covariance matrix is being fitted to both
groups of observations, the decline is less than when just Group 2 is fitted
in Figure 3.47. Comparison with the posterior probabilities of classification
in Figure 6.19 shows that the changes of classification for units 70 and 116
occur as the units in the second group of forgeries start to enter the subset.
We now briefly repeat the analysis using quadratic discriminant analysis.
The upper panel of Figure 6.21 shows that all units in Group 1 are again
correctly classified during the search, except for unit 70, which now has
a rapid change to the second group at m = 172. This panel is similar to
that for linear discriminant analysis in Figure 6.19. The lower panel of
Figure 6.21 however shows much more activity than the comparable figure
for linear discriminant analysis. In particular, at m = 172, ten units change
from being classified in Group 1 and move to Group 2, with two other
units changing shortly afterwards. The changes at m = 172 are therefore
important for the probabilities in both groups. This comparison of analyses
shows again that quadratic discriminant analysis is appreciably less stable
than linear discriminant analysis.
The dramatic change in classification in the lower panel of Figure 6.21
is caused by fitting an individual covariance matrix to each group: until
the outliers are included in Group 2 and affect the covariance matrix, they
seem to be far from the group. How far can be seen from the forward
plots of Mahalanobis distances in Figure 6.22. As in Figure 6.20, there are
6.9 Swiss Bank Notes 331
[Figure 6.21 appears here: the trajectory of unit 70 is labelled.]
FIGURE 6.21. Swiss bank notes: quadratic discriminant analysis. Posterior probability,
as a function of subset size, that, upper panel, observations in Group 1
belong to that group and, lower panel, that observations in Group 2 belong to
Group 2
FIGURE 6.22. Swiss bank notes: quadratic discriminant analysis, forward plots
of Mahalanobis distances. Left panel, maximum distance for units in the subset;
right panel, minimum distance of units not in the subset. Solid line, Group 1
sharp peaks in both panels around m = 174. But now the distances decline
more rapidly after the peak. This is because, with individual covariance
matrices for each group, the estimated matrices are strongly influenced by
the outlying observations, which begin to seem less remote.
TABLE 6.1. Simulated data: correct transformation and index numbers of contaminated
units

True transformation: λT = (0.5, -0.5, -0.5, -0.5)^T
Outliers, original scale: units 1, 5, 10, 13
Outliers, transformed scale: units 51, 84, 92, 99

                               y1      y2      y3      y4
Transformed data (n = 92)      0.08    0.45    0.38    0.48
Untransformed data (n = 100)  -0.23    0.23    0.32    0.63
[Scatterplot matrix of the simulated data appears here: panels for y1, y2 and the transformed variables y1^(0.5), y2^(-0.5), y3^(-0.5), y4^(-0.5).]
deletion likelihood ratio is well above the 99% point of χ² on 4 degrees of
freedom, suggesting that this combination of values of λ must be firmly
rejected. Finally, the last line shows the results when the null hypothesis is
that λ = (1/3, -0.5, 1/3, 0)^T, a combination which comes from rounding
the maximum likelihood estimates of the first and third transformation
parameters to 1/3 and the others to one of the five most common values of λ.
The maximum deletion value of the likelihood ratio is below the 95% point
of χ² on 4 degrees of freedom, while the final value of 4.847 is very close to
the expectation of that distribution.
As a result of this diagnostic analysis one might think that λ = (1/3,
-0.5, 1/3, 0)^T would be a good transformation, since the values of the
statistic are always within the 95% confidence boundary. We give the plot
of this deletion statistic in Figure 6.25.
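The likelihood ratio machinery behind these tests can be illustrated in one dimension. The sketch below is our own simplification of the multivariate procedure, with hypothetical function names: it profiles the Box-Cox log-likelihood over a grid of λ and compares twice the log-likelihood difference with a chi-squared critical value.

```python
import numpy as np

def boxcox_loglik(y, lam):
    # Profile log-likelihood of a Box-Cox parameter for one positive
    # variable, including the Jacobian of the transformation.
    y = np.asarray(y, float)
    n = len(y)
    z = np.log(y) if abs(lam) < 1e-12 else (y ** lam - 1.0) / lam
    s2 = z.var()   # ML estimate of the variance of the transformed data
    return -0.5 * n * np.log(s2) + (lam - 1.0) * np.log(y).sum()

def lr_test_no_transformation(y):
    # 2 * (max loglik - loglik at lambda = 1), to be compared with a
    # chi-squared on 1 degree of freedom (3.84 at the 95% point).
    grid = np.linspace(-2, 2, 401)
    lmax = max(boxcox_loglik(y, l) for l in grid)
    return 2.0 * (lmax - boxcox_loglik(y, 1.0))

rng = np.random.default_rng(5)
y = np.exp(rng.normal(size=200))   # lognormal: the log is the right transformation
lr = lr_test_no_transformation(y)
```

For lognormal data the hypothesis of no transformation (λ = 1) is overwhelmingly rejected, just as the forward plots in this section reject inadequate values of λ once the relevant observations have entered.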
6.10 Transformations in Discriminant Analysis: A Simulated Example 337
[Figure 6.25 appears here: the deletion likelihood ratio statistic plotted against observation number. Forward plots against subset size m follow.]
shown in Figure 6.28. In this case all the estimates along the forward search
are as specified in our vector λ until m = 97. The plot of the likelihood ratio
in the lower panel of Figure 6.28 shows that it is the last three observations
to enter that cause rejection of our initial estimate. We therefore take λR
= (0.5, -0.5, -0.5, -0.5)^T for further diagnostic calculations.
Step 3. As a result of our forward analysis we have easily recovered
the transformation in Table 6.1 which leads back to normality. Equally
importantly, we can now identify the outliers which were causing difficulty
in finding these transformations. In Table 6.4 we give the order of inclusion
of the last nine units in the forward search with λR. The last four are the
units which are outliers on the scale in which we started to analyse the
data. The four before are outliers once we have correctly transformed the
data.
[Figures 6.28 and 6.29 appear here: forward plots against subset size m; the fan-plot panels are labelled y1 to y4.]

FIGURE 6.29. Simulated data: fan plots of the signed square root of the likelihood
ratio test for transformation, confirming λR = (0.5, -0.5, -0.5, -0.5)^T
steady downward trend caused by all the other observations again brings
the value of the statistic below the lower threshold.
[Figure 6.30 appears here: forward plot of posterior probabilities against subset size m.]
remains stable and good until the search reaches m = 92. Then the four
outliers from Group 1 enter one after the other and cause the probabilities
to worsen. At the end of the search units from Group 2 enter and do not
have much effect on the probabilities in Group 1.
The plot for Group 2 is much more dramatic. We see from Figure 6.31
that the outliers on the original scale of our data are even more outlying
after the data have been transformed - as would be expected from their
large influence on the transformation. When these observations are intro-
duced, the probabilities change appreciably and the units move to being
correctly classified. But there remain two observations 74 and 86 on the
boundary of the two groups, for which the probabilities oscillate during the
search. Also the naturally outlying 69 is continually mis-classified.
Finally in Figure 6.32 we look at the monitoring of the Mahalanobis
distances and Box statistic (2.23) during the search. The left-hand panel
of Figure 6.32 shows the maximum distance monitoring plot. In the first
stages the plot shows that the maxima are very close - there is no evidence
of any difference in variance in the two groups. When m = 92, observation
69 enters Group 2 and the distance jumps up. After that four outlying
units enter Group 1, causing a jump in the maximum distance for that
group. As successive units enter there is a slight decrease due to masking,
but observation 1 causes a slight increase. After this the last four units
enter Group 2 - there is again a big increase in the distance for Group 2,
which then drops back a bit due to masking. The distances for Group 1 are
hardly altered by the introduction of these four units, despite the common
[Figures 6.31 and 6.32 appear here: forward plots against subset size m; the legend distinguishes the first and second groups.]
FIGURE 6.32. Correctly transformed simulated data: forward plots. Left panel,
maximum Mahalanobis distance of units included in the subset; right panel, Box
test of equality of covariance matrices (2.23)
estimate of the covariance matrix used. Two conclusions from this plot are
that initially the within group variances are very similar and that we do not
need to constrain the search to be balanced - a situation very different from
that of the example in the next section. And, secondly, that the outliers,
influential or not, are entering at the end of the search.
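The Box test of equality of covariance matrices referred to as (2.23) can be sketched with the standard form of Box's M statistic; the book's version may differ in small details, so treat this Python fragment as an illustrative assumption. For g groups of v variables the chi-squared approximation has (g-1)v(v+1)/2 degrees of freedom, which for two groups of six variables gives the 21 degrees of freedom quoted later in the chapter.

```python
import numpy as np

def box_m(samples):
    # Box's test of equality of covariance matrices (standard version).
    # Returns the corrected statistic and its chi-squared degrees of
    # freedom, (g - 1) * v * (v + 1) / 2.
    g = len(samples)
    v = samples[0].shape[1]
    ns = np.array([len(Y) for Y in samples])
    covs = [np.cov(Y, rowvar=False) for Y in samples]
    n = ns.sum()
    Sp = sum((nl - 1) * S for nl, S in zip(ns, covs)) / (n - g)  # pooled
    M = (n - g) * np.linalg.slogdet(Sp)[1] - sum(
        (nl - 1) * np.linalg.slogdet(S)[1] for nl, S in zip(ns, covs))
    # Box's small-sample correction factor
    c = ((2 * v**2 + 3 * v - 1) / (6 * (v + 1) * (g - 1))) * (
        np.sum(1.0 / (ns - 1)) - 1.0 / (n - g))
    df = (g - 1) * v * (v + 1) // 2
    return (1 - c) * M, df

# two simulated groups with very different covariance matrices
rng = np.random.default_rng(6)
Y1 = rng.normal(0, 1, (200, 3))
Y2 = rng.normal(0, 3, (200, 3))
stat, df = box_m([Y1, Y2])
```

When the dispersions differ by a factor of three, the statistic is far beyond any reasonable chi-squared threshold; for equal covariances it fluctuates around its degrees of freedom, which is the flat behaviour seen in the right panel of Figure 6.32 before the outliers enter.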
The first two serum markers, y3 and y4, may be measured rather inexpensively
from frozen serum. The second two, y5 and y6, require fresh serum.
An important scientific problem is whether use of the expensive second pair
6.11 Muscular Dystrophy Data 345
[Forward plots against subset size m appear here.]
manner in which the likelihood ratio statistic for this transformation jumps
up at the end of the search, so that the transformation is rejected by the
test. Our initial conclusion is that we have found the correct transformation
and that there are three outliers.
FIGURE 6.36. Small muscular dystrophy dataset: results of search with λ =
(0.5, 1, -0.5, 1, 0, 0)^T. Left panel, forward plot of likelihood ratio test of the
transformation; right panel, deletion likelihood ratio test, which fails to identify the
importance of the outliers
"' I
--~
20 30 40 50 60 70 20 30 40 50 60 70
.---......,.--
.......... ... .... _.. - ..
........
_.." __ _
.._ ... ____ , ___ -0.
20 30 40 50 60 70
20 30 40 50 60 70 20 30 40 50 60 70
FIGURE 6.37. Small muscular dystrophy dataset: fan plots of the signed
square root of the likelihood ratio test for transformation, confirming λR =
(0.5, 1, -0.5, 1, 0, 0)^T
TABLE 6.5. Small muscular dystrophy dataset: overall mis-classification probabilities
for untransformed and transformed observations

                               n = 73    n = 70
Transformed data               0.153     0.159
Untransformed data             0.188     0.199
Percentage of units improved   79.5      77.1
group and then plotting the logits of the probabilities against order number.
Figure 6.39 shows that the improvement is over almost all units.
[Figure 6.39 appears here: logits of the posterior probabilities plotted against order number.]
[Figure 6.40 appears here: fan-plot panels against subset size m.]

FIGURE 6.40. Large muscular dystrophy dataset: fan plots of signed square
roots of the likelihood ratio test for transformation, confirming λR =
(-0.5, 1, -0.5, 1, 0, 0)^T
The fan plots are plotted in Figure 6.40, in which we have used a balanced search as some of the
30 forward searches are for values far from λR. The order of inclusion of
the units for the search with λR is given in Table 6.6.
The plots for variables 1, 2 and 3 are uneventful. The inclusion of observation
78 at the end of the search causes a jump in the value of λ4 as it
does for λ6, which is also slightly influenced by observation 53. But there
is no reason to question the suggested value of λR.
We now examine the data for any other outliers and the effect of transformation
on their presence. If there are any further outliers, the plots in
Figure 6.40 show that they will not have had an influence on the choice of
transformation.
We start with the untransformed data. Because we are interested in the
differences between the groups we use a balanced search so that we can monitor
the Mahalanobis distances from the two groups as the search evolves.
Figure 6.41(a), the minimum distance monitoring plot, shows that before
TABLE 6.8. Large muscular dystrophy dataset (n = 194): results for discrimination
using untransformed and transformed observations with λR = (-0.5, 1, -0.5,
1, 0, 0)^T; n = 189, observations 53, 78, 118, 130 and 140 deleted

                               n = 194   n = 189
Transformed data               0.132     0.119
Untransformed data             0.170     0.166
Percentage of units improved   72.7      78.3

                               n = 194   n = 189
Untransformed data             605.6     602.9
Transformed data               48.059    51.401
Figure 6.41(c) shows the minimum distance monitoring plot now using an
unbalanced search. The order of inclusion of the observations is given in
Table 6.7. The plot shows that, for most of the search, the transformation
has made the variances in the two groups comparable, but that there are
five outliers. For Group 1 observations 118, 53 and 78 are shown to be
outlying by the upward jump in the plot. The two outliers for Group 2 are
130 and 140. One interesting feature is that these outliers are for subjects
with ages between 22 and 39. The inclusion of the additional 14 units with
ages at least 43 has rendered the previous three outliers extreme but not
highly atypical. We now consider the effect of transformations and outlier
detection on the properties of the discriminant analyses.
We compare all 194 observations and the 189 observations left after outlier
detection, both on the original and on the transformed scales. Table 6.8,
to be compared with Table 6.5, shows the overall mis-classification probabilities
for the four analyses. The general difference between the two tables
is that, with more observations used in estimation, the overall rates have
gone down. The highest average probability of mis-classification now is
0.170 for the original data, dropping to 0.119 for the transformed data
without the outliers, a 43% increase in mis-classification from failure to use
the transformation. The percentage of units for which the probability of
mis-classification decreases because of the transformation is around 75%
and is slightly increased when the outliers are removed.
The table also gives the results of the test for equality of covariances. It
is clear from the table, as it was from the figures, that the transformation
has an enormous effect in improving the equality of the variances of the
two groups, although, with the 99% point of χ² on 21 degrees of freedom
being 38.93, equality has not quite been obtained.
Transformed data (n = 194):  0.490  -0.052   0.582  -0.387  -0.250  -0.276
Transformed data (n = 189):  0.542  -0.073   0.650  -0.310  -0.202  -0.211
The final table gives information on the discriminant function. We list the
coefficients of the standardized canonical eigenvector in Table 6.9. Apart
from y1 and y2, age and month, the other four variables are serum markers:
y3 and y4 are inexpensive to measure, y5 and y6 expensive. For the
untransformed data the inexpensive variables are second and fourth in importance
amongst the markers whereas, after transformation, they are first
and second. Removal of the outliers further increases the weighting on y3
6.13 Exercises
Exercise 6.1 Describe similarities and differences in terms of projection
of points between principal components analysis and canonical variate anal-
ysis.
B = (g-1)Σ̂B = Σ_{l=1}^{2} n_l (ȳ_l - ȳ)(ȳ_l - ȳ)^T     (6.25)

can be expressed as

(6.26)

where

w = n2/(n1 + n2)   for each y_{11}, y_{21}, ..., y_{n1 1} in sample 1
w = -n1/(n1 + n2)  for each y_{12}, y_{22}, ..., y_{n2 2} in sample 2,
Exercise 6.4 Show that in the case of two groups the classification rule
based on the first canonical variate is exactly the same as the classification
rule based on multivariate normality with equal covariance matrices.

Exercise 6.5 Show that in the two group case, the within group sample
correlation of each variable yc_j with the discriminant function (r_{yc_j,z}) is

r_{yc_j,z} = k (ȳ_{j1} - ȳ_{j2}) / sqrt((1/n1 + 1/n2) w_jj),   (j = 1, ..., v),

where w_jj is the jth diagonal element of the pooled covariance matrix Σ̂w
and

k = sqrt( (n1 + n2) / { n1 n2 (ȳ1 - ȳ2)^T Σ̂w^{-1} (ȳ1 - ȳ2) } ).
Exercise 6.6 Let B = (g-1)Σ̂B and W = (n-g)Σ̂w be, respectively, the
between and within groups matrices of residual sums of squares and products
of the data. Show that the maximum of a^T B a/(a^T W a) is obtained when a
is the eigenvector corresponding to the largest eigenvalue λ of the matrix
W^{-1}B. This gives the so-called first discriminant function (or canonical
variate) z1 = Y a1 = a11 yc1 + a21 yc2 + ... + a_{v1} yc_v. Show that the
discriminant function which has the largest discriminant criterion achievable
by any linear combination of the y's that is uncorrelated with z1 is obtained
when λ2 is the second largest eigenvalue of the matrix W^{-1}B and a2 is
the corresponding eigenvector. Show that the discriminant function which
has the largest discriminant criterion achievable by any linear combination
of the y's that is uncorrelated with z1 and z2 is obtained when λ3 is the
third largest eigenvalue of the matrix W^{-1}B and a3 is the corresponding
eigenvector. Show that this property extends to z4, z5, ..., zs. How do your
answers change if the matrices B and W are replaced by the estimates Σ̂B
and Σ̂w?
Exercise 6. 7 Show that the canonical variates are uncorrelated, between
groups, within groups and over the whole sample.
Exercise 6.8 For two populations the allocation rule which has been sug-
gested in this chapter is of the form
6.14 Solutions
Exercise 6.1
Principal components analysis tries to find new orthogonal directions (i.e.
linear combinations z = a1j yc1 + ... + avj ycv) in which the projected points
exhibit maximum spread. The purpose of canonical variates analysis is to
provide a low dimensional representation of the data that highlights as
accurately as possible the true differences between the g subsets of points
in the full configuration. For example, the first principal component is such
that when all units are projected onto this direction they exhibit maximum
spread. It is evident that although the overall projection of points along
this direction may be maximum there is no indication of any difference
between the two groups along this direction.
If gj contains the coefficients of the jth principal component eigenvector
and if G = (g1, ..., gv), the condition G^T G = I established in Section
5.2.1 implies that the principal components are uncorrelated over the whole
sample and that the principal component transformation from the original
variates y to the new variates z is orthogonal. On the other hand, the
constraint imposed on the set of canonical eigenvectors A = (a1, ..., as),
A^T Σ̂w A = I (equation 6.13), shows that the canonical variate transformation
from y to z is not orthogonal. In geometric terms this means that the
principal component axes are at right angles to each other and that the
frame of reference for the principal component space is obtained by a rigid
rotation of the original frame of reference. On the other hand, the canonical
variate axes are not at right angles to each other and the frame of reference
for the canonical variate space involves some deformation of the original
frame of reference, with some axes pressed closer to each other and others
pulled further apart.
Exercise 6.2
ȳ1 - ȳ can be written as

ȳ1 - ȳ = ȳ1 - (n1 ȳ1 + n2 ȳ2)/(n1 + n2) = n2 (ȳ1 - ȳ2)/(n1 + n2).

It is easy to verify that the sum of equations (6.29) and (6.30) gives

B = (n1 n2/(n1 + n2)) (ȳ1 - ȳ2)(ȳ1 - ȳ2)^T.
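This identity is easy to check numerically; the Python fragment below is our own check with simulated data and verifies it to machine precision.

```python
import numpy as np

# Check: for two groups, the between-group matrix B computed about the
# overall mean equals n1*n2/(n1+n2) * d d^T with d = ybar1 - ybar2.
rng = np.random.default_rng(1)
Y1 = rng.normal(0.0, 1.0, size=(30, 4))
Y2 = rng.normal(1.0, 1.5, size=(20, 4))
n1, n2 = len(Y1), len(Y2)
ybar1, ybar2 = Y1.mean(0), Y2.mean(0)
ybar = (n1 * ybar1 + n2 * ybar2) / (n1 + n2)   # overall mean
B = n1 * np.outer(ybar1 - ybar, ybar1 - ybar) \
  + n2 * np.outer(ybar2 - ybar, ybar2 - ybar)
d = ybar1 - ybar2
B2 = (n1 * n2 / (n1 + n2)) * np.outer(d, d)
assert np.allclose(B, B2)
```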
Exercise 6.3
Note that w̄ = 0, so we work in terms of deviations from the pooled means
both for the explanatory variables (columns of the matrix Y) and for the
dependent variable (the dummy variable w). Thus

X^T X = n Σ̂.

Using equation (6.26) we can decompose the matrix n Σ̂ between groups
and within groups as

n Σ̂ = (n1 + n2 - 2) { Σ̂w + (n1 n2 / {(n1 + n2)(n1 + n2 - 2)}) d d^T },

where d = (ȳ1 - ȳ2). Using the Sherman-Morrison-Woodbury inversion
formula (equation 2.44), with α = n1 n2 / {(n1 + n2)(n1 + n2 - 2)}, we can
easily obtain the inverse. Finally

X^T w = (n1 n2 / (n1 + n2)) (ȳ1 - ȳ2) = (n1 n2 / (n1 + n2)) d.

Putting these pieces together we obtain

β̂ = (X^T X)^{-1} X^T w
  = { Σ̂w^{-1} d / (1 + α d^T Σ̂w^{-1} d) } · (n1 n2 / (n1 + n2)) · (1 / (n1 + n2 - 2))
  = n1 n2 Σ̂w^{-1} d / { (n1 + n2)(n1 + n2 - 2) + n1 n2 d^T Σ̂w^{-1} d }.
Given that w^T w = n1 n2/(n1 + n2), we can write

R²_{w,Y} = β̂^T X^T w / (w^T w) = β̂^T d (n1 n2/n) / (n1 n2/n) = β̂^T d.

The link between two group discriminant analysis and regression was first
noted by Fisher (1936). Flury and Riedwyl (1985) give further insights into
the relationship.
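The closed form for β̂ can likewise be verified by regressing the centred dummy variable on the centred data; the Python fragment below is our own numerical check with simulated data.

```python
import numpy as np

# Check Exercise 6.3: least-squares regression of the mean-zero dummy w
# on the centred data reproduces
# beta = n1*n2*Sw^{-1}d / ((n1+n2)(n1+n2-2) + n1*n2*d'Sw^{-1}d).
rng = np.random.default_rng(2)
n1, n2, v = 40, 25, 3
Y1 = rng.normal(0.0, 1.0, (n1, v))
Y2 = rng.normal(0.7, 1.0, (n2, v))
Y = np.vstack([Y1, Y2])
n = n1 + n2
w = np.r_[np.full(n1, n2 / n), np.full(n2, -n1 / n)]   # mean-zero dummy
X = Y - Y.mean(0)                                      # deviations from pooled mean
beta_ls = np.linalg.lstsq(X, w, rcond=None)[0]
d = Y1.mean(0) - Y2.mean(0)
Sw = ((Y1 - Y1.mean(0)).T @ (Y1 - Y1.mean(0)) +
      (Y2 - Y2.mean(0)).T @ (Y2 - Y2.mean(0))) / (n - 2)  # pooled covariance
beta_cf = n1 * n2 * np.linalg.solve(Sw, d) / (
    n * (n - 2) + n1 * n2 * d @ np.linalg.solve(Sw, d))
assert np.allclose(beta_ls, beta_cf)
```

The regression coefficient vector is proportional to Σ̂w^{-1} d, the direction of Fisher's discriminant function, which is the point of the exercise.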
Exercise 6.4
When there are 2 populations s = min(v, g - 1) = 1, thus Σ̂w^{-1} Σ̂B has only
one nonzero eigenvalue which can be found explicitly. Given that the trace
is the sum of the eigenvalues, this eigenvalue equals

λ = (n1 n2 / (n1 + n2)) (ȳ1 - ȳ2)^T Σ̂w^{-1} (ȳ1 - ȳ2).    (6.31)

So, the unique nonzero eigenvalue λ of the matrix Σ̂w^{-1} Σ̂B is given by
equation (6.31). For the corresponding eigenvector, it is easy to verify that
the equation

Σ̂w^{-1} Σ̂B a = λ a

is satisfied by a = Σ̂w^{-1}(ȳ1 - ȳ2). Since
z̄1 > z̄2, z > (z̄1 + z̄2)/2 implies that z is closer to z̄1; the rule in terms of
z is to assign z to P1 if

(ȳ1 - ȳ2)^T Σ̂w^{-1} { y - (ȳ1 + ȳ2)/2 } > 0.

Thus, in the two group case, Fisher's classification rule is exactly equal
to the classification rule we obtain for two normal populations and equal
covariance matrices.
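A numerical check of the equivalence (our own sketch with simulated data): classify new points once by the sign of the canonical variate about the midpoint, once by comparing pooled-covariance Mahalanobis distances, and confirm the allocations coincide.

```python
import numpy as np

rng = np.random.default_rng(3)
Y1 = rng.normal(0.0, 1.0, (30, 4))
Y2 = rng.normal(1.0, 1.0, (30, 4))
d = Y1.mean(0) - Y2.mean(0)
S = ((Y1 - Y1.mean(0)).T @ (Y1 - Y1.mean(0)) +
     (Y2 - Y2.mean(0)).T @ (Y2 - Y2.mean(0))) / (len(Y1) + len(Y2) - 2)
a = np.linalg.solve(S, d)                 # canonical eigenvector, up to scale
mid = (Y1.mean(0) + Y2.mean(0)) / 2

Ynew = rng.normal(0.5, 1.0, (100, 4))
fisher = (Ynew - mid) @ a > 0             # z closer to zbar1

Si = np.linalg.inv(S)
d1 = np.einsum('ij,jk,ik->i', Ynew - Y1.mean(0), Si, Ynew - Y1.mean(0))
d2 = np.einsum('ij,jk,ik->i', Ynew - Y2.mean(0), Si, Ynew - Y2.mean(0))
normal_rule = d1 < d2                     # normal-theory rule, equal priors

assert np.array_equal(fisher, normal_rule)
```

Equal prior probabilities are assumed; with unequal priors the two rules differ only by a constant shift of the cut-off.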
Exercise 6.5
Let q(j) be the v x 1 vector with a one in the jth position and zeroes in
all other positions. Then, following the results in (2.92), the jth column
of the matrix Y can be written as yc_j = Y q(j). The within group sample
correlation between yc_j and z = Y a is given by

r_{yc_j,z} = s_{yc_j,z} / sqrt( s²_{yc_j} s²_z )
           = q(j)^T Σ̂w a / sqrt( q(j)^T Σ̂w q(j) · a^T Σ̂w a )
           = q(j)^T Σ̂w Σ̂w^{-1} (ȳ1 - ȳ2) / sqrt( w_jj (ȳ1 - ȳ2)^T Σ̂w^{-1} Σ̂w Σ̂w^{-1} (ȳ1 - ȳ2) )
           = (ȳ_{j1} - ȳ_{j2}) / sqrt( w_jj (ȳ1 - ȳ2)^T Σ̂w^{-1} (ȳ1 - ȳ2) )
           = sqrt( (n1 + n2) / { n1 n2 (ȳ1 - ȳ2)^T Σ̂w^{-1} (ȳ1 - ȳ2) } ) · (ȳ_{j1} - ȳ_{j2}) / sqrt( (1/n1 + 1/n2) w_jj ).
Exercise 6.6
We first convert the maximization problem to one already solved. Spectral
decomposition yields \Sigma = \Gamma\Lambda\Gamma^T. The symmetric square root matrix
\Sigma^{1/2} = \Gamma\Lambda^{1/2}\Gamma^T and its inverse \Sigma^{-1/2} = \Gamma\Lambda^{-1/2}\Gamma^T satisfy \Sigma^{1/2}\Sigma^{1/2} = \Sigma,
\Sigma^{1/2}\Sigma^{-1/2} = I = \Sigma^{-1/2}\Sigma^{1/2} and \Sigma^{-1/2}\Sigma^{-1/2} = \Sigma^{-1}.
If we set u = W^{1/2}a, the ratio
$$
\frac{a^T B a}{a^T W a} \eqno(6.32)
$$
can be rewritten as
$$
\frac{a^T W^{1/2} W^{-1/2} B W^{-1/2} W^{1/2} a}{a^T W^{1/2} W^{1/2} a}
= \frac{u^T W^{-1/2} B W^{-1/2} u}{u^T u}. \eqno(6.33)
$$
Consequently, the problem reduces to maximizing equation (6.33) over u.
From Exercise 5.6 we know that the maximum of this ratio is \lambda_1, the
largest eigenvalue of W^{-1/2}BW^{-1/2}.
This maximum occurs when u = \gamma_1, the normalized eigenvector associated
with \lambda_1. By equation (5.31), u orthogonal to \gamma_1 maximizes the preceding
ratio when u = \gamma_2, the normalized eigenvector of W^{-1/2}BW^{-1/2}
corresponding to the second largest eigenvalue \lambda_2. We can continue in this
fashion for the remaining linear canonical variates. Now, note that if \lambda and
\gamma are an eigenvalue-eigenvector pair of W^{-1/2}BW^{-1/2}, then by definition
$$
W^{-1/2}BW^{-1/2}\gamma = \lambda\gamma
$$
or
$$
W^{-1}B(W^{-1/2}\gamma) = \lambda(W^{-1/2}\gamma),
$$
thus W^{-1}B has the same eigenvalues as W^{-1/2}BW^{-1/2}, but the corresponding
eigenvector is proportional to W^{-1/2}\gamma = a. Thus, the vector a
which maximizes equation (6.32) can be found by taking the eigenvector \gamma_1
corresponding to the largest eigenvalue of the matrix W^{-1/2}BW^{-1/2} and
then premultiplying it by W^{-1/2}. We can avoid the computation of the
square root W^{1/2} followed by premultiplication and find a = W^{-1/2}\gamma more
directly by taking the eigenvector corresponding to the largest eigenvalue
of the matrix W^{-1}B. A similar argument applies to the other eigenvectors.
Note that \hat\Sigma_W^{-1}\hat\Sigma_B = \{(n-g)/(g-1)\}\, W^{-1}B. Hence eigenvectors of
W^{-1}B are the same as those of \hat\Sigma_W^{-1}\hat\Sigma_B, but any eigenvalue of W^{-1}B is
(g-1)/(n-g) times the corresponding eigenvalue of \hat\Sigma_W^{-1}\hat\Sigma_B.
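That W^{-1}B and W^{-1/2}BW^{-1/2} share eigenvalues can be illustrated numerically. The sketch below (illustrative matrices, not from the book) uses a diagonal W, so that W^{-1/2} is immediate, and 2 x 2 matrices, so the eigenvalues follow from the quadratic formula.

```python
# Eigenvalues of a 2x2 matrix via its characteristic polynomial.
import math

def eig2(M):
    t = M[0][0] + M[1][1]                       # trace
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    s = math.sqrt(t * t - 4.0 * det)
    return sorted(((t + s) / 2.0, (t - s) / 2.0))

W = [[4.0, 0.0], [0.0, 1.0]]                    # diagonal, so W^(-1/2) is easy
B = [[2.0, 1.0], [1.0, 1.0]]                    # symmetric between-groups matrix

# W^{-1} B: scale each row of B by 1/W[i][i]
Winv_B = [[B[0][0] / W[0][0], B[0][1] / W[0][0]],
          [B[1][0] / W[1][1], B[1][1] / W[1][1]]]

# W^{-1/2} B W^{-1/2}, symmetric
h = [1.0 / math.sqrt(W[0][0]), 1.0 / math.sqrt(W[1][1])]
sym = [[h[i] * B[i][j] * h[j] for j in (0, 1)] for i in (0, 1)]

e1 = eig2(Winv_B)
e2 = eig2(sym)
print(e1, e2)     # the two eigenvalue pairs agree
```

The asymmetric matrix W^{-1}B is cheaper to form, which is the computational point made in the solution.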
Exercise 6.7
As in the preceding exercise, we work with B = (g-1)\hat\Sigma_B and W =
(n-g)\hat\Sigma_W. Given that the canonical vectors a_1, \ldots, a_s are the eigenvectors
of the matrix W^{-1}B we must have that
$$
W^{-1}B\, a_j = l_j a_j
$$
for j = 1, \ldots, s; or
$$
(B - l_j W)a_j = 0.
$$
From this equation we obtain that any two particular eigenvalue/eigenvector
pairs (l_i, a_i) and (l_j, a_j) satisfy
$$
B a_i = l_i W a_i \eqno(6.34)
$$
and
$$
B a_j = l_j W a_j. \eqno(6.35)
$$
Premultiplying (6.34) by a_j^T and (6.35) by a_i^T, we obtain
$$
a_j^T B a_i = l_i a_j^T W a_i
$$
and
$$
a_i^T B a_j = l_j a_i^T W a_j.
$$
Since B is symmetric, the scalar a_j^T B a_i = a_i^T B a_j, so it follows that
$$
(l_i - l_j)\, a_j^T W a_i = 0. \eqno(6.36)
$$
Exercise 6.8
If y has a probability density function f(y), then the probability that an
observed value falls in a region R_1 of the sample space is
$$
\int_{R_1} f(y)\,dy. \eqno(6.37)
$$
The integral sign in equation (6.37) represents the volume formed by the
density function f_1(y) over the region R_1. Similarly, the probability that y
comes from population P_1 and is misallocated is
$$
\pi_1 \int_{R_2} f_1(y)\,dy,
$$
so that the expected cost of misclassification is
$$
ECM = c(2|1)\,\pi_1 \int_{R_2} f_1(y)\,dy + c(1|2)\,\pi_2 \int_{R_1} f_2(y)\,dy. \eqno(6.38)
$$
Now, since the two regions R_1 and R_2 are a partition of R, we have that
R = R_1 \cup R_2 and R_1 \cap R_2 = \emptyset, so that
$$
\int_R f_1(y)\,dy = \int_{R_1} f_1(y)\,dy + \int_{R_2} f_1(y)\,dy = 1
$$
or
$$
\int_{R_2} f_1(y)\,dy = 1 - \int_{R_1} f_1(y)\,dy.
$$
Substituting for \int_{R_2} f_1(y)\,dy in equation (6.38) yields
$$
ECM = c(2|1)\,\pi_1 + \int_{R_1} \{c(1|2)\,\pi_2 f_2(y) - c(2|1)\,\pi_1 f_1(y)\}\,dy. \eqno(6.39)
$$
In other words, ECM is minimized by choosing the region R_1 to be the set
of all those points, and only those points, that give a negative contribution
to the expression \{c(1|2)\,\pi_2 f_2(y) - c(2|1)\,\pi_1 f_1(y)\}. This is because with this
choice the largest possible amount will be subtracted from c(2|1)\,\pi_1 to yield
ECM. Hence, from equation (6.39), it follows that the optimal rule is
associated with the region R_1 composed of all those points y for which
$$
c(2|1)\,\pi_1 f_1(y) \ge c(1|2)\,\pi_2 f_2(y).
$$
This argument provides a theoretical justification for a rule of the form (6.27).
Note that if the two costs c(1|2) and c(2|1) are equal, the optimal rule for
the classification criterion becomes the one of assigning y to the population
with the larger posterior probability.
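The optimality of the rule can be checked numerically in one dimension. The sketch below takes two normal densities with illustrative priors and costs (all values our own, not the book's), evaluates ECM for every cutoff on a fine grid, and confirms that the minimizing cutoff is the point where c(2|1)\pi_1 f_1(y) = c(1|2)\pi_2 f_2(y).

```python
import math

def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

pi1, pi2 = 0.6, 0.4        # prior probabilities (illustrative)
c21, c12 = 1.0, 2.0        # costs c(2|1) and c(1|2)
mu1, mu2 = 0.0, 2.0        # f1 = N(mu1, 1), f2 = N(mu2, 1)

def ecm(t):
    # R1 = (-inf, t): misclassify P1 with prob 1 - Phi(t - mu1),
    #                 misclassify P2 with prob Phi(t - mu2)
    return c21 * pi1 * (1.0 - Phi(t - mu1)) + c12 * pi2 * Phi(t - mu2)

# cutoff where c21*pi1*f1 = c12*pi2*f2 (equal variances => linear in y)
t_star = (mu1 + mu2) / 2.0 + math.log(c21 * pi1 / (c12 * pi2)) / (mu2 - mu1)

# brute-force minimization over a fine grid of cutoffs
grid = [-2.0 + 6.0 * k / 6000.0 for k in range(6001)]
t_best = min(grid, key=ecm)
print(t_star, t_best)      # the analytic and brute-force cutoffs agree
```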
7
Cluster Analysis
7.1 Introduction
In cluster analysis the multivariate observations are to be divided into g
groups. The membership of the groups is not known, nor is the number of
groups. The situation is seemingly different from that of discriminant analysis
considered in Chapter 6, where both the number of groups and group
membership are known. However, there is much in common between our
procedure for clustering and the methods we used in the earlier chapters.
We start by treating the observations as if they, perhaps after transformation,
come from a single multivariate population. We monitor the
forward search to see whether there is any evidence of groups. If there is,
we tentatively divide the observations into clusters and then use the technique
of search with several groups employed in discriminant analysis to see
how stable the proposed cluster membership is and what is the allocation
of unallocated units.
We describe the stages of our search in §7.2.1, dividing the data analysis
into three stages: preliminary, exploratory and confirmatory. In the last of
these, unassigned units are assigned to clusters during the forward search on
the basis of comparisons of Mahalanobis distances. This comparison is not
entirely straightforward when the distances are calculated for populations
with differing covariance matrices. In §7.2.2 we discuss the use of standardized
distances to help with these comparisons. The extensions needed to
the forward search are outlined in §7.2.3.
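The mechanics common to all three stages — fit to the current subset, rank every unit by its distance from the fit, grow the subset by one — can be sketched in one dimension. Everything below (data, starting subset, function name) is invented for illustration; the book's search uses multivariate Mahalanobis distances and the refinements described in §7.2.3.

```python
# One-dimensional forward search skeleton: the subset grows by taking the
# m+1 units closest, in standardized distance, to the current fit.
import math

def forward_entry_order(y, start):
    subset = list(start)
    order = list(start)                    # units in order of first entry
    while len(subset) < len(y):
        vals = [y[i] for i in subset]
        mu = sum(vals) / len(vals)
        sd = math.sqrt(sum((v - mu) ** 2 for v in vals) / len(vals)) or 1.0
        ranked = sorted(range(len(y)), key=lambda i: abs(y[i] - mu) / sd)
        subset = ranked[: len(subset) + 1]
        for i in subset:
            if i not in order:
                order.append(i)
    return order

y = [0.0, 1.0, 2.0, 3.0, 4.0, 100.0]       # tight group plus one outlier
order = forward_entry_order(y, [0, 1])
print(order)                               # the outlier (index 5) enters last
```

Interchanges are possible (a unit may leave and re-enter), which is why the order of first entry is recorded separately from the current subset.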
Mahalanobis distance from that cluster, due to the low dispersion of the cluster.
Because of the higher variance of a dispersed cluster, the observation may
have a large Euclidean distance from the dispersed cluster, but a Mahalanobis
distance slightly less than that from the tight cluster, and so be
wrongly allocated. Such a wrong allocation would increase the difference
in dispersions and might lead to a dispersed group "invading" a compact
group during the search, leading to the wrong allocation of several units.
The problem is mentioned by, amongst others, Gordon (1999, p. 48).
We attack this problem by introducing standardized distances, which
are adjusted for the variance of the individual cluster. The customary Mahalanobis
distance for the ith observation from the lth group at step m
is
$$
d_{ilm} = (e_{ilm}^T \hat\Sigma_{ulm}^{-1} e_{ilm})^{0.5}, \qquad (l = 1, \ldots, g).
$$
A form of generalized distance can then be defined as
$$
d_{ilm}(r) = d_{ilm}\left\{ |\hat\Sigma_{ulm}|^{1/2v} \right\}^r, \eqno(7.1)
$$
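In code, the standardization in equation (7.1) is just a power of the determinant of the cluster's estimated covariance matrix. The sketch below is ours; the choice of the exponent r is discussed in the text, and the values here are purely illustrative.

```python
# Generalized distance of equation (7.1): the Mahalanobis distance is
# multiplied by |Sigma_hat|^(r/(2v)), penalizing clusters with large
# generalized variance.  Illustrative 2x2 example.
def det2(S):
    return S[0][0] * S[1][1] - S[0][1] * S[1][0]

def generalized_distance(d_ilm, Sigma, v, r):
    return d_ilm * (det2(Sigma) ** (1.0 / (2.0 * v))) ** r

Sigma = [[4.0, 0.0], [0.0, 9.0]]   # |Sigma| = 36, v = 2
d = generalized_distance(1.0, Sigma, 2, 1)
print(d)                            # 36**0.25 = sqrt(6), about 2.449
```

With r = 0 the ordinary Mahalanobis distance is recovered.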
try a variety of starting points, but there are no novel aspects to the
choice of units in the subset as we move forward.
FIGURE 7.1. The 60:80 data: scatterplot showing the two clusters
more complicated examples. These plots include the entry plot and plots
of the fitted ellipses containing all the m observations in the subset.
FIGURE 7.2. The 60:80 data: ellipse from very robust fit nominally containing
97.5% of the data. The half of the data used for estimation contains observations
from both clusters
both groups. It must be stressed that this is not a failure of the algorithm to
find the minimum of a multimodal function. The ellipse found is of smaller
area than any that can be found containing the same number of observations
solely from the larger group. What has happened is obvious here in two
dimensions. We need to be able to check whether something similar has
occurred in more than two dimensions, when the separation into groups is
not obvious by plotting along the coordinate axes.
FIGURE 7.3. The 60:80 data: forward plot of scaled Mahalanobis distances
from which the parameters are estimated and a distinct set of much larger
distances for the more dispersed group.
When m = 61 the first unit from the dispersed group is included in
the subset. Immediately there is a change in the plot; some units from the
larger group have decreasing distances, while those for other units initially
increase. There is a further period of activity around m = 90 and then, as
the search progresses, the distances for the small group tend to increase.
In the middle of this process of fitting to units from both groups, around
m = 105, there are no small Mahalanobis distances, even though the asymptotic
distribution of the distances is χ₂², that is exponential. Unit 57, visible
in the top right-hand corner of Figure 7.2, is particularly outlying at the
beginning and end of the search.
Figure 7.4 is a QQ plot of squared Mahalanobis distances for the χ₂²
distribution during the forward search for two values of m. The left-hand
panel, for m = 37, shows the two groups with a gap between the 60 small
distances and the 80 large, just as there is in Figure 7.3 for this value of
m. The plot in the right-hand panel, for m = 101, shows, in the left-hand
tail, the absence of very small distances; since the centroid of the fitted
observations is not near any data point, the smallest distance plots well
above zero.
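Since χ₂² is the exponential distribution with mean 2, the reference quantiles for such QQ plots have a closed form; a small sketch (ours, with an arbitrary choice of plotting positions) follows.

```python
import math

def chi2_2_quantile(p):
    """Quantile of the chi-squared distribution with 2 df (= Exp, mean 2)."""
    return -2.0 * math.log(1.0 - p)

# reference quantiles for n squared distances at plotting positions (i-0.5)/n
n = 5
qs = [chi2_2_quantile((i - 0.5) / n) for i in range(1, n + 1)]
print(qs[2])    # the median, 2*log(2), about 1.386
```

Plotting the ordered squared distances against these quantiles gives the horizontal axes of Figure 7.4.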
The forward plot thus shows that there is a tight group of observations
and around 80 observations which are different. We now consider other plots
7.3 The 60:80 Data 375
FIGURE 7.4. The 60:80 data: QQ plots of squared Mahalanobis distances against
χ₂². Left panel, m = 37, showing the two groups; right panel, m = 101; there are
no small distances
can plot by order of entry in the forward search. We can also use tentative
cluster labels as plotting symbols.
Ellipses. The entry plot tells us which observations form the subset as
m increases. With two variables, information on the nature of the subset
and the fitted model comes from ellipses containing a specified fraction
of the data, as in Figure 7.2. We now look at the ellipses containing all
m points and m/2 points as the search progresses, where the eigenvectors
of the ellipse are those of \hat\Sigma_{um}, the unbiased estimator of the covariance
matrix based on the subset of size m. The outer ellipse will therefore pass
through the most remote observation in the subset, that is the one with the
largest Mahalanobis distance. In labelling the axes of these plots we scale
the variables by the standard deviations estimated from the subset, that is
by the square root of the elements of the diagonal of \hat\Sigma_m. With more than
two variables the ellipses can be superimposed on the scatterplot matrix of
the data.
The upper left-hand panel of Figure 7.6 shows the ellipses for the starting
value m0 = 18. In the plot we have used filled symbols to indicate the
observations in the subset. The figure shows that the initial subset contains
observations from both the compact and dispersed groups. It also shows
that the ellipse contains observations which are not in the subset. Since
these have smaller Mahalanobis distances than at least one observation in
the subset, some will be included at the first step of the forward search.
The upper right-hand panel of Figure 7.6 for m = 19 shows that this is
exactly what happens: now the subset contains just one observation from
the more dispersed group, an effect we noted in the entry plot of Figure 7.5.
One more step of the forward search leads to the elimination of this last
observation and so to a subset consisting solely of observations from the
compact group.
The ellipse in the lower left-hand panel of Figure 7.6 for m = 20 is
comparatively very small and so gives large scaled distances for units not in the
subset. As the first three panels of Figure 7.6 show, the ellipse shrinks and
the scaled distances grow at the start of this search. We have seen in the
forward plot of Mahalanobis distances, Figure 7.3, that large Mahalanobis
distances are obtained near this point in the forward search.
The orientation of the ellipses in the three figures changes a little as
observations from the dispersed group are excluded. There is also some
slight change in the orientation as they are re-introduced after m = 60
(lower right-hand panel of Figure 7.6). The left-hand panel of Figure 7.7
shows the ellipse for m = 80. This is now much larger than previous ellipses,
since 20 observations are included from the more dispersed group. As the
forward search progresses the ellipses grow and rotate slightly in an anticlockwise
direction until the right-hand panel of Figure 7.7 is obtained for
m = 140. The outer ellipse is large because it passes through unit 57, the
outlier in Figure 7.3.
Forward Plots of Minimum and Maximum Mahalanobis Dis-
tances. The entry plot and plots of ellipses illuminate the structure of the
data. We now consider plots of Mahalanobis distances which pinpoint the
position in the forward search at which changes in the structure of the
subset occur.
The left-hand panel of Figure 7.8 is the forward plot of the minimum
Mahalanobis distances amongst units not in the subset. There is a needle
sharp peak at m = 60 indicating that the next unit to be introduced is
remote from the group of observations so far fitted. The high value at the
end of the search belongs to the last observation to enter, unit 57, which
the outer ellipse in the right-hand panel of Figure 7.7 showed to be remote
even from the dispersed data cloud. There is no indication of any other
structure.
The right-hand panel of Figure 7.8 shows the complementary plot of the
maximum Mahalanobis distance among the units in the subset. This has a
peak one observation later when m = 61. This peak is less clearly defined
than that in the left-hand panel, since the minimum Mahalanobis distance
among units not in the subset is the deletion version of the maximum dis-
tance within the subset one step later, unless there is an interchange. This
plot, like that in the left-hand panel, has a high value at the end of the
search caused by the single outlier. Both plots start with low values because
distances are calculated over observations included in the outer ellipses of
the upper two panels of Figure 7.6. The minimum distance among units
FIGURE 7.6. The 60:80 data: plots of ellipses for the first three steps with
m0 = 18 and for m = 61 when the first diffuse observation enters the subset.
The search initially includes units from both groups. Filled symbols are used
for units in the subset
FIGURE 7.7. The 60:80 data: plots of ellipses for two further steps when m0 = 18.
Left-hand panel, m = 80; right-hand panel, m = 140. Inclusion of unit 57 when
m = 140 causes an appreciable increase in the size of the ellipse
7.4 Three Clusters, Two Outliers: A Second Synthetic Example 379
FIGURE 7.8. The 60:80 data: forward plots of Mahalanobis distances. Left panel,
minimum distance amongst units not in the subset; right panel, maximum dis-
tance among units in the subset. The interchanges at the beginning of the search
and around m = 90 are evident
not in the subset will be for some point lying inside the ellipse. The maxi-
mum distance amongst those in the subset is also too small because several
observations are lying on or near to the ellipse: the extreme case would be
when all observations in the subset lay on the ellipse, when the maximum
distance would be the same as any in the subset (see Exercise 2.12).
FIGURE 7.9. Three Clusters, Two Outliers: scatterplot of the data. The numbers
of units in the groups are 60, 80 and 18, with two outliers
that the starting ellipse has mostly chosen observations from the group of
60. The few observations from the diffuse group are eliminated at the first
forward step and the search then continues solely with observations from
the compact group until m = 61, when observations from the diffuse group
enter. Just before all the diffuse observations are included the two outliers,
observations 159 and 160, enter the subset, although they are soon rejected,
rejoining again at the very end of the search.
This behaviour can be explained by studying ellipses similar to those
considered in detail earlier for the 60:80 data. However we leave this study
to Exercise 7.1, passing on instead to the forward plot of the Mahalanobis
distances, Figure 7.11. This indicates all the structure in the data; up to
m = 60 the second group is clearly separated from the first, with the distances
of the compact group of 18 evident at the top of the plot. The two
outliers follow an independent path. Once the second group starts to enter
at m = 61 the smaller distances are not so readily interpreted, although
the group of 18 remains distinct. The two outliers re-emerge at the very
end.
The forward plot of the minimum Mahalanobis distances amongst units
not in the subset in the left-hand panel of Figure 7.12 clearly shows the end
of the first cluster through the sharp spike at m = 60. The second spike at
m = 142 is a rather less clear indication of the end of the second group.
The spike at the end of the plot clearly shows the presence of, now, two
outliers. The forward plot of the maximum Mahalanobis distances amongst
units included in the subset, right-hand panel of Figure 7.12, is similar, but
with a slightly less strong, although still appreciable, indication of the end
FIGURE 7.10. Three Clusters, Two Outliers: entry plot from m0 = 28. The two
largest groups enter in succession
FIGURE 7.11. Three Clusters, Two Outliers: forward plot of scaled Mahalanobis
distances. The three clusters and two outliers are revealed
FIGURE 7.12. Three Clusters, Two Outliers: left panel, forward plot of the min-
imum Mahalanobis distances amongst units not in the subset; right panel, max-
imum Mahalanobis distances amongst units in the subset
of the first cluster. The latter part of the plot gives a slightly stronger
indication than that of the minimum distance of a third cluster. The plot
again signals the two outliers.
The conclusion of this analysis is that forward plots of Mahalanobis distances
and the entry plot are both useful. To conclude our look at these
data we contrast our plots from the forward search with those from the minimum
covariance determinant (MCD) procedure in S-Plus and with a plot
of classical Mahalanobis distances from fitting all the data non-robustly.
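The idea behind the MCD comparison — fit to the half of the data with the smallest scatter — can be sketched in one dimension, where the best half of the sorted data is a contiguous window. This toy version is ours and is not the S-Plus MCD implementation; the data are invented.

```python
# One-dimensional "smallest half" sketch: among all contiguous windows of
# h = (n+1)//2 sorted observations, pick the one with minimum variance.
def smallest_half(y):
    ys = sorted(y)
    n = len(ys)
    h = (n + 1) // 2
    best, best_var = None, float("inf")
    for start in range(n - h + 1):
        w = ys[start:start + h]
        mu = sum(w) / h
        var = sum((v - mu) ** 2 for v in w) / h
        if var < best_var:
            best, best_var = w, var
    return best

data = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 5.0, 7.0, 9.0, 11.0, 13.0]
half = smallest_half(data)
print(half)    # the tight cluster: [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
```

As with the very robust ellipses above, when clusters are of similar size the selected half need not come from a single cluster, which is the difficulty the forward search is designed to reveal.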
FIGURE 7.13. Three Clusters, Two Outliers: ellipse from very robust fit nom-
inally containing 97.5% of the data. The half of the data used for estimation
contains observations from two clusters
FIGURE 7.14. Three Clusters, Two Outliers: index plot of robust Mahalanobis
distances from the MCD fit. The horizontal line is the 97.5% point of the asymptotic
distribution of distances
FIGURE 7.15. Three Clusters, Two Outliers: index plot of classical Mahalanobis
distances. The horizontal line is the 97.5% point of the asymptotic distribution
of distances
FIGURE 7.16. Three Clusters, Two Outliers: index plots of Mahalanobis distances
after permuting observation numbers: left, robust; right, non-robust. The
horizontal lines are the 97.5% points of the asymptotic distribution of distances
the robust distances. Now only the two outliers have distances lying outside
the 97.5% point of the nominal distribution.
7.5 Data with a Bridge 385
FIGURE 7.17. Three Clusters, Two Outliers: classical and robust Mahalanobis
distances
FIGURE 7.18. Bridge data: the 60:80 data with a further 30 observations joining
the two clusters
Such clear separation is rare in real examples, where clusters often almost
overlap. We now consider the modification of the 60:80 data to reduce the
sharpness of division between groups.
Our analysis is broken into the three steps described in §7.2.1: prelim-
inary analysis using the plots from fitting one normal distribution, ex-
ploratory analysis where we break the data into tentative clusters and con-
firmatory analysis where we adjust and try to confirm the clusters we have
found.
FIGURE 7.19. Bridge data: entry plot. The initial subset consists mostly of units
from the "bridge", numbered 141-170. These are rapidly eliminated by the for-
ward search
of rapid interchange in which the compact group enters, driving out the
points from the bridge. After m = 60, when all the compact group have
entered, the observations from the bridge are re-introduced, followed by the
observations from the dispersed cluster. Once the procedure has found the
compact group, there are few interchanges during the rest of the search.
Those that there are occur within the dispersed group towards the end of
the search.
This description of the movement of the forward search can be encapsulated
by a few plots of ellipses. Figure 7.20 shows, in the upper left-hand
panel, the starting point (m0 = 31) with the outer ellipse containing points
from the bridge. There are also many from the compact group, but most
are not included in the initial subset. There then follows a period of rapid
interchange as the subset changes to consist solely of points within the
ellipse. The upper right-hand panel of Figure 7.20 shows a small outer ellipse
for m = 36, which contains points both from the bridge and from
the ellipse. The two lower panels show the progress, through m = 39, to a
subset, for m = 42, consisting solely of points in the compact group. Here
the ellipse is very small. We can expect that, around this value of m, there
will be very large Mahalanobis distances for many of the observations.
The second set of ellipses, in Figure 7.21, starts with m = 60, the last
value of m for which the subset contains only observations
from the compact group. The remaining three panels, up to m = 81, show
the rotation and extension of the ellipse as observations from the bridge
FIGURE 7.20. Bridge data: plots of ellipses in the earlier stages of the search,
showing how the subset moves to the tight cluster
FIGURE 7.21. Bridge data: plots of ellipses showing the reintroduction of units
from the bridge into the subset
FIGURE 7.22. Bridge data: plots of ellipses showing the relatively constant ori-
entation of the ellipse as the dispersed group is included in the subset
join those from the compact group. The rotation of the ellipse will result
in changes in the ordering of the observations by Mahalanobis distance.
The increase in the size of the subset leads to a general shrinkage in the
distances of units not in the subset (Exercise 2.12).
The final set of ellipses, Figure 7.22, shows that the ellipse hardly changes
its orientation during the rest of the search. From m = 90 to m = 169 the
effect of the tight cluster and of the points in the bridge is sufficient to keep
the orientation of the ellipse constant. During this period, Mahalanobis
distances will shrink, but there will be little change in the ranking of the
observations. The final plot, for m = 169 in the lower right-hand panel of
Figure 7.22, shows that the last observation to enter will seem to be an
outlier.
A summary of the behaviour of these ellipses is given by the changes in
the principal components of the ellipse during the forward search. The left-hand
panel of Figure 7.23 shows the percentage of variance of the complete
data explained by the two principal components and the right-hand panel
the eccentricity of the ellipse, which is 1 − √(λ₂/λ₁), where λ₂ is the smaller
of the two eigenvalues. For a circle, the value is zero. The two panels of
Figure 7.24 plot the elements of the eigenvectors. Four phases are evident,
to some extent, in each plot. These reflect the initial rejection from the
subset of units in the bridge, the rotation of the ellipse from m = 40 up to
the reinclusion of units from the bridge after m = 60 and changes in the
ellipse towards the end of the search as the ellipse becomes more circular.
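The eccentricity used in the right-hand panel is a one-line function of the two eigenvalues; a minimal sketch (ours) follows.

```python
import math

def eccentricity(lam1, lam2):
    """1 - sqrt(lam2/lam1), with lam2 the smaller eigenvalue: 0 for a
    circle, approaching 1 as the ellipse degenerates to a line."""
    lo, hi = min(lam1, lam2), max(lam1, lam2)
    return 1.0 - math.sqrt(lo / hi)

print(eccentricity(2.0, 2.0))   # 0.0 for a circle
print(eccentricity(4.0, 1.0))   # 0.5
```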
FIGURE 7.24. Bridge data: forward plots of the first and second eigenvectors,
showing the rotation of the fitted ellipse as units not from the tight cluster leave
or enter the subset
It is now time to consider how these suggestions of the changing structure
of the subset are revealed in other plots. Figure 7.25 is the forward plot
of scaled Mahalanobis distances. There is a maximum around m = 38;
many distances start to decline after m = 60, with some crossing of the
FIGURE 7.25. Bridge data: forward plot of scaled Mahalanobis distances. The
three groups seem to be separated around m = 50
FIGURE 7.26. Bridge data, forward plots of Mahalanobis distances: left panel,
minimum distance of units not in the subset; right panel, gap plot
less severe any changes in distances. The same is true of the gap plot on the
right of Figure 7.26. This shows that, initially, there are many interchanges.
There are also a couple of peaks around m = 60, but otherwise the plot is
featureless, apart from the outlier at the end.
We now move to a further preliminary analysis, where we rely heavily
on the forward plots of Mahalanobis distances for groups of units and for
individual units.
FIGURE 7.27. Bridge data: forward plots of scaled Mahalanobis distances divided
into three tentative groups at m = 53
The three groups we have selected are then subdivided, by order of dis-
tance, into three panels each. There are 20 curves on the first three plots
and 11 each on those in the second row. There is a remarkable progression
from the first panel to the last. The distances in the first three panels ap-
pear, on this scale, small throughout. The distances in the last six panels
are increasingly large.
As well as a difference in scale, these plots also show different shapes.
Figure 7.28 replots the nine panels, each with their own scale. The plots
of distances now are revealed as having very different shapes. In the first
three panels the plots are high at the end, whereas in the later panels the
distances decrease as m increases. So the plots of distances differ both in
size and in shape.
The large distances at the beginning of the plots may not help in in-
terpretation. They arise during the period of interchange evident in the
gap plot in the right panel of Figure 7.26. We could therefore rerun the
forward search from a later starting point, to get a different perspective on
the distances. Or we could just omit the earlier part of the plot.
These plots certainly show that there is a progression in the shapes of the
curves, based on the ordering at m = 53. It is however hard to know where,
FIGURE 7.28. Bridge data: forward plots of scaled Mahalanobis distances divided
into three tentative groups at m = 53. As Figure 7.27, but with each row rescaled
FIGURE 7.29. Bridge data: nine panel plot of forward Mahalanobis distances for
specific units, starting from m = 58
plots are rescaled so that, during the course of the part of the search captured
in the nine panels, the curve for unit 114, which is the one included
at m = 58 in the first panel and has the highest initial distance in that
panel, would shrink to occupying less than half the plotting space of the
panels in the third row.
The curve for m = 60 is a little high in the middle, but it is the curve for
m = 61 which is the first to be really different. Initially it is high, finally
low, changing rapidly from one to the other, indicating that it does not
belong to the cluster found so far. The same pattern, but more extreme, is
shown by the units which join for m = 63 to 66. The unit which joins when
m = 62 is slightly different in behaviour: initially the distance is high, but
it is not extreme at the end of the search.
The study of these individual curves confirms the importance of the indications
from the gap plot and the plot of minimum Mahalanobis distance
in Figure 7.26, that there is a cluster of 60 observations which enter the
search first. Although 60 is indeed the number of observations in our original
tight cluster, observation 168, which enters when m = 60, is not from
that cluster, whereas observation 118, entering at m = 61, is, even though it
appears not to belong. Although these observations are incorrectly classified
by our procedure, they are drawn from simulated samples and so have
randomness which may take them into a neighbouring group. Both are, in
fact, extreme in their groups and are shown, by the scatterplot of the data,
Exercise 7.4, to be in the "wrong" positions.
Our analysis has provided strong evidence for the existence of a tight
cluster of 60 observations. The forward plot of scaled distances in Figure
7.25 suggested that there were three clusters. The evidence from this
plot is summarised in Figure 7.30, which shows changes in successive Mahalanobis
distances, in this case unscaled: if, for a particular observation,
there is an increase in the distance when m increases to m + 1, a symbol
is plotted, but not otherwise. Distances which move up and down together
will then have a similar pattern of dark symbols and light spaces. The units
can be ordered on the vertical axis in any way found to be informative, for
example by the magnitude of the distances at some point in the search.
Here we have used the order of final entry to the subset. Like the other
plots, this one suggests the existence of three groups. We now eliminate
the 60 observations we have found to be in a cluster, one incorrectly, and
analyse the remaining 110 observations.
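The construction of a plot like Figure 7.30 — mark a symbol wherever a unit's distance increases from m to m + 1 — is easy to sketch. The trajectories and unit names below are invented for illustration.

```python
# Increase pattern of Figure 7.30: for each unit, '#' where the distance
# increases from subset size m to m+1 and '.' otherwise.  Units whose
# distances move up and down together share a similar pattern.
def increase_pattern(trajectory):
    return "".join("#" if b > a else "."
                   for a, b in zip(trajectory, trajectory[1:]))

# invented distance trajectories over five subset sizes
traj = {
    "unit 1": [3.0, 3.5, 3.2, 4.0, 4.1],
    "unit 2": [2.0, 2.4, 2.2, 3.0, 3.3],   # moves with unit 1
    "unit 3": [5.0, 4.0, 4.5, 3.5, 3.0],   # moves oppositely
}
patterns = {u: increase_pattern(t) for u, t in traj.items()}
print(patterns["unit 1"])   # "#.##", shared with unit 2
```

Sorting the rows, for example by order of final entry as in the text, then makes blocks of similar patterns — candidate clusters — visible.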
The forward plot of all 170 distances, Figure 7.25, indicated that, in
addition to the tight group of 60 observations, there was a relatively small
group (which we know to be the bridge) and one which was more highly
dispersed. We proceed by again running a forward search, but now one in
which we influence the initial subset. If this is chosen in the more dispersed
7.5 Data with a Bridge 397
FIGURE 7.31. Bridge data, 110 observations: forward plots of Mahalanobis distances: left panel, maximum distance of units in the subset; right panel, gap plot
group that seems to have been identified, rather uninformative plots result,
as there is now no structure of extreme values of Mahalanobis distances.
But, if the search starts with the less dispersed group, informative plots
result. Accordingly we start with those units in Figure 7.25 which, at m = 45, have distances between 0.4 and 2. This procedure yields a starting subset with m0 = 17.
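Choosing a starting subset from a band of distances at a chosen step is straightforward to express; this is an illustrative sketch only, where `distances_at_m` stands for the distances read off the forward plot at m = 45.

```python
import numpy as np

def band_start(distances_at_m, lo=0.4, hi=2.0):
    """Indices of units whose scaled Mahalanobis distance at a chosen
    step of a preliminary search lies strictly between lo and hi;
    these seed a new search aimed at the less dispersed group."""
    d = np.asarray(distances_at_m, dtype=float)
    return np.flatnonzero((d > lo) & (d < hi))
```

With the bridge data this band happened to contain 17 units, giving the starting subset size m0 = 17.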
The forward plot of the maximum Mahalanobis distance in the left panel of Figure 7.31 has a series of high values starting at m = 31, indicative of a cluster ending at m = 30. After several rather different observations have entered, the covariance matrix has changed sufficiently that the entering observations no longer have high distances. The gap plot in the right-hand panel of Figure 7.31 has a sharp peak at m = 30.
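One plausible form of the gap statistic can be sketched as follows (a sketch only; the book's exact definition may differ slightly): at each subset size the smallest Mahalanobis distance among units outside the subset minus the largest among those inside. A sharp peak, as at m = 30 here, flags the completion of a well separated cluster.

```python
import numpy as np

def gap(distances, subset):
    """Gap at the current step: minimum Mahalanobis distance of the
    excluded units minus maximum distance of the included units."""
    d = np.asarray(distances, dtype=float)
    inside = np.zeros(d.size, dtype=bool)
    inside[list(subset)] = True
    return d[~inside].min() - d[inside].max()
```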
To investigate whether m = 30 does indeed reflect anything interesting we go straight to a nine-panel plot of the curves of the Mahalanobis distances for individual observations. The first panel of Figure 7.32 shows the group of 28 reference curves. The points entering when m = 29 and 30 agree with this general form. However the remaining observations do not. That for m = 31 is of quite different shape, being around twice as large as any other distances around m = 50. The remaining curves behave like observations from another cluster: initially the values are too high, and finally too low. The high values are also increasingly high, as is evidenced by the shrinking of the reference set in successive panels. These observations are initially far from the fitted cluster centre, but are finally nearer the centroid of the fitted ellipse than the 30 observations found here.
This time the group of 30 have been found correctly, apart from observation 168 which was included in the first cluster. We now consider methods of exploring this clustering into three groups. For example, should there be two groups or four? Have some units been put in the wrong cluster?
FIGURE 7.32. Bridge data, 110 observations: nine panel plot of forward Mahalanobis distances for specific units, starting from m = 28
FIGURE 7.33. Bridge data: forward plot of scaled Mahalanobis distances for tentative Cluster 2 when the search starts in that cluster
For the bridge data we start with the tight cluster of 60 observations, Cluster 2. Figure 7.33 shows the scaled distances for a search starting with Cluster 2. The distances are indeed nearly stable to begin with, but then increase, virtually all together, with m. The observations most different from the rest are 109, which initially has the highest distance, but which changes to the lowest for the last 30 steps of the search, and 168, which is highest around m = 110. At m = 110 the other observations with anomalously high distances are, reading down, 117, 100, 129, 106 and 131.
We can use nine-panel plots similar to Figure 7.28 to compare the distances from this starting point for all three clusters. We can also look at plots of individual distances. Figure 7.34 shows individual curves against a background of Cluster 2 units. The first panel in the figure gives all curves for the first 54 units to enter the search, that is 90% of the units forming Cluster 2. As in Figure 7.29 the quantiles of this distribution are used as a reference set in the other panels. These show that, up to m = 59, the included units have trajectories of distances much like those of units in the reference set. When m = 60, unit 168 is included, which has a high distance for m > 60. Unit 118, included when m = 61, is more extreme, especially up to m = 60, but does have a trajectory very much like unit 114, included when m = 58. As we shall see, these two units are close together in Euclidean space. Continuation of this plot for units not in Cluster 2, which we do not give, shows trajectories which look very different from those of
the first 60 units to be included. They resemble more extreme versions of the trajectory for unit 118 in the centre panel of the bottom row of Figure 7.34.
FIGURE 7.34. Bridge data: forward plots of scaled Mahalanobis distances from m = 54 for individual units tentatively in Cluster 2 from a search starting in that cluster
The 30 curves of scaled Mahalanobis distances for the bridge, that is Cluster 3, in Figure 7.35 seem to tell a less coherent story. However, up to m = 87, most of the curves move together, the principal exceptions being those for units 118 and 167, which are the two lowest around m = 87 and two of the highest towards the end of the search. Again, these two units are close in Euclidean space. Traces of distances for the three groups show that units 118 and 167 show up in the last panel for units in Cluster 3. The curves for the individual units of Cluster 3 show that, although units 118 and 167 are not the last to enter the cluster, the trajectories of their distances do have something in common with units from Cluster 2 which enter from m = 30.
Finally, Figure 7.36 shows the results of the search starting from Cluster
1, the dispersed group. The distances form a coherent pattern up to m = 80.
Thereafter most rise gently, including that for observation 57, which is the
outlier we have noticed before. However, the curves for units 17, 24, 43 and
67 curve downwards, cutting across the general trend. Plots of individual
trajectories show that unit 43 behaves very similarly to units from the
bridge which enter after m = 80. In fact, observation 43 is very close to the bridge.
FIGURE 7.35. Bridge data: forward plot of scaled Mahalanobis distances for tentative Cluster 3 (the bridge) when the search starts in that cluster
FIGURE 7.36. Bridge data: forward plot of scaled Mahalanobis distances for tentative Cluster 1 (the dispersed group) when the search starts in that cluster
Mahalanobis distance. For this purpose we can use either the usual distances or standardized distances. The search can be constrained so that no observations can leave once they have joined a subset.
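One step of such a confirmatory search with several centres can be sketched as follows. This is illustrative only, not the authors' implementation: each unit gets one Mahalanobis distance per tentative cluster and is allocated to the nearest centre; the "no exit" constraint would then be enforced by only ever adding units to the growing subsets.

```python
import numpy as np

def allocate(X, centers, covs):
    """Allocate each row of X to the population giving its smallest
    squared Mahalanobis distance, given current centres and
    covariance matrices for the tentative clusters."""
    cols = []
    for mu, S in zip(centers, covs):
        diff = X - mu
        cols.append(np.einsum('ij,jk,ik->i', diff, np.linalg.inv(S), diff))
    # column m holds distances to cluster m; argmin gives the allocation
    return np.argmin(np.column_stack(cols), axis=1)
```

Recording this allocation at each step of the search gives the kind of display shown in Figure 7.37.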
We can again plot distances for each unit, but now there are three distances at each stage in the search. We can also project these plots to show how the closeness of the units to each cluster varies during the forward search. However, for the bridge data, the extra information found from the search with three centres can be summarized by a forward plot of the population to which the unit is allocated, together with scatterplots of the data.
Figure 7.37 shows the allocation of units to populations for all units about which we were undecided, together with those for which the allocation changed at least once during the search, using the customary Mahalanobis distances. Reading up from the bottom of the plot we have a key giving the symbols for the three groups. Then we have a group of units from Cluster 2 which had never been thought to be atypical of their group in our earlier analyses, but which were classified in other clusters at least once in this search. Above them is unit 156 from Cluster 3. Then, in a single group, are the units which we have singled out earlier for further consideration.
We now consider the individual units in turn. Unit 109 is correctly classified in Cluster 2 for most of the search. Units 111 and 114 are indicated
FIGURE 7.37. Bridge data: confirmatory search with three clusters. Allocation of units in the last steps of the search
FIGURE 7.38. Bridge data: units not certainly classified in Figure 7.37. • units
in Cluster 2 (81-140)
FIGURE 7.39. Bridge data: confirmatory search with three clusters using standardized distances. Allocation of units in the last steps of the search
FIGURE 7.40. Bridge data: units not certainly classified in Figure 7.39. + units
in Cluster 1 (1-80); • units in Cluster 2 (81-140)
FIGURE 7.41. Financial data: scatterplot matrix
FIGURE 7.42. Financial data: forward plot of scaled Mahalanobis distances with m0 = 17. There seem to be two groups
FIGURE 7.44. Financial data: left panel, forward plot of components of the first
eigenvector; right panel, the components of the second eigenvector
at this point. The gap plot in the right-hand panel of Figure 7.45 shows that there is a large amount of interchange shortly after the peak, since the ellipsoid changes appreciably as it has to straddle two groups. Figure 7.46 is an ordered entry plot. The observations are ordered by their first entry into the subset. If there are no interchanges, the plot has the shape at the top right-hand corner, as units successively enter. Below this we see that
7.6 Financial Data 409
FIGURE 7.45. Financial data, forward plots: left panel, minimum Mahalanobis
distance among units not in the subset; right panel, gap plot
FIGURE 7.46. Financial data: ordered entry plot: units ordered by first entry to the subset. The interchanges around m = 60 cause the white area in the centre of the plot
most of these units, the last to enter the subset, were previously included, but were removed by the interchanges starting at m = 60.
Further evidence about the cluster structure of the data can be found by looking at the trajectories of groups of distances during the search. Figure 7.47 shows six panels of Mahalanobis distances separated at m = 38
FIGURE 7.47. Financial data: forward plots of Mahalanobis distances. The six
panels of distances, cut at m = 38, have been rescaled to fill the panels
by cutting on distances less than six. The top three panels contain 51 units, all of which have very similarly shaped trajectories, although the magnitude of the distances increases. The lower left-hand panel contains two kinds of shape: one which jumps up in the middle like those of the top panels and one which is low towards the end. Most, but not all, of the remaining units follow this trajectory.
7.6.2 Exploratory Analysis
Figure 7.47 indicates that there are more than 51 units belonging to the first cluster. This corroborates the evidence from the plot of minimum distances amongst observations not in the subset in the left panel of Figure 7.45, which shows a broad peak leading up to m = 53. We therefore now look at the forward plots of individual Mahalanobis distances for some units entering around this peak. Figure 7.48 shows nine panels of individual distances starting from m = 47. The first five panels are uneventful; the observations joining up to m = 51 agree with those already in the subset. But the remaining four panels show observations entering which are remote, but seem to have much the same trajectory as those already in the subset.
The second plot of nine panels, Figure 7.49, starts from m = 56, past the peak in the plot of minimum distances in the left-hand panel of Figure 7.45, but still a region of high values. Indeed, the units entering are again remote
FIGURE 7.48. Financial data: nine panel plot of forward Mahalanobis distances
for specific units, starting from m = 47
from those that have already entered. The curves for m = 56 and 57, units 54 and 50, have an intermediate shape rather similar to that for observation 21 which enters when m = 54. From m = 58, when observation 72 enters, onwards, the shapes look more like those in the final panels of Figure 7.47 for units which enter at the end of the search. In the panels from m = 60, several curves are given as there is appreciable interchange in this region of the search.
We now start the search from the other putative cluster, which consists of units with large distances in the initial part of the plot of forward Mahalanobis distances, Figure 7.42. We select those units with the 20 largest distances at m = 46. The forward plot of Mahalanobis distances in Figure 7.50 again shows that there are two groups. The plot is indeed similar in structure to Figure 7.42, where we started in the first tentative cluster, except that the changes associated with the introduction of units probably now not from the second cluster start a little earlier, around m = 45. The summary of the forward plot of Mahalanobis distances giving the changes in the distances is shown in Figure 7.51, in which the units are ordered by their distances at m = 46. There appears to be an almost complete separation into two groups around this value of m, although the last part of the search is uninformative.
FIGURE 7.49. Financial data: further nine panel plot of forward Mahalanobis
distances for specific units, starting from m = 56 with reference distribution for
m = 47. The multiple curves in the last five panels show the effect of interchanges
FIGURE 7.51. Financial data, changes in forward plot of Mahalanobis distances starting with units in the second cluster. Units ordered by distances at m = 46
and 49 show up in the gap plot on the right of Figure 7.52, which shows the interchanges starting at m = 53.
We now look at some plots of individual curves and of groups of curves. The forward plot of all Mahalanobis distances, Figure 7.50, seems to have two groups and little in the middle at m = 45, so we divide the units into two tentative clusters at this point, the more compact consisting of those with distances less than four. Figure 7.53 is the six-panel plot of these distances. There are 46 units in the first row. The first two panels in the first row and those in the second row are very different in both shape and scale and form two distinct groups. The division is however less clear in the third panel of the top row, where some of the curves are of an intermediate shape and magnitude, which unfortunately is to be expected if there is not a sharp cluster boundary.
The plots of individual distances from m = 44, given in Figure 7.54, are helpful in this respect. The units added up to m = 46 appear to belong together naturally. However those added when m = 47 and 48 are appreciably larger, although not very different in shape. This is the area of the peaks in the plot of minimum distances amongst units not in the subset, left-hand panel of Figure 7.52, so that it is to be expected that distant observations will be added. The curves for the observations in the remaining
FIGURE 7.54. Financial data, search starting in the second cluster: nine panel plot of forward Mahalanobis distances for specific units, starting from m = 44
panels are all initially high, but finally decreasing, the shape associated with the lower row of panels in Figure 7.53 and so with the other cluster.
This analysis does not enable us to cluster all units unambiguously. However it does provide an excellent starting point for a confirmatory analysis. The peak in the plot of minimum distances for the first cluster in the left panel of Figure 7.45 and the gap plot in the right-hand panel of the same figure are at m = 53. If we take the 52 units before this peak and likewise the 45 units before the first peak in the plots in Figure 7.52 for the second cluster, no units have been incorporated in both clusters and only units 21, 50, 52, 54, 77 and 89 remain to be clustered. These six units together with the two tentative clusters are shown with different symbols in Figure 7.55. The plot clearly shows why these units are the last to be clustered. Observation 52 is an outlier from Cluster 1. The other observations are shown, in the two lower panels, to lie between the two clusters found so far. However, in the panel for y1 and y2, observations 77 and 89 appear to lie in Cluster 1.
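The bookkeeping in this step (two pre-peak tentative clusters, a check for overlap, and a list of what remains) is just set arithmetic; a small sketch with hypothetical unit labels:

```python
def unclustered(all_units, cluster1, cluster2):
    """Return (units claimed by both tentative clusters,
    units claimed by neither); the latter remain to be clustered."""
    c1, c2 = set(cluster1), set(cluster2)
    return sorted(c1 & c2), sorted(set(all_units) - c1 - c2)
```

For the financial data the overlap was empty and six units, 21, 50, 52, 54, 77 and 89, were left in the second list.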
Before we proceed to our confirmatory analysis, it is interesting to recall where these remaining units enter the two clusters in the two forward searches, which we can do by considering the plots of individual distances. Figure 7.48 gave distances from m = 47 when the search started in the first
FIGURE 7.55. Financial data: scatterplot matrix; two tentative clusters and six
unclustered units. + units in Cluster 1; • units in Cluster 2
cluster. The units to enter from m = 53 are successively 52, 21 and 89, none of which had previously been assigned. Figure 7.49 shows that the next two units to enter are 54 and 50, also unassigned. After this, units enter which were included in the second group before the peak in distances. The story is similar if we start from m = 45 in the second group. As Figure 7.54 shows, four units in the group, 89, 54, 50 and 21, enter before observation 26 which was assigned in the search from Cluster 1. Only observation 77 fails to be included in either group.
We now interpret these results in the light of the scatterplot matrix, Figure 7.55, where observation 52 appears as an outlier from Cluster 1, but away from Cluster 2. This observation is not close to either cluster centre, but is much closer to that of the first cluster. The next two observations to enter the search from Cluster 1 are 21 and 89. These observations also enter Cluster 2 in the steps after m = 45. Together with units 50 and 54, they are very much between the groups. It is only unit 77 that is apparently far from both groups.
FIGURE 7.56. Financial data: forward plot of scaled Mahalanobis distances for
the 52 observations in Cluster 1. The distances for the six unclustered units are
highlighted
FIGURE 7.57. Financial data: forward plot of scaled Mahalanobis distances for
the 45 observations in Cluster 2. The six unclustered units are highlighted
FIGURE 7.58. Financial data: confirmatory search with two clusters using unstandardized distances. Allocation of units in the last steps of the search
7.7 Diabetes Data
7.7.1 Preliminary Analysis
Our final example is of 145 observations on diabetes patients, which have been used in the statistical literature as a difficult example of cluster analysis. A discussion is given, for example, by Fraley and Raftery (1998). The data were introduced by Reaven and Miller (1979). There are three measurements on each patient:
Figure 7.59 is a scatterplot matrix of the data which are in Table A.17. There seems to be a central cluster and two "arms" forming separate clusters. The first cluster is appreciably more compact than the other two. However, the plot of y1 against y2 is diagonal with increasing variance as values of y1 and y2 increase. In the absence of the third variable we would expect the forward search to start at the bottom left-hand corner of the plot and to include units with increasing values of both variables. Although there would seem to be no obvious breaks between clusters, each observation to enter would be slightly and increasingly remote from the cluster already established, so units might be expected to enter with increasingly large Mahalanobis distances. It is hard, from visual inspection, to tell what the effect of the third variable is on this argument.
We start our forward analysis with a subset found by the method of robust ellipses, for which m0 = 23. The resulting forward plot of Mahalanobis distances in Figure 7.60 has some strange features. There is a shortage of very small distances and overall the distances perhaps fall into three groups: there is a gap between the largest distances and the rest around m = 70, these largest distances being rather uniformly distributed. Around m = 45 it looks as if there is a gap between the smallest distances and those which are somewhat larger, which again have a rather uniform distribution. These impressions are confirmed by the QQ plot of Mahalanobis distances at m = 30 in the left-hand panel of Figure 7.61, which shows some evidence of three regimes as well as of three outliers.
We repeat the forward search with a starting point consisting of the units giving the 15 smallest Mahalanobis distances at m = 45 in Figure 7.60.
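A QQ plot of Mahalanobis distances against normal-theory quantiles, as in Figure 7.61, can be sketched as follows. This is a sketch assuming scipy is available; under multivariate normality the squared distances are approximately chi-squared on p degrees of freedom, so the ordered distances are compared with square roots of chi-squared quantiles.

```python
import numpy as np
from scipy.stats import chi2

def md_qq_points(distances, p):
    """Coordinates for a QQ plot: ordered Mahalanobis distances against
    square roots of chi-squared(p) quantiles.  Several approximately
    linear regions of different slope hint at several regimes."""
    d = np.sort(np.asarray(distances, dtype=float))
    n = d.size
    probs = (np.arange(1, n + 1) - 0.5) / n   # plotting positions
    return np.sqrt(chi2.ppf(probs, df=p)), d
```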
FIGURE 7.59. Diabetes data: scatterplot matrix. There seem to be three clusters
The resulting forward plot of distances in Figure 7.62 is not much different
from that we have seen earlier, but the gap between the first and second
groups around m = 30 seems a little clearer. The QQ plot of Mahalanobis
distances at this point in the right panel of Figure 7.61 is similar to that
in the left panel, but shows slightly more division into three groups; there
are three approximately linear regions of increasing slope. The plot of the
variance explained by the first two principal components in the left panel
of Figure 7.63 also suggests three groups, with an early peak in the curve
indicating some further structure. As we have noted before, the changes
in this plot occur later than the introduction of the first unit from a new
group. The first two principal components seem to explain virtually all the
variability in the data.
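A monitoring curve of this kind can be computed as sketched below. This is illustrative only; `subsets` stands for the sequence of subsets produced by the forward search.

```python
import numpy as np

def frac_var_first_two(X, subsets):
    """For each subset along the search, the fraction of total variance
    of the fitted covariance matrix carried by its first two
    principal components (largest two eigenvalues)."""
    fracs = []
    for s in subsets:
        ev = np.sort(np.linalg.eigvalsh(np.cov(X[list(s)], rowvar=False)))[::-1]
        fracs.append((ev[0] + ev[1]) / ev.sum())
    return np.array(fracs)
```

Plotting this fraction against the subset size m gives a curve like the left panel of Figure 7.63, with peaks marking changes in the fitted ellipsoid.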
Other plots also suggest three groups without clear boundaries. The forward plot of minimum Mahalanobis distances among the units not in the subset is in the left panel of Figure 7.64. This starts approximately horizontally, then increases around m = 70 to a larger slope. From about
m = 110 the distances are all large but oscillate appreciably. This is not a pattern we have seen before. For the 60:80 data the comparable plot in Figure 7.8 was initially horizontal: there was a sharp peak as the second cluster entered, followed by a second, roughly horizontal, period before the
FIGURE 7.64. Diabetes data, forward plots: left panel, minimum Mahalanobis distance among units not in the basic subset; right panel, gap plot
final outlier. Neither the plot for the data with three groups and two outliers, Figure 7.12, nor that for the data with a bridge, Figure 7.26, show a monotonic trend in the latter part of the plot. However, both the plots for the financial data, Figure 7.45 and Figure 7.52, do show an increase in the second half of the search as units from the next group are included. Together with our earlier discussion of the y1, y2 panel of the scatterplot matrix of Figure 7.59, this suggests that the initial group is being extended by the inclusion of units from a second group. The last 30 observations look different again.
The gap plot in the right-hand panel of Figure 7.64 shows that the search proceeds steadily up to around m = 110, without interchanges. Although evidence from the minimum Mahalanobis distances suggests that a second cluster is being included, the absence of interchanges indicates that the change is gradual. The forward plot of the eccentricity of the ellipse from the first two eigenvalues in the right panel of Figure 7.63 shows both the change in the shape of the ellipsoid at the beginning of this process and, very clearly, the effect of the introduction of the third group.
A nine panel plot of individual Mahalanobis distances is given in Figure 7.65 from a cut at m = 45. As is usually the case, there is a progression from small distances to large, here without any sharp breaks, although the shape of the distances does change steadily. There are 68 distances in the first two panels and 28 in the last row. The plot of distances after rescaling in Figure 7.66 shows that the first 68 distances have in common that they are horizontal towards the end. The 47 distances in the next four panels have a common decreasing shape, which is shared by, perhaps, four curves in the last row. The curves for the remaining 24 observations in the bottom row seem to share a horizontal shape after the initial peak before they decline. The grouped nature of these last observations is shown very
FIGURE 7.65. Diabetes data: forward plots of Mahalanobis distances. The nine panels of distances come from a cut at m = 45
FIGURE 7.66. Diabetes data: forward plots of Mahalanobis distances. Nine panels of distances from a cut at m = 45. Here the distances in Figure 7.65 have been rescaled to fill the panels
sufficiently small that the search responds slightly to this local irregularity, producing, for example, the initial peak in the plot of the percentage of variance explained in the left panel of Figure 7.63. Overall, the 70 observations are more uniformly than normally distributed over the marginal ellipses: the density of the observations seems not to increase sufficiently towards the centre. This feature explains the absence of very small distances in forward plots of Mahalanobis distances, such as Figure 7.62.
The scatterplot of the second tentative cluster in Figure 7.69 shows that these observations also form a loose group, again without a noticeably normal structure. In Figure 7.70 we finally give the scatterplot matrix when the observations are divided into three clusters. The scatterplot for y1 against y2 shows that a sensible division has been obtained, although it is not always clear whether units belong to one cluster or to the adjacent one. However, the other panels which include y3 show that one observation in the third cluster seems to be wrongly clustered. We have already noted that this cluster will contain at least one outlier.
428 7. Cluster Analysis
[Figure 7.69: scatterplot matrix of the observations in the second tentative cluster; panels y1 = plasma glucose response to oral glucose, y2 = plasma insulin response to oral glucose and y3 = degree of insulin resistance.]
FIGURE 7.70. Diabetes data: scatterplot matrix of all observations in three tentative clusters
2 and 3 have curves which are mostly very different from those for Cluster 1.
Next we look at the curves for individual units in tentative Cluster 1. Figure 7.73 shows the curves starting with m = 50, since we want to monitor the curve for unit 79, which enters at m = 54. The central panel of the figure shows that the trajectory for this unit is different from that for the cluster not only at the end, but also in the centre of the search, as the distance decreases from being large to small. Other units which seem a little different are 10, entering when m = 56, and 43 when m = 58. The continuation of this analysis in Figure 7.74 suggests several more units which behave differently. In order of entry these are 83, 2, 81, 3, 42, 33 and 105. Only two units in this figure seem typical of the cluster. The last three units to enter are 40, 59 and 60, which may also be categorized as behaving atypically although, of course, the reason that they enter the cluster later in the search is that they have larger Mahalanobis distances than the observations which entered earlier.
FIGURE 7.72. Diabetes data: forward plots of scaled Mahalanobis distances for the three clusters from a search starting in the first cluster
7.7 Diabetes Data 431
FIGURE 7.73. Diabetes data: nine panel plot of forward scaled Mahalanobis
distances for specific units, starting from m = 50
FIGURE 7.74. Diabetes data: further nine panel plot of forward scaled Mahalanobis distances for specific units, starting from m = 59 with reference distribution for m = 50.
FIGURE 7.75. Diabetes data: forward plot of scaled Mahalanobis distances for
the 45 units of Cluster 2, starting from a search in Cluster 2
7.7 Diabetes Data 433
FIGURE 7.76. Diabetes data: forward plots of scaled Mahalanobis distances for the three clusters from a search starting in the second cluster; distances separated at m = 40
FIGURE 7.78. Diabetes data: forward plot of scaled Mahalanobis distances for the 45 units of Cluster 2, starting from a search in Cluster 2 as in Figure 7.75. Units 25, 38 and 50 may be atypical for the cluster
FIGURE 7.79. Diabetes data: forward plot of scaled Mahalanobis distances for
the 30 units of Cluster 3, starting from a search in Cluster 3
FIGURE 7.80. Diabetes data: forward plots of scaled Mahalanobis distances for the three clusters from a search starting in the third cluster
with Cluster 1, the distances in Figure 7.72 for Cluster 1 are initially low, but then increase as the centre of the fitted ellipsoid moves away from the cluster. In Figure 7.80, on the other hand, the distances for Cluster 1 are initially large, fluctuate together as the centre for Cluster 3 changes and then decrease as the centre moves towards fitting all observations and so towards Cluster 1.
FIGURE 7.82. Diabetes data: cluster membership during the second confirmatory search with three clusters using standardized distances starting with m0 = 117
[Figure: scatterplot matrix with panels y1 = plasma glucose, y2 = plasma insulin and y3 = insulin resistance.]
Cluster      I         II           III
Doctors      1:76      77:112       113:145
Certain      96        64, 75, 115  131, 136
Uncertain    79, 83    81, 105      134
monitor these units separately and keep the established classification, which originated from our conservative exploratory analysis on single clusters.
A general feature of Table 7.1 is that Cluster 2 appears to have "stolen" several units from the other clusters and indeed we would expect that, using standardized distances, units would be attracted to a less dispersed cluster. But it is instructive to compare our clustering with that of the doctors.
7.8 Discussion
Our robust classification methods are notably different from those that appear in the literature and that are currently used in applications. Most traditional methods of cluster analysis allow automatic classification of the available data. This means that, after the user has selected a specific distance definition and what algorithm to use, the procedure ends with a sharp
FIGURE 7.85. Diabetes data: scatterplot matrix of all observations with the doctors' clustering
FIGURE 7.86. Diabetes data: scatterplot matrix of all observations. Units which we classify differently from the doctors are highlighted with the doctors' clustering symbol. Units uncertainly clustered in Table 7.1 are represented with squares
$$ d_{C_1 C_2} = \sum_{i_1=1}^{n_1} \sum_{i_2=1}^{n_2} d_{i_1 i_2} \, / \, (n_1 n_2), $$
which gets amazingly large even for moderate values of n and g. For instance, even in a small size problem where n = 100 and g = 5 this number is as large as $10^{68}$ (Everitt, Landau, and Leese 2001). In practice most partitioning methods operate through iterative algorithms which approximate the global objective function. These iterative algorithms are computationally fast and can be applied effectively to large databases.
The most widely adopted partitioning technique is the k-means algorithm (we should say 'g-means' in our notation). Although a number of variants of this method are currently available, its basic version works through the following steps:

1. Given g initial cluster centres (called seeds), the algorithm starts with a tentative g-group classification by assigning each observation to the closest seed;

2. The following steps are then repeated for each observation until no reassignment occurs:

   (a) Compute the distance between the ith observation and each centroid;

   (b) If the closest centroid is not that of the cluster to which the ith observation currently belongs, reassign the observation to the nearest cluster;

   (c) If cluster membership has changed at Step 2b, recompute the cluster means.
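The loop just described can be sketched in a few lines of pure Python. This is a minimal illustration with our own function names and toy data, not the software actually used for the analyses in this chapter:

```python
import math

def kmeans(points, seeds, max_iter=100):
    """Basic k-means: assign each point to the closest centre,
    then recompute the cluster means; stop when nothing changes."""
    centres = [list(c) for c in seeds]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        changed = False
        # Steps 2a-2b: (re)assign each observation to the nearest centroid
        for i, p in enumerate(points):
            dists = [math.dist(p, c) for c in centres]
            nearest = dists.index(min(dists))
            if nearest != labels[i]:
                labels[i] = nearest
                changed = True
        if not changed:
            break
        # Step 2c: recompute each cluster mean
        for l in range(len(centres)):
            members = [p for p, lab in zip(points, labels) if lab == l]
            if members:
                centres[l] = [sum(x) / len(members) for x in zip(*members)]
    return labels, centres

# Two well-separated groups in the plane
pts = [(0.0, 0.0), (0.1, 0.2), (-0.2, 0.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels, centres = kmeans(pts, seeds=[(0.0, 0.0), (5.0, 5.0)])
```

With well-separated seeds the iteration converges in two passes; in general the result depends on the choice of seeds, which is exactly the sensitivity discussed below.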
FIGURE 7.87. The 60:80 data: dendrogram from the average linkage algorithm
using Euclidean distances
example, the pattern of the data is obvious and we expect that any sensible clustering technique should be able to recover it. In Figure 7.2 we showed that this is not the case if classification is performed through robust Mahalanobis distances. We now see that the performance of traditional methods of cluster analysis may be slightly better, but is still largely unsatisfactory.
Figure 7.87 gives a dendrogram of the 60:80 data, obtained by hierarchical classification through the average linkage algorithm and Euclidean distances on standardized variables. The partition with g = 2 (the true number of groups) is clearly useless for classification purposes, since it separates the natural outlier (unit 57) from the rest of the data. Even if we put aside this atypical observation, the picture seems to suggest a partition with four clusters and an additional outlier (unit 27). Hence, hierarchical cluster analysis would split the dispersed group into three different subgroups plus some noise.
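The average linkage criterion behind Figure 7.87 joins, at each stage, the two clusters whose mean pairwise distance is smallest. A minimal agglomerative sketch, assuming Euclidean distances on already standardized data (the function name and toy data are ours, not the book's code):

```python
import math

def average_linkage(points, num_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters
    with the smallest average pairwise Euclidean distance."""
    clusters = [[i] for i in range(len(points))]

    def avg_dist(c1, c2):
        # d(C1, C2) = sum of pairwise distances / (n1 * n2)
        total = sum(math.dist(points[i], points[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > num_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = avg_dist(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Two compact groups and one isolated point
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (40, 40)]
groups = average_linkage(pts, num_clusters=3)
```

Cutting the merge sequence at a chosen number of clusters corresponds to cutting the dendrogram at a given height.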
On the partitioning side, Figure 7.88 shows the 2-group clustering of the 60:80 data obtained through the robust PAM algorithm and the L1 distance on standardized variables. In this picture, called a clusplot (Pison, Struyf, and Rousseeuw 1999), each observation is represented by a point in two dimensional space, using principal components. Of course, in this bivariate example the first two components explain all of the variability. Around each cluster an ellipse is drawn. Even in the ideal situation where g is known in advance (an overly optimistic assumption in most applications), the algorithm wrongly allocates three observations from the large group (units 36, 43, 72). The k-means method provides very similar results, with
FIGURE 7.88. The standardized 60:80 data: clusplot from the PAM algorithm
with g = 2 and the L1 metric
FIGURE 7.89. Standardized bridge data: clusplot from the PAM algorithm with g = 3 and the L1 metric
highlight. For instance, the decisions made about variable scaling, or about what distance and what algorithm should be used, can lead to remarkably different results.
Model-based clustering methods take a different way and make an explicit link between cluster analysis and formal statistical models. Specifically, it is assumed that observations $y_i$, $i = 1, \ldots, n$, are a sample from a mixture of g multivariate distributions, each with mean $\mu_l$ and covariance matrix $\Sigma_l$, $l = 1, \ldots, g$. This is the formulation we used in Chapter 6 for discriminant analysis, but, to repeat, here neither g nor the cluster labels are known. Let $f_l(y_i; \mu_l, \Sigma_l)$ be the density of observation $y_i$ from the lth distribution. In most applications $f_l(y_i; \mu_l, \Sigma_l)$ is taken to be the density of a multivariate normal distribution, so from (2.7)
$$ f_l(y_i; \mu_l, \Sigma_l) = (2\pi)^{-v/2} |\Sigma_l|^{-1/2} \exp\left\{ -\tfrac{1}{2} (y_i - \mu_l)^T \Sigma_l^{-1} (y_i - \mu_l) \right\}. $$
Let $\theta = (\theta_1, \ldots, \theta_n)^T$ be the n-dimensional vector containing the cluster label of each observation. That is, $\theta_i = l$ if $y_i$ belongs to the lth population ($i = 1, \ldots, n$; $l = 1, \ldots, g$). Similarly, let $\pi = (\pi_1, \ldots, \pi_g)^T$ be the g-dimensional vector of mixture weights. That is, $\pi_l \geq 0$ is the probability that an observation comes from the lth population ($\sum_{l=1}^{g} \pi_l = 1$). There are essentially two alternative approaches to model the clusters, according to whether the focus is on $\theta$ or on $\pi$.
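The mixture formulation can be made concrete with a short sketch that evaluates the mixture density $\sum_l \pi_l f_l(y)$. For simplicity we take the univariate case of the normal density; the function names and the two-component example are illustrative assumptions, not part of the book:

```python
import math

def normal_pdf(y, mu, sigma2):
    """Univariate normal density (the one-dimensional case of (2.7))."""
    return math.exp(-0.5 * (y - mu) ** 2 / sigma2) / math.sqrt(2 * math.pi * sigma2)

def mixture_pdf(y, weights, means, variances):
    """Density of a g-component normal mixture: sum of pi_l * f_l(y)."""
    return sum(p * normal_pdf(y, m, v)
               for p, m, v in zip(weights, means, variances))

# Two-component mixture with weights pi = (0.7, 0.3)
f = mixture_pdf(0.0, weights=[0.7, 0.3], means=[0.0, 5.0], variances=[1.0, 1.0])
```

In the classification approach one would maximize this density over the labels $\theta$; in the mixture approach, over the weights $\pi$ and the component parameters.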
7.8.5 Further Reading
Anderberg (1973) and Hartigan (1975) are two classical references of the early literature on cluster analysis, from which the basic algorithms originate. Kaufman and Rousseeuw (1990), Gordon (1999) and Everitt, Landau, and Leese (2001) provide more recent accounts, where a number of emerging topics are also considered. Among these we mention the important issue of cluster validation, that is assessment of the validity of classifications that have been obtained from the application of a clustering algorithm. The validation step can be performed to judge the effect of any choice made along the cluster analysis application. For instance, the same algorithm might be applied using different distance measures, or after removal or transformation of some variables. As a diagnostic tool, it can be applied to highlight
the effect of one or several observations on the cluster analysis solution, thus providing influence measures for classification (Jolliffe, Jones, and Morgan 1995, Cheng and Milligan 1996). However, these measures can suffer from masking and swamping precisely as any other backward method based on case deletion. It also seems desirable that the clustering of a set of units should remain unchanged after the removal of clusters from which they are absent. Kaufman and Rousseeuw (1990, p. 239) call this property cluster omission admissibility. Some robustness issues of partitioning methods (especially k-means) are examined in Cuesta-Albertos, Gordaliza, and Matrán (1997) and García-Escudero and Gordaliza (1999).
Maronna and Jacovkis (1974) were among the first to exploit a version of the k-means algorithm based on the Mahalanobis distance. In the context of hierarchical cluster analysis, the use of Mahalanobis metrics is considered by Gnanadesikan, Harvey, and Kettenring (1993), at the cost of the assumption of a common covariance matrix for all groups. Alternative partitioning methods based on minimizing different functions of the within groups matrix of residual sums of squares and products of the data are described in Friedman and Rubin (1967) and Marriott (1982).
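The Mahalanobis metric mentioned above replaces Euclidean distance by the quadratic form $(y - \mu)^T \Sigma^{-1} (y - \mu)$. A bivariate sketch, inverting the 2x2 covariance matrix directly (illustrative only, not the implementation used in this book):

```python
def mahalanobis2(y, mu, cov):
    """Squared Mahalanobis distance (y - mu)^T cov^{-1} (y - mu)
    in the bivariate case, with an explicit 2x2 inverse."""
    a, b = cov[0]
    c, d = cov[1]
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    r = [y[0] - mu[0], y[1] - mu[1]]
    return sum(r[i] * inv[i][j] * r[j] for i in range(2) for j in range(2))

# With the identity covariance this reduces to squared Euclidean distance
d2 = mahalanobis2((3.0, 4.0), (0.0, 0.0), [[1.0, 0.0], [0.0, 1.0]])
```

With a non-spherical covariance the same point can be near in one direction and far in another, which is why the metric adapts to elliptical clusters.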
Cluster analysis methods based on maximizing a likelihood function originate from the work of Scott and Symons (1971). Fraley and Raftery (2002) give an overview of recent advances in model-based clustering techniques and applications, with emphasis on the classification approach. Symons (1981), Banfield and Raftery (1993) and Fraley and Raftery (1998) provide applications to the diabetes data of §7.7. McLachlan and Peel (2000) is a book length treatment of the mixture approach to statistical modelling, with component distributions possibly different from the normal one.
Finally, we mention that cluster analysis methods (especially k-means) are a key tool in what is nowadays called data mining, that is the "analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner". We refer to Hand, Mannila, and Smyth (2001) (from which the quoted definition of data mining is taken) and to Hastie, Tibshirani, and Friedman (2001) for a broad description of this research area.
FIGURE 7.90. Three Clusters, Two Outliers: ellipses when (left) m = 60 and (right) m = 61, starting from m0 = 28. Filled symbols are used for units in the subset
7.9 Exercises

Exercise 7.1 Figure 7.90 shows the ellipses at m = 60 and m = 61 for the Three Clusters, Two Outliers example of §1.4, when the search is started from m0 = 28. Figure 7.91 gives the ellipses at steps 149 ≤ m ≤ 152 for the same search. Comment on the pictures and compare them to the entry plot of Figure 7.10.

Exercise 7.2 Apply the PAM algorithm from S-Plus to the 60:80 data of §7.3, using the L1 distance. Set g = 2 and do not standardize the variables. Obtain the clusplot and comment on it. Which units are now incorrectly classified?

Exercise 7.4 Highlight observations 118 and 168 in the scatterplot of the bridge data of §7.5. Explain why they are in the 'wrong' position, thus being incorrectly classified by the forward search.

Exercise 7.5 Apply hierarchical clustering to the bridge data of §7.5. Adopt average and single linkage methods based on the Euclidean distance after variable standardization.
FIGURE 7.91. Three Clusters, Two Outliers: ellipses when m = 149, 150, 151 and 152. Filled symbols are used for units in the subset
7.10 Solutions
Exercise 7.1
The tight ellipse in the left panel of Figure 7.90 surrounds all the units of the tight group, labelled from 81 to 140 in the entry plot, which form the subset when m = 60. When m = 61 the first observation from the dispersed group (unit 68, as we can see from the entry plot) enters the subset, thus changing the orientation of the ellipse and increasing its size. The rotation of the ellipse is responsible for the sudden change in the ordering of Mahalanobis distances apparent in Figure 7.11.
From Figure 7.91 we note that when m = 149 both outliers are in the subset, with one on the outer ellipse. When m = 150, the more distant outlier has left the subset. All other observations now have smaller Mahalanobis distances than the remaining outlier, which is on the ellipse. In particular, not all members of the third cluster of observations are in the subset. So, in the next step, the lower left-hand panel of Figure 7.91, there is an interchange and the ellipse changes orientation, reflecting the exclusion of the two outliers and the continuing inclusion of members of the third group. This interchange is also clear around m = 150 in the top two rows of the entry plot of Figure 7.10, corresponding to observations 159 and 160 (the two outliers).
FIGURE 7.92. The 60:80 data: clusplot from the PAM algorithm with g = 2 and
the L1 metric. The variables are not standardized. Different symbols are used for
the two clusters identified by the algorithm
Exercise 7.2
Figure 7.92 is the clusplot from S-Plus. Comparing it with Figure 7.88, we see that more units from the diffuse group are now misclassified. The resulting ellipse for the cluster containing the tight group has changed orientation and has increased its size, so this cluster is even more dispersed. Furthermore, the two ellipses now overlap. This is an indication that the detected groups are not well separated.
There are 9 observations from the diffuse group which are classified incorrectly. From the output of the S-Plus PAM procedure, we see that these are units 13, 22, 33, 36, 43, 45, 57, 71 and 72.
Exercise 7.3
Figure 7.93 is the output from the S-Plus MCD function. As for the 60:80 data set, the very robust procedure settles over different clusters, including all the observations from the tight group, the bridge and part of the diffuse group. The orientation and shape of the ellipse are very similar to that of Figure 7.2. Its centre lies approximately in the middle of the bridge, so the index plot of Figure 7.94 now also contains some very small distances.
Exercise 7.4
Figure 7.95 replots Figure 7.18, with different symbols for the three groups
FIGURE 7.93. Bridge data: ellipse from very robust fit nominally containing
97.5% of the data. The half of the data used for estimation contains observations
from all clusters
FIGURE 7.94. Bridge data: index plot of robust Mahalanobis distances from the MCD fit. The horizontal line is the nominal 97.5% point of the distribution of distances
FIGURE 7.95. Bridge data: scatter plot with observations 118 and 168 labelled. Different symbols are used for the three groups and also for the two labelled observations
and with labels for observations 118 and 168. Both observations lie on the borders of the tight cluster, although in different directions. Unit 118, which actually comes from this cluster, is in the same direction as the bridge and is indeed very close to one unit from it. On the contrary, observation 168 is extreme with respect to the bridge to which it belongs and lies well on the incorrect side of the boundary line separating the two groups. The positioning of units 118 and 168 in the scatter plot thus explains why they exhibit forward Mahalanobis distances (see Figure 7.29) that are more akin to those of the cluster to which they do not belong.
Exercise 7.5
Figure 7.96 shows the dendrogram obtained through the average linkage algorithm. This picture is similar to the corresponding plot for the 60:80 data (Figure 7.87), with one unit (57) very far, in the Euclidean metric, from the bulk of the data; it suggests that the dispersed group should be split into a few clusters. There is now evidence of one or two additional clusters, more akin to the tight group of observations pictured on the left of the dendrogram, although less compact. In fact, these clusters include
FIGURE 7.96. Standardized bridge data: dendrogram from the average linkage
algorithm, using the Euclidean distance
all the units from the bridge with the exception of observation 168, which
is still interchanged with observation 118.
As a contrast, Figure 7.97 gives an example of the "chain effect" often exhibited by the single linkage method. In this algorithm clusters are joined at each stage if their boundaries are sufficiently close. This may result in sequences of individual aggregations, starting from the nearest objects and adding observations with increasing distance from the existing cluster centres. Two clusters merge as soon as one gains a single observation between them that is sufficiently close to the boundary of the other cluster. This behaviour is apparent in Figure 7.97. The agglomerative process starts by fusing together the units belonging to the tight group, to which the units in the bridge are then added. After one single observation from the dispersed group has joined this cluster, it merges with one relatively large cluster coming from the dispersed group. Then other units from the diffuse group join, including another relatively large cluster. It is also instructive to note that the two main clusters obtained from splitting the dispersed group (pictured one on the left and the other on the right of the dendrogram) do not fuse together, as in the average linkage method, before joining the large cluster coming from the tight group. This example shows the poor performance of the single linkage algorithm for the purpose of distinguishing poorly separated groups.
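The contrast between the two merge criteria can be sketched directly: single linkage scores a pair of clusters by their minimum pairwise distance, so a single 'bridge' point can pull them together, while average linkage averages over all pairs. The toy data and names below are ours, purely for illustration:

```python
import math

def linkage_distance(c1, c2, points, method="single"):
    """Distance between two clusters of point indices:
    single linkage uses the minimum pairwise distance,
    average linkage the mean of all pairwise distances."""
    dists = [math.dist(points[i], points[j]) for i in c1 for j in c2]
    return min(dists) if method == "single" else sum(dists) / len(dists)

# A tight pair plus a 'bridge' point close to one of its members
pts = [(0.0, 0.0), (1.0, 0.0), (2.5, 0.0)]
tight, bridge = [0, 1], [2]
# Single linkage sees only the closest member (distance 1.5) ...
d_single = linkage_distance(tight, bridge, pts, "single")
# ... while average linkage averages 2.5 and 1.5
d_average = linkage_distance(tight, bridge, pts, "average")
```

The smaller single linkage value is exactly why chains of intermediate points can fuse groups that average linkage keeps apart.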
FIGURE 7.97. Standardized bridge data: dendrogram from the single linkage
algorithm, using the Euclidean distance
8
Spatial Linear Models
8.1 Introduction
The main goal of spatial modelling is to provide a description of continuous or categorical phenomena observed at locations (i.e. points or surfaces)
in space. By far the most common applications have been on the earth's
surface, where each location can be described by a two-dimensional vec-
tor of geographical coordinates. One example is the analysis of yield from
spatially contiguous plots in an experimental design, when the plots are
subject to different treatments the effects of which are to be estimated.
Another major example in environmental sciences is the study of pollution
data recorded at a number of monitoring stations within the same area.
In all applications considered in previous chapters of this book, as well
as in all analyses performed in the companion volume by Atkinson and
Riani (2000), we made the basic assumption that the data consisted of
independent observations from the same distribution, possibly after some transformation and removal of outliers. However, the independence assumption is usually not realistic when analysing spatially indexed variables. In
a spatial context it is often reasonable to expect that the response read-
ings on a sample element will be similar to the readings on other elements
close to it. Statistically speaking, this means that observations collected
at neighbouring sites are usually more similar than would be predicted by
an independent outcomes process. This phenomenon is known as (positive)
spatial dependence and its recognition has been a fundamental advance
towards a more realistic description of geo-referenced variables.
sets for which the kriging model may be appropriate and serve the purpose
of describing the information potential of the forward search for kriging.
We move to spatial autoregression in §8.7. After giving some theoretical background, we describe a few standard diagnostics for these models. Since explanatory variables might be at hand, we focus also on leverage measures. Again the effects of masking and swamping are paramount. In §8.8 we describe our block forward search algorithm for autoregressive models. In §§8.9 and 8.10 we apply the block forward search principle to several data sets, again showing the power of the forward search.
The chapter ends with some suggestions for further reading.
where $\mu$ is a fixed but unknown constant and the errors $\{\delta(s): s \in D\}$ follow a spatial stochastic process with mean 0. This representation assumes the absence of large-scale variation, that is of spatial trend in the expected value of the response variable. However, spatial dependence of the errors implies that observations taken at different sites are correlated.
For inference from a finite realization of process (8.1), we have to impose some constraints on the spatial dependence structure of the response variable. In kriging and other geostatistical models it is customary to restrict attention to the second-order properties of $y(s)$. The most familiar measure of dependence between pairs of random variables is their covariance
valid if $\mathrm{var}\{y(s)\} = \mathrm{var}\{y(t)\} = \sigma_y^2$. Hence, the factor 2 vanishes from both sides and we obtain
$$ \nu(s,t) = \sigma_y^2 - c(s,t). \tag{8.3} $$
We suppose that for all pairs $s, t \in D$
$$ E\{y(s)\} = \mu \tag{8.4} $$
and
$$ 2\nu(s,t) = 2\nu(s-t) = 2\nu(h), \tag{8.5} $$
does not involve the unknown mean $\mu$. Hence the sample analogue of (8.6) is not affected by the finite sample bias introduced by estimation of $\mu$ in $c(h)$. Furthermore, the potential bias originated by misspecification of a constant mean in the ordinary kriging model (8.2) is usually smaller for the estimator of $2\nu(h)$ than for the covariance estimator (Cressie 1993, pp. 70-73).
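The sample analogue referred to above is, in its classical form, half the average squared difference between observations separated by a given lag; a one-dimensional sketch under that standard definition (the function and toy transect are illustrative assumptions, not code from the book):

```python
def empirical_semivariogram(sites, values, h, tol=0.5):
    """Classical moment estimator of the semivariogram at lag h:
    half the average of {y(s_i) - y(s_j)}^2 over pairs whose
    separation distance is within tol of h (1-D sites for simplicity)."""
    sq_diffs = []
    for i in range(len(sites)):
        for j in range(i + 1, len(sites)):
            if abs(abs(sites[i] - sites[j]) - h) <= tol:
                sq_diffs.append((values[i] - values[j]) ** 2)
    return 0.5 * sum(sq_diffs) / len(sq_diffs)

# Observations on a regular 1-D transect
s = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 2.0, 4.0, 3.0]
nu_hat = empirical_semivariogram(s, y, h=1.0, tol=0.1)
```

Repeating the computation over a grid of lags traces out the empirical semivariogram to which a parametric model is then fitted.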
In practice, measurements on the response variable are collected at a network of n spatial locations, say $S = (s_1, \ldots, s_n)$. The available sample is then the $n \times 1$ vector $y = (y(s_1), \ldots, y(s_n))^T$. To simplify notation, sometimes we write $y_i = y(s_i)$ for the observation at site $s_i$, $i = 1, \ldots, n$. Prediction rather than parameter estimation is the main goal of kriging applications. That is, given observations in S, one is usually interested in prediction of the value $y(s_0)$ at the unsampled site $s_0 \in D$. A simple example with n = 4 observation sites is shown in Figure 8.1.
For the purpose of spatial prediction, we have to consider the amount of spatial dependence between $y(s_0)$ and all sample elements $y(s_i)$, $i = 1, \ldots, n$, as well as among all pairs of observations in the sample. Let $v = (\nu(s_0 - s_1), \ldots, \nu(s_0 - s_n))^T$. Define $Y$ to be the $n \times n$ symmetric matrix whose elements are given by $\nu(s_i - s_j)$, for $i, j = 1, \ldots, n$. Given the
$$ \hat{y}(s_0|S) = \eta^T y, \tag{8.7} $$
where
$$ \eta = Y^{-1} \left\{ v + J \, \frac{1 - J^T Y^{-1} v}{J^T Y^{-1} J} \right\}, \tag{8.8} $$
with $J$ the $n \times 1$ vector of ones. The corresponding prediction variance is
$$ \sigma^2(s_0|S) = E\{ y(s_0) - \hat{y}(s_0|S) \}^2 = v^T Y^{-1} v - \frac{(1 - J^T Y^{-1} v)^2}{J^T Y^{-1} J}. \tag{8.9} $$
The weights are constructed so that the predictor is unbiased,
$$ E(a^T y) = E\{y(s_0)\}, $$
$$ \varepsilon(s) \sim N(0, \sigma_\varepsilon^2), $$
where the measurement error variance $\sigma_\varepsilon^2$ does not depend on $s$, and
$$ \mathrm{cov}\{\varepsilon(s), \varepsilon(t)\} = 0 $$
8.2 Background on Kriging 463
for all pairs of sites $s \neq t \in D$. Again, $\mu$ is unknown and $\{y(s): s \in D\}$ is assumed to be intrinsically stationary.
The quantity $\sigma_\varepsilon^2$ constitutes a part of what is usually called the nugget effect of the variogram, a positive limit to which $2\nu(h)$ may tend as $\|h\| \to 0$. Therefore, measurement error is one reason for the possible discontinuity at the origin allowed in many variogram models (see §8.2.2 below), since by definition
$$ 2\nu(h) = 0 \quad \text{if } \|h\| = 0. $$
Another possible cause for the nugget effect is discontinuity at very small scales, which is exhibited by many geophysical phenomena.
Under the measurement error model (8.11), interest usually lies in knowledge of the smoothed version
$$ y^*(s) = \mu + \delta(s) $$
of $y(s)$, rather than in prediction of $y(s)$ itself. Proper appreciation of the role of measurement error is important not only from the modeler's point of view, but also for its implications on the performance of the forward search (see the remark at the end of §8.3.1).
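The ordinary kriging prediction described above can be sketched by solving the standard weight system with a Lagrange multiplier, in which the semivariogram matrix is bordered by a row and column of ones. Everything below, including the linear semivariogram, the one-dimensional sites and all function names, is an illustrative assumption rather than the book's implementation:

```python
def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting (small dense systems)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ordinary_kriging(sites, values, s0, semivariogram):
    """Ordinary kriging weights from the bordered system
    [Gamma 1; 1^T 0] [lambda; m] = [gamma_0; 1],
    where Gamma_ij = nu(s_i - s_j) and gamma_0i = nu(s_0 - s_i)."""
    n = len(sites)
    A = [[semivariogram(abs(sites[i] - sites[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    A.append([1.0] * n + [0.0])
    b = [semivariogram(abs(s0 - si)) for si in sites] + [1.0]
    sol = solve(A, b)
    lam = sol[:n]  # kriging weights, summing to one
    return sum(l * v for l, v in zip(lam, values)), lam

# Toy linear semivariogram nu(h) = h on a 1-D transect
pred, weights = ordinary_kriging([0.0, 2.0], [10.0, 20.0], s0=1.0,
                                 semivariogram=lambda h: h)
```

Because the weights are forced to sum to one, the predictor is unbiased whatever the unknown constant mean, which is the role of the multiplier row in the system.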
If there is measurement error, the variogram of the response variable can
be decomposed as
and equation (8.8), which gives the vector of prediction coeffi.cients ry, must
be modified as follows.
Let v* = {ν*(s0 − s1), ..., ν*(s0 − sn)}^T be the n × 1 vector that repre-
sents spatial dependence between y(s0) and y(si), i = 1, ..., n, under the
measurement error model (8.11). The elements of v* are defined as

    ν*(s0 − si) = ν(s0 − si)  if s0 ≠ si,    (8.13)

and

    ν*(s0 − si) = (1/2) var{y(si)_1 − y(si)_2} = σ²_ε  if s0 = si,    (8.14)

where y(si)_1 and y(si)_2 denote two repeated measurements taken at the
same location si.
Equation (8.13) yields the same semivariogram value as under the basic
model (8.2), since measurement error has no effect on prediction of a new
observation at an unsampled location. On the contrary, equation (8.14)
takes into account the variability of repeated measurements at location si.
The matrix Y = [ν(si − sj)] remains unchanged and still has zeros on its
main diagonal, since observation of the noise-corrupted process y(s) is the
where
(8.16)
(8.17)
so that the kriging predictor now smooths the observed data instead of
being an exact interpolator. On the contrary,
That is, measurement error does not affect prediction at locations where
no data have been observed.
In practice it is difficult to obtain sample information about the true
value of the measurement error variance σ²_ε. Furthermore, measurement
error might also be confounded with a microscale component, that is a
structure of the observed phenomenon with a range shorter than the sam-
pling support. In the absence of actual replications of the measurement
process or very close sampling points, a tentative estimate of σ²_ε can be
obtained by linear extrapolation of variogram estimates near the origin or
by subject-matter information. Knowledge about the physical nature of the
problem is essential at this stage.
It might be argued that ordinary kriging models (8.2) and (8.11) pro-
vide a simplistic representation of natural phenomena, as they imply the
absence of large-scale variation in the response variable. The more general
approach of universal kriging allows for the definition of a space-varying
mean function, say μ(s), at each location. However, for the purpose of our
diagnostic analysis through the forward search, models (8.2) and (8.11) are
precisely what is needed: a simple benchmark to which all observations
are to be contrasted. After a brief description of some isotropic variogram
models, in §8.2.3 we introduce a simple example of a spatial data set con-
taining a few known outliers which serves the purpose of illustrating our
main ideas in kriging.
    v(||h||; θ) = 0                                          if ||h|| = 0,
    v(||h||; θ) = θ0 + θ1 {1.5 ||h||/θ2 − 0.5 (||h||/θ2)^3}  if 0 < ||h|| ≤ θ2,    (8.20)
    v(||h||; θ) = θ0 + θ1                                    if ||h|| ≥ θ2,

where θ0, θ1 and θ2 are nonnegative parameters and θ = (θ0, θ1, θ2)^T. The
value of θ0 is the nugget effect, since ν(||h||) → θ0 if ||h|| → 0. Furthermore,
θ0 + θ1 gives the variance of the process (also including measurement error),
while θ2 defines the range of spatial dependence. In fact, under this model
observations are uncorrelated if their distance ||h|| ≥ θ2. Representation
(8.20) is known as the spherical semivariogram model.
Another popular semivariogram model is the exponential,

    v(||h||; θ) = 0                              if ||h|| = 0,
    v(||h||; θ) = θ0 + θ1 {1 − exp(−||h||/θ2)}   if ||h|| > 0.    (8.21)
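Both models are simple to evaluate numerically. The sketch below (our own functions, not the book's) computes them, for instance at the parameter choices shown in Figure 8.2:

```python
import numpy as np

def spherical(h, theta):
    """Spherical semivariogram (8.20): theta = (nugget, partial sill, range)."""
    t0, t1, t2 = theta
    h = np.asarray(h, dtype=float)
    v = np.where(h <= t2,
                 t0 + t1 * (1.5 * h / t2 - 0.5 * (h / t2) ** 3),
                 t0 + t1)
    return np.where(h == 0.0, 0.0, v)       # nu(0) = 0 by definition

def exponential(h, theta):
    """Exponential semivariogram (8.21)."""
    t0, t1, t2 = theta
    h = np.asarray(h, dtype=float)
    return np.where(h == 0.0, 0.0, t0 + t1 * (1.0 - np.exp(-h / t2)))

# values for the panels of Figure 8.2 with theta = (1, 2, 5)^T
h = np.linspace(0.0, 10.0, 101)
sph = spherical(h, (1.0, 2.0, 5.0))
expo = exponential(h, (1.0, 2.0, 5.0))
```

The nugget θ0 appears as the jump at the origin, and the spherical model reaches its sill θ0 + θ1 exactly at the range θ2, whereas the exponential model approaches it only asymptotically.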
FIGURE 8.2. Spherical and exponential semivariogram models, for selected pa-
rameter values (θ = (1, 2, 5)^T and θ = (0.5, 3, 8)^T). All models show a nugget
effect
TABLE 8.1. Simulated kriging data before contamination: readings are at the
nodes of a 9 x 9 regular lattice
Row Column
1 2 3 4 5 6 7 8 9
1 11.39 13.43 12.79 13.20 9.85 11.37 11.97 10.79 10.53
2 11.83 12.70 10.77 13.79 14.43 11.31 8.73 6.79 7.21
3 11.72 12.26 15.64 12.31 12.34 10.21 8.95 7.68 10.58
4 12.78 9.95 12.79 9.70 10.43 8.36 5.46 7.19 10.00
5 11.64 11.86 13.78 9.98 8.89 8.46 7.53 10.59 9.11
6 11.45 11.42 14.55 12.57 11.63 11.56 8.35 8.69 10.35
7 11.43 13.06 11.30 12.38 11.25 7.43 11.04 11.41 9.49
8 15.57 12.28 12.23 14.03 11.59 11.36 10.78 11.49 10.18
9 12.68 12.61 13.79 15.96 12.83 11.62 12.22 10.85 10.85
8.2.3 Spatial Outliers
The purpose of this section is to describe a simple example of a spatial data
set containing a few known outliers which serves the purpose of illustrating
our main ideas. The dataset is given in Table 8.1. It refers to measurements
at the nodes of a 9 × 9 regular lattice. Sites on the lattice are indexed in
lexicographical order, so that s1 = (1, 1)^T, s2 = (1, 2)^T, ..., s81 = (9, 9)^T.
The data were produced by simulation of the measurement-error ordi-
nary kriging model (8.11), under a normal distribution for both δ and ε.
Spatial dependence in y(s) was modelled through the isotropic spherical
semivariogram (8.20). In this example μ = 10, θ0 = 2, θ1 = 4, θ2 = 8 and
σ²_ε = 0.1. Figure 8.3 shows a selection of three-dimensional views of these
simulated data from different perspectives.
In spatial statistics outlier detection often aims at highlighting observa-
tions which are unusual with respect to their surrounding values. We call
them spatial outliers, to emphasize that anomaly is intended with respect
to the spatial distribution of the response variable. Thus a spatial outlier
might or might not be anomalous in the traditional sense, that is in the
analysis of the univariate distribution of sample observations y1, ..., yn.
In the present example a cluster of spatial outliers is introduced in the
northwest corner of the grid by changing the observations at sites s1, s2
and s3. This modification is performed by adding constants 6, 4 and 5, re-
spectively, to the original readings. As a result, contaminated values clearly
become outliers with respect to the bulk of the data, both from a spatial
and a distributional (univariate) point of view.
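The structure of these simulated data is easy to reproduce. The sketch below is ours, not the authors' code: it treats the process as second-order stationary with covariance (θ0 + θ1) − ν(h), which is consistent with the spherical semivariogram (8.20) and the stated parameters (the nugget θ0 = 2 includes the measurement error σ²_ε = 0.1), and then contaminates s1, s2, s3 as in the text. The particular draws of Table 8.1 are of course not recovered.

```python
import numpy as np

rng = np.random.default_rng(0)

def spherical(h, t0=2.0, t1=4.0, t2=8.0):
    # spherical semivariogram (8.20) with the parameters used in the text
    h = np.asarray(h, dtype=float)
    v = np.where(h <= t2, t0 + t1 * (1.5 * h / t2 - 0.5 * (h / t2) ** 3), t0 + t1)
    return np.where(h == 0.0, 0.0, v)

# 9 x 9 lattice in lexicographic order: s1 = (1,1), s2 = (1,2), ..., s81 = (9,9)
rows, cols = np.meshgrid(np.arange(1, 10), np.arange(1, 10), indexing="ij")
sites = np.column_stack([rows.ravel(), cols.ravel()]).astype(float)
d = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=2)

sill = 2.0 + 4.0                    # theta0 + theta1, the process variance
C = sill - spherical(d)             # cov(h) = sill - nu(h)
np.fill_diagonal(C, sill)
y = rng.multivariate_normal(np.full(81, 10.0), C)   # mu = 10

# cluster of spatial outliers: add 6, 4 and 5 at sites s1, s2, s3
y_cont = y.copy()
y_cont[[0, 1, 2]] += np.array([6.0, 4.0, 5.0])
```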
FIGURE 8.3. Simulated kriging data: three-dimensional views of the data from
different perspectives
where ŷ_{i,S(i)} = ŷ(si|S(i)) stands for the kriging predictor of yi based only
on observations in S(i), and σ̂_{i,S(i)} is a robust estimate of the correspond-
ing kriging variance σ_{i,S(i)}. In §8.2.5 we show how the unknown ingredients
required for computing σ̂²_{i,S(i)} (i.e. v and Y) can be estimated robustly
from the available sample. Note that definition (8.23) is valid irrespective
of the role of measurement error. In fact, even under the measurement
error model (8.11), yi is the only observable quantity at site si when pre-
diction is performed from the reduced network S(i), with si deleted. Hence
ŷ*_{i,S(i)} = ŷ_{i,S(i)}, in view of the interpolation property (8.18). Some theoretical
and

    Ŷ*(i) = (ŷ*(s1|S(i)), ..., ŷ*(sn|S(i)))^T.

Then, we suggest computing a spatial version of Cook's distance for pre-
diction at site si as

    C_{i,S(i)} = −(Ŷ*(i) − Ŷ*)^T Ỹ^{-1} (Ŷ*(i) − Ŷ*),  i = 1, ..., n,    (8.24)

where Ỹ denotes a robust estimate of the matrix Y (see §8.2.5). The minus
sign in the right-hand side of equation (8.24) is necessary to guarantee that
nonnegative distances are obtained, in view of the conditionally negative
definiteness property of the variogram function (see §8.2.2).
Figure 8.4 shows boxplots of both standardized prediction residuals e_{i,S(i)}
and square roots of Cook distances for the simulated data of §8.2.3 with
a cluster of spatial outliers. None of the contaminated values has predic-
tion diagnostics which can be judged extreme with respect to the specified
model. Therefore, there is clear evidence of masking due to the presence of
multiple spatial outliers. Indeed, the only apparent outlier displayed both in
Figure 8.4(a) and 8.4(b) is the observation at site s12, where the relatively
low value e_{12,S(12)} (and hence the high value C_{12,S(12)}) has been swamped
by spatial proximity to the contaminated corner.
Such undesirable effects may be present even if we apply other exploratory
techniques specifically devised for the purpose of locating spatial anoma-
lies. For instance, Table 8.2 shows for each row and column of the grid
the absolute value of the standardized (mean − median) difference (Cressie
1993, p. 38)

    u = n1^{1/2} (ȳ − ỹ)/(0.7555 ζ),    (8.25)

where n1 is the number of spatial sites located on a specific row or column
of the grid, and ȳ and ỹ are respectively the average and median value
computed on that row or column. In addition, ζ is a resistant measure of
dispersion computed as

    ζ = (interquartile range)/1.349.
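Statistic (8.25) is immediate to compute for a single row or column; a small sketch (the function name is ours):

```python
import numpy as np

def mean_median_u(values):
    """Standardized (mean - median) difference (8.25) for one row or column."""
    v = np.asarray(values, dtype=float)
    n1 = v.size
    q75, q25 = np.percentile(v, [75, 25])
    zeta = (q75 - q25) / 1.349               # resistant dispersion estimate
    return np.sqrt(n1) * (v.mean() - np.median(v)) / (0.7555 * zeta)
```

Applying the function to every row and every column of the grid and flagging |u| > 3 reproduces the screening summarized in Table 8.2.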
FIGURE 8.4. Simulated kriging data with multiple outliers: boxplots of (a) stan-
dardized prediction residuals given in (8.23) and (b) square root of Cook distances
for spatial prediction given in (8.24)
TABLE 8.2. Simulated kriging data with multiple outliers: absolute values of the
standardized (mean − median) difference u. Values |u| > 3 are usually of interest
for the purpose of detecting spatial outliers

Row number   1    2    3    4    5    6    7    8    9
|u|          1.1  0.6  1.1  0.8  0.4  1.2  4.4  3.4  0.0
Col. number  1    2    3    4    5    6    7    8    9
|u|          5.2  2.2  0.4  0.3  0.3  2.1  1.0  1.8  1.9
    2ν̂(h) = {1/N(h)} Σ_{N(h)} {y(si) − y(sj)}²,    (8.26)

where N(h) is the number of pairs of sites {si, sj} at lag h and Σ_{N(h)} de-
notes summation over such pairs. The estimator 2ν̂(h) is called the sample
variogram. When spatial locations are irregularly spaced within S, compu-
tation of 2ν̂(h) is usually smoothed by inclusion of the points lying in some
specified (small) tolerance region around h.
The sample variogram 2ν̂(h) is unbiased under the ordinary kriging mod-
els (8.2) and (8.11). Unfortunately, it is biased (and sometimes badly so)
when these models are contaminated by outliers. For this reason, we adopt
the robust variogram estimator of Cressie and Hawkins (1980)

    2ν̄(h) = { {1/N(h)} Σ_{N(h)} |y(si) − y(sj)|^{1/2} }^4 / (0.457 + 0.494/N(h)),    (8.27)

and fit a parametric model ν(h; θ) by minimizing the weighted least squares
criterion

    Σ_{k=1}^{K} N(h_k) { ν̄(h_k)/ν(h_k; θ) − 1 }²    (8.28)

with respect to the parameter vector θ. Here, K is the total number of lags
for which the robust variogram ν̄(h) is computed and h1, ..., hK denote
such different lags. The vector θ̂ is then the minimizer of the weighted least
squares fit (8.28). More complex likelihood-based methods for estimating
θ are described in many books, including Cressie (1993) and Stein (1999).
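Under our own assumptions (isotropy and a fixed tolerance of ±0.5 around each lag), the Cressie and Hawkins (1980) estimator and the weighted least squares criterion (8.28) can be sketched as follows; the 0.457 + 0.494/N term is the bias-correcting constant of the original proposal.

```python
import numpy as np

def robust_semivariogram(y, sites, lags, tol=0.5):
    """Cressie-Hawkins robust semivariogram estimate at the given lags;
    pairs whose distance lies within +/- tol of each lag are pooled."""
    d = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=2)
    iu = np.triu_indices(len(y), k=1)
    dist = d[iu]
    rootdiff = np.sqrt(np.abs(y[iu[0]] - y[iu[1]]))   # |y_i - y_j|^(1/2)
    est, counts = [], []
    for h in lags:
        sel = np.abs(dist - h) <= tol
        Nh = int(sel.sum())
        two_nu = rootdiff[sel].mean() ** 4 / (0.457 + 0.494 / Nh)
        est.append(0.5 * two_nu)        # semivariogram is half the variogram
        counts.append(Nh)
    return np.array(est), np.array(counts)

def wls_loss(theta, lags, nu_bar, counts, model):
    """Weighted least squares criterion (8.28) for fitting model(h; theta)."""
    fit = model(np.asarray(lags, dtype=float), theta)
    return float(np.sum(counts * (nu_bar / fit - 1.0) ** 2))
```

Minimizing `wls_loss` over θ (by any generic optimizer or a simple grid search) gives the fitted semivariogram used later in the forward search.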
m0-dimensional vector of indices. Let ŷ_{i,S(m0)} denote the kriging predictor
at site si given observations in S(m0). With a slight abuse of notation, if
si ∈ S(m0) then ŷ_{i,S(m0)} stands equivalently for the predictor of the observed
value y(si) or of the noiseless value y*(si), according to the assumed model.
The corresponding standardized prediction residual is

    e_{i,S(m0)} = {yi − ŷ_{i,S(m0)}}/σ̂_{i,S(m0)}.    (8.29)
Chapter 7: see, e.g., Figures 7.6 and 7.20, Exercise 3.4, and also Remark
4 in §2.12. In the specific context of ordinary kriging, Cerioli and Riani
(1999) showed that allowing for measurement error can ensure a high de-
gree of interchange at the very first steps of the search. Hence, under the
measurement-error model (8.11), the method is often resistant to the in-
clusion of some outliers into the starting set of locations, as they are im-
mediately ejected from it. On the contrary, the requirement that S(m0) be
outlier free is essential if we assume that measurement error does not exist.
In fact, in that instance, equation (8.10) shows that

    ŷ_{i,S(m0)} = yi  if si ∈ S(m0),

and units cannot leave the subset once they have joined it.
In our applications to ordinary kriging, where μ is the only large-scale
parameter to be estimated from the data, we start from m0 = 2, as this
is the smallest dimension for which ŷ_{i,S(m0)} can be computed. For more
general trend surface models, where E{y(s)} = μ(s) is a function of a num-
ber of unknown parameters, m0 has to be increased accordingly. If the
number of candidate subsets, the binomial coefficient (n choose m0), is
too large, minimization (8.29) is performed over some large number of
samples, although approximate algorithms usually lead to inferior proper-
ties of robust estimators in multiple regression (Hawkins and Olive 2002).
Alternative methods for selecting the initial subset in kriging models when
n or m0 are large are described in Cerioli and Riani (1999) and Riani and
Cerioli (1999).
    e*_m = e_[m+1],S(m),    m = m0 + 1, . . . , n − 1,    (8.31)

    ē_m = e_[n],S(m),    m = m0 + 1, . . . , n − 1,    (8.32)

and

    σ̂*_m = σ̂_[m+1],S(m),    m = m0 + 1, . . . , n − 1,    (8.33)

    σ̄_m = σ̂_[n],S(m),    m = m0 + 1, . . . , n − 1,    (8.34)

where [i] denotes the unit with the ith smallest absolute standardized
prediction residual computed from S(m). If there is no interchange (see
§2.14) the values of e*_m and σ̂*_m correspond
to the minimum absolute standardized prediction residual and estimated
kriging variance, respectively, among the units not belonging to the subset.
On the contrary, ē_m and σ̄_m show the largest absolute residual and variance
among all units.
The forward plots of e*_m and σ̂*_m will show a peak in the step prior to
the inclusion of the first outlier, as happened in previous chapters to the
smallest Mahalanobis distance not in the subset. On the other hand, with
a cluster of spatial outliers the curves of ē_m and σ̄_m will have a sharp
decrease when the first outlier joins S(m), due to the masking effect. The
same decrease will also be apparent in the plots of e*_m and σ̂*_m at subsequent
steps, as further outliers enter the subset.
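The search itself is short to code. The sketch below is a deliberately simplified version (our own, not the authors' implementation): the semivariogram is fixed rather than re-estimated, the initial subset is taken as given, and interchange is ignored, so at each step the unit outside the subset with the smallest absolute standardized prediction residual joins. The recorded minima correspond to the quantity monitored by e*_m.

```python
import numpy as np

def semiv(h, t0=0.5, t1=2.0, t2=4.0):
    # fixed spherical semivariogram used throughout this sketch
    h = np.asarray(h, dtype=float)
    v = np.where(h <= t2, t0 + t1 * (1.5 * h / t2 - 0.5 * (h / t2) ** 3), t0 + t1)
    return np.where(h == 0.0, 0.0, v)

def krige(sites, y, sub, i):
    """Ordinary kriging prediction and variance at site i from subset sub."""
    S = sites[sub]
    Y = semiv(np.linalg.norm(S[:, None, :] - S[None, :, :], axis=2))
    v = semiv(np.linalg.norm(S - sites[i], axis=1))
    J = np.ones(len(sub))
    Yv, YJ = np.linalg.solve(Y, v), np.linalg.solve(Y, J)
    lam = (1.0 - J @ Yv) / (J @ YJ)
    pred = (Yv + lam * YJ) @ y[sub]
    var = v @ Yv - (1.0 - J @ Yv) ** 2 / (J @ YJ)
    return pred, max(var, 1e-12)

def forward_search(sites, y, start):
    """Grow the subset one unit at a time; return the entry order and the
    minimum absolute standardized prediction residual outside the subset."""
    sub = list(start)
    e_min = []
    while len(sub) < len(y):
        out = [i for i in range(len(y)) if i not in sub]
        res = []
        for i in out:
            pred, var = krige(sites, y, sub, i)
            res.append(abs(y[i] - pred) / np.sqrt(var))
        k = int(np.argmin(res))
        e_min.append(res[k])
        sub.append(out[k])           # smallest-residual unit joins the subset
    return sub, np.array(e_min)
```

On clean data the recorded minima stay flat; a gross outlier produces the terminal peak described above, since it is forced to enter last.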
Average residuals and kriging variances. Similar but smoother in-
formation is provided by plots of average residuals and variances, such as

    ē^a_m = {1/(n − m)} Σ_{i=m+1}^{n} |e_[i],S(m)|,    m = m0 + 1, . . . , n − 1,    (8.35)

and

    σ̄^a_m = {1/(n − m)} Σ_{i=m+1}^{n} σ̂²_[i],S(m),    m = m0 + 1, . . . , n − 1.    (8.36)
FIGURE 8.5. Simulated kriging data with multiple outliers: solid line,
robust semivariogram estimates; dotted line, fitted spherical model with
θ̂ = (1.37, 7.31, 11.38)^T
We start our analysis by looking at Figure 8.6, the forward plot of stan-
dardized prediction residuals e_{i,S(m)}. The three contaminated values now
show up clearly, as they have the largest standardized residuals for most
of the search. The effect of masking is also apparent when the first outlier
joins the subset at m = 79. Table 8.3 reports the units included in the last
10 steps of the forward search for both contaminated and original data.
Contaminated locations move into S(m) at the last three steps, with an
ordering which reflects their degree of outlyingness.
Moves at previous steps can also be motivated by inspection of the data
given in Table 8.1. From the uncontaminated readings we see that they refer
to locations whose values are less in agreement with the postulated spatial
model. In addition, the prediction residual corresponding to the largest un-
contaminated value y(s21) = 15.6 has an upward increase at m = 68, when
the relatively low observation at site s40 is included. Locations entering in
steps m < 79 are also included in the last stages of the forward search run
on the original data without contamination. As usual, the forward search
algorithm provides a "natural" ordering of the observations according to
the spatial structure implied by the measurement-error ordinary kriging
model (8.11), even in the case of well-behaved data.
Information about the presence of three spatial outliers is reinforced by
the inspection of Figure 8.7, which shows the forward plots of statistics
(8.31) through (8.34), restricted for ease of presentation to steps m ≥ 17.
FIGURE 8.6. Simulated kriging data with multiple outliers: forward plot of
standardized prediction residuals. Units are monitored until they join the subset
TABLE 8.3. Simulated kriging data with multiple outliers: units included in the
last 10 steps of the forward search for contaminated and original data
Steps 72 73 74 75 76 77 78 79 80 81
Contaminated data 17 76 5 34 64 60 21 1 2 3
Original data 48 29 76 17 18 34 5 64 21 60
The sharp peak at m = 78 in the plots of e*_m and σ̂*_m (panels a and c)
anticipates the inclusion of the first outlier, unit s1. This observation is
also responsible for the elbows at m = 79 in the plots of ē_m (panel b) and
σ̄_m (panel d). Apart from the first stages, where results may be unstable,
all such plots lead to the same conclusions and clearly unveil the three
contaminated values.
FIGURE 8.7. Simulated kriging data with multiple outliers: forward plot of (a)
e*_m, (b) ē_m, (c) σ̂*_m, (d) σ̄_m. All plots indicate the existence of 3 spatial outliers
involves a specific area within the study region where the postulated model
does not apply. We call this area a "nonstationary pocket".
Figure 8.8 displays three-dimensional views of the contaminated data
from different perspectives. Visual inspection of the data may suggest a
steep gradient towards the end of some columns, although it is difficult
to tell where this gradient actually begins and which sites are involved.
Furthermore, comparison of Figure 8.8 and Figure 8.3 (where no contamination
occurs) shows little difference, suggesting that random noise could
also contribute to the perceived spatial trend.
Classical spatial exploratory techniques are again of no help in identifying
the presence of contaminated values. For instance, in the first two rows
of the grid, where the atypical area is actually located, the standardized
(mean- median) difference (8.25) takes the values u = - 1.4 and u = - 0.2,
respectively. Hence we move to our forward approach.
Figure 8.9 is the forward plot of standardized prediction residuals for this
dataset. Now the 9 contaminated observations are clearly visible, as they
have the largest standardized prediction residuals along the search. The
effect of masking also appears when the first outlier, unit s5, joins the subset
at m = 72. This site has the smallest observation in the nonstationary
pocket. Other sites in that area follow at subsequent steps, as Table 8.4
FIGURE 8.8. Simulated kriging data with a nonstationary pocket: three-dimen-
sional views of the contaminated data from different perspectives
TABLE 8.4. Simulated kriging data with a nonstationary pocket: units included
in the last 12 steps of the forward search. Two fitted semivariogram models
Fitted Steps
semivariogram 70 71 72 73 74 75 76 77 78 79 80 81
Spherical 21 64 5 1 3 2 12 4 10 11 60 6
Linear 5 1 3 2 12 21 4 64 11 10 60 6
FIGURE 8.9. Simulated kriging data with a nonstationary pocket and estimated
spherical semivariogram: forward plot of standardized prediction residuals
FIGURE 8.10. Simulated kriging data with a nonstationary pocket and estimated
spherical semivariogram: forward plot of (a) e*_m, (b) ē_m, (c) σ̂*_m, (d) σ̄_m
Table 8.5 gives the wheat yield data, taken from Cressie (1993, p. 455)
and also available through the module SPATIALSTATS of S-Plus (Math-
soft 1996). These data consist of 500 observations on the production of
wheat grain (in pounds). Measurements refer to a 20 × 25 lattice of plots.
Row indices run in the north-south direction, while column indices run in
the east-west direction. The global size of the study area is 1 acre. Although
there is some ambiguity as regards the actual plot size, plot dimensions are
taken to be 3.30 meters (10.82 feet) in the east-west direction and 2.51 me-
ters (8.25 feet) in the north-south direction, as in Cressie (1993). The data
come from a uniformity trial, that is an agricultural experiment in which
there are no differences in treatment (e.g., Hinkelmann and Kempthorne
1994). The purpose of this study was to assess natural variation in soil
fertility and so to determine the optimal plot size for future wheat yield
trials.
The wheat yield data have been much studied in the statistical literature,
since their introduction by Mercer and Hall in 1911. Several analyses have
tried to account for spatial autocorrelation between neighbouring plots,
through application of spatial autoregressive models (see §§8.7 and 8.8)
and spectral methods for spatial processes. Cressie (1993, §4.5) provides a
brief account of this literature and performs a detailed exploratory study
TABLE 8.5. Mercer and Hall wheat-yield data: readings are at the nodes of a 20 x 25 regular lattice
Row Column
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
1 3.63 4.15 4.06 5.13 3.04 4.48 4.75 4.04 4.14 4 4.37 4.02 4.58 3.92 3.64 3.66 3.57 3.51 4.27 3.72 3.36 3.17 2.97 4.23 4.53
2 4.07 4.21 4.15 4.64 4.03 3.74 4.56 4.27 4.03 4.5 3.97 4.19 4.05 3.97 3.61 3.82 3.44 3.92 4.26 4.36 3.69 3.53 3.14 4.09 3.94
3 4.51 4.29 4.4 4.69 3.77 4.46 4.76 3.76 3.3 3.67 3.94 4.07 3.73 4.58 3.64 4.07 3.44 3.53 4.2 4.31 4.33 3.66 3.59 3.97 4.38
4 3.9 4.64 4.05 4.04 3.49 3.91 4.52 4.52 3.05 4.59 4.01 3.34 4.06 3.19 3.75 4.54 3.97 3.77 4.3 4.1 3.81 3.89 3.32 3.46 3.64
5 3.63 4.27 4.92 4.64 3.76 4.1 4.4 4.17 3.67 5.07 3.83 3.63 3.74 4.14 3.7 3.92 3.79 4.29 4.22 3.74 3.55 3.67 3.57 3.96 4.31
6 3.16 3.55 4.08 4.73 3.61 3.66 4.39 3.84 4.26 4.36 3.79 4.09 3.72 3.76 3.37 4.01 3.87 4.35 4.24 3.58 4.2 3.94 4.24 3.75 4.29
7 3.18 3.5 4.23 4.39 3.28 3.56 4.94 4.06 4.32 4.86 3.96 3.74 4.33 3.77 3.71 4.59 3.97 4.38 3.81 4.06 3.42 3.05 3.44 2.78 3.44
8 3.42 3.35 4.07 4.66 3.72 3.84 4.44 3.4 4.07 4.93 3.93 3.04 3.72 3.93 3.71 4.76 3.83 3.71 3.54 3.66 3.95 3.84 3.76 3.47 4.24
9 3.97 3.61 4.67 4.49 3.75 4.11 4.64 2.99 4.37 5.02 3.56 3.59 4.05 3.96 3.75 4.73 4.24 4.21 3.85 4.41 4.21 3.63 4.17 3.44 4.55
10 3.4 3.71 4.27 4.42 4.13 4.2 4.66 3.61 3.99 4.44 3.86 3.99 3.37 3.47 3.09 4.2 4.09 4.07 4.09 3.95 4.08 4.03 3.97 2.84 3.91
11 3.39 3.64 3.84 4.51 4.01 4.21 4.77 3.95 4.17 4.39 4.17 4.17 4.09 3.29 3.37 3.74 3.41 3.86 4.36 4.54 4.24 4.08 3.89 3.47 3.29
12 4.43 3.7 3.82 4.45 3.59 4.37 4.45 4.08 3.72 4.56 4.1 3.07 3.99 3.14 4.86 4.36 3.51 3.47 3.94 4.47 4.11 3.97 4.07 3.56 3.83
13 4.52 3.79 4.41 4.57 3.94 4.47 4.42 3.92 3.86 4.77 4.99 3.91 4.09 3.05 3.39 3.6 4.13 3.89 3.67 4.54 4.11 4.58 4.02 3.93 4.33
14 4.46 4.09 4.39 4.31 4.29 4.47 4.37 3.44 3.82 4.63 4.36 3.79 3.56 3.29 3.64 3.6 3.19 3.8 3.72 3.91 3.35 4.11 4.39 3.47 3.93
15 3.46 4.42 4.29 4.08 3.96 3.96 3.89 4.11 3.73 4.03 4.09 3.82 3.57 3.43 3.73 3.39 3.08 3.48 3.05 3.65 3.71 3.25 3.69 3.43 3.38
16 5.13 3.89 4.26 4.32 3.78 3.54 4.27 4.12 4.13 4.47 3.41 3.55 3.16 3.47 3.3 3.39 2.92 3.23 3.25 3.86 3.22 3.69 3.8 3.79 3.63
17 4.23 3.87 4.23 4.58 3.19 3.49 3.91 4.41 4.21 4.61 4.27 4.06 3.75 3.91 3.51 3.45 3.05 3.68 3.52 3.91 3.87 3.87 4.21 3.68 4.06
18 4.38 4.12 4.39 3.92 4.84 3.94 4.38 4.24 3.96 4.29 4.52 4.19 4.49 3.82 3.6 3.14 2.73 3.09 3.66 3.77 3.48 3.76 3.69 3.84 3.67
19 3.85 4.28 4.69 5.16 4.46 4.41 4.68 4.37 4.15 4.91 4.68 5.13 4.19 4.41 3.54 3.01 2.85 3.36 3.85 4.15 3.93 3.91 4.33 4.21 4.19
20 3.61 4.22 4.42 5.09 3.66 4.22 4.06 3.97 3.89 4.46 4.44 4.52 3.7 4.28 3.24 3.29 3.48 3.49 3.68 3.36 3.71 3.54 3.59 3.76 3.36
FIGURE 8.11. Simulated kriging data with a nonstationary pocket and estimated
linear semivariogram: forward plot of (a) e*_m, (b) ē_m, (c) σ̂*_m, (d) σ̄_m. To be
compared with Figure 8.10
FIGURE 8.12. Wheat yield data: three-dimensional views of the data from six
perspectives
[Figures omitted: boxplots of the wheat yield data by column (1-25) and by row (1-20).]
FIGURE 8.15. Wheat yield data: forward plot of standardized prediction resid-
uals. Units are monitored until they join the subset. The trajectory for location
s430 is displayed in black
FIGURE 8.16. Wheat yield data: forward plot of, left-hand panel, e*_m and,
right-hand panel, σ̂*_m, together with zooms taken at the end of the search
FIGURE 8.17. Wheat yield data: frequency distribution of, left-hand panel, row
distances d1m and, right-hand panel, column distances d2m. Note the secondary
peak at lag 3 in the right panel
the right-hand panel of Figure 8.17 thus suggests the presence of a cyclic
component in the east-west direction, with length equal to 3 columns. It
is interesting to note that McBratney and Webster (1981) reached a similar
conclusion through more complex and less intuitive spectral analysis
techniques.
FIGURE 8.18. Wheat yield data: forward plots of e*_m, left-hand panel, before and,
right-hand panel, after perturbation with Gaussian noise in the second decimal
digit. A constant variogram model is fitted in both cases
TABLE 8.6. Reflectance data: readings are at the nodes of a 9 × 9 regular grid
Row Column
1 2 3 4 5 6 7 8 9
1 32 35 36 37 38 47 34 35 31
2 38 39 43 41 55 42 38 34 37
3 50 62 46 39 55 37 40 32 28
4 45 50 43 33 24 38 44 42 39
5 40 36 16 18 31 37 52 30 24
6 37 14 10 21 26 30 35 41 19
7 10 12 5 12 17 18 20 24 23
8 50 62 19 6 14 17 17 5 6
9 46 35 0 4 5 5 6 0 0
The observations in Table 8.6, taken from Haining (1990, p. 217), give
reflectance values extracted from an aerial survey along the south coast
of England. The purpose of the survey was to monitor pollution levels
arising from the pumping of waste material into the English Channel, in
a coastal area where sewage disposal had taken place. Higher reflectance
values indicate higher levels of pollution.
The spatial locations where such values were collected form the nodes
of a 9 x 9 regular grid, so that n = 81. As in our previous examples,
sites on the grid are indexed in lexicographical order. For the purpose of
statistical analysis, the sides of the lattice were standardized to have unit
length and reflectance values were multiplied by 10^{-1}. The last row of the
data set was missing in the original source, although additional information
(i.e. residuals from median polish) was provided about it. We have thus
reconstructed all missing values from this supplementary information.
These data are likely to contain a large-scale effect (trend surface), an
autocorrelation component and local-scale effects (nonstationary pockets).
The closer an observation is to the source of pollution, the greater its re-
flectance value is likely to be. In this case the parameter values of the trend
component indicate the dispersal gradient. One reason for the presence of
spatial correlation is that the reflectance value recorded in any pixel is a
partial averaging of reflectance values in neighbouring pixels. This appears
to be a general problem with remotely sensed data. Furthermore, at a pro-
cess level, pollution might be affected by local mixing and local dispersal
due to small scale turbulence and wave action.
The complicated spatial structure of the reflectance data set is revealed
by Figure 8.19. In a first study, Haining (1987) concluded that reflectance
in this sample area can be described by a "first order trend surface model
with autoregressive errors". Subsequently, the same author (Haining 1990,
FIGURE 8.19. Reflectance data: three-dimensional views of the data from
different perspectives
FIGURE 8.21. Reflectance data: curves of (a) e*_m and (b) σ̂*_m for m ≥ 40 and
σ²_ε = 0.1. A remarkable peak at m = 77 in both panels
    w_{i,j} = 1 if s_j is immediately to the north, south, east or west of s_i,    (8.40)
    w_{i,j} = 0 otherwise.

    w_{i,j} = 1 if units s_i and s_j share a common boundary,    (8.41)
    w_{i,j} = 0 otherwise.
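On a regular lattice both schemes are mechanical to construct. A sketch (our own helper, with sites in lexicographical order as in the text) builds the binary weight matrix W for either (8.40) or (8.41):

```python
import numpy as np

def lattice_weights(nrow, ncol, scheme="rook"):
    """Binary weight matrix W for a regular lattice in lexicographic order.
    scheme='rook' links N/S/E/W neighbours as in (8.40); 'queen' also links
    diagonals, i.e. all sites sharing a boundary point as in (8.41)."""
    n = nrow * ncol
    W = np.zeros((n, n), dtype=int)
    for r in range(nrow):
        for c in range(ncol):
            i = r * ncol + c
            steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
            if scheme == "queen":
                steps += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
            for dr, dc in steps:
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrow and 0 <= cc < ncol:
                    W[i, rr * ncol + cc] = 1    # off-lattice neighbours dropped
    return W
```

Row sums of W immediately show the edge-effect problem discussed below: corner sites have two rook neighbours instead of four.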
[FIGURE 8.23 omitted: a 4 × 4 regular lattice with sites s1, ..., s16 and adjacent
unobserved sites u1, u2, ....]
neighbours than interior points. This is immediately seen in the simple
example of Figure 8.23. For instance, location s1 has only two neighbours in
S (i.e., sites s2 and s5) instead of four, since no information is available
about the response values at the unobserved sites u1 and u2. This deficiency
is the source of non-negligible bias in the statistical properties of
estimators, which is particularly severe if n is small. Furthermore, the
theoretical properties of autoregression models are affected by the existence
of boundary sites with a reduced number of neighbours. For example, even
the simplest form of SAR model, such as (8.43) in §8.7.2 with p = 1, is
not second-order stationary on a finite lattice S under the neighbourhood
scheme (8.40) (Exercise 8.7).
For these reasons, we modify the weight matrix W in order to take edge
effects into account. In our applications we explore two simple but widely
adopted techniques of edge correction. The first one is toroidal correction
(Ripley 1981, p. 152), which wraps a rectangular region onto a torus. Edge
points on opposite borders are thus considered to be close, and all sites
have the same number of neighbours. For example, in the 4 × 4 grid of
Figure 8.23, location s1 becomes a neighbour also of sites s4 and s13, so

    w_{1,4} = w_{1,13} = 1,
and so forth.
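A sketch of the toroidal correction for the rook scheme (8.40): wrapping the row and column indices modulo the lattice dimensions reproduces, for example, w_{1,4} = w_{1,13} = 1 on the 4 × 4 grid. The helper below is our own code.

```python
import numpy as np

def torus_rook_weights(nrow, ncol):
    """Rook weight matrix with toroidal edge correction: indices wrap around
    the borders, so every site has exactly four neighbours."""
    n = nrow * ncol
    W = np.zeros((n, n), dtype=int)
    for r in range(nrow):
        for c in range(ncol):
            i = r * ncol + c
            for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
                j = ((r + dr) % nrow) * ncol + (c + dc) % ncol
                W[i, j] = 1           # wrapped neighbour on the torus
    return W
```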
In the second instance we apply the asymmetric Neumann correction
(Moura and Bairam 1992, p. 338), where the off-region neighbours of a
boundary site have the same response value as the site itself. That is, in
Figure 8.23

    y(u1) = y(u2) = y(s1),    (8.42)
and
However, in our applications of §§8.8 and 8.10 the results obtained through
the mirror correction are usually very similar to those produced under the
standard Neumann boundary assumption (8.42). For this reason we will
not report them in detail.
Adjusting the weight matrix W to allow for edge effects is a step that
must be taken prior to our forward analysis. Hence, it is important to
judge what is the consequence of different choices on the results from the
search. It is hoped that alternative methods of edge correction will give rise
approximately to the same conclusions. We shall see that this is in fact the
case in all our applications.
(8.43)
The log-likelihood of model (8.43) is

    L(β, σ², ρ; y) = −(n/2) log(2πσ²) + log|I_n − ρW| − {1/(2σ²)} (y − Xβ)^T S (y − Xβ),    (8.44)

where S = (I_n − ρW)^T (I_n − ρW).
502 8. Spatial Linear Models
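Evaluating (8.44) is direct once S is formed. The sketch below (our own code) assumes the standard SAR form S = (I_n − ρW)^T(I_n − ρW) and an admissible ρ for which I_n − ρW is nonsingular with positive determinant:

```python
import numpy as np

def sar_loglik(beta, sigma2, rho, y, X, W):
    """Gaussian SAR log-likelihood (8.44) with S = (I - rho W)^T (I - rho W)."""
    n = len(y)
    A = np.eye(n) - rho * W
    _, logdet = np.linalg.slogdet(A)        # log |I_n - rho W|
    r = y - X @ beta
    quad = r @ (A.T @ A) @ r                # (y - X beta)^T S (y - X beta)
    return -0.5 * n * np.log(2.0 * np.pi * sigma2) + logdet - quad / (2.0 * sigma2)
```

With ρ = 0 the Jacobian term vanishes and the expression reduces to the ordinary normal-regression log-likelihood, a convenient sanity check.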
TABLE 8.7. Selection of cases from simulated SAR data with multiple outliers:
readings are at the nodes of a 12 × 12 regular lattice

Row Column   i    y_i     x_i^T
1   1        1    85.75   0.384  1.239  0.324
1   2        2    75.86  -0.017  0.844 -1.048
1   3        3    81.17  -0.296  0.333  1.452
... ...      ...  ...     ...    ...    ...
1   12       12   74.14   1.507 -1.691  0.017
2   1        13   74.60  -0.293 -1.006  1.678
... ...      ...  ...     ...    ...    ...
12  12       144  80.61   2.122 -0.697 -0.452
similar point also in §8.3.1, when describing the forward search for ordinary
kriging. Furthermore, as in the familiar case of independent observations,
the global lack of fit statistic e^T e/(n − p) can be decomposed into a sum of
individual contributions (the standardized residuals) which can be easily
identified with the elements of y.
After so many examples of masking and swamping for backward diagnos-
tics, one might expect that these shortcomings will strongly affect also the
standardized residuals in (8.47). To confirm this expectation we simulated
a data set on a 12 × 12 regular grid S. The response vector y was generated
according to the SAR model (8.43) with p = 4, β = (20, 5, 4, 3)^T, ρ = 0.2
and σ² = 0.5. Simulation of (I_n − ρW)^{-1} ε was performed after generation
of a normally distributed disturbance vector ε and addition of the constant
value 20. The 144 × 3 design matrix X was also obtained by simulation
from a multivariate normal distribution.
Response values y_i were then modified at a 4 × 4 block of sites at the
crossings of rows 1, ..., 4 and columns 1, ..., 4, plus an additional location
on its border (s49, in lexicographical order, corresponding to the first ob-
servation in row 5). Since contamination was performed after simulation of
y, we can think of the 17 modified values as a cluster of additive spatial
outliers. Contamination was not very marked in this example, as it simply
amounted to subtracting small constants (ranging from 0.05 to 2.2) from
the original readings. The full simulated data set after contamination is re-
ported in the website of the book, while Table 8.7 just shows some selected
rows of it. Figure 8.24 shows a selection of three-dimensional views of the re-
sponse values from different perspective angles. Visual inspection does not
provide any particular evidence of contamination.
Figure 8.25 shows a boxplot and histogram of standardized residuals (8.47) for this simulated data set with multiple spatial outliers. Not surprisingly, masking affects both diagnostic plots and does not allow proper understanding of the features of the data. Indeed, all contaminated values remain undetected and only a natural spatial outlier at site s_36 is highlighted.

FIGURE 8.24. Simulated SAR example with multiple outliers: three-dimensional views of the response values (perspective angles 80 and 240)
h_i = x_i^T (X^T X)^{-1} x_i,

usually called the leverage of the ith observation. The leverage measures how far the regressor values for that unit are from the bulk of the data in the space spanned by the columns of X. Detection of high leverage points is an important step of model building because they may exert an undue influence on the computed fit. The effect of a high leverage point is to force
FIGURE 8.25. Simulated SAR example with multiple outliers: boxplot and histogram of standardized residuals
the fitted model close to the observed value of the response variable. Hence,
high leverage points typically have small residuals, even in the absence of
masking (see, e.g., Atkinson and Riani 2000, Chapter 2).
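The behaviour of ordinary leverages is easy to reproduce numerically. The sketch below is ours, with arbitrary simulated data: one unit is shifted away from the bulk of the regressors to create a high leverage point, and the leverages are read off the diagonal of the hat matrix H = X(X^T X)^{-1}X^T.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
X[0, 1:] += 8.0               # push one unit far from the bulk: high leverage

# hat matrix and its diagonal, the leverages h_i = x_i^T (X^T X)^{-1} x_i
H = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(H)

print(h.sum())                # trace(H) = p = 3
```

The sum of the leverages always equals the number of estimated coefficients, so a value of h_i far above p/n flags a remote unit.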
The concept of leverage can be generalized in two useful ways which take account of spatial dependence in the response variable. The first extension is to define individual leverages as the diagonal elements of

(8.48)

(8.49)

which are called complementary leverages. Martin (1992) shows the properties of both generalized leverage measures. He makes the point that with correlated data the most useful definition is that of equation (8.49).
To see the performance of these generalized leverage measures with multiple high leverage points, we introduce a new simulated data set, similar to the one described in Section 8.7.3. Here S corresponds to the tiles of a 20 x 20 regular grid. The response vector y was simulated according to model (8.43) with p = 4, β = (20, 5, 4, 3)^T as before, ρ = 0.1 and σ² = 1. The 400 x 3 design matrix X was also obtained by simulation from a multivariate normal distribution, but the covariate values at sites s_2, s_3 and s_24 were modified to yield three high-leverage sites.
TABLE 8.8. Selection of cases from simulated SAR data with multiple high lever-
age points: readings are at the nodes of a 20 x 20 regular lattice
Row  Column  i  y_i  x_i^T
1  1  1  26.64  -1.128  0.434  -0.269
1 2 2 61.10 2.163 2.92 2.012
1 3 3 64.30 3.486 1.350 2.453
... ... ... ... ... ... . ..
1 20 20 29.64 -1.231 0.494 0.121
2 1 21 36.03 1.241 -1.529 0.887
... .. . ... ... ... ... .. .
20 20 400 33.98 0.099 0.615 -0.890
FIGURE 8.26. Simulated SAR example with multiple high leverage points: scat-
terplot matrix. The crosses correspond to the three modified high-leverage sites
"'
8
3 ~~:::
2==== ...
0
"'"'d
24 - - - -
0
FIGURE 8.27. Simulated SAR example with multiple high leverage points: box-
plot and histogram of complementary leverages. The three artificial high-leverage
sites are labelled in the boxplot
and
(8.54)
These diagnostics are computed for each value of m. Their display extends
the forward leverage plot of Atkinson and Riani (2000, p. 34) to spatially
correlated observations.
Spatial interaction parameter. The estimate of the spatial interaction parameter ρ does not remain constant during the search as new blocks of spatial locations join the fitting subset. The forward plot of the (approximate) maximum likelihood estimates ρ̂_m provides a diagnostic tool for detecting the effect of spatial outliers on estimation of ρ.
Likelihood ratio test. At each step of the search we also compute the signed square-root of the likelihood ratio statistic (see §2.3) which tests the null hypothesis

H_0: ρ = ρ_0,

for a number of plausible null values ρ_0.
To display the form of this statistic, consider the approximate loglikelihood (8.51) and let β̂_m0 and σ̂²_m0 be the corresponding estimates of β and σ² computed with the fitting subset S*(m) and ρ = ρ_0. The test statistic at step m is then (Exercise 8.12)
T_LR,m = sign(ρ̂_m − ρ_0) { log|Ξ̂_m0^{-1} Ξ̂_m| − m log(σ̂²_m / σ̂²_m0) − σ̂_m^{-2} r_S*(m)^T Ξ̂_m r_S*(m) + σ̂_m0^{-2} r_S*(m)0^T Ξ̂_m0 r_S*(m)0 }^{1/2},   (8.55)

where Ξ̂_m and Ξ̂_m0 denote Ξ evaluated at ρ̂_m and at ρ_0 respectively, and r_S*(m) and r_S*(m)0 are the corresponding residual vectors on the fitting subset.
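The profiling that underlies the forward plots of ρ̂_m and of the likelihood ratio statistic can be mimicked numerically. The sketch below is our illustration, not the authors' code: it profiles β and σ² out of the SAR loglikelihood on a grid of ρ values and returns the signed square-root LR statistic. The path-graph weight matrix, the grid and the seed are arbitrary choices made here for illustration.

```python
import numpy as np

def profile_loglik(rho, y, X, W):
    """SAR loglikelihood of the form (8.44), with beta and sigma^2 profiled out."""
    n = len(y)
    A = np.eye(n) - rho * W
    Xi = A.T @ A                               # Xi = (I - rho W)^T (I - rho W)
    beta = np.linalg.solve(X.T @ Xi @ X, X.T @ Xi @ y)
    r = y - X @ beta
    sigma2 = (r @ Xi @ r) / n
    sign, logdet = np.linalg.slogdet(A)
    return -0.5 * n * np.log(2.0 * np.pi * sigma2) + logdet - 0.5 * n

def signed_root_lr(y, X, W, rho0, grid):
    """Signed square-root LR statistic for H0: rho = rho0, maximizing over a grid."""
    ll = np.array([profile_loglik(r, y, X, W) for r in grid])
    rho_hat = grid[int(np.argmax(ll))]
    lr = 2.0 * (ll.max() - profile_loglik(rho0, y, X, W))
    return np.sign(rho_hat - rho0) * np.sqrt(max(lr, 0.0))

# toy data on a path graph (our choice, purely for illustration)
rng = np.random.default_rng(0)
n = 40
W = np.zeros((n, n))
i = np.arange(n - 1)
W[i, i + 1] = W[i + 1, i] = 1.0
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
rho_true = 0.2
y = X @ np.array([1.0, 2.0]) + np.linalg.solve(np.eye(n) - rho_true * W,
                                               rng.standard_normal(n))
grid = np.linspace(-0.45, 0.45, 91)
T0 = signed_root_lr(y, X, W, 0.0, grid)
print(float(T0))
```

By construction the statistic is exactly zero when ρ_0 equals the grid maximizer, which provides a simple internal check of the implementation.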
"'
----- -------------~----
20 40 60 80 100 120 140
Subset size m
FIGURE 8.29. First simulated SAR example with multiple spatial outliers: for-
ward plot of standardized residuals for b = 16. Curves corresponding to con-
taminated sites are shown in black. Toroidal edge corrections and approximate
maximum likelihood
FIGURE 8.30. First simulated SAR example with multiple spatial outliers: forward plot of standardized residuals for b = 16. Curves corresponding to contaminated sites are shown in black. Asymmetric Neumann edge corrections and approximate maximum likelihood. To be compared with Figure 8.29
FIGURE 8.31. First simulated SAR example with multiple spatial outliers: for-
ward plot of standardized residuals for b = 16. Curves corresponding to contam-
inated sites are shown in black. Toroidal edge corrections and exact maximum
likelihood. To be compared with Figure 8.29
FIGURE 8.32. First simulated SAR example with multiple spatial outliers: forward plot of standardized residuals for a reduced block size (b = 9). Curves corresponding to contaminated sites are shown in black. Toroidal edge corrections and approximate maximum likelihood. To be compared with Figure 8.29
Also, the choice of a different block size does not appreciably change the
results from the search. For instance, Figure 8.32 displays the forward plot
of standardized residuals for b = 3 x 3, toroidal correction and approxi-
mate likelihood. Again, this plot depicts essentially the same information
as Figure 8.29, although there is more variability in the first stages of the
search, due to the smaller size of the fitting subset. The effect of masking
also shows up a bit earlier, as spatial outliers are now spread over a larger
number of blocks.
8.9.2 Estimation of ρ

We introduce a new example where multiple spatial outliers have a disproportionate effect on the estimate of ρ, not only on the residuals from the fitted model. For this purpose, we simulated a data set from the SAR model (8.43) with S a 12 x 12 grid, β = (20, 5, 4, 3)^T as in §8.9.1, ρ = 0.1 and σ² = 1. Then we modified y_i at a block of 16 sites located at the crossings of rows 1, ..., 4 and columns 1, ..., 4, by subtracting 8 from all of them. The full simulated data set after contamination is reported on our website, while Table 8.9 provides a selection of it. Fixed contamination increases the similarity of the response values in the outlying corner and thus has a larger influence on estimation of ρ. Different three-dimensional views of the contaminated data set are shown in Figure 8.33.
8.9 SAR Examples With Multiple Contamination 517
TABLE 8.9. Selection of cases from the second simulated SAR example with
multiple outliers: readings are at the nodes of a 12 x 12 regular lattice
Row  Column  i  y_i  x_i^T
1 1 1 32.78 0.548 0.822 0.453
1 2 2 22.47 -0.816 0.717 -0.933
1 3 3 20.31 -0.168 -1.25 0.112
. .. . .. .. . ... ... . .. . ..
1 12 12 41.13 -0.011 1.509 0.488
2 1 13 14.05 -2.078 0.367 -0.882
. .. ... ... .. . ... ... . ..
12 12 144 26.53 0.089 -0.905 -1.404
FIGURE 8.33. Second simulated example with multiple spatial outliers: three-dimensional views of the contaminated data set (perspective angles 60, 120, 180 and 240)
FIGURE 8.34. Second simulated example with multiple spatial outliers: trajec-
tories of standardized residuals for individual locations. Curves corresponding to
contaminated sites are shown in black. Toroidal edge corrections and approximate
maximum likelihood
FIGURE 8.35. Second simulated example with multiple spatial outliers: left-hand panel, forward plot of the maximum likelihood estimate of ρ; right-hand panel, forward plot of the signed square-root likelihood ratio statistic for testing H_0 : ρ = ρ_0, for a few null values ρ_0 ≥ 0, with 99% asymptotic confidence bands. Toroidal edge corrections and approximate maximum likelihood
FIGURE 8.36. Second simulated example with multiple spatial outliers: left-hand panel, forward plot of the maximum likelihood estimate of ρ; right-hand panel, forward plot of the signed square-root likelihood ratio statistic for testing H_0 : ρ = ρ_0, for a few null values ρ_0 ≥ 0, with 99% asymptotic confidence bands. Asymmetric Neumann edge corrections and approximate maximum likelihood. To be compared with Figure 8.35
FIGURE 8.37. Second simulated example with multiple spatial outliers: left-hand panel, forward plot of the maximum likelihood estimate of ρ; right-hand panel, forward plot of the signed square-root likelihood ratio statistic for testing H_0 : ρ = ρ_0, for a few null values ρ_0 ≥ 0, with 99% asymptotic confidence bands. Toroidal corrections and exact maximum likelihood. To be compared with Figure 8.35
FIGURE 8.38. Simulated SAR example with multiple high leverage points: for-
ward plots of, left-hand panel, leverages and, right-hand panel, complementary
leverages in the last 20 steps of the search. Curves corresponding to contaminated
high-leverage sites are shown in black. Toroidal edge corrections and approximate
maximum likelihood
FIGURE 8.39. Wheat yield data: forward plot of standardized residuals from the
SAR model. Toroidal edge corrections and approximate maximum likelihood
FIGURE 8.40. Wheat yield data: left-hand panel, forward plot of the maximum likelihood estimate of ρ; right-hand panel, forward plot of the signed square-root likelihood ratio statistic for testing H_0 : ρ = ρ_0, for a few null values ρ_0 ≥ 0, with 99% asymptotic confidence bands. Toroidal edge corrections and approximate maximum likelihood
The right-hand panel of Figure 8.40 shows the statistic T_LR,m for a number of null values ρ_0 > 0, together with the corresponding 99% pointwise confidence bands. The final estimate ρ̂_n = 0.16 corresponds to that obtained through standard maximum likelihood. As the search stabilizes, values of ρ belonging to the interval (0.15, 0.175) become increasingly plausible. We see that the blocks included in the last steps of the search have only a minor effect on the estimated autocorrelation parameter ρ. This is sensible behaviour in the absence of clusters of masked spatial outliers.
(1992), Christensen, Johnson, and Pearson (1992) and Haslett and Hayes
(1998).
8.12 Exercises
Ordinary kriging
Exercise 8.1 Prove equations (8.8) and (8.9), giving the ordinary kriging predictor ŷ(s_0|S) and its mean-squared prediction error σ²(s_0|S) under model (8.2).

Exercise 8.2 Obtain the analogues of equations (8.8) and (8.9), giving the ordinary kriging predictor ŷ(s_0|S) and its mean-squared prediction error σ²(s_0|S) under model (8.2), in terms of the covariance function c(h).
Exercise 8.3 Show that the ordinary kriging predictor ŷ(s_0|S) is an exact interpolator under model (8.2).

Exercise 8.4 Show that the variogram function 2ν(h) must be conditionally negative definite.
Exercise 8.5 Consider the measurement error model (8.11). If δ(s) is a second-order stationary process, write the ordinary kriging predictor as a function of the measurement error variance σ²_ε (Cerioli and Riani 1999).
Exercise 8.6 Under the ordinary kriging model, let C be the n x n covariance matrix of y and J be an n x 1 vector of ones. Furthermore, define y_(i) to be vector y with the ith observation excluded, C_(i) to be the (n−1) x (n−1) covariance matrix of y_(i), c_(i) to be the (n−1) x 1 vector of covariances between y_i and y_(i), and J_(i) to be vector J with the ith entry removed. Show that the variance of the prediction residual e_(i) = e_i,S(i) = y_i − ŷ_i,S(i) is

var(e_(i)) = b_i² / (b_i − h_i),

where b_i = c_ii − c_(i)^T C_(i)^{-1} c_(i) and h_i = (1 − c_(i)^T C_(i)^{-1} J_(i))² / (J^T C^{-1} J).
Spatial autoregression
Exercise 8.7 Show that the first-order SAR model (8.43) is not stationary without edge corrections.

Exercise 8.8 Let w_i be the ith eigenvalue of W. Show that the spatial interaction parameter ρ of a first-order SAR model with W = W^T must satisfy

ρ w_i < 1 for all i = 1, ..., n.

Exercise 8.9 Show that β̂ in (8.45) and σ̂² in (8.46) are the maximum likelihood estimates of β and σ² under the first-order SAR model (8.43). Give the profile loglikelihood for ρ.
Exercise 8.13 Define y_(i) to be the (n−1) x 1 vector of observations in the reduced network S_(i), excluding location s_i. Let E(y_i|y_(i)) and var(y_i|y_(i)) be the conditional expectation and the conditional variance of y_i, given y_(i). Furthermore, as in model (8.43), let X denote a covariate matrix of dimension n x p and rank p, allowing also for the mean effect, and β be a p-dimensional parameter vector. A first-order conditional autoregressive model (CAR, for short) is defined by the following assumptions.

• Autoregressive structure of conditional expectations

E(y_i|y_(i)) = x_i^T β + ρ Σ_{j=1}^n w_{i,j}(y_j − x_j^T β),   i = 1, ..., n,   (8.56)

where x_i^T is the ith row of X and the nonnegative weights w_{i,j} provide the neighbourhood structure of s_i. These weights are defined as in §8.7.1, so that w_{i,i} = 0, but with the additional constraint that

w_{i,j} = w_{j,i},   i, j = 1, ..., n.   (8.57)

• Conditional homoscedasticity

var(y_i|y_(i)) = σ²,   i = 1, ..., n.

• The conditional density of y_i given y_(i), say f(y_i|y_(i); β; σ²; ρ), is univariate normal.

To summarize, a first-order CAR model assumes that, for i = 1, ..., n,

y_i | y_(i) ~ N( x_i^T β + ρ Σ_{j=1}^n w_{i,j}(y_j − x_j^T β), σ² ).   (8.58)

(a) Show that the joint distribution of y is then well defined and is the multivariate normal

y ~ N(Xβ, Σ_CAR),   (8.59)

where

Σ_CAR = σ²(I_n − ρW)^{-1}

and W = (w_{i,j}), i, j = 1, ..., n, is the n x n weight matrix.
(b) Write the loglikelihood function of a first-order CAR model and show why it is different from the loglikelihood of the first-order SAR model (8.43). Write the resulting profile loglikelihood for ρ.

(c) Let μ* = (μ*_1, ..., μ*_n)^T be the n x 1 vector of conditional expectations μ*_i = E(y_i|y_(i)) and define ε* = y − μ*. Show that

cov(ε*, y) = σ² I_n.
8.13 Solutions
Exercise 8.1

Under the ordinary kriging model (8.2), we look at a predictor which is both linear in the sample values, ŷ(s_0|S) = Σ_{i=1}^n η_i y_i, and uniformly unbiased, i.e. E{ŷ(s_0|S)} = E{y(s_0)} = μ for all μ. The unbiasedness condition yields Σ_{i=1}^n η_i E(y_i) = Σ_{i=1}^n η_i μ = μ, that is

Σ_{i=1}^n η_i = 1.   (8.60)

Under this constraint the squared prediction error can be expanded as

{y(s_0) − ŷ(s_0|S)}² = Σ_{i=1}^n η_i {y(s_0) − y_i}² + Σ_{i=1}^n Σ_{j=1}^n η_i η_j y_i y_j − Σ_{i=1}^n Σ_{j=1}^n η_i η_j y_j².   (8.61)

Taking expectations and using the variogram conditions (8.4) and (8.5), minimization of the mean-squared prediction error subject to (8.60) through a Lagrange multiplier α leads to the system

Γη + αJ = ν,   η^T J = 1,   (8.64)

where Γ is the n x n matrix with elements ν(s_i − s_j) and ν = (ν(s_0 − s_1), ..., ν(s_0 − s_n))^T. The system is solved with the help of the inverse of a partitioned symmetric matrix: if

A = | A_11    a_12 |
    | a_12^T  a_22 |,

with b = a_22 − a_12^T A_11^{-1} a_12, then

A^{-1} = b^{-1} | b A_11^{-1} + A_11^{-1} a_12 a_12^T A_11^{-1}   −A_11^{-1} a_12 |
                | −a_12^T A_11^{-1}                               1              |.   (8.67)

Applying this result to the system (8.64) and eliminating α gives the optimum weight vector

η = Γ^{-1} { ν + ((1 − J^T Γ^{-1} ν) / (J^T Γ^{-1} J)) J },   (8.69)

from which (8.8) and (8.9) follow.
Exercise 8.2

We now assume second-order stationarity, that is

E{y(s)} = μ   and   cov{y(s), y(t)} = c(s − t).   (8.70)

Again, we look for the best linear predictor ŷ(s_0|S) = Σ_{i=1}^n η_i y_i, under the unbiasedness constraint Σ_{i=1}^n η_i = 1. Working in terms of the covariance function, the optimum weight vector becomes

η = C^{-1} { c + ((1 − J^T C^{-1} c) / (J^T C^{-1} J)) J },   (8.71)

where c = (c(s_0 − s_1), ..., c(s_0 − s_n))^T and C is the n x n symmetric matrix whose elements are given by c(s_i − s_j), for i, j = 1, ..., n. Again, here and in subsequent exercises we consider only the case where C is nonsingular. From equations (8.70) and (8.71), the corresponding mean-squared prediction error is

σ²(s_0|S) = c(0) − c^T C^{-1} c + (1 − J^T C^{-1} c)² / (J^T C^{-1} J).
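The covariance-form weights can be checked numerically. The sketch below is ours, with an assumed exponential covariance function c(h) = exp(−|h|) and arbitrary one-dimensional sites; it verifies that the weights sum to one, and that setting s_0 equal to a sample site reproduces the exact interpolation property of Exercise 8.3.

```python
import numpy as np

def ok_weights(sites, s0, cov):
    """Ordinary kriging weights eta = C^{-1}{c + J(1 - J^T C^{-1} c)/(J^T C^{-1} J)}."""
    n = len(sites)
    C = np.array([[cov(si - sj) for sj in sites] for si in sites])
    c = np.array([cov(s0 - si) for si in sites])
    J = np.ones(n)
    Ci_c = np.linalg.solve(C, c)
    Ci_J = np.linalg.solve(C, J)
    lam = (1.0 - J @ Ci_c) / (J @ Ci_J)
    return Ci_c + lam * Ci_J          # = C^{-1}(c + lam * J)

cov = lambda h: np.exp(-abs(h))       # assumed covariance function, ours
sites = np.array([0.0, 1.0, 2.5, 4.0])
eta = ok_weights(sites, 1.7, cov)
print(eta.sum())                      # weights sum to 1 (unbiasedness)
```

Because the Lagrange correction vanishes when c coincides with a column of C, the predictor returns the observed value at any sample site.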
Exercise 8.3

The exact interpolation property of the ordinary kriging predictor (without measurement error) is perhaps best seen from representation (8.71), where η is written in terms of the covariance function c(h).

First, note that when s_0 = s_i,

C q(i) = ( c(s_1 − s_i), c(s_2 − s_i), ..., c(s_n − s_i) )^T = c,

where q(i) denotes the n x 1 indicator vector with ith element equal to one and all other elements equal to zero.
8.13 Salutions 533
From equation (8.73), we thus see that the best linear unbiased predictor at site s_i ∈ S is

ŷ(s_i|S) = q(i)^T y = y_i,   (8.76)

since premultiplication by q(i)^T extracts the ith element from a vector; see equation (2.92). Result (8.76) implies that the ordinary kriging predictor (without measurement error) is an exact interpolator when s_i ∈ S.
Exercise 8.4

We consider an intrinsically stationary process {y(s) : s ∈ D} satisfying conditions (8.4) and (8.5), a finite collection of spatial sites s_1, ..., s_n and real numbers a_1, ..., a_n such that

Σ_{i=1}^n a_i = 0.   (8.77)

Then

var( Σ_{i=1}^n a_i y_i ) = − Σ_{i=1}^n Σ_{j=1}^n a_i a_j ν(s_i − s_j),   (8.78)

given that, under constraint (8.77),

Σ_{i=1}^n Σ_{j=1}^n a_i a_j y_i² = Σ_{i=1}^n Σ_{j=1}^n a_i a_j y_j² = 0.

Hence,

var( Σ_{i=1}^n a_i y_i ) ≥ 0

if and only if

Σ_{i=1}^n Σ_{j=1}^n a_i a_j ν(s_i − s_j) ≤ 0.
Exercise 8.5

If δ(s) is a second-order stationary process, the optimum weight vector η can be expressed in terms of the covariance function c(h), as in equation (8.71). However, under the measurement error model (8.11), the covariance matrix C has to be modified to allow for the variability of repeated measurements at the same location. The appropriate covariance matrix is denoted by C* and its elements are given by

c(s_i − s_j)   if i ≠ j,
σ²_y + σ²_ε    if i = j.

That is,

C* = C + σ²_ε I_n.   (8.79)

The ordinary kriging predictor is then

ŷ(s_0|S) = η*^T y,

where

η* = C*^{-1} { c + ((1 − J^T C*^{-1} c) / (J^T C*^{-1} J)) J }.   (8.81)

In the setting of equation (8.79), A = C and a = σ²_ε, so the required inverse is C*^{-1} = (C + σ²_ε I_n)^{-1}, from which the predictor ŷ(s_0|S) follows.
Exercise 8.6

In order to allow for spatial dependence between observations at different sites, we need more general deletion formulae than those given in our §2.5 and in §2.3 of Atkinson and Riani (2000). However, a useful simplification occurs, since μ is the only parameter to be estimated in the ordinary kriging model and J = (1, ..., 1)^T is the corresponding explanatory variable. Also note that the results derived in this exercise are valid under both models (8.2) and (8.11), since measurement error has no effect on prediction of y_i given the reduced network S_(i).

In what follows we assume that the spatial dependence structure is known, so that matrices Γ and C are made up of known constants. Studying the effect of variogram (or covariogram) estimation on the theoretical properties of kriging predictors is an important research area, but goes beyond the scope of this book. We refer the interested reader to the monograph of Stein (1999) for further details.

To simplify notation, we write e_(i) instead of e_i,S(i), as in §2.5. We also let y_(i) denote the (n−1) x 1 vector of observations when y_i is excluded, C_(i) be the (n−1) x (n−1) covariance matrix of y_(i), J_(i) be vector J with the ith entry removed, i.e. an (n−1) x 1 vector of ones, and c_(i) be the (n−1) x 1 vector of covariances between y_i and y_(i). Accordingly, it is convenient to partition the n x n covariance matrix C as
C = | C_(i)    c_(i) |
    | c_(i)^T  c_ii  |.

The prediction residual at site s_i is then

e_(i) = y_i − ŷ_i,S(i) = y_i − μ̂_(i) − c_(i)^T C_(i)^{-1} ( y_(i) − μ̂_(i) J_(i) ),   (8.84)

where

μ̂_(i) = ( J_(i)^T C_(i)^{-1} J_(i) )^{-1} J_(i)^T C_(i)^{-1} y_(i)   (8.85)

is the generalized least-squares estimate of μ computed without site s_i. Rearranging,

e_(i) = ȳ_i − μ̂_(i) x̄_i,   with   ȳ_i = y_i − c_(i)^T C_(i)^{-1} y_(i)   and   x̄_i = 1 − c_(i)^T C_(i)^{-1} J_(i).

We know from formula (8.67) that the inverse of C is

C^{-1} = b_i^{-1} | b_i C_(i)^{-1} + C_(i)^{-1} c_(i) c_(i)^T C_(i)^{-1}   −C_(i)^{-1} c_(i) |
                  | −c_(i)^T C_(i)^{-1}                                    1                |,   (8.86)

with b_i = c_ii − c_(i)^T C_(i)^{-1} c_(i). Hence

J^T C^{-1} J = J_(i)^T C_(i)^{-1} J_(i) + b_i^{-1} x̄_i²,   (8.87)

since J_(i)^T C_(i)^{-1} c_(i) = c_(i)^T C_(i)^{-1} J_(i). We have thus shown that

( J_(i)^T C_(i)^{-1} J_(i) )^{-1} = ( J^T C^{-1} J − b_i^{-1} x̄_i² )^{-1}.

A simple adaptation of the Sherman-Morrison-Woodbury formula (2.40) yields, for scalars β and α,

(β − α²)^{-1} = β^{-1} + α² β^{-2} / (1 − α² β^{-1}).   (8.88)

In a similar way,

J^T C^{-1} y = J_(i)^T C_(i)^{-1} y_(i) + b_i^{-1} J_(i)^T C_(i)^{-1} c_(i) c_(i)^T C_(i)^{-1} y_(i) − b_i^{-1} c_(i)^T C_(i)^{-1} y_(i) − b_i^{-1} y_i J_(i)^T C_(i)^{-1} c_(i) + b_i^{-1} y_i
            = J_(i)^T C_(i)^{-1} y_(i) + b_i^{-1} x̄_i ȳ_i.

Therefore,

J_(i)^T C_(i)^{-1} y_(i) = J^T C^{-1} y − b_i^{-1} x̄_i ȳ_i.   (8.89)

Putting equations (8.88) and (8.89) into (8.85) gives the change in the generalized least-squares estimate of μ, due to deletion of site s_i

(8.92)

Recall that ȳ_i = y_i − c_(i)^T C_(i)^{-1} y_(i), C_(i) is the covariance matrix of y_(i) and c_(i) is the (n−1) x 1 vector of covariances between y_i and y_(i). Therefore,

var(ȳ_i) = c_ii − c_(i)^T C_(i)^{-1} c_(i) = b_i.

In a similar fashion, define the (n−1) x n matrix

Q(i) = ( q(1)^T ; ... ; q(i−1)^T ; q(i+1)^T ; ... ; q(n)^T ),

obtained by stacking the n−1 row vectors q(l)^T = (0, ..., 1, ..., 0), for l ≠ i. It is easily verified that

Q(i) y = y_(i).

Therefore,

ȳ_i = y_i − c_(i)^T C_(i)^{-1} y_(i) = q(i)^T y − c_(i)^T C_(i)^{-1} Q(i) y = { q(i)^T I_n − c_(i)^T C_(i)^{-1} Q(i) } y.
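The partitioned inverse (8.86) and the Schur-complement quantity b_i are easy to verify numerically. This check is our sketch, using an assumed exponential covariance matrix built on random one-dimensional sites; it confirms that the last row and column of C^{-1} are b_i^{-1} and −b_i^{-1} C_(i)^{-1} c_(i).

```python
import numpy as np

rng = np.random.default_rng(3)
s = np.sort(rng.uniform(0, 5, size=6))
C = np.exp(-np.abs(s[:, None] - s[None, :]))   # assumed covariance matrix (ours)
i = 5                                          # "delete" the last site
Ci = C[:i, :i]
ci = C[:i, i]
bi = C[i, i] - ci @ np.linalg.solve(Ci, ci)    # b_i = c_ii - c_(i)^T C_(i)^{-1} c_(i)

Cinv = np.linalg.inv(C)
print(abs(Cinv[i, i] - 1.0 / bi))              # tiny: agrees with (8.86)
```

The same identity, applied column by column, reproduces the full block expression of (8.86).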
Exercise 8.7

We recall that {y(s) : s ∈ D} is said to be second-order stationary if for all pairs s, t ∈ D

E{y(s)} = μ   and   c(s, t) = c(s − t).

Without edge corrections, the weight matrix W of the first-order SAR model treats boundary and interior sites differently. For instance, for a 5 x 5 lattice with binary weights the first row of W is

w_1^T = (0, 1, 0, 0, 0, 1, 0, ..., 0),

so that the corner site s_1 has only two neighbours. As a consequence var{y(s)} is not constant over the lattice, and the process cannot be stationary.
The table below reports the site-by-site variances (in units of σ²) for the 5 x 5 lattice, illustrating this lack of stationarity.

Row   Column
        1       2       3       4       5
1    1.0656  1.1007  1.1014  1.1007  1.0656
2    1.1007  1.1397  1.1405  1.1397  1.1007
3    1.1014  1.1405  1.1413  1.1405  1.1014
4    1.1007  1.1397  1.1405  1.1397  1.1007
5    1.0656  1.1007  1.1014  1.1007  1.0656

Exercise 8.8

Here W = W^T, so that W has real eigenvalues w_1, ..., w_n. If λ_i is an eigenvalue of I_n − ρW, then

(1 − λ_i)/ρ = w_i

is an eigenvalue of W. We thus establish the relationship

λ_i = 1 − ρ w_i,   i = 1, ..., n,

and positivity of all the λ_i requires ρ w_i < 1 for all i. This result shows that the range of permissible values of ρ depends on the form and size of W. For instance, for square lattices with weighting (8.40), w_max → +4 and w_min → −4 as n → ∞, so −0.25 < ρ < +0.25.
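The eigenvalue bound can be explored numerically. The sketch below is ours: it computes the permissible range of ρ for a symmetric rook-neighbour weight matrix on a small square lattice, where the range is (1/w_min, 1/w_max).

```python
import numpy as np

def rho_range(W):
    """Permissible interval for rho when W is symmetric: 1/w_min < rho < 1/w_max."""
    w = np.linalg.eigvalsh(W)
    return 1.0 / w.min(), 1.0 / w.max()

# binary rook weights on a 5 x 5 lattice (our example)
r = c = 5
n = r * c
W = np.zeros((n, n))
for i in range(r):
    for j in range(c):
        k = i * c + j
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ii, jj = i + di, j + dj
            if 0 <= ii < r and 0 <= jj < c:
                W[k, ii * c + jj] = 1.0

lo, hi = rho_range(W)
print(lo, hi)   # symmetric interval, shrinking towards (-0.25, 0.25) as n grows
```

For the 5 x 5 lattice the largest eigenvalue is 2√3 ≈ 3.46, so the interval is wider than the limiting (−0.25, 0.25).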
Exercise 8.9

The details of maximum likelihood estimation from a multivariate normal distribution have been outlined in Exercise 2.7. Two major differences arise in the framework of spatial autoregression. First, the mean is not constant; we assume instead that μ = Xβ. More importantly, vector y in model (8.43) is a single observation from the n-variate normal distribution N(Xβ, Σ). No independent replication occurs within the sample and the likelihood function is simply the n-variate density of y.

(8.96)

Differentiating the loglikelihood with respect to β and σ² and equating the derivatives to zero gives the estimating equations

X^T Ξ (y − Xβ) = 0   (8.98)

and

−nσ² + (y − Xβ)^T Ξ (y − Xβ) = 0.   (8.99)

In (8.98), with slight abuse of notation, 0 denotes a p x 1 vector of zeros. Equation (8.98) gives (8.45), while equation (8.99) yields (8.46). Conditional on ρ, substituting β̂ and σ̂² back into the loglikelihood yields the profile loglikelihood for ρ.
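The estimating equations can be verified numerically for a fixed ρ. This sketch is ours (path-graph weights and an arbitrary seed, chosen only for illustration): it computes the generalized least-squares estimate β̂ and checks that the score (8.98) vanishes at it.

```python
import numpy as np

def sar_mle_given_rho(y, X, W, rho):
    """beta_hat solving X^T Xi (y - X beta) = 0 and sigma2_hat from (8.99), given rho."""
    n = len(y)
    A = np.eye(n) - rho * W
    Xi = A.T @ A
    beta = np.linalg.solve(X.T @ Xi @ X, X.T @ Xi @ y)
    sigma2 = (y - X @ beta) @ Xi @ (y - X @ beta) / n
    return beta, sigma2

rng = np.random.default_rng(4)
n = 30
W = np.zeros((n, n))
idx = np.arange(n - 1)
W[idx, idx + 1] = W[idx + 1, idx] = 1.0        # path-graph neighbours (our choice)
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
rho = 0.2
y = X @ np.array([2.0, 1.0]) + np.linalg.solve(np.eye(n) - rho * W,
                                               rng.standard_normal(n))
beta, sigma2 = sar_mle_given_rho(y, X, W, rho)
A = np.eye(n) - rho * W
print(np.abs(X.T @ (A.T @ A) @ (y - X @ beta)).max())   # essentially zero
```

Repeating the computation over a grid of ρ values and keeping the largest loglikelihood gives the profile maximum likelihood estimate of ρ.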
Exercise 8.10

We know from maximum likelihood theory that the asymptotic covariance matrix of β̂, σ̂² and ρ̂ is the inverse of the expected information matrix, that is, the inverse of the matrix with blocks

−E ∂²L(β, σ², ρ; y) / ∂θ_r ∂θ_s,   θ = (β^T, σ², ρ)^T.   (8.100)

Under the SAR model (8.43), the first-order partial derivatives of the loglikelihood L(β, σ², ρ; y) with respect to β and ρ are given in equations (8.96) and (8.97). Hence,

(8.101)

and

(8.102)

For the variance parameter,

E ∂²L(β, σ², ρ; y) / ∂(σ²)² = n/(2σ⁴) − (1/σ⁶) tr{ σ² Ξ (I_n − ρW)^{-1}(I_n − ρW^T)^{-1} } = −n/(2σ⁴).   (8.103)
Furthermore, the eigenvalues of I_n − ρW are

1 − ρ w_i,   i = 1, ..., n,

so that

∂ log|I_n − ρW| / ∂ρ = − Σ_{i=1}^n w_i / (1 − ρ w_i).

Furthermore,

∂(y − Xβ)^T Ξ (y − Xβ) / ∂ρ = (y − Xβ)^T (−W^T − W + 2ρ W^T W)(y − Xβ)
                            = −2 (y − Xβ)^T (I_n − ρW)^T W (y − Xβ),

noting that (y − Xβ)^T W^T (y − Xβ) = (y − Xβ)^T W (y − Xβ). Collecting the pieces, equation (8.104) becomes
Exercise 8.11

For simplicity we consider the first-order SAR model (8.43) with β known. The least squares estimate of ρ is obtained by minimizing the expression

D(ρ) = (y − Xβ)^T Ξ (y − Xβ).
Exercise 8.12

We are testing the null hypothesis

H_0: ρ = ρ_0   (8.108)

against the two-sided alternative H_1: ρ ≠ ρ_0, using information from a subset of m spatial locations. For simplicity, in what follows we suppress the subscript corresponding to S*(m).

If we adopt the loglikelihood approximation L*(β, σ², ρ; y) given in (8.51), the signed square-root of the likelihood ratio statistic is defined as

T_LR(ρ_0) = sign(ρ̂ − ρ_0) { 2L*(β̂, σ̂², ρ̂; y) − 2L*(β̂_0, σ̂²_0, ρ_0; y) }^{1/2},

where β̂_0 and σ̂²_0 are the maximum likelihood estimates of β and σ² under the constraint (8.108). From (8.51), we have

T_LR(ρ_0) = sign(ρ̂ − ρ_0) { log|Ξ̂_0^{-1} Ξ̂| − m log(σ̂²/σ̂²_0) − (1/σ̂²)(y − Xβ̂)^T Ξ̂ (y − Xβ̂) + (1/σ̂²_0)(y − Xβ̂_0)^T Ξ̂_0 (y − Xβ̂_0) }^{1/2}.
Exercise 8.13

(a) A first-order CAR model is defined through the set of conditional densities f(y_i|y_(i); β; σ²; ρ), for i = 1, ..., n. The model is well defined if these conditional densities are mutually consistent and yield a proper joint distribution for y. Therefore, part (a) of the exercise is proved if we can show that the joint distribution of y exists under condition (8.58), and that this joint distribution is precisely of the form given in (8.59).

A general key result relating joint and conditional distributions is a factorization theorem noted by Brook (1964) and fully developed by Besag (1974) in the context of spatial processes. This theorem is proved also in several books, including Cressie (1993, pp. 412-413) and Guttorp (1995, p. 7). We introduce the following notation. Let a = (a_1, ..., a_n)^T and b = (b_1, ..., b_n)^T denote two possible realizations of {y(s) : s ∈ S}. Let f(a) and f(b) be the joint densities of a and b, while f(a_i|·) and f(b_i|·) are the conditional densities of a_i and b_i, i = 1, ..., n. The factorization theorem says that, under a mild regularity condition, the conditional densities must satisfy the relationship

f(a)/f(b) = Π_{i=1}^n [ f(a_i | a_1, ..., a_{i−1}, b_{i+1}, ..., b_n) / f(b_i | a_1, ..., a_{i−1}, b_{i+1}, ..., b_n) ].   (8.109)

The regularity condition (not stated here) implies that each term in the denominator of (8.109) is positive. This is clearly true if f(b_i|·) is the normal density function.

Equation (8.109) is important because it relates the joint probability structure of {y(s) : s ∈ S}, given in terms of the density ratio f(a)/f(b),
to the conditional densities of the individual observations. The theorem can be applied with a = y and b = μ = Xβ. For the numerator of (8.109), the CAR assumptions give

f(y_i | y_1, ..., y_{i−1}, μ_{i+1}, ..., μ_n) = (2πσ²)^{−1/2} exp[ −(1/(2σ²)) { y_i − μ_i − ρ Σ_{j=1}^{i−1} w_{i,j}(y_j − μ_j) }² ],

since the terms w_{i,j}(μ_j − μ_j), for j = i+1, ..., n, vanish. In a similar way,

f(μ_i | y_1, ..., y_{i−1}, μ_{i+1}, ..., μ_n) = (2πσ²)^{−1/2} exp[ −(1/(2σ²)) { ρ Σ_{j=1}^{i−1} w_{i,j}(y_j − μ_j) }² ].

This gives

log{ f(y) / f(Xβ) } = −(1/(2σ²)) Σ_{i=1}^n (y_i − μ_i)² + (ρ/σ²) Σ_{i=1}^n Σ_{j=1}^{i−1} w_{i,j}(y_i − μ_i)(y_j − μ_j).   (8.110)

We note that

Σ_{i=1}^n Σ_{j=1}^{i−1} w_{i,j}(y_i − μ_i)(y_j − μ_j) = Σ_{i=1}^n Σ_{j=i+1}^n w_{i,j}(y_i − μ_i)(y_j − μ_j)

by the symmetry constraint (8.57), while

Σ_{i=1}^n w_{i,i}(y_i − μ_i)² = 0

since w_{i,i} = 0. Hence,

2 Σ_{i=1}^n Σ_{j=1}^{i−1} w_{i,j}(y_i − μ_i)(y_j − μ_j) = Σ_{i=1}^n Σ_{j=1}^n w_{i,j}(y_i − μ_i)(y_j − μ_j).

Substituting into equation (8.110), the logarithm of the density ratio becomes
log{ f(y) / f(Xβ) } = −(1/(2σ²)) (y − Xβ)^T (I_n − ρW) (y − Xβ),

which is the kernel of the multivariate normal density (8.112) with mean Xβ and covariance matrix Σ_CAR, where

Σ_CAR = σ²(I_n − ρW)^{-1}.   (8.113)

In order to be a valid covariance matrix, Σ_CAR must be symmetric and positive definite. Symmetry is ensured by assumption (8.57). Under the binary weighting scheme (8.40), positive definiteness of (I_n − ρW) and thus of Σ_CAR follows if −0.25 < ρ < +0.25 (see Exercise 8.8).
As a final remark, we point out why the density ratio f(y)/f(Xβ) (or equivalently its logarithm) gives a complete characterization of the joint density f(y). This is easily seen by noting that the constant f(Xβ) is determined by the requirement that ∫ f(y) dy = 1. Hence, knowledge of f(y)/f(Xβ) as a function of y is equivalent to knowledge of the joint density function.

The validity of the factorization theorem (8.109) is general and it is by no means limited to the case of normal conditional distributions. Therefore, this theorem provides a basic tool for deriving valid models of spatial processes with Markov structure, known as Markov random fields, also under non-Gaussian assumptions (Besag 1974; Kaiser and Cressie 2000).
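The contrast between the joint covariances implied by the CAR and SAR specifications is easy to see numerically. In this sketch (ours, with symmetric path-graph weights chosen only for illustration), Σ_CAR = σ²(I − ρW)^{-1} is symmetric because W is, while the SAR covariance σ²{(I − ρW)^T(I − ρW)}^{-1} differs from it for the same ρ and W.

```python
import numpy as np

n = 10
W = np.zeros((n, n))
idx = np.arange(n - 1)
W[idx, idx + 1] = W[idx + 1, idx] = 1.0    # symmetric path-graph weights
rho, sigma2 = 0.2, 1.0
A = np.eye(n) - rho * W                    # I - rho W, positive definite here

Sigma_CAR = sigma2 * np.linalg.inv(A)          # CAR joint covariance
Sigma_SAR = sigma2 * np.linalg.inv(A.T @ A)    # SAR joint covariance
print(np.allclose(Sigma_CAR, Sigma_CAR.T),    # True: valid covariance matrix
      np.allclose(Sigma_CAR, Sigma_SAR))      # False: the two models differ
```

The same computation also makes the part (c) result transparent: with μ* = Xβ + ρW(y − Xβ), one has ε* = (I − ρW)(y − Xβ), so cov(ε*, y) = (I − ρW) Σ_CAR = σ²I_n.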
(b) The loglikelihood of a first-order CAR model is simply the logarithm of the multivariate normal density (8.112), seen as a function of β, σ² and ρ:

L_CAR(β, σ², ρ; y) = −(n/2) log(2πσ²) + (1/2) log|I_n − ρW| − (1/(2σ²)) (y − Xβ)^T (I_n − ρW) (y − Xβ).

It differs from the SAR loglikelihood (8.44) both in the determinant term and in the quadratic form, where I_n − ρW replaces Ξ = (I_n − ρW)^T(I_n − ρW).

(c) Furthermore, we know from part (a) of this exercise that E(y) = Xβ and E{(y − Xβ)(y − Xβ)^T} = σ²(I_n − ρW)^{-1}. Hence,

E(ε*) = 0,

an n x 1 vector of zeros, and

cov(ε*, y) = σ² I_n.
TABLE A.1. Swiss heads data: six dimensions in millimetres of the heads of 200 Swiss soldiers
Number Y1 Y2 Y3 Y4 Y5 Y6
1 113.2 111.7 119.6 53.9 127.4 143.6
2 117.6 117.3 121.2 47.7 124.7 143.9
3 112.3 124.7 131.6 56.7 123.4 149.3
4 116.2 110.5 114.2 57.9 121.6 140.9
5 112.9 111.3 114.3 51.5 119.9 133.5
6 104.2 114.3 116.5 49.9 122.9 136.7
7 110.7 116.9 128.5 56.8 118.1 134.7
8 105.0 119.2 121.1 52.2 117.3 131.4
9 115.9 118.5 120.4 60.2 123.0 146.8
10 96.8 108.4 109.5 51.9 120.1 132.2
11 110.7 117.5 115.4 55.2 125.0 140.6
12 108.4 113.7 122.2 56.2 124.5 146.3
13 104.1 116.0 124.3 49.8 121.8 138.1
14 107.9 115.2 129.4 62.2 121.6 137.9
15 106.4 109.0 114.9 56.8 120.1 129.5
16 112.7 118.0 117.4 53.0 128.3 141.6
17 109.9 105.2 122.2 56.6 122.2 137.8
18 116.6 119.5 130.6 53.0 124.0 135.3
19 109.9 113.5 125.7 62.8 122.7 139.5
20 107.1 110.7 121.7 52.1 118.6 141.6
21 113.3 117.8 120.7 53.5 121.6 138.6
22 108.1 116.3 123.9 55.5 125.4 146.1
23 111.5 111.1 127.1 57.9 115.8 135.1
24 115.7 117.3 123.0 50.8 122.2 143.1
25 112.2 120.6 119.6 61.3 126.7 141.1
26 118.7 122.9 126.7 59.8 125.7 138.3
27 118.9 118.4 127.7 64.6 125.6 144.3
28 114.2 109.4 119.3 58.7 121.1 136.2
29 113.8 113.6 135.8 54.3 119.5 130.9
30 122.4 117.2 122.2 56.4 123.3 142.9
31 110.4 110.8 122.1 51.2 115.6 132.7
32 114.9 108.6 122.9 56.3 122.7 140.3
33 108.4 118.7 117.8 50.0 113.7 131.0
34 105.3 107.2 116.0 52.5 117.4 133.2
35 110.5 124.9 122.4 62.2 123.1 137.0
36 110.3 113.2 123.9 62.9 122.3 139.8
37 115.1 116.4 118.1 51.9 121.5 133.8
38 119.6 120.2 120.0 59.7 123.9 143.7
39 119.7 125.2 124.5 57.8 125.3 142.7
40 110.2 116.8 120.6 54.3 123.6 140.1
554 Appendix: Tables of Data
41 118.9 126.6 128.2 63.8 125.7 151.1
42 112.3 114.7 127.7 59.4 125.2 137.5
43 113.7 111.4 122.6 63.3 121.6 146.8
44 108.1 116.4 115.5 55.2 123.5 134.1
45 105.6 111.4 121.8 61.4 117.7 132.6
46 111.1 111.9 125.2 56.1 119.9 139.5
47 111.3 117.6 129.3 63.7 124.3 142.8
48 119.4 114.6 125.0 62.5 129.5 147.7
49 113.4 120.5 121.1 61.5 118.1 137.2
50 114.7 113.8 137.7 59.8 124.5 143.3
51 115.1 113.9 118.6 59.5 119.4 141.6
52 114.6 112.4 122.2 54.5 121.2 126.3
53 115.2 117.2 122.2 60.1 123.9 135.7
54 115.4 119.5 132.8 60.3 127.8 140.3
55 119.3 120.6 116.6 55.8 121.5 143.0
56 112.8 119.3 129.6 61.0 121.1 139.4
57 116.6 109.6 125.4 54.6 120.2 122.6
58 106.5 116.0 123.2 52.8 121.7 134.9
59 112.1 117.4 128.2 59.9 120.3 131.5
60 112.8 113.0 125.4 64.8 119.4 136.6
61 114.6 119.0 116.8 57.4 123.8 140.0
62 110.9 116.5 125.8 53.5 124.8 142.9
63 109.1 117.0 123.7 60.0 120.1 137.7
64 111.7 117.3 121.0 51.5 119.7 135.5
65 106.4 111.1 124.4 59.1 122.4 138.4
66 121.2 122.5 117.8 54.8 121.5 143.9
67 115.2 121.2 117.4 54.9 121.9 144.0
68 123.2 124.2 120.0 57.9 119.4 138.4
69 113.1 114.5 118.9 56.9 121.8 135.0
70 110.3 108.9 115.2 55.9 119.0 138.0
71 115.0 114.7 123.5 66.7 120.3 133.6
72 111.9 111.1 122.3 63.8 117.1 131.6
73 117.2 117.5 120.2 60.5 119.5 129.6
74 113.8 112.5 123.2 62.0 113.5 132.4
75 112.8 113.5 114.3 53.8 128.4 143.8
76 113.3 118.4 123.8 51.6 122.7 141.7
77 123.9 120.5 118.3 54.3 122.0 133.8
78 119.8 119.6 126.1 57.5 124.7 130.9
79 110.9 113.9 123.7 62.7 124.8 143.5
80 111.9 125.1 121.8 58.1 112.1 134.8
81 114.0 120.8 131.2 61.0 124.7 152.6
82 113.6 110.4 130.9 60.2 118.5 132.5
83 118.9 126.1 121.9 56.1 127.3 145.6
84 119.4 127.8 128.0 61.8 120.6 141.3
85 121.0 121.1 116.8 56.3 124.2 140.9
86 109.0 105.7 126.2 59.4 121.2 143.2
87 117.9 125.1 122.6 58.2 128.4 151.1
88 124.8 123.8 128.3 60.4 129.1 147.2
89 120.6 124.3 120.0 59.5 123.4 144.1
90 115.6 115.9 117.2 54.0 119.9 135.3
91 116.6 119.1 131.0 58.0 123.3 136.4
92 118.7 118.9 129.6 68.6 123.0 141.4
93 114.3 117.1 127.1 55.7 119.1 139.8
94 110.9 113.1 124.1 60.6 115.7 132.1
95 119.2 120.0 136.9 55.1 129.5 142.0
96 117.1 123.7 108.7 53.2 125.6 136.6
97 109.3 110.2 129.3 58.5 121.0 136.8
98 108.8 119.3 118.7 58.9 118.5 132.7
99 109.0 127.5 124.6 61.1 117.6 131.5
100 101.2 110.6 124.3 62.9 124.3 138.9
101 117.8 109.0 127.1 53.9 117.9 135.8
102 112.4 115.6 135.3 55.8 125.0 136.1
103 105.3 109.8 115.4 59.6 116.6 137.4
104 117.7 122.4 127.1 74.2 125.5 144.5
105 110.9 113.7 126.8 62.7 121.4 142.7
106 115.6 117.5 114.2 55.0 113.2 136.6
107 115.4 118.1 116.6 62.5 125.4 142.1
108 113.6 116.7 130.1 58.5 120.8 140.3
109 116.1 117.6 132.3 59.6 122.0 139.1
110 120.5 115.4 120.2 53.5 118.6 139.4
111 119.0 124.1 124.3 73.6 126.3 141.6
112 122.7 109.0 116.3 55.8 121.8 139.4
113 117.8 108.2 133.9 61.3 120.6 141.3
114 122.3 114.2 137.4 61.7 125.8 143.2
115 114.4 117.8 128.1 54.9 126.5 140.6
116 110.6 111.8 128.4 56.7 121.7 147.5
117 123.3 119.1 117.0 51.7 119.9 137.9
118 118.0 118.0 131.5 61.2 125.0 140.5
119 122.0 114.6 126.2 55.5 121.2 143.4
120 113.4 104.1 128.3 58.7 124.1 142.8
Number Y1 Y2 Y3 Y4 Y5 Y6
121 117.0 111.3 129.8 55.6 119.5 136.1
122 116.6 108.3 123.7 61.0 123.4 134.0
123 120.1 116.7 122.8 57.4 123.2 145.2
124 119.8 125.0 124.1 61.8 126.9 141.2
125 123.5 123.0 121.6 59.2 115.3 138.4
126 114.9 126.7 131.3 57.3 122.7 139.2
127 120.6 110.8 129.6 58.1 122.7 134.7
128 113.0 114.8 120.7 54.1 119.7 140.9
129 111.8 110.2 121.0 56.4 121.4 132.1
130 110.8 114.9 120.5 58.7 113.4 131.6
131 114.8 118.8 120.9 58.4 119.7 135.9
132 122.5 122.3 116.7 57.4 128.1 147.3
133 105.9 105.6 129.3 69.5 123.6 136.6
134 108.0 111.3 116.9 53.8 117.8 129.6
135 114.4 111.7 116.3 54.3 120.2 130.1
136 117.9 112.9 119.1 54.2 117.9 134.8
137 110.7 113.9 114.5 53.0 120.1 124.5
138 112.3 110.4 116.8 52.0 121.0 133.4
139 110.9 110.0 116.7 53.4 115.4 133.0
140 126.6 127.0 135.2 60.6 128.6 149.6
141 116.2 115.2 117.8 60.8 123.1 136.8
142 117.2 117.8 123.1 61.8 122.1 140.8
143 114.5 113.2 119.8 50.3 120.6 135.1
144 126.2 118.7 114.6 55.1 126.3 146.7
145 118.7 123.1 131.6 61.8 123.9 139.7
146 116.2 111.5 112.9 54.0 114.7 134.2
147 113.9 100.6 124.0 60.3 118.7 140.7
148 114.4 113.7 123.3 63.2 125.5 145.5
149 114.5 119.3 130.6 61.7 123.6 138.5
150 113.3 115.9 116.1 53.5 127.2 136.5
151 120.7 114.6 124.1 53.2 127.5 139.1
152 119.1 115.3 116.6 53.5 128.2 142.6
153 113.2 107.7 122.0 60.6 119.4 124.2
154 113.7 110.0 131.0 63.5 117.3 134.6
155 116.3 119.3 116.6 57.3 122.0 141.6
156 117.6 117.8 122.5 59.9 119.4 136.3
157 114.8 115.0 115.2 58.9 122.5 135.2
158 127.3 123.9 130.3 59.8 128.3 138.7
159 130.5 125.5 127.4 62.1 130.1 153.3
160 110.4 105.4 122.1 56.2 114.6 122.8
Number Y1 Y2 Y3 Y4 Y5 Y6
161 108.5 105.4 119.1 59.4 120.4 134.7
162 121.6 112.1 126.5 60.6 122.7 142.9
163 117.9 115.2 139.1 59.6 125.5 141.3
164 112.7 111.5 114.9 53.5 113.9 132.6
165 121.8 119.0 116.9 56.5 120.1 139.2
166 118.5 120.0 129.8 59.5 127.8 150.5
167 118.3 120.0 127.5 56.6 122.0 139.4
168 117.9 114.4 116.4 56.7 123.1 136.3
169 114.2 110.0 121.9 57.5 116.1 126.5
170 122.4 122.7 128.4 58.3 131.7 148.1
171 114.1 109.3 124.4 62.8 120.8 133.4
172 114.6 118.0 112.8 55.6 118.5 135.6
173 113.6 114.6 127.1 60.8 123.8 143.1
174 111.3 116.7 117.7 51.2 125.7 141.9
175 111.4 120.4 112.1 56.4 120.3 137.1
176 119.9 114.4 128.8 69.1 124.9 144.3
177 116.1 118.9 128.3 55.8 123.7 139.7
178 119.7 118.2 113.5 59.5 127.0 146.5
179 105.8 106.7 131.2 61.3 123.7 144.3
180 116.7 118.7 128.2 55.8 121.2 143.9
181 106.4 107.3 122.9 57.6 122.3 132.9
182 112.2 121.3 130.1 65.3 120.3 137.9
183 114.8 117.3 130.3 60.9 125.6 137.4
184 110.0 117.4 114.1 54.8 124.8 135.1
185 121.5 121.6 125.4 59.5 128.5 144.7
186 119.8 119.4 119.6 53.9 122.3 143.6
187 107.7 108.4 125.1 62.3 122.7 137.2
188 118.4 115.7 121.1 57.8 124.9 140.5
189 119.8 113.9 132.0 60.8 122.4 137.6
190 114.1 112.8 119.3 52.7 114.2 136.9
191 117.7 121.8 120.0 59.1 122.6 138.3
192 111.1 117.7 117.7 60.2 124.6 139.2
193 111.1 117.7 117.7 59.1 124.7 141.9
194 128.1 118.3 129.4 61.0 134.7 148.6
195 120.4 118.7 126.4 59.4 133.1 147.1
196 112.9 112.0 123.5 57.2 121.3 133.3
197 118.2 114.4 114.8 55.3 126.1 149.1
198 119.0 112.7 129.1 62.0 127.6 146.6
199 111.8 116.0 117.8 60.9 114.4 128.7
200 116.6 111.4 115.6 60.9 117.8 137.4
Number Country Y1 Y2 Y3 Y4 Y5 Y6 Y7
secs secs secs mins mins mins mins
1 Argentina 11.61 22.94 54.50 2.15 4.43 9.79 178.52
2 Australia 11.20 22.35 51.08 1.98 4.13 9.08 152.37
3 Austria 11.43 23.09 50.62 1.99 4.22 9.34 159.37
4 Belgium 11.41 23.04 52.00 2.00 4.14 8.88 157.85
5 Bermuda 11.46 23.05 53.30 2.16 4.58 9.81 169.98
6 Brazil 11.31 23.17 52.80 2.10 4.49 9.77 168.75
7 Burma 12.14 24.47 55.00 2.18 4.45 9.51 191.02
8 Canada 11.00 22.25 50.06 2.00 4.06 8.81 149.45
9 Chile 12.00 24.52 54.90 2.05 4.23 9.37 171.38
10 China 11.95 24.41 54.97 2.08 4.33 9.31 168.48
11 Colombia 11.60 24.00 53.26 2.11 4.35 9.46 165.42
12 Cook Is. 12.90 27.10 60.40 2.30 4.84 11.10 233.22
13 Costa Rica 11.96 24.60 58.25 2.21 4.68 10.43 171.80
14 CZ 11.09 21.97 47.99 1.89 4.14 8.92 158.85
15 Denmark 11.42 23.52 53.60 2.03 4.18 8.71 151.75
16 Dominica 11.79 24.05 56.05 2.24 4.74 9.89 203.88
17 Finland 11.13 22.39 50.14 2.03 4.10 8.92 154.23
18 France 11.15 22.59 51.73 2.00 4.14 8.98 155.27
19 GDR 10.81 21.71 48.16 1.93 3.96 8.75 157.68
20 FRG 11.01 22.39 49.75 1.95 4.03 8.59 148.53
21 GB 11.00 22.13 50.46 1.98 4.03 8.62 149.72
22 Greece 11.79 24.08 54.93 2.07 4.35 9.87 182.20
23 Guatemala 11.84 24.54 56.09 2.28 4.86 10.54 215.08
24 Hungary 11.45 23.06 51.50 2.01 4.14 8.98 156.37
25 India 11.95 24.28 53.60 2.10 4.32 9.98 188.03
26 Indonesia 11.85 24.24 55.34 2.22 4.61 10.02 201.28
27 Ireland 11.43 23.51 53.24 2.05 4.11 8.89 149.38
28 Israel 11.45 23.57 54.90 2.10 4.25 9.37 160.48
Number Country Y1 Y2 Y3 Y4 Y5 Y6 Y7
secs secs secs mins mins mins mins
29 Italy 11.29 23.00 52.01 1.96 3.98 8.63 151.82
30 Japan 11.73 24.00 53.73 2.09 4.35 9.20 150.50
31 Kenya 11.73 23.88 52.70 2.00 4.15 9.20 181.05
32 Korea 11.96 24.49 55.70 2.15 4.42 9.62 164.65
33 DRK 12.25 25.78 51.20 1.97 4.25 9.35 179.17
34 Luxembourg 12.03 24.96 56.10 2.07 4.38 9.64 174.68
35 Malaysia 12.23 24.21 55.09 2.19 4.69 10.46 182.17
36 Mauritius 11.76 25.08 58.10 2.27 4.79 10.90 261.13
37 Mexico 11.89 23.62 53.76 2.04 4.25 9.59 158.53
38 Netherlands 11.25 22.81 52.38 1.99 4.06 9.01 152.48
39 New Zealand 11.55 23.13 51.60 2.02 4.18 8.76 145.48
40 Norway 11.58 23.31 53.12 2.03 4.01 8.53 145.48
41 Papua NG 12.25 25.07 56.96 2.24 4.84 10.69 233.00
42 Philippines 11.76 23.54 54.60 2.19 4.60 10.16 200.37
43 Poland 11.13 22.21 49.29 1.95 3.99 8.97 160.82
44 Portugal 11.81 24.22 54.30 2.09 4.16 8.84 151.20
45 Rumania 11.44 23.46 51.20 1.92 3.96 8.53 165.45
46 Singapore 12.30 25.00 55.08 2.12 4.52 9.94 182.77
47 Spain 11.80 23.98 53.59 2.05 4.14 9.02 162.60
48 Sweden 11.16 22.82 51.79 2.02 4.12 8.84 154.48
49 Switzerland 11.45 23.31 53.11 2.02 4.07 8.77 153.42
50 Taiwan 11.22 22.62 52.50 2.10 4.38 9.63 177.87
51 Thailand 11.75 24.46 55.80 2.20 4.72 10.28 168.45
52 Turkey 11.98 24.44 56.45 2.15 4.37 9.38 201.08
53 USA 10.79 21.83 50.62 1.96 3.95 8.50 142.72
54 USSR 11.06 22.19 49.19 1.89 3.87 8.45 151.22
55 W. Samoa 12.74 25.85 58.73 2.33 5.81 13.04 306.00
TABLE A.4. Swiss bank notes data: six dimensions in millimetres of 200 Swiss
1,000 Franc notes
Number Y1 Y2 Y3 Y4 Y5 Y6
1 214.8 131.0 131.1 9.0 9.7 141.0
2 214.6 129.7 129.7 8.1 9.5 141.7
3 214.8 129.7 129.7 8.7 9.6 142.2
4 214.8 129.7 129.6 7.5 10.4 142.0
5 215.0 129.6 129.7 10.4 7.7 141.8
6 215.7 130.8 130.5 9.0 10.1 141.4
7 215.5 129.5 129.7 7.9 9.6 141.6
8 214.5 129.6 129.2 7.2 10.7 141.7
9 214.9 129.4 129.7 8.2 11.0 141.9
10 215.2 130.4 130.3 9.2 10.0 140.7
11 215.3 130.4 130.3 7.9 11.7 141.8
12 215.1 129.5 129.6 7.7 10.5 142.2
13 215.2 130.8 129.6 7.9 10.8 141.4
14 214.7 129.7 129.7 7.7 10.9 141.7
15 215.1 129.9 129.7 7.7 10.8 141.8
16 214.5 129.8 129.8 9.3 8.5 141.6
17 214.6 129.9 130.1 8.2 9.8 141.7
18 215.0 129.9 129.7 9.0 9.0 141.9
19 215.2 129.6 129.6 7.4 11.5 141.5
20 214.7 130.2 129.9 8.6 10.0 141.9
21 215.0 129.9 129.3 8.4 10.0 141.4
22 215.6 130.5 130.0 8.1 10.3 141.6
23 215.3 130.6 130.0 8.4 10.8 141.5
24 215.7 130.2 130.0 8.7 10.0 141.6
25 215.1 129.7 129.9 7.4 10.8 141.1
26 215.3 130.4 130.4 8.0 11.0 142.3
27 215.5 130.2 130.1 8.9 9.8 142.4
28 215.1 130.3 130.3 9.8 9.5 141.9
29 215.1 130.0 130.0 7.4 10.5 141.8
30 214.8 129.7 129.3 8.3 9.0 142.0
31 215.2 130.1 129.8 7.9 10.7 141.8
32 214.8 129.7 129.7 8.6 9.1 142.3
33 215.0 130.0 129.6 7.7 10.5 140.7
34 215.6 130.4 130.1 8.4 10.3 141.0
35 215.9 130.4 130.0 8.9 10.6 141.4
36 214.6 130.2 130.2 9.4 9.7 141.8
37 215.5 130.3 130.0 8.4 9.7 141.8
38 215.3 129.9 129.4 7.9 10.0 142.0
39 215.3 130.3 130.1 8.5 9.3 142.1
40 213.9 130.3 129.0 8.1 9.7 141.3
Number Y1 Y2 Y3 Y4 Y5 Y6
41 214.4 129.8 129.2 8.9 9.4 142.3
42 214.8 130.1 129.6 8.8 9.9 140.9
43 214.9 129.6 129.4 9.3 9.0 141.7
44 214.9 130.4 129.7 9.0 9.8 140.9
45 214.8 129.4 129.1 8.2 10.2 141.0
46 214.3 129.5 129.4 8.3 10.2 141.8
47 214.8 129.9 129.7 8.3 10.2 141.5
48 214.8 129.9 129.7 7.3 10.9 142.0
49 214.6 129.7 129.8 7.9 10.3 141.1
50 214.5 129.0 129.6 7.8 9.8 142.0
51 214.6 129.8 129.4 7.2 10.0 141.3
52 215.3 130.6 130.0 9.5 9.7 141.1
53 214.5 130.1 130.0 7.8 10.9 140.9
54 215.4 130.2 130.2 7.6 10.9 141.6
55 214.5 129.4 129.5 7.9 10.0 141.4
56 215.2 129.7 129.4 9.2 9.4 142.0
57 215.7 130.0 129.4 9.2 10.4 141.2
58 215.0 129.6 129.4 8.8 9.0 141.1
59 215.1 130.1 129.9 7.9 11.0 141.3
60 215.1 130.0 129.8 8.2 10.3 141.4
61 215.1 129.6 129.3 8.3 9.9 141.6
62 215.3 129.7 129.4 7.5 10.5 141.5
63 215.4 129.8 129.4 8.0 10.6 141.5
64 214.5 130.0 129.5 8.0 10.8 141.4
65 215.0 130.0 129.8 8.6 10.6 141.5
66 215.2 130.6 130.0 8.8 10.6 140.8
67 214.6 129.5 129.2 7.7 10.3 141.3
68 214.8 129.7 129.3 9.1 9.5 141.5
69 215.1 129.6 129.8 8.6 9.8 141.8
70 214.9 130.2 130.2 8.0 11.2 139.6
71 213.8 129.8 129.5 8.4 11.1 140.9
72 215.2 129.9 129.5 8.2 10.3 141.4
73 215.0 129.6 130.2 8.7 10.0 141.2
74 214.4 129.9 129.6 7.5 10.5 141.8
75 215.2 129.9 129.7 7.2 10.6 142.1
76 214.1 129.6 129.3 7.6 10.7 141.7
77 214.9 129.9 130.1 8.8 10.0 141.2
78 214.6 129.8 129.4 7.4 10.6 141.0
79 215.2 130.5 129.8 7.9 10.9 140.9
80 214.6 129.9 129.4 7.9 10.0 141.8
Number Y1 Y2 Y3 Y4 Y5 Y6
81 215.1 129.7 129.7 8.6 10.3 140.6
82 214.9 129.8 129.6 7.5 10.3 141.0
83 215.2 129.7 129.1 9.0 9.7 141.9
84 215.2 130.1 129.9 7.9 10.8 141.3
85 215.4 130.7 130.2 9.0 11.1 141.2
86 215.1 129.9 129.6 8.9 10.2 141.5
87 215.2 129.9 129.7 8.7 9.5 141.6
88 215.0 129.6 129.2 8.4 10.2 142.1
89 214.9 130.3 129.9 7.4 11.2 141.5
90 215.0 129.9 129.7 8.0 10.5 142.0
91 214.7 129.7 129.3 8.6 9.6 141.6
92 215.4 130.0 129.9 8.5 9.7 141.4
93 214.9 129.4 129.5 8.2 9.9 141.5
94 214.5 129.5 129.3 7.4 10.7 141.5
95 214.7 129.6 129.5 8.3 10.0 142.0
96 215.6 129.9 129.9 9.0 9.5 141.7
97 215.0 130.4 130.3 9.1 10.2 141.1
98 214.4 129.7 129.5 8.0 10.3 141.2
99 215.1 130.0 129.8 9.1 10.2 141.5
100 214.7 130.0 129.4 7.8 10.0 141.2
101 214.4 130.1 130.3 9.7 11.7 139.8
102 214.9 130.5 130.2 11.0 11.5 139.5
103 214.9 130.3 130.1 8.7 11.7 140.2
104 215.0 130.4 130.6 9.9 10.9 140.3
105 214.7 130.2 130.3 11.8 10.9 139.7
106 215.0 130.2 130.2 10.6 10.7 139.9
107 215.3 130.3 130.1 9.3 12.1 140.2
108 214.8 130.1 130.4 9.8 11.5 139.9
109 215.0 130.2 129.9 10.0 11.9 139.4
110 215.2 130.6 130.8 10.4 11.2 140.3
111 215.2 130.4 130.3 8.0 11.5 139.2
112 215.1 130.5 130.3 10.6 11.5 140.1
113 215.4 130.7 131.1 9.7 11.8 140.6
114 214.9 130.4 129.9 11.4 11.0 139.9
115 215.1 130.3 130.0 10.6 10.8 139.7
116 215.5 130.4 130.0 8.2 11.2 139.2
117 214.7 130.6 130.1 11.8 10.5 139.8
118 214.7 130.4 130.1 12.1 10.4 139.9
119 214.8 130.5 130.2 11.0 11.0 140.0
120 214.4 130.2 129.9 10.1 12.0 139.2
Number Y1 Y2 Y3 Y4 Y5 Y6
121 214.8 130.3 130.4 10.1 12.1 139.6
122 215.1 130.6 130.3 12.3 10.2 139.6
123 215.3 130.8 131.1 11.6 10.6 140.2
124 215.1 130.7 130.4 10.5 11.2 139.7
125 214.7 130.5 130.5 9.9 10.3 140.1
126 214.9 130.0 130.3 10.2 11.4 139.6
127 215.0 130.4 130.4 9.4 11.6 140.2
128 215.5 130.7 130.3 10.2 11.8 140.0
129 215.1 130.2 130.2 10.1 11.3 140.3
130 214.5 130.2 130.6 9.8 12.1 139.9
131 214.3 130.2 130.0 10.7 10.5 139.8
132 214.5 130.2 129.8 12.3 11.2 139.2
133 214.9 130.5 130.2 10.6 11.5 139.9
134 214.6 130.2 130.4 10.5 11.8 139.7
135 214.2 130.0 130.2 11.0 11.2 139.5
136 214.8 130.1 130.1 11.9 11.1 139.5
137 214.6 129.8 130.2 10.7 11.1 139.4
138 214.9 130.7 130.3 9.3 11.2 138.3
139 214.6 130.4 130.4 11.3 10.8 139.8
140 214.5 130.5 130.2 11.8 10.2 139.6
141 214.8 130.2 130.3 10.0 11.9 139.3
142 214.7 130.0 129.4 10.2 11.0 139.2
143 214.6 130.2 130.4 11.2 10.7 139.9
144 215.0 130.5 130.4 10.6 11.1 139.9
145 214.5 129.8 129.8 11.4 10.0 139.3
146 214.9 130.6 130.4 11.9 10.5 139.8
147 215.0 130.5 130.4 11.4 10.7 139.9
148 215.3 130.6 130.3 9.3 11.3 138.1
149 214.7 130.2 130.1 10.7 11.0 139.4
150 214.9 129.9 130.0 9.9 12.3 139.4
151 214.9 130.3 129.9 11.9 10.6 139.8
152 214.6 129.9 129.7 11.9 10.1 139.0
153 214.6 129.7 129.3 10.4 11.0 139.3
154 214.5 130.1 130.1 12.1 10.3 139.4
155 214.5 130.3 130.0 11.0 11.5 139.5
156 215.1 130.0 130.3 11.6 10.5 139.7
157 214.2 129.7 129.6 10.3 11.4 139.5
158 214.4 130.1 130.0 11.3 10.7 139.2
159 214.8 130.4 130.6 12.5 10.0 139.3
160 214.6 130.6 130.1 8.1 12.1 137.9
Number Y1 Y2 Y3 Y4 Y5 Y6
161 215.6 130.1 129.7 7.4 12.2 138.4
162 214.9 130.5 130.1 9.9 10.2 138.1
163 214.6 130.1 130.0 11.5 10.6 139.5
164 214.7 130.1 130.2 11.6 10.9 139.1
165 214.3 130.3 130.0 11.4 10.5 139.8
166 215.1 130.3 130.6 10.3 12.0 139.7
167 216.3 130.7 130.4 10.0 10.1 138.8
168 215.6 130.4 130.1 9.6 11.2 138.6
169 214.8 129.9 129.8 9.6 12.0 139.6
170 214.9 130.0 129.9 11.4 10.9 139.7
171 213.9 130.7 130.5 8.7 11.5 137.8
172 214.2 130.6 130.4 12.0 10.2 139.6
173 214.8 130.5 130.3 11.8 10.5 139.4
174 214.8 129.6 130.0 10.4 11.6 139.2
175 214.8 130.1 130.0 11.4 10.5 139.6
176 214.9 130.4 130.2 11.9 10.7 139.0
177 214.3 130.1 130.1 11.6 10.5 139.7
178 214.5 130.4 130.0 9.9 12.0 139.6
179 214.8 130.5 130.3 10.2 12.1 139.1
180 214.5 130.2 130.4 8.2 11.8 137.8
181 215.0 130.4 130.1 11.4 10.7 139.1
182 214.8 130.6 130.6 8.0 11.4 138.7
183 215.0 130.5 130.1 11.0 11.4 139.3
184 214.6 130.5 130.4 10.1 11.4 139.3
185 214.7 130.2 130.1 10.7 11.1 139.5
186 214.7 130.4 130.0 11.5 10.7 139.4
187 214.5 130.4 130.0 8.0 12.2 138.5
188 214.8 130.0 129.7 11.4 10.6 139.2
189 214.8 129.9 130.2 9.6 11.9 139.4
190 214.6 130.3 130.2 12.7 9.1 139.2
191 215.1 130.2 129.8 10.2 12.0 139.4
192 215.4 130.5 130.6 8.8 11.0 138.6
193 214.7 130.3 130.2 10.8 11.1 139.2
194 215.0 130.5 130.3 9.6 11.0 138.5
195 214.9 130.3 130.5 11.6 10.6 139.8
196 215.0 130.4 130.3 9.9 12.1 139.6
197 215.1 130.3 129.9 10.3 11.5 139.7
198 214.8 130.3 130.4 10.6 11.1 140.0
199 214.7 130.7 130.8 11.2 11.2 139.4
200 214.3 129.9 129.9 10.2 11.5 139.6
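These appendix tables are plain whitespace-delimited text, so they can be read into numeric arrays with a few lines of code. The sketch below (not part of the book; it hard-codes the first three rows of the Swiss bank notes data from Table A.4 purely as an illustration) parses each row into an observation number plus six measurements and computes the column means:

```python
# Minimal sketch: parse whitespace-delimited table rows into numeric
# records. The three rows below are the first three notes of Table A.4.
rows = """\
1 214.8 131.0 131.1 9.0 9.7 141.0
2 214.6 129.7 129.7 8.1 9.5 141.7
3 214.8 129.7 129.7 8.7 9.6 142.2
"""

data = []
for line in rows.strip().splitlines():
    fields = line.split()
    number = int(fields[0])              # observation number
    y = [float(v) for v in fields[1:]]   # Y1..Y6, in millimetres
    data.append((number, y))

# Column means of Y1..Y6 over the parsed rows
n = len(data)
means = [sum(y[j] for _, y in data) / n for j in range(6)]
print([round(m, 2) for m in means])
# -> [214.73, 130.13, 130.17, 8.6, 9.6, 141.63]
```

The same pattern extends to any table in this appendix; only the number of columns (and, for tables such as A.3, a non-numeric label column) changes.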
TABLE A.5. Babyfood data from Box and Draper (1987). The responses are the
viscosity in centipoises at the time of manufacture and when measured 3, 6 and
9 months later
Number X1 X2 X3 X4 X5 Y1 Y2 Y3 Y4
TABLE A.6. Mussels data from Cook and Weisberg (1994). The lengths are in
millimetres, the masses in grams
Number Y1: width Y2: height Y3: length Y4: shell mass Y5: mass
42 208 33 99 54 9
43 277 45 123 129 18
44 241 39 110 104 23
45 219 38 105 66 13
46 170 27 87 24 6
47 150 21 75 19 6
48 132 20 65 10 1
49 175 30 86 36 8
50 150 22 69 18 5
51 162 25 79 20 6
52 252 47 124 133 22
53 275 48 131 179 24
54 224 36 107 69 13
55 211 33 100 59 11
56 254 46 126 120 18
57 234 37 114 72 17
58 221 37 108 74 15
59 167 27 80 27 7
60 220 36 106 52 14
61 227 35 118 76 14
62 177 25 83 25 8
63 230 47 112 125 18
64 288 46 132 138 24
65 275 54 127 191 29
66 273 42 120 148 21
67 246 37 110 90 17
68 250 43 115 120 17
69 290 48 131 203 34
70 226 35 111 64 16
71 269 45 121 124 22
72 267 48 121 153 24
73 263 48 123 151 19
74 217 36 104 68 13
75 188 33 93 51 10
76 152 25 76 19 5
77 227 38 112 88 15
78 216 25 110 53 12
79 242 45 112 61 12
80 260 44 123 133 24
81 196 35 101 68 15
82 220 36 105 64 16
TABLE A.7. Dyestuff data from Box and Draper (1987). The responses are
strength (Y1), hue (Y2) and brightness (Y3)
Number X1 X2 X3 Y1 Y2 Y3 Number X1 X2 X3 Y1 Y2 Y3
1 -1 -1 -1 3.4 15 36 33 -1 -1 1 12.6 32 32
2 1 -1 -1 9.7 5 35 34 1 -1 1 10.5 10 34
3 -1 -1 -1 7.4 23 37 35 -1 -1 1 11.3 28 30
4 1 -1 -1 10.6 8 34 36 1 -1 1 10.6 18 24
5 -1 -1 -1 6.5 20 30 37 -1 -1 1 8.1 22 30
6 1 -1 -1 7.9 9 32 38 1 -1 1 12.5 31 20
7 -1 -1 -1 10.3 13 28 39 -1 -1 1 11.1 17 32
8 1 -1 -1 9.5 5 38 40 1 -1 1 12.9 16 25
9 -1 1 -1 14.3 23 40 41 -1 1 1 14.6 38 20
10 1 1 -1 10.5 1 32 42 1 1 1 12.7 12 20
11 -1 1 -1 7.8 11 32 43 -1 1 1 10.8 34 22
12 1 1 -1 17.2 5 28 44 1 1 1 17.1 19 35
13 -1 1 -1 9.4 15 34 45 -1 1 1 13.6 12 26
14 1 1 -1 12.1 8 26 46 1 1 1 14.6 14 15
15 -1 1 -1 9.5 15 30 47 -1 1 1 13.3 25 19
16 1 1 -1 15.8 1 28 48 1 1 1 14.4 16 24
17 -1 -1 -1 8.3 22 40 49 -1 -1 1 11 31 22
18 1 -1 -1 8 8 30 50 1 -1 1 12.5 14 23
19 -1 -1 -1 7.9 16 35 51 -1 -1 1 8.9 23 22
20 1 -1 -1 10.7 7 35 52 1 -1 1 13.1 23 18
21 -1 -1 -1 7.2 25 32 53 -1 -1 1 7.6 28 20
22 1 -1 -1 7.2 5 35 54 1 -1 1 8.6 20 20
23 -1 -1 -1 7.9 17 36 55 -1 -1 1 11.8 18 20
24 1 -1 -1 10.2 8 32 56 1 -1 1 12.4 11 36
25 -1 1 -1 10.3 10 20 57 -1 1 1 13.4 39 20
26 1 1 -1 9.9 3 35 58 1 1 1 14.6 30 11
27 -1 1 -1 7.4 22 35 59 -1 1 1 14.9 31 20
28 1 1 -1 10.5 6 28 60 1 1 1 11.8 6 35
29 -1 1 -1 9.6 24 27 61 -1 1 1 15.6 33 16
30 1 1 -1 15.1 4 36 62 1 1 1 12.8 23 32
31 -1 1 -1 8.7 10 36 63 -1 1 1 13.5 31 20
32 1 1 -1 12.1 5 35 64 1 1 1 15.8 11 20
TABLE A.8. Milk data from Daudin, Duby and Trecourt (1988). Eight
measurements of properties of 85 milk samples
Number Y1 Y2 Y3 Y4 Y5 Y6 Y7 Y8
1 10.318 37.7 35.7 26.5 27.1 27.4 127.1 15.35
2 10.316 37.5 35.3 26.0 27.2 27.2 128.7 14.72
3 10.314 37.0 32.8 25.3 24.8 23.9 124.1 14.61
4 10.311 39.5 33.7 26.8 25.6 25.8 127.5 14.56
5 10.309 36.0 32.8 25.9 25.1 24.9 121.6 13.74
6 10.322 36.0 33.8 26.9 25.6 25.7 124.5 14.31
7 10.311 36.0 33.8 26.9 25.8 25.4 125.3 14.13
8 10.314 36.7 34.1 27.0 25.9 25.9 124.9 14.16
9 10.292 37.2 31.5 24.8 23.6 23.9 122.5 14.13
10 10.297 35.0 31.6 24.9 23.9 23.8 121.0 14.58
11 10.282 34.7 29.9 23.5 22.7 22.5 114.7 13.83
12 10.262 31.5 30.1 23.6 22.8 22.7 111.1 13.18
13 10.270 30.5 30.1 23.8 22.7 22.6 115.0 13.45
14 10.269 31.6 29.8 23.3 22.4 22.3 112.7 12.82
15 10.264 34.9 29.7 23.2 22.2 22.3 113.5 13.36
16 10.275 35.7 32.5 25.7 24.4 23.8 120.1 14.61
17 10.275 37.9 31.8 25.0 23.4 23.5 122.6 14.74
18 10.293 34.6 32.9 26.1 25.3 24.4 120.8 13.74
19 10.282 36.6 32.2 25.3 24.4 24.1 121.1 14.63
20 10.300 37.2 32.1 25.6 25.0 24.2 123.4 14.74
21 10.300 34.0 33.1 26.4 25.3 25.1 119.7 13.80
22 10.300 35.3 33.3 26.0 25.1 25.0 121.5 14.07
23 10.295 35.8 33.9 26.6 25.9 25.5 121.7 14.57
24 10.295 35.9 33.8 26.5 25.2 25.3 121.4 14.88
25 10.288 34.8 32.9 26.2 25.2 25.0 118.4 13.99
26 10.290 35.9 33.4 26.3 25.5 25.4 121.1 14.12
27 10.290 35.5 32.7 25.4 24.0 23.8 119.9 13.81
28 10.301 35.8 35.7 28.0 26.7 27.2 122.8 14.69
29 10.302 37.0 34.9 27.7 26.6 26.6 125.5 15.31
30 10.300 35.8 33.9 26.8 26.0 25.8 122.6 14.37
31 10.305 34.2 34.5 27.2 26.1 25.8 123.9 14.63
32 10.302 34.8 33.7 26.1 25.2 25.2 124.2 14.59
33 10.300 36.5 33.9 26.6 25.5 25.4 124.1 14.89
34 10.300 36.6 33.2 25.9 24.6 24.9 123.4 14.60
35 10.300 34.8 34.2 26.8 25.5 25.4 121.4 14.66
36 10.300 37.3 34.9 27.4 26.7 26.7 125.1 14.75
37 10.300 34.6 34.6 27.0 26.2 25.8 122.6 14.33
38 10.310 35.7 34.8 27.4 26.1 26.3 123.4 14.58
39 10.300 35.7 33.4 26.1 24.9 24.9 120.8 14.67
40 10.302 37.9 33.6 26.1 25.3 24.9 126.1 15.20
41 10.300 36.8 35.4 26.0 27.1 27.0 125.1 14.94
42 10.300 39.3 34.8 27.4 26.1 26.2 127.8 15.43
43 10.310 35.6 34.0 26.8 26.4 25.7 125.5 14.40
Number Y1 Y2 Y3 Y4 Y5 Y6 Y7 Y8
44 10.300 37.8 34.1 27.2 25.9 27.0 126.9 15.97
45 10.310 35.6 33.3 26.4 25.1 24.9 122.6 14.77
46 10.310 35.7 34.1 26.8 25.8 25.7 121.7 14.59
47 10.305 33.3 33.2 26.6 25.1 25.3 126.7 14.17
48 10.300 34.4 33.3 26.5 25.2 25.4 123.0 14.79
49 10.300 33.3 32.2 25.6 24.6 24.3 121.5 14.70
50 10.305 38.4 32.5 25.8 24.9 24.9 126.2 15.20
51 10.305 34.0 31.9 25.5 24.4 24.3 122.6 14.11
52 10.300 33.1 31.1 24.8 23.7 23.9 119.0 14.06
53 10.305 34.8 32.3 25.7 25.2 24.7 122.6 14.09
54 10.305 35.5 32.7 26.0 25.1 24.9 122.7 14.41
55 10.310 35.6 33.3 26.5 25.6 25.6 123.3 14.16
56 10.300 36.1 32.5 25.7 24.9 24.8 123.1 14.34
57 10.295 36.2 32.6 25.5 24.7 24.9 121.4 14.04
58 10.300 36.0 33.5 26.7 25.6 25.7 121.9 14.02
59 10.290 35.2 31.8 25.1 24.1 24.1 122.2 14.00
60 10.300 35.4 32.1 25.5 24.3 24.2 122.1 13.78
61 10.300 36.6 32.1 25.3 24.5 24.4 123.4 14.14
62 10.300 37.5 32.6 25.6 24.8 24.9 125.4 14.86
63 10.300 36.9 33.8 27.0 25.6 25.6 124.9 14.29
64 10.300 37.3 31.5 24.9 23.7 23.8 123.4 14.46
65 10.300 35.2 31.7 25.0 24.0 23.9 121.5 13.82
66 10.300 36.4 32.3 25.8 24.6 24.4 123.5 14.54
67 10.300 34.5 31.4 24.8 24.1 24.0 123.9 14.24
68 10.300 35.8 32.1 25.9 24.7 24.4 122.6 14.14
69 10.295 36.0 31.2 24.6 33.8 33.7 122.9 14.05
70 10.300 35.8 31.3 24.9 23.7 23.9 121.7 14.04
71 10.305 36.5 33.1 26.6 25.4 25.4 125.1 14.20
72 10.305 35.5 31.2 24.7 23.3 23.0 122.8 14.15
73 10.300 34.5 30.2 23.8 22.8 22.7 112.3 14.00
74 10.305 35.9 32.0 25.3 23.3 24.3 124.2 14.35
75 10.301 34.3 32.2 25.4 24.6 24.4 122.4 13.98
76 10.295 34.6 30.8 23.9 22.7 22.7 115.7 13.93
77 10.310 36.3 32.6 25.8 24.8 24.7 124.7 14.26
78 10.305 35.2 32.6 25.5 24.8 24.6 123.3 14.07
79 10.300 34.9 32.7 25.8 24.7 24.9 119.0 14.42
80 10.300 35.9 33.7 26.6 25.6 25.7 123.4 14.20
81 10.301 37.6 33.2 26.3 24.9 24.8 125.6 14.94
82 10.310 36.1 33.6 26.2 25.6 25.6 125.3 14.43
83 10.300 34.1 33.2 26.0 25.4 25.2 124.2 14.37
84 10.305 39.0 32.3 25.4 24.4 24.7 123.5 14.40
85 10.300 34.4 33.1 26.5 25.6 25.5 123.5 14.18
TABLE A.9. Indices of the quality of life in the provinces of Italy. Data from Il
Sole - 24 Ore 2001
Number Y1 Y2 Y3 Y4 Y5 Y6
1 9962.65 105.60 507.78 13.50 10.75 9.45
2 8468.69 19.93 421.21 18.27 13.24 1.84
3 9770.61 47.25 540.63 9.57 11.89 3.78
4 9706.74 30.60 504.93 16.10 9.13 3.38
5 8275.02 59.37 733.30 28.50 12.82 3.11
6 9121.25 35.83 480.45 11.63 12.80 4.59
7 8288.02 24.31 479.30 23.78 14.80 4.14
8 7174.69 18.67 451.22 14.94 14.94 4.06
9 10525.64 19.90 317.61 19.07 7.46 3.93
10 6403.96 23.57 589.65 10.17 16.64 10.43
11 7421.80 35.39 997.12 17.16 16.80 4.65
12 9572.63 44.94 495.60 25.90 15.83 13.19
13 7172.86 38.82 363.79 14.89 13.09 3.23
14 9277.42 31.08 484.54 17.91 12.55 4.36
15 8761.72 38.33 434.94 10.50 10.87 3.37
16 9154.08 3.38 166.69 21.96 6.76 2.36
17 20890.66 102.47 324.47 12.85 12.77 17.24
18 9406.42 31.92 299.26 9.75 6.36 4.48
19 10046.40 57.16 551.31 10.25 8.45 4.47
20 9626.65 30.45 412.06 17.03 9.62 4.23
21 9141.12 28.89 250.82 20.26 10.43 4.15
22 9433.11 30.04 319.42 13.03 13.82 3.84
23 8816.74 28.88 520.42 12.83 10.59 1.91
24 9846.11 22.81 231.13 15.21 8.11 1.81
25 12051.57 16.55 148.95 29.23 8.38 9.16
26 11100.26 14.02 243.59 11.93 6.49 4.74
27 9786.08 41.11 440.26 16.40 10.73 37.48
28 8962.15 31.33 419.45 17.24 11.95 4.40
29 7350.34 6.16 229.80 30.80 8.05 1.41
30 8558.84 37.68 489.69 10.08 10.84 4.22
31 8159.78 42.56 536.16 19.63 14.11 7.46
32 9922.04 35.74 353.19 10.55 13.01 3.79
33 7507.03 23.43 342.39 13.97 9.04 2.37
34 10359.46 19.60 283.79 16.72 10.18 6.55
35 8941.28 20.89 159.18 20.89 10.08 5.03
Number Y1 Y2 Y3 Y4 Y5 Y6
36 13643.31 43.41 165.54 45.85 11.36 28.33
37 8759.12 13.44 387.85 24.75 7.42 3.69
38 10012.27 34.83 401.52 20.97 14.98 6.87
39 11417.11 31.00 344.01 23.00 12.75 8.81
40 10264.58 33.11 308.11 19.30 17.11 5.56
41 10179.11 32.25 312.82 17.70 12.01 7.81
42 12600.70 90.79 510.68 19.20 13.34 16.44
43 7636.55 18.41 413.69 15.54 12.95 5.86
44 9408.58 67.57 578.04 24.98 9.94 8.88
45 10598.16 39.25 432.91 27.20 15.42 5.52
46 8788.12 89.56 618.20 13.47 21.48 5.89
47 9697.86 16.70 353.76 19.00 11.51 9.33
48 8740.72 28.00 313.56 10.53 18.14 7.87
49 8317.71 25.95 323.59 15.11 14.45 7.80
50 7661.13 22.38 335.67 13.75 15.91 4.77
51 7164.87 24.08 495.05 12.04 10.53 3.67
52 8838.89 38.07 454.14 30.08 12.51 10.76
53 8680.71 24.76 434.51 7.76 13.30 6.88
54 11232.08 51.54 508.41 11.50 11.81 16.34
55 6727.87 22.75 399.06 9.88 15.87 4.78
56 8471.09 28.89 448.82 23.21 15.73 4.76
57 10523.71 27.19 194.65 14.21 9.58 2.61
58 11336.44 33.45 332.18 24.80 13.38 4.99
59 7618.48 13.45 327.47 23.37 10.20 3.40
60 11022.14 59.04 357.82 23.87 12.15 4.33
61 7977.96 32.40 318.29 22.68 11.99 8.76
62 6801.26 30.48 241.58 13.45 9.41 3.81
63 5748.72 25.19 381.90 8.85 9.19 2.59
64 5411.53 52.90 402.67 10.58 5.95 1.42
65 15558.38 102.04 472.97 10.26 11.72 14.31
66 5085.80 34.67 463.92 4.28 7.99 2.32
67 4503.98 29.74 344.31 5.26 9.71 0.65
68 4056.48 122.19 350.35 5.84 5.49 1.91
69 4142.85 17.42 353.11 9.90 4.10 2.68
Number Y1 Y2 Y3 Y4 Y5 Y6
70 6049.62 251.53 161.30 2.68 8.29 5.85
71 4825.67 16.81 221.94 10.68 5.91 0.37
72 4960.91 54.37 317.17 8.33 6.77 2.38
73 6139.46 23.72 221.74 10.54 13.18 2.50
74 7431.26 27.05 352.27 13.35 18.49 1.72
75 7423.60 31.85 357.80 10.50 22.36 5.54
76 5841.90 13.57 226.36 10.75 11.52 1.75
77 4550.97 8.48 167.50 8.48 7.63 0.94
78 4119.42 15.33 172.96 14.23 7.66 0.77
79 5311.58 56.76 181.54 6.93 5.05 1.27
80 6527.75 45.30 272.95 6.52 7.15 3.38
81 4693.84 26.41 257.59 12.95 5.28 2.25
82 4835.48 50.60 218.95 7.30 4.62 1.53
83 4200.45 36.78 281.60 12.01 5.76 2.15
84 4610.62 10.03 130.86 16.80 3.76 1.27
85 5893.22 12.14 169.02 14.08 5.34 0.91
86 4122.81 24.77 273.15 5.38 5.79 2.04
87 4443.40 14.41 197.78 7.07 7.86 1.21
88 4093.96 47.36 129.28 5.26 7.54 1.72
89 4123.18 17.90 179.00 2.31 1.73 0.94
90 3418.24 20.51 174.37 3.42 3.42 0.76
91 4727.70 25.41 454.35 5.08 6.70 3.22
92 5506.50 232.05 323.24 22.69 7.70 5.22
93 4279.83 42.58 289.28 9.35 8.75 3.08
94 4593.08 16.07 175.31 14.57 2.36 1.58
95 5202.02 22.30 174.43 18.41 5.66 0.91
96 3805.09 13.87 170.89 18.86 5.55 0.66
97 4465.26 118.97 248.28 7.71 9.07 5.97
98 4761.98 31.37 342.73 10.24 9.91 1.60
99 4408.57 40.32 383.02 14.93 7.72 2.13
100 5762.56 25.70 440.60 7.41 8.49 2.36
101 5509.60 34.70 223.51 9.70 4.85 1.45
102 6669.89 32.84 283.81 15.70 8.77 5.13
103 5310.75 19.15 257.91 17.87 4.47 1.17
TABLE A.10. Iris data from Anderson (1935). Iris setosa: Y1 sepal length, Y2
sepal width, Y3 petal length and Y4 petal width, all in centimetres
Number Y1 Y2 Y3 Y4 Number Y1 Y2 Y3 Y4
1 5.1 3.5 1.4 0.2 26 5.0 3.0 1.6 0.2
2 4.9 3.0 1.4 0.2 27 5.0 3.4 1.6 0.4
3 4.7 3.2 1.3 0.2 28 5.2 3.5 1.5 0.2
4 4.6 3.1 1.5 0.2 29 5.2 3.4 1.4 0.2
5 5.0 3.6 1.4 0.2 30 4.7 3.2 1.6 0.2
6 5.4 3.9 1.7 0.4 31 4.8 3.1 1.6 0.2
7 4.6 3.4 1.4 0.3 32 5.4 3.4 1.5 0.4
8 5.0 3.4 1.5 0.2 33 5.2 4.1 1.5 0.1
9 4.4 2.9 1.4 0.2 34 5.5 4.2 1.4 0.2
10 4.9 3.1 1.5 0.1 35 4.9 3.1 1.5 0.2
11 5.4 3.7 1.5 0.2 36 5.0 3.2 1.2 0.2
12 4.8 3.4 1.6 0.2 37 5.5 3.5 1.3 0.2
13 4.8 3.0 1.4 0.1 38 4.9 3.6 1.4 0.1
14 4.3 3.0 1.1 0.1 39 4.4 3.0 1.3 0.2
15 5.8 4.0 1.2 0.2 40 5.1 3.4 1.5 0.2
16 5.7 4.4 1.5 0.4 41 5.0 3.5 1.3 0.3
17 5.4 3.9 1.3 0.4 42 4.5 2.3 1.3 0.3
18 5.1 3.5 1.4 0.3 43 4.4 3.2 1.3 0.2
19 5.7 3.8 1.7 0.3 44 5.0 3.5 1.6 0.6
20 5.1 3.8 1.5 0.3 45 5.1 3.8 1.9 0.4
21 5.4 3.4 1.7 0.2 46 4.8 3.0 1.4 0.3
22 5.1 3.7 1.5 0.4 47 5.1 3.8 1.6 0.2
23 4.6 3.6 1.0 0.2 48 4.6 3.2 1.4 0.2
24 5.1 3.3 1.7 0.5 49 5.3 3.7 1.5 0.2
25 4.8 3.4 1.9 0.2 50 5.0 3.3 1.4 0.2
TABLE A.10. Iris data (continued). Iris versicolor: Y1 sepal length, Y2 sepal
width, Y3 petal length and Y4 petal width, all in centimetres
Number Y1 Y2 Y3 Y4 Number Y1 Y2 Y3 Y4
51 7.0 3.2 4.7 1.4 76 6.6 3.0 4.4 1.4
52 6.4 3.2 4.5 1.5 77 6.8 2.8 4.8 1.4
53 6.9 3.1 4.9 1.5 78 6.7 3.0 5.0 1.7
54 5.5 2.3 4.0 1.3 79 6.0 2.9 4.5 1.5
55 6.5 2.8 4.6 1.5 80 5.7 2.6 3.5 1.0
56 5.7 2.8 4.5 1.3 81 5.5 2.4 3.8 1.1
57 6.3 3.3 4.7 1.6 82 5.5 2.4 3.7 1.0
58 4.9 2.4 3.3 1.0 83 5.8 2.7 3.9 1.2
59 6.6 2.9 4.6 1.3 84 6.0 2.7 5.1 1.6
60 5.2 2.7 3.9 1.4 85 5.4 3.0 4.5 1.5
61 5.0 2.0 3.5 1.0 86 6.0 3.4 4.5 1.6
62 5.9 3.0 4.2 1.5 87 6.7 3.1 4.7 1.5
63 6.0 2.2 4.0 1.0 88 6.3 2.3 4.4 1.3
64 6.1 2.9 4.7 1.4 89 5.6 3.0 4.1 1.3
65 5.6 2.9 3.6 1.3 90 5.5 2.5 4.0 1.3
66 6.7 3.1 4.4 1.4 91 5.5 2.6 4.4 1.2
67 5.6 3.0 4.5 1.5 92 6.1 3.0 4.6 1.4
68 5.8 2.7 4.1 1.0 93 5.8 2.6 4.0 1.2
69 6.2 2.2 4.5 1.5 94 5.0 2.3 3.3 1.0
70 5.6 2.5 3.9 1.1 95 5.6 2.7 4.2 1.3
71 5.9 3.2 4.8 1.8 96 5.7 3.0 4.2 1.2
72 6.1 2.8 4.0 1.3 97 5.7 2.9 4.2 1.3
73 6.3 2.5 4.9 1.5 98 6.2 2.9 4.3 1.3
74 6.1 2.8 4.7 1.2 99 5.1 2.5 3.0 1.1
75 6.4 2.9 4.3 1.3 100 5.7 2.8 4.1 1.3
TABLE A.10. Iris data (concluded). Iris virginica: Y1 sepal length, Y2 sepal width,
Y3 petal length and Y4 petal width, all in centimetres
Number Y1 Y2 Y3 Y4 Number Y1 Y2 Y3 Y4
101 6.3 3.3 6.0 2.5 126 7.2 3.2 6.0 1.8
102 5.8 2.7 5.1 1.9 127 6.2 2.8 4.8 1.8
103 7.1 3.0 5.9 2.1 128 6.1 3.0 4.9 1.8
104 6.3 2.9 5.6 1.8 129 6.4 2.8 5.6 2.1
105 6.5 3.0 5.8 2.2 130 7.2 3.0 5.8 1.6
106 7.6 3.0 6.6 2.1 131 7.4 2.8 6.1 1.9
107 4.9 2.5 4.5 1.7 132 7.9 3.8 6.4 2.0
108 7.3 2.9 6.3 1.8 133 6.4 2.8 5.6 2.2
109 6.7 2.5 5.8 1.8 134 6.3 2.8 5.1 1.5
110 7.2 3.6 6.1 2.5 135 6.1 2.6 5.6 1.4
111 6.5 3.2 5.1 2.0 136 7.7 3.0 6.1 2.3
112 6.4 2.7 5.3 1.9 137 6.3 3.4 5.6 2.4
113 6.8 3.0 5.5 2.1 138 6.4 3.1 5.5 1.8
114 5.7 2.5 5.0 2.0 139 6.0 3.0 4.8 1.8
115 5.8 2.8 5.1 2.4 140 6.9 3.1 5.4 2.1
116 6.4 3.2 5.3 2.3 141 6.7 3.1 5.6 2.4
117 6.5 3.0 5.5 1.8 142 6.9 3.1 5.1 2.3
118 7.7 3.8 6.7 2.2 143 5.8 2.7 5.1 1.9
119 7.7 2.6 6.9 2.3 144 6.8 3.2 5.9 2.3
120 6.0 2.2 5.0 1.5 145 6.7 3.3 5.7 2.5
121 6.9 3.2 5.7 2.3 146 6.7 3.0 5.2 2.3
122 5.6 2.8 4.9 2.0 147 6.3 2.5 5.0 1.9
123 7.7 2.8 6.7 2.0 148 6.5 3.0 5.2 2.0
124 6.3 2.7 4.9 1.8 149 6.2 3.4 5.4 2.3
125 6.7 3.3 5.7 2.1 150 5.9 3.0 5.1 1.8
TABLE A.11. Electrodes data from Flury and Riedwyl (1988). Machine one: Y1,
Y2 and Y5, scaled diameters; Y3 and Y4, scaled lengths
Number Y1 Y2 Y3 Y4 Y5 Number Y1 Y2 Y3 Y4 Y5
1 40 58 31 44 64 26 43 60 35 38 62
2 39 59 33 40 60 27 40 59 29 41 60
3 40 58 35 46 59 28 40 59 37 41 59
4 39 59 31 47 58 29 40 60 37 46 60
5 40 60 36 41 56 30 40 58 42 45 61
6 45 60 45 45 58 31 42 63 48 47 64
7 42 64 39 38 63 32 41 59 37 49 60
8 44 59 41 40 60 33 39 58 31 47 60
9 42 66 48 20 61 34 42 60 43 49 61
10 40 60 35 40 58 35 42 59 37 53 62
11 40 61 40 41 58 36 40 58 35 40 59
12 40 58 38 45 60 37 40 59 35 48 58
13 38 59 39 46 58 38 39 60 35 46 59
14 42 59 32 36 61 39 38 59 30 47 57
15 40 61 45 45 59 40 40 60 38 48 62
16 40 59 45 52 59 41 44 60 36 44 60
17 42 58 38 51 59 42 40 58 34 41 58
18 40 59 37 44 60 43 38 60 31 49 60
19 39 60 35 49 59 44 38 58 29 46 60
20 39 60 37 46 56 45 39 59 35 43 56
21 40 58 35 39 58 46 40 60 37 45 59
22 39 59 34 41 60 47 40 60 37 44 61
23 39 60 37 39 59 48 42 62 37 35 60
24 40 59 42 43 57 49 40 59 35 44 58
25 40 59 37 46 60 50 42 58 35 43 61
TABLE A.11. Electrodes data (concluded). Machine two: Y1, Y2 and Y5, scaled
diameters; Y3 and Y4, scaled lengths
Number Y1 Y2 Y3 Y4 Y5 Number Y1 Y2 Y3 Y4 Y5
51 44 58 32 25 57 76 44 57 33 11 59
52 43 58 25 19 60 77 44 60 25 10 59
53 44 57 30 24 59 78 44 58 22 16 59
54 42 59 36 20 59 79 44 60 36 18 57
55 42 60 38 29 59 80 46 61 39 14 59
56 43 56 38 32 58 81 42 58 36 27 57
57 43 57 26 18 59 82 43 60 20 19 60
58 45 60 27 27 59 83 42 59 27 23 59
59 45 59 33 18 60 84 43 58 28 12 58
60 43 58 29 26 59 85 42 57 41 24 58
61 43 59 39 22 58 86 44 60 28 20 60
62 43 59 35 29 59 87 43 58 45 25 59
63 44 57 37 19 58 88 43 59 35 21 59
64 43 58 29 20 58 89 43 60 29 20 60
65 43 58 27 8 58 90 44 59 22 11 59
66 44 60 39 15 60 91 44 58 46 25 58
67 43 58 35 13 58 92 43 60 28 9 60
68 44 58 38 19 58 93 43 59 38 29 59
69 43 58 36 19 58 94 43 58 47 24 57
70 43 58 29 19 60 95 42 58 24 19 59
71 43 58 29 21 58 96 43 60 35 22 58
72 42 59 43 26 58 97 45 60 28 18 60
73 43 58 26 20 58 98 43 57 38 23 60
74 44 59 22 17 59 99 44 60 31 22 58
75 43 59 36 25 59 100 43 58 22 20 57
TABLE A.12. Muscular dystrophy data. Non-carriers: Y1 age, Y2 month, Y3-Y6
levels of four serum markers. Units 1-39 are included in the "small" data set
Number Y1 Y2 Y3 Y4 Y5 Y6
1 27 6 22 84.0 2.8 145
2 26 7 30 76.0 17.1 145
3 26 3 35 76.7 10.9 105
4 26 7 34 78.0 8.0 140
5 31 10 27 90.0 15.6 167
6 31 3 22 71.5 11.8 98
7 31 6 22 73.5 5.1 184
8 25 9 72 80.5 12.0 225
9 35 7 51 70.0 16.6 146
10 36 2 30 66.7 15.3 124
11 36 7 23 66.3 4.4 142
12 33 4 67 98.0 9.3 225
13 27 9 50 69.0 15.1 160
14 27 3 92 68.0 16.5 115
15 36 10 55 78.2 21.8 188
16 25 6 38 82.0 15.8 161
17 33 3 27 100.0 10.3 169
18 22 8 34 84.0 12.0 175
19 23 2 44 81.3 10.5 159
20 25 7 32 86.5 6.7 149
21 22 10 35 59.4 11.3 130
22 31 5 35 90.3 15.3 124
23 33 8 31 75.5 13.7 160
24 33 11 25 78.9 12.2 127
25 27 6 52 77.0 17.9 198
26 30 10 34 75.0 15.4 171
27 20 10 53 93.2 22.3 349
28 30 2 69 66.7 8.7 119
29 30 7 25 70.5 5.3 123
30 27 5 24 89.5 16.1 176
31 35 8 21 108.5 9.8 148
32 26 6 51 82.0 12.9 149
33 27 6 37 77.3 3.9 141
34 24 7 24 82.0 14.2 123
35 25 2 30 77.0 16.2 124
36 26 7 34 81.3 9.7 158
37 20 9 22 102.0 10.3 177
38 22 10 32 79.2 5.8 190
39 34 10 24 70.4 10.6 181
40 22 7 20 72.0 11.9 110
41 22 6 34 91.0 14.5 144
42 24 9 25 92.0 14.0 166
43 38 12 26 109.0 8.9 163
Number Y1 Y2 Y3 Y4 Y5 Y6
44 39 1 28.0 102.3 17.1 146
45 39 3 21.0 92.4 10.3 197
46 39 3 23.0 111.5 10.0 133
47 39 4 26.0 92.6 12.3 196
48 39 4 25.0 98.7 10.0 174
49 39 6 21.0 93.2 5.9 181
50 32 2 56.0 72.0 9.9 227
51 34 10 48.0 83.0 13.7 228
52 22 6 51.0 91.0 12.7 149
53 39 3 18.0 95.0 11.3 66
54 33 7 28.0 104.0 6.9 169
55 33 11 41.0 105.5 15.1 252
56 20 6 40.0 81.0 6.1 167
57 22 7 21.0 74.5 12.2 163
58 25 2 95.0 69.8 7.3 169
59 25 6 59.0 72.5 10.7 314
60 36 2 40.0 72.7 7.0 131
61 36 8 30.0 79.5 11.9 130
62 22 6 48.0 76.0 16.6 133
63 39 6 39.0 88.5 7.6 168
64 30 3 30.0 82.7 18.1 124
65 37 7 38.0 85.0 21.6 198
66 27 3 27.0 87.2 12.5 99
67 28 7 32.0 76.3 5.6 159
68 29 2 74.0 80.4 8.9 207
69 30 6 33.0 86.0 3.8 149
70 31 7 34.0 80.5 11.1 149
71 31 2 45.0 86.5 10.8 169
72 32 6 52.0 79.0 10.7 187
73 32 7 28.0 82.5 17.4 144
74 32 12 35.0 97.0 14.5 137
75 25 6 37.0 93.0 15.3 167
76 26 2 44.0 81.3 15.3 166
77 26 6 68.0 82.8 11.9 177
78 37 10 97.5 34.0 12.0 203
79 25 7 37.0 98.0 16.4 198
80 25 7 34.0 92.0 12.1 217
81 20 6 30.0 80.0 12.9 129
82 25 6 37.0 98.0 11.7 177
83 34 9 24.0 100.5 14.0 231
84 32 4 41.0 78.5 10.9 191
85 32 6 43.0 87.5 6.0 136
Number Y1 Y2 Y3 Y4 Y5 Y6
86 32 10 30 90.5 15.3 136
87 33 7 30 85.0 11.4 176
88 33 10 43 88.5 20.3 175
89 22 6 52 83.5 10.9 176
90 32 8 20 77.0 11.0 200
91 36 7 28 86.5 13.2 171
92 22 11 30 104.0 22.6 230
93 23 1 40 83.0 15.2 205
94 30 5 24 78.8 9.6 151
95 27 8 15 87.0 13.5 232
96 30 11 22 91.0 17.5 198
97 25 10 42 65.5 13.3 216
98 26 2 130 80.3 17.1 211
99 26 3 48 85.2 22.7 160
100 27 7 31 86.5 6.9 162
101 26 10 47 53.0 14.6 131
102 27 3 36 56.0 18.2 105
103 27 7 24 57.5 5.6 130
104 31 4 34 92.7 7.9 140
105 31 9 38 96.0 12.6 158
106 35 10 40 104.6 16.1 209
107 28 4 59 88.0 9.9 128
108 28 8 75 81.0 10.1 177
109 28 9 72 66.3 16.4 156
110 27 7 42 77.0 15.3 163
111 27 3 30 80.2 8.1 100
112 28 6 24 87.0 3.5 132
113 24 9 26 84.5 20.7 145
114 23 8 65 75.0 19.9 187
115 27 3 34 86.3 11.8 120
116 25 2 37 73.3 13.0 254
117 34 3 73 57.4 7.4 107
118 34 7 87 76.3 6.0 87
119 25 7 35 71.0 8.8 186
120 20 7 31 61.5 9.9 172
121 20 5 62 81.0 10.2 181
122 31 6 48 79.0 16.8 182
123 31 7 40 82.5 6.4 151
124 26 7 55 85.5 10.9 216
125 26 7 32 73.8 8.6 147
126 21 11 26 79.3 16.4 123
127 27 6 25 91.0 10.3 135
TABLE A.12. Muscular dystrophy data. Carriers: the first column of unit
numbers gives those for the "small" data set
Number Number Y1 Y2 Y3 Y4 Y5 Y6
40 128 30 10 167 89.0 25.6 364
41 129 41 10 104 81.0 26.8 245
42 130 22 8 30 108.0 8.8 284
43 131 22 8 44 104.0 17.4 172
44 132 20 10 65 87.0 23.8 198
45 133 42 9 440 107.0 20.2 239
46 134 59 8 58 88.2 11.0 259
47 135 35 9 129 93.1 18.3 188
48 136 36 6 104 87.5 16.7 256
49 137 35 2 122 88.5 21.6 263
50 138 29 4 265 83.5 16.1 136
51 139 27 4 285 79.5 36.4 245
52 140 27 9 25 91.0 49.1 209
53 141 28 4 124 92.0 32.2 298
54 142 29 8 53 76.0 14.0 174
55 143 30 2 46 71.0 16.9 197
56 144 30 7 40 85.5 12.7 201
57 145 30 8 41 90.0 9.7 342
58 146 31 6 657 104.0 110.0 358
59 147 32 2 465 86.5 63.7 412
60 148 32 5 485 83.5 73.0 382
61 149 37 2 168 82.5 23.3 261
62 150 38 6 286 109.5 31.9 260
63 151 39 1 388 91.0 41.6 204
64 152 39 9 148 105.2 18.8 221
65 153 34 6 73 105.5 17.0 285
66 154 35 4 36 92.8 22.0 308
67 155 58 8 19 100.5 10.9 196
68 156 58 2 34 98.5 19.9 299
69 157 38 1 113 97.0 18.8 216
70 158 30 8 57 105.0 12.9 155
71 159 42 8 78 118.0 15.5 212
72 160 43 11 73 104.0 20.6 201
73 161 29 3 69 111.0 16.0 175
Number Y1 Y2 Y3 Y4 Y5 Y6
162 30 10 177 103.5 19.8 241
163 35 6 48 98.0 16.4 233
164 35 7 34 96.5 10.4 122
165 35 9 42 100.1 17.1 184
166 44 9 109 81.0 25.3 227
167 35 9 925 81.0 62.9 279
168 35 4 1288 82.0 51.6 368
169 36 9 325 76.3 33.9 413
170 53 6 59 93.0 22.2 240
171 54 4 69 92.6 20.9 243
172 30 4 363 91.3 36.0 325
173 35 11 37 84.0 12.8 156
174 53 6 101 77.5 11.7 280
175 41 3 99 93.2 18.6 156
176 40 9 125 90.5 19.4 438
177 42 8 52 93.3 11.2 272
178 59 6 560 106.0 21.0 345
179 31 8 85 94.0 20.1 198
180 32 6 72 88.0 8.3 166
181 52 6 197 91.5 25.2 236
182 52 3 242 85.5 16.6 168
183 53 8 245 89.5 22.7 269
184 39 10 154 103.5 21.3 296
185 39 6 228 104.0 10.2 236
186 43 8 80 90.5 12.1 269
187 44 6 28 104.0 22.0 142
188 45 6 35 86.3 14.4 184
189 33 5 57 88.0 8.9 190
190 26 11 326 98.0 27.1 358
191 26 6 700 90.0 49.1 343
192 61 9 100 101.0 11.8 301
193 61 2 80 97.5 15.1 262
194 48 6 115 79.0 14.2 258
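The "small" muscular dystrophy data set combines non-carrier units 1-39 with the carriers that carry a small-set number (40-73, full-set numbers 128-161). A minimal sketch of that selection; the tuple layouts and the helper name are assumptions for illustration, not from the book:

```python
# Sketch: assembling the "small" muscular dystrophy data set from the
# two tables above. Non-carriers contribute units 1-39; carriers with a
# small-set number (40-73, full-set numbers 128-161) contribute the rest.
# The tuple layouts below are assumptions for illustration.

def small_data_set(noncarriers, carriers):
    """noncarriers: (number, y1, ..., y6) tuples, numbered from 1.
    carriers: (small_number_or_None, full_number, y1, ..., y6) tuples.
    Returns the rows of the "small" data set, non-carriers first."""
    rows = [r for r in noncarriers if r[0] <= 39]
    rows += [r[1:] for r in carriers if r[0] is not None]
    return rows
```

Carrier rows keep their full-set number (128-161) after the small-set column is dropped, matching the numbering used in the full data set.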
Number Y1 Y2 Number Y1 Y2
1 -5.012 -0.739 36 5.575 -1.425
2 -5.535 -0.876 37 3.338 0.705
3 -1.016 -0.314 38 -0.929 2.217
4 -9.488 0.449 39 0.400 0.019
5 3.434 1.369 40 -2.274 -0.602
6 0.455 1.958 41 -6.458 -1.084
7 -1.416 0.679 42 -5.623 -0.922
8 -5.544 -1.809 43 4.600 -2.675
9 -7.615 -2.717 44 -9.047 -1.510
10 -7.078 -2.640 45 6.944 5.484
11 2.402 5.193 46 -2.049 -0.523
12 -0.564 -3.159 47 -3.544 0.466
13 7.103 2.199 48 -3.965 -4.043
14 5.275 0.482 49 -2.074 -2.499
15 -5.661 1.042 50 -2.083 -0.930
16 5.325 3.274 51 -6.175 -2.212
17 0.879 -2.706 52 3.176 1.211
18 4.345 1.376 53 2.396 2.057
19 -1.234 1.464 54 -5.982 -1.003
20 0.025 -0.350 55 -5.354 -2.102
21 -0.575 0.063 56 -4.732 -1.773
22 8.103 2.426 57 11.639 9.161
23 3.720 -0.530 58 -0.957 0.746
24 0.989 -2.850 59 -2.667 2.636
25 1.768 0.066 60 -1.817 0.892
26 4.399 1.344 61 -2.956 -1.730
27 -7.954 -5.681 62 1.063 -0.646
28 2.065 5.042 63 1.939 0.912
29 -4.554 -3.484 64 0.027 -2.777
30 -3.260 -3.660 65 -0.024 -1.929
31 -5.093 -1.782 66 0.300 0.155
32 -4.163 -0.377 67 1.093 -2.275
33 7.246 0.552 68 -0.427 -0.202
34 0.964 1.931 69 -4.791 0.375
35 -0.395 -1.249 70 -4.956 -2.937
Number Y1 Y2 Number Y1 Y2
71 5.926 1.355 106 6.759 -5.894
72 5.228 -1.806 107 5.845 -6.799
73 3.895 -0.023 108 6.704 -6.189
74 -0.592 0.097 109 5.902 -5.774
75 -1.349 0.472 110 6.618 -6.566
76 -5.471 -1.188 111 5.301 -6.897
77 -6.910 -2.812 112 6.169 -6.784
78 -1.660 -1.560 113 5.861 -7.152
79 4.715 2.412 114 5.463 -6.305
80 2.151 1.496 115 6.787 -6.698
81 6.138 -7.120 116 6.675 -6.660
82 6.002 -6.742 117 7.132 -6.113
83 6.227 -6.788 118 5.281 -5.725
84 5.895 -6.612 119 5.956 -7.255
85 6.237 -6.754 120 6.686 -6.598
86 6.178 -6.021 121 6.604 -6.482
87 6.761 -6.821 122 6.163 -6.335
88 6.748 -6.501 123 5.814 -7.142
89 6.800 -6.308 124 6.931 -6.749
90 6.295 -7.015 125 6.490 -6.674
91 7.157 -6.809 126 6.143 -6.455
92 6.259 -7.497 127 5.717 -6.803
93 6.215 -6.484 128 6.243 -7.318
94 6.547 -7.371 129 6.870 -5.965
95 5.806 -6.889 130 6.050 -6.719
96 6.244 -5.959 131 6.602 -5.782
97 6.195 -6.246 132 6.186 -6.208
98 6.968 -6.612 133 5.626 -6.905
99 6.055 -6.954 134 5.688 -6.903
100 6.566 -5.645 135 6.479 -5.913
101 6.324 -6.704 136 6.376 -6.523
102 6.324 -6.606 137 6.210 -6.970
103 6.144 -6.535 138 5.698 -6.570
104 5.746 -7.259 139 6.318 -6.417
105 5.885 -6.875 140 6.703 -6.993
Number Y1 Y2 Number Y1 Y2
1 1.682 0.932 41 5.906 1.715
2 -3.716 0.026 42 0.762 -0.696
3 -6.653 -0.751 43 11.267 4.463
4 -9.166 -2.120 44 0.869 -4.606
5 1.153 2.829 45 -0.828 -1.242
6 5.679 1.283 46 3.884 3.720
7 -2.375 -0.251 47 2.597 3.473
8 -0.219 -0.495 48 -2.351 0.045
9 1.545 -2.629 49 -3.123 0.786
10 3.298 -0.232 50 5.260 1.596
11 8.033 5.651 51 2.985 -0.543
12 -2.142 -1.198 52 -0.871 0.404
13 6.907 2.628 53 -0.717 2.866
14 3.186 1.108 54 3.256 1.875
15 -6.594 -1.747 55 2.572 -0.363
16 0.337 1.857 56 -2.127 -1.428
17 4.748 0.208 57 5.911 0.445
18 4.526 1.462 58 -3.877 -0.277
19 -0.749 -2.873 59 4.908 4.052
20 2.499 -2.227 60 2.306 1.785
21 -1.284 -1.254 61 1.317 3.234
22 2.338 0.247 62 0.725 -0.670
23 -9.096 -0.316 63 10.644 5.478
24 -0.387 2.785 64 1.131 -0.138
25 8.332 4.728 65 3.695 3.078
26 -3.184 -3.332 66 -4.991 1.467
27 3.201 0.967 67 -1.589 1.694
28 -0.245 -2.648 68 2.175 -3.422
29 -3.337 2.500 69 5.496 1.910
30 4.656 -0.901 70 -5.380 -2.006
31 4.029 1.718 71 4.104 3.952
32 -0.546 -0.558 72 -0.026 0.276
33 -0.450 1.908 73 -6.982 -1.564
34 1.543 0.386 74 -2.369 -3.255
35 -1.908 -1.891 75 -1.892 0.831
36 -4.493 1.269 76 -0.409 -1.461
37 4.298 0.725 77 2.792 0.351
38 2.223 1.987 78 5.785 1.071
39 0.569 -1.264 79 -6.592 -4.130
40 -4.569 -4.819 80 2.593 2.910
Number Y1 Y2 Number Y1 Y2
81 6.590 -6.348 121 5.980 -7.278
82 6.897 -6.334 122 6.389 -6.913
83 6.656 -5.906 123 5.557 -6.441
84 6.142 -6.002 124 5.798 -6.304
85 6.719 -6.827 125 6.572 -6.290
86 6.333 -6.499 126 5.504 -6.898
87 6.642 -6.703 127 6.365 -5.376
88 6.154 -5.958 128 5.599 -5.577
89 6.310 -6.706 129 6.391 -6.220
90 6.085 -6.233 130 6.497 -6.739
91 6.525 -6.410 131 5.282 -6.437
92 5.991 -6.212 132 5.986 -6.776
93 6.485 -6.279 133 7.151 -6.489
94 6.529 -6.663 134 6.200 -5.995
95 6.852 -6.335 135 6.748 -6.358
96 5.993 -6.373 136 5.793 -6.919
97 6.181 -6.749 137 6.387 -6.492
98 5.734 -6.360 138 6.971 -6.490
99 6.110 -6.515 139 7.050 -6.104
100 6.675 -7.288 140 6.247 -6.210
101 6.002 -6.023 141 -11.541 -9.499
102 6.753 -6.808 142 -11.123 -10.368
103 6.455 -7.162 143 -10.139 -10.070
104 6.750 -6.262 144 -11.810 -10.152
105 6.233 -6.302 145 -11.124 -11.049
106 5.853 -7.013 146 -11.999 -10.550
107 6.019 -6.462 147 -10.918 -10.602
108 6.690 -6.270 148 -11.548 -10.489
109 6.691 -6.710 149 -10.814 -10.284
110 6.004 -6.080 150 -11.033 -10.100
111 6.410 -6.634 151 -11.359 -10.029
112 6.401 -6.614 152 -10.080 -10.895
113 6.275 -6.086 153 -10.902 -10.486
114 6.924 -7.248 154 -10.778 -10.507
115 6.511 -6.648 155 -11.402 -10.590
116 4.727 -6.225 156 -11.129 -10.052
117 5.953 -5.145 157 -11.906 -10.140
118 5.526 -6.154 158 -11.212 -10.178
119 6.618 -6.723 159 -5.873 11.385
120 6.201 -6.924 160 -5.416 10.832
Number Y1 Y2 Number Y1 Y2
1 -5.012 -0.739 44 -9.047 -1.510
2 -5.535 -0.876 45 6.944 5.484
3 -1.016 -0.314 46 -2.049 -0.523
4 -9.488 0.449 47 -3.544 0.466
5 3.434 1.369 48 -3.965 -4.043
6 0.455 1.958 49 -2.074 -2.499
7 -1.416 0.679 50 -2.083 -0.930
8 -5.544 -1.809 51 -6.175 -2.212
9 -7.615 -2.717 52 3.176 1.211
10 -7.078 -2.640 53 2.396 2.057
11 2.402 5.193 54 -5.982 -1.003
12 -0.564 -3.159 55 -5.354 -2.102
13 7.103 2.199 56 -4.732 -1.773
14 5.275 0.482 57 11.640 9.161
15 -5.661 1.042 58 -0.957 0.746
16 5.325 3.274 59 -2.667 2.636
17 0.879 -2.706 60 -1.817 0.892
18 4.345 1.376 61 -2.956 -1.730
19 -1.234 1.464 62 1.063 -0.646
20 0.025 -0.351 63 1.939 0.912
21 -0.576 0.063 64 0.027 -2.777
22 8.103 2.426 65 -0.024 -1.929
23 3.720 -0.530 66 0.300 0.155
24 0.988 -2.850 67 1.093 -2.275
25 1.768 0.066 68 -0.427 -0.202
26 4.399 1.344 69 -4.791 0.375
27 -7.954 -5.681 70 -4.956 -2.937
28 2.065 5.042 71 5.926 1.355
29 -4.554 -3.484 72 5.228 -1.806
30 -3.260 -3.660 73 3.895 -0.023
31 -5.093 -1.782 74 -0.592 0.097
32 -4.163 -0.377 75 -1.349 0.472
33 7.246 0.552 76 -5.471 -1.188
34 0.964 1.931 77 -6.910 -2.812
35 -0.395 -1.249 78 -1.660 -1.560
36 5.575 -1.425 79 4.715 2.412
37 3.338 0.704 80 2.151 1.496
38 -0.930 2.217 81 6.138 -7.120
39 0.400 0.019 82 6.002 -6.742
40 -2.274 -0.602 83 6.227 -6.788
41 -6.458 -1.084 84 5.895 -6.612
42 -5.623 -0.922 85 6.237 -6.754
43 4.600 -2.675 86 6.178 -6.021
Number Y1 Y2 Number Y1 Y2
87 6.761 -6.821 129 6.870 -5.965
88 6.748 -6.501 130 6.050 -6.719
89 6.800 -6.308 131 6.602 -5.782
90 6.295 -7.015 132 6.186 -6.208
91 7.157 -6.809 133 5.626 -6.905
92 6.259 -7.497 134 5.688 -6.903
93 6.215 -6.484 135 6.479 -5.913
94 6.547 -7.371 136 6.376 -6.523
95 5.806 -6.889 137 6.210 -6.970
96 6.244 -5.959 138 5.698 -6.570
97 6.195 -6.246 139 6.318 -6.417
98 6.968 -6.612 140 6.703 -6.993
99 6.055 -6.954 141 3.042 -4.594
100 6.566 -5.645 142 3.663 -4.623
101 6.324 -6.704 143 2.124 -4.524
102 6.324 -6.606 144 3.809 -6.481
103 6.144 -6.535 145 4.891 -5.099
104 5.746 -7.259 146 4.084 -5.492
105 5.885 -6.875 147 3.562 -5.194
106 6.759 -5.894 148 4.234 -3.757
107 5.845 -6.799 149 4.065 -4.472
108 6.704 -6.189 150 2.541 -4.430
109 5.902 -5.774 151 3.614 -3.448
110 6.618 -6.566 152 3.716 -5.075
111 5.301 -6.897 153 3.969 -5.732
112 6.169 -6.784 154 4.303 -5.312
113 5.861 -7.152 155 2.467 -4.155
114 5.463 -6.305 156 2.036 -3.381
115 6.787 -6.698 157 3.654 -5.486
116 6.675 -6.660 158 3.410 -2.913
117 7.132 -6.113 159 3.738 -5.630
118 5.281 -5.725 160 4.029 -3.836
119 5.956 -7.255 161 4.541 -6.641
120 6.686 -6.598 162 3.991 -5.016
121 6.604 -6.482 163 3.628 -3.185
122 6.163 -6.335 164 3.250 -4.873
123 5.814 -7.142 165 2.747 -2.800
124 6.931 -6.749 166 4.012 -4.499
125 6.490 -6.674 167 4.986 -5.499
126 6.143 -6.455 168 5.178 -7.342
127 5.717 -6.803 169 4.184 -5.203
128 6.243 -7.318 170 3.306 -5.657
TABLE A.16. Investment funds data from Il Sole - 24 Ore 1999: y1 and y2
performance, y3 volatility
Number Y1 Y2 Y3 Number Y1 Y2 Y3
1 13.20 29.40 21.30 27 14.40 31.60 22.10
2 12.10 27.10 20.00 28 10.40 29.50 21.60
3 15.50 32.40 21.70 29 10.30 34.10 22.90
4 10.60 28.30 21.00 30 11.50 36.20 23.30
5 12.40 28.90 21.20 31 12.60 28.70 21.20
6 14.60 34.90 20.70 32 11.80 25.70 20.10
7 9.60 30.50 21.60 33 12.70 34.30 24.00
8 8.70 28.30 22.20 34 9.20 30.80 24.00
9 14.80 32.90 20.30 35 18.30 28.30 19.80
10 17.30 36.90 22.30 36 11.20 30.40 22.10
11 7.70 25.80 21.10 37 21.00 36.40 20.40
12 12.10 27.60 23.50 38 9.20 25.40 21.90
13 13.50 32.50 22.40 39 17.00 45.10 23.20
14 5.50 32.40 25.50 40 10.20 30.20 22.40
15 10.90 32.40 23.00 41 15.50 35.70 22.60
16 15.40 34.60 20.20 42 10.90 32.20 22.80
17 11.60 29.90 22.00 43 16.70 33.20 23.20
18 13.00 27.60 22.40 44 5.50 27.50 23.10
19 10.00 32.50 20.60 45 12.60 31.00 21.40
20 14.60 40.80 23.40 46 10.00 27.20 20.00
21 3.80 28.70 18.20 47 11.60 30.00 21.50
22 15.00 30.90 22.00 48 10.00 28.80 22.30
23 10.70 30.10 21.20 49 10.60 26.40 20.00
24 12.70 31.60 20.40 50 -0.30 28.00 16.70
25 11.40 26.10 19.80 51 13.80 31.40 22.80
26 9.60 28.00 19.50 52 24.60 49.10 22.00
Number Y1 Y2 Y3 Number Y1 Y2 Y3
53 12.20 34.60 20.90 79 11.40 19.60 12.00
54 -0.40 22.10 15.90 80 -0.60 10.80 9.10
55 15.80 34.00 23.20 81 9.00 19.70 11.90
56 14.80 32.10 20.30 82 7.10 15.00 8.80
57 7.70 10.80 9.50 83 10.50 14.10 9.30
58 13.40 18.60 10.70 84 9.20 15.70 9.30
59 15.50 13.00 10.50 85 7.60 20.20 11.70
60 11.40 13.80 8.50 86 8.40 17.10 12.60
61 10.30 19.20 11.60 87 9.90 16.40 9.10
62 6.70 17.20 11.00 88 13.20 19.90 12.20
63 9.60 17.20 10.30 89 10.80 31.10 15.50
64 8.70 13.60 8.60 90 12.90 26.60 12.20
65 9.20 20.00 12.20 91 11.40 18.00 11.40
66 8.10 16.50 12.10 92 8.30 17.40 12.20
67 5.80 22.70 12.50 93 11.90 20.80 12.10
68 15.20 23.50 13.40 94 7.40 18.00 10.90
69 7.90 18.80 12.20 95 8.90 12.40 8.70
70 11.60 23.20 12.70 96 0.70 18.90 12.90
71 9.70 22.80 12.70 97 3.90 19.40 10.80
72 6.50 18.20 13.50 98 11.60 17.10 10.30
73 16.20 20.50 12.40 99 10.50 17.90 12.30
74 10.30 19.80 10.80 100 8.40 16.30 13.30
75 9.80 16.50 10.30 101 9.30 21.20 12.30
76 9.00 19.00 11.20 102 10.30 18.70 10.30
77 17.00 34.30 12.70 103 10.20 21.20 12.10
78 7.80 13.20 10.00
TABLE A.17. Diabetes data from Reaven and Miller (1979): y1 and y2 responses
to oral glucose, y3 insulin resistance
Number Y1 Y2 Y3 Number Y1 Y2 Y3
1 80 356 124 38 78 335 241
2 97 289 117 39 106 396 128
3 105 319 143 40 98 277 222
4 90 356 199 41 102 378 165
5 90 323 240 42 90 360 282
6 86 381 157 43 94 291 94
7 100 350 221 44 80 269 121
8 85 301 186 45 93 318 73
9 97 379 142 46 86 328 106
10 97 296 131 47 85 334 118
11 91 353 221 48 96 356 112
12 87 306 178 49 88 291 157
13 78 290 136 50 87 360 292
14 90 371 200 51 94 313 200
15 86 312 208 52 93 306 220
16 80 393 202 53 86 319 144
17 90 364 152 54 86 349 109
18 99 359 185 55 96 332 151
19 85 296 116 56 86 323 158
20 90 345 123 57 89 323 73
21 90 378 136 58 83 351 81
22 88 304 134 59 100 398 122
23 95 347 184 60 110 426 117
24 90 327 192 61 80 333 131
25 92 386 279 62 96 418 130
26 74 365 228 63 95 391 137
27 98 365 145 64 82 390 375
28 100 352 172 65 84 416 146
29 86 325 179 66 100 385 192
30 98 321 222 67 86 393 115
31 70 360 134 68 93 376 195
32 99 336 143 69 107 403 267
33 75 352 169 70 112 414 281
34 90 353 263 71 93 364 156
35 85 373 174 72 93 391 221
36 99 376 134 73 90 356 199
37 100 367 182 74 99 398 76
Number Y1 Y2 Y3 Number Y1 Y2 Y3
75 93 393 490 111 114 643 155
76 89 318 73 112 103 533 120
77 98 478 151 113 300 1468 28
78 88 439 208 114 303 1487 23
79 100 429 201 115 125 714 232
80 89 472 162 116 280 1470 54
81 91 436 148 117 216 1113 81
82 90 413 344 118 190 972 87
83 94 426 213 119 151 854 76
84 85 425 143 120 303 1364 42
85 96 465 237 121 173 832 102
86 111 558 748 122 203 967 138
87 107 503 320 123 195 920 160
88 114 540 188 124 140 613 131
89 101 469 607 125 151 857 145
90 108 486 297 126 275 1373 45
91 112 568 232 127 260 1133 118
92 105 527 480 128 149 849 159
93 103 537 622 129 233 1183 73
94 99 466 287 130 146 847 103
95 102 599 266 131 124 538 460
96 110 477 124 132 213 1001 42
97 102 472 297 133 330 1520 13
98 96 456 326 134 123 557 130
99 95 517 564 135 130 670 44
100 112 503 408 136 120 636 314
101 110 522 325 137 138 741 219
102 92 476 433 138 188 958 100
103 104 472 180 139 339 1354 10
104 75 455 392 140 265 1263 83
105 92 442 109 141 353 1428 41
106 92 541 313 142 180 923 77
107 92 580 132 143 213 1025 29
108 93 472 285 144 328 1246 124
109 112 562 139 145 346 1568 15
110 88 423 212
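These appendix tables print two records side by side on each line (as in Table A.17: Number, y1, y2, y3 repeated twice per row). A minimal sketch of unfolding that layout into one record per row, assuming whitespace-delimited input; the function name and fixed field count are assumptions for illustration:

```python
# Sketch: unfolding a table that prints two records per line, as in
# Table A.17 (Number, y1, y2, y3 repeated twice per row). The field
# count and numeric parsing are assumptions for illustration.

def unfold_table(lines, fields_per_record=4):
    """Split each line into consecutive records and sort by unit number."""
    records = []
    for line in lines:
        values = [float(v) for v in line.split()]
        for i in range(0, len(values), fields_per_record):
            rec = values[i:i + fields_per_record]
            if len(rec) == fields_per_record:  # last line may hold one record
                records.append(rec)
    return sorted(records, key=lambda r: r[0])
```

Sorting by the first field restores the unit ordering, since the left and right halves of a page interleave two blocks of unit numbers.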
Bibliography
I. T. Jolliffe