
Phenetic Taxonomy at the Species Level and above

Author(s): P. H. A. Sneath
Source: Taxon, Vol. 25, No. 4 (Aug., 1976), pp. 437-450
Published by: International Association for Plant Taxonomy (IAPT)
Stable URL: http://www.jstor.org/stable/1220526 .
Accessed: 17/06/2014 21:55


International Association for Plant Taxonomy (IAPT) is collaborating with JSTOR to digitize, preserve and
extend access to Taxon.

http://www.jstor.org

This content downloaded from 195.34.79.223 on Tue, 17 Jun 2014 21:55:49 PM


All use subject to JSTOR Terms and Conditions
TAXON 25(4): 437-450. AUGUST 1976

PHENETIC TAXONOMY AT THE SPECIES LEVEL AND ABOVE

P. H. A. Sneath*

Summary
Phenetic analyses are valuable at all taxonomic ranks. A review of work at higher
ranks is presented, and certain difficulties are discussed: these are just as serious for
traditional studies, though often glossed over. Difficulties include determination of
homologies, incompleteness of data and shortage of constant characters. It is also
necessary to employ similarity coefficients and cluster methods that are not too sensitive,
respectively, to effects of gross size and numbers of OTU's. The number of OTU's
required to represent a homogeneous cluster is apt to be underestimated: even with a
simplified model of phenetic variation this number should be at least 10 and preferably
25 or more. Some newer developments in numerical analysis of phylogeny are briefly
reviewed.

Phenetic taxonomy is now generally contrasted with phylogenetic taxonomy,
and this was an essential idea in the original use of the term phenetic relationship
by Cain and Harrison (1960) to mean overall similarity based on all available
characters. They were speaking of the resemblance or similarity that was
estimated from the observed features of organisms without reference to how
these features had arisen in the course of evolution. It is now widely conceded
that the bulk of taxonomic work is basically phenetic, and that phylogenetic
deductions must be made from phenetic evidence. This has been most cogently
argued by Colless (1967, 1969a, b) and a well balanced discussion is provided by
Moss and Hendrickson (1973).
However, although Cain and Harrison defined phenetic relationship as being
based on unweighted (more precisely equally-weighted) characters, this would
seem to be a subsidiary condition, and it is probable that they simply wished to
emphasize that explicit phyletic weighting, or extremely heavy weighting of any
kind (e.g. restriction to a few "key" characters), was not intended. Burtt (1964)
has pointed out that numerical phenetics need not be based on equally-weighted
characters, and refers to classifications based on equally-weighted characters as
isocratic. The point has been further elaborated recently (Moss, 1972; Moss and
Hendrickson, 1973). We cannot study the totality of characters of an organism, so
that overall similarity is that over all of some accessible sample of characters.
Though this may lead to philosophical problems (and to certain statistical problems
attendant on the difficulty of defining what is the population of characters that is
being sampled), it has not proved particularly troublesome in practice at and above
the species level. This is largely because of the substantial congruence between
classifications based on large samples of characters at the higher ranks.
The species level is more difficult to define because of the lack of a single
definition of the term species. Sokal and Crovello (1970) note the very poor
operational definition of the "biological species concept", but the root of the
problem is the use of the term species in several different senses (Ravin, 1963).
For the present discussion we may consider the species level to be that at which
distinct phenetic clusters can be observed.
* Department of Microbiology, University of Leicester, Leicester, England.

The practical steps in carrying out a phenetic study can be briefly summarized
as follows. First, the scope of the study must be decided and the Operational
Taxonomic Units (OTU's) must be chosen. Then the characters must be selected,
and coded and scaled into appropriate numerical form. Next the resemblance
between the OTU's must be estimated, using a resemblance measure appropriate
to the work, followed by some technique to reveal the main taxonomic structure
in terms of phenetic groupings (phenons). Lastly one can obtain appropriate
generalisations or deductions about these phenons, for example the selection of
diagnostic characters or hypotheses on cladogeny. Comprehensive treatment and
reviews can be found in Jardine and Sibson (1971), Sneath and Sokal (1973) and
Clifford and Stephenson (1975), and for special aspects in microbiology Sneath
(1972); other reviews include Rowell (1970), Moss and Hendrickson (1973) and
Crovello (1970). But first one or two points about high taxonomic ranks should
be mentioned.
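The sequence of steps just summarized can be sketched in a few lines of code. The sketch below is only an illustration under stated assumptions: the four OTU's and their eight presence-absence characters are hypothetical, the resemblance measure is the Simple Matching Coefficient, and the phenons are formed by a single link rule at an arbitrary threshold of 0.8. The function names are invented for this example and do not come from any published program.

```python
from itertools import combinations

def simple_matching(a, b):
    """Fraction of characters on which two OTUs agree (0/1 coding)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def single_link_phenons(otus, threshold):
    """Group OTUs into phenons: two OTUs share a phenon when a chain of
    pairwise similarities >= threshold connects them (single link)."""
    names = list(otus)
    parent = {name: name for name in names}

    def find(x):
        # Follow parent links up to the representative of x's group.
        while parent[x] != x:
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if simple_matching(otus[a], otus[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical data: four OTUs scored on eight presence-absence characters.
otus = {
    "A1": [1, 1, 1, 0, 0, 0, 1, 0],
    "A2": [1, 1, 1, 0, 0, 0, 0, 0],
    "B1": [0, 0, 1, 1, 1, 1, 0, 1],
    "B2": [0, 0, 0, 1, 1, 1, 0, 1],
}
phenons = single_link_phenons(otus, threshold=0.8)
print(phenons)
```

With these invented scores the two A exemplars and the two B exemplars fall into separate phenons; a real study would of course use far more characters and a considered choice of coefficient and cluster method.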
It has occasionally been suggested that numerical phenetics should be restricted
to specified rank categories or certain kinds of characters. Thus Oldroyd (1966)
suggested it should not be used above the species level, and Throckmorton (1965)
has similar views. But arguments of this kind are not very cogent, and there have
been many successful applications to a wide range of ranks and character sets
(Sneath and Sokal, 1973).
There are, however, some special problems of studies at high ranks. Homology is
one of the weakest areas in phenetics, and homologies may be particularly
uncertain at high ranks (Jardine, 1969; Jardine and Sibson, 1971). This is true
also of attractive alternatives in molecular biology; for example there is a good
deal of uncertainty about how best to determine homology in protein sequences
(e.g. see Fitch and Margoliash, 1970) despite the fact that protein sequences are
capable of giving some numerical estimates of resemblance between the most
diverse organisms. Thus yeast and man have about 60% of amino acids in common
in their cytochrome c sequence (see Dayhoff, 1972 for a review), and what is
recognizably the basic phylogenetic tree of living things is emerging from such
studies. But such work is critically dependent upon correct determination of
homologies, and to a lesser extent upon other assumptions discussed later. Problems
of homology remain whatever definition of homology is adopted, whether phylogenetic
(derivation of an organ from a common ancestor), morphogenetic (e.g.
treating bracts as modified leaves), isomorphic correspondence (e.g. homologizing
bones by their spatial relations to adjoining bones and organs), or compositional
(e.g. presence of a given chemical compound in seeds). Baum (1973) has suggested
how one might estimate the reliability of homologies in data on taxa of high
ranks, so as to give some indication of the confidence one would place upon
debatable findings.
Another problem with very high ranks is that high-level taxa show numerous
exceptions to constancy of character states. Thus there are insects without wings,
arthropods without legs, angiosperms without leaves, and so on, so that many
characters cannot be given a single, constant, character-state value. If they are
omitted the Character Relevance (Crovello, 1968a) may become too low, so
rather than omitting such characters (or scoring them as "no comparison") it is
usual to choose exemplars in such a way as to reflect the main variation patterns.
A further problem is that above the species level it is often found that existing
descriptions are very incomplete, even in specialist monographs; thus one may
find that numerous characters are recorded for only a few of the taxa.
There is much current investigation upon different numerical phenetic methods,
a great deal of which revolves about two main areas, resemblance coefficients and
methods of cluster analysis. Below the species level the variation in size between
individuals is usually not very great (though there are notable exceptions in
plants), so that at least the more commonly used resemblance coefficients are likely
to give much the same results, whereas the often complex taxonomic structure can
be summarized in very different ways by different methods of cluster analysis.
The choice of clustering method is therefore particularly important. At high
ranks, in contrast, differences in size (and other components of phenetic
resemblance) can be large. Different resemblance coefficients can thus lead to
radically different patterns of resemblance. It might be thought that choice of
cluster method would then not matter very much, especially as at higher ranks
there will be large gaps in phenetic hyperspace between the phenon clusters. To
some extent this is true, because, for example, single and average link methods
will then usually both recover the main clusters. But certain cluster analysis
methods are sensitive to the number of OTU's that belong to a given cluster, and
this is discussed later. The method should be one that is not too sensitive to this
effect if a wide range of taxonomic ranks is being studied, otherwise serious
distortion may result.
It is first necessary to choose a selection of organisms as the OTU's. Because
the number of OTU's is likely to be severely limited for several reasons
(accessibility, time and effort to code characters, number of OTU's that can be handled
by the computer, etc.) the sample for each is likely to be limited to a few
representatives of each species or higher taxon; these are chosen as exemplars of
the taxa. The exemplars need not be individual organisms, though there are
some advantages in this because in this way we may expect to reflect the natural
variation closely. They may, instead, be formal descriptions of species and other
taxa, as represented, for example, by character averages, so the OTU's would
then be hypothetical "average organisms". There will then be only one OTU for a
species, and perhaps only a few species will be chosen to represent a genus even
if the genus contains numerous species. It will be realized that the selection of
exemplars may be difficult, particularly if the taxonomic structure is uncertain,
or the validity of the species is in doubt, and for this reason it is wise to give
some consideration to the choice of exemplars, an area that has not had much
attention. When the variation pattern in a genus is very complex and there are
numerous minor variant forms (well illustrated by the recent numerical taxonomic
study of Plantago by Rahn, 1975), these problems may be severe.
However, current work with the exemplar method gives general confidence in
it. Thus workers like Moss and Webster (1969), McNeill, Parker and Heywood
(1969) and da Cunha (1973) have found that conspecific individuals nearly
always cluster together, and when there is a marked subspecific structure this is
also usually preserved (Bidault, 1968). If possible at least three individuals should
be chosen for each known homogeneous cluster (usually species), both to obtain
some reasonable representation of the variation within species in relation to the
gaps between them, and also as an insurance on the validity of the species themselves,
so that unexpected results can be specially investigated. The number of
individuals needed to represent a species adequately is of course many more than
three, but, as always, a compromise between theory and practice must be reached.
It is an advantage to have about the same number of exemplars for each basic
taxon because of the sensitivity of some cluster methods to the number of OTU's
in a cluster (discussed later), but at higher ranks this may lead to difficulty because
a genus of many species and one of few species cannot both be easily represented
by three OTU's: the larger genus is bound to be poorly sampled. If three OTU's
are chosen from every species of both genera the larger genus may be somewhat
over-represented, but this would seem the better of the two alternatives.
We have recently been experimenting with a simplified model of phenetic
variation in bacteria, in which it is assumed that a species can be represented fairly
closely by a hyperspherical multivariate normal swarm. The low level of character
correlations in our material, and the fact that binary characters, if sufficiently
numerous, yield Euclidean distances that are quite well-behaved in a statistical
sense (Sneath, 1974b), give confidence that this model is a reasonable guide to
practice for studies on bacteria. With some modification we hope the conclusions
will be applicable for studies on other organisms. In this model a taxon can be
defined by its centroid and its standard deviation. If $\sqrt{\overline{d^2_{jE}}}$ is the root mean
square distance between the OTU's j and the true centroid E, i.e. $\sqrt{\sum_j d^2_{jE} / (t-1)}$
where there are t OTU's (t being large), then the standard deviation in any
direction is assumed to be about $\sqrt{\overline{d^2_{jE}} / n}$ where there are n uncorrelated
characters. This is obviously not so for axes parallel to the character axes in the
case of binary characters, but it is approximately true in our material for other
axes (which represent the great majority of possible ones).
It is well known that the precision of estimate of the centroid of a hyperspherical
normal swarm based on a sample of t individuals, if expressed as the
expected distance $d_{ES}$ between the sample centroid S and the true centroid E,
is $\sqrt{\overline{d^2_{jE}} / t}$. The effective radius of such a swarm is rather greater than
$\sqrt{\overline{d^2_{jE}}}$ (Sneath, 1974b) so that if we wish the observed centroid to lie within
the effective radius we must make t sufficiently large to make certain that
the observed centroid will have a high probability, P, of lying within $\sqrt{\overline{d^2_{jE}}}$
from the true centroid. In order to ensure this it is necessary to know the Standard
Error of the expected discrepancy $d_{ES}$, and normal distribution theory
gives this $SE(d_{ES})$ as approximately $\sqrt{\overline{d^2_{jE}}} / (t\sqrt{2})$. If we wish the observed
centroid to lie within the "true" envelope (taken as a sphere of radius $\sqrt{\overline{d^2_{jE}}}$ about
the true centroid), we need to have t large enough so that $d_{ES} + k\,SE(d_{ES})$
is less than $\sqrt{\overline{d^2_{jE}}}$, where k is the one-tailed probability integral. We can
conveniently take k as 2.0 corresponding to P of 98%.
Values of t = 3 or t = 4 are near the lower acceptable limit and barely give
$d_{ES}$ less than the effective radius. A parallel calculation can be made on the
accuracy of estimate of the effective radius, whose Standard Error is about
$\sqrt{\overline{d^2_{jE}} / 2t}$, which gives a small increase in the uncertainty of estimate of the
centroid and radius of the small sample, but for a rough indication we may
neglect this. We would naturally prefer to obtain better estimates, and if we wish
these to be about one-fifth of the effective radius we need t of 25. A rather less
stringent criterion is afforded by t = 10, giving the position of the taxon accurate
to about 1/3 of the effective radius. Our current experience with bacterial
species suggests that these formulae give quite good guidelines to the size of
sample needed, and are probably applicable to other material where the clusters
are reasonably multivariate normal hyperspheres.
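The arithmetic behind these sample sizes can be checked with a short calculation. The sketch below is an illustration only: it works in units of the root mean square distance of OTU's from the true centroid, takes the expected discrepancy as 1/sqrt(t), and takes its Standard Error as 1/(t sqrt(2)), which is how the expression in the text is read here (an assumption). The function name is invented for this example.

```python
import math

def centroid_error(t, k=2.0):
    """Return (d_ES, upper) in units of the root mean square radius.

    d_ES  = 1 / sqrt(t)           expected centroid discrepancy
    SE    = 1 / (t * sqrt(2))     assumed reading of SE(d_ES)
    upper = d_ES + k * SE         bound to compare with a radius of 1
    """
    d_es = 1.0 / math.sqrt(t)
    se = 1.0 / (t * math.sqrt(2.0))
    return d_es, d_es + k * se

for t in (3, 4, 10, 25):
    d_es, upper = centroid_error(t)
    print(f"t={t:2d}  d_ES={d_es:.3f}  d_ES + 2 SE={upper:.3f}")
```

On this reading the bound falls just above 1 for t = 3 and just below it for t = 4, while t = 10 and t = 25 give discrepancies of roughly one-third and one-fifth of the radius, in line with the figures quoted above.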
It is only a small step conceptually from representing a taxon by its centroid
and radius (estimated from a sample of exemplars) to the step of representing the
taxon in the same way in a higher rank study. By this means each lower taxon
could be represented by a new composite OTU based on a large sample of
individuals, without having to include all these numerous individuals in the
higher rank study itself. Experiments on bacteria on these lines are now being
undertaken in our laboratory. It is clear that some precision is lost by assuming
that the higher and lower taxa are hyperspherical; also, new resemblance coefficients,
that include the character state variation as well as the mean or modal
value, will be needed, such as those of Sanghvi (1953), Crovello (1968b) and
McNeill (1974). Nevertheless useful developments seem to be possible on these
lines.
The choice of characters is unlikely to pose special problems except at very
high ranks, although there must be a large enough sample to keep sampling
errors reasonably small, preferably 100-200. At the highest rank levels there may
be few characters that can be employed. As noted earlier, homology problems
may be severe, and in comparing different classes or phyla one may not have
many characters that are comparable. The use of protein sequences has been
mentioned above: what other characters could be used to compare man and
yeast? Other possibilities include the presence of certain widespread chemical
substances (e.g. cellulose, chitin, haemoglobin), or the occurrence of defined
histological cell types (e.g. tracheids, striated muscle, goblet cells). Some characters
of this sort have of course been used to distinguish high-level taxa since the
earliest days, particularly within the algae, but the phenetic study of large sets of
such characters has received little attention. Other possibilities from molecular
biology also exist, but it should be noted that DNA pairing and comparative
serology are apt to lose sensitivity at the highest taxonomic ranks. The congruence
between classifications based on different character sets is probably fairly good
at high taxonomic ranks, though I have pointed out (Sneath, 1971) that this is not
an all-or-none phenomenon, and, as Farris (1971) has noted, it is congruence
rather than the non-specificity hypothesis that is of most significance for phenetics.
In some fields there is a good deal of experimental error in determining character
states. This is so in microbiological tests for example, and is probably so
also in much chemotaxonomic work (e.g. Weimarck, 1970; Sneath and Johnson,
1972). Variation caused by environmental effects, season of the year, etc., can
also be considerable in plants (e.g. Taylor, 1971). All these sources of variation
will increase the "noise" of the system, and unless they are themselves under
investigation, the taxonomist will wish to remove this as unwanted "error", or at
least to know the extent of disturbance they have caused. For matching coefficients
the effect of such error has been discussed by Sneath and Johnson (1972), and
the effect on taxonomic distance is given in Sneath and Sokal (1973). For
example, for matching coefficients the effect of a proportion p of erroneous
presence-absence characters is to reduce the similarity of identical OTU's by
about 2p, so that with p = 5% the similarity between replicates would be about
90% instead of the expected 100%. This value also has a variance that is dependent
on p and on the number of characters. If p is over about 10% the distortion
becomes serious. Similar error variance is produced by other kinds of "error",
including mistaken homology and gaps in the primary data (an area that requires
study).
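The figure of "about 2p" can be illustrated with a small calculation. The sketch below rests on a simple assumption that is not taken from the cited papers: each presence-absence character of each replicate is recorded wrongly with independent probability p, so two replicates agree on a character when both are right or both are wrong. The binomial Standard Error for n independent characters is likewise only an illustrative assumption, and the function names are invented for this example.

```python
import math

def expected_replicate_similarity(p):
    """Expected Simple Matching similarity between two replicates of the
    same OTU when each character is mis-scored with probability p."""
    return (1.0 - p) ** 2 + p ** 2  # both correct, or both wrong

def replicate_similarity_se(p, n):
    """Binomial standard error of that similarity over n characters
    (assumes independent characters)."""
    s = expected_replicate_similarity(p)
    return math.sqrt(s * (1.0 - s) / n)

for p in (0.0, 0.05, 0.10):
    s = expected_replicate_similarity(p)
    print(f"p={p:.2f}  similarity={s:.3f}  reduction={1 - s:.3f}")
```

Under these assumptions the reduction is 2p(1 - p), which is close to 2p when p is small, so p = 5% gives a replicate similarity near 90%.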
A final problem is how to handle very large data sets, though this is not peculiar
to studies at higher ranks. Although a number of computer methods are known
that can reduce the computation (e.g. Gower and Ross, 1969; Lance and Williams,
1966), the most practical method is to process the OTU's in batches, and
then choose exemplars from the clusters that have been found, and run these
again together with any unclustered OTU's (Sneath, 1964), thus achieving the
sort of hierarchic progression that is used in traditional systematic work.
Many resemblance coefficients have been used in numerical taxonomy (discussed
at length by Sneath and Sokal, 1973). There are no special classes of coefficient
required at high ranks, but, as noted earlier, the wide variation in high-ranking
studies means that the taxonomist must be careful to choose methods
that yield the components of phenetic similarity that he requires. The major
components for most phenetic studies are size versus shape. Although we have
a clear idea of gross size, on closer examination it is not so easy to define
rigorously, and our concept of shape is a good deal more vague. Various models
have been suggested, but one of the most useful is a vector model in which size is
the length of the line from the origin of the character hyperspace to the point
representing an OTU, and shape difference between two OTU's is the angle
between them when viewed from the origin (Sneath and Sokal, 1973). The well-known
correlation coefficient measures principally the shape difference in this
sense, and indeed it is an expression of an angle from a special viewpoint.
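The vector model can be made concrete with a short sketch. The two OTU's below are hypothetical: the second has exactly twice the measurements of the first, so it differs in size but not in shape. The function names are invented for this example, and the correlation is computed between OTU's across characters, which is the sense intended in the text.

```python
import math

def size(x):
    """Size: length of the line from the origin to the OTU's point."""
    return math.sqrt(sum(v * v for v in x))

def shape_angle(x, y):
    """Shape difference: angle in degrees between two OTUs seen from the origin."""
    cosine = sum(a * b for a, b in zip(x, y)) / (size(x) * size(y))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))

def correlation(x, y):
    """Correlation between two OTUs: the cosine of the angle between them
    after each OTU's characters are centred on that OTU's own mean."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cx = [v - mx for v in x]
    cy = [v - my for v in y]
    return sum(a * b for a, b in zip(cx, cy)) / (size(cx) * size(cy))

# Hypothetical OTUs: same proportions, different gross size.
small = [1.0, 2.0, 3.0]
large = [2.0, 4.0, 6.0]
print(size(large) / size(small), shape_angle(small, large), correlation(small, large))
```

Here the ratio of sizes is 2 while the angle is effectively zero and the correlation effectively 1, which is the sense in which the correlation coefficient responds to shape and not to size.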
At the species level and above the taxonomist will usually require shape
coefficients, so correlations are generally suitable. If only presence-absence (0,
1) characters are available then size in the ordinary sense will not apply (though
an analogue of size can be calculated, Sneath, 1968) and the Simple Matching
Coefficient is generally satisfactory. It must be noted that with plants there
can be extreme differences in size even within the same population, and this is a
phenomenon that poses yet further problems, because size differences in individual
plants do not affect all parts equally. Thus in many dwarfed plants the flowers
are scarcely smaller than normal. Each major part or organ responds differently
to the dwarfing, and no consistent technique for handling this has yet been
developed. An allied problem is what emphasis to place on large-scale versus
small-scale detail in complex organs (Sneath, 1967). In such situations it may be
best to rely mainly on presence-absence characters.
The coding and scaling of characters is bound up with character weighting.
The arguments in favour of equal weighting are now well known, and few
consistent alternatives have been proposed; even then the results are usually not
very different (Sneath and Sokal, 1973; investigations into the way taxonomists
have apparently weighted characters when performing conventional taxonomy
have recently been initiated, e.g. Hansell and Ewing, 1973). The practical steps
in coding and scaling are conceptually simple but lengthy to describe in detail, so
it will suffice to refer to the discussion in Sneath and Sokal (1973), with the
added note that coding and scaling must be chosen to preserve the desired
components of phenetic resemblance. There are still some problems in handling a
mixture of quantitative and presence-absence characters, but coefficients like that
of Gower (1971) can handle these, and results seem usually acceptable (for further
discussion see Lance and Williams, 1967; Anderson, 1971).
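How a coefficient of this general kind combines mixed characters can be sketched as follows. This is a simplified illustration in the spirit of Gower's coefficient and not a full implementation: quantitative characters score 1 minus the absolute difference divided by the character's range, two-state characters score 1 for a match and 0 otherwise, and missing values (None) are skipped. Treating joint absences as matches is a simplifying assumption made here. The function name, argument names and data are hypothetical.

```python
def mixed_similarity(a, b, kinds, ranges):
    """Average per-character similarity over the characters that can be
    compared. kinds[i] is "quant" or "binary"; ranges[i] is the range of
    a quantitative character over all OTUs (ignored for binary ones)."""
    total = 0.0
    compared = 0
    for x, y, kind, rng in zip(a, b, kinds, ranges):
        if x is None or y is None:
            continue  # no comparison possible for this character
        if kind == "quant":
            total += 1.0 - abs(x - y) / rng  # assumes rng > 0
        else:
            total += 1.0 if x == y else 0.0
        compared += 1
    if compared == 0:
        raise ValueError("no comparable characters")
    return total / compared

# Hypothetical OTUs: one measurement (range 20 over the study) and two
# presence-absence characters.
kinds = ["quant", "binary", "binary"]
ranges = [20.0, None, None]
otu_a = [10.0, 1, 0]
otu_b = [15.0, 1, 1]
similarity = mixed_similarity(otu_a, otu_b, kinds, ranges)
print(round(similarity, 3))
```

Scaling each quantitative character by its range keeps every character's contribution between 0 and 1, so that measurements and presence-absence scores carry equal weight in the average.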
Both of the two principal ways of representing taxonomic structure, cluster
analysis yielding phenograms and ordination yielding ordination plots, are
useful at the higher ranks. They emphasize rather different aspects of that
structure. Cluster analysis tends to give poor representation of the more distant
relationships: these are in reality multiple, because the resemblances between OTU's
from different clusters are seldom, if ever, all the same. This can be indicated in
the phenograms (Rohlf, 1970). Ordination gives poor representation of the closest
relationships. Cluster methods will impose a structure on any data, even random
data, but they do yield discrete and objective phenons. Ordination gives useful
maps or models, but the phenons are then circumscribed by eye, a step that may
be unacceptably subjective. The newer techniques of principal coordinate analysis
and multidimensional scaling seem particularly promising forms of ordination.
It seems good advice to compute both a phenogram and an ordination. Space-conserving
cluster methods such as the Average Link methods have several
advantages (Lance and Williams, 1967; Hall, 1969), and for higher ranks the
usual non-overlapping clusters are generally satisfactory. Moss (1967) illustrates a
number of novel graphic methods.
It is important to calculate some measure of the distortion of the phenogram
or ordination compared to the similarity matrix from which they are derived; a
useful review is that of Rohlf (1974). The cophenetic correlation coefficient is
particularly useful here. It should be remembered that differences in levels of
branches in phenograms of less than about one Standard Error of the resemblance
coefficient cannot be taken as significant (at least when single OTU's are involved);
there is a similar uncertainty in the position of OTU's in ordination plots, a
fact that is not always realised. There is still a good deal of uncertainty about
the shape of taxon clusters in character space (discussed by Rohlf, 1970), though
they are commonly supposed to be rounded or ellipsoidal.
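The cophenetic correlation is simply the product-moment correlation between the original resemblance values and the values implied by the phenogram. The sketch below uses three hypothetical OTU's: a and b join at distance 2, and c then joins the pair at the average of its distances to a and b, so the phenogram implies a distance of 7 for both a-c and b-c. The data and function name are invented for this example.

```python
import math

def pearson(xs, ys):
    """Product-moment correlation between two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical original distances between three OTUs.
original = {("a", "b"): 2.0, ("a", "c"): 6.0, ("b", "c"): 8.0}
# Distances implied by an average-link phenogram: a and b join at 2,
# then c joins at (6 + 8) / 2 = 7.
cophenetic = {("a", "b"): 2.0, ("a", "c"): 7.0, ("b", "c"): 7.0}

pairs = sorted(original)
r = pearson([original[p] for p in pairs], [cophenetic[p] for p in pairs])
print(round(r, 3))
```

A value well below 1 would indicate that the phenogram has distorted the original resemblances appreciably.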
One aspect of taxonomic structure that is of special significance at high ranks
is the sensitivity of the methods to the numbers of OTU's in the various clusters
(Fig. 1). Certain techniques are particularly liable to give misleading results. The
greatest danger is with Information Analysis (Williams, Lambert and Lance,
1966; Hall, 1967; Williams, Clifford and Lance, 1971). Because the quantity of
information (in the information theoretic sense) increases with increasing numbers
of OTU's, it is possible for a cluster of numerous very similar OTU's to dominate
the picture to such an extent that all other, rarer, OTU's are forced together into
one cluster even though they are extremely different from one another. Thus, one
can get a phenon of quite unrelated organisms, the "rag-bag" effect. This also
produces phenograms with a notable shortage of monotypic phenons (e.g. McNeill
et al., 1969), in contrast to the well-known preponderance of monotypic taxa of
all ranks first noted by Willis (1922). Some other cluster methods, notably those
that minimize the sums of squared differences within clusters, behave similarly.
A geometric interpretation is that the distances (more strictly the squared
distances) are summed, rather than averaged as in the Average Link methods.
Indeed it is known that Information Analysis behaves much like sums of squares,
because of its relation to the chi-square distribution. Examples of distortion by
these cluster methods are seen in the studies of Clifford and Williams (1973) and
Tadauchi (1975).

Fig. 1. Illustration of the effect of number of OTU's in a cluster upon different clustering
methods.
(A) The upper part of the figure shows the positions of OTU's a to g in two dimensions.
The lower part shows the phenograms based on Euclidean distances resulting from
three methods of clustering. These are the Weighted Pair Group Method with
arithmetic averages (WPGMA), the Unweighted Pair Group Method with arithmetic
averages (UPGMA), and Sums of Squares (in which the OTU's or clusters are joined
to maintain at a minimum the total of the within-group sums of squares). Information
Analysis is known to behave very similarly to Sums of Squares but is not readily
illustrated with two-dimensional data. All three phenograms show the same topology.
(B) OTU c has been replicated so that there are 6 identical OTU's at this position
on the two-dimensional diagram. The WPGMA phenogram is unchanged except for
the furcation carrying 6 members of phenon c. The UPGMA phenogram now shows a
united first with d, because the average distance between a and the phenon formed
by b plus six c's is slightly greater than the distance between a and d. The Sums of
Squares phenogram shows that a, b and d have been united before the six c's join,
because the sums of the squares of the cluster a, b, d is less than the sums of the
squares given by a cluster formed by b and 6 c's.

The opposite behaviour is seen with the weighted group methods. Thus, in
contrast to the unweighted pair-group method using arithmetic averages
(UPGMA), the weighted group method, WPGMA, is not disturbed by a large
number of very similar OTU's within one of the clusters (Fig. 1). This is because
WPGMA does not, as the various clusters join, give as much weight to each
OTU in the dense cluster as it does to the OTU's of the sparse clusters. Despite
the fact that the cophenetic correlation is always higher with UPGMA than with
WPGMA (Sneath, 1969), this insensitivity to the number of OTU's is an advantage
that may be important for high level studies. It may be noted in passing that
Single Link clustering is also relatively insensitive to the number of OTU's,
except insofar as an increased number is likely to give a maximum similarity
between clusters that is slightly higher than the maximum when few OTU's are
present.
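The contrast between the two average link methods can be reproduced with a small sketch. It is not a reconstruction of Fig. 1: the OTU's below lie on a single hypothetical axis (a at 10, b at 4, and one or more identical c's at 0), and the function name is invented for this example. The only difference between the two methods is the update rule applied when two clusters fuse: UPGMA weights the old distances by cluster size, whereas WPGMA takes their plain average.

```python
from itertools import combinations

def pair_group_levels(points, weighted):
    """Agglomerative average-link clustering of OTUs placed on one axis.
    Returns the fusion levels in order of joining.
    weighted=True is WPGMA; weighted=False is UPGMA."""
    sizes = {name: 1 for name in points}  # cluster label -> number of OTUs
    dist = {frozenset(pair): abs(points[pair[0]] - points[pair[1]])
            for pair in combinations(points, 2)}
    levels = []
    while len(sizes) > 1:
        pair = min(dist, key=dist.get)  # closest pair of clusters
        i, j = sorted(pair)
        levels.append(dist.pop(pair))
        ni, nj = sizes.pop(i), sizes.pop(j)
        new = i + "+" + j
        for k in sizes:
            dik = dist.pop(frozenset((i, k)))
            djk = dist.pop(frozenset((j, k)))
            if weighted:   # WPGMA: plain average of the two old distances
                dist[frozenset((new, k))] = (dik + djk) / 2
            else:          # UPGMA: average weighted by cluster sizes
                dist[frozenset((new, k))] = (ni * dik + nj * djk) / (ni + nj)
        sizes[new] = ni + nj
    return levels

def example(n_copies_of_c):
    """Hypothetical OTUs on one axis, with c replicated n times."""
    points = {"a": 10.0, "b": 4.0}
    points.update({f"c{i}": 0.0 for i in range(1, n_copies_of_c + 1)})
    return points

for n in (1, 6):
    top_w = pair_group_levels(example(n), weighted=True)[-1]
    top_u = pair_group_levels(example(n), weighted=False)[-1]
    print(f"{n} copies of c: WPGMA top level {top_w:.3f}, UPGMA top level {top_u:.3f}")
```

In this toy case the level at which a joins the rest stays at 8 under WPGMA however many copies of c are added, while under UPGMA it is drawn towards the distance between a and the replicated c's.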
There have been several recent critiques of numerical phenetics (Johnson, I968;
Sneath, I97I; Moss and Hendrickson, I973; Sneath and Sokal, I973), so I shall

444 TAXON VOLUME 2 5

This content downloaded from 195.34.79.223 on Tue, 17 Jun 2014 21:55:49 PM


All use subject to JSTOR Terms and Conditions
C?
0
':: 17OT's
17 OTU's

?
Sums of
Squares

0-

5-

d
10

15-

(C) The number of identical c's has been increased to 17. The WPGMA phenogram is
still unaffected by the numerous OTU's in phenon c. The UPGMA phenogram has
scarcely been further affected. The Sums of Squares phenogram, however, shows that
all the OTU's outside phenon c have been forced together into a group which c
joins at a low level. This is because the sums of squares of the group a b d e f g
is less than that of any group that contains 17 c's plus another OTU from a, b, d, e,
f or g.

turn to a few points only. It is sometimes thought that numerical phenetics at


higher ranks simply confirms what was already well known and well established.
Whereas it is true that there is considerable concordance between the old and new
classifications, there are also commonly a number of discordances. Some of these
may be due to inappropriate numerical techniques. But quite commonly these
discrepancies affect only a minor part of the phenetic structure, and it is dif-
ficult to explain such selective effects by inappropriate technique or improper
character weighting. Phipps (1972) among others has pointed out that the reason
that early taxonomies were inadequate is likely to be because of limitations of
the human mind to process finer details of complex and voluminous data, despite
the outstanding ability of the mind to perceive the main patterns. Similar com-

AUGUST 1976 445

This content downloaded from 195.34.79.223 on Tue, 17 Jun 2014 21:55:49 PM


All use subject to JSTOR Terms and Conditions
ments are made by Steam (I964), Pasteels and Kistner (I97I) and others, and it
is now widely recognized that good numerical studies can yield reliable inform-
ation that is hard to obtain by traditional methods. This has been especially marked
in microbiology. Where workers have been willing to accept the broad outlines
of phenetic groupings they have commonly been satisfied with the numerical
analyses (either as phenetic or often as probable cladistic arrangements), and one
may cite papers of Pasteels and Kistner (1971) and Ivimey-Cook (1969) as
examples in zoology and botany. An illuminating study, in which several of the
common methods are compared, is the recent paper by McNeill (1975) on the plant
genera Montia and Claytonia: the reasons for many of the historical difficulties
in classifying the group are made much clearer by the phenetic analyses. Moss and
Hendrickson (1973) also stress the value of numerical phenetics as an adjunct
to ordinary taxonomic studies.
There have been few numerical phenetic studies at the very highest ranks. One
of the most noteworthy is the attempt by Young and Watson (1970) to classify
numerically the families of dicotyledons. This is a large and difficult endeavour,
both because of the large number of families (with a very uncertain arrangement
of orders), and also because the recorded data is very incomplete with much
ambiguity on homologies. Some of the findings are not very convincing. For
example the position of the Polygonaceae and the Chenopodiaceae close to
the magnolid groups conflicts with the protein sequence evidence (Boulter et al.,
1972; Boulter, 1974) which suggests that they represent an early offshoot from
the angiosperm stock (though this is also a matter for some debate). A detailed
critique of the findings would be of considerable interest, but it should be noted
that a very large number of relationships are as one would expect, particularly
those involving obviously closely-related families. It is the unexpected findings
that need explaining, and the reasons would surely be illuminating. Although it is
not possible to be sure without reworking their data, it is possible that the method
that Young and Watson employed, which was Information Analysis, may have
been responsible for some distortion, perhaps for the reasons discussed earlier.
Other applications of numerical phenetics at high ranks are the studies of
Cuffey (1973) and Barnett (1974) on bryozoa and foraminifera respectively,
which gave generally acceptable results despite some difficulties in obtaining suf-
ficient characters and deciding upon homologies. A study of molluscan shell
proteins by Ghiselin, Degens, Spencer and Parker (1967) was based on rather
few characters (the amounts of different amino acids), so that the disagreement
with the presumed phylogeny, though marked, is probably not very significant.
An examination of genera of molluscs by Bretsky (1971) showed more convincing
results: congeneric species were almost always clustered together as expected,
though there were discrepancies at the higher levels with the traditional phylogeny,
and probably some unwanted general size effects were also present. In both of
these studies much of the disagreement depends on the accuracy of the phylogenies,
for which critical evidence is difficult to assemble.
In microbiology there have been few unsatisfactory reports. Skyring and Quadling
(1969) were unable to recover the expected taxonomic structure of a varied
collection of soil bacteria, but Debette et al. (1975), using a different numerical
technique, did find much of the underlying variation in another similar collection
of bacteria. Probably the two-stage ordination used by Skyring and Quadling
led to loss of significant taxonomic information. Kendrick and Weresub (1966)
in attempting a numerical taxonomy of orders of Basidiomycetes were unable
to find any technique that gave results closely concordant with the traditional
taxonomy, but choice and homologies of characters is particularly difficult in the
fungi, and in the absence of any independent evidence (e.g. from protein sequences)
it is difficult to evaluate either numerical or traditional classifications. Numerical
taxonomy on other fungi (e.g. genera and species of yeasts, Campbell, 1975)
seems to give generally acceptable results, although there is still little evidence
on the degree of congruence between classifications based upon perfect and im-
perfect states of fungi.
Another study by Watson and his colleagues on the Ericales (Watson, Williams
and Lance, 1967) has been criticised by Burtt, Hedge and Stevens (1970),
largely on grounds that inappropriate weights or homologies were used. It is
however surprising that many of Watson's suggestions (broadly supported by
recent chemotaxonomic studies of Harborne and Williams, 1973) seem to have
become incorporated into the new classification of Stevens (1971), although
little mention is made there of the work of Watson and his colleagues. It will be
interesting to see how many of the findings of El-Gazzar et al. (1968) on the
numerical phenetics of Salvia will become incorporated into the next revision
of this genus, despite the criticisms of Hedge (in Burtt et al., 1970) on the
choice of characters.
Recent years have seen the rapid development of numerical methods for
studying phylogeny, under the name of numerical cladistics (reviewed by Estabrook,
1972). Whether or not one prefers to base the usual taxonomies solely on
phenetic evidence, the interrelations between phenetics and cladistics will always
be of interest to systematists. This subject is of particular significance in botany,
because there is a good deal of evidence that evolution in flowering plants has
not been as straightforward as it is thought to have been in most animal groups:
in particular there is much circumstantial evidence for reticulate evolution
(Grant, 1971) and this requires new concepts and methods of study (Sneath,
1974a; Sneath, Sackin and Ambler, 1975; Sneath, 1975). It is possible that some
of the unexpected findings on plant proteins (Boulter et al., 1972; Boulter et al.,
1976) may be explained by newer methods of analysis, discussed in several of these
papers, that could reveal phenomena like distant episodes of hybridization,
convergence, different rates of evolution in different lineages or in different sets
of characters, and the like.
Throughout this paper I have tried to mention the difficulties that are commonly
met with, and it must be noted that there are still many unsolved
problems in numerical phenetics. Some of these depend on the inadequacies of
taxonomic theory, i.e. numerical techniques cannot be developed until taxonomists
can say exactly what it is they wish to measure. The majority of difficulties
are equally severe problems for orthodox taxonomy, but were not
obvious until numerical studies were attempted. We still lack satisfactory tests
of the significance of clusters and of criteria of optimality of structure. Among
the biggest challenges are those posed by homology and by character coding and
scaling. For both of these we still lack comprehensive and practical solutions,
which will be essential if automatic scanning of specimens for systematic work is
to become a reality. This is probably the area of greatest challenge, and one that
will have to draw heavily on pattern recognition and other computer technologies.
This will become an especially important field of study when data banks in
systematics become established, as will probably happen in the coming decade.

References
ANDERSON, A. J. B. 1971 - Similarity measure for mixed attribute types. Nature
232: 416-417.
BARNETT, R. S. 1974 - An application of numerical taxonomy to the classification of
the Nummulitidae (Foraminiferida). J. Paleontol. 48: 1249-1263.
BAUM, B. R. 1973 - The concept of relevance in taxonomy with special emphasis on
automatic classification. Taxon 22: 329-332.
BIDAULT, M. 1968 - Essai taxonomie experimental et numerique sur Festuca ovina L.
s.l. dans le sud-est de la France. Rev. Cytol. Biol. Veg. 31: 217-356.
BOULTER, D. 1974 - The evolution of plant proteins with special reference to higher
plant cytochromes c. Curr. Adv. Plant Sci. 8: 1-16.
BOULTER, D., B. G. HASLET, D. PEACOCK, J. A. M. RAMSHAW and M. D. SCOWAN 1976 -
The chemistry, function and evolution of plastocyanin. In D. H. Northcote (ed.),
Plant Biochemistry. Medical and Technical Publ. Co., Lancaster. In press.
BOULTER, D., J. A. M. RAMSHAW, E. W. THOMPSON, M. RICHARDSON, and R. H. BROWN
1972 - A phylogeny of higher plants based on the amino acid sequences of
cytochrome c and its biological implications. Proc. Roy. Soc. Lond. B 181: 441-455.
BRETSKY, S. M. 1971 - Evaluation of the efficacy of numerical taxonomic methods: an
example from the bivalve molluscs. Syst. Zool. 20: 204-222.
BURTT, B. L. 1964 - Angiosperm taxonomy in practice. In V. H. Heywood and J.
McNeill (eds.), Phenetic and Phylogenetic Classification, pp. 5-16. Syst. Ass. Pub.
6. 164 pp.
BURTT, B. L., I. C. HEDGE and P. F. STEVENS 1970 - A taxonomic critique of recent
numerical studies in Ericales and Salvia. Notes Roy. Bot. Gard. Edinburgh 30:
141-158.
CAIN, A. J. and G. A. HARRISON 1960 - Phyletic weighting. Proc. Zool. Soc. Lond.
135: 1-31.
CAMPBELL, I. 1975 - Numerical analysis and computerized identification of the yeast
genera Candida and Torulopsis. J. Gen. Microbiol. 90: 125-132.
CLIFFORD, H. T. and W. STEPHENSON 1975 - An Introduction to Numerical Classification.
Academic Press, New York.
CLIFFORD, H. T. and W. T. WILLIAMS 1973 - Classificatory dendrograms and their
interpretation. Aust. J. Bot. 21: 151-162.
COLLESS, D. H. 1967 - The phylogenetic fallacy. Syst. Zool. 16: 289-295.
COLLESS, D. H. 1969a - The phylogenetic fallacy revisited. Syst. Zool. 18: 115-126.
COLLESS, D. H. 1969b - The interpretation of Hennig's "Phylogenetic Systematics" - a
reply to Dr. Schlee. Syst. Zool. 18: 134-144.
CROVELLO, T. J. 1968a - Different concepts of relevance in a numerical taxonomic study.
Nature 218: 492.
CROVELLO, T. J. 1968b - The effect of alteration of technique at two stages in a numerical
taxonomic study. Univ. Kansas Sci. Bull. 47: 761-786.
CROVELLO, T. J. 1970 - Analysis of character variation in ecology and systematics.
Annu. Rev. Ecol. Syst. 1: 55-98.
CUFFEY, R. J. 1973 - An improved classification, based upon numerical-taxonomic
analyses, for the higher taxa of entoproct and ectoproct bryozoans. In Larwood,
G. P. (ed.) Living and Fossil Bryozoa, pp. 549-564. Academic Press, London.
CUNHA, R. A. DA 1973 - Taxonomia numerica de algunas Meliponinae (Hymenoptera
= Apidae). Ciencia Biologica, Portugal, 1: 25-42.
DAYHOFF, M. O. (ed.). 1972 - Atlas of Protein Sequence and Structure 1972, vol. 5.
National Biomedical Research Foundation, Washington, D.C.
DEBETTE, J., J. LOSFIELD, and R. BLONDEAU 1975 - Taxonomie numerique de bacteries
telluriques non fermentants a Gram negatif. Can. J. Microbiol. 21: 1322-1334.
EL-GAZZAR, A., L. WATSON, W. T. WILLIAMS, and G. N. LANCE 1968 - The taxonomy
of Salvia: a test of two radically different numerical methods. J. Linn. Soc. Lond.
Bot. 60: 237-250.
ESTABROOK, G. F. 1972 - Cladistic methodology: a discussion of the theoretical basis
for the induction of evolutionary history. Annu. Rev. Ecol. Syst. 3: 427-456.
FARRIS, J. S. 1971 - The hypothesis of nonspecificity and taxonomic congruence. Annu.
Rev. Ecol. Syst. 2: 277-302.
FITCH, W. M. and E. MARGOLIASH 1970 - The usefulness of amino acid and nucleotide
sequence in evolutionary studies. Evolut. Biol. 4: 67-109.
GHISELIN, M. T., E. T. DEGENS, D. W. SPENCER, and R. H. PARKER 1967 - A phylogenetic
survey of molluscan shell matrix proteins. Breviora, No. 262, 35 pp.
GOWER, J. C. 1971 - A general coefficient of similarity and some of its properties.
Biometrics 27: 857-871.
GOWER, J. C., and G. J. S. ROSS 1969 - Minimum spanning trees and single linkage
cluster analysis. Appl. Statist. 18: 54-64.
GRANT, V. E. 1971 - Plant Speciation. Columbia Univ. Press, New York.
HALL, A. V. 1967 - Studies in recently developed group-forming procedures in taxonomy
and ecology. J. S. Afr. Bot. 33: 185-196.
HALL, A. V. 1969 - Avoiding informational distortion in automatic grouping programs.
Syst. Zool. 18: 318-329.

HANSELL, R. I. C., and B. EWING 1973 - The detection and estimation of character
weighting in classifications. J. Theor. Biol. 39: 297-314.
HARBORNE, J. B. and C. A. WILLIAMS 1973 - A chemotaxonomic survey of flavonoids
and simple phenols in leaves of the Ericaceae. Bot. J. Linn. Soc. 66: 37-54.
IVIMEY-COOK, R. B. 1969 - Investigations into the phenetic relationships between species
of Ononis. Watsonia 7: 1-23.
JARDINE, N. 1969 - The observational and theoretical components of homology: a study
based on the morphology of the dermal skull-roofs of rhipidistian fishes. Biol. J.
Linn. Soc. 1: 327-361.
JARDINE, N., and R. SIBSON 1971 - Mathematical Taxonomy. John Wiley, London.
286 pp.
JOHNSON, L. A. S. 1968 - Rainbow's end: the quest for an optimal taxonomy. Proc.
Linn. Soc. New South Wales 93: 8-45; reprinted with additional comments in
Syst. Zool. 19: 203-239 (1970).
KENDRICK, W. B. and L. K. WERESUB 1966 - Attempting neo-Adansonian computer
taxonomy at the ordinal level in the basidiomycetes. Syst. Zool. 15: 307-329.
LANCE, G. N. and W. T. WILLIAMS 1966 - Computer programs for hierarchical polythetic
classification ("similarity analyses"). Computer J. 9: 60-64.
LANCE, G. N., and W. T. WILLIAMS 1967 - Mixed-data classificatory programs I.
Agglomerative systems. Aust. Computer J. 1: 15-20.
MCNEILL, J. 1974 - The handling of character variation in numerical taxonomy.
Taxon 23: 699-705.
MCNEILL, J. 1975 - A generic revision of Portulacaceae tribe Montieae using techniques
of numerical taxonomy. Can. J. Bot. 53: 789-809.
MCNEILL, J., P. F. PARKER, and V. H. HEYWOOD 1969 - A taxometric approach to the
classification of the spiny-fruited members (tribe Caucalideae) of the flowering-
plant family Umbelliferae. In A. J. Cole (ed.), Numerical Taxonomy. Proceedings
of the Colloquium in Numerical Taxonomy held in the University of St. Andrews,
September 1968, pp. 129-147. Academic Press, London. 324 pp.
MOSS, W. W. 1967 - Some new analytic and graphic approaches to numerical taxonomy,
with an example from Dermanyssidae (Acari). Syst. Zool. 16: 177-207.
MOSS, W. W. 1972 - Some levels of phenetics. Syst. Zool. 21: 236-239.
MOSS, W. W., and J. A. HENDRICKSON, JR. 1973 - Numerical Taxonomy. Annu. Rev.
Entomol. 18: 227-258.
MOSS, W. W., and W. A. WEBSTER 1969 - A numerical taxonomic study of a group
of selected strongylates (Nematoda). Syst. Zool. 18: 423-443.
OLDROYD, H. 1966 - The future of taxonomic entomology. Syst. Zool. 15: 253-260.
PASTEELS, J. M., and D. H. KISTNER 1971 - Revision of the termitophilous subfamily
Trichopseniinae (Coleoptera: Staphylinidae). II. The remainder of the genera with
a representational study of the gland systems and a discussion of their relationships.
Misc. Pub. Entomol. Soc. Amer. 7: 351-399.
PHIPPS, J. B. 1972 - Studies in the Arundinelleae (Gramineae). XI. Taximetrics of
changing classifications. Can. J. Bot. 50: 787-802.
RAHN, 0. 1974 - Plantago section virginica. A taxonomic revision of a group of
American plantains, using experimental, taximetric and classical methods. Dansk
Bot. Arch. 30 (2): 1-180.
RAVIN, A. W. 1963 - Experimental approaches to the study of bacterial phylogeny.
Amer. Natur. 97: 307-318.
ROHLF, F. J. 1970 - Adaptive hierarchical clustering schemes. Syst. Zool. 19: 58-82.
ROHLF, F. J. 1974 - Methods of comparing classifications. Annu. Rev. Ecol. Syst. 5:
101-113.
ROWELL, A. J. 1970 - The contribution of numerical taxonomy to the genus concept.
In E. L. Yochelson (ed.), Proceedings of the North American Paleontological Convention,
Chicago 1969, vol. 1, part C, pp. 264-293. Allen Press, Lawrence, Kansas,
2 vols.
SANGHVI, L. D. 1953 - Comparison of genetical and morphological methods for a
study of biological differences. Amer. J. Phys. Anthropol. 11: 385-404.
SKYRING, G. W. and C. QUADLING 1969 - Soil bacteria: principal component analysis of
descriptions of named cultures. Can. J. Microbiol. 15: 141-158.
SNEATH, P. H. A. 1964 - New approaches to bacterial taxonomy: use of computers.
Annu. Rev. Microbiol. 18: 335-346.
SNEATH, P. H. A. 1967 - Trend-surface analysis of transformation grids. J. Zool.,
Lond. 151: 65-122.
SNEATH, P. H. A. 1968 - Vigour and pattern in taxonomy. J. Gen. Microbiol. 54: 1-11.
SNEATH, P. H. A. 1969 - Evaluation of clustering methods. In A. J. Cole (ed.),
Numerical Taxonomy. Proceedings of the Colloquium in Numerical Taxonomy
held in the University of St. Andrews, September 1968, pp. 257-271. Academic
Press, London. 324 pp.
SNEATH, P. H. A. 1971 - Numerical taxonomy: criticisms and critiques. Biol. J. Linn.
Soc. 3: 147-157.
SNEATH, P. H. A. 1972 - Computer taxonomy. In J. R. Norris and D. W. Ribbons (eds.),
Methods in Microbiology, vol. 7A, pp. 29-98. Academic Press, London.
SNEATH, P. H. A. 1974a - Phylogeny of micro-organisms. Symp. Soc. Gen. Microbiol.
24: 1-39.
SNEATH, P. H. A. 1974b - Test reproducibility in relation to identification. Int. J. Syst.
Bacteriol. 24: 508-523.
SNEATH, P. H. A. 1975 - Cladistic representation of reticulate evolution. Syst. Zool.
24: 360-368.
SNEATH, P. H. A. and R. JOHNSON 1972 - The influence on numerical taxonomic
similarities of errors in microbiological tests. J. Gen. Microbiol. 72: 377-392.
SNEATH, P. H. A., M. J. SACKIN, and R. P. AMBLER 1975 - Detecting evolutionary
incompatibilities from protein sequences. Syst. Zool. 24: 311-332.
SNEATH, P. H. A., and R. R. SOKAL 1973 - Numerical Taxonomy: the Principles and
Practice of Numerical Classification. W. H. Freeman and Company, San Francisco.
573 pp.
SOKAL, R. R., and T. J. CROVELLO 1970 - The biological species concept: a critical
evaluation. Amer. Natur. 104: 127-153.
STEARN, W. T. 1964 - Problems of character selection and weighting: introduction. In
V. H. Heywood and J. McNeill (eds.), Phenetic and Phylogenetic Classification, pp.
83-86. Syst. Ass. Pub. 6. 164 pp.
STEVENS, P. F. 1971 - A classification of the Ericaceae: subfamilies and tribes. Bot. J.
Linn. Soc. 64: 1-53.
TADAUCHI, O. 1975 - Numerical phenetic relationships of the genus Andrena (Hymenoptera,
Andrenidae) of Japan, with a new introduction of component pattern diagrams.
Kontyu, Tokyo 43: 181-201.
TAYLOR, R. J. 1971 - Intraindividual phenolic variation in the genus Tiarella (Saxifragaceae);
its genetic regulation and application to systematics. Taxon 20: 467-472.
THROCKMORTON, L. H. 1965 - Similarity versus relationship in Drosophila. Syst. Zool.
14: 221-236.
WATSON, L., W. T. WILLIAMS, and G. N. LANCE 1967 - A mixed-data numerical approach
to angiosperm taxonomy: the classification of Ericales. Proc. Linn. Soc. Lond.
178: 25-35.
WEIMARCK, G. 1970 - Spontaneous and induced variation in some chemical leaf
constituents in Hierochloe (Gramineae). Bot. Notiser 123: 231-268.
WILLIAMS, W. T., H. T. CLIFFORD, and G. N. LANCE 1971 - Group-size dependence: a
rationale for choice between numerical classifications. Computer J. 14: 157-162.
WILLIAMS, W. T., J. M. LAMBERT, and G. N. LANCE 1966 - Multivariate methods in plant
ecology. V. Similarity analyses and information-analysis. J. Ecol. 54: 427-445.
WILLIS, J. C. 1922 - Age and Area. A Study in Geographical Distribution and Origin
of Species. Cambridge Univ. Press, Cambridge. 259 pp.
YOUNG, D. J., and L. WATSON 1970 - The classification of dicotyledons: a study of the
upper levels of the hierarchy. Aust. J. Bot. 18: 387-433.
