
Archaeometry 43, 1 (2001) 131–147.

Printed in Great Britain


Department of Mathematics, Statistics and Operational Research, Nottingham Trent University, Clifton Campus, Nottingham NG11 8NS, UK

Model-based methods for clustering artefacts, given their chemical composition, often assume sampling from a mixture of multivariate normal distributions and/or make explicit assumptions about the way in which a composition is formed. It is argued that, analysed within a modelling framework, several important and apparently competing methodologies are more similar than would initially appear. The opportunity is taken to note that models for populations are often not compatible with models for compositions, and that dilution correction, which can be accomplished in a variety of ways, can be interpreted as an attempt to resolve this problem.


This paper has arisen from a project that has, as one of its aims, the investigation of statistical model-based approaches to the analysis of artefact compositional data. The particular focus is on the use of model-based methods for grouping or clustering data. A model-based analysis is taken to be an approach that incorporates some, or all, of the following features: (1) A model is formulated for the composition of an artefact or case. (2) A model is formulated that explains why artefacts (cases) of the same kind (and possibly from the same source) differ in their composition. This may include assumptions about the nature of the population from which a sample of artefacts is drawn. (3) The data analysis is influenced by the modelling assumptions of (1) and/or (2).

Model-based approaches may be contrasted with exploratory methods, which include methods of cluster analysis (e.g., average linkage, complete linkage) commonly used in the archaeometric literature. Model-based methods typically involve the exploitation of statistical assumptions that are absent from exploratory methods. They may be more demanding of data and computational resources, but are also potentially more rewarding in ways to be discussed later.

The focus in this paper is on model-based approaches that have been used in the archaeometric literature for the analysis of artefact compositional data. A main contention is that several apparently competing, and superficially distinct, methodologies that have been proposed are much more similar than is apparent at first sight. For convenience of exposition, there is an emphasis on the analysis of ceramic compositional data, though the arguments advanced hold more generally. A separate paper will investigate model-based clustering approaches that have been proposed in the statistical literature, but which have yet to be exploited by archaeologists (Papageorgiou et al. 2000).
* Received 13 March 2000; accepted 5 July 2000. © University of Oxford, 2001.


M. J. Baxter

It is noted that models that have been proposed for ceramic paste compositions and models that have been used to analyse samples of ceramics are contradictory, in the sense that assumptions of the latter are incompatible with the implications of the former. What has sometimes been referred to as `dilution correction' can be viewed as an attempt to resolve this problem, and this is also studied from a modelling perspective. The next two sections establish notation and develop the models for ceramic pastes and samples of ceramics that underpin discussion in the main section of the paper. In the main section methodologies developed in laboratories at Brookhaven, Missouri, Bonn and Barcelona (Bieber et al. 1976; Glascock 1992; Beier and Mommsen 1994; Buxeda 1999) are examined, and compared within an explicit modelling framework. In all but the last case, the methodologies depend (at least in part) on assumptions about the normality of compositional data within groups, and use probability calculations based on Mahalanobis distance calculations to assess whether cases belong to a group (e.g., Sayre 1975; Sayre et al. 1992; Glascock 1992; Beier and Mommsen 1994; Leese and Main 1994). Such calculations, which exploit the modelling assumption of normality, are not possible in exploratory methodologies, and constitute one of the major uses of modelling assumptions in statistical analyses of archaeometric data.

A single artefact will be referred to as a case; a collection of $n$ cases, to be subjected to chemical and statistical analysis, will be called a sample. Models may be proposed for both case composition and sample distribution, as will be seen in the next section. The observed composition for the $i$th case will be denoted by the $p \times 1$ vector $x_i$, with $x_{ij}$ the value for case $i$ of the $j$th variable, $X_j$. Let $\bar{x}_j$ and $s_j^2$ be the estimated mean and variance of the $j$th variable, with $\bar{x}$ the $p \times 1$ vector of means.

The notation $x_{ij}$ will also be used for transformed data, as follows and as will be clear from the context: (1) $x_{ij} \leftarrow x_{ij}$, untransformed data; (2) $x_{ij} \leftarrow x_{ij} - \bar{x}_j$, centred data; (3) $x_{ij} \leftarrow (x_{ij} - \bar{x}_j)/s_j$, standardized data; (4) $x_{ij} \leftarrow \log x_{ij}$, logged data; (5) $x_{ij} \leftarrow \log(x_{ij}/x_{ip})$ for $j = 1, 2, \ldots, p - 1$, log-ratio transformed data. The data matrix $X$ with typical element $x_{ij}$ will be called the untransformed, centred, standardized, logged or log-ratio data matrix, according to which transform is used; and other possibilities, such as standardized logged data, exist.

The multivariate normal distribution (MVN) plays a fundamental role in most modelling approaches to the analysis of artefact compositional data. If $x_i$ is sampled from a $p$-variate multivariate normal distribution with mean vector $\mu$ and covariance matrix $\Sigma$, write

$$x_i \sim \mathrm{MVN}(\mu, \Sigma).$$

If all $n$ cases are sampled from the same MVN, then $\bar{x}$ is an unbiased estimate of $\mu$ and $S = X'X/(n - 1)$, with $X$ the centred data matrix, is an unbiased estimate of $\Sigma$.

In many applications, data are analysed with the expectation that there is more than one group or cluster in the data, reflecting the underlying population structure. In this situation assume that the data are sampled from $G$ populations, where $G$ may be unknown in advance of analysis. Let $n_g$ be the number of cases sampled from the $g$th population, so that $\sum_{g=1}^{G} n_g = n$. Let $\gamma = (\gamma_1, \ldots, \gamma_n)'$ be a labelling vector such that $\gamma_i = g$ if the case is a sample from the $g$th

Statistical modelling of artefact compositional data


population. Then $X_g$ is a data matrix whose rows consist of those $x_i'$ for which $\gamma_i = g$. That is, $X_g$ is an $n_g \times p$ (or $n_g \times (p - 1)$ for log-ratios) data matrix. It is typically assumed that

$$x_i \sim \mathrm{MVN}(\mu_g, \Sigma_g)$$

where $\mu_g$ and $\Sigma_g$ are the mean vector and covariance matrix for population $g$ and $x_i'$ is a row of $X_g$.
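The five transformations listed in the notation can be sketched as follows. This is a minimal illustration rather than code from the paper: the function name `transform`, the example matrix and the use of natural logarithms are assumptions made here, and the last column is taken as the divisor for the log-ratio.

```python
import numpy as np

def transform(X, kind):
    """Return a transformed copy of an n x p data matrix X (rows are cases)."""
    X = np.asarray(X, dtype=float)
    if kind == "untransformed":
        return X.copy()
    if kind == "centred":
        return X - X.mean(axis=0)
    if kind == "standardized":
        # ddof=1 gives the usual sample standard deviation s_j
        return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    if kind == "logged":
        return np.log(X)  # natural logs (an assumption; base 10 is also common)
    if kind == "log-ratio":
        # divide by the last (pth) variable; the result has p - 1 columns
        return np.log(X[:, :-1] / X[:, [-1]])
    raise ValueError(f"unknown transform: {kind}")

# Hypothetical 4 x 3 matrix of compositions (illustrative values only)
X = np.array([[10.0, 2.0, 1.0],
              [12.0, 4.0, 2.0],
              [8.0, 2.0, 4.0],
              [10.0, 4.0, 1.0]])

Z = transform(X, "standardized")
L = transform(X, "log-ratio")
Xc = transform(X, "centred")
S = Xc.T @ Xc / (X.shape[0] - 1)  # S = X'X/(n - 1) from the centred matrix
```

Computed this way, `S` agrees with the usual sample covariance matrix, which is why the centred data matrix is the one that enters the estimate of the covariance.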

Models for cases

To fix ideas, consider a model for a ceramic made from two components, a clay and a temper. The composition of the ceramic paste may be modelled as

$$y_i = p_{1i} z_{1i} + p_{2i} z_{2i} \tag{1}$$

where the $p \times 1$ vector $z_{ci}$ ($c = 1, 2$) is the composition of component $c$ and the $p_{ci}$ ($c = 1, 2$) are mixing proportions with $p_{1i} + p_{2i} = 1$. This is a weighted sum of the two components, and expresses, in mathematical language, a natural idea about how pastes are formed.

To explain why two pastes formed from components from the same clay and temper vary in composition, statistical assumptions must be invoked. One obvious reason (e.g., Neff et al. 1988, 1989) is that mixing proportions, $p_{ci}$, may vary. Another is that $z_{ci}$ is randomly distributed within a source. If this distribution is modelled as MVN we have a model for a case of the form

$$y_i \sim p_{1i}\,\mathrm{MVN}(\nu_1, \Theta_1) + p_{2i}\,\mathrm{MVN}(\nu_2, \Theta_2) \tag{2}$$

where $\nu_c$ and $\Theta_c$ are the mean vector and covariance matrix that characterize variation within the source for component $c$. Equation (2) generalizes directly to $C > 2$ components, as

$$y_i \sim \sum_{c=1}^{C} p_{ci}\,\mathrm{MVN}(\nu_c, \Theta_c) \tag{3}$$

and to other materials, where $\sum_{c=1}^{C} p_{ci} = 1$.

The paste composition $y_i$ may be modified by a variety of processes, so that the observed composition $x_i$ may differ from that of the paste. Formally, this can be written as $x_i(y_i(z_{1i}, z_{2i}, \ldots, z_{Ci}))$ to indicate the transformation from components to paste to measured composition. To simplify presentation it is assumed, unless otherwise stated, that there is no significant difference between the paste and observed composition, so $x_i$ and $y_i$ can be used interchangeably, and equation (3) is the model for $x_i$.

In equation (3), $y_i$ (or $x_i$) is a weighted sum of normal distributions, and is thus itself normal. The precise form of the distribution depends on the mixing proportions, $p_{ci}$, so that, in general, two cases $x_i$ and $x_j$ will be samples from different normal distributions.
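The dependence of a case's normal distribution on its mixing proportions can be made concrete with a short sketch. It rests on an assumption not stated in the text, namely that the component compositions are drawn independently of one another, in which case the weighted sum in equation (3) has mean $\sum_c p_{ci}\nu_c$ and covariance $\sum_c p_{ci}^2\Theta_c$. The function name and all numerical values are hypothetical.

```python
import numpy as np

def case_distribution(props, means, covs):
    """Mean and covariance of y_i = sum_c p_ci z_ci for independent MVN components."""
    props = np.asarray(props, dtype=float)
    if not np.isclose(props.sum(), 1.0):
        raise ValueError("mixing proportions must sum to 1")
    mean = sum(p * np.asarray(m, dtype=float) for p, m in zip(props, means))
    # Independence assumption: variances add with squared weights
    cov = sum(p ** 2 * np.asarray(Q, dtype=float) for p, Q in zip(props, covs))
    return mean, cov

# Hypothetical clay and temper sources measured on two elements
nu = [np.array([10.0, 2.0]), np.array([0.0, 20.0])]
theta = [np.eye(2), 4.0 * np.eye(2)]

mean_a, cov_a = case_distribution([0.8, 0.2], nu, theta)
mean_b, cov_b = case_distribution([0.6, 0.4], nu, theta)
```

The two cases share the same sources yet have different means and covariances, which is the sense in which cases with different proportions are samples from different normal distributions.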
A sample of $n$ cases (with the same component sources) can thus be regarded as a mixture of $n$ MVNs (assuming that the $p_{ci}$ differ). Since a mixture of $n$ MVNs is not, in general, MVN, an immediate consequence is that the sample is not itself MVN. The model for case compositions in equation (3) is superficially similar to, and inspired by, models used in the simulation studies of Neff et al. (1988, 1989) and Bishop and Neff (1989) to



investigate the effects of tempering in the statistical analysis of compositional data. However, it is not identical. They model the $z_{ci}$ as log-normal rather than normal, so that for a single variable a multiplicative model for the paste composition of the form

$$y_{ij} = \prod_{k=1}^{C} z_{kij}^{\,p_{ki}}$$

is obtained. This is less natural than either the additive model discussed above or the alternative multiplicative model introduced later in equation (5). If individual components are log-normal the paste composition will not, of course, be normal under the additive model.

Models for samples

Neff (1998, 116), in a discussion of chemistry-based ceramic provenance studies, argues that the fundamental challenge is to `align geographic co-ordinate units with the multi-dimensional space defined by measured elemental concentration units'. This is based on the provenance postulate of Weigand et al. (1977, 24) `that there exist differences in chemical composition between different natural sources that exceed, in some recognizable way, the differences observed within a given source'. Neff (1998, 116) equates `source' with a `point or zone of origin' from which the materials used to make a pot originate.

This can be viewed as a `model' of the kind of variation to be expected in samples of cases obtained from different sources. Ideally, cases will form separated clusters in a high-dimensional space determined by the number of variables measured. These clusters may then be viewed as samples from distinct populations that may be equated with `source' and located in geographical space. These ideas can be fleshed out in the form of statistical models, one possibility being a mixture model of the form

$$x_i \sim \sum_{g=1}^{G} p_g\,\mathrm{MVN}(\mu_g, \Sigma_g) \tag{4}$$

where the $p_g$ are the mixing proportions, and $\sum_{g=1}^{G} p_g = 1$. It is important to emphasize that what we now have is a model for the population from which the sample of cases $x_i$ is drawn, rather than a model for the composition of a single case, as previously. The model states that the observed sample is drawn from $G$ separate populations, which have distinct MVN distributions. The MVN assumption is not essential, but is that almost invariably used in model-based approaches.

Many provenance studies are not based on models such as equation (4), but rely on exploratory methods of cluster analysis to determine the value of $G$, and associate cases with the population from which they are sampled. This approach, or any sensible grouping method, is likely to work well if the component populations of the mixture are well separated, and does not require the MVN assumption. Problems can arise if components are not well separated and/or exhibit similar and high variable correlations within different populations.

The main point to make here is that the models for cases in equation (3) and for samples in equation (4) are incompatible. In particular, if the former model is valid, it cannot be assumed that a sample of cases from a population has an MVN distribution, and hence the assumption that populations in the latter model are MVN must be invalid. The former model is the more fundamental and it then follows that any modelling approach based on the latter model must be wrong, unless corrective action is taken. Some practitioners are clearly aware of this difficulty



and attempts to avoid or minimize the problem are discussed after first looking at how models have been used in practice.
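The incompatibility between the case model and a normal population model can be illustrated by simulation. The sketch below is a univariate toy example with hypothetical source parameters and a uniform distribution for the clay proportion, none of which come from the paper. Excess kurtosis is used only as a convenient summary: it is close to zero for a normal sample and clearly negative here, because the varying proportions spread the case means along a range.

```python
import numpy as np

def excess_kurtosis(y):
    """Sample excess kurtosis; approximately zero for normally distributed data."""
    d = y - y.mean()
    return np.mean(d ** 4) / np.mean(d ** 2) ** 2 - 3.0

rng = np.random.default_rng(0)
n = 20000

# Hypothetical single-element example: one clay and one temper source
p1 = rng.uniform(0.5, 0.9, size=n)   # clay proportion differs from case to case
z1 = rng.normal(10.0, 1.0, size=n)   # clay concentration, normal within source
z2 = rng.normal(2.0, 0.5, size=n)    # temper concentration, normal within source
y = p1 * z1 + (1.0 - p1) * z2        # paste composition as in equation (1)

# Reference sample that really is normal, with matching mean and spread
reference = rng.normal(y.mean(), y.std(), size=n)

k_mixture = excess_kurtosis(y)
k_normal = excess_kurtosis(reference)
```

Each simulated case is normal given its own proportion, but the pooled sample is not, so a normality assumption for the group would be violated even though every source is normal.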

Mahalanobis distance

Mahalanobis distance plays an important role in several approaches that make use of models that can be related to equation (4). Its properties are discussed at some length in Baxter (1999) and Baxter and Buck (2000), and only the most salient features are reviewed here. The squared Mahalanobis distance between a case and the centroid of a group is defined as

$$d_i^2 = (x_i - \bar{x})' S^{-1} (x_i - \bar{x}).$$

It can be viewed as a generalization of Euclidean distance that allows for the correlation structure of a group. The smaller $d_i^2$ is, the closer a case is to a group centroid. For sufficiently `small' $d_i^2$ a case may plausibly be regarded as a member of the group. The idea of `small' may be made precise by introducing the modelling assumption that the group is a sample from a $p$-variate MVN population. This allows $d_i^2$ to be transformed to a probability ($p$-value). If the $p$-value is too small (conventionally less than 0.05 or 0.01) it may be doubted that the case belongs to the group against which it is being tested.

Details of implementation of this idea vary, with some researchers basing the probability calculations on large sample approximations (e.g., Beier and Mommsen 1994) and others on small sample approximations (e.g., Glascock 1992). The more sophisticated uses of the idea take into account whether or not $x_i$ belongs to the group against which its membership is being tested (e.g., Leese and Main 1994; Slane et al. 1994).

In ceramic studies this usage of Mahalanobis distance can be traced to work at the Brookhaven National Laboratory (BNL) in the 1970s (Sayre 1975; Bieber et al. 1976; Harbottle 1976). It has subsequently been developed by researchers, some now at the Missouri University Research Laboratory (MURR) (Neff 1992; Glascock 1992) and will be referred to as the BNL/MURR approach. In certain approaches to the statistical analysis of lead isotope ratio data, Mahalanobis distance is also important (Sayre et al. 1992).
Lead isotope fields for an ore source are assumed to have a trivariate normal distribution. Samples are used to estimate the distribution of a field, and cases too distant from the centroid of the sample are excluded from this estimation process. Once a field has been delineated in this way, Mahalanobis distance and probability calculations, based on the lead isotope signatures of artefacts, may be used to assess whether an ore source is a possible provenance for an artefact.

A major practical problem in using Mahalanobis distance that has long been recognized is the sample size requirement. For a single group, as a minimum, $n > p$ is necessary in order to be able to estimate $\Sigma$. For stable estimation much larger values of $n$ are needed, with $n > 3p$ and preferably $n > 5p$ having been suggested (Harbottle 1976). The sample size implications, when there are several groups in the data and $p$ is large, as is typical with many modern analytical techniques such as Neutron Activation Analysis (NAA), are obvious. Even for problems where the dimensionality is small, as in lead isotope ratio analysis where $p = 3$, the requirements are not trivial. Pollard and Heron (1996, 328) detected an emerging consensus that $n = 20$ was an `agreeable minimum' sample size. This has been challenged in Baxter et al. (2000) who argue that 20 may be adequate if lead isotope fields are MVN distributed, but will usually be inadequate if one wishes to test this modelling assumption.
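A minimal sketch of the distance and probability calculation follows. It uses the large-sample chi-square reference distribution with $p$ degrees of freedom; the small-sample variants mentioned in the text are not reproduced. The helper names, the series used for the chi-square tail and the toy group are all assumptions made for illustration, and the series is adequate only for moderate distances.

```python
import math
import numpy as np

def chi2_sf(x, df):
    """Upper-tail chi-square probability from the series for the lower incomplete gamma."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 0
    while term > 1e-15 * total:
        k += 1
        term *= z / (a + k)
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def mahalanobis_sq(x, group):
    """Squared Mahalanobis distance of case x from the centroid of a group (rows are cases)."""
    group = np.asarray(group, dtype=float)
    n, p = group.shape
    if n <= p:
        raise ValueError("need n > p to estimate the covariance matrix")
    centroid = group.mean(axis=0)
    S = np.cov(group, rowvar=False)  # divisor n - 1
    diff = np.asarray(x, dtype=float) - centroid
    return float(diff @ np.linalg.solve(S, diff))

# Hypothetical group of five cases on two variables (far fewer than n > 5p suggests)
group = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0], [1.0, 1.0]])
d2 = mahalanobis_sq([3.0, 1.0], group)
p_value = chi2_sf(d2, df=group.shape[1])
```

The explicit `n <= p` check mirrors the minimum sample size requirement; passing it says nothing about whether the estimate of the covariance matrix is stable.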



Two themes emerge here that recur in the study of model-based methods. One is that sample sizes may preclude the use of model-based methods that one would like to use. The second is that, even if such use is possible, it will often not be feasible to test the assumptions of the model, so that the model's validity is essentially an act of faith.

The BNL/MURR approach

The BNL/MURR approach is not prescriptive, but rather may be viewed as comprising a set of tools to be deployed as appropriate (e.g., Slane et al. 1994; Lizee et al. 1995; Steponaitis et al. 1996; Hegmon et al. 1997; Triadan et al. 1997; Herrera et al. 1999). The discussion that follows concentrates on aspects relevant to model-based approaches to statistical analysis.

Typically data are logged (to base 10) before analysis. One reason for this is an assumption that the variables (often trace elements measured in ppm) are more likely to be MVN within groups on the log scale, and the MVN assumption is necessary for other procedures that are used. The merits, or otherwise, of transformation are discussed in Sayre (1975), Bieber et al. (1976), Pollard (1986), Beier and Mommsen (1994), Baxter (1995) and below. It is often assumed that measurement error is unimportant in relation to `natural' variation, so that the former can be ignored, and an explicit justification for this assumption is given in Bieber et al. (1976).

A recurring concern is the problems posed by high correlations among variables within populations (Sayre 1975; Harbottle 1976, 1991). If present, these give rise to populations, and samples from them, that have an elliptical shape in $p$-dimensional space. Exploratory methods of cluster analysis are the multivariate workhorse in the analysis of compositional data, and often the only method presented in publications (Baxter 1994). These often produce spherical clusters of roughly equal size and can be misleading if the true structure is ellipsoidal.
This can be demonstrated explicitly for Ward's method of cluster analysis, but may also be the case for other supposedly `model-free' methods (Gordon 1999, 65–8). Where this is a concern a solution is to model the correlation or covariance structure explicitly (Krzanowski and Marriott 1995, 89).

The BNL/MURR approach relegates cluster analysis to a minor role and applies group evaluation methodology to groups provisionally defined using cluster analysis or on the basis of archaeological criteria. Using the assumption that the groups being sought are samples from MVN distributions, probability calculations are undertaken using Mahalanobis distance to evaluate whether or not a case could belong to a group. This allows for the elliptical shape of clusters, provided that sample sizes permit the calculation. Where this is not immediately possible, recourse may be had to a subset of the principal components of the data (e.g., Slane et al. 1994; Herrera et al. 1999) or to subsets of variables that appear to discriminate between groups (Bieber et al. 1976, 67–8). Cases may be added to, or deleted from, a group and the process repeated until a stable clustering is found.

It is worth noting that, where initial groups are defined using multivariate methods, this often involves the use of unstandardized logged data. Subsequent group evaluation does, however, introduce standardization through the use of Mahalanobis distance. It may be remarked that, in assuming an MVN distribution, cases that do not conform with the assumption are likely to be rejected from groups, so that what finally remains will tend to be MVN. In this sense the MVN assumption might be regarded as a `self-fulfilling prophecy' that tends to impose MVN structure on the results obtained.

The Bonn approach

The approach developed at Bonn University (e.g., Mommsen et al. 1988; Beier and Mommsen



1994) challenges several aspects of the BNL/MURR methodology, including the need for using logged data, the lack of importance of measurement error, and the importance of high correlations. The argument to follow is that the methodologies are more similar than might appear at first sight.

The central ideas in Beier and Mommsen (1994) are readily explained. Starting from a single case, or a small number of similar cases, a group is `grown' by adding to it cases that are `close' to the starting set. This gives a new group, and cases close to this are further added to the group. This process proceeds iteratively until no cases are close to the group. A second group is then `grown' from a different starting point and so on until all cases are assigned to a group or regarded as outliers.

The fundamental modelling assumption is that groups are sampled from an MVN distribution. Closeness to a group is decided on the basis of distance and probability calculations, underpinned by the MVN assumption, using either weighted Euclidean or Mahalanobis distance. Groups are defined sequentially and iteratively, whereas the BNL/MURR method involves a simultaneous determination of groups that are subsequently refined in an iterative fashion. Arguably, the spirit underlying both methods is very similar.

The Bonn methodology allows for measurement uncertainty. In assessing whether a case $x_i$ could belong to a group, $S$ in the definition of Mahalanobis distance is replaced by $S_x + S$, where $S_x$ is a diagonal matrix of the measurement `uncertainties' (i.e., the variance of the analytical error). The $j$th diagonal element of $S_x + S$ may be written as $s_{xj}^2 + s_j^2$, where the former term is the analytical error variance, and the latter term the estimated variance of the variable within the group, which incorporates both analytical and natural error.
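The replacement of $S$ by $S_x + S$ can be sketched in a few lines. This is an illustration of the formula as described above, not a reproduction of the Bonn software; the function name, group covariance and error variances are hypothetical.

```python
import numpy as np

def mahalanobis_sq_with_uncertainty(x, centroid, S, meas_var):
    """Squared distance with S replaced by S_x + S, where S_x = diag(meas_var)."""
    S_total = np.asarray(S, dtype=float) + np.diag(np.asarray(meas_var, dtype=float))
    diff = np.asarray(x, dtype=float) - np.asarray(centroid, dtype=float)
    return float(diff @ np.linalg.solve(S_total, diff))

# Hypothetical group covariance, centroid and candidate case
S = np.array([[1.0, 0.5],
              [0.5, 1.0]])
centroid = np.zeros(2)
x = np.array([2.0, 0.0])

d2_plain = mahalanobis_sq_with_uncertainty(x, centroid, S, [0.0, 0.0])
d2_uncertain = mahalanobis_sq_with_uncertainty(x, centroid, S, [1.0, 1.0])
```

Adding analytical error variance inflates the diagonal of the matrix and so shrinks the distance, making a case with imprecise measurements harder to reject from the group.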
This generalization of other methodologies may have little effect in many practical situations if either natural variation dominates analytical variation, or if there are only a few variables for which analytical variation is the major component of variability within a group.

It is argued, in Beier and Mommsen (1994), that there is no need to transform data logarithmically, as this gives similar results to the use of untransformed data. Some of the detailed evidence for this is given in Beier and Mommsen (1991). Evaluating the general validity of this claim raises a number of issues that have wider implications.

One reason for a lack of distinction between results based on untransformed and logged data appears to be the relatively small spread of variable values within groups. This may well be a function of the particular type of fine ware studied, and does not necessarily generalize. It is unlikely that it is the high precision of results that allows the formation of groups with quite small spreads (Beier and Mommsen 1991), since precision of measurement and the natural spread of a variable in a group are logically unrelated. It is possible that the modelling methodology used gives rise to results of the kind reported. In particular, the groups examined for normality are defined by an algorithm that assumes normality, and which may impose that kind of structure on the groups found. This relation between modelling assumptions and structure found has already been noted for the BNL/MURR methodology.

Another possible reason for the lack of a distinction between results for untransformed and logged data is the role played by standardization. Beier and Mommsen (1994) argue against the use of principal component analysis, on the basis that this usually involves the use of standardized data, and that this changes as cases are added to a database. However, their approach is also dependent on standardization.
The main difference is that standardization takes place within groups rather than across the sample as a whole (and will change as these groups are iteratively modified). Baxter (1995) found little difference in results obtained in studies using standardized data and standardized logged data, except in the presence of clear outliers. Since



the Bonn methodology does depend on standardization, and outliers are excluded in group formation, it follows that results may not be strongly dependent on whether or not data are logged.

The gist of the argument so far is that the Bonn and BNL/MURR approaches are more similar than would seem at first sight. The refinement of allowing for analytical error in the grouping procedure will make little difference when natural variation dominates. When high-precision data are used, and analytical variation dominates natural variation, groups are likely to be tightly defined and found by any reasonable method. When analytical variation dominates and precision is low it is questionable whether the elements so affected should be used in clustering. It has also been argued that the use of untransformed as opposed to logged data is not a critical difference. The treatment of highly correlated data is considered in the subsection on modelling dilution effects.

Multiplicative models

An alternative multiplicative model for case compositions to that introduced earlier is that of Buxeda (1999). This has statistical implications in the sense that a particular log-ratio transformation of the data, not widely employed in archaeometric studies, is indicated for data analysis. Given this transformation, any of the available methods of statistical analysis might then be used. It is argued in what follows that Buxeda's (1999) approach will often be well approximated by the simpler use of logged data, bringing it into the mainstream of methodologies that have been proposed for artefact compositional analysis.

Buxeda's (1999) model views the composition as a perturbation of the original clay composition, $z_{1i}$ say. If there are $A$ separate perturbation processes, a multiplicative model, for a single element, of the form

$$\tilde{x}_{ij} = z_{1ij}\, u_{ij} \tag{5}$$

is obtained, where

$$u_{ij} = \prod_{a=1}^{A} u_{aij}$$

and $u_{aij} > 0$ represents the effect of the perturbation at the $a$th stage, on the composition that exists at that point. If all naturally occurring elements ($D$ say) in the periodic table are measured, the $x_{ij}$ must sum to 100%, so that the observed composition is of the form

$$x_{ij} = 100\,\tilde{x}_{ij} \Big/ \sum_{j=1}^{D} \tilde{x}_{ij}$$

and $\sum_{j=1}^{D} x_{ij} = 100$. This gives rise to (fully) compositional data in the sense of Aitchison (1986).

The compositional constraint presents problems for `standard' statistical analysis, documented in Aitchison (1982, 1986). One way of avoiding these problems, and that advocated in Buxeda (1999), is to base analyses on log-ratios of the form

$$y_{ij} = \log(x_{ij}/x_{iD}) \tag{6}$$

for $j = 1, 2, \ldots, D - 1$ and a suitable choice of the $D$th element for the divisor. Part of the reasoning behind this choice is that, whereas $x_{ij}$ is constrained to lie between 0 and 100%, $y_{ij}$ is



unrestricted and more amenable to analysis by standard statistical methods. In practice, of course, only $p < D$ elements are used. Provided that these are such that $\sum_{j=1}^{p} x_{ij} < 100$, similar considerations apply with $y_{ij} = \log(x_{ij}/x_{ip})$ and $j = 1, 2, \ldots, p - 1$. If the $p$ measured elements are such that $\sum_{j=1}^{p} x_{ij} \approx 100$, there is less of a case for using the log-ratio transformation (see the discussion of Aitchison 1982), but this debate is not pursued here.

Equation (5) can be viewed as a mathematical model of ceramic compositional data, the form of which suggests that a particular data transformation be applied before statistical analysis. That the transformation depends on the choice of divisor may be viewed as a potential weakness of the methodology, and is the focus of some attention in Buxeda (1999), following Aitchison (1986, 1990). The choice is intimately linked to the concept of the variation matrix defined (Aitchison 1990) as the $p \times p$ matrix with typical element

$$\tau_{ij} = \mathrm{var}\{\log(X_i/X_j)\} = v_i^2 + v_j^2 - 2 r_{ij} v_i v_j$$

where $\mathrm{var}\{\cdot\}$ is the variance; $v_i^2$ is the variance of the logarithm of variable $i$, $\mathrm{var}\{\log X_i\}$; and $r_{ij}$ is the correlation between $\log X_i$ and $\log X_j$. The total variation in the data is then defined as

$$vt = \sum_{i} \sum_{j} \tau_{ij} / 2p$$

and

$$\tau_{.s} = \sum_{i=1}^{p} \tau_{is}$$

is the total variance in the log-ratio covariance matrix when variable $s$ is used as a divisor. It can be shown that $\tau_{.s} > vt$, and the excess is interpretable as variability imposed by the choice of variable $s$ as the divisor in equation (6). Buxeda's (1999) strategy is to choose as a divisor that variable for which $vt/\tau_{.s}$ is a maximum. In other words, variable $s$ is chosen to impose the least variability. The choice of divisor requires the minimization of

$$\tau_{.s} = \sum_{i=1}^{p} v_i^2 + p v_s^2 - 2 v_s \sum_{i=1}^{p} r_{is} v_i.$$
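The divisor criterion can be sketched directly from the definitions of the variation matrix, $vt$ and $\tau_{.s}$ given above. The code is illustrative only and is not Buxeda's (1999) implementation; the function names and the small data matrix, in which the third variable is constant on the log scale, are hypothetical.

```python
import numpy as np

def variation_matrix(X):
    """tau_ij = var{log(X_i / X_j)} for an n x p matrix of positive values."""
    logX = np.log(np.asarray(X, dtype=float))
    p = logX.shape[1]
    tau = np.zeros((p, p))
    for i in range(p):
        for j in range(p):
            tau[i, j] = np.var(logX[:, i] - logX[:, j], ddof=1)
    return tau

def choose_divisor(X):
    """Index of the variable s maximizing vt / tau_.s, together with all the ratios."""
    tau = variation_matrix(X)
    p = tau.shape[0]
    vt = tau.sum() / (2 * p)      # total variation
    tau_dot = tau.sum(axis=0)     # tau_.s for each candidate divisor s
    ratio = vt / tau_dot
    return int(np.argmax(ratio)), ratio

# Hypothetical data: logs of the three variables are
# (0, 1, 0, 1), (0, 0, 2, 2) and a constant (1, 1, 1, 1)
X = np.exp(np.array([[0.0, 0.0, 1.0],
                     [1.0, 0.0, 1.0],
                     [0.0, 2.0, 1.0],
                     [1.0, 2.0, 1.0]]))

tau = variation_matrix(X)
s, ratio = choose_divisor(X)
```

In this toy example the variable that is constant on the log scale is selected, so the log-ratios differ from the logged values of the other variables only by a constant shift.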

In this expression for $\tau_{.s}$, the first term on the right-hand side is constant for all $s$, so it is the last two terms that must be minimized. If, on the log-scale, a variable is approximately constant, these last two terms will be approximately zero, and analysis is then effectively based on the log-transformed data of the remaining variables, a standard procedure (e.g., Bieber et al. 1976; Glascock 1992). This ideal is unlikely to arise in practice, but the thrust of Buxeda's (1999) strategy is to choose the variable for which this state is most closely approximated. It can thus be conjectured that the use of log-ratios proposed by Buxeda (1999) will often be closely approximated by the simpler procedure of using logarithms of variables.

The argument above is heuristic, but empirical evidence suggests that it is reasonable. In Buxeda's (1999) analysis of ceramic data from Abella results are largely determined by just six of 14 ratios used. These account for most of the variance in the data on the unstandardized log-ratio scale. It is straightforward to determine, empirically, that the same six variables dominate an analysis based on unstandardized log-transformed data, and lead to virtually identical results. Essentially this happens because the transformations (log-ratio or log) differentially weight the variables in the absence of subsequent standardization, leading to implicit variable selection of the same variables. In the



context of analyses of glass compositional data that sum to 100%, Baxter (1993) noted a tendency for a small number of minor oxides with high variances on the log-ratio scale to dominate analysis. Re-analysis of several glass data sets of the kind referred to has confirmed that virtually identical results are obtained if unstandardized log-transformed data are used. Finally, some previous debate on the relative merits of using log-transformed and log-ratio data has demonstrated that they produce similar results (Church 1995; Hoard et al. 1995). The analysis given above may help explain why.

Likelihood and Bayesian clustering

Other model-based approaches to clustering that have received limited use in archaeometry are only noted briefly. These include the Bayesian methodology of Buck and Litton (1996) and Buck et al. (1996), which assumes that within a particular provenance a sample of cases (possibly after transformation) is drawn from an MVN distribution. The total sample is assumed to be drawn from a mixture of $G$ such distributions, where $G$ is unknown. The procedure is illustrated in Buck and Litton (1996) for a $150 \times 15$ data set, in which there are three fairly clear groups.

Similar assumptions underpin classification and mixture maximum likelihood models that have also had little archaeometric application. They are investigated in Papageorgiou et al. (2000). One potential attraction of both methodologies is that tests of the numbers of groups in the sample are possible. A second potential attraction is the ability to model elliptical groups of the kind to be expected with correlated data. Krzanowski and Marriott (1995) give a concise account of the mathematics of the methodologies.

Modelling `dilution' effects

The term `dilution' has been used in various ways in the literature and is introduced here through an idealized example.
Suppose that for a two-component paste, modelled as in equation (1), repeated below in slightly modified form,

$$y_i = p_{1i} z_{1i} + (1 - p_{1i}) z_{2i},$$

the clay, $z_{1i}$, involves $p - 1$ non-zero elements and is identical for each case in a sample of $n$. Suppose, also, that the temper consists of a single element, different from those in the clay. The composition for a case is then

$$(p_{1i} z_{1i1},\; p_{1i} z_{1i2},\; \ldots,\; p_{1i} z_{1i,p-1},\; 100(1 - p_{1i}))$$

since the $p$th variable comprises 100% of the temper. It is clear in this formulation that the composition of cases from the same clay source differs only because of differences in $p_{1i}$ or, equivalently, the proportion of temper in the paste. This effect, in which the variable addition of temper to a paste can obscure the similarity of the clay compositions, has sometimes been referred to as a dilution effect.

What we have here is a simple model of dilution. If interest centres on identifying cases for which the clay source is the same, or similar, the model can be used to remove or understand the effect of tempering, in order better to identify cases with similar sources. Bishop and Neff's (1989, 83) emphasis on the importance of modelling encompasses this kind of modelling, as distinct from the more purely statistical models discussed elsewhere in this paper.

In the present simple example, a dilution correction can be accomplished in several equivalent ways. One is to identify and remove the tempering element from the composition and rescale remaining elements to sum to 100%. A second possibility is to remove the tempering element

Statistical modelling of artefact compositional data and work with ratios of the form yi j =yi k p1i z1i j =p1i z1i k z1i j =z1i k


for j k and some choice of k, in which the effect of tempering is `cancelled out'. A third possibility is to note that, for distinct cases i and j, yi k =yj k p1i z1i k =p1j z1j k p1i =p1j ai j where ai j is constant for all k, and estimate ai j using any k. Once this is done, the values of any case may be adjusted to match, as closely as possible, that of any other case or, as is more commonly done, a group mean. These ideas, while simple and based on an idealized model, form the basis of much that has been done in practice to deal with dilution effects. For i 1; . . . ; n and any pair of variables, j; k, not involved in the temper, a bivariate plot of yi j against yi k will show a scatter of points lying on a straight line, or a vector passing through the origin. This can be seen by noting that the plot is of p1i z1i j against p1i z1i k and, by assumption, z1i k bz1i j for some constant b and for all i. Another way of stating this is that the effect of dilution, of the kind being modelled, will be to induce high positive correlations among the variables. In the present case a dilution correctionbased on the centroid of the point scatter, for examplewill be to `shrink' all observations to that point and remove all correlation from the data. It is, in fact, possible to view high positive correlations among variables as potentially indicative of dilution, where dilution is now interpreted in a much more general sense that an effect due to tempering. This is the view taken in Beier and Mommsen (1994, 2956) who note a variety of technical effects that can give rise to `dilution', which they dene to `include both shifts due to different additional components in the clay and due to technical effects'. From this perspective dilution can give rise to elliptical clouds of points, in p-dimensional hyperspace, in which the major axis of the ellipse passes through the origin. 
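The idealized tempering model can be illustrated with a short simulation. The sketch below is not taken from the paper: the clay composition, the clay proportions and all function names are hypothetical, chosen only to show that the three corrections described above recover the same underlying clay when the model holds exactly.

```python
import random

def simulate_paste(clay, clay_fractions):
    """Idealized two-component paste, in percent.

    clay           -- the p-1 clay element percentages (summing to 100)
    clay_fractions -- p_1i, the proportion of clay in each case
    The temper is a single extra element, appended as the last variable.
    """
    return [[p * z for z in clay] + [100.0 * (1.0 - p)] for p in clay_fractions]

def rescale_without_temper(case):
    """Correction 1: drop the temper element and rescale to sum to 100%."""
    clay_part = case[:-1]
    total = sum(clay_part)
    return [100.0 * v / total for v in clay_part]

def ratios(case, k=0):
    """Correction 2: ratios y_ij / y_ik over the clay elements; p_1i cancels."""
    return [v / case[k] for v in case[:-1]]

def scale_factor(case_i, case_j, k=0):
    """Correction 3: a_ij = y_ik / y_jk for a clay element k."""
    return case_i[k] / case_j[k]

random.seed(0)
clay = [60.0, 25.0, 10.0, 5.0]                             # hypothetical clay
fractions = [random.uniform(0.6, 0.95) for _ in range(5)]  # hypothetical p_1i
cases = simulate_paste(clay, fractions)

for case in cases:
    print([round(v, 2) for v in case], "->",
          [round(v, 2) for v in rescale_without_temper(case)])

# Adjust the clay elements of case 0 so that they match those of case 1.
a_01 = scale_factor(cases[0], cases[1])
matched = [v / a_01 for v in cases[0][:-1]]
```

Under these assumptions every rescaled case equals the clay composition, the ratios are the same for all cases, and `a_01` does not depend on the choice of `k`. With real data none of these identities holds exactly, which is why the factor has to be estimated.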
It may be remarked that this definition of dilution encompasses data that are naturally and highly correlated, a point that is considered further below.

In practice, of course, matters are more complicated. Clays from the same source will vary; tempers will consist of more than one element; and the elemental composition of clays and tempers will overlap. Nevertheless, several researchers have considered the tempering model used above to be sufficiently close to what may sometimes occur in practice to devote time to developing methods to correct for it. More realistically, and expressed somewhat informally, `dilution' may be occurring if, for two cases, $i$ and $j$,

$$y_{ik} \approx a_{ij} y_{jk}$$

for some constant $a_{ij}$ and for a majority, $p'$, of the $p$ variables. In our idealized example $a_{ij}$ could be determined from the ratio $y_{ik}/y_{jk}$ for any variable, $k$, not involved in the temper. In practice, different $k$ will give rise to different values, so that $a_{ij}$ must be estimated in some way.

In the best relative fit method developed at BNL (Harbottle 1976), if it is thought that a case $y_i$ is related to a group with mean $\bar{y}$ and a dilution model of the form $y_{ij} = a_i \bar{y}_j$ is postulated, $a_i$ is estimated as

$$\hat{a}_i = \left[ \prod_{j=1}^{p'} y_{ij}/\bar{y}_j \right]^{1/p'},$$

the geometric mean of the correction factor determined separately for each element considered to be affected by dilution. From this, adjusted values of the form $y_{ij}/\hat{a}_i$ can be calculated. An arithmetic mean might also be used (e.g., Mommsen et al. 1988, 50). The most general of the procedures proposed in Beier and Mommsen (1994) is numerically more complex. For matching a case to a mean a modified Mahalanobis `distance' of the form
$$d_i^2 = (v_i x_i - \bar{x})' S_{v_i}^{-1} (v_i x_i - \bar{x})$$

is used, where $v_i$ is a parameter, used to model the dilution effect, that is estimated to minimize $d_i^2$, and

$$S_{v_i} = v_i^2 S_{x_i} + S.$$

In general, $v_i$ must be determined numerically, although simplification is possible if the measurement uncertainty is ignored.

Buxeda's (1999) methodology, being based on log-ratios, will deal with dilution effects of the kind under consideration. Similarly, ratios in the form

$$y_{ij} = \log\{x_{ij}/g(x_i)\},$$

where $g(x_i)$ is the geometric mean of the elements of $x_i$, have been used explicitly to deal with dilution effects arising from tempering in Leese et al. (1989) and, less transparently, to deal with dilution arising for technical reasons, in Pike and Fulford (1983).

To illustrate some of the foregoing ideas, a data set published in Tubb et al. (1980) on the chemical composition of Romano-British pottery, measured by atomic absorption spectrometry, will be used. In its original form this is a 48 × 9 data set. The pottery comes from five kiln sites in three regions and previous multivariate analyses suggest the three regions are chemically distinct. This is shown in the upper plot of Figure 1 based on a principal component analysis of scaled data. Four oxides (Fe2O3, MgO, CaO and K2O), identified in Tubb et al. as the only ones necessary for discrimination, have been used, and one outlier with an unusually low value of K2O has been omitted (C14 in the original publication). The separation of the three regional groups is evident. Groups 1 and 2 are dispersed in comparison to group 3 and, to the centre right of the plot, there is a single specimen of group 1 that is outlying with respect to the rest of the group. It has previously been noted that this is a multivariate outlier, possibly attributable to dilution effects (Baxter 1999). This can be examined and corrected for in a variety of ways. For example, if the model $y_{ij} = a_i \bar{y}_j$ is postulated this gives rise to a model of the form

$$\log y_{ij} = \log a_i + \log \bar{y}_j.$$

In Figure 2 $\log y_{ij}$ is plotted against $\log \bar{y}_j$ (using base 10 logarithms) along with the associated regression line. The solid line is that which would be obtained in the absence of dilution (i.e., $\log a_i = 0$). It is approximately parallel to the regression line and the difference between the two lines, of about $-0.14$, gives an informal estimate of $\log a_i$. This suggests that $a_i$ is about 0.73 or, in other words, that values for the specimen be multiplied by about 1.37 (= 1/0.73) to `correct' for dilution. Using exact calculations, the best relative fit method leads to $a_i = 0.76$ and a multiplying factor of 1.32. Use of the Beier–Mommsen approach, ignoring measurement error, gives a multiplying factor of 1.34. Even more simply, averaging $\bar{y}_j/y_{ij}$ leads to a multiplying factor of 1.33. In this instance, the different methods of correcting for dilution lead to very similar results.

Using this last approach to obtaining a correction factor, and adjusting all cases in group 1 to the mean of that group, and similarly for group 2, leads to the principal components plot in the lower part of Figure 1. It can be seen that, with the exception of four cases in group 1, the groups are more concentrated and spherical than in the original analysis.

Figure 1 Principal component plots of scaled data using a subset of variables and cases from Tubb et al. (1980) (see text for details). The upper figure uses the original data, and the lower after `correcting' for dilution in two of three regional groups.

Figure 2 Using four variables, the logged (to base 10) values for a single case are plotted against the logged means for each of the variables and a regression (dotted) line is shown. The good linear fit is approximately parallel to the solid line that would be obtained if the case had values equal to the means and is indicative of a dilution effect. The vertical distance between the lines provides an estimate of $\log a_i$, where $a_i$ defines the proportionate relation between the case and `variable' means in the presence of dilution.

It is interesting to view these attempts to deal with dilution in the context of the statistical models for compositional data that have been used in this paper. It has been argued that the MVN assumption used in all the modelling applications discussed is at odds with the more fundamental model of case composition presented in equation (3). One way to reconcile the two models is to transform case compositions in such a way that the MVN assumption for a sample from a component population is more likely to be true. Dilution correction procedures can be viewed as an attempt to do precisely this, and the variety of approaches noted above all stem from the same simple model of dilution.

When sample sizes preclude the use of Mahalanobis distance recourse must be had to Euclidean distance, and this is less than ideal when dealing with highly elliptical clusters. Dilution corrections will, if successful, have the effect of reducing correlations within groups, possibly quite considerably (Beier and Mommsen 1994), so that Euclidean methods are more satisfactory. Beier and Mommsen (1994) interpret their results as showing that the prevalence of highly correlated data, and the problems it causes, has been exaggerated, but their methodology does not distinguish between naturally and artificially correlated data. The foregoing argument suggests that their methodology can be interpreted as an approach to data transformation that will generate approximately spherical and normally distributed groups if successful. Thus, from a statistical standpoint, dilution correction can be viewed as a methodology for making model assumptions more valid and easing the computational burden.
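The correction factors compared in this section can be sketched in a few lines. The values below are hypothetical and are not the Tubb et al. (1980) data; the function names are likewise illustrative. The least-squares factor corresponds to minimizing a Mahalanobis-type distance only under two simplifying assumptions made here, namely that measurement error is ignored and that the group covariance matrix is diagonal.

```python
import math

def geometric_mean_factor(case, group_mean):
    """Estimate a_i as the geometric mean of y_ij / ybar_j."""
    logs = [math.log(y / m) for y, m in zip(case, group_mean)]
    return math.exp(sum(logs) / len(logs))

def mean_ratio_multiplier(case, group_mean):
    """Multiplying factor as the arithmetic mean of ybar_j / y_ij."""
    ratios = [m / y for y, m in zip(case, group_mean)]
    return sum(ratios) / len(ratios)

def least_squares_multiplier(case, group_mean, group_sd):
    """Factor v minimizing sum_j ((v * y_ij - ybar_j) / s_j)^2.

    Assumes no measurement error and a diagonal group covariance
    (an assumption of this sketch, made to avoid a matrix inverse).
    """
    num = sum(y * m / s ** 2 for y, m, s in zip(case, group_mean, group_sd))
    den = sum(y * y / s ** 2 for y, s in zip(case, group_sd))
    return num / den

def centred_log_ratio(x):
    """y_j = log10(x_j / g(x)), with g(x) the geometric mean of x."""
    log_g = sum(math.log10(v) for v in x) / len(x)
    return [math.log10(v) - log_g for v in x]

# Hypothetical values for four oxides; NOT the Tubb et al. (1980) data.
group_mean = [6.0, 1.5, 0.8, 3.0]
group_sd = [0.5, 0.2, 0.1, 0.3]
case = [4.4, 1.15, 0.62, 2.25]

a_hat = geometric_mean_factor(case, group_mean)
print("a_i (geometric mean):", round(a_hat, 3))
print("multiplier 1/a_i:", round(1.0 / a_hat, 3))
print("multiplier (mean ratio):", round(mean_ratio_multiplier(case, group_mean), 3))
print("multiplier (least squares):",
      round(least_squares_multiplier(case, group_mean, group_sd), 3))

# Values of the case after correcting for the estimated dilution.
adjusted = [y / a_hat for y in case]
```

When a case is an exact multiple of the group mean all three estimators agree; otherwise they differ slightly, as in the example discussed in the text. The centred log-ratio is included because it is unchanged when every element of a case is multiplied by the same constant, which is why log-ratio methods absorb dilution of this kind without an explicit correction factor.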




This paper has examined a number of competing approaches to the statistical analysis of artefact compositional data within a model-based framework. It has been argued that, despite their apparent differences, the methods examined have strong similarities and might often be expected to produce similar results in practice. Using a modelling framework makes quite explicit the fact that models that have been entertained for artefact compositions are at variance with models that are used for grouping cases. Whether this is a serious problem will depend on the separation of groups and departure from multivariate normality of a sample from a population. Methods that assume normality will tend to impose normal structure on the groups found, regardless of the `true' situation, and may mislead about their effectiveness. Different approaches to dilution correction can be viewed as attempts to transform compositions to normality that satisfy the assumptions of grouping procedures, although this is not the primary reason for the development of such approaches. In the absence of dilution and presence of non-normality simpler methods, such as the use of log transformation, exist. Finally, the creators of the methods discussed here might not necessarily view what they do as `model-based', although adopting such a view helps in understanding how methods compare. More thorough-going model-based Bayesian and likelihood methods of grouping data, developed in the statistical literature and little used in archaeometry, have been noted. They have the potential attraction that clusters of highly correlated variables can be dealt with in a natural fashion, and tests of the number of clusters in the data are possible. Some of these methods are investigated in more detail in Papageorgiou et al. (2000).
ACKNOWLEDGEMENTS

I am particularly grateful to Thomas Beier, Caitlin Buck, Jaume Buxeda i Garrigós, Hans Mommsen and Hector Neff for discussing their approaches to data analysis with me. There is no implication that they necessarily agree with my interpretations, and any misunderstandings and errors are entirely my responsibility. Jaume Buxeda i Garrigós is thanked, additionally, for providing and allowing use of his data from Abella. My colleagues Christian Beardah and Ioulia Papageorgiou are thanked for contributing to my understanding of the practicalities of implementing some of the methods discussed. This work forms part of the GEOPRO Research Network funded by the DGXII of the European Commission, under the TMR Network Programme (Contract Number ERBFMRX-CT98-0165).

REFERENCES

Aitchison, J., 1982, The statistical analysis of compositional data (with discussion), Journal of the Royal Statistical Society, B44, 139–77.
Aitchison, J., 1986, The statistical analysis of compositional data, Chapman and Hall, London.
Aitchison, J., 1990, Relative variation diagrams for describing patterns of compositional variability, Mathematical Geology, 22, 487–511.
Baxter, M. J., 1993, Comment on D. Tangri and R. V. S. Wright, `Multivariate analysis of compositional data ...', Archaeometry, 35, 112–15.
Baxter, M. J., 1994, Exploratory multivariate analysis in archaeology, Edinburgh University Press, Edinburgh.
Baxter, M. J., 1995, Standardization and transformation in principal component analysis, with applications to archaeometry, Applied Statistics, 44, 513–27.
Baxter, M. J., 1999, Detecting multivariate outliers in artefact compositional data, Archaeometry, 41, 321–38.
Baxter, M. J., and Buck, C. E., 2000, Data handling and statistical analysis, in Modern analytical methods in art and archaeology (eds. E. Ciliberto and G. Spoto), 681–746, John Wiley, New York.
Baxter, M. J., Beardah, C. C., and Westwood, S., 2000, Sample size and related issues in the analysis of lead isotope data, Journal of Archaeological Science, 27, 973–80.
Beier, T., and Mommsen, H., 1991, On the distribution function of elements within groups of pottery and some consequences for multivariate analysis, unpublished conference paper.
Beier, T., and Mommsen, H., 1994, Modified Mahalanobis filters for grouping pottery by chemical composition, Archaeometry, 36, 287–306.
Bieber, A. M., Brooks, D. W., Harbottle, G., and Sayre, E. V., 1976, Application of multivariate techniques to analytical data on Aegean ceramics, Archaeometry, 18, 59–74.
Bishop, R. L., and Neff, H., 1989, Compositional data analysis in archaeology, in Archaeological chemistry IV (ed. R. O. Allen), American Chemical Society Advances in Chemistry Series 220, 57–86, Washington, DC.
Buck, C. E., and Litton, C. D., 1996, Mixtures, Bayes and archaeology, in Bayesian statistics 5 (eds. J. M. Bernardo, J. O. Berger, A. P. Dawid and A. F. M. Smith), 499–506, Clarendon Press, Oxford.
Buck, C. E., Cavanagh, W. G., and Litton, C. D., 1996, Bayesian approach to interpreting archaeological data, John Wiley, New York.
Buxeda i Garrigós, J., 1999, Alteration and contamination of archaeological ceramics: the perturbation problem, Journal of Archaeological Science, 26, 295–313.
Church, T., 1995, Comment on `Neutron-activation analysis of stone from the Chadron formation and a Clovis site on the Great Plains' by Hoard et al. (1992), Journal of Archaeological Science, 22, 1–5.
Glascock, M. D., 1992, Characterization of archeological ceramics at MURR by neutron activation analysis and multivariate statistics, in Chemical characterization of ceramic pastes in archaeology (ed. H. Neff), 11–26, Prehistory Press, Madison, Wisconsin.
Gordon, A. D., 1999, Classification, 2nd edn, Chapman and Hall/CRC, London.
Harbottle, G., 1976, Activation analysis in archaeology, in Radiochemistry 3 (ed. G. W. A. Newton), 33–72, Chemical Society, London.
Harbottle, G., 1991, The efficiencies and error-rates in Euclidean Mahalanobis searches in hypergeometries of archaeological ceramic compositions, in Archaeometry 90 (eds. E. Pernicka and G. A. Wagner), 413–24, Birkhäuser Verlag, Basel.
Hegmon, M., Allison, J. R., Neff, H., and Glascock, M. D., 1997, Production of San Juan red ware in the Northern Southwest: insights into regional interaction in early Puebloan prehistory, American Antiquity, 62, 449–63.
Herrera, R. S., Neff, H., Glascock, M. D., and Elam, J. M., 1999, Ceramic patterns, social interaction, and the Olmec: neutron activation analysis of Early formative Pottery in the Oaxaca Highlands of Mexico, Journal of Archaeological Science, 26, 967–87.
Hoard, R. J., Holen, S. R., Glascock, M. D., and Neff, H., 1995, Additional comments on neutron-activation analysis of stone from the Great Plains – reply, Journal of Archaeological Science, 22, 7–10.
Krzanowski, W. J., and Marriott, F. H. C., 1995, Multivariate analysis: part 2, Edward Arnold, London.
Leese, M. N., and Main, P. L., 1994, The efficient computation of unbiased Mahalanobis distances and their interpretation in archaeology, Archaeometry, 36, 307–16.
Leese, M. N., Hughes, M. J., and Stopford, J., 1989, The chemical composition of tiles from Bordesley, in Computer applications and quantitative methods in archaeology 1989 (eds. S. Rahtz and J. Richards), 241–9, BAR International Series 548, British Archaeological Reports, Oxford.
Lizee, J. M., Neff, H., and Glascock, M. D., 1995, Clay acquisition and vessel distribution patterns – neutron activation analysis of Late Windsor and Shantok tradition ceramics from southern New England, American Antiquity, 60, 515–30.
Mommsen, H., Kreuser, A., and Weber, J., 1988, A method for grouping pottery by chemical composition, Archaeometry, 30, 47–57.
Neff, H. (ed.), 1992, Chemical characterization of ceramic pastes in archaeology, Prehistory Press, Madison, Wisconsin.
Neff, H., 1998, Units in chemistry-based provenance investigations of ceramics, in Measuring time, space and material: unit issues in archaeology (eds. A. F. Ramenofsky and A. Steffen), 115–27, University of Utah Press, Provo, UT.
Neff, H., Bishop, R. L., and Sayre, E. V., 1988, A simulation approach to the problem of tempering in compositional studies of archaeological ceramics, Journal of Archaeological Science, 15, 159–72.
Neff, H., Bishop, R. L., and Sayre, E. V., 1989, More observations on the problem of tempering in compositional studies of archaeological ceramics, Journal of Archaeological Science, 16, 57–69.
Papageorgiou, I., Baxter, M. J., and Cau, M. A., 2000, Model-based cluster analysis of artefact compositional data (submitted for publication).
Pike, H. H. M., and Fulford, M. G., 1983, Neutron activation analysis of black-glazed pottery from Carthage, Archaeometry, 25, 77–86.
Pollard, A. M., 1986, Multivariate methods of data analysis, in Greek and Cypriot pottery: a review of scientific studies (ed. R. E. Jones), 56–83, Occasional Paper 1, British School at Athens Fitch Laboratory, Athens.
Pollard, A. M., and Heron, C., 1996, Archaeological chemistry, Royal Society of Chemistry, Cambridge.
Sayre, E. V., 1975, Brookhaven procedures for statistical analyses of multivariate archaeometric data, unpublished manuscript.
Sayre, E. V., Yener, K. A., Joel, E. C., and Barnes, I. L., 1992, Statistical evaluation of the presently accumulated lead isotope data from Anatolia and surrounding regions, Archaeometry, 34, 73–105.
Slane, K. W., Elam, J. M., Glascock, M. D., and Neff, H., 1994, Compositional analysis of eastern sigillata A and related wares from Tel-Anafa (Israel), Journal of Archaeological Science, 21, 51–64.
Steponaitis, V. P., Blackman, M. J., and Neff, H., 1996, Large scale patterns in the chemical composition of Mississippian pottery, American Antiquity, 61, 555–72.
Triadan, D., Neff, H., and Glascock, M. D., 1997, An evaluation of the archaeological relevance of weak-acid extraction ICP: White Mountain redware as a case study, Journal of Archaeological Science, 24, 997–1002.
Tubb, A., Parker, A. J., and Nickless, G., 1980, The analysis of Romano-British pottery by atomic absorption spectrophotometry, Archaeometry, 22, 153–71.
Weigand, P. C., Harbottle, G., and Sayre, E. V., 1977, Turquoise sources and source analysis: Mesoamerica and the southwestern U.S.A., in Exchange systems in prehistory (eds. T. K. Earle and J. E. Ericson), 15–34, Academic Press, New York.