The use of statistical techniques in archaeology is full of pitfalls, as many archaeologists know to their cost. Occasionally, papers are published which attempt to ‘set the record straight’ and bring back straying archaeologists to the straight and narrow path (e.g. Thomas 1978). It is especially important that such papers should themselves be free of serious statistical error, as they are likely to be taken as a model by archaeologists seeking some certainty in a sea of statistical confusion.

It is particularly unfortunate, therefore, that a recent paper in this journal (van Arsdell 1996), while attempting to improve the methodology of an earlier contribution (Creighton 1994), has only succeeded in digging an even deeper pit to entrap unwary archaeologists. It is necessary to point out the fatal flaws in the methodology before it has a chance to become an accepted technique.

The issue concerns the statistical analysis of a simple dataset: counts of coins of 9 types found in 12 hoards. The traditional view is that the hoards are very similar to each other, and all date to around the Boudican rebellion. Creighton (1994) suggested a chronological ordering of the hoards, based on a simple seriation. In turn, van Arsdell (1996) criticised Creighton on the grounds that he had not demonstrated that there were statistically significant differences between the hoards, and went on to apparently show that there were no such differences. The first point is perfectly valid: it is all too easy to spot patterns in ‘random’ data, and an objective check that any possible patterns are ‘real’ is an essential prerequisite to their interpretation. The problem comes with the method used to demonstrate the apparent lack of statistical significance.

The first step of this method was to convert all the counts to permillia (i.e. coins per thousand, in contrast to the more common percentages, or coins per hundred), following the example of Reece (1981). This is a perfectly respectable technique of data presentation, and lends itself well to the
production of comparative graphs. The next step was to compare each hoard against the aggregate of all the others, using the permillia of each coin type as the basis for comparison. For each coin type in turn, a range of two standard deviations was plotted about the mean permillia figure for ‘other hoards’, and the corresponding figure for the chosen hoard was plotted to see if it fell within the range; if it did, this was taken as evidence that this difference was not statistically significant.

This is where the method breaks down, because two key assumptions, on which this ‘2 sigma’ rule is based, do not hold in this particular dataset. First, the data are not normally distributed, but in fact have a skewed distribution (this can be seen from the way in which the two-standard-deviation bars frequently encompass impossible negative values). Second, and more serious, the individual permillia values from which each standard deviation is calculated do not have the same statistical distribution. Even under the implicit null hypothesis that all the hoards have the same pattern (and hence all hoards have the same mean permillia for any chosen coin type), they all have different standard deviations because they are based on hoards of different sizes. The calculation of the standard deviation in this way (i.e. on percentages or permillia), and its use in constructing a hypothesis test (in effect, a t-test, although it is not called such), is invalid. Discussion of this point led Reece to stop using this technique (Reece 1988, 22–3).

To the archaeologist, this may seem like an obscure technical point, but it is a technicality that ‘pulls the rug from beneath the feet’ of the method. At a common-sense level, the fatal weakness of the method can be seen in the observation that the apparent significance of the differences in no way depends on the sizes of the hoards. If, for example, all the hoards were ten times the size, the conclusion would be exactly the same. But with ten times more evidence, it is much more likely that the observed pattern would be judged to be statistically significant. The method fails to take account of the size of the dataset and thus of the weight of the evidence. The problem of the relationship between sample size and statistical significance has been widely discussed, for example by Shennan (1988, 77–8).

The error is compounded by van Arsdell’s misleading explanation of the meaning of a significance test (e.g. 1996, 236). If a hoard is proven to be different from the others at the two standard deviation level, it is not true that ‘there will still be a five percent chance that they are actually the same’. What can be said is that if they were the same, there would be a five percent chance of the difference between them appearing to be as big as (or bigger than) the observed difference. This misinterpretation is repeated at several stages in the argument.

So what can be done? In statistical terms, the dataset is a small two-way contingency table, amenable to analysis by any of the techniques designed for such tables, e.g. the χ² (chi-squared) test, the related G² test, or log-linear analysis (Bishop et al. 1975). Here I shall first use the simplest and most familiar, the χ² test, which compares the observed data with the figures that would have been obtained if (in this case) each coin type had occurred in the same proportions in all the hoards, and gives a measure of the statistical significance of the difference between the two (known as the ‘observed’ and the ‘expected’ respectively).

If we apply this test to the data as they stand, we obtain a value of χ² = 696 on 88 degrees of freedom (d.f.), which is statistically significant at the 0.0001 level, i.e. if the hoards were really ‘the same’, there would be
a chance of less than 0.0001 (1 in 10,000) of them appearing as different as they do.

Unfortunately, it is not quite that simple. The calculation of χ² is only an approximation, which can become inaccurate if some of the ‘expected’ values are ‘too small’ (Cochran 1954). This implies that two small hoards (Brettenham, 5 coins, and March, 8 coins) and three rare coin types (D, 8 examples, E, 7 examples, and M, 9 examples) must either be deleted before the analysis, or merged with other categories. If they are deleted, the resulting 6-by-10 table gives χ² = 149 on 45 d.f., and p is again < 0.0001. My preference would be to delete the two small hoards and to merge coin types D and E with C, and M with LN (as done by van Arsdell in his Tables 1 and 2), again leading to a 6-by-10 table, but this time with χ² = 226, and yet again p < 0.0001. So there is clearly a statistically significant pattern in the data; the question is: where is it?

We can approach this question by a closer examination of the dataset itself, or by trying to make a picture of it. I shall do each in turn. The χ² statistic is made up from individual ‘contributions’ from each cell (i.e. each combination of hoard and coin type) in the dataset. We can examine these contributions to see which particular combinations of hoard and coin type contribute most to the statistical significance of the overall value of χ² that we have already observed. There is unfortunately no statistical test which would enable us to say which of these contributions are statistically significant, but one can often form a clear impression of which cells contribute strongly. It is these cells which give a dataset its distinctive pattern. The values of the contributions to χ² for the reduced dataset (i.e. as immediately above) are given in Table 1.

Table 1 shows that some coin types contribute strongly to χ² and to the pattern, and others scarcely at all. Type C+D+E makes several strong contributions, indicating that there are more coins than ‘expected’ of this group of types in hoards Honingham, Lakenheath, Weston and Wimblington, and fewer in hoards Field Baulk, Scole, and possibly Eriswell. There are individual large contributions arising from there being more coins than ‘expected’ (i) of type L+M+N (more specifically, type M) at Joist Fen, and (ii) of type F at Eriswell. Type G shows a remarkably uniform pattern, as does the Fring hoard.

Even with this approach, we have the task of scanning tables to look for interesting features. A visual approach might give us a more accessible route into this dataset. The
Table 1
Contributions to the overall chi-squared statistic from each cell of the ‘reduced’ dataset. Very large contributions are shown in bold. The ... indicates a number smaller than 0.01 but greater than zero.

                             Coin type
Hoard            C+D+E      F      G   HIJK  L+M+N      O
Honingham         6.21   1.00    ...   1.03   0.47   1.07
Lakenheath        6.19   3.68   0.02   0.05   0.08   0.01
Joist Fen         0.01   0.01   0.05   0.91  53.33   1.05
Weston           11.47   1.75   0.02   4.62   0.50   1.52
Santon Downham    2.69   1.19   0.61   0.17   0.02   0.06
Wimblington      39.53   0.84   1.01   2.09   0.41   1.96
Fring             0.28   1.71   0.09   0.81   2.48   1.06
Eriswell          4.24  29.04   0.16   2.42   0.50   4.71
Field Baulk      14.39   1.18   0.69   2.61   0.04   1.82
Scole             7.32   0.70   0.79   1.50   1.07   0.62
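
Cell contributions of the kind tabulated in Table 1 can be computed directly from a table of counts. The following is a minimal sketch in plain Python, not the author's own calculation: the function name `chi2_contributions` is hypothetical, and the counts are invented for illustration and are not the hoard data. Each contribution is (observed − expected)² / expected, where the ‘expected’ count is what the cell would hold if every coin type occurred in the same proportions in all hoards.

```python
# Hypothetical counts, NOT the hoard data discussed in the paper:
# rows stand for hoards, columns for coin types.
counts = [
    [12, 30, 8],
    [10, 28, 9],
    [25, 14, 6],
]


def chi2_contributions(table):
    """Return (contributions, chi2, df) for a two-way table of counts.

    Assumes every row total and column total is greater than zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    contributions = []
    for i, row in enumerate(table):
        out_row = []
        for j, observed in enumerate(row):
            # Expected count if all hoards shared the same proportions.
            expected = row_totals[i] * col_totals[j] / grand_total
            out_row.append((observed - expected) ** 2 / expected)
        contributions.append(out_row)

    # The overall statistic is the sum of the cell contributions.
    chi2 = sum(sum(r) for r in contributions)
    # Degrees of freedom: (rows - 1) * (columns - 1).
    df = (len(table) - 1) * (len(table[0]) - 1)
    return contributions, chi2, df


contribs, chi2, df = chi2_contributions(counts)
for row in contribs:
    print("  ".join(f"{c:6.2f}" for c in row))
print(f"chi-squared = {chi2:.2f} on {df} d.f.")
```

Because the ‘expected’ counts scale with the data, multiplying every count by ten multiplies each contribution, and hence the overall statistic, by ten. Unlike a comparison based on permillia alone, the statistic therefore reflects the size of the dataset.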
Figure 1
Plot of the first two axes of a correspondence analysis of the ‘reduced’ dataset.
appropriate statistical technique for this sort of dataset is correspondence analysis (Greenacre 1984; 1992), which represents the rows and columns of a contingency table as points on a scatterplot, in which (roughly speaking) the point representing a particular row will lie ‘in the direction of’ the points representing the columns in which it is prominent, and vice versa (1984, 6). In our example, points representing coin types will lie ‘in the direction of’ the hoards in which they occur more frequently than ‘expected’.

The first run of this technique was with the ‘reduced’ dataset as described above, and produced the plot shown as Figure 1. Here we can see a central core of types and hoards,
Figure 2
Plot of the first two axes of a correspondence analysis of the ‘reduced’ dataset, after the omission of coin type M.
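
For readers who wish to experiment, coordinates for a plot of this kind can be obtained from a singular value decomposition of the standardised residuals of the table. The sketch below assumes NumPy is available and uses the standard textbook formulation of simple correspondence analysis; it is not necessarily the software or the scaling used to produce the figures here. The function name `correspondence_analysis` is hypothetical, and the counts are invented, not the hoard data.

```python
import numpy as np

# Hypothetical counts, NOT the hoard data discussed in the paper:
# rows stand for hoards, columns for coin types.
counts = np.array([
    [12, 30, 8, 5],
    [10, 28, 9, 4],
    [25, 14, 6, 3],
    [6, 9, 20, 11],
], dtype=float)


def correspondence_analysis(table, n_axes=2):
    """Return (row_coords, col_coords, inertias) for a two-way table.

    Coordinates are principal coordinates on the first n_axes axes.
    Assumes every row total and column total is greater than zero.
    """
    P = table / table.sum()      # proportions of the grand total
    r = P.sum(axis=1)            # row masses
    c = P.sum(axis=0)            # column masses
    expected = np.outer(r, c)    # proportions under 'same pattern'
    # Standardised residuals; their squares sum to chi-squared / n.
    S = (P - expected) / np.sqrt(expected)
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    row_coords = (U[:, :n_axes] * sv[:n_axes]) / np.sqrt(r)[:, None]
    col_coords = (Vt.T[:, :n_axes] * sv[:n_axes]) / np.sqrt(c)[:, None]
    return row_coords, col_coords, sv ** 2


rows, cols, inertia = correspondence_analysis(counts)
print("row (hoard) coordinates:\n", rows.round(3))
print("column (type) coordinates:\n", cols.round(3))
print("share of inertia per axis:", (inertia / inertia.sum()).round(3))
```

Plotting the two columns of `rows` and `cols` on the same axes gives a scatterplot of hoards and coin types. The total inertia equals χ² divided by the number of coins, so the plot can be read as a picture of how the overall χ² is distributed across rows and columns.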
with two sets of outliers defining a pattern: (i) on the horizontal axis, type C+D+E and the hoards Wimblington and to a lesser extent Weston, indicating an association between this type and these hoards, (ii) on the vertical axis, type L+M+N and the Joist Fen hoard, indicating an association between this type and this hoard. In other words, there are more coins of type C+D+E at Wimblington and Weston than ‘expected’, and more of type L+M+N at Joist Fen. This agrees with the high contributions shown in Table 1.

It is clear that the pattern is to some extent dominated by the exclusive relationship between type M and Joist Fen (i.e. type M is only found there in this dataset). There is
Cochran, W.G. 1954: Some methods for strengthening the common chi-squared tests. Biometrics 10, 417–451.
Greenacre, M.J. 1992: Correspondence Analysis in Practice (London, Academic Press).
Madsen, T. 1988: Multivariate statistics and archaeology. In Madsen, T. (ed.), Multivariate Archaeology (Jutland Archaeological Society Publications XXI), 7–27.
Reece, R. 1981: The ‘Normal’ Hoard. PACT 5. Statistics and Numismatics.
Reece, R. 1988: My Roman Britain. Cotswold Studies 3.
Shennan, S. 1988: Quantifying Archaeology (Edinburgh, Edinburgh University Press).
Thomas, D.H. 1978: The awful truth about statistics in archaeology. American Antiquity 43, 231–244.
van Arsdell, R. 1996: A statistical analysis of Icenian coin hoards. OJA 15, 235–242.