
Journal of Archaeological Science (1997) 24, 347–354

Some Archaeological Applications of Kernel Density Estimates


M. J. Baxter and C. C. Beardah
Department of Mathematics, Statistics and Operational Research, The Nottingham Trent University,
Nottingham NG11 8NS, U.K.
R. V. S. Wright
Prehistoric and Historical Archaeology, University of Sydney, NSW 2006, Australia
(Received 10 November 1995, manuscript accepted 11 March 1996)
Kernel density estimates, which at their simplest can be viewed as a smoothed form of histogram, have been widely
studied in the statistical literature in recent years but used hardly at all within archaeology. They provide an effective
method of data presentation for univariate and particularly bivariate data and this is illustrated with a range of
examples. The methodology can be used as an informal approach to spatial cluster analysis, and one example suggests
that it is competitive with other approaches in this area. A reason for the lack of use of kernel density estimates by
archaeologists may be the lack of accessible software. The analyses described here were undertaken in the MATLAB
package using routines developed by the second author, which are available on request. © 1997 Academic Press Limited
Keywords: KERNEL DENSITY ESTIMATES, BIVARIATE DATA, CONTOURING, SPATIAL
CLUSTERING, MATLAB.
Introduction
Kernel density estimates (KDEs) at their simplest
can be thought of as an alternative to the
histogram. They typically provide a smoother
representation of the data and, unlike the histogram,
their appearance does not depend on a choice of
starting point. In this sense KDEs alleviate problems
with the histogram that have been perceived by some
archaeologists (Whallon, 1987).
The smoothness of the KDE means that it is
aesthetically more pleasing than the histogram. It also
facilitates the presentation of several data sets in a
single figure, and makes it easier to compare data sets.
This has been argued and illustrated in Baxter &
Beardah (1995b).
It might be argued that, with univariate data, the
advantages of using a KDE as opposed to a histogram
for data representation are not so great as to cause
them to be preferred on a routine basis. For bivariate
data the case for using KDEs is much stronger, and the
purpose of this paper is to illustrate this by example.
Two-dimensional histograms require large amounts of
data, are unwieldy, may be difficult to interpret, and
cannot easily be used as the basis for other methods of
data representation such as contouring. This paper will
illustrate how KDEs readily overcome these problems.
Although the possibility of using KDEs for archaeological
data presentation is implicit in Orton's (1988)
comments on Whallon's (1987) paper, we are not
aware of any such uses outside our own work. An
example of an application to bivariate data is given in
Baxter & Beardah (1995a). This arose when one of us
(MJB) wished to explore the potential of the method-
ology for representing results from a principal compo-
nent analysis of archaeometric compositional data and
asked the second author (CCB) if it was possible to do
this in the MATLAB package. Subsequent collabor-
ation, described in Beardah & Baxter (1995) and
Baxter & Beardah (1995b), has led to the development
of a set of MATLAB routines that include many of the
approaches described in the recent book by Wand &
Jones (1995). That book, the earlier text of Silverman
(1986), and the paper by Bowman & Foster (1993) may
be referred to for the technical developments that
underpin the work described here.
The main ideas of kernel density estimation necess-
ary for this paper are presented in the next section,
with more technical detail and discussion of compu-
tational matters in the appendix. The main section of
the paper illustrates applications of the methodology,
and the concluding section summarizes what we think
are its merits.
Kernel density estimation
Histograms are among the most common methods
of data presentation in archaeology. Anyone who
has drawn a histogram by hand will know that its
appearance may be crucially affected both by the point
at which the histogram is started (the origin) and the
width of the intervals used, or bin-width. Good
computer software packages will make automatic and
sensible choices for the origin and bin-width, but it
should be possible to vary these and this will affect the
results obtained.
Let the origin of the histogram be $m_0$, with subsequent
interval boundaries at $m_1$, $m_2$, etc. and assume
that $(m_j - m_{j-1}) = c$ for some constant $c$ for $j = 1, 2, \ldots$ (i.e.
intervals are of equal width). Let $\delta$ and $q$ be values such
that $\delta$ is small and $q\delta = c$. It is then possible to imagine
the construction of successive histograms with origins
at $(m_0 + i\delta)$ for $i = 0, 1, \ldots, q-1$. If the $q$ histograms so
obtained are averaged then an average shifted histogram
(ASH) (Scott, 1992) is obtained. The appearance
of the ASH will not be dependent on the choice of $m_0$.
Its smoothness will depend on $c$, and increases as $c$
increases. The limiting form of the ASH, as $\delta \to 0$, is a
kernel density estimate. An example is given in Baxter
& Beardah (1995b).
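The averaging construction can be sketched in code. The following is a minimal Python sketch under stated assumptions, not the MATLAB routines used for this paper: the function names `histogram_density` and `ash_density` are hypothetical, bins are taken to be half-open intervals, and heights are scaled so that each histogram integrates to 1.

```python
import math


def histogram_density(data, origin, width, x):
    """Height at x of a density-scaled histogram with the given origin and bin-width."""
    k = math.floor((x - origin) / width)          # index of the bin containing x
    lo = origin + k * width                       # left edge of that bin (inclusive)
    hi = origin + (k + 1) * width                 # right edge of that bin (exclusive)
    count = sum(1 for v in data if lo <= v < hi)  # observations falling in the bin
    return count / (len(data) * width)


def ash_density(data, origin, width, q, x):
    """Average of q histograms whose origins are origin + i * delta, with delta = width / q."""
    delta = width / q
    heights = [histogram_density(data, origin + i * delta, width, x) for i in range(q)]
    return sum(heights) / q


if __name__ == "__main__":
    sample = [1.0, 1.2, 2.5, 3.1, 3.3]
    for q in (1, 4, 16):
        print(q, ash_density(sample, 0.0, 1.0, q, 2.0))
```

With `q = 1` the function reduces to an ordinary histogram; increasing `q` (that is, decreasing the shift) moves the result towards the smooth limiting form described above.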
Another way to think of KDEs is as follows. Given
$n$ points $X_1, X_2, \ldots, X_n$ situated on a line, a KDE can
be obtained by placing a bump at each point and
then summing the height of each bump at each point
on the X-axis. The shape of the bump is defined by a
mathematical function, the kernel $K(x)$, that integrates
to 1. The spread of the bump is determined by a
window- or band-width, $h$, that is analogous to the
bin-width, $c$, of a histogram. The kernel is usually a
symmetric probability density function.
The shape of the resulting KDE does not depend on
a choice of origin and is relatively insensitive to the
exact form of K(x), which is taken to be a normal
density function in the rest of the paper. The choice of
h is more critical and will be considered shortly.
We have presented two simple ways of conceptualising
what a KDE is. Mathematically, the latter
approach gives the KDE as
$$\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right),$$
where $\hat{f}(x)$ is an estimate of the density underlying the
data.
Large values of h over-smooth, while small values
under-smooth the data. A variety of approaches can be
used to select h, including subjective choice and it may
often be sensible to look at KDEs for several values
of h.
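The bump-summing description translates directly into code. The sketch below is illustrative only and is not the MATLAB implementation used for this paper: it assumes the normal kernel adopted in the text, and the names `normal_kernel` and `kde` are hypothetical.

```python
import math


def normal_kernel(u):
    """Standard normal density, used as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, data, h):
    """Kernel density estimate at x: one bump of spread h per observation, summed and scaled."""
    n = len(data)
    return sum(normal_kernel((x - xi) / h) for xi in data) / (n * h)


if __name__ == "__main__":
    sample = [0.0, 1.0, 3.0]
    # Looking at several window-widths shows under- and over-smoothing.
    for h in (0.2, 0.5, 2.0):
        print(h, [round(kde(x, sample, h), 3) for x in (0.0, 1.0, 2.0, 3.0)])
```

Evaluating `kde` on a fine grid of x values for a few choices of `h` is a simple way to carry out the subjective comparison suggested above.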
More objective or data-driven choices of h can be
made, and a wide range of methods have been pro-
posed for this. These are described in detail in Wand
& Jones (1995) and in summary form in Baxter &
Beardah (1995b). An outline of a subset of these
methods is given here.
The data can be thought of as a sample of $n$ from
an underlying and unknown true density, $f(x)$. It is
possible to define a measure of closeness between the
KDE and the true density, leading to an estimate of $h$
that maximizes the closeness. If it is assumed that
the true density is normal then it can be shown that an
optimal choice of $h$ is
$$h = 1.06\,\hat{\sigma}\,n^{-1/5},$$
where $\hat{\sigma}$ is an estimate (possibly robust) of $\sigma$, the S.D.
of the normal distribution. This is the normal scale rule
and will typically over-smooth the data if the underlying
density is not normal.
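The normal scale rule is simple enough to compute directly. The following is a minimal sketch, with a hypothetical function name; the robust option shown (the smaller of the sample S.D. and the interquartile range divided by 1.349, the interquartile range of a standard normal) is one common choice and is an assumption here, since the text does not say which robust estimate is intended.

```python
import statistics


def normal_scale_bandwidth(data, robust=False):
    """Window-width from the normal scale rule, h = 1.06 * sigma_hat * n^(-1/5)."""
    n = len(data)
    sigma_hat = statistics.stdev(data)              # sample standard deviation
    if robust:
        # Assumed robust alternative: rescaled interquartile range, if smaller.
        q1, _, q3 = statistics.quantiles(data, n=4)
        sigma_hat = min(sigma_hat, (q3 - q1) / 1.349)
    return 1.06 * sigma_hat * n ** (-1 / 5)


if __name__ == "__main__":
    sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.9]
    print(normal_scale_bandwidth(sample))
    print(normal_scale_bandwidth(sample, robust=True))
```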
The estimate of h depends, in general, on properties
of the true density that are unknown, and in particular
on a quantity that may be interpreted as the roughness
of the density. A family of direct plug-in (DPI)
estimates can be defined in which an estimate of h can
be obtained by plugging an estimate of roughness
into the equation that defines h. More details are given
in the Appendix.
A related approach is the solve-the-equation
(STE) method, in which an equation that relates h to a
function of the unknown density is defined. In essence,
an initial estimate of h leads to an estimate of the
density, that in turn leads to a new value for h and a
new density estimate. The process continues until the
estimate of h converges. Wand & Jones (1995: 96)
suggest that a suitable data analytic strategy is to look
at several different estimates of h, but that if a single
value is required DPI and STE estimates appear to be
among the more suitable.
The prime purpose of the paper is to illustrate the
use of bivariate KDEs and the generalization to these
is relatively straightforward. By analogy with the
previous discussion of univariate KDEs we may
think in terms of $n$ points in a plane defined by
co-ordinates $X_{(i)} = (X_i, Y_i)$, for $i = 1, 2, \ldots, n$. Locating
a bump at each point corresponds in this case
to centering a three-dimensional bump or hill at
each point and then, at each point in the plane,
summing the height of the bumps. The bump, or
kernel, is taken in this paper to be a bivariate normal
distribution.
For two variables, X and Y, a bivariate normal
distribution is defined by the means of X and Y, taken
to be zero; their S.D.; and their correlation, which
determines the orientation of the bump. If this corre-
lation is taken to be zero, as we do here, then smooth-
ing will be in the direction of the coordinate axes and
the degree of smoothing is determined by the S.D. One
will often not lose much by taking the correlation to be
zero, whereas smoothing equally in both directions, by
using the same window-widths, is not generally to
be recommended (Wand & Jones, 1995: 108).
The theory underlying the optimal choice of
window-widths is not as well developed for the bivari-
ate as for the univariate case. The examples in this
paper use window-widths for the X and Y directions
determined as for the univariate case, using either STE
estimates or the normal scale estimates.
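With the assumption of zero correlation the
representation of the bivariate KDE, $\hat{f}(x, y)$, is given by
$$\hat{f}(x, y) = \frac{1}{n h_1 h_2} \sum_{i=1}^{n} K\left(\frac{x - X_i}{h_1}, \frac{y - Y_i}{h_2}\right),$$
where $h_1$ and $h_2$ are the window-widths in the X and Y
directions.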
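A bivariate estimate of this kind can be sketched as follows. This is an illustrative Python sketch rather than the routines used for the paper; it assumes the zero-correlation bivariate normal kernel described in the text, which factorizes into a product of two univariate normal densities, and the names `normal_pdf` and `kde2d` are hypothetical.

```python
import math


def normal_pdf(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde2d(x, y, points, h1, h2):
    """Bivariate KDE at (x, y) with separate window-widths h1 (X) and h2 (Y)."""
    n = len(points)
    total = sum(
        normal_pdf((x - xi) / h1) * normal_pdf((y - yi) / h2)  # height of one hill at (x, y)
        for xi, yi in points
    )
    return total / (n * h1 * h2)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.5, 0.2), (3.0, 3.0)]
    print(kde2d(0.2, 0.1, pts, 0.5, 0.4))
```

Evaluating `kde2d` over a rectangular grid of (x, y) values gives the surface from which density plots and contour plots can be drawn.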
An attraction of using KDEs is that they can be used
as a basis for producing contour plots of the data and
this leads to graphical representations of data of a kind
that archaeologists should find familiar. The following
discussion of how contouring can be used is based on
the paper by Bowman & Foster (1993).
After a bivariate KDE has been obtained each
(two-dimensional) data point is associated with a
density height that may be ranked from largest to
smallest. The first 50% of the ranked observations, for
example, may be used to define contours that enclose
the densest 50% of the data. The level of contouring
can be varied to contain any specified proportion of
the data, and several contours can be superimposed
on a plot, with the original data if this is helpful.
Bowman & Foster (1993: 173) note that in some
ways this provides a two-dimensional analogy to the
one-dimensional boxplot, and also that the approach
is useful for looking for modes or clusters in the
data.
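The ranking step that turns an inclusion percentage into a contour height can be sketched as below. This is a minimal illustration under stated assumptions, not the procedure as implemented by Bowman & Foster (1993) or in the authors' MATLAB routines: the names `kde2d` and `inclusion_level` are hypothetical, a product normal kernel is assumed, and ties in density height are resolved by including all tied points.

```python
import math


def normal_pdf(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde2d(x, y, points, h1, h2):
    """Bivariate KDE at (x, y) using a product of normal kernels."""
    total = sum(normal_pdf((x - xi) / h1) * normal_pdf((y - yi) / h2) for xi, yi in points)
    return total / (len(points) * h1 * h2)


def inclusion_level(points, h1, h2, fraction):
    """Density height whose contour encloses at least the densest `fraction` of the points."""
    # Density height at each data point, ranked from largest to smallest.
    heights = sorted((kde2d(x, y, points, h1, h2) for x, y in points), reverse=True)
    k = max(1, math.ceil(fraction * len(heights)))  # number of points to enclose
    return heights[k - 1]


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
    for frac in (0.25, 0.5, 0.75, 1.0):
        print(frac, inclusion_level(pts, 1.0, 1.0, frac))
```

The returned height would then be passed to a contouring routine applied to the gridded density surface; repeating the call on the points of one group at a time gives group-specific contours at a common inclusion level.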
A further extension, noted in the same paper, occurs
when the data points can be classified, by period or
context for example. In this case a particular contour
level such as 75% might be selected and then contours
at this level drawn for each group separately, to reveal
how similar or distinct they are. This will also be
illustrated in the next section.
Examples
There are many ways in which univariate KDEs might
be used in archaeology, and several of these have been
illustrated in our previous work. Data presentation for
a single data set and comparison between the distributions
of different data sets are obvious uses. It is
worth remarking that the boxplot, another good way
of looking at and comparing univariate data, does not
work well with multi-modal data. Bounded data, in the
sense that certain values are impossible, and data
affected by outliers can be handled using boundary
kernels and adaptive estimates respectively, and this
is discussed and illustrated in Beardah & Baxter
(1995).
For practical purposes a distinction may be drawn
between kernel density estimation as applied to simple,
or simply transformed, variables, and as applied to
composite variables such as those derived in principal
component and other forms of multivariate analysis.
This latter greatly extends the potential for the use of
KDEs and is illustrated in Examples 1, 3 and 4.
Example 1
Principal component analysis is one of the more com-
monly used multivariate methods in archaeology and a
detailed account and bibliography is given in Baxter
(1994). Typically, data are standardized and an analy-
sis results in new, linear combinations of the original
variables, called principal components, that can be
inspected for structure using plots (usually) based on
the first two or three components. If there is structure
in the data it will often show in the first component and
it can be useful to examine this using a KDE.
The data used for the first example are 105 specimens
of Roman waste glass, with a principal component
analysis based on their chemical composition
with respect to 11 oxides. The data are given, and
extensively analysed, in Baxter (1994). The specimens
come from two sites and the statistical analyses suggest
that there are perhaps three clusters in the data that are
related to, but do not exactly coincide with, the site
classification.
As a first illustration of kernel density estimation
Figure 1 shows two KDEs for the principal component
scores, based on the normal scale estimate of h and an
STE estimate of h. The normal scale estimate over-
smooths the data, as expected, and misses the central
and smaller mode suggested by the STE approach.
The usual bivariate component plot can be repre-
sented by a KDE in various ways. Figure 2 shows a
scatter plot of the scores on the first two components
and Figure 3 shows a KDE using the STE estimate of
h. Three main concentrations are evident. For this
example inspection of the scatterplot has led one of us
(Baxter, 1994) to the same conclusion, so that a KDE
is not essential. In Examples 3 and 4 much larger
data sets are used for which the scatterplot is a less
useful tool.
Figure 1. Two univariate kernel density estimates for scores on the
first principal component of an analysis of the chemical composition
of 105 specimens of Romano-British waste glass, one using the STE rule
and one using the normal scale rule. (Axes: First component; Relative frequency.)
Example 2
An obvious use for bivariate KDEs is in the presentation
and interpretation of spatial data in the form of
coordinates of find spots, for example. To illustrate
this an ethnoarchaeological data set, Binford's (1978)
Mask Site data, is used. The data are taken from
appendix A of Blankholm (1991), who uses them to
test a variety of approaches to intrasite spatial analysis.
The data, as presented by Blankholm, consist of the
spatial coordinates of five classes of find that might
occur in the archaeological record, such as artefacts,
large bones and bone splinters. We use the subset
based on the coordinates of the locations of 276 bone
splinters.
Figures 4 and 5 show analyses in which the normal
scale rule and STE estimates have been used to deter-
mine window widths separately for the two coordinate
directions. Both analyses show 25, 50, 75 and 100%
contours superimposed on the distribution of the bone
splinters. Once again the normal scale analysis pro-
duces a smoother picture. There are clearly three main
concentrations in the data with the STE analysis
suggesting a subdivision of one of these, in the bottom
right of the graph, into two groups and a fifth group in
the upper left of the figure.
It is instructive to compare our results with those
obtained by a variety of methods in Blankholm (1991).
His figure 9, using contouring at equal heights (rather
than encompassing specified proportions of the data),
is less revelatory of structure than our figures, while a
k-means cluster analysis (his figure 17) suggests a three
cluster distribution. Contour maps or clustering arising
from local density analysis (his figure 32) and nearest
neighbour analysis (his figure 39) are also given. We
think that our figures, and particularly that for the
STE analysis, suggest structure as well as, or more
clearly than, the analyses in Blankholm (1991).
Figure 2. Principal component plot for the first two components
from an analysis of the chemical composition of 105 specimens of
Romano-British waste glass. (Axes: Component 1; Component 2.)
Figure 3. A KDE estimate, based on an STE rule for the selection of
h, for the data. (Axes: Component 1; Component 2; Relative frequency.)
Figure 4. A KDE of the Mask Site data using the normal scale rule.
The contours are for 25, 50, 75 and 100% inclusion levels.
(Axes: Component 1; Component 2.)
Figure 5. As for Figure 4 but using an STE estimate.
How real is the structure suggested? In fact the
location of hearths, activity areas and features such as
rocks is known, and Blankholm provides a map of
these that can be overlaid on his figures. There are five
hearths and two of them are associated with concentrations
detected in all analyses, namely those to the left of
our figures. Two other hearths that are adjacent, and at
the bottom left, are associated with the third main
concentration. Only our STE analysis suggests two
subdivisions of this group. The fifth hearth is associated
with a less dense area of bone splinters in the
upper right of the diagram and is suggested by our STE
analysis and some of those reported by Blankholm.
From this discussion we conclude that, for this
example at least, the KDE approach is competitive
with other statistical approaches to spatial analysis in
archaeology of the kind that seeks clustering in artefact
scatters.
From the foregoing discussion it is obvious that
contouring of artefact scatters can be undertaken with-
out reference to kernel density estimation. The merits,
or otherwise, of different approaches will be discussed
in the concluding section. It will also be obvious that
KDEs can be used as an informal means of cluster
analysis for this kind of data, and in this sense
compete with more formal methods such as k-means
cluster analysis. It is known that k-means analysis has
a tendency to produce spherical clusters, whether or
not the real structure has this form. This difficulty is
avoided by the clusters (or contours) suggested by
KDEs. Determining the number of clusters is a problem
for any clustering approach. With KDEs it should
be informative to examine contours at different levels
of inclusion as a means of looking for structure at
different scales of spatial resolution.
Figure 6 shows the alternative representation, for the
STE estimate, of the KDE as a three dimensional
diagram. There are four clear concentrations, or
modes, with a much gentler hillock, visible behind
the front peaks, that is associated with the fifth hearth.
Example 3
This third example is based on anthropometric rather
than archaeological data, but is ideal for showing how
KDEs can be used to illuminate the message of large
data sets. The data are discussed and analysed in
Relethford & Crawford (1995) and consist of 17 body
and craniofacial measurements from 7214 male adults
in 31 birth counties in Ireland. The data were originally
used to investigate the genetic distances between the
populations defined by the counties.
It was of interest for one of us (RVSW) to investigate
the performance of a principal component analysis,
in order to see how the first two principal
components relate to geography. Some strong correlations
of this sort, but for different data, have been
reported by Wright (1992). An obvious problem, in
terms of the usual component plots presented from
such an analysis, is that there are too many data points
to plot the data sensibly in the usual way. Here we
concentrate on what KDEs have to offer in terms of
handling such a mass of data, without going into
aspects of substantive interpretation, and note that any
two-dimensional scatter of data can be handled in a
similar way.
Figures 7–10 show four different representations of
these data. Figure 7 is an attempted scatter plot that
shows how hopeless it is to try to discern structure in
the data in this way; Figure 8 is a three-dimensional
plot along the lines of Figures 3 & 6; and Figure 9 is a
contour plot along the lines of Figure 5. An STE
estimate has been used. The interesting feature of these
last two plots is that there are no interesting features;
there is no evidence of any kind of grouping in the
data. The final plot in Figure 10 shows separate 75%
contours for three of the counties, and there is no
evidence of any difference between them. Although the
plot becomes very crowded this remains the case if all
counties are represented in a similar way on the same
plot. The plots indicate that the correlation (if any)
that exists between the principal components and
geography must be a low one.
Figure 6. A density plot of the Mask Site data using the STE
estimate. (Axes: Component 1; Component 2; Relative frequency.)
Figure 7. A scatterplot of the scores from a principal component
analysis of the Irish body and craniofacial measurement data. Based
on 7214 individuals, the purpose is to illustrate that such plots are of
limited use for looking at large data sets. (Axes: Component 1; Component 2.)
Example 4
The final example is similar in kind to the previous
example, and arises from a correspondence analysis
originally undertaken by one of us. It possesses
additional features of interest.
The data are the frequencies of eight different types
of archaeological site, recorded for 4712 km² in the
Australian state of Victoria. The resulting 4712 correspondence
analysis object scores are plotted on
Figure 11.
It is not possible to get a sensible looking plot here
because the structure of the data is such that many of
the points represent multiple occurrences. This over-
printing happens because many of the kilometre
squares have identical frequencies of sites, though
multiple occurrences are not evident from the plot.
Solutions such as jittering exist, in which points are
displaced by a small and random amount, which would
tend to give a separate point for each site, but this then
leads to problems similar to that evidenced in Figure 7.
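For readers unfamiliar with jittering, the idea can be sketched in a few lines. This is an illustrative Python sketch only; the function name `jitter`, the uniform displacement and the `amount` parameter are assumptions made for the example and are not taken from any analysis in this paper.

```python
import random


def jitter(points, amount, seed=None):
    """Displace each (x, y) point by a small uniform random offset of at most `amount`."""
    rng = random.Random(seed)  # seeded generator so that a plot can be reproduced
    return [
        (x + rng.uniform(-amount, amount), y + rng.uniform(-amount, amount))
        for x, y in points
    ]


if __name__ == "__main__":
    # Five coincident points become five nearby, separately visible points.
    print(jitter([(1.0, 2.0)] * 5, 0.05, seed=1))
```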
An alternative approach is to use a KDE and
contour it at some suitable level in order to see where
the points pile up, and this is done in Figure 12.
Here 90, 95 and 100% contours are shown using an
STE estimate, and are suggestive of, perhaps, nine
groups.
Discussion
For simple data presentation and comparison of univariate
data KDEs can be regarded as an alternative to
the histogram. We think that there are aesthetic and
interpretational benefits to be obtained from using
KDEs and have argued this elsewhere (Baxter &
Beardah, 1995b), but others may regard it as a matter
of taste.
Figure 8. The data of Figure 7 represented as a density plot.
(Axes: Component 1; Component 2; Relative frequency.)
Figure 9. The data of Figure 7 represented as a contour plot
showing 25, 50, 75 and 100% levels of inclusion.
(Axes: Component 1; Component 2.)
Figure 10. The data of Figure 7 showing 75% contours for three of
31 counties. The contours largely overlap, and this is the case
however the counties are selected. (Axes: Component 1; Component 2.)
Figure 11. A scatterplot based on the first two components of a
correspondence analysis of the Victorian sites data. Many of the
points displayed represent multiple occurrences.
For bivariate data the case for regarding KDEs as a
tool to be routinely used is a strong one. Even when an
ordinary scatter plot could be used, as in Examples 1
and 2, KDEs can be more effective for showing concentrations
of points or modes in the data. For very
large data sets as in Examples 3 and 4, where scatter-
plots are uninterpretable, it is unarguable that KDEs
or other methods with similar aims are needed to get a
view of the data.
Of course it would be quite possible to generate
contour plots and density plots of the kind shown in
the gures without recourse to KDEs, so why use
KDEs? One answer is that statistical theory associated
with KDEs provides guidance as to the appropriate
degree of smoothing to use and, as the first two
examples show, this can have a critical effect on the
interpretation of the data. Another reason for using
KDEs is that they lend themselves naturally to contouring
in terms of inclusion of specified percentages of
the most densely clustered points. As the examples
show, this means that KDEs can be used as an
informal kind of clustering method that does not
impose structure on the data in the way that more
formal methods often do.
Acknowledgements
Caroline Jackson is thanked for providing the
glass compositional data used in Example 1. John
Relethford and Michael Crawford are thanked for
providing the data used in Example 3, and for per-
mission to use it. Richard MacNeill and Stewart
Simmons (Aboriginal Affairs Victoria) are thanked for
the sites data used in Example 4. Responsibility for the
use to which these data sets have been put in the paper,
and interpretations, rests with the present authors.
References
Baxter, M. J. (1994). Exploratory Multivariate Analysis in Archaeology.
Edinburgh: Edinburgh University Press.
Baxter, M. J. & Beardah, C. C. (1995a). Graphical presentation
of results from principal components analysis. In (J. Huggett &
N. Ryan, Eds) Computer Applications and Quantitative Methods
in Archaeology 1994. Oxford: BAR International Series 600,
pp. 63–67.
Baxter, M. J. & Beardah, C. C. (1995b). Beyond the Histogram:
Archaeological Applications of Kernel Density Estimation. Research
Report 6/95, Nottingham Trent University Department of
Mathematics, Statistics and Operational Research, Nottingham
Trent University, Nottingham, U.K.
Beardah, C. C. & Baxter, M. J. (1995). MATLAB Routines for Kernel
Density Estimation and the Graphical Representation of Archaeological
Data. Research Report 2/95, Nottingham Trent University
Department of Mathematics, Statistics and Operational Research,
Nottingham Trent University, Nottingham, U.K.
Binford, L. R. (1978). Dimensional analysis of behavior and site
structure: learning from an Eskimo hunting stand. American
Antiquity 43, 330–361.
Blankholm, H. P. (1991). Intrasite Spatial Analysis in Theory and
Practice. Aarhus: Aarhus University Press.
Bowman, A. & Foster, P. (1993). Density based exploration of
bivariate data. Statistics and Computing 3, 171–177.
Orton, C. R. (1988). Review of Quantitative Research in Archaeology,
Aldenderfer, M. S. (Ed.). Antiquity 62, 597–598.
Relethford, J. H. & Crawford, M. H. (1995). Anthropometric
variation and the population history of Ireland. American Journal
of Physical Anthropology 96, 25–38.
Scott, D. W. (1992). Multivariate Density Estimation. New York:
Wiley.
Silverman, B. (1986). Density Estimation for Statistics and Data
Analysis. London: Chapman and Hall.
Wand, M. P. & Jones, M. C. (1995). Kernel Smoothing. London:
Chapman and Hall.
Whallon, R. (1987). Simple statistics. In (M. S. Aldenderfer, Ed.)
Quantitative Research in Archaeology: Progress and Prospects.
London: Sage, pp. 135–150.
Wright, R. V. S. (1992). Correlation between cranial form and
geography in Homo sapiens: CRANID, a computer program
for forensic and other applications. Archaeology in Oceania 27,
128–134.
Appendix: Technical Details
Figure 12. The same data as in Figure 11 in the form of 90, 95 and
100% contours arising from a KDE based on an STE rule for
window-width selection. (Axes: Component 1; Component 2.)
The closeness of a KDE to the true density can be
defined in terms of the asymptotic mean integrated
square error (AMISE) and the value of $h$ which
minimizes this, $h_{\mathrm{AMISE}}$, can be shown to have the form
$$h_{\mathrm{AMISE}} = \left[\frac{g(K)}{n\,R(f'')}\right]^{1/5},$$
where $g(K)$ is a function of the known kernel and
$$R(f'') = \int f''(x)^2 \, dx$$
is a function of the unknown true density that can be
interpreted as its roughness. Assuming that the true
density is normal leads to the normal scale rule for
choice of $h$. Estimating $R(f'')$, which can be done with
varying degrees of refinement, leads to the family of
direct plug-in (DPI) estimates. Details are given in
Wand & Jones (1995).
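As a worked check that a normal true density leads to the normal scale rule, the following sketch uses the standard form of the AMISE-optimal window-width given by Wand & Jones (1995), in which the kernel-dependent factor is $R(K)/\mu_2(K)^2$ with $R(K) = \int K(u)^2\,du$ and $\mu_2(K) = \int u^2 K(u)\,du$. The symbols $R(K)$ and $\mu_2(K)$ are not used elsewhere in this paper and are introduced here only for illustration; a normal kernel is assumed, as in the main text.

```latex
\begin{align*}
h_{\mathrm{AMISE}} &= \left[\frac{R(K)}{\mu_2(K)^2\, R(f'')\, n}\right]^{1/5}, \\
\text{normal kernel:}\quad R(K) &= \frac{1}{2\sqrt{\pi}}, \qquad \mu_2(K) = 1, \\
\text{normal density with S.D. } \sigma:\quad R(f'') &= \frac{3}{8\sqrt{\pi}\,\sigma^{5}}, \\
h_{\mathrm{AMISE}} &= \left[\frac{1}{2\sqrt{\pi}} \cdot \frac{8\sqrt{\pi}\,\sigma^{5}}{3n}\right]^{1/5}
  = \left(\frac{4}{3}\right)^{1/5} \sigma\, n^{-1/5} \approx 1.06\,\sigma\, n^{-1/5}.
\end{align*}
```

Replacing $\sigma$ by an estimate from the data gives the rule quoted in the main text.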
Solve-the-equation (STE) estimates are closely
related to DPI estimates. The formula for $h_{\mathrm{AMISE}}$ is the
starting point, and $R(f'')$ is replaced by an estimate that
depends on $h$ and can be determined for an initial
choice of $h$. This leads to the estimate of $h_{\mathrm{AMISE}}$ that in
turn is used to estimate a new $R(f'')$ and a new $h_{\mathrm{AMISE}}$.
This process continues until $h$ converges.
These and other techniques have been implemented
in the MATLAB package by one of the authors (CCB)
and are freely available to anyone who wants them
(email c.beardah@maths.ntu.ac.uk). The routines were
developed because our interest in KDEs occurred at a
time when nothing else was obviously and easily avail-
able to us. We believe that kernel density estimation is
a valuable tool for data analysis that can be fruitfully
deployed by archaeologists. We are also aware that
software to implement the ideas involved, including
our own, is not readily available and is expensive. It is
likely that this situation will change, and that kernel
density estimation will become available in accessible
software packages. Our hope is that the present paper
will encourage the use of such methodology when it
becomes more readily available.