Archaeological Applications of Kernel Density Estimates

M. J. Baxter and C. C. Beardah

Department of Mathematics, Statistics and Operational Research, The Nottingham Trent University,

Nottingham NG11 8NS, U.K.

R. V. S. Wright

Prehistoric and Historical Archaeology, University of Sydney, NSW 2006, Australia

(Received 10 November 1995, manuscript accepted 11 March 1996)

Kernel density estimates, which at their simplest can be viewed as a smoothed form of histogram, have been widely studied in the statistical literature in recent years but used hardly at all within archaeology. They provide an effective method of data presentation for univariate and particularly bivariate data, and this is illustrated with a range of examples. The methodology can be used as an informal approach to spatial cluster analysis, and one example suggests that it is competitive with other approaches in this area. A reason for the lack of use of kernel density estimates by archaeologists may be the lack of accessible software. The analyses described here were undertaken in the MATLAB package using routines developed by the second author, and are available on request. © 1997 Academic Press Limited

Keywords: KERNEL DENSITY ESTIMATES, BIVARIATE DATA, CONTOURING, SPATIAL CLUSTERING, MATLAB.

Introduction

Kernel density estimates (KDEs) at their simplest can be thought of as an alternative to the histogram. They typically provide a smoother representation of the data and, unlike the histogram, their appearance does not depend on a choice of starting point. In this sense KDEs alleviate problems with the histogram that have been perceived by some archaeologists (Whallon, 1987).

The smoothness of the KDE means that it is aesthetically more pleasing than the histogram. It also facilitates the presentation of several data sets in a single figure, and makes it easier to compare data sets. This has been argued and illustrated in Baxter & Beardah (1995b).

It might be argued that, with univariate data, the advantages of using a KDE rather than a histogram for data representation are not so great that KDEs should be preferred on a routine basis. For bivariate data the case for using KDEs is much stronger, and the purpose of this paper is to illustrate this by example. Two-dimensional histograms require large amounts of data, are unwieldy, may be difficult to interpret, and cannot easily be used as the basis for other methods of data representation such as contouring. This paper will illustrate how KDEs readily overcome these problems.

Although the possibility of using KDEs for archaeological data presentation is implicit in Orton's (1988) comments on Whallon's (1987) paper, we are not aware of any such uses outside our own work. An example of an application to bivariate data is given in Baxter & Beardah (1995a). This arose when one of us (MJB) wished to explore the potential of the methodology for representing results from a principal component analysis of archaeometric compositional data and asked the second author (CCB) if it was possible to do this in the MATLAB package. Subsequent collaboration, described in Beardah & Baxter (1995) and Baxter & Beardah (1995b), has led to the development of a set of MATLAB routines that include many of the approaches described in the recent book by Wand & Jones (1995). That book, the earlier text of Silverman (1986), and the paper by Bowman & Foster (1993) may be referred to for the technical developments that underpin the work described here.

The main ideas of kernel density estimation necessary for this paper are presented in the next section, with more technical detail and discussion of computational matters in the Appendix. The main section of the paper illustrates applications of the methodology, and the concluding section summarizes what we think are its merits.

Kernel density estimation

Histograms are among the most common methods of data presentation in archaeology. Anyone who has drawn a histogram by hand will know that its appearance may be crucially affected both by the point at which the histogram is started (the origin) and by the width of the intervals used (the bin-width). Good computer software packages will make automatic and sensible choices for the origin and bin-width, but it should be possible to vary these, and doing so will affect the results obtained.

0305-4403/97/040347+08 $25.00/0/as960119 © 1997 Academic Press Limited

Let the origin of the histogram be $m_0$, with subsequent interval boundaries at $m_1, m_2$, etc., and assume that $m_j - m_{j-1} = c$ for some constant $c$ and $j = 1, 2, \ldots$ (i.e. intervals are of equal width). Let $\delta$ and $q$ be values such that $\delta$ is small and $q\delta = c$. It is then possible to imagine the construction of successive histograms with origins at $m_0 + i\delta$ for $i = 0, 1, \ldots, q-1$. If the $q$ histograms so obtained are averaged then an average shifted histogram (ASH) (Scott, 1992) is obtained. The appearance of the ASH will not be dependent on the choice of $m_0$. Its smoothness will depend on $c$, and increases as $c$ increases. The limiting form of the ASH, as $\delta \to 0$, is a kernel density estimate. An example is given in Baxter & Beardah (1995b).
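The averaging can be sketched numerically. The Python/NumPy code below is a minimal illustration, not the authors' MATLAB routine; the function name `ash_density`, the default origin one bin-width below the smallest value, and the simulated normal data are all assumptions. Each of the $q$ shifted histograms is evaluated on a grid and the results are averaged.

```python
import numpy as np


def ash_density(data, grid, c, q, m0=None):
    """Average shifted histogram: the mean of q histograms of bin width c
    with origins m0, m0 + delta, ..., m0 + (q - 1) * delta, where delta = c / q."""
    data = np.asarray(data, dtype=float)
    grid = np.asarray(grid, dtype=float)
    if m0 is None:
        # Hypothetical default origin: one bin-width below the smallest value.
        m0 = data.min() - c
    delta = c / q
    n = data.size
    total = np.zeros_like(grid)
    for i in range(q):
        origin = m0 + i * delta
        # Bin index of every observation and every grid point for this histogram.
        data_bins = np.floor((data - origin) / c)
        grid_bins = np.floor((grid - origin) / c)
        # Histogram height at a grid point = (count in its bin) / (n * c).
        counts = (grid_bins[:, None] == data_bins[None, :]).sum(axis=1)
        total += counts / (n * c)
    return total / q


rng = np.random.default_rng(0)
sample = rng.normal(size=200)
grid = np.linspace(sample.min() - 2.0, sample.max() + 2.0, 4001)
ash = ash_density(sample, grid, c=0.5, q=10)
# Riemann-sum check that the ASH, like each histogram, integrates to about 1.
area = ash.sum() * (grid[1] - grid[0])
print(area)
```

Increasing `q` for a fixed `c` makes the averaged step function finer, which is the sense in which the ASH approaches a kernel estimate as $\delta$ shrinks.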

Another way to think of KDEs is as follows. Given n points $X_1, X_2, \ldots, X_n$ situated on a line, a KDE can be obtained by placing a bump at each point and then summing the height of each bump at each point on the X-axis. The shape of the bump is defined by a mathematical function, the kernel $K(x)$, that integrates to 1. The spread of the bump is determined by a window- or band-width, $h$, that is analogous to the bin-width, $c$, of a histogram. The kernel is usually a symmetric probability density function.

The shape of the resulting KDE does not depend on

a choice of origin and is relatively insensitive to the

exact form of K(x), which is taken to be a normal

density function in the rest of the paper. The choice of

h is more critical and will be considered shortly.

We have presented two simple ways of conceptualising what a KDE is. Mathematically, the latter approach gives the KDE as

$$\hat f(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right),$$

where $\hat f(x)$ is an estimate of the density underlying the data.
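The formula translates directly into array code. The sketch below assumes a standard normal kernel, as used in the rest of the paper; the function name `kde_1d` and the simulated data are hypothetical.

```python
import numpy as np


def kde_1d(data, grid, h):
    """Gaussian-kernel KDE: f_hat(x) = (1 / (n h)) * sum_i K((x - X_i) / h)."""
    data = np.asarray(data, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Standardized distance from every grid point to every observation.
    u = (grid[:, None] - data[None, :]) / h
    # Standard normal kernel K(u): one bump per observation.
    bumps = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    # Sum the bump heights at each grid point and rescale.
    return bumps.sum(axis=1) / (data.size * h)


# A single observation at 0 with h = 1 reproduces the standard normal density.
peak = kde_1d([0.0], [0.0], h=1.0)[0]

rng = np.random.default_rng(1)
sample = rng.normal(size=150)
grid = np.linspace(-8.0, 8.0, 3201)
f_hat = kde_1d(sample, grid, h=0.4)
area = f_hat.sum() * (grid[1] - grid[0])   # numerical check that f_hat integrates to 1
```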

Large values of h over-smooth the data, while small values under-smooth it. A variety of approaches can be used to select h, including subjective choice, and it may often be sensible to look at KDEs for several values of h.
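One way to see the effect of h is to count the local maxima (modes) of the estimate for several window-widths. The sketch below uses simulated two-group data and hypothetical helper names; very small values of h may also produce spurious minor modes, which is one symptom of under-smoothing.

```python
import numpy as np


def kde_1d(data, grid, h):
    """Gaussian-kernel KDE evaluated on a grid."""
    u = (np.asarray(grid, float)[:, None] - np.asarray(data, float)[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(data) * h * np.sqrt(2.0 * np.pi))


def count_modes(density):
    """Number of interior local maxima of a density evaluated on a grid."""
    d = np.asarray(density)
    return int(np.sum((d[1:-1] > d[:-2]) & (d[1:-1] >= d[2:])))


# Hypothetical bimodal data: two well-separated groups of 100 points.
rng = np.random.default_rng(2)
sample = np.concatenate([rng.normal(-3.0, 0.5, 100), rng.normal(3.0, 0.5, 100)])
grid = np.linspace(-10.0, 10.0, 2001)
modes = {h: count_modes(kde_1d(sample, grid, h)) for h in (0.3, 1.0, 5.0)}
```

With a large enough h the two groups merge into a single mode, illustrating how over-smoothing can hide structure.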

More objective or data-driven choices of h can be made, and a wide range of methods has been proposed for this. These are described in detail in Wand & Jones (1995) and in summary form in Baxter & Beardah (1995b). An outline of a subset of these methods is given here.

The data can be thought of as a sample of n from an underlying and unknown true density, $f(x)$. It is possible to define a measure of closeness between the KDE and the true density, leading to an estimate of h that maximizes the closeness. If it is assumed that the true density is normal then it can be shown that an optimal choice of h is

$$h = 1.06\,\hat\sigma\, n^{-1/5},$$

where $\hat\sigma$ is an estimate (possibly robust) of $\sigma$, the S.D. of the normal distribution. This is the normal scale rule and will typically over-smooth the data if the underlying density is not normal.
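A minimal sketch of the normal scale rule follows. The paper does not say which robust estimate of $\sigma$ is used; taking the smaller of the sample S.D. and the interquartile range divided by 1.349 (a common choice) is our assumption, as is the function name.

```python
import numpy as np


def normal_scale_h(data, robust=True):
    """Normal scale rule: h = 1.06 * sigma_hat * n**(-1/5)."""
    x = np.asarray(data, dtype=float)
    n = x.size
    sd = np.std(x, ddof=1)
    if robust:
        # Assumed robust estimate: min(sample S.D., IQR / 1.349).
        q75, q25 = np.percentile(x, [75, 25])
        sigma_hat = min(sd, (q75 - q25) / 1.349)
    else:
        sigma_hat = sd
    return 1.06 * sigma_hat * n ** (-0.2)


rng = np.random.default_rng(3)
big = rng.normal(size=100_000)
h_big = normal_scale_h(big)   # roughly 1.06 * 1 * 100000**(-1/5), i.e. about 0.106
```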

The estimate of h depends, in general, on properties of the true density that are unknown, and in particular on a quantity that may be interpreted as the roughness of the density. A family of direct plug-in (DPI) estimates can be defined in which an estimate of h is obtained by plugging an estimate of roughness into the equation that defines h. More details are given in the Appendix.
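The plug-in idea can be sketched as follows, assuming a normal kernel and a one-stage scheme in which the pilot window-width used to estimate the roughness $R(f'')$ is itself taken from a normal reference. This follows the general pattern described by Wand & Jones (1995) but is a simplified illustration, not the authors' implementation; the function names are hypothetical.

```python
import numpy as np

SQRT_2PI = np.sqrt(2.0 * np.pi)
SQRT_PI = np.sqrt(np.pi)


def phi4(u):
    """Fourth derivative of the standard normal density."""
    return (u**4 - 6.0 * u**2 + 3.0) * np.exp(-0.5 * u**2) / SQRT_2PI


def psi4_hat(x, g):
    """Kernel estimate of the roughness R(f'') with pilot width g (diagonal terms included)."""
    u = (x[:, None] - x[None, :]) / g
    return phi4(u).sum() / (x.size**2 * g**5)


def dpi_bandwidth(data):
    """One-stage direct plug-in window-width for a normal kernel (a sketch)."""
    x = np.asarray(data, dtype=float)
    n = x.size
    sigma = np.std(x, ddof=1)
    # Normal-reference value of psi_6, used only to choose the pilot width g.
    psi6_ns = -15.0 / (16.0 * SQRT_PI * sigma**7)
    g = (2.0 * phi4(0.0) / (-psi6_ns * n)) ** (1.0 / 7.0)
    r_f2 = psi4_hat(x, g)            # plug-in estimate of R(f'')
    r_k = 1.0 / (2.0 * SQRT_PI)      # R(K) for the normal kernel
    # h = [R(K) / (mu_2(K)^2 R(f'') n)]^(1/5), with mu_2(K) = 1 for the normal kernel.
    return (r_k / (r_f2 * n)) ** 0.2


rng = np.random.default_rng(4)
sample = rng.normal(size=1000)
h_dpi = dpi_bandwidth(sample)
h_ns = 1.06 * np.std(sample, ddof=1) * sample.size ** (-0.2)
```

For normal data the plug-in value should be close to the normal scale value; the two diverge when the true density is rougher than a normal one.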

A related approach is the "solve-the-equation" (STE) method, in which an equation that relates h to a function of the unknown density is defined. In essence, an initial estimate of h leads to an estimate of the density, that in turn leads to a new value for h and a new density estimate. The process continues until the estimate of h converges. Wand & Jones (1995: 96) suggest that a suitable data-analytic strategy is to look at several different estimates of h, but that if a single value is required, DPI and STE estimates appear to be among the more suitable.

The prime purpose of the paper is to illustrate the use of bivariate KDEs, and the generalization to these is relatively straightforward. By analogy with the previous discussion of univariate KDEs we may think in terms of n points in a plane defined by co-ordinates $X_{(i)} = (X_i, Y_i)$, for $i = 1, 2, \ldots, n$. Locating a bump at each point corresponds in this case to centering a three-dimensional bump or hill at each point and then, at each point in the plane, summing the height of the bumps. The bump, or kernel, is taken in this paper to be a bivariate normal distribution.

For two variables, X and Y, a bivariate normal distribution is defined by the means of X and Y, taken to be zero; their S.D.s; and their correlation, which determines the orientation of the bump. If this correlation is taken to be zero, as we do here, then smoothing will be in the direction of the coordinate axes and the degree of smoothing is determined by the S.D.s. One will often not lose much by taking the correlation to be zero, whereas smoothing equally in both directions, by using the same window-widths, is not generally to be recommended (Wand & Jones, 1995: 108).

The theory underlying the optimal choice of

window-widths is not as well developed for the bivari-

ate as for the univariate case. The examples in this

paper use window-widths for the X and Y directions

determined as for the univariate case, using either STE

estimates or the normal scale estimates.

348 M. J. Baxter et al.

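With the assumption of zero correlation the representation of the bivariate KDE, $\hat f(x,y)$, is given by

$$\hat f(x,y) = \frac{1}{n h_1 h_2} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h_1}\right) K\!\left(\frac{y - Y_i}{h_2}\right),$$

where $K$ is the standard normal density and $h_1$ and $h_2$ are the window-widths in the X and Y directions.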
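The product form above can be evaluated on a grid with a few array operations. The sketch assumes a standard normal $K$ and separate window-widths `h1` and `h2`; the function names and the simulated data are hypothetical.

```python
import numpy as np


def normal_kernel(u):
    """Standard normal density, used as the one-dimensional kernel K."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)


def kde_2d(xs, ys, gx, gy, h1, h2):
    """Bivariate KDE with a product normal kernel (zero correlation).

    Returns an array of shape (len(gy), len(gx)) holding f_hat(x, y).
    """
    xs, ys = np.asarray(xs, dtype=float), np.asarray(ys, dtype=float)
    gx, gy = np.asarray(gx, dtype=float), np.asarray(gy, dtype=float)
    kx = normal_kernel((gx[:, None] - xs[None, :]) / h1)   # shape (len(gx), n)
    ky = normal_kernel((gy[:, None] - ys[None, :]) / h2)   # shape (len(gy), n)
    # Entry [j, i] sums K((x_i - X_k)/h1) * K((y_j - Y_k)/h2) over observations k.
    return ky @ kx.T / (xs.size * h1 * h2)


rng = np.random.default_rng(7)
xs = rng.normal(0.0, 1.0, 200)
ys = rng.normal(0.0, 2.0, 200)   # wider spread in Y, so h2 > h1 below
gx = np.linspace(-8.0, 8.0, 321)
gy = np.linspace(-14.0, 14.0, 561)
dens = kde_2d(xs, ys, gx, gy, h1=0.4, h2=0.8)
mass = dens.sum() * (gx[1] - gx[0]) * (gy[1] - gy[0])   # should be close to 1
```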

An attraction of using KDEs is that they can be used as a basis for producing contour plots of the data, and this leads to graphical representations of data of a kind that archaeologists should find familiar. The following discussion of how contouring can be used is based on the paper by Bowman & Foster (1993).

After a bivariate KDE has been obtained, each (two-dimensional) data point is associated with a density height, and these heights may be ranked from largest to smallest. The first 50% of ranked observations, for example, may be used to define contours that enclose the densest 50% of the data. The level of contouring can be varied to contain any specified proportion of the data, and several contours can be superimposed on a plot, with the original data if this is helpful.
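The ranking step can be sketched as follows: evaluate the KDE at each data point, sort the heights, and take the height of the k-th ranked point as the contour level that encloses the densest chosen proportion of the data. The helper names, the common window-width and the simulated points are our own assumptions; drawing the contour itself would be left to a plotting routine applied to a gridded KDE.

```python
import numpy as np


def density_at_points(xs, ys, h1, h2):
    """Product-normal bivariate KDE evaluated at each data point."""
    xs, ys = np.asarray(xs, dtype=float), np.asarray(ys, dtype=float)
    ux = (xs[:, None] - xs[None, :]) / h1
    uy = (ys[:, None] - ys[None, :]) / h2
    kernel = np.exp(-0.5 * (ux**2 + uy**2)) / (2.0 * np.pi)
    return kernel.sum(axis=1) / (xs.size * h1 * h2)


def inclusion_level(heights, proportion):
    """Density height whose contour encloses the densest `proportion` of the points."""
    ranked = np.sort(np.asarray(heights, dtype=float))[::-1]   # largest first
    k = max(1, int(np.ceil(proportion * ranked.size)))
    return ranked[k - 1]


rng = np.random.default_rng(8)
xs, ys = rng.normal(size=100), rng.normal(size=100)
heights = density_at_points(xs, ys, h1=0.5, h2=0.5)
levels = {p: inclusion_level(heights, p) for p in (0.25, 0.5, 0.75, 1.0)}
```

The 100% level is simply the smallest height, so its contour passes through the least dense point.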

Bowman & Foster (1993: 173) note that in some

ways this provides a two-dimensional analogy to the

one-dimensional boxplot, and also that the approach

is useful for looking for modes or clusters in the

data.

A further extension, noted in the same paper, occurs when the data points can be classified, by period or context for example. In this case a particular contour level such as 75% might be selected and then contours at this level drawn for each group separately, to reveal how similar or distinct they are. This will also be illustrated in the next section.

Examples

There are many ways in which univariate KDEs might be used in archaeology, and several of these have been illustrated in our previous work. Data presentation for a single data set and comparison between the distributions of different data sets are obvious uses. It is worth remarking that the boxplot, another good way of looking at and comparing univariate data, does not work well with multi-modal data. Bounded data, in the sense that certain values are impossible, and data affected by outliers can be handled using boundary kernels and adaptive estimates respectively, and this is discussed and illustrated in Beardah & Baxter (1995).

For practical purposes a distinction may be drawn between kernel density estimation as applied to simple, or simply transformed, variables, and as applied to composite variables such as those derived in principal component and other forms of multivariate analysis. The latter greatly extends the potential for the use of KDEs and is illustrated in Examples 1, 3 and 4.

Example 1

Principal component analysis is one of the more commonly used multivariate methods in archaeology, and a detailed account and bibliography is given in Baxter (1994). Typically, data are standardized and an analysis results in new, linear combinations of the original variables, called principal components, that can be inspected for structure using plots (usually) based on the first two or three components. If there is structure in the data it will often show in the first component and it can be useful to examine this using a KDE.
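Scores of the kind examined in Example 1 can be obtained from standardized data with a singular value decomposition. This is a generic sketch with random stand-in data (not the glass compositions), and the function name is hypothetical; the first column of `scores` is what a univariate KDE would then be applied to.

```python
import numpy as np


def pca_scores(data):
    """Principal component scores of standardized data (rows are specimens)."""
    x = np.asarray(data, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)   # standardize each variable
    # Rows of vt are the component loadings, ordered by decreasing variance.
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    scores = z @ vt.T
    variances = s**2 / (x.shape[0] - 1)                  # variance of each component
    return scores, variances


# Random stand-in for a 105 x 11 table of oxide compositions.
rng = np.random.default_rng(9)
stand_in = rng.normal(size=(105, 11)) @ rng.normal(size=(11, 11))
scores, variances = pca_scores(stand_in)
first_component = scores[:, 0]   # input for a univariate KDE
```

Because the variables are standardized, the component variances sum to the number of variables (11 here).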

The data used for the first example are 105 specimens of Roman waste glass, with a principal component analysis based on their chemical composition with respect to 11 oxides. The data are given, and extensively analysed, in Baxter (1994). The specimens come from two sites and the statistical analyses suggest that there are perhaps three clusters in the data that are related to, but do not exactly coincide with, the site classification.

As a first illustration of kernel density estimation, Figure 1 shows two KDEs for the first principal component scores, one based on the normal scale estimate of h and one on an STE estimate of h. The normal scale estimate over-smooths the data, as expected, and misses the central and smaller mode suggested by the STE approach.

The usual bivariate component plot can be represented by a KDE in various ways. Figure 2 shows a scatter plot of the scores on the first two components and Figure 3 shows a KDE using the STE estimate of h. Three main concentrations are evident. For this example inspection of the scatterplot has led one of us (Baxter, 1994) to the same conclusion, so that a KDE is not essential. In Examples 3 and 4 much larger data sets are used for which the scatterplot is a less useful tool.

Figure 1. Two univariate kernel density estimates (relative frequency against first component score) for scores on the first principal component of an analysis of the chemical composition of 105 specimens of Romano-British waste glass, one using the STE rule and one using the normal scale rule.

Kernel Density Estimates 349

Example 2

An obvious use for bivariate KDEs is in the presentation and interpretation of spatial data in the form of coordinates of find spots, for example. To illustrate this an ethnoarchaeological data set, Binford's (1978) Mask Site data, is used. The data are taken from appendix A of Blankholm (1991), who uses them to test a variety of approaches to intrasite spatial analysis. The data, as presented by Blankholm, consist of the spatial coordinates of five classes of find that might occur in the archaeological record, such as artefacts, large bones and bone splinters. We use the subset based on the coordinates of the locations of 276 bone splinters.

Figures 4 and 5 show analyses in which the normal scale rule and STE estimates have been used to determine window-widths separately for the two coordinate directions. Both analyses show 25, 50, 75 and 100% contours superimposed on the distribution of the bone splinters. Once again the normal scale analysis produces a smoother picture. There are clearly three main concentrations in the data, with the STE analysis suggesting a subdivision of one of these, in the bottom right of the graph, into two groups and a fifth group in the upper left of the figure.

It is instructive to compare our results with those obtained by a variety of methods in Blankholm (1991). His figure 9, using contouring at equal heights (rather than encompassing specified proportions of the data), is less revelatory of structure than our figures, while a k-means cluster analysis (his figure 17) suggests a three-cluster distribution. Contour maps or clustering arising from local density analysis (his figure 32) and nearest neighbour analysis (his figure 39) are also given. We think that our figures, and particularly that for the STE analysis, suggest structure as well as, or more clearly than, the analyses in Blankholm (1991).

Figure 2. Principal component plot for the first two components from an analysis of the chemical composition of 105 specimens of Romano-British waste glass.

Figure 3. A KDE, based on an STE rule for the selection of h, for the data of Figure 2.

Figure 4. A KDE of the Mask Site data using the normal scale rule. The contours are for 25, 50, 75 and 100% inclusion levels.

Figure 5. As for Figure 4 but using an STE estimate.


How real is the structure suggested? In fact the location of hearths, activity areas and features such as rocks is known, and Blankholm provides a map of these that can be overlaid on his figures. There are five hearths, and two of them are associated with concentrations detected in all analyses, namely those to the left of our figures. Two other hearths that are adjacent, and at the bottom left, are associated with the third main concentration. Only our STE analysis suggests two subdivisions of this group. The fifth hearth is associated with a less dense area of bone splinters in the upper right of the diagram and is suggested by our STE analysis and some of those reported by Blankholm.

From this discussion we conclude that, for this

example at least, the KDE approach is competitive

with other statistical approaches to spatial analysis in

archaeology of the kind that seeks clustering in artefact

scatters.

From the foregoing discussion it is obvious that contouring of artefact scatters can be undertaken without reference to kernel density estimation. The merits, or otherwise, of different approaches will be discussed in the concluding section. It will also be obvious that KDEs can be used as an informal means of cluster analysis for this kind of data, and in this sense they compete with more formal methods such as k-means cluster analysis. It is known that k-means analysis has a tendency to produce spherical clusters, whether or not the real structure has this form. This difficulty is avoided by the clusters (or contours) suggested by KDEs. Determining the number of clusters is a problem for any clustering approach. With KDEs it should be informative to examine contours at different levels of inclusion as a means of looking for structure at different scales of spatial resolution.

Figure 6 shows the alternative representation, for the STE estimate, of the KDE as a three-dimensional diagram. There are four clear concentrations, or modes, with a much gentler hillock, visible behind the front peaks, that is associated with the fifth hearth.

Example 3

This third example is based on anthropometric rather than archaeological data, but is ideal for showing how KDEs can be used to illuminate the message of large data sets. The data are discussed and analysed in Relethford & Crawford (1995) and consist of 17 body and craniofacial measurements from 7214 male adults in 31 birth counties in Ireland. The data were originally used to investigate the genetic distances between the populations defined by the counties.

It was of interest for one of us (RVSW) to investigate the performance of a principal component analysis, in order to see how the first two principal components relate to geography. Some strong correlations of this sort, but for different data, have been reported by Wright (1992). An obvious problem, in terms of the usual component plots presented from such an analysis, is that there are too many data points to plot the data sensibly in the usual way. Here we concentrate on what KDEs have to offer in terms of handling such a mass of data, without going into aspects of substantive interpretation, and note that any two-dimensional scatter of data can be handled in a similar way.

Figures 7 to 10 show four different representations of these data. Figure 7 is an attempted scatter plot that shows how hopeless it is to try to discern structure in the data in this way; Figure 8 is a three-dimensional plot along the lines of Figures 3 & 6; and Figure 9 is a contour plot along the lines of Figure 5. An STE estimate has been used. The interesting feature of these last two plots is that there are no interesting features; there is no evidence of any kind of grouping in the data. The final plot, Figure 10, shows separate 75% contours for three of the counties, and there is no evidence of any difference between them. Although the plot becomes very crowded, this remains the case if all counties are represented in a similar way on the same plot. The plots indicate that the correlation (if any) that exists between the principal components and geography must be a low one.

Figure 6. A density plot of the Mask Site data using the STE estimate.

Figure 7. A scatterplot of the scores from a principal component analysis of the Irish body and craniofacial measurement data. Based on 7214 individuals, it illustrates that such plots are of limited use for looking at large data sets.

Example 4

The final example is similar in kind to the previous example, and arises from a correspondence analysis originally undertaken by one of us. It possesses additional features of interest.

The data are the frequencies of eight different types of archaeological site, recorded for 4712 km² in the Australian state of Victoria. The resulting 4712 correspondence analysis object scores are plotted in Figure 11.

It is not possible to get a sensible-looking plot here because the structure of the data is such that many of the points represent multiple occurrences. This overprinting happens because many of the kilometre squares have identical frequencies of sites, though multiple occurrences are not evident from the plot. Solutions such as jittering exist, in which points are displaced by a small and random amount, which would tend to give a separate point for each kilometre square, but this then leads to problems similar to those evident in Figure 7.
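Jittering itself is simple to sketch. The uniform displacement, its size, the function name and the stand-in data below are assumptions for illustration. After jittering, every duplicated point becomes distinct, which is why a very large data set then suffers the same over-plotting as Figure 7.

```python
import numpy as np


def jitter(points, amount, seed=None):
    """Displace each coordinate by independent uniform noise in [-amount, amount)."""
    rng = np.random.default_rng(seed)
    pts = np.asarray(points, dtype=float)
    return pts + rng.uniform(-amount, amount, size=pts.shape)


# Stand-in scores: 80 objects but only two distinct positions.
points = np.array([[0.1, 0.2]] * 50 + [[0.5, -0.3]] * 30)
jittered = jitter(points, amount=0.01, seed=5)
```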

An alternative approach is to use a KDE and

contour it at some suitable level in order to see where

the points pile up, and this is done in Figure 12.

Here 90, 95 and 100% contours are shown using an

STE estimate, and are suggestive of, perhaps, nine

groups.

Figure 8. The data of Figure 7 represented as a density plot.

Figure 9. The data of Figure 7 represented as a contour plot showing 25, 50, 75 and 100% levels of inclusion.

Figure 10. The data of Figure 7 showing 75% contours for three of 31 counties. The contours largely overlap, and this is the case however the counties are selected.

Figure 11. A scatterplot based on the first two components of a correspondence analysis of the Victorian sites data. Many of the points displayed represent multiple occurrences.

Discussion

For simple data presentation and comparison of univariate data, KDEs can be regarded as an alternative to the histogram. We think that there are aesthetic and interpretational benefits to be obtained from using KDEs and have argued this elsewhere (Baxter & Beardah, 1995b), but others may regard it as a matter of taste.

For bivariate data the case for regarding KDEs as a tool to be routinely used is a strong one. Even when an ordinary scatter plot could be used, as in Examples 1 and 2, KDEs can be more effective for showing concentrations of points or modes in the data. For very large data sets, as in Examples 3 and 4, where scatterplots are uninterpretable, it is unarguable that KDEs or other methods with similar aims are needed to get a view of the data.

Of course it would be quite possible to generate contour plots and density plots of the kind shown in the figures without recourse to KDEs, so why use KDEs? One answer is that statistical theory associated with KDEs provides guidance as to the appropriate degree of smoothing to use and, as the first two examples show, this can have a critical effect on the interpretation of the data. Another reason for using KDEs is that they lend themselves naturally to contouring in terms of inclusion of specified percentages of the most densely clustered points. As the examples show, this means that KDEs can be used as an informal kind of clustering method that does not impose structure on the data in the way that more formal methods often do.

Acknowledgements

Caroline Jackson is thanked for providing the glass compositional data used in Example 1. John Relethford and Michael Crawford are thanked for providing the data used in Example 3, and for permission to use it. Richard MacNeill and Stewart Simmons (Aboriginal Affairs Victoria) are thanked for the sites data used in Example 4. Responsibility for the use to which these data sets have been put in the paper, and interpretations, rests with the present authors.

References

Baxter, M. J. (1994). Exploratory Multivariate Analysis in Archaeology. Edinburgh: Edinburgh University Press.

Baxter, M. J. & Beardah, C. C. (1995a). Graphical presentation of results from principal components analysis. In (J. Huggett & N. Ryan, Eds) Computer Applications and Quantitative Methods in Archaeology 1994. Oxford: BAR International Series 600, pp. 63–67.

Baxter, M. J. & Beardah, C. C. (1995b). Beyond the Histogram: Archaeological Applications of Kernel Density Estimation. Research Report 6/95, Nottingham Trent University Department of Mathematics, Statistics and Operational Research, Nottingham Trent University, Nottingham, U.K.

Beardah, C. C. & Baxter, M. J. (1995). MATLAB Routines for Kernel Density Estimation and the Graphical Representation of Archaeological Data. Research Report 2/95, Nottingham Trent University Department of Mathematics, Statistics and Operational Research, Nottingham Trent University, Nottingham, U.K.

Binford, L. R. (1978). Dimensional analysis of behavior and site structure: learning from an Eskimo hunting stand. American Antiquity 43, 330–361.

Blankholm, H. P. (1991). Intrasite Spatial Analysis in Theory and Practice. Aarhus: Aarhus University Press.

Bowman, A. & Foster, P. (1993). Density based exploration of bivariate data. Statistics and Computing 3, 171–177.

Orton, C. R. (1988). Review of Quantitative Research in Archaeology, Aldenderfer, M. S. (Ed.). Antiquity 62, 597–598.

Relethford, J. H. & Crawford, M. H. (1995). Anthropometric variation and the population history of Ireland. American Journal of Physical Anthropology 96, 25–38.

Scott, D. W. (1992). Multivariate Density Estimation. New York: Wiley.

Silverman, B. (1986). Density Estimation for Statistics and Data Analysis. London: Chapman and Hall.

Wand, M. P. & Jones, M. C. (1995). Kernel Smoothing. London: Chapman and Hall.

Whallon, R. (1987). Simple statistics. In (M. S. Aldenderfer, Ed.) Quantitative Research in Archaeology: Progress and Prospects. London: Sage, pp. 135–150.

Wright, R. V. S. (1992). Correlation between cranial form and geography in Homo sapiens: CRANID, a computer program for forensic and other applications. Archaeology in Oceania 27, 128–134.

Figure 12. The same data as in Figure 11 in the form of 90, 95 and 100% contours arising from a KDE based on an STE rule for window-width selection.

Appendix: Technical Details

The closeness of a KDE to the true density can be defined in terms of the asymptotic mean integrated squared error (AMISE), and the value of h which minimizes this, $h_{\mathrm{AMISE}}$, can be shown to have the form

$$h_{\mathrm{AMISE}} = \left[ g(K)\, R(f'')\, n \right]^{-1/5},$$

where $g(K)$ is a function of the known kernel and

$$R(f'') = \int f''(x)^2 \, dx$$

is a function of the unknown true density that can be interpreted as its roughness. Assuming that the true density is normal leads to the normal scale rule for choice of h. Estimating $R(f'')$, which can be done with varying degrees of refinement, leads to the family of direct plug-in (DPI) estimates. Details are given in Wand & Jones (1995).
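As a worked check, the normal scale rule can be recovered from this formula. Assume, following the standard treatment in Wand & Jones (1995), that $g(K) = \mu_2(K)^2 / R(K)$, with $\mu_2(K) = \int u^2 K(u)\,du$ and $R(K) = \int K(u)^2\,du$. For a normal kernel and a normal true density with standard deviation $\sigma$:

$$\mu_2(K) = 1, \qquad R(K) = \frac{1}{2\sqrt{\pi}}, \qquad g(K) = 2\sqrt{\pi}, \qquad R(f'') = \frac{3}{8\sqrt{\pi}\,\sigma^{5}},$$

$$h_{\mathrm{AMISE}} = \left[ 2\sqrt{\pi} \cdot \frac{3}{8\sqrt{\pi}\,\sigma^{5}} \cdot n \right]^{-1/5} = \left(\frac{4}{3}\right)^{1/5} \sigma\, n^{-1/5} \approx 1.06\, \sigma\, n^{-1/5}.$$

Replacing $\sigma$ by an estimate $\hat\sigma$ gives the normal scale rule quoted in the main text.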

Solve-the-equation (STE) estimates are closely related to DPI estimates. The formula for $h_{\mathrm{AMISE}}$ is the starting point, and $R(f'')$ is replaced by an estimate that depends on h and can be determined for an initial choice of h. This leads to an estimate of $h_{\mathrm{AMISE}}$ that in turn is used to estimate a new $R(f'')$ and a new $h_{\mathrm{AMISE}}$. This process continues until h converges.
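The iteration can be sketched in Python, assuming a normal kernel. Following the solve-the-equation form described by Wand & Jones (1995), the roughness is estimated with a pilot window-width tied to h as $\gamma(h) = a\,h^{5/7}$, where the constant $a$ is itself obtained from normal-reference pilot estimates. The pilot rules, starting value, tolerance and function names are our assumptions, and the code is not the authors' MATLAB routine.

```python
import numpy as np

SQRT_2PI = np.sqrt(2.0 * np.pi)
SQRT_PI = np.sqrt(np.pi)


def phi_deriv(u, order):
    """Fourth or sixth derivative of the standard normal density."""
    phi = np.exp(-0.5 * u**2) / SQRT_2PI
    if order == 4:
        return (u**4 - 6.0 * u**2 + 3.0) * phi
    if order == 6:
        return (u**6 - 15.0 * u**4 + 45.0 * u**2 - 15.0) * phi
    raise ValueError("order must be 4 or 6")


def psi_hat(x, g, order):
    """Kernel estimate of psi_r = integral of f^(r) f; psi_4 equals R(f'')."""
    u = (x[:, None] - x[None, :]) / g
    return phi_deriv(u, order).sum() / (x.size**2 * g ** (order + 1))


def ste_bandwidth(data, tol=1e-6, max_iter=100):
    """Solve-the-equation window-width for a normal kernel (a sketch)."""
    x = np.asarray(data, dtype=float)
    n = x.size
    sigma = np.std(x, ddof=1)
    r_k = 1.0 / (2.0 * SQRT_PI)                      # R(K) for the normal kernel
    # Normal-reference psi_6 and psi_8 give the pilot widths g1 and g2.
    psi6_ns = -15.0 / (16.0 * SQRT_PI * sigma**7)
    psi8_ns = 105.0 / (32.0 * SQRT_PI * sigma**9)
    g1 = (2.0 * phi_deriv(0.0, 4) / (-psi6_ns * n)) ** (1.0 / 7.0)
    g2 = (-2.0 * phi_deriv(0.0, 6) / (psi8_ns * n)) ** (1.0 / 9.0)
    psi4_pilot = psi_hat(x, g1, 4)
    psi6_pilot = psi_hat(x, g2, 6)
    # Pilot width for R(f'') is tied to h: gamma(h) = a * h**(5/7).
    a = (2.0 * phi_deriv(0.0, 4) * psi4_pilot / (-psi6_pilot * r_k)) ** (1.0 / 7.0)
    h = 1.06 * sigma * n ** (-0.2)                   # start from the normal scale rule
    for _ in range(max_iter):
        r_f2 = psi_hat(x, a * h ** (5.0 / 7.0), 4)   # roughness estimate given h
        h_new = (r_k / (r_f2 * n)) ** 0.2            # new h from the h_AMISE formula
        if abs(h_new - h) < tol * h:
            return h_new
        h = h_new
    return h


rng = np.random.default_rng(6)
sample = rng.normal(size=500)
h_ste = ste_bandwidth(sample)
```

Starting from the normal scale value keeps the iteration in a sensible range; for normal data the converged value should lie near it.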

These and other techniques have been implemented in the MATLAB package by one of the authors (CCB) and are freely available to anyone who wants them (email c.beardah@maths.ntu.ac.uk). The routines were developed because our interest in KDEs occurred at a time when nothing else was obviously and easily available to us. We believe that kernel density estimation is a valuable tool for data analysis that can be fruitfully deployed by archaeologists. We are also aware that software to implement the ideas involved, including our own, is not readily available and is expensive. It is likely that this situation will change, and that kernel density estimation will become available in accessible software packages. Our hope is that the present paper will encourage the use of such methodology when it becomes more readily available.

- Bender Wright 88 High Altitude OccupationsUploaded byGustavo Lucero
- Kent_1992Uploaded byGustavo Lucero
- Int J Morphol 2011Uploaded byQhip Nayra
- Walsh 2006 WorldarchUploaded byGustavo Lucero
- MoyesUploaded byGustavo Lucero
- 10-06_gietl_et_al-dachstein-libre.pdfUploaded byGustavo Lucero
- Bousman.pdfUploaded byGustavo Lucero
- 31-1-2-Richardsetal.pdfUploaded byGustavo Lucero
- 209475866-Wiessner-1982-1Uploaded byGustavo Lucero
- foley 1981 ch 2Uploaded byGustavo Lucero
- Thomas Nonsite Sampling in Archaeology 1975Uploaded byGustavo Lucero
- Lock 2000Uploaded byGustavo Lucero
- Nuñez Et Al 2010Uploaded byGustavo Lucero
- Kent 1992Uploaded byGustavo Lucero
- Bettinger Baum Hoff 1982Uploaded byGustavo Lucero
- Kuhn 1994 Formal Approach Mobile ToolkitsUploaded byGustavo Lucero
- Intensification in the PacificUploaded byGustavo Lucero
- Kehoe 2011 Binford and His Moral MajorityUploaded byGustavo Lucero

- San Agustin InglesUploaded byicanh2012
- Kremkau Late Horizon JequetpequeUploaded byBasili Olärraz
- Ancient History EncyclopediaUploaded byfmaldonadopr
- ARC000321 Lecture 15 Industrializing CitiesUploaded byAung Htun Linn
- Alizadeh - Excavations at the Perhistoric Mound of Chogna Bonut, Khuzestan, IRan.pdfUploaded byNancy Escalante
- Farnham Lane, East BurnhamUploaded byWessex Archaeology
- 42nd Annual Report of the Bureau of American EthnologyUploaded byPasha Urr
- Archaeology Moche Occupation of the Santa Valley PUploaded byCelia Vilca Reto
- 148: Llys Pentwyn Uchaf Farm, Oakdale, Caerphilly. Building RecordingUploaded byAPAC Ltd
- Bifaces (Stone Tools)Uploaded byPaul Redish
- Museum Education and Archaeological EthicsUploaded byDan Octavian Paul
- Rabat Roman AquedactUploaded byJacob Grech
- Fingringhoe and Middlewick Ranges, ColchesterUploaded byWessex Archaeology
- Habu 2004 Chaps 1 and 7Uploaded byJose Luis Osorio Villafuerte
- Forensic Approaches to Death Disaster and AbuseUploaded bytupac446
- ItanosGaiaCommentsJAMOR18Jun2014FINAL3Uploaded byManuel Moore
- PapadopoulosUploaded bytesting124
- Saville BAR Boddam Offprint[1]Uploaded byJason Hunt
- Zedeño - The Archaeology of Territory and Territoriality (Arqueologia).pdfUploaded byPaulo Marins Gomes
- Costin 1989 consumo PerúUploaded byMijaely Castañon
- Valerie A. Andrushko, John W. Verano (2008).pdfUploaded byYordano Marocho Contreras
- IRWA Strategies for Pipeline PersonnelUploaded byDory Hippauf
- Understanding Cultural LandscapesUploaded bystephen_duplantier
- Newsletter 82Uploaded byHeri Santoso
- DEEP MAPPING THE MEDIA CITY.pdfUploaded byMichele Louise Schiocchet
- CERAMIC ETHNOARCHAEOLOGYUploaded bymark_schwartz_41
- CRL.report 3. Silicone Oil and Organic ConservationUploaded byTrinidad Pasíes Arqueología-Conservación
- 9789088904363 - Mol Et Al. 2017 - The Interactive Past - E-bookUploaded bySidestone Press
- M. McGeough Exchange Relationships at UgaritUploaded byVlad Stangu
- ANCIENTPLANET_01Uploaded byEgiptologia Brasileira