
Journal of Archaeological Science 84 (2017) 74–94


Geospatial Big Data and archaeology: Prospects and problems too great to ignore*
Mark D. McCoy
Department of Anthropology, Southern Methodist University, P.O. Box 750336, Dallas, TX, 75275-0336, USA

Article info

Article history:
Received 22 December 2016
Received in revised form 27 May 2017
Accepted 1 June 2017
Available online 11 July 2017

Keywords:
Geospatial
Big Data
Spatial technology
Cyberinfrastructure
Data science

Abstract

As spatial technology has evolved and become integrated into archaeology, we face a new set of challenges posed by the sheer size and complexity of data we use and produce. In this paper I discuss the prospects and problems of Geospatial Big Data (GBD), broadly defined as data sets with locational information that exceed the capacity of widely available hardware, software, and/or human resources. While the datasets we create today remain within available resources, we nonetheless face the same challenges as many other fields that use and create GBD, especially in apprehensions over data quality and privacy. After reviewing the kinds of archaeological geospatial data currently available I discuss the near future of GBD in writing culture histories, making decisions, and visualizing the past. I use a case study from New Zealand to argue for the value of taking a data quantity-in-use approach to GBD and requiring applications of GBD in archaeology be regularly accompanied by a Standalone Quality Report.

© 2017 Elsevier Ltd. All rights reserved.

* The special issue was handled by Meghan C.L. Howey and Marieka Brouwer Burg.
E-mail address: mdmccoy@smu.edu.
http://dx.doi.org/10.1016/j.jas.2017.06.003
0305-4403/© 2017 Elsevier Ltd. All rights reserved.

1. Introduction

Archaeology has long recognized that spatial location is a core variable in our field (Spaulding, 1960). Today, we create, use, and share geospatial archaeological data on an unprecedented scale. In a recent paper, Bevan outlined many of the challenges we face with "floods of new evidence about the past that are largely digital, frequently spatial, increasingly open and often remotely sensed" (Bevan, 2015:1473, emphasis added). As our locational datasets grow, and become more accessible, so does apprehension about data quality, privacy (especially the protection of the locations of archaeological sites), and how best to manage large and growing geospatial data. At the same time, we have amassed such large databases that, on some topics, it would be disingenuous to claim we do not yet have enough data (Bevan, 2015:1477).

There is a growing literature in archaeology aimed at bringing attention to how we can best use technology (Kintigh, 2006; Snow et al., 2006) to achieve our larger disciplinary goals (e.g., Kintigh et al., 2014). The need for larger and more integrated geospatial data and analyses cross-cuts virtually all of our goals and aspirations as a science (Table 1). These require us to produce data and results that are scientific (testable, replicable), authentic (a faithful representation of the archaeological record and the human past), and ethical (protects cultural resources). To that end, I am guided in this paper by three questions: 1) What kinds of geospatial data are available today? 2) How will larger and more accessible geospatial databases shape the near future of archaeology? And, using a case study from New Zealand, I examine the question, 3) What can we do now about apprehensions regarding data quality, privacy, and the growing size of archaeological geospatial datasets?

These questions (what data is available, what will be the impacts of larger and more accessible data, and what can we do to mitigate our concerns about data) exemplify current debates about Big Data in general, and Geospatial Big Data specifically. Geospatial Big Data (GBD) can be broadly defined as data sets that include locational information and exceed the capacity of widely available hardware, software, and/or human resources. Before we go further, it is important to note that as of today, nearly all archaeological datasets fall short of being defined as GBD since the volume of data we work with rarely outstrips the capacity of available resources, with the exception of remotely sensed data (satellite imagery, lidar). But, while the volume of archaeological geospatial datasets is currently manageable, there are at least two good reasons we should begin to think about our geospatial datasets as GBD. First, due to the fragmentary nature of archaeological material evidence we are compelled to work with a broad variety of sources of data, to code complex contextual information into a

Table 1
Grand Challenges for Archaeology and the Need for Larger Geospatial Data and Analyses. Kintigh et al.'s (2014:5) summary of the most important scientific challenges for archaeology highlights a number of areas where the need for larger geospatial datasets and analyses is paramount. The purpose of this paper is to identify how we are currently building, using, and sharing geospatial data; what larger geospatial datasets will mean for the near future; and suggest ways we may overcome apprehension over the adequacy of large geospatial datasets (otherwise known as Geospatial Big Data) that will be necessary to meet our disciplinary goals.

General Topic, followed by Examples of the Need for Larger Geospatial Datasets and Analyses

Emergence, Communities, and Complexity
    "Archaeological data on cities range from small architectural details and short-lived cities to broad patterns of heterogeneous urban textures covering many square kilometers and presenting a historical depth of millennia. Consequently, characterizing long-term urban fabrics and animating associated behaviors via computational modeling requires enormous data archives and substantial computational infrastructure" (Kintigh et al., 2014:10, emphasis added).
    "Conflict is notoriously difficult to identify and quantify through archaeological remains ... more systematic and large-scale analyses are certainly necessary." (Kintigh et al., 2014:10, emphasis added).
    "Inequality can be systematically inferred through studies of landscape, monuments, residences, and mortuary remains ... Quantitative dynamic modeling to emplace general models of sociopolitical change in specific prehistoric and historical settings will be critical to our success." (Kintigh et al., 2014:9, emphasis added).

Resilience, Persistence, Transformation, and Collapse
    "The archaeological record is replete with examples of the rise and fall of communities of all scales ... With recent advances in the quantity and quality of archaeological and historical studies, we can uncover robust patterns in societal collapses over time and space." (Kintigh et al., 2014:11, emphasis added).

Movement, Mobility, and Migration
    "Typically, archaeologists have explored human mobility through a case-study approach based on archaeological and ancillary data from small-scale research projects. However, we also see the need for regional- and continental-scale studies that match the scale of the problem to the scale of particular interactions." (Kintigh et al., 2014:13, emphasis added).

Cognition, Behavior, and Identity
    "... how did humanity arise? ... a massive body of emerging data are critical to resolving this question" (Kintigh et al., 2014:15, emphasis added).
    "Tracking and evaluating localized arrangements and reconfigurations necessitates extensive investments in digital spatial datasets that incorporate LiDAR, geophysical, and other three-dimensional data that allow virtual exploration and analysis." (Kintigh et al., 2014:15, emphasis added).

Human-Environment Interaction
    "How do humans perceive and react to changes in climate and the natural environment over short- and long-terms? ... The challenge is to move from case or regional studies to larger scale comparative research, and to learn how to make generalizable statements about how people make choices that draw on universal biases in cognition ... [this] will require making data from relatively small field projects widely accessible and increasing current technological capabilities to allow for studies of human-environment interaction to increase in scope and complexity" (Kintigh et al., 2014:18–19, emphasis added).

digital format, and to interpolate trends across time and space using sparse data. These types of problems (variety, veracity, visualization) mirror issues raised by Big Data (see also Huggett, 2016). Second, from the perspective of data science our data are probably best classified as embryonic Geospatial Big Data in that they are likely to grow extremely large in volume in the future. We have the opportunity now to shape our growing geospatial datasets before it becomes necessary to come up with specialized solutions for common tasks. It is also important to note that the problem of best practices regarding geospatial data is well-known to the subfield of geospatial archaeology, as well as archaeology that engages with computer and data science. As the science and technology dealing with GBD evolves, the hyper-technical side of archaeology is more important than ever. But, since GBD is already influencing how we write culture history, visualize our research, and participate in public discourse about science and heritage, I felt it is timely to review and comment on this topic for a broad audience in as non-technical terms as is reasonable.

2. Geospatial big data and archaeology

Today, we refer to any information "of or relating to the relative position of things on the earth's surface" as geospatial data (Collins English Dictionary). Geospatial Big Data (GBD) is geospatial data that exceeds the capacity of widely available resources (i.e., hardware, software, human resources) and requires specialized effort to work with. Applied research in GBD tends to be driven by the perceived economic benefit of mining data to reveal spatial relationships that make businesses more cost efficient, enhance insight into customers' behavior, and help industry make better

decisions. For example, when Netflix suggests movies and TV shows to watch, it does so in part based on what is popular near you. That is GBD at work. It is Big in that Netflix is using millions of previous searches and it is Geospatial by tagging search data by zip code and then using that variable in its recommendation algorithm (Lee and Kang, 2015:76).

For data science, GBD is a subset of the more general effort of dealing with Big Data, and since much of the data in the world can be geo-referenced, it is hard to underestimate the importance of geospatial big data handling (Li et al., 2016:120). There is no specific test that would identify and classify data as GBD; rather, GBD is commonly said to have one of several characteristics: volume, velocity, and variety (Laney, 2001). Additional characteristics have since emerged, including veracity, visualization, and visibility (Li et al., 2016; see also Suthaharan, 2014).

Archaeology is not what data scientists had in mind when outlining what constitutes GBD. In my view, we nonetheless should begin to think of our geospatial data as GBD while they are still manageable in terms of volume, in part, so we can improve how we deal with the other issues inherent in creating, maintaining, and using GBD. Below are some examples of how the GBD characteristics apply to archaeology:

Volume for web-based businesses is measured not in gigabytes, or terabytes, but petabytes (1000 gigabytes = 1 terabyte; 1000 terabytes = 1 petabyte). For archaeology, it is impossible to say precisely how much data there are, but we know there are two major sources of geospatial data that together represent high and growing volume. First, data coming from legacy projects, especially as we migrate the white paper backlog into digital form (reports, forms, catalogs, field notes, photographs, etc.). This is an unknown but probably substantial volume of information that will grow even though the underlying research may have been finished decades ago. Second, the often cited statistic that 90% of the world's data was generated in the last two years applies to archaeology too. Therefore, much more daunting in terms of volume is the size of satellite remote sensing (Wiseman and El-Baz, 2007), field data (GPS, photogrammetry, drones, laser scanning, etc.), and computer based research, such as simulation.

Velocity is often a problem for GBD applications because of the torrent of information coming in around the clock. For archaeology, high and increasing velocity is a concern, but the inconsistency in the velocity of data is equally problematic. Take for example one of the most visible archaeological datasets on the web, the Digital Archaeological Record (tDAR). In 2011, tDAR integrated a large (350,000) database of reports and citations created by the US National Parks (National Archaeological Database, NADB). This was a major positive step forward for the digital archive, but in terms of velocity, it means in one year it grew six times larger than all other years combined (2008–16).

Variety in sources, types, and precision of data can create intractable problems for any database. For archaeology, Cooper and Green (2015) recently summarized how in the English Landscapes and Identities project they dealt with information coming from a wide range of sources collected over generations. Even after careful research, there would sometimes be no clear way to tell, for example, if sources were talking about the same monument five times, or five different monuments in the same place. Precision in terms of geolocation is much easier to achieve today. Nonetheless, the number of ways we might record a site (i.e., map data, imagery) and index it (i.e., site name, place name, site type), means variety will continue to be a challenge.

Veracity is more than locational accuracy; for archaeology it is a question of the quality of the information within a narrowly defined set of relational variables, something we commonly refer to as context. In data science, the size of Big Data is sometimes used to justify the use of unverified sources of information, the underlying logic being that the sheer volume of data will overcome the inclusion of some datasets with poor accuracy. We are beginning to see this approach applied to archaeological geospatial data, and not surprisingly, this has raised concerns for how we account for context.

Visualization of data is employed at all stages of research (i.e., generating hypotheses, identifying patterns, representing results) to help us make sense of abstract information. Archaeology has developed by consensus a number of methods for visualizing our geospatial data in static products (i.e., regional maps, site plans, stratigraphic drawings, etc.). Today, with the advent of 3D technologies, it is much easier to also represent the forms of artifacts and sites in an interactive digital format, but these have yet to supplant static products.

Visibility of archaeological geospatial datasets is at an all-time high in terms of coverage, variety, and richness. Advances in cloud technology and web GIS mean we are seeing a growth in online data repositories, as well as site location indexes, atlases, and gazetteers. Increased visibility naturally comes with increased concerns with privacy and misuse of archaeological data, and perhaps counter-intuitively, illustrates the gaps where geospatial datasets are not visible.

These examples are certainly not an exhaustive list of the ways in which archaeological geospatial data has the qualities of GBD; nonetheless they illustrate why in this paper I have chosen to classify our largest geospatial datasets as GBD.

3. How we use Geospatial Big Data in the present

The kinds of geospatial data that are available to professional archaeologists and the public today vary wildly depending upon region, the time period, the topic of interest, and the type of evidence of the human past. For the purposes of this discussion I have classified geospatial datasets (Table 2) into a number of different types: data repositories, location indexes, radiocarbon databases, project websites, and academic sources. This is not an exhaustive list, nor are these exclusive categories; they are instead meant to represent a cross-section of how we create, share, and use GBD in practice today. I have further broken down these categories by a qualitative summary of the sources of geospatial data used, accessibility, and quality, as a way to evaluate sources in terms of their potential for data mining. Data mining itself is a misleading term in that the goal is not to extract a specific piece of existing data, but to discover new patterns and/or associations that would otherwise be impossible to recognize and evaluate; a process also referred to by the somewhat ambiguous, but appropriate, term from data science: knowledge discovery. I have not attempted to review geospatial databases as they apply to museums and artifact collections (Dörter and Davis, 2013), although I recognize that these have several added complexities in terms of data quality and the need to code locational information on provenience (where an object was reportedly found), and provenance (where it has been since it was found, and where it is located today).

One of the first-order differences between contemporary geospatial databases is the distinction between archival databases versus integrative databases. Archival datasets grow by accretion of distinct datasets, whereas in integrative datasets new data is added into a single database as it becomes available. For example, data repositories like Archaeology Data Service (ADS) and The Digital Archaeological Record (tDAR) are omnibuses that take in single geospatial datasets that keep their distinct character and are discoverable along with any number of other types of non-geospatial data. In contrast, site indexes like the Digital Index of North American Archaeology (DINNA), digital gazetteers like Pleiades (pleiades.stoa.org), or radiocarbon databases on

archaeological materials like the Canadian Archaeological Radiocarbon Database (CARD), continuously take in new information from a variety of sources and compile it into a single database. Each of these two kinds of datasets draws on academic and applied cultural resource management sources for information.

The integrative databases, as opposed to the archival databases, are especially good at exposing clear biases in what kinds of information is currently accessible. For example, Fig. 1 shows several kinds of geospatial data: Mesoamerican sites (The Electronic Atlas of Ancient Maya Sites), ancient places of the Mediterranean (Pleiades), and sample locations derived from several major radiocarbon databases and regional studies. Each uses coordinates to record location and so is easily converted into the same data model (vector, points). The site database is a pan-Maya registry of ancient Maya settlements and each point has a corresponding assessment of Site Rank to allow for quantitative geospatial analyses. In contrast, ancient places includes settlements and an extremely broad variety of other categories, such as place names, to aid in the qualitative analyses of historical texts. The third map shown, radiocarbon data from archaeological sites, is different again. Like the site database, radiocarbon databases are clearly built with quantitative analysis in mind, but like the ancient places database, it includes any and all kinds of phenomena (i.e., evidence of settlements, foraging, farming, burials, etc.). It should also be noted that it is possible that some radiocarbon dates within a database reflect natural processes rather than human behavior. Nonetheless, even after combining some of the largest archaeological databases available, the geographic bias toward research on North America and Europe is readily apparent. There may also be some geographic biases in the other two examples, but the regional-temporal focuses would appear to achieve a level of evenness not seen in other kinds of datasets.

An even more difficult quality to evaluate in our largest geospatial datasets is geospatial-temporal coverage. Take for example recent studies from the Near East (Lawrence et al., 2016) and China (Hosner et al., 2016). The goals of each study are similar (to qualify and quantify paleodemographic and settlement pattern trends over the Neolithic through Bronze Ages) and both take a time-slice approach where site records are coded by cultural period (e.g., Bronze Age) and by absolute time (century or millennium scale) with a beginning/end (min/max age) and time period (e.g. 6 kya). The Near Eastern study is focused on urbanism and, using a mix of survey and remote sensing, each record includes an estimate of site size. Both studies compare their data to shifting climate regimes in their respective regions and other key variables. These are both ambitious and important undertakings using the best available and most up-to-date coverage of archaeological sites obtained by salvage and research excavations and surveys and recognize the underlying data has known limitations in terms of uneven geospatial-temporal coverage within the respective regions (Hosner et al., 2016: 1589, emphasis added). At first glance, the Chinese dataset appears to have much better coverage with 50,000 site records compared with fewer than 400 sites in the Near East. However, if we account for the total size of the regions

Fig. 1. Geospatial big data on sites, ancient places, and radiocarbon dates. Sources: The Electronic Atlas of Ancient Maya Sites: A Geographic Information System (GIS); Pleiades (The Stoa Consortium); Goldberg et al., 2016; Martindale et al., 2016; Russell et al., 2014; Silva et al., 2015; Vermeersch, 2016.
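
The point made in the text about Fig. 1, that site registries, gazetteers, and radiocarbon databases all record coordinates and so can be brought into one vector point model, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `PointRecord` type, the `to_point` helper, and every row below are hypothetical placeholders, not the schema or contents of any of the cited databases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PointRecord:
    """One location in a shared vector point model (hypothetical schema)."""
    source: str   # dataset the record came from
    kind: str     # e.g. "site", "place", "radiocarbon sample"
    label: str    # site name, place name, or sample identifier
    lon: float    # longitude in decimal degrees
    lat: float    # latitude in decimal degrees


def to_point(source, kind, label, lon, lat):
    """Coerce coordinates to floats, check their range, and build a record."""
    lon, lat = float(lon), float(lat)
    if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
        raise ValueError(f"coordinates out of range for {label!r}")
    return PointRecord(source, kind, label, lon, lat)


# Placeholder rows: labels and coordinates are invented for illustration.
raw_rows = [
    ("site atlas", "site", "hypothetical site A", "-89.5", "17.2"),
    ("gazetteer", "place", "hypothetical place B", 12.5, 41.9),
    ("radiocarbon database", "radiocarbon sample", "hypothetical sample C", -75.7, 45.4),
]

points = [to_point(*row) for row in raw_rows]
kinds = sorted({p.kind for p in points})
print(len(points), "records of kinds:", kinds)
```

Keeping a `kind` field alongside the coordinates preserves the distinction the text draws: the geometry is interchangeable, but what each point represents is not.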

Table 2
Examples of Geospatial Big Data in Archaeology. See full internet references in List of Web and Data Sources.

Type of Source: Data Repositories
    Examples: Archaeology Data Service (ADS); The Digital Archaeological Record (tDAR)
    Sources of Geospatial Data: Stand-alone static datasets (GIS layers, locations as fields in other datasets) from academic, government, and cultural resource management.
    Accessibility: Public and private web portals. User registration required for download.
    Data Quality Information: Approved digital repositories for the results of publicly funded science. Supports training in best practices.

Type of Source: Location Indexes
    Examples: Site Indexes. Digital Index of North American Archaeology (DINNA); ArchSite (New Zealand Archaeological Association). Atlases and Gazetteers. CORONA Atlas of the Middle East, Antiquity A-la-carte and Pleiades, American Institute of Archaeology's Archaeology of North America
    Sources of Geospatial Data: Single databases from a union of site information from academic, government, and cultural resource management.
    Accessibility: Web-GIS, public and private web portals. Most require user registration for download/access.
    Data Quality Information: Mix of approved digital repositories for the results of publicly funded science and geodata built for public consumption. Supports training in best practices.

Type of Source: Radiocarbon Databases
    Examples: Canadian Archaeological Radiocarbon Database (CARD 2.0); Radiocarbon Palaeolithic Europe Database (Version 20)
    Sources of Geospatial Data: Single databases from a union of independently reported radiocarbon results with locational information.
    Accessibility: Public and private web portals. User registration required for download.
    Data Quality Information: Long-term projects with updates to correct errors.

Type of Source: Project Websites
    Examples: Preservation. CyArk (non-profit, 3D and VR), Sketchfab (for profit, community portal supporting 3D and VR). Digital Archiving. Comparative Archaeology Database (University of Pittsburgh). Long-term Projects. Paleoindian Database of the Americas (PIDBA), English Landscapes and Identities Project
    Sources of Geospatial Data: Variable. Datasets reflect the goals and focus of study but can be broken into broad categories (atlases, preservation, archiving, etc.).
    Accessibility: Public web portals. No user registration required. Some allow download.
    Data Quality Information: Variable. Some long-term projects are a union of datasets, others are stand-alone databases from completed projects.

Type of Source: Academic Sources
    Examples: Journals. Journal of Archaeological Science, Journal of Archaeological Science: Reports, Archaeometry, Archaeological Prospection. Academic Libraries and Centers. Stanford Geospatial Center, Harvard Geospatial Library, Ancient World Mapping Center (Brown), Center for Advanced Spatial Technologies (Arkansas)
    Sources of Geospatial Data: Some archaeological journals allow optional supporting datasets with geospatial information. University support centers and libraries provide historic and environmental datasets useful for archaeology.
    Accessibility: Public web portals. Journal articles with supplemental data may be behind the pay wall. User registration required for some downloads.
    Data Quality Information: Datasets published as supplemental material have undergone peer-review. Self-archived datasets may or may not have been reviewed.
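
The comparison of coverage in the Near Eastern and Chinese datasets discussed in this section comes down to dividing site counts by region size. The following Python sketch reproduces that arithmetic as a rough check. It assumes the rounded figures quoted in the text (treating "fewer than 400" as 400, and the areas as approximately 0.3 and 9.6 million square km); the function name is a hypothetical helper, and nothing here is recomputed from the original datasets.

```python
# Illustrative arithmetic only: counts and areas are the rounded figures
# quoted in the text, not values derived from the underlying databases.

def sites_per_million_sq_km(site_count, area_million_sq_km):
    """Return site density as records per million square kilometres."""
    if area_million_sq_km <= 0:
        raise ValueError("area must be positive")
    return site_count / area_million_sq_km


NEAR_EAST_AREA = 0.3  # million sq km (approximate)
CHINA_AREA = 9.6      # million sq km (approximate)

# All periods combined.
near_east_all = sites_per_million_sq_km(400, NEAR_EAST_AREA)
china_all = sites_per_million_sq_km(50_000, CHINA_AREA)
ratio_all = china_all / near_east_all

# A single time slice, 3 to 4 kya.
near_east_slice = sites_per_million_sq_km(222, NEAR_EAST_AREA)
china_slice = sites_per_million_sq_km(19_837, CHINA_AREA)
ratio_slice = china_slice / near_east_slice

print(f"raw count ratio, all periods: {50_000 / 400:.0f}x")
print(f"density ratio, all periods:   {ratio_all:.1f}x")
print(f"density ratio, 3-4 kya:       {ratio_slice:.1f}x")
```

Under these assumptions a raw difference of 125 times in record counts shrinks to roughly 3.9 times once area is taken into account, and to roughly 2.8 times within the 3 to 4 kya slice, which is consistent with the "about four times" and "less than three times" figures given in the text.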

examined (~0.3 million vs ~9.6 million square km), the Chinese dataset has only about four times as many records per million square km as the Near Eastern dataset, and if one considers coverage within contemporaneous time periods, the density of sites is even more similar. Between 3 and 4 kya, there are fewer than three times as many sites per million square km in the Chinese data than the Near Eastern data (19,837 sites/9.6 mil sq km vs. 222 sites/0.3 mil sq km). The lesson here is clear: these datasets are more alike than one would estimate just based on the number of records when one accounts for space and time.

The quality of geospatial data goes beyond simply reporting the provenience of artifacts, or the location of sites, and is probably best thought of in terms of how well the dataset conforms to established best practices. Most archaeological geospatial datasets rely on users to self-police when it comes to quality and report critical information that others might use to evaluate quality as metadata. Here again the distinction between archived and integrative is a critical one; archived data are frozen in place, integrative data can be revised with updated versions. This does not mean one is better than the other, or that high quality studies will necessarily have matching high quality geospatial datasets. For example, we would expect archived data published as supplementary material in peer-reviewed journals to conform closely to best practices. Unfortunately, geospatial data underlying many, if not most, studies is simply missing. Major academic journals, including the Journal of Archaeological Science, do not require publishing geospatial data alongside new research. This is not a problem that is unique to geospatial data, but one that could be fixed (see Horsburgh et al., 2016 for a similar critique regarding the lack of rigorous publication of zooarchaeological data).

Publishing the locations of archaeological sites raises a number of serious privacy issues. In a rare exception to the Freedom of Information Act in the United States, federal archaeologists routinely withhold the location of archaeological sites that are not formally open for public visitation to protect the sites from looting and vandalism (Hitchcock, 2006:471). Large databases of site records have also been used to fight looting, as has been the case in satellite imagery monitoring of the impacts of looting on culture heritage in the Near East (Contreras and Brodie, 2010; Stone, 2008, 2015). For example, in Syria, Casana (2015:150) has discovered through examining the impacts of looting over time that war-related looting is most frequent and most widespread in Kurdish and opposition-held areas, which are, perhaps unsurprisingly, also the regions with the weakest centralized authority.

4. Use of Geospatial Big Data in the near future

Kintigh et al. (2014) clearly outline why we should aspire for

larger and more accessible geospatial databases, but in my review of how we currently use GBD, I found several trends that I believe are strong indicators as to how this technology will be applied in the near future. To be clear, these are not necessarily the most important or innovative uses of GBD, and I do not predict that these will change the fundamentals of how we define our goals. I do however take the position that the changes I highlight here will apply broadly, and not just to a sub-set of tech savvy experts. Having said that, as we approach the volume of Big Data, we may find it more and more necessary to use machine learning to classify data (e.g., Maaten et al., 2007), engage in far more open data archaeology (Huggett, 2014, 2015a, 2015b), and rethink how we create datasets with downstream Big Data analysis and simulation in mind (Barton, 2013; Kansa et al., 2014; Kintigh, 2015).

4.1. Culture history in the era of Geospatial Big Data

Archaeology has had a long love affair with GIS. There were concerns early on that GIS would lead us down a path to environmental determinism (Gaffney and van Leusen, 1995). The evidence to the contrary is all around us in the variety of uses we have put GIS to (McCoy and Ladefoged, 2009). I am not overly concerned about newer additions to our spatial technology tool kit pulling us down one or another theoretical path. Geospatial Big Data, in my view, will not lead us down an unintended path given that GBD is something that has attracted our colleagues in both the Earth Sciences (e.g., Karmas et al., 2016) and in Digital Humanities (e.g., Bodenhamer et al., 2010). This is not to say that it will have no effect on how we do archaeology, and here we turn to the topic of how we write culture histories in the era of GBD.

The best example of how GBD can, and will, influence how we write culture histories of the prehistoric past is in the use of radiocarbon mega-databases. From the beginning of the use of radiocarbon dating in archaeology, we have seen value in the collection of regional radiocarbon databases. As early as the 1960s, Green (1964) made the case for a standardized paper index card system for radiocarbon dates in Oceania (see also Jelinek, 1962). Today, radiocarbon databases continue to be regionally organized, and they are most often, but not exclusively, applied to one of two global phenomena: (1) migrations (Silva and Steele, 2014) and demography (Steele, 2010) of modern humans in the Pleistocene in Europe (Vermeersch, 2016), Australasia (Field et al., 2007; Williams et al., 2014), and the Americas (Chaput et al., 2015; Delgado et al., 2015; Goldberg et al., 2016; Peros et al., 2010); and (2) the spread of Neolithic farmers, or domesticates, in Europe (Crombé and

and Hiscock, 2015). Second, there is the question of sampling. There will always be locations, and time periods, that will be over-sampled or under-sampled, due to the natural process of taphonomy and the spatially discontinuous nature of archaeological research. Even the largest mega-radiocarbon database studies recognize that these must be dealt with (Chaput et al., 2015:12131). Lastly, and related to the question of sampling, is the transformation of data. Radiocarbon dates are statements of probability, and these probabilities are often transformed, as pooled or summed probabilities (e.g., Bamforth and Grund, 2012; Contreras and Meadows, 2014), or as Bayesian models (e.g., Long and Taylor, 2015), to identify trends over time. When it comes to transforming data over space, the interpolation of data points is a well-trodden path for geospatial analyses. Results are often presented as time-slice maps, and short videos, as seen in recent studies of demography in Ireland (vector, point; McLaughlin et al., 2016), and North America (raster, heat-map; Chaput et al., 2015).

Historical archaeology in North America, where radiocarbon dating is more rarely used, presents an interesting counter to the role of GBD in archaeology. For example, the Digital Archaeological Archive of Comparative Slavery (DAACS) is an extraordinarily data rich Web-based initiative designed to foster inter-site, comparative archaeological research on slavery including 2 million artifacts, chronological information (mean ceramic date; South, 1977) that can be derived a number of ways depending on the query. Like radiocarbon databases, the DAACS is designed to be regional (geographic coverage includes the US Southeast and the Caribbean). The quantity of sites is many times lower than in regional radiocarbon databases (DAACS includes 73 sites), but the quality of information is outstanding, including site plans, Harris matrixes, and a range of other types of information.

One factor that should concern archaeology, no matter what time period or region, is balancing new opportunities for writing culture histories based on large geospatial datasets against the unintended thoughtlessness toward context that such studies could promote. Specifically, the kind of thoughtlessness that concerns me comes from either leaving out important data because it is difficult to include in the dataset or over-including inappropriate data simply because it is easy and available. So, while on the one hand we cannot let our GBD become hoppers filled with data that is handy, it would be a waste if we fail to gain the benefit that data mining large databases would allow. We have to come to terms with the fact that some results will migrate to geospatial datasets well and others will be much more difficult. With existing databases there are a number of options including treating large data-
Robinson, 2014), Asia (Silva et al., 2015), the Americas (Lemmen, bases as they were complete to expose biases (e.g., Cooper and
2012), Africa (Russell et al., 2014), and Polynesia (Mulrooney, Green, 2015), using computational models (e.g., Barton et al.,
2013; Wilmshurst et al., 2011). The geographic distribution of the 2010) to overcome sampling problems, and to use data on mod-
databases that underlay this research by necessity stretch beyond ern population and land cover as proxy measures for systemic bias
national boundaries, and vary in total size from a few hundred to in recovery (e.g., Miller, 2016). This is especially important to
tens of thousands of records. For the most part, they use the co- identify and account for gaps due to recovery bias since large
ordinates (latitude, longitude; northing, easting) of the site where a geospatial radiocarbon databases can reveal periods of de-
radiocarbon date was reported. There are always a small fraction of mographic collapse that should be of keen interest to archaeology
dates without the requisite site location coordinate information. (Shennan et al., 2013; McLaughlin et al., 2016; Mulrooney, 2013;
These databases often have extensive information on the context, Zubimendi et al., 2015).
material dated, and laboratory results.
Concerns regarding studies using mega-radiocarbon databases 4.2. Archipelagos of geospatial data and decision making
naturally vary from case to case e and at this stage probably war-
rant their own lengthy review e but, in brief, concerns tend to One of the times when geospatial data is most critical is when
center around a few related key points. First, there is the question of making decisions regarding site preservation; a factor weighted
what underlying phenomenon is being measured. These databases carefully in academic research and cultural resource management.
are the result of many different studies and it is not always clear Take for example the concerns over the current and future impacts
what thresholds have been used to separate natural from cultural of the Dakota Access Pipeline (DAPL) to the natural environment
phenomenon, and proxy measures for the presence-absence or and cultural sites. The DAPL project is a 1886 km (1172 miles) long
intensity of activity in a location is study dependent (Attenbrow pipeline that is designed to bring oil produced in North Dakota to
production facilities in Illinois. At the time of writing, intense protests continue in the tribal territory of the Standing Rock Sioux in North Dakota centering mainly on water quality but also the effect the project has had on sacred sites. Colwell (2016) rhetorically asks, "What sacred sites have been damaged by [the pipeline]? We can't really know for certain - and our legal system [in the US] is partly to blame." The Society for American Archaeology has written to the US Army Corps of Engineers to ask for a review of the project citing a number of problematic signs that the approach taken to the tribal consultation process was too piecemeal and the area under consideration may have been too restrictive (Gifford-Gonzalez, 2016).

It is unclear how the DAPL situation will ultimately be resolved but it does throw into high relief some of the consequences of data silos. Here I am using the term silo to refer to the degree of access the public has to geospatial data. For example, the company that is building the pipeline, Energy Transfer Partners, presents the proposed route of the pipeline online in five static maps (daplpipelinefacts.com); one that highlights the 50 counties it crosses, and four state-scaled maps of North Dakota, South Dakota, Iowa, and Illinois. The actual route is silo-ed by the fact that only static, low quality maps are made available. There have been a number of efforts to counter this by mapping the route reportedly based on public records and crowdsourced information (Nitin Gadia, bakkenpipelinemap.com). Maps have also been used as a form of protest of the DAPL including one juxtaposing the proposed route and unceded Sioux land under the 1851 treaty (northlandia.wordpress.com), and a map showing the location of protests with Lakota/Dakota place names (map by Jordan Engle and Dakota Wind) as part of The Decolonial Atlas (decolonialatlas.wordpress.com). To varying degrees these reflect a broader trend of more, largely untrained, private citizens participating in Volunteered Geographic Information (Goodchild, 2007), exemplified by OpenStreetMap.org, a platform that has been used in humanitarian crisis mapping.

The natural question is where along the proposed route are known archaeological sites, and here we see other examples of data silos. The environmental assessment report prepared by Energy Transfer Partners for the US Army Corps of Engineers for the Illinois segment of the proposed route in part reads (Dakota Access, 2016:66):

A check of previously-recorded cultural resources was undertaken within a 1.6-km (km) (1.0-mile) radius of the Proposed Action Areas/Connected Action Areas prior to the commencement of fieldwork. Online databases were consulted, including the National Historic Landmark list and the National Register of Historic Places. The Historic and Architectural Resources Geographic Information System (HARGIS), maintained by the Illinois Historic Preservation Agency (IHPA), was consulted for locational and other information regarding historic buildings, historic engineering structures, and cemeteries. The Illinois Inventory of Archaeological Sites geodatabase, maintained by the Illinois State Museum, was consulted for locational and other data regarding recorded archaeological sites and previously-reported archaeological surveys and excavations. The Illinois Cultural Resource Management Report Database, maintained by the University of Illinois, was consulted for detailed information available in previous reports. General Land Office maps were researched at the Federal Township Plats website maintained by the Illinois Secretary of State. Old county plat maps and atlases were researched at the Illinois State Library and the Galesburg Public Library.

This type of convoluted site record searching is typical of the due diligence required in cultural resource management, and since each of the four states on the proposed route has its own set of relevant state agencies, universities, and libraries, querying more than a dozen sources of geospatial data is required to identify previously recorded sites.

The administrative and institutional silos highlighted above of course do not apply to national scale databases that cross US state administrative boundaries; these larger datasets belong to a silo defined by cultural value. Specifically, the National Register, and to a lesser degree the National Landmark designation, include places that the US government considers of national importance. The National Register lists 90,000 locations, and is accessible as a web based GIS point layer online (nps.gov). In the Standing Rock Sioux's territory, and across the US, the National Register is mainly made up of historic buildings. Therefore, in rural areas the density of sites is low; indeed, there are only 441 in all of North Dakota (as of July 2015). Most relevant to the DAPL project is the question of Traditional Cultural Properties (TCP). Within the National Register a TCP is a property eligible for inclusion based on its associations with cultural practices, traditions, beliefs, lifeways, arts, crafts, or social institutions of a living community (nps.gov). This is a category defined by value of a place to local groups, and while it is common for physical evidence of the past (e.g., an archaeological site) to also be a cultural site, sacred places are not necessarily marked by physical evidence of past activities, nor are their locations necessarily something that is appropriate to be shared broadly. Indigenous geographers have made in-roads in thinking through how to use advances in spatial technology (Dobbs and Louis, 2015), but consultation and collaboration remain the best way to identify TCP.

Crowdsourcing is one avenue to break down silos and the use of crowdsourcing to fund archaeology and create geospatial datasets has attracted a great deal of attention. Bonacchi et al. (2015) describe lessons learned from the Micropast website (crowdsourced.micropasts.org) - a site used to try and attract crowdfunding and which also served as a portal to access the results of crowd sourced data - including that crowdfunding was most effective when used as a catalyst for more mixed models along with offline donations. Parcak's new GlobalXplorer website (globalxplorer.org) is aimed at attracting crowdfunding as well as spreading the analysis of satellite imagery through creating a global network of citizen explorers. But, without a way to access the data that is created through this volunteer science, the results may prove to be another data silo.

While I have emphasized the roles of silos, I would note that the notion of creating a top-down, single geospatial cyberinfrastructure (CI) is probably doomed to failure. Snow et al. (2006) highlighted the fundamental problem of our inability to simultaneously access different categories of information (databases, grey literature, and images) and pointed out that CI should be allowed to evolve as it is adopted, used, and contributed to by a community; to do so also involves solving problems of confidentiality and trust, and securing long-term commitment from agencies (Snow et al., 2006:959). Along those lines, we are beginning to see the organic conglomeration of independently created geospatial databases into archival or integrative databases. For example, the Canadian-based CARD database (Martindale et al., 2016) now includes a massive database on Paleolithic Europe (Vermeersch, 2016), and the Australia-based Field Acquired Information Management System (FAIMS) project moved their repository due to a lack of funds and resources to the US-based Digital Archaeological Record (tDAR). However, to be clear, I do not believe this portends a single CI for geospatial data in archaeology on the horizon.

One thing that I would like to see, and I think we will see, is more visibility between geospatial datasets by creating more geoportal platforms to connect archipelagos of related datasets. Well-
tended silos are how Geospatial Big Data are going to grow and be quality checked. Without them we lose institutional knowledge of the character of databases and the ability to update them. But at the same time we need to build digital architecture that allows us to see that other data silos exist, even if access is limited. The technology for creating archipelagos of geospatial datasets exists and is improving - the European Union's Inspire Geoportal (inspire-geoportal.ec.europa.eu) is a good example - and the marketplace for cloud computing web GIS (e.g., CartoDB) will likely give archaeology many more options than currently available as desktop software (e.g., ESRI's ArcGIS; Quantum GIS) or web mapping (e.g., Google Maps, Google Earth).

4.3. Visualization and ways to see the past

It is difficult to overestimate the power of good visuals in communicating the results of archaeological research and connecting with the public. The high-resolution satellite imagery showing major sites before-and-after looting, for example, is a powerful way to illustrate ongoing threats to cultural heritage. It also goes without saying that the proliferation of UAV (unmanned aerial vehicles), 3D terrestrial laser scanning, and other spatial technologies is multiplying the volume of geospatial data available to archaeology at a breakneck pace.

It is interesting to see how archaeology deals with the inherent difficulties in visualization of GBD. Data visualization when it comes to GBD pushes against the natural limitations of what the human eye can visually process; one reason why we see more use of geographic scale-dependent representations (see for example, DINAA). In archaeology, the topic of social network analysis (SNA) is a domain where we are seeing lots of visualization of large geospatial databases. SNA comes with its own data model (nodes, links) and diagrams that illustrate networks. The connection back to geographic space, and geographic relationships, can be embedded within the visualization (e.g., Clark et al., 2014), or the results of the SNA can be mapped on to the real world in some fashion (e.g., Mills et al., 2013).

Representative visualization - that is trying to get across to your audience how archaeology looks today, or looked at some point in the past - is more accessible as 3D models become easier to create and manipulate through technologies like structure-from-motion 3D models (e.g., AutoDesk's 123D Catch), and more user-friendly computer aided drawing software (e.g., Trimble's SketchUp). The results of professional surveying are also reaching a larger audience through work by outfits like CyArk and community sharing web platforms like Sketchfab. 3D models are being integrated within web GIS, through Google Earth's Street View that allows viewers to visit an archaeological site, and as Virtual Reality (VR) becomes more commonplace, these virtual visits will certainly become more immersive.

There are a number of great examples of the use of GBD and social media, such as a recent web GIS (CartoDB) visualization of geotagged posts from around the world as part of the Day of Archaeology Project (jessogden.carto.com). The potential for education and public outreach through social media is clear to see, even if it is less clear how exactly it will unfold as technology and tastes change. For other trends, like the Internet of Things (IoT), it is even harder to predict how they will articulate with the goals of archaeology, but certainly as the gap between the digital and modern things becomes smaller, so will the gap between the digital and ancient things (see also Horton, 2014).

5. Geospatial Big Data in action: Maori fortifications (Pa)

Data quality, privacy, and the growing size of our datasets are problems that need to be faced head on or they will have a paralyzing effect on advances in archaeology. I present three small studies on the archaeology of fortifications in New Zealand, called pa by Maori. For these examples I will employ different types of data to examine culture history, evaluate public and professional site records, and create visualizations. For the purposes of this paper I am taking a data quality-in-use approach (Merino et al., 2016), meaning that I do not presume to know the adequacy of the existing GBD, or the improvements necessary to achieve the tasks at hand, before the study. Rather, a post hoc assessment is made of the adequacy of the original dataset and improvements in a Standalone Quality Report. I recognize this goes against our instincts as scholars and looks like we are abandoning our core values regarding data quality. To the contrary, what I am advocating is finding a productive way to apply those values to Geospatial Big Data and identify issues in use, and share how they have been dealt with, so we can have better GBD in the future.

Maori fortifications are an example of a topic about which we are data rich and information poor. We know, for example, when they began to be built, we know about how many were built, we know many more were built in warmer, coastal environments with good farmland and high population density. It remains unclear, however, if population density was always high in regions with good farmland from first settlement of the islands, or if there were any geographic shifts in where fortifications were built over time.

New Zealand was first settled after 1250 CE through long-distance voyages from Eastern Polynesia (for a recent summary, see Dye, 2015). The first centuries of New Zealand's culture history, referred to as the Early Period (1250–1450 CE), include strong evidence for a highly mobile settlement pattern across the country, but no fortifications (Walter et al., 2010). There remains no strong evidence for the construction of fortifications until around 1500 CE (Schmidt, 1996; McFadgen et al., 1994), in the Middle Period (1450–1650). The use of fortifications was documented by European visitors during the Late Period (1650–1800 CE) and in the Historical Period (after 1800 CE) when Maori continued to use traditional fortifications with adaptations for the introduction of muskets.

As noted above, we have a good idea of the number, geographic range, and preferred location of fortifications (Fig. 2). The total number of fortifications built by the ancestors of Maori over three centuries has been given in various sources as being between 4000 and 6000 (Davidson, 1984), followed by a more specific site record based figure of 6528 (Schmidt, 1996), and present professional site records give a figure of 7314 (ArchSite, 2017). Fortifications have been recorded across the entirety of New Zealand's two major islands and offshore islands. There is however a well-known preference for northern, warmer, coastal environments, as seen in the site predictive model shown in Fig. 2 (Leathwick, 2000). The North Island, and the northern parts of the South Island, are the only locations suitable for the crops that the ancestors of Maori brought with them to New Zealand, and so while the paleodemography of the islands is currently a matter of speculation, we presume that the agricultural economy in the warmer north allowed for a much faster growth rate than the hunting-fishing economy of the colder south (Davidson, 1984:56–59). And so, the spatial distribution of fortifications is positively correlated with both good farmland and the regions with the highest population density at the time of European contact.

In Allen's (2006) summary of research on Maori fortifications he identifies ecological, political, and symbolic perspectives. The ecological model, originally conceived by Vayda (1960), suggests that seizing cleared gardens from neighboring groups became less difficult than finding and clearing new land as the population grew. The political model interprets the distribution of larger
fortifications in ecologically rich areas as reflecting the consolidation of resources and power by chiefdoms (Allen, 1994, 1996, 2008; Earle, 1997) and the symbolic model sees fortifications as symbols in larger Maori cosmology (Barber, 1996).

Fig. 2. General model for the spatial distribution of Maori fortifications in New Zealand. It is well established that the majority of fortifications and other pre-European contact era archaeological sites are found in the warmer northern region of New Zealand, especially in coastal environments, as shown in this example of a site predictive model (Leathwick, 2000:Fig. 3).

5.1. Temporal GBD and the fortification of New Zealand

GBD can be used to create a geospatial-temporal model of the fortification of New Zealand. As noted, the date for the onset of fortifications is around 1500 CE and the preference for fortification construction in northern, warmer locations better suited for farming have already been determined in previous studies. Here radiocarbon dates from across New Zealand were used to estimate the distribution of evidence for fortification use over time slices to allow us to determine if geographic preference is something that was evident from the earliest periods, or whether it varied over time (see also McFadgen et al., 1994). To approximate the spatial distribution of population - a factor not currently quantified in the study area - the frequency of radiocarbon dates will be used to model population distribution from settlement (1250 CE) through the period of fortification use. The methods applied here are adapted from Chaput et al. (2015).

5.1.1. Sources of data

The primary source of archaeological data for this study is an archived radiocarbon database created by the Waikato Radiocarbon Lab called NZ C14 Data (version 0.5) (http://www.waikato.ac.nz/nzcd/C14kml.kmz) (Fig. 3). Created about 15 years ago to
complement an online database of radiocarbon dates (www.waikato.ac.nz), the database includes n = 1671 dates. The database is remarkably rich with over three times the density of the North American coverage in the CARD database, the basis for Chaput et al. (2015), over a much shorter time frame (~700 yrs, as opposed to 12.5 ky). It is in a Google Earth format (kmz), which was still new at the time it was created, and has a clear disclaimer that it comes with no claims of quality. The online database however does have a great deal of relevant metadata.

Fig. 3. Geospatial Database of Radiocarbon Dates in New Zealand. This image shows an archived geospatial database (kmz) created by Waikato Radiocarbon Lab.

Environmental data for this study was sourced from the New Zealand government's Land Information (LINZ) division and was created as part of the Land Environments of New Zealand (LENZ) classification (Fig. 4). This included a polygon layer of the country representing the main islands and over 900 offshore islands that was filtered in this study to include only the four largest islands (North Island, South Island, Stewart Island, Great Barrier Island) for ease of processing. To give a general approximation of the climate at different locations a layer of average modern temperature was used with the caveat that there are known shifts in the climate over the period of human occupation of New Zealand in the pre-European era (1250–1769 CE), notably the Little Ice Age.

5.1.2. Methods

The archaeological geospatial database (NZ C14 Data) underwent a number of edits, transformations, and filters to generate raster time-slices of population and fortification distribution. These steps are summarized below: 1) Assembling Archived Data, 2) Adding Environmental Variable to Data, 3) Coding Site Type, 4) Coding Temporal Values, 5) Spatial Sampling, and 6) Raster Interpolations.

Assembling Archived Data. The point layer format (kmz) of NZ C14 Data was transferred to ESRI's ArcMap 10.3. Locational information (lat, long) and radiocarbon lab identification transferred easily; however, other data (site type, material dated) did not migrate smoothly. It was necessary to search the online database using radiocarbon lab identification numbers to re-attach this information to records. Once complete, a point shapefile was created in ArcMap that was transformed to a local datum and projection (NZ 2000 Map Grid).

Adding Environmental Variable to Data. To allow the results to be quantified relative to the local climate where radiocarbon dated materials were found, a raster representing mean annual modern temperature was used to add a field (Temperature) to the point record. Some points (n = 199) were outside of the raster and
excluded from the analysis. Excluding so many records was not ideal; however, since there did not appear to be any systematic error that caused them to be outside the environmental raster, the analysis moved forward.

Fig. 4. Average Modern Temperature and Distribution of Radiocarbon Dates. Temperature ranges from 6.0 °C (black) to 16.2 °C (white). Sources: LINZ, Waikato Radiocarbon Lab. Data reproduced with the permission of Landcare Research New Zealand Limited.

Coding Site Type. The native classification of radiocarbon dates by site type included 43 recognized types including several types of fortifications. In practice, there were 47 unique site types coded in the dataset, mainly due to typos in the original data. To simplify how sites were coded a new field called Grouped_Type was created with 10 options, one for fortifications and the remainder based on broad formal/functional designations.

Coding Temporal Values. The most time consuming aspect of this analysis was filtering, classifying, and coding temporal information. First, the recent and brief period of human occupation of New Zealand means that one must have a protocol for interpreting radiometric results that overlap. In addition, there is the question of in-built age in unidentified charcoal (see Dye, 2015 for a recent discussion of this issue). Smith (2010) outlined one such protocol in which dates were included or excluded based on material type and sorted into Early, Middle, Late, and Historical periods as well as overlapping periods (e.g., Early-Middle, Middle-Late). This yields good results but of course requires a great deal of individual evaluation of dates in terms of acceptability of material and distribution of multiple intercept dates at 1- and 2-sigma. To make it possible to process a large number of dates, rule-based filtering was required. In this case, all material was classified as either identified terrestrial charcoal, marine shell or bone, or other. No attempt was made to filter out long-lived versus short-lived charcoal among the identified charcoal since the way the material was reported was not well-suited for searching and classifying.

Second, radiocarbon dates were calibrated using Calib 7.1 (Stuiver et al., 2017). Terrestrial charcoal was calibrated using the Southern Hemisphere curve (SHcal13) and marine material using the recommended marine calibration (MARINE13) as described in Smith (2010). Dates with greater than 1000 CRA were filtered out to exclude most dates on pre-human natural phenomena, and dates
equal to or less than 120 CRA were filtered out from analysis as being too recent. For this analysis the resulting mean intercept date was used to put dates into century scale periods and these were given cultural period names adapted from Smith (2010) with the addition of Natural to account for periods that likely include all or some dates that pre-date human settlement of New Zealand in 1250 CE.

Spatial Sampling. As Chaput et al. (2015) note, over-sampling is a major concern in these types of analysis since there will be some sites, and regions, that have far more dates than others, and without some spatial sampling these would appear more active than is warranted. In this case, most sites (~60%) have a single date in any given period, and for these it was straightforward to code them as present/absent for each period. The remaining sites were also coded as present/absent for each time period regardless of the number of dates within that period.

Raster Interpolations. Rasters were created based on the method described in Chaput et al. (2015) using a kerging density function (ArcMap 10.2) with a search radius of 600 km (Figs. 5 and 6). These were normalized (using the Raster Calculator function) to the highest density results, those of the Middle Period (mean dates of 1500–1600 AD). The coastline of New Zealand creates an edge effect, complicated by the fact that some dates (n = 19 out of 495) are mis-located off the coast. Therefore, two methods were run; one where there was no spatial constraint on the raster, and then the results were clipped to the island polygon layer, and another where the island polygon layer was used to restrict the raster calculation.

5.1.3. Results

The estimate of fortification and population distribution yields three generalized phases. First, in the period before fortification, the population distribution was remarkably uniform across the islands, with indications that the South Island may have been more heavily used. Second, concurrent with the earliest signs of fortifications, evidence of human activity shifts to the North Island, and the earliest fortifications show a preference for the North Island that is consistent through 1400–1600 CE. Third, in the phase when we see the beginning of sustained contact with Europeans, 1600–1800 CE, activity on the South Island continues to fall, although we do find the first dates of fortifications in the far southern reaches of the South Island. This last trend is represented on both the unrestricted and coast-restricted rasters, although the latter leaves off the southernmost date from a fortification.

The geospatial trends identified are also seen in modern mean temperature at the locations where radiocarbon dates were reported. Fig. 7 shows the range of values in different site classifications (pa, non-pa, all dates). There is a consistent preference for warmer climates for fortifications over time, broadening in the final periods; as well as a shift from activity that reflects no preference, or possibly a southern/colder preference, to activity shifting toward the temperature range of fortifications.

5.2. Site records GBD and the distribution of fortifications (Pa)

GBD of fortifications in New Zealand are a good example of the difference between professional (privately maintained and restricted) and publicly available site records. As noted above, in New Zealand the spatial distribution of fortifications is not in dispute, nor is the value of science and scientific facts, and so it is largely unnecessary to make such a comparison. But, given that in the United States at the moment there is a real danger to the authority and value of scientific data, specifically when it is perceived as an impediment to economic development, it is in archaeology's interest to be able to clearly show that professional site records, while often kept from the public view to avoid looting, do exist and are the tip of the iceberg when compared with sites that are better known.

Fig. 5. Distribution of Radiocarbon Dates: Population. This time series shows all radiocarbon dates as a proxy for spatial distribution of population (after Chaput et al., 2015). See text
for description of periods.

Fig. 6. Distribution of Radiocarbon Dates: Fortification. This time series shows all radiocarbon dates as a proxy for spatial distribution of population (after Chaput et al., 2015). See
text for description of periods.
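
The density surfaces in Figs. 5 and 6 rest on three steps described in the methods: a kernel density estimate with a fixed search radius, normalization against the highest density of the Middle Period, and clipping to the island outline. The following is a minimal sketch of the first of the two edge-handling methods (unconstrained calculation, then clipping). The quartic kernel, the 3 x 3 grid, the coordinates, and all function names are assumptions made for illustration; they are not the ArcMap implementation or the study data.

```python
import math

def kernel_density(points, nrows, ncols, cell_km, radius_km):
    """Sum a quartic kernel over all points within radius_km of each cell centre."""
    grid = []
    for r in range(nrows):
        row = []
        for c in range(ncols):
            cx, cy = (c + 0.5) * cell_km, (r + 0.5) * cell_km
            total = 0.0
            for px, py in points:
                d = math.hypot(px - cx, py - cy)
                if d < radius_km:
                    total += (1.0 - (d / radius_km) ** 2) ** 2
            row.append(total)
        grid.append(row)
    return grid

def peak(grid):
    """Highest density value in a raster."""
    return max(v for row in grid for v in row)

def normalize(grid, reference_peak):
    """Express every cell as a fraction of the reference period's peak."""
    return [[v / reference_peak for v in row] for row in grid]

def clip_to_land(grid, land_mask):
    """Keep cells on land; cells in the sea become None (NoData)."""
    return [[v if on_land else None for v, on_land in zip(row, mask_row)]
            for row, mask_row in zip(grid, land_mask)]

# Hypothetical island mask and date locations (km), for illustration only.
land_mask = [[False, True, True],
             [True, True, True],
             [True, True, False]]
middle_period = [(150, 150), (160, 140), (250, 50), (50, 250)]
early_period = [(150, 150), (40, 260)]

reference = kernel_density(middle_period, 3, 3, 100, 600)
early = kernel_density(early_period, 3, 3, 100, 600)
ref_peak = peak(reference)

middle_norm = clip_to_land(normalize(reference, ref_peak), land_mask)
early_norm = clip_to_land(normalize(early, ref_peak), land_mask)
```

Normalizing every period against a single reference peak keeps the time series on one scale, so a cell's value can be read as a fraction of the densest Middle Period cell.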

Fig. 7. Average Temperature at Locations with Radiocarbon Dates. When organized by time series, it appears that fortifications favored warmer areas from their first appearance and
grew to include colder regions through time. The general trend for the proxy for population distribution appears to shift toward warmer locations over time with a broad range of
environments occupied throughout.

5.2.1. Sources of data
The New Zealand Archaeological Association (NZAA) has been responsible for the systematic documentation of archaeological sites for decades, first as a paper record (also known as the Archaeological Site Recording Scheme), and today as ArchSite (archsite.org.nz). Although the NZAA partners with government agencies, it is an independent charity created to protect New Zealand's cultural heritage and to publish, promote and foster research into archaeology. The site database is also a resource for local tribal cultural resource managers, although Maori continue to be under-represented in archaeology (Rika-Heke, 2010). In the early days of the online database there was some concern from indigenous scholars that it would be a commercial scheme; it nonetheless remains non-profit and governed by a board drawn

from the NZAA membership and a web administrator who approves changes and additions to the database (Bickler, personal communication).
The ArchSite web portal has a public face where one can browse a map of 68,753 archaeological sites across the country. Visitors are blocked from zooming in to a scale that might give away the specific location of sites, and full access to the geospatial database is granted only to members of the NZAA. This is an integrated database with key information on its history (older site identification number, current identification number, last time the record was updated, if its location has migrated from white-paper records or GPS, etc.). It also includes a much cleaner site type classification than was in use when the NZ C14 Data was created so it is straightforward to identify the 7314 fortifications (pa) across the country.
The public data of fortifications was created by Land Information New Zealand (LINZ) based on government topographic maps (1:150,000 scale) that have migrated to GIS (Fig. 8). It includes 2135 locations and is available for download at LINZ (linz.govt.nz). It is also currently hosted on OpenStreetMaps.org. While it is not connected to the professional database, both list the traditional name of the fortification, if known, and in some cases it would be possible to link them through location.
For this study, it was critical that the locations of sites in ArchSite, the professional archaeological database, were not inadvertently revealed. To do this, a polygon layer representing New Zealand's Map Grid system was used to summarize the distribution of sites. Map Grid is already used as the first alpha-numeric in site records in New Zealand (for example, site P05/214 is located in map grid reference location P05). Reference grids vary slightly across the country but generally are about 7.5 km (N-S) x 5.0 km (E-W), covering about 37.5 square kilometers. Also, since grids go beyond the coastline, no points were left out of the analysis due to their mislocation relative to the coast.

5.2.2. Methods
The archaeological geospatial databases, the professional database (ArchSite) and the public database (LINZ-Pa-pts), underwent a number of transformations and filters to generate vector layers representing fortification distribution. These steps are summarized below: 1) Frequency of All Known Sites (ArchSite); 2) Fortifications from Professional Records (ArchSite); and 3) Counting Fortifications from Public Records (LINZ-Pa-pts).
Frequency of All Known Sites (ArchSite). A transitional shapefile was created where the frequency of all sites currently recorded in ArchSite within an individual polygon of the NZ map grid was calculated as well as the density of site records.
Fortifications from Professional Records (ArchSite). Another transitional shapefile was created where the frequency of fortification sites currently listed in ArchSite within an individual polygon of the NZ map grid was calculated as well as the density of site records.
Counting Fortifications from Public Records (LINZ-Pa-pts). A final shapefile was created where the frequency of publicly known sites (LINZ, NZ topo maps) within an individual polygon of the NZ map grid was calculated as well as the density of site records.

5.2.3. Results
The results show how surprisingly poorly a public geospatial database represents the actual geographic distribution and density of fortifications. Fortifications are present in ~20% of all map grid polygons. To put that in more meaningful terms, at any given place in New Zealand there is a 20% chance that a fortification is within a 1-2 h walk. The public records of fortifications alone would underestimate how common fortifications are across space

Fig. 8. Distribution of Public and Professional Site Records of Fortifications. Note that site locations are masked by using a polygon layer representing the NZ map grid. Sources: LINZ,
ArchSite.
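
The masking shown in Fig. 8 amounts to replacing each site location with the label of its map grid polygon and then counting records per polygon. The sketch below illustrates that summary and the comparison of public and professional counts. It assumes each record already carries a grid label: professional records through the site number prefix (as in P05/214), public records through a prior spatial join that is not shown. Apart from P05/214, every identifier and count here is hypothetical.

```python
from collections import Counter

def grid_cell(site_id):
    """Return the map grid prefix of a site number, e.g. 'P05/214' -> 'P05'."""
    return site_id.split("/")[0]

def summarize(all_cells, professional_ids, public_cells, tolerance=0.10):
    """Count fortifications per grid cell and compare the two sources."""
    prof = Counter(grid_cell(s) for s in professional_ids)
    pub = Counter(public_cells)
    # Cells with at least one professionally recorded fortification.
    with_pa = [c for c in all_cells if prof[c] > 0]
    # Cells where the public count is within `tolerance` of the professional count.
    close = [c for c in with_pa
             if abs(pub[c] - prof[c]) / prof[c] <= tolerance]
    return {
        "professional": dict(prof),
        "public": dict(pub),
        "share_with_fortification": len(with_pa) / len(all_cells),
        "cells_within_tolerance": close,
    }

# Hypothetical inputs; only grid labels, never coordinates, leave this step.
all_cells = ["P05", "Q06", "R10", "S11", "T12"]
professional_ids = ["P05/214", "P05/215", "P05/300", "Q06/1", "Q06/2", "R10/7"]
public_cells = ["P05", "Q06", "Q06"]

result = summarize(all_cells, professional_ids, public_cells)
```

Because only per-cell counts are returned, the output can be mapped or shared without exposing any individual site location.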

dramatically. There is even less correspondence when we compare the density of fortifications in the public and professional databases. It is exceedingly rare that the two correspond to within 10% of one another; this is true for less than 2% of locations. This same data can be used to infer that a further ~30% of New Zealand has no fortification recorded, although other sites have been recorded in that area, and an additional half of the country has no archaeological site recorded in the professional site record system.

5.3. Remote sensing GBD and the mapping of fortifications (Pa)

GBD can be used to create 3D models of fortifications of New Zealand and are a good example of how embryonic geospatial datasets can improve without necessarily getting big. The NZAA's ArchSite database includes links to site maps, and for most fortifications, these come from a brief site visit many years ago when a not-to-scale map was drawn of the defenses (ditches, banks) and internal earthworks (terraces, pits). These are extremely valuable as in many cases they are our only field map of the site. Site records are also updated, where possible, with GPS locations of major features, and there has been an effort to include air photos (Jones, 2007). For this use of GBD, I wanted to show how airborne LiDAR from immediately in and around a fortification can also be used to visualize the site, but more importantly, how that can grow the size of a single site record.

5.3.1. Sources of data
Since the purpose of this study is to show how new visualizations of a fortification can grow the size of a geodatabase I began by selecting a site, called Puketona Pa, for which I created a DEM using airborne LiDAR data several years ago (Fig. 9). Puketona Pa is classed as a Hill Pa on an isolated natural hill above the Waitangi River in the Northland District. The features present (terraces, defensive ditches, pits) are typical of fortifications (pa) and the fortified internal area at the summit covers ~20,000 m², putting it in the large size class (5000-40,000 m²), with fortifications that may serve as either internal or external power bases (Marshall, 2004:77). Oral history describes the occupation of the site in the generations prior to European contact and confirms it was indeed a political center of some importance.
The purpose of creating the LiDAR derived DEM was to have remotely sensed imagery in advance of a site visit. The original source of the airborne LiDAR was a survey funded by the Northland Regional Council in the wake of a serious flooding of the Waitangi River in 2007. The DEM was created using the Nearest Neighbor function at a high resolution (0.25 × 0.25 m) in order to define the edges of earthworks and covered an area of about 90 ha. To reduce the size of down-stream digital products in this visualization, and since the site occupies a much smaller portion of that area, a selection of 2 ha was used here, reducing the baseline DEM from greater than 60 MB to less than 8 MB.

5.3.2. Methods
The DEM of Puketona Pa underwent a number of transformations to generate layers representing the fortification. These steps are summarized below: 1) Deriving Relief Visualizations; and 2) 3D Model Fly-Through.
Deriving Relief Visualizations. To create a number of different types of visualizations I used the Relief Visualization Toolbox (RVT) (http://iaps.zrc-sazu.si/en/rvt#v) created by Kokalj and the team at the Institute of Anthropological and Spatial Studies (Kokalj et al., 2013). The standalone executable version of the toolkit was used to run, in just over 1 min of processing time, 11 functions: Analytical hillshading (HS), Hillshading from multiple directions (AHS), PCA of hillshading (PCA), Slope gradient (SLOPE), Simple local relief model (SLRM), Sky-View Factor (SVF), Anisotropic Sky-View Factor (SVF-A), Openness - Positive (OPEN-POS), Openness - Negative (OPEN-NEG), Sky illumination (SIM), and Local dominance (LD). Each function produced two versions (32-bit, 8-bit), for a total of 22 individual rasters. Almost all rasters were useful representations of the site (Fig. 10); only the Sky illumination (SIM) and Local dominance (LD) functions, when applied at RVT's default settings, did not produce rasters with enough variance for visualization.
3D Model Fly-Through. To create an example of a 3D model fly-through of Puketona Pa, I first created a slope layer for the original DEM and then clipped it to a circle (460 m diameter) around the center of the site. The resulting layer was opened in ESRI's ArcScene to create a fly-through video (.avi) where the z-dimension was derived from the DEM. The short video (25 s, 1048 × 796) is a brief tour around the site (Fig. 11).

5.3.3. Results
The methods and digital products here are not new to archaeology - in fact many innovative uses of 3D data have specifically targeted hillforts (for a recent PhD dissertation on the topic, see O'Driscoll, 2016) - and it is well-established that it can be beneficial to carry out a more detailed desktop survey using lidar data and other sources, such as standard aerial photographs, and taking this information into the field instead of, or together with, the simple lidar derived imagery (English Heritage, 2010:38). What might be surprising is that while the files created are a great deal larger than the scans of paper files currently stored on the ArchSite website (an order of magnitude in the case of 32-bit rasters and the video), they are not that large in absolute terms. If we scaled these up to include similar files for all 7000 fortifications, it would be less than a few terabytes, and scaled up again to the 64,000 sites in all of New Zealand, it would fit on a handful of servers. Having said that, if one automated the production of these visualizations, there would have to be a way to filter out results like the SIM and LD to keep from creating a lot of raster files with no useable image (see Table 3).
What this back-of-the-envelope calculation demonstrates is that although LiDAR and other remote sensing data are genuinely unwieldy GBD, the types of digital products that are of direct interest to archaeology need not be. In other words, it is possible to reap the benefit of large volume GBD (LiDAR) without necessarily increasing the volume of archaeological GBD to a point where they become equally unwieldy.

6. Discussion: standalone quality reporting

I have been guided in this paper by three questions: 1) What kinds of geospatial data are available today? 2) How will larger and more accessible geospatial databases shape the near future of archaeology? And, using a case study from New Zealand, 3) What can we do now about our apprehensions regarding data quality, privacy, and volume?
First, we are a field that values locational information but archaeology is also a profession with an obligation to keep that information away from those who would misuse it. Even with that important caveat, one would think that it would be straightforward to identify large, complex, geospatial datasets on site locations, artifacts, architecture, radiocarbon dates, and so on, given trends like web based GIS, open access publishing, volunteer geography, and Big Data science. There are indeed some outstanding archaeological geodatabases out there that are a testament to the tenacity, and generosity, of those who create and contribute to them. But, on the whole, these are small oases in what continues to be a data desert.
Fig. 9. Web Maps with Puketona Pa (Site P05/214). Sources (top to bottom): OpenStreetMap, Google Earth, Topomapnz, Archsite (public view).

Fig. 10. Terrain Relief Visualizations of Puketona Pa. Clockwise from top left: slope (64 to 1), multi-hs, pca, slrm, open-neg, open-pos (150-58), svf-a (1-0.4), svf (255-0),
and field map from site record (not-to-scale).

Fig. 11. Screen Capture from Video Fly Through of 3D Model of Puketona Pa. Areas of high slope (white) contrast with flat areas (black).
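
The slope layer behind Fig. 11, and the need to screen out rasters with too little variance to display, can be sketched in a few lines. This is a minimal illustration that uses central differences on interior cells, which is an assumption and simpler than the algorithms used by ArcGIS or RVT. The small grids, the function names, and the contrast threshold are hypothetical; only the 0.25 m cell size comes from the text.

```python
import math

def slope_degrees(dem, cell_size):
    """Slope in degrees for interior cells; edge cells are left as None."""
    nrows, ncols = len(dem), len(dem[0])
    out = [[None] * ncols for _ in range(nrows)]
    for r in range(1, nrows - 1):
        for c in range(1, ncols - 1):
            dz_dx = (dem[r][c + 1] - dem[r][c - 1]) / (2 * cell_size)
            dz_dy = (dem[r + 1][c] - dem[r - 1][c]) / (2 * cell_size)
            out[r][c] = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    return out

def has_usable_contrast(raster, min_range):
    """True when the spread of valid values is large enough to display."""
    values = [v for row in raster for v in row if v is not None]
    return bool(values) and (max(values) - min(values)) >= min_range

CELL = 0.25  # DEM resolution in metres, as given in the text

# Hypothetical elevation grids (metres).
ramp = [[c * CELL for c in range(4)] for _ in range(4)]   # uniform 45 degree rise
flat = [[10.0] * 4 for _ in range(4)]                      # level terrace
bank = [[0.0, 0.0, 0.0, 0.5, 0.5] for _ in range(3)]       # edge of a bank

ramp_slope = slope_degrees(ramp, CELL)
flat_slope = slope_degrees(flat, CELL)
bank_slope = slope_degrees(bank, CELL)
```

A check like `has_usable_contrast` is one way an automated pipeline could discard outputs comparable to the SIM and LD rasters before they are stored.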

Second, what happens when our data desert turns into a data ocean is largely up to us and how we deal with the datasets we have and collect today. As spatial technology has evolved and become integrated into the practice of archaeology, we are dealing with challenges posed by the sheer size and complexity of data we use and produce. Field survey and excavations regularly yield far more pieces of spatial information than ever before. At the same time, the amount of available satellite imagery, airborne lidar, and other remote sensing and environmental datasets increase in size and complexity. In this context, sharing data will become an ever more delicate balance between privacy, quality, and the potential benefits of being able to show the world what we have found. We can see this especially in how we construct culture histories, how we make decisions using archaeological geospatial data, and how we visualize archaeology for research and for the public.
Third, the case study I presented here is a good example of how

Table 3
Estimated Storage Size of Visualizations. The digital products from this study for Puketona Pa would take the current 1 MB of files associated with the site up to 469 MB.

Data Stored                                Puketona Pa (MB)    Estimated Total for All Fortifications (GB)    Estimated Total for All Sites (TB)
Current ArchSite (scans of paper files)    1                   9                                              0.1
DEM (region, 90 ha)                        63                  461                                            4
DEM (large site, 2 ha)                     8                   56                                             1
Relief Visualization (32-bit)              119                 870                                            8
Relief Visualization (8-bit)               25                  186                                            2
Fly-Through Video (desktop, 25 s)          254                 1858                                           17
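
The totals in Table 3 appear to follow from multiplying the per-site file size by the number of records. The sketch below reproduces that arithmetic under two assumptions that the table does not state: that the multipliers are the 7314 fortifications and 68,753 sites given in Section 5.2.1, and that decimal units are used (1 GB = 1000 MB). Because the per-site sizes are rounded to whole megabytes, some results differ slightly from the published figures.

```python
FORTIFICATIONS = 7314   # fortification (pa) records in ArchSite, from the text
ALL_SITES = 68753       # all site records in ArchSite, from the text
MB_PER_GB = 1_000       # decimal units are an assumption
MB_PER_TB = 1_000_000

# Per-site sizes in MB, as rounded in Table 3.
PER_SITE_MB = {
    "Current ArchSite (scans of paper files)": 1,
    "DEM (region, 90 ha)": 63,
    "DEM (large site, 2 ha)": 8,
    "Relief Visualization (32-bit)": 119,
    "Relief Visualization (8-bit)": 25,
    "Fly-Through Video (desktop, 25 s)": 254,
}

def scaled_total(per_site_mb, n_sites, mb_per_unit):
    """Total storage if every one of n_sites had a file of this size."""
    return per_site_mb * n_sites / mb_per_unit

estimates = {
    name: (scaled_total(mb, FORTIFICATIONS, MB_PER_GB),
           scaled_total(mb, ALL_SITES, MB_PER_TB))
    for name, mb in PER_SITE_MB.items()
}

for name, (gigabytes, terabytes) in estimates.items():
    print(f"{name}: {gigabytes:.0f} GB for fortifications, {terabytes:.1f} TB for all sites")
```

Under these assumptions the regional DEM, 32-bit relief, and video rows match the table after rounding, which suggests the published totals were derived in the same way from unrounded file sizes.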

Table 4
Suggested Questions for Standalone Quality Reports on the Use of Geospatial Big Data in Archaeology. The specific data qualities listed here were modified from Merino et al. (2016). This type of document is in line with the London Charter (Denard, 2012) principle of documentation for computer-based visualization of cultural heritage.

Each entry lists the quality, followed by its Adequacy Question and its Improvement Question.

Contextual

relevant and complete
  Adequacy Question: What is the intellectual justification for using this data for the task?
  Improvement Question: What changes from the primary data were made, if any, to assure the data was relevant and complete?
unique and semantically interoperable
  Adequacy Question: Are there known duplicate or semantically redundant records in the new data?
  Improvement Question: What changes were made, if any, to eliminate duplicate or semantically redundant records?
semantically accurate
  Adequacy Question: How does the data represent real entities?
  Improvement Question: What changes were made, if any, to how the data represent real entities?
credible
  Adequacy Question: What criteria were used to assess the credibility of data relative to the task?
  Improvement Question: What data, if any, was filtered due to a question of credibility?
confidential
  Adequacy Question: Was the primary data accessible to the analyst?
  Improvement Question: What has been done to protect the primary data from inappropriate use?
compliant
  Adequacy Question: Does the data meet regulations and standards?
  Improvement Question: Does the newly derived data meet relevant regulations and standards?

Temporal

time-concurrent
  Adequacy Question: How is the data grouped by time periods?
  Improvement Question: What changes, if any, were made to how data is grouped by time periods?
current
  Adequacy Question: Is original data an archived (static) or integrative database (updated)?
  Improvement Question: How does the database identify when data was collected?
timely
  Adequacy Question: Are there known data from ongoing project or other new data not included in this database?
  Improvement Question: Has the primary data been updated/modified for the task at hand?
frequent
  Adequacy Question: Is space-time trend analysis possible?
  Improvement Question: Have changes been made to make space-time trend analysis possible?
time-consistent
  Adequacy Question: Is time represented coherently?
  Improvement Question: Were changes made to make time represented coherently?

Operational

available, recoverable, accessible
  Adequacy Question: What logistical barriers were there to gaining access to the data?
  Improvement Question: How, if at all, did logistical barriers to accessing primary data shape this new data?
authorized
  Adequacy Question: What restrictions have the stewards of the original data placed on the data?
  Improvement Question: What restrictions are there on this new data for secondary use?
similar data types, precision, portable
  Adequacy Question: Are there technical barriers to working with the data?
  Improvement Question: What improvements were made to overcome technical barriers to working with the data?
efficient
  Adequacy Question: Is the native data model appropriate to the intended task?
  Improvement Question: Was the native data model changed for the intended task?
traceable
  Adequacy Question: How can one trace access and changes to the data?
  Improvement Question: Were previous changes to primary data known to the creators of this new dataset?
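
One way to make the questions in Table 4 searchable, while keeping the report itself a free text document, is to store each question with its answer in a simple record and render the text from those records. The following is a hypothetical sketch, not a format defined by ISO 19157 or by this paper: the class name, field names, and the example answers are all assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class QualityItem:
    category: str               # Contextual, Temporal, or Operational
    quality: str                # e.g. "confidential"
    adequacy_question: str
    improvement_question: str
    adequacy_answer: str = ""
    improvement_answer: str = ""

    def is_complete(self):
        """Both questions have a non-empty narrative answer."""
        return bool(self.adequacy_answer.strip()) and bool(self.improvement_answer.strip())

def unanswered(items):
    """Qualities that still lack one or both answers."""
    return [item.quality for item in items if not item.is_complete()]

def render_report(title, items):
    """Render the items as a plain, free text report."""
    lines = [title]
    for item in items:
        lines.append(f"[{item.category}] {item.quality}")
        lines.append(f"  Adequacy: {item.adequacy_question}")
        lines.append(f"    {item.adequacy_answer or 'NOT ANSWERED'}")
        lines.append(f"  Improvement: {item.improvement_question}")
        lines.append(f"    {item.improvement_answer or 'NOT ANSWERED'}")
    return "\n".join(lines)

# Illustrative entries; the answers are examples, not the author's report.
items = [
    QualityItem(
        "Contextual", "confidential",
        "Was the primary data accessible to the analyst?",
        "What has been done to protect the primary data from inappropriate use?",
        adequacy_answer="Yes, with full database access.",
        improvement_answer="Site locations were summarized by NZ Map Grid polygon.",
    ),
    QualityItem(
        "Temporal", "current",
        "Is original data an archived (static) or integrative database (updated)?",
        "How does the database identify when data was collected?",
    ),
]

report = render_report("Standalone Quality Report (draft)", items)
missing = unanswered(items)
```

Keeping the category and quality as separate fields is what would let a later search find, for example, every report that discusses confidentiality.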

we might take a data quality-in-use approach to GBD in archaeology. In the first example, I used an archived geodatabase of radiocarbon dates to estimate the spread of populations and fortifications in New Zealand. In the analysis I discovered a number of data quality problems: a lack of accurate site location data meant many radiocarbon dates could not be compared with environmental data; site types used in the database were not ideal for the project and contained typos; calibration and assigning radiocarbon dates to century-scale temporal bins raised a number of issues and required lowering quality protocols; and interpolation was influenced by an edge effect along the coast that again was complicated by a lack of site location accuracy. As described above, some of these problems were solved by filtering problematic data out of the analysis and/or improving the dataset by creating new fields, such as the grouped site type and time periods. The second example also demonstrates how the need to keep archaeological site locations confidential complicates analysis. In that case I lowered the resolution of the information on the two site databases in the digital end products (i.e., maps, data) by summarizing them using an arbitrary grid (NZ Map Grid). I view that as an improvement in that it makes it possible to share the results without putting archaeological sites in greater danger of looting. In my last example, I created a number of digital representations of a single site (Puketona Pa) that are ultimately derived from airborne LiDAR. I found that while the underlying LiDAR data is large, including visualizations based on relief and video improved the site record without necessarily creating an insurmountable problem with data volume.
This information on the adequacy of the datasets used, and the improvements made, is absolutely necessary for anyone who would use the original dataset or the digital products of the research presented here. To that end, I suggest we need to require that new analyses using GBD come with a Standalone Quality Report. The International Organization for Standardization (ISO) reference on geographic information - data quality (ISO 19157:2013(E)) defines a standalone quality report as a free text document providing fully detailed information about data quality evaluations, results and measures used. In the case of archaeology, a data quality report would need to describe information about the original archaeological geospatial dataset and the data derived for a specific task such as

archaeological research, assessment, documentation, or visual representation. It would need to be supplementary to administrative and technical information contained in metadata; in technical terms the report would be paradata (data published alongside data).
I would find narrative descriptions of the adequacy of existing data and improvements useful in a Standalone Quality Report, but we would also do well to include some standardized questions that will make future searches easier (Kintigh, 2015). Table 4 gives an example of the types of questions that might be included in a Standalone Quality Report. One set examines the initial adequacy of the existing geodatabase for the task described; the second asks about the improvements made, if any, for the study. The quality categories are adapted based on suggested evaluations of Big Data in terms of contextual, temporal, and operational adequacy and improvements (Merino et al., 2016). Interestingly, Big Data science has become increasingly concerned with how it performs in the temporal dimension (e.g., when does the data pertain to, when was it recorded) in part because of the volume of streaming data.
I will be the first to say that the data quality-in-use approach sounds a great deal like opening the door to messy research done with big, bad data, and this is why the Standalone Quality Report is so important. We need to use our core disciplinary values regarding data quality and privacy and apply them to GBD. A quality report will increase the visibility of our siloed datasets and talk about why that information is siloed in ways that make it more visible and resilient to being lost to those who would deny scientific data. Equally importantly, if we begin to publish our data with standalone quality reports where we acknowledge the improvements we are making, we are encouraging a professional culture where we work cumulatively, improving upon geospatial data, rather than creating datasets that are used for one task and then abandoned.

7. Conclusions

I describe geospatial datasets in archaeology today as oases in a data desert that will one day become a data ocean as we create more, and larger, datasets that are more easily discoverable and accessible. The most immediate pay off of this trend will be in terms of having access to all available relevant data. Due to natural taphonomic processes and modern development, the archaeological record is by definition an incomplete set of material evidence; evidence we destroy by excavating. Not only is it in our interest to know what information already exists that might answer our questions, without it we are more likely to needlessly destroy sites in pursuit of redundant data.
The question of how to deal with GBD touches all our ethical obligations (e.g., SAA principles: stewardship, accountability, commercialization, public education and outreach, intellectual property, public reporting and publication, records and preservation, training and resources). We are seeing progress on this front in the success of archival databases (ADS, tDAR) and the promotion of best practices through the responsible use of available integrated databases. As these evolve, it would be excellent to see geoportal data platforms that connect related databases; that is, healthy data silos that are visible, but not fully accessible.
The development of GBD in archaeology offers new ways to engage with other scientists, stakeholder communities, and the public. I have no doubt that GBD will be the basis for writing culture histories in the foreseeable future, and it will be the way forward for interdisciplinary research as other fields grow to discover all that we have already learned. These same efforts will reveal more ways to involve the public as participants in discovery and stewardship.
These benefits (data access, ethics, and broader engagement) to a skeptical ear will sound like the same platitudes that are dragged out every time archaeology discovers a new digital toy to play with. But, I would argue, there are good reasons to think that these benefits will materialize. My optimism comes from examples of GBD that already exist and the parallel trends that are directly or indirectly dealing with GBD, in terms of cyberinfrastructure (Yang et al., 2010), and spatial analyses (Ortman et al., 2015).
We must get on with the task of using massive amounts of descriptive geospatial data in a fashion that is scientific (testable, replicable, etc.), authentic (a faithful representation of our observations on the archaeological record and the human past), and ethical (protection of cultural resources). Information on the adequacy of the datasets used, and improvements made, is absolutely necessary if we are to deal with apprehensions over data quality, privacy, and volume. I suggest we need to require that new analyses using GBD come with a Standalone Quality Report to go beyond the usual administrative metadata. This small change is something that can be implemented now to the benefit of a large and growing body of knowledge about the human past.

Acknowledgements

Thanks to Marieka Brouwer-Burg and Meghan Howey for their invitation to the SAA session and for organizing this special issue. This paper has evolved thanks to lively discussions with my colleagues in Southern Methodist University's Big Data Working Group and a grant from the Maguire Ethics Center. Special thanks to all my colleagues and students who have helped me form the opinions expressed here: Michael Aiuvalasit, Nick Belluzzo, Simon Bickler, Emma Brooks, Steve Burrow, Jesse Casana, Maria Codlin, Ann Horsburgh, Ian Jorgeson, James Flexner, David Meltzer, Andrew Martindale, Stace Maples, Thegn Ladefoged, Cliff Patterson, Leslie Reeder-Myers, Nico Tripcevich, Robert Wayumba, and Joshua Wells. Trying to capture the evolution of spatial technology in archaeology at any one time is a daunting task and I am grateful to three anonymous reviewers who helped guide my search and pressed me to look forward to the future.

Appendix A. Supplementary data

Supplementary data related to this article can be found at http://dx.doi.org/10.1016/j.jas.2017.06.003.

References

Allen, M.W., 1994. Warfare and Economic Power in Simple Chiefdoms: the Development of Fortified Villages and Polities in Mid-Hawkes Bay, New Zealand. Department of Anthropology, UCLA, University Microfilms, Ann Arbor. Unpublished PhD. dissertation.
Allen, M.W., 1996. Pathways to economic power in Maori chiefdoms: ecology and warfare in prehistoric Hawke's Bay. Res. Econ. Anthropol. 17, 171-225.
Allen, M.W., 2006. Transformations in Maori warfare: Toa, pa, and pu. In: Arkush, E.N., Allen, M.W. (Eds.), The Archaeology of Warfare: Prehistories of Raiding and Conquest. University Press of Florida, Gainesville, pp. 184-213.
Allen, M.W., 2008. Hillforts and the cycling of Maori chiefdoms: Do good fences make good neighbors? In: Railey, J.A., Reycraft, R.M. (Eds.), Global Perspectives on the Collapse of Complex Systems. Maxwell Museum of Anthropology (Anthropological Papers No. 8), Albuquerque, pp. 65-81.
Attenbrow, V., Hiscock, P., 2015. Dates and demography: are radiometric dates a robust proxy for long-term prehistoric demographic change? Archaeol. Ocean. 50, 29-35.
Bevan, A., 2015. The data deluge. Antiquity 89 (345), 1473-1484.
Barber, I.G., 1996. Loss, change, and monumental landscaping: Towards a new interpretation of the classic Maori emergence. Curr. Anthropol. 37 (5), 868-880.
Barton, M., 2013. Stories of the past or science of the future? Archaeology and computational social science. In: Bevan, A., Lake, M. (Eds.), Computational Approaches to Archaeological Spaces. Left Coast Press, Walnut Creek, pp. 151-178.

Barton, M.C., Ullah, I., Mitasova, H., 2010. Computational modeling and Neolithic socioecological dynamics: a case study from Southwest Asia. Am. Antiq. 75 (2), 364-386.
Bamforth, D.B., Grund, B., 2012. Radiocarbon calibration curves, summed probability distributions, and early Paleoindian population trends in North America. J. Archaeol. Sci. 39, 1768-1774.
Bodenhamer, D.J., Corrigan, J., Harris, T.M., 2010. The Spatial Humanities: GIS and the Future of Humanities Scholarship. Indiana University Press, Bloomington.
Bonacchi, C., Bevan, A., Pett, D., Keinan-Schoonbaert, A., 2015. Crowd- and community-fuelled archaeological research. Early results from the MicroPasts project. In: Proceedings of the Conference Computer Applications and Quantitative Methods in Archaeology, pp. 279-288.
Casana, J., 2015. Satellite imagery-based analysis of archaeological looting in Syria. Near East. Archaeol. 78 (3), 142-152.
Contreras, D.A., Brodie, N., 2010. The utility of publicly-available satellite imagery for investigating looting of archaeological sites in Jordan. J. Field Archaeol. 35 (1), 101-114.
Chaput, M.A., Kriesche, B., Betts, M., Martindale, A., Kulik, R., Schmidt, V., Gajewski, K., 2015. Spatiotemporal distribution of Holocene populations in North America. Proc. Natl. Acad. Sci. U. S. A. 112, 12127-12132.
Clark, G.R., Reepmeyer, C., Melekiola, N., Woodhead, J., Dickinson, W.R., Martinsson-Wallin, H., 2014. Stone tools from the ancient Tongan state reveal prehistoric interaction centers in the Central Pacific. Proc. Natl. Acad. Sci. U. S. A. 111, 10491-10496.
Contreras, D.A., Meadows, J., 2014. Summed radiocarbon calibrations as a population proxy: A critical evaluation using a realistic simulation approach. J. Archaeol. Sci. 52, 591-608.
Colwell, C., 2016. How the archaeological review behind the Dakota Access Pipeline went wrong. The Conversation. https://theconversation.com/how-the-archaeological-review-behind-the-dakota-access-pipeline-went-wrong-67815.
Cooper, A., Green, C., 2015. Embracing the complexities of Big Data in archaeology: The case of the English Landscape and Identities Project. J. Archaeol. Method & Theory 23, 271-304.
Crombé, P., Robinson, E., 2014. 14C dates as demographic proxies in Neolithisation models of northwestern Europe: A critical assessment using Belgium and northeast France as a case-study. J. Archaeol. Sci. 52, 558-566.
Dakota Access, LLC, 2016. Dakota Access Pipeline Project Section 408 Consent for Crossing Federally Authorized Projects and Federal Flowage Easements. Report prepared for U.S. Corp of Engineers. https://assets.documentcloud.org/documents/3036302/DAPLSTLFINALEAandSIGNEDFONSI-3Aug2016.pdf.
Davidson, J.M., 1984. The Prehistory of New Zealand. Longman Paul, Auckland.
Delgado, M., Aceituno, F.J., Barrientos, G., 2015. 14C and the early colonization of Northwest South America: A critical assessment. Quat. Int. 363, 55-64.
Dobbs, G.R., Louis, R.P., 2015. Geospatial technologies and indigenous communities

Bronze Age: An overview. Holocene 26 (10), 1576-1593.
Huggett, J., 2014. Promise and paradox: accessing open data in archaeology. In: Mills, C., Pidd, M., Ward, E. (Eds.), Proceedings of the Digital Humanities Congress 2012. Series: Studies in the Digital Humanities. HRI Online Publications, Sheffield.
Huggett, J., 2015a. Digital haystacks: open data and the transformation of archaeological knowledge. In: Wilson, A.T., Edwards, B. (Eds.), Open Source Archaeology: Ethics and Practice. De Gruyter Open, pp. 6-29. http://dx.doi.org/10.1515/9783110440171-003. ISBN 9783110440171.
Huggett, J., 2015b. A manifesto for an introspective digital archaeology. Open Archaeol. 1 (1), 86-95.
Huggett, J., 2016. Biggish Data. Introspective Digital Archaeology Blog. https://introspectivedigitalarchaeology.wordpress.com/2016/05/20/biggish-data/.
International Organization for Standardization, 2013. Geographic Information - Data Quality (Geneva, Switzerland). https://www.iso.org.
Jelinek, A.J., 1962. An index of radiocarbon dates associated with cultural materials. Curr. Anthropol. 3 (5), 451-477.
Jones, K.L., 2007. The Penguin Field Guide to New Zealand Archaeology. Penguin, Auckland.
Karmas, A., Tzotosos, A., Karantzalos, K., 2016. Geospatial Big Data for Environmental and Agricultural Applications. In: Yu, S., Guo, S. (Eds.), Big Data Concepts, Theories, and Applications. Springer, New York.
Kansa, E.C., Kansa, S.W., Arbuckle, B., 2014. Publishing and pushing: mixing models for communicating research data in archaeology. Int. J. Digital Curation 9 (1), 57-70.
Kintigh, K., 2006. The Promise and Challenge of Archaeological Data Integration. Am. Antiq. 71 (3), 567-578.
Kintigh, K.W., 2015. Extracting information from archaeological texts. Open Archaeol. 1, 96-101.
Kintigh, K.W., Altschul, J.H., Beaudry, M.C., Drennan, R.D., Kinzig, A.P., Kohler, T.A., Limp, W.F., Maschner, H.D.G., Michener, W.K., Pauketat, T.R., Peregrine, P., Sabloff, J.A., Wilkinson, T.J., Wright, H.T., Zeder, M.A., 2014. Grand challenges for archaeology. Am. Antiq. 79 (1), 5-24.
Kokalj, Ž., Zakšek, K., Oštir, K., 2013. Visualizations of Lidar Derived Relief Models. In: Opitz, Rachel, David Cowley, C. (Eds.), Interpreting Archaeological Topography - Airborne Laser Scanning, Aerial Photographs and Ground Observation. Oxbow Books, Oxford, pp. 100-114.
Laney, D., 2001. 3D Data Management: Controlling data volume, velocity, and variety. Application Delivery Strategies. https://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf.
Lawrence, D., Philip, G., Hunt, H., Snape-Kennedy, L., Wilkinson, T.J., 2016. Long Term Population, City Size and Climate Trends in the Fertile Crescent: A First Approximation. PLoS ONE 11 (3), e0152563. http://dx.doi.org/10.1371/
engagement. Int. J. Appl. Geospat. Res. 6 (1), ivexiii. journal.pone.0152563.
Do rter, G., Davis, L., 2013. Bridging geographic information systems (GIS) into the Leathwick, J.R., 2000. Predictive Models of Archaeological Site Distributions in New
museum world. Digit. Herit. Int. http://dx.doi.org/10.1109/ Zealand. Science and Research Internal Report 181. Department of Conserva-
DigitalHeritage.2013.6743843. tion, Wellington.
Denard, H., 2012. A New Introduction to the London Charter. In: Bentkowska- Lee, J., Kang, M., 2015. Geospatial Big Data: Challenges and opportunities. Big Data
Kafel, A., Baker, D., Denard, H. (Eds.), Paradata and Transparency in Virtual Res. 2, 74e81.
Heritage Digital Research in the Arts and Humanities Series. Ashgate, pp. 57e71. Lemmen, C., 2012. Different mechanisms shaped the transition to farming in Europe
Dye, T.S., 2015. Dating human dispersal in Remote Oceania: a Bayesian view from and the North American Woodland. Archaeol. Ethnol. Anthropol. Eurasia 41 (3),
Hawaii. World Archaeol. 47, 661e676. 48e58.
Earle, T., 1997. How Chiefs Come to Cower the Political Economy in Prehistory. Li, S., Dragicevic, S., Castro, F.A., Sester, M., Winter, S., Coltekin, A., Pettit, C., Jiang, B.,
Stanford University Press, Stanford. Haworth, J., Stein, A., Cheng, T., 2016. Geospatial big data handling theory and
English Heritage, 2010. The Light Fantastic: Using airborne lidar in archaeological methods: A review and research challenges. ISPRS J. Photogram. Remote Sens.
survey. https://content.historicengland.org.uk/images-books/publications/ 115, 119e133.
light-fantastic/light-fantastic.pdf/. Long, T., Taylor, D., 2015. A revised chronology for the archaeology of the lower
Field, J.S., Petraglia, M., Lahr, M.M., 2007. The southern dispersal hypothesis and the Yangtze, China, based on Bayesian statistical modeling. J. Archaeol. Sci. 63,
South Asian archaeological record: Examination of dispersal routes through GIS 115e121.
analysis. J. Anthropol. Archaeol. 26, 88e108. Maaten, L., Boon, van der P., Lange, G., Paijmans, H., Postma, E., 2007. Computer
Gaffney, V., van Leusen, M., 1995. Postscript e GIS, environmental determinism and Vision and Machine Learning for Archaeology. In: Clark, J.T., Hagemeister, E.M.
archaeology: a parallel text. In: Lock, G.R., Stancic, Z. (Eds.), Archaeology and (Eds.), Digital Discovery. Exploring New Frontiers in Human Heritage. CAA2006.
Geographic Information Systems: a European Perspective. Taylor & Francis, Computer Applications and Quantitative Methods in Archaeology. Proceedings
London, pp. 367e382. of the 34th Conference, Fargo, United States, April 2006. Archaeolingua,
Gifford-Gonzalez, D., 2016. Letter to Lieutenant General Todd Semonite, Com- Budapest, Pp. CD-ROM, pp. 476e482.
manding General and Chief of Engineers. 13 September 2016. http://www.saa. Marshall, Y., 2004. Social organization. In: Furey, L., Holdaway, S. (Eds.), Change
org/Portals/0/SAA/GovernmentAffairs/DAPL_LETTER.pdf. through Time: 50 Years of New Zealand Archaeology, vol. 26. New Zealand
Goldberg, A., Mychajliw, A.M., Hadly, E.A., 2016. Post-invasion demography of Archaeological Association Monograph, pp. 55e84.
prehistoric humans in South America. Nature 532, 232e235. Martindale, A., Morlan, R., Betts, M., Blake, M., Gajewski, K., Chaput, M., Mason, A.,
Goodchild, M.F., 2007. Citizens as sensors: the world of volunteered geography. Vermeersch, P., 2016. Canadian Archaeological Radiocarbon Database (CARD
GeoJournal 69 (4), 211e221. 2.1) (Accessed 7 November 2016).
Green, R.C., 1964. Carbon-14 dating. Curr. Anthropol. 5 (5), 428e429. McCoy, M.D., Ladefoged, T.N., 2009. New developments in the use of spatial tech-
Hitchcock, A., 2006. FOIA and protecting cultural resources. In: Harmon, D. (Ed.), nology in archaeology. J. Archaeol. Res. 17 (3), 263e295.
People, Places, and Parks: Proceedings of the 2005 George Wright Society McFadgen, B.G., Knox, F.B., Cole, T.R.L., 1994. Radiocarbon calibration curve varia-
Conference on Parks, Protected Areas, and Cultural Sites. The George Wright tions and their implications for the interpretation of New Zealand prehistory.
Society, Hancock, Michigan, pp. 468e474. Radiocarbon 36, 221e236.
Horsburgh, K.A., Orton, J., Klein, R.G., 2016. Beware the Springbok in Sheep's McLaughlin, T.R., Whitehouse, N.J., Schulting, R.J., McClatchie, M., Barratt, P.,
Clothing: How Secure Are the Faunal Identications upon Which We Build Our Bogaard, A., 2016. The Changing Face of Neolithic and Broze Age Ireland: A Big
Models? Afr. Archaeol. Rev. 1e9. Data Approach to the Settlement and Burial Records. J. World Prehist. 29,
Horton, M., 2014. Join an Archaeological Dig... Courtesy of the Internet of Things. 117e153.
Huffpost Tech (updated 16 Nov 2014). http://www.hufngtonpost.co.uk/mark- Merino, J., Caballero, I., Rivas, B., Serrano, M., Piattni, 2016. A data quality in use
horton/join-an-archaeological-di_b_5827698.html. model for big data. Future Gener. Comput. Syst. 63, 123e130.
Hosner, D., Wagner, M., Tarasov, P.E., Chen, X., Leipe, C., 2016. Spatialtemporal dis- Miller, D.S., 2016. Modeling Clovis landscape use and recovery bias in the South-
tribution patterns of archaeological sites in China during the Neolithic and eastern Unites States using the Paleoindian Database of the Americas (PIDBA).
Am. Antiq. 81 (4), 697–716.
Mills, B.J., Clark, J.J., Peeples, M.A., Haas, W.R., Roberts, J.R., Hill, J.B., Huntley, D.L., Borck, L., Breiger, R.L., Clauset, A., Shackley, M.S., 2013. Transformation of social networks in the late pre-Hispanic US Southwest. Proc. Natl. Acad. Sci. U. S. A. 110 (15), 5785–5790.
Mulrooney, M.A., 2013. An island-wide assessment of the chronology of settlement and land use on Rapa Nui (Easter Island) based on radiocarbon data. J. Archaeol. Sci. 40, 4377–4399.
O'Driscoll, J., 2016. The Baltinglass Landscape and the Hillforts of Bronze Age Ireland. PhD Thesis. University College Cork.
Ortman, S.G., Cabaniss, A.H.F., Sturm, J.O., Bettencourt, L.M.A., 2015. Settlement scaling and increasing returns in ancient society. Sci. Adv. 1, e1400066.
Peros, M., Munoz, S., Gajewski, K., Viau, A., 2010. Prehistoric demography of North America inferred from radiocarbon data. J. Archaeol. Sci. 37, 656–664.
Rika-Heke, M., 2010. Archaeology and indigeneity in Aotearoa/New Zealand: Why do Maori not engage with archaeology? In: Phillips, C., Allen, H. (Eds.), Bridging the Divide: Indigenous Communities and Archaeology into the 21st Century. Left Coast Press, Walnut Creek, CA, pp. 197–212.
Russell, T., Silva, F., Steele, J., 2014. Modelling the Spread of Farming in the Bantu-Speaking Regions of Africa: An Archaeology-Based Phylogeography. PLoS ONE 9 (1), e87854. http://dx.doi.org/10.1371/journal.pone.0087854.
Shennan, S., Downey, S.S., Timpson, A., Edinborough, K., Colledge, S., Kerig, T., et al., 2013. Regional population collapse followed initial agriculture booms in mid-Holocene Europe. Nat. Commun. 4, 2486. http://dx.doi.org/10.1038/ncomms3486.
Schmidt, M., 1996. The commencement of pa construction in New Zealand prehistory. J. Polyn. Soc. 105 (4), 441–451.
Silva, F., Stevens, C.J., Weisskopf, A., Castillo, C., Qin, L., Bevan, A., et al., 2015. Modelling the Geographical Origin of Rice Cultivation in Asia Using the Rice Archaeological Database. PLoS ONE 10 (9), e0137024. http://dx.doi.org/10.1371/journal.pone.0137024.
Silva, F., Steele, J., 2014. New methods for reconstructing geographical effects on dispersal rates and routes from large-scale radiocarbon databases. J. Archaeol. Sci. 52, 609–620. http://dx.doi.org/10.1016/j.jas.2014.04.021.
Steele, J., 2010. Radiocarbon dates as data: quantitative strategies for estimating colonization front speeds and event densities. J. Archaeol. Sci. 37 (8), 2017–2030. http://dx.doi.org/10.1016/j.jas.2010.03.007.
Snow, D.R., Gahegan, M., Giles, C.L., Hirth, K.G., Milner, G.R., Mitra, P., Wang, J.Z., 2006. Cybertools and Archaeology. Science 311, 958–959.
South, S., 1977. Method and Theory in Historical Archaeology. Academic Press.
Smith, I., 2010. Protocols for organizing radiocarbon dated assemblages from New Zealand archaeological sites for comparative analysis. J. Pac. Archaeol. 1 (2), 184–187.
Spaulding, A.C., 1960. The dimensions of archaeology. In: Dole, G.E., Carneiro, R.L. (Eds.), Essays in the Science of Culture in Honor of Leslie A. White. Crowell, New York, pp. 437–456.
Stone, E.C., 2008. Patterns of looting in Iraq. Antiquity 82, 125–138.
Stone, E.C., 2015. An update on the looting of archaeological sites in Iraq. Near East. Archaeol. 78 (3), 178–186.
Stuiver, M., Reimer, P.J., Reimer, R.W., 2017. CALIB 7.1 [WWW program] at http://calib.org (Accessed 26 May 2017).
Suthaharan, S., 2014. Big data classification: problems and challenges in network intrusion prediction with machine learning. Perform. Eval. Rev. 41 (4), 70–73.
Vayda, A.P., 1960. Maori Warfare: Polynesian Society Monographs, vol. 2. A.H. and A.W. Reed, Auckland.
Walter, R., Jacomb, C., Bowron-Muth, S., 2010. Colonisation, mobility and exchange in New Zealand prehistory. Antiquity 84, 497–513.
Williams, A.N., Ulm, S., Smith, M., Reid, J., 2014. AustArch: A Database of 14C and Non-14C Ages from Archaeological Sites in Australia - Composition, Compilation and Review (Data Paper). Internet Archaeol. 36. http://dx.doi.org/10.11141/ia.36.6.
Wilmshurst, J.M., Hunt, T.L., Lipo, C.P., Anderson, A.J., 2011. High-precision radiocarbon dating shows recent and rapid initial human colonization of East Polynesia. Proc. Natl. Acad. Sci. U. S. A. 108, 1815–1820.
Wiseman, J., El-Baz, F., 2007. Remote Sensing in Archaeology. Springer-Verlag, New York.
Yang, C., Raskin, R., Goodchild, M., Gahegan, M., 2010. Geospatial cyberinfrastructure: Past, present and future. Computers. Environ. Urban Syst. 34, 264–277.
Zubimendi, M.A., Ambrustolo, P., Zilio, L., Castro, A., 2015. Continuity and discontinuity in the human use of the north coast of Santa Cruz (Patagonia Argentina) through its radiocarbon record. Quat. Int. 356, 127–146.

Web and Data References

Site Databases, Atlases, & Archives

Archaeology Data Service http://archaeologydataservice.ac.uk/archives/view/c14_cba/.
The Digital Archaeological Record (tDAR) http://core.tdar.org.
ArchSite (New Zealand Archaeological Association's site recording scheme): http://www.archsite.org.nz.
Digital Index of North American Archaeology (DINAA): http://ux.opencontext.org/archaeology-site-data/.
The Electronic Atlas of Ancient Maya Sites: a Geographic Information System (GIS): http://mayagis.smv.org.
Digital Archaeological Archive of Comparative Slavery: http://www.daacs.org/.
Open Context https://opencontext.org/.
CORONA Atlas of the Middle East http://digitalhumanities.dartmouth.edu/projects/the-corona-atlas-project/.
Antiquity A-la-carte http://awmc.unc.edu/wordpress/alacarte/.
Archaeological Institute of America's Archaeology of North America https://www.archaeological.org/news/aianews/6871.
Paleoindian Database of the Americas (PIDBA) http://pidba.utk.edu/.
English Landscapes and Identities Project https://englaid.com/er4.
Comparative Archaeology Database (University of Pittsburgh) http://www.cadb.pitt.edu/.
US National Register of Historic Places (National Park Service) https://www.nps.gov/maps/full.html?mapId=7ad17cc9-b808-4ff8-a2f9-a99909164466.
Field Acquired Information Management System (FAIMS) https://www.fedarch.org/.
European Union's Inspire Geoportal http://inspire-geoportal.ec.europa.eu/.
Pangaea www.pangaea.de.
ArkeoGIS http://arkeogis.org/en/home.
The Survey of Hillforts http://www.arch.ox.ac.uk/hillforts-atlas-survey.html.

Artifacts

Portable Antiquities Scheme http://finds.org.uk/database.

Radiocarbon

Vermeersch, P.M., 2016. Radiocarbon Palaeolithic Europe Database, Version 20. Available at http://ees.kuleuven.be/geography/projects/14c-palaeolithic.
Canadian Archaeological Radiocarbon Database (CARD) http://www.canadianarchaeology.ca/.
New Zealand Radiocarbon Database http://www.waikato.ac.nz/nzcd/intro.html.
Rapa Nui Interactive Radiocarbon Database http://data.bishopmuseum.org/C14/.
Utz Böhner and Daniel Schyle, radiocarbon CONTEXT database 2002-2006 http://context-database.uni-koeln.de/ [http://dx.doi.org/10.1594/GFZ.CONTEXT.Ed1].
RADON - Central European and Scandinavian database of 14C dates for the Neolithic and Early Bronze Age http://radon.ufg.uni-kiel.de.
Andes 14C: Radiocarbon Database for Bolivia, Ecuador and Peru http://andes-c14.arqueologia.pl/database.html.

Volunteer GIS

Web-GIS Map of Day of Archaeology Posts https://jessogden.carto.com/me.
A #NoDAPLMap: https://northlandia.wordpress.com.
The Bakken Pipeline: https://bakkenpipelinemap.com.
The Decolonial Atlas: https://decolonialatlas.wordpress.com.
Pleiades – The Stoa Consortium: https://pleiades.stoa.org.
Micropasts http://crowdsourced.micropasts.org.
GlobalXplorer http://globalxplorer.org.

Spatial toolkit for visualization

Relief Visualization Toolbox (RVT) http://iaps.zrc-sazu.si/en/rvt#v.

3D

CyArk http://www.cyark.org/.
Sketchfab https://sketchfab.com/.

Institutional Centers

Stanford Geospatial Center, Harvard Geospatial Library http://library.stanford.edu/research/stanford-geospatial-center.
Ancient World Mapping Center (University of North Carolina) http://awmc.unc.edu/wordpress/.
Center for Advanced Spatial Technologies (Arkansas) http://www.cast.uark.edu/.
Institute of Anthropological and Spatial Studies, Slovenian Academy of Sciences and Arts http://iaps.zrc-sazu.si/en#v.