
Geographic Information Science and Systems

PA Longley, Department of Geography, University College London, London, United Kingdom


Michael Frank Goodchild, Department of Geography, University of California Santa Barbara, Santa Barbara, CA, United States
© 2020 Elsevier Ltd. All rights reserved.
This article is a revision of the previous edition article by M. F. Goodchild, volume 4, pp 526–538, © 2009 Elsevier Ltd.

Glossary
Buffer Dilation of point, line, or area features by a defined distance.
Ecological fallacy Confusion of the characteristics of areas with those of the point events that occur within them. For example,
an investigation using census tract geography may suggest an association between increased mean age of voters and the local
share of the electorate voting Republican: although confirmed using the aggregated data, it may not in fact be the more elderly
voters within each zone who actually vote Republican. It may be inappropriate to stereotype individuals with the aggregate
characteristics of the areas in which they are observed.
FAIR Data Principles A set of guiding principles intended to make data findable, accessible, interoperable, and reusable. They encourage data producers and data publishers to promote maximum use of research data.
Modifiable Areal Unit Problem (MAUP) The potentially distorting effects of scale and aggregation upon the analysis of
geographic phenomena. MAUP effects may occur when point-referenced events are aggregated into zones that are imperfectly
suited to the purpose of an investigation, and within-zone distributions are poorly understood. If analysis is undertaken of
associations or relationships between zonally aggregated attributes, the results may be more a manifestation of the zonal schema used than of the true underlying point distribution. The likelihood of MAUP effects increases if the delineation of
zones is unrelated to the goals of spatial analysis.
Spatial heterogeneity Unevenness in the concentration of point attributes within any given area.

Geographic Information Systems


G-I-S
Geographic Information Science (GIScience) may be defined in important respects in terms of the underpinning technologies of geographic information systems (GISystems, or GIS). These technologies provide an automated resource for the input, storage, retrieval, and analysis of information that is geographically referenced: geographic information, or GI. We begin by describing the elements of the GIS acronym, noting some ways in which their connotations and use have changed in recent years.
The term "geographic" reminds us that the purview of GISystems is defined in relation to the Earth's surface and near surface. The substantive focus of the discipline of Geography is bounded by geographic scales of measurement, conventionally taken as extending from the architectural to the global. Improvements in both the volume and granularity of geographic data in recent years
(arising from technological innovations such as the global positioning system) have increased the precision with which locations
may be georeferenced on the surface of the Earth, and for this reason many of today’s GISystem applications pertain to highly
granular events and occurrences. Such phenomena are identified using, as a minimum, a tuple of (x,y) coordinates and an attribute
(z), and often a timestamp (t) as well. For this reason, geographic datasets tend to be voluminous, and GIS are frequently required
to process what has become known as Big Data. Many of today’s GISystem applications thus engender point, line, area, and
volumetric objects that are defined at sub-geographic scales of measurement, but that contribute to forms and processes that are
manifested as geographic phenomena. The geo- prefix reminds us that GIS applications pertain to the unique spaces and places of the Earth, and the deployment of GIS to represent places often seeks to represent them as outcomes of the application of general processes to unique forms. As such, GISystems can be thought of as representing a kind of augmented reality, in which unique starting conditions are draped with the cumulative outcomes of a succession of processes that are known and understood.
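To make this minimal representation concrete, the sketch below encodes a single observation as an (x, y, z, t) tuple in Python; the class name, field names, and values are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class GeoRecord:
    """A minimal georeferenced observation: location, attribute, timestamp."""
    x: float      # easting or longitude
    y: float      # northing or latitude
    z: float      # attribute value, e.g. a sensor reading
    t: datetime   # time of observation

# A hypothetical air-quality reading georeferenced to central London
obs = GeoRecord(x=-0.1340, y=51.5246, z=42.0,
                t=datetime(2020, 1, 15, 9, 30, tzinfo=timezone.utc))
print(obs)
```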
Although often loosely used as a synonym of “data,” the term “information” is conventionally thought of as raw data that have
been selectively processed in preparation for some purpose. The applied use of GISystems places success in practical application at
the heart of their deployment. In recent years, the wide invocation of the term "Big Data," but not "big information," reminds us that raw georeferenced data usually require explicit grounding in geographic context. The technology of GISystems is inherently suited to the processing of large data volumes. Three of the "v's" of Big Data (their volume, variety, and update frequency, or velocity) present ongoing challenges for GISystem technology to keep abreast of the accelerating rate at which new and ever larger assemblages of data are created and need to be maintained, in their many and varied forms. A fourth "v" of Big Data concerns their veracity which, as described below, can be particularly challenging when Big Data sources are created as a by-product of some occurrence (such as a consumer purchase, or a journey) that was not principally motivated by the intention to create a georeferenced digital record.

The term “GIS” came into common parlance in the relatively data poor years of the 1980s when source data were typically
digitally encoded from preexisting hard copy sources and there was clear advantage in encoding only the minimum necessary
to achieve a purpose. Representations were guided by parsimony (expending the minimum effort in the digital recreation of data needed to fulfill the requirements of a clearly framed research design), with due regard to balance in content and coverage. The frameworks provided by today's Big Data may be very much more detailed in content, yet their coverage may be guided by some primary purpose that is unconducive or even detrimental to evenness. In many social and environmental realms, therefore, there is an additional onus upon users of GISystems to ensure that Big Data adequately frame the application for which they are intended.
"System" has at least two GIS-relevant connotations. First, the term achieved wide usage in Geography in the 1960s when,
borrowing from the emergent field of Systems Theory, it was used as a pragmatic means of geographically bounding a set of
elements of potential interest for analysis, as for example in isolating a city system in order to examine its properties. Given the rarity
of natural units of analysis in Geography this was a somewhat artificial analytical expedient, the more so because the vagaries of
statistical reporting units frequently led to under- or over-bounding the application. Today, the effects of what has been described
as the Modifiable Areal Unit Problem can be explored using computationally intensive methods. But there is no purely analytical
solution to this issue, and the analyst must remain the ultimate arbiter of what is a robust and defensible definition of the system of
interest.
This applications-based definition of a system of interest also has resonance in the configuration of GISystems as a technology of
problem-solving. The early GIS were bounded in hardware terms to single computer devices, but became linked first through the
intranets of large organizations in the late 1990s and then by the Internet from the early 2000s onwards. Today it no longer makes
sense to talk of any isolated GI "system," save for rare instances in which it is a requirement (for reasons of information security or disclosure control, say) that a hardware configuration be isolated from the rest of the globally networked GISystem. Even so, emergent e-infrastructure is becoming increasingly structured by the expedients of data processing and the requirements of data access protocols. The client–server model of the early 2000s, in which a GIS user accesses the system using a client device but most or all of the information processing is carried out on a remote server, has gradually evolved into massively parallel architectures in which geographically dispersed server farms retrieve multiple datasets and conduct analysis in different locations, and the regulation of the jurisdictions in which different sources and types of data may legally be held has itself reshaped the data and GI services industry. More generally, GISystems are becoming increasingly shaped by data access protocols and by the ways in which datasets can be concatenated and conflated with due regard to data protection, especially where personal or other sensitive data might be deanonymized through data linkage and data-intensive processing.

The Layer Model


The layer is perhaps the most conspicuous and iconic concept of GISystems, having its roots very early in GISystem history and
capturing one of its strongest motivations. It proposes that the geographic world can be represented as a series of thematic
layers, each carrying information relevant to one particular thematic domain. To mapping agencies, each layer might corre-
spond to the information portrayed in one color of ink on a topographic map: contours of topographic elevation in brown,
urban areas in pink, or wooded areas in green. To Ian McHarg, a landscape architect seeking to develop a new model for
a department at the University of Pennsylvania in the late 1950s, each layer represented the perspective of one discipline:
the geologic layer captured the factors believed by geologists to be important in planning; the ecological layer captured factors
originating in concern for biological conservation; and various social layers captured factors related to the economy and human
populations.
By stacking the layers, one could combine factors in various ways. A power line, for example, might require one distinct set
of weights to be applied to each discipline’s layer, while a new shopping center might require a different set of weights. This
notion of partitioning geographic variation into a number of thematic layers underlies much traditional practice in cartog-
raphy, where each layer might be the subject of a different printed map, or might be depicted on a single map in a distinctive
color. From another perspective, it captures the ability of GISystems to relate seemingly unrelated information through
common geographic location. Only in a GISystem, it is argued, can one combine detailed information on the ethnicity of
a city’s neighborhoods with detailed information on patterns of atmospheric pollution, to address the central question of envi-
ronmental equity: do minorities bear a disproportionate impact from industrial pollution? Only in a GISystem can one
combine a map showing the distribution of schools with one showing the distribution of liquor outlets. While all of these tasks
could in principle be completed manually, the effort needed to redraft onto transparent media and to correct for differences of
scale and map projection is often prohibitive. The layer model provides a compelling way in which the uniqueness of different
places scattered across the Earth’s surface may be decomposed into constituent physical, environmental, and socioeconomic
attributes.
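By way of illustration, the sketch below implements a McHarg-style weighted overlay in Python with NumPy; the layers, weights, and raster dimensions are invented for the example, not drawn from any real application.

```python
import numpy as np

# Three co-registered thematic layers as 100 x 100 rasters, each scored
# 0 (least suitable) to 1 (most suitable); values are invented.
rng = np.random.default_rng(0)
geology = rng.random((100, 100))
ecology = rng.random((100, 100))
social = rng.random((100, 100))

def suitability(weights):
    """Weighted linear combination of the stacked layers."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so results stay on the 0-1 scale
    return w[0] * geology + w[1] * ecology + w[2] * social

# Different applications weight the same layers differently
power_line = suitability([0.5, 0.3, 0.2])
shopping_center = suitability([0.1, 0.2, 0.7])
print(np.unravel_index(power_line.argmax(), power_line.shape))
print(np.unravel_index(shopping_center.argmax(), shopping_center.shape))
```

The most suitable cell differs between the two applications, even though the underlying layers are identical: the weights, not the data, encode the purpose.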
In representing the sometimes highly variegated nature of geographic phenomena, GISystems fundamentally provide an envi-
ronment for applied problem-solving. Success in this regard is straightforward in the applications alluded to in the previous paragraph, although others require closer examination of the ingredients that make up the layers, and/or of the ways in which they may be combined in pursuit of solutions. Such examination is the purview of geographic information science, to which we next
turn.

Geographic Information Science (GIScience)


Science and Problem-Solving
While the nature and meaning of science is constantly debated, there is general consensus on core principles and concepts. First,
science seeks laws and principles that can be shown to be valid in the observable world, and are generalizable in the sense that
they apply everywhere and at all times. Second, science is founded on definitions of terms that are rigorously stated and understood
throughout the scientific community. Third, scientific experiments and their results are replicable, being stated in sufficient detail
that someone else could expect to obtain them by carrying out an identical experiment. In this context black box is a pejorative term
used to describe circumstances in which the research procedures carried out in an ostensibly scientific application cannot be fully
described and therefore replicated. Fourth, it is recognized that in geography, experiments carried out in the unique environments of
the Earth's surface and near surface are often described as investigations rather than experiments, since each case study location is
necessarily unique. Replicability is nevertheless possible within limits, since scientific procedures require that every element that
makes up the investigation has a known and prespecified chance of selection. Any selectivity and bias in the creation of a represen-
tation needs to be understood if the analyst is to have any chance of accommodating it: for example, vehicle GPS traces that record
potholes will not give a municipality a reliable estimate of city-wide repair costs if vehicles are not driven down potholed as well as
clear roads. If the processes that led to the creation of a representation of the world are not clearly set out in full, the potential sources
and operation of bias cannot be understood. In addition, scientific conventions apply to the details of reporting, as in the rule that
any measurement or numerical result be stated to a precision (number of significant digits) that reflects the accuracy of the
measuring device or model. Principles such as these help to define GIScience, and to distinguish it from less rigorous applications
of GISystems and related technologies.
A distinction is often drawn between pure science, or science driven by curiosity and the search for general discoveries, and applied
science, or science that seeks to solve problems in the observable world using scientific methods. The quest to develop scientific approaches at geographic scales of measurement can be thought of as curiosity driven and pure, insofar as it seeks to uncover core organizing principles and concepts of universal applicability, but such approaches are almost invariably developed through applied science, tested in the unique observatory of the Earth's surface and near surface.

The Defining Characteristics of Geographic Information


Geography is fundamentally concerned with the question "where?" and the unique and sometimes difficult and complex issues that
it raises. In the 1980s geographer Luc Anselin framed these issues using two universal attributes of geographic information: spatial
dependence and spatial heterogeneity. The first of these had previously been referred to as the "First Law of Geography" by Waldo Tobler in 1970, namely that "everything is related to everything else, but near things are more related than distant things." This statement is
readily formalized in the principles of regionalized variables that underlie the science of geostatistics and in the models widely
used in spatial statistics. It is clear that the vast majority of phenomena distributed over the Earth’s surface and near surface adhere
to this regularity, albeit that there is wide variation in precisely how similarity decays with distance. Moreover, the Law clearly under-
pins various operations of GISystems, with buffering being perhaps the most obvious example.
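For example, buffering (the dilation operation defined in the Glossary) is a routine operation in most GISystems. A minimal sketch using the shapely Python library follows; the coordinates are hypothetical and assumed to lie in a projected system measured in meters.

```python
from shapely.geometry import LineString, Point

# Dilate a point and a line by fixed distances; distances are in the
# units of the (assumed projected, meter-based) coordinate system.
well = Point(532000, 181000).buffer(500)           # 500 m zone around a point
road = LineString([(0, 0), (1000, 0)]).buffer(50)  # 50 m corridor along a line

print(well.area)    # close to pi * 500**2 (the circle is approximated)
print(road.bounds)  # bounding box of the buffered corridor
```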
The principle essentially encapsulates the concept of geographic context, concerned as it is with the association between
a phenomenon at one point with the same phenomenon at nearby points. It appears to apply well in three-dimensional space
and also to apply in four-dimensional space-time. Perhaps the easiest way to demonstrate its validity is to consider the absurdity of a world in which a minute change in location on the Earth's surface leads to a completely independent environment: cliff edges and potholes provide instances where this is indeed the case, but less abrupt variation is very much the norm.
As a cornerstone of GIScience the principle has two major implications. First, similarity over short distances allows the Earth’s
surface to be divided into regions within which phenomena are approximately homogeneous, achieving great economies in data
volume by expressing attributes as properties of entire areas rather than of individual points. This principle enables the assumed
within-zone homogeneity within the polygons that underpin many representations in GIS. Similarly, it allows reasonable guesses
to be made of the properties of places that have not been visited or measured, in a process known as spatial interpolation. The
principle thus justifies the techniques that are used, for example, to create contour maps from scattered point observations of
altitude.
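A minimal sketch of one such technique, inverse distance weighting (IDW), is given below in Python; the spot heights and locations are invented, and production systems offer more sophisticated interpolators such as kriging, the geostatistical method grounded in regionalized variable theory.

```python
import numpy as np

def idw(xy_known, z_known, xy_query, power=2.0, eps=1e-12):
    """Inverse-distance-weighted estimates at query points: nearer
    observations receive more weight, a direct appeal to spatial dependence."""
    d = np.linalg.norm(xy_query[:, None, :] - xy_known[None, :, :], axis=2)
    w = 1.0 / (d + eps) ** power
    return (w @ z_known) / w.sum(axis=1)

# Scattered spot heights (coordinates and altitudes are invented)
pts = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
alt = np.array([100.0, 120.0, 90.0, 110.0])
unvisited = np.array([[5.0, 5.0], [1.0, 1.0]])
print(idw(pts, alt, unvisited))  # estimates for places never measured
```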
While this principle of spatial dependence aids geographic representation, it impedes use of conventional statistical inference
from geographic information, since it runs counter to the commonly required assumption that the data were acquired through
a process of random and independent sampling from a parent population. An analysis of the 39 counties of Washington, for
example, cannot make that assumption since the principle implies that conditions in neighboring counties will likely be similar.
Moreover, there is no larger universe of which the set of all counties of Washington constitute a random sample.
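Spatial dependence of this kind is commonly quantified with Moran's I, of which Anselin's local indicators of spatial association (LISA; see Further Reading) are a localized extension. The sketch below computes the global statistic from first principles in Python, on a toy set of four areal units with invented values.

```python
import numpy as np

def morans_i(x, W):
    """Global Moran's I of attribute x under spatial weights matrix W."""
    z = x - x.mean()
    return (len(x) / W.sum()) * (z @ W @ z) / (z @ z)

# Four areal units in a row, with rook (shared-edge) contiguity weights
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
x = np.array([1.0, 2.0, 3.0, 4.0])  # a smoothly varying attribute
print(morans_i(x, W))  # positive (about 0.33): neighbors resemble each other
```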
Anselin’s second principle addresses spatial heterogeneity, or the tendency for parts of the Earth’s surface to be distinct from one
another. This has profound implications for the way in which we represent the world since, for example, the range of soil types
found in the United Kingdom is not likely to be relevant to the classification of soils in many other parts of the world where combi-
nations of geology, climate, and relief result in soils that simply are not found in the UK. A UK soil classification is thus likely to be
only partially relevant, at best, to the classification of soils in very different parts of the world. More generally, any universal standard
will inevitably be suboptimal for any local jurisdiction, and there will always be tension between the desire to be locally optimal
and the desire to be globally universal.

Simulation in GIScience
The concepts of spatial dependence and spatial heterogeneity invite normative (idealized) conceptions of the nature, size, and
spacing of geographic phenomena. Thus, Walter Christaller was able to show that simple assumptions about the effects of distance
and behavior (using the nearest settlement that offers the required level of service in the settlement system) led to a hexagonal
geometric arrangement of settlements across a perfectly uniform (isotropic) plane. Similarly, William Morris Davis was able to theo-
rize about the development of topography through the process of erosion, but only by assuming a starting condition of a flat,
uplifted plateau of uniform structure and exposure to geophysical processes. Yet these controlled conditions very rarely exist in
the observable world, and so the ways in which such hypotheses play out can be difficult to predict, given the intrinsic heterogeneity
and complexity of the Earth’s surface. Research in geography thus tells us that the perfect theoretical patterns predicted almost never
arise in practice.
One way of addressing such issues is to assume that in the seeming infinite complexity of the observable world, all patterns are
equally likely to emerge; and that the properties we will observe will be those that are most likely. This strategy enabled Alan Wilson
to demonstrate that the most likely form of distance decay in human interaction was the negative exponential; and Ronald Shreve
was able to show that the effect of random development of stream networks would be the laws previously postulated by Robert E.
Horton. Similar approaches have been applied to the statistical distribution of city size, or the patterning of urban form. However,
although the results often “look right” in terms of size, shape, scale, and dimension when viewed on a GISystem, vagaries inherent
in representing multiple physical processes or human agency limit the accuracy of predictions at specific locations. As such, the
results of simulations are not usually directed toward practical problem-solving, but rather are used to gauge the effects of simple
hypotheses about behavior on the uniquely complex landscapes of the geographic world. The value of such approaches lies in the
general hypotheses they advance about human behavior, landscape evolution, and the spatial patterning of geographic phenomena.
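As an illustration of the first of these results, the sketch below implements a production-constrained spatial interaction model with negative exponential distance decay, of the general form Wilson derived; the flows, distances, and decay parameter are all invented for the example.

```python
import numpy as np

# Production-constrained spatial interaction model with negative exponential
# distance decay; origins, destinations, and distances are invented.
O = np.array([100.0, 200.0])        # trips leaving each origin zone
D = np.array([1.0, 2.0, 1.5])       # attractiveness of each destination
d = np.array([[1.0, 4.0, 2.0],      # origin-destination distances
              [3.0, 1.0, 2.5]])
beta = 0.5                          # distance-decay parameter

f = np.exp(-beta * d)               # negative exponential decay of interaction
A = 1.0 / (D * f).sum(axis=1)       # balancing factors conserve origin totals
T = (A * O)[:, None] * D[None, :] * f
print(T)                # predicted flows T_ij
print(T.sum(axis=1))    # rows sum to O: the constraint is honored
```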
Such approaches fall into two major categories, depending on how the hypotheses about behavior are expressed. The approach
of cellular automata begins with a representation of the landscape as a raster grid, and implements a set of rules about the conditions
in any cell of the raster. The approach was originally popularized by John Conway in his Game of Life, in which he was able to show
that distinct patterns emerged through the playing out of simple rules on a uniform landscape. These patterns are known as emergent
properties, since they would be virtually impossible to predict through mathematical analysis. The cellular automata approach has
been used by Keith Clarke and others to simulate urban growth, based on simple rules that govern whether or not a cell will change
state from undeveloped to developed. These approaches allow for the testing of policy options, expressed in the form of modifica-
tions to the rules or to the landscape, and have been widely adopted by urban planners.
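A minimal sketch of the cellular automaton idea, using Conway's rules on a NumPy raster, follows; an urban growth model such as Clarke's would replace these rules with ones governing the transition of cells from undeveloped to developed states.

```python
import numpy as np

def life_step(grid):
    """One generation of Conway's Game of Life on a 0/1 raster (edges wrap)."""
    neighbors = sum(np.roll(np.roll(grid, i, 0), j, 1)
                    for i in (-1, 0, 1) for j in (-1, 0, 1) if (i, j) != (0, 0))
    # A live cell survives with 2 or 3 neighbors; a dead cell is born with 3.
    alive = (grid == 1) & ((neighbors == 2) | (neighbors == 3))
    born = (grid == 0) & (neighbors == 3)
    return (alive | born).astype(int)

rng = np.random.default_rng(1)
world = (rng.random((50, 50)) < 0.2).astype(int)  # random initial landscape
for _ in range(100):
    world = life_step(world)  # emergent patterns arise from the simple rules
```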
A different approach centers on the concept of the agent, an entity that is enabled to move across the geographic landscape and
behave according to specified rules. This agent-based approach is thus somewhat distinct from the cell-based approach of cellular
automata. Agent-based models have been widely implemented in GIScience, for example, to study crowd management by simu-
lating the behavior of the individuals within a crowd and examining scenarios that might trigger panic or cause mass injury.
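The sketch below illustrates the agent-based idea in deliberately simple form: hypothetical agents head for a single exit while being randomly jostled. It illustrates only the mechanics of agent-based simulation, not a validated crowd model.

```python
import numpy as np

rng = np.random.default_rng(2)
pos = rng.random((200, 2)) * 100.0   # 200 agents scattered over a 100 m hall
exit_xy = np.array([100.0, 50.0])    # a single exit

for step in range(1000):
    to_exit = exit_xy - pos
    dist = np.linalg.norm(to_exit, axis=1, keepdims=True)
    heading = to_exit / np.maximum(dist, 1e-9)     # rule: head for the exit...
    jostle = rng.normal(0.0, 0.3, size=pos.shape)  # ...perturbed by the crowd
    pos = pos + heading * 1.0 + jostle             # roughly 1 m per time step
    pos = pos[np.linalg.norm(exit_xy - pos, axis=1) > 1.0]  # arrivals leave
    if len(pos) == 0:
        print(f"hall cleared after {step + 1} steps")
        break
```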
Any model is only as good as the rules and hypotheses about behavior on which it is based, and so it is unlikely that simulated
results will lead directly to a modification of the rules. It is more likely that, in the light of results, rules will be improved using
controlled experiments outside the context of the modeling. If patterns emerge that were unexpected, one might argue that scientific
knowledge has advanced, but on the other hand such patterns may be due to the specific details of the modeling, and may not
replicate anything that actually happens in the observable world.
Validation or verification of simulation is always problematic, since the results purport to represent a future that is still to come.
Hindcasting from the present-day state of a system to known past states is a useful technique; alternatively, a model may be run forward from some known state in the past. But the predictions of the model will never replicate reality perfectly, forcing the investigator to consider the level of error in
prediction that is acceptable. It is possible and indeed likely that the rules and hypotheses about social behavior that drive the model
will change in the future. In this respect, models of physical processes may be more reliable than models of social processes.

Scale and Aggregation Issues in GIScience


The term scale is often used in GIScience in the sense of spatial resolution, to distinguish between fine-scale or detailed data and
coarse-scale or generalized data. When building a representation, scale is sacrificed by using a smaller representative fraction (e.g., 1:10,000, or 1/10,000, is a larger fraction than 1:1,000,000, or 1/1,000,000); this sacrifice may be justified in order to reduce data volume, particularly if details are extraneous. It can also reduce the processing power required to run a model, and is desirable in mapping, where extraneous data may clutter the map and obscure its interpretation.
In practical terms, the elemental units of analysis in GIScience are often in fact reporting areas (units), such as census tracts. Their extents are typically guided by the number of individuals they contain and the heterogeneity of their characteristics, usually in order to prevent violation of individual privacy. The combined effects of the scales (granularities) at which individual units are defined and
the different ways in which they may be aggregated into higher order units gives rise to the modifiable areal unit problem. The term was
characterized and popularized by Stan Openshaw, who demonstrated, first, how associations between variables tended to increase
at coarser granularities (the scale effect) and, second, that different geographic aggregations of elemental reporting units into simi-
larly scaled aggregations could lead to dramatic changes in results. His contribution to GIScience was to develop a first generation of
computationally intensive applications to demonstrate the effects of scale and aggregation using different zones. Unfortunately, in
most cases the elemental units are themselves predefined aggregations.

In important respects, this analytical work differs from the approaches to simulation discussed in the Simulation in GIScience section above, in that it is guided solely by repeated numerical simulation in the absence of any guiding hypothesis as to the most appro-
priate scale and zonal configuration at which a geographic phenomenon should manifest itself. Openshaw’s original case study
examined the 99 counties of Iowa to explore the relationship between percentage of the resident population over 65 and the
percentage of registered Republican voters. Different aggregations of elemental counties revealed different results. But what is
missing in this case is any well-defined hypothesis as to why any correlation should appear, and at what scale it should be man-
ifested. For example, if a process were to work at the individual level, and older people were more likely to vote Republican, the
hypothesis is best tested at the individual level. Conversely, the process might be ecological, in that a neighborhood in which
many residents are over 65 might attract many Republican voters, irrespective of their ages. In the latter case the appropriate scale
of analysis is that of neighborhoods, requiring their formal definition as a set of places, each comprising aggregations of elemental
finer-scale (e.g., census block) data. The general point is that we should be looking for statistics that are sensitive to the scale at which
the phenomenon is likely to manifest itself. Thus, the MAUP is not an empirical problem but rather is a theoretical requirement that
can be used to hone statistics to the explicitly geographic context in which they are applied.
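The scale effect itself can be demonstrated in a few lines of code. In the synthetic example below, two noisy individual-level attributes share only a weak common spatial trend; aggregating individuals into progressively coarser zones drives their correlation steadily upward, in the manner Openshaw observed. All data are simulated.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
unit = rng.integers(0, 1000, n)       # elemental reporting unit of each person
trend = unit / 1000.0                 # a weak shared spatial trend
x = trend + rng.normal(0, 1.0, n)     # two noisy individual-level attributes
y = trend + rng.normal(0, 1.0, n)

def corr_at_scale(k):
    """Correlation after aggregating the 1000 elemental units into k zones."""
    zones = unit * k // 1000
    labels = np.unique(zones)
    xm = np.array([x[zones == z].mean() for z in labels])
    ym = np.array([y[zones == z].mean() for z in labels])
    return np.corrcoef(xm, ym)[0, 1]

print(np.corrcoef(x, y)[0, 1])   # weak at the individual level (~0.08)
for k in (500, 100, 20):         # correlation strengthens as zones coarsen
    print(k, corr_at_scale(k))
```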
A closely related fundamental problem, also well-recognized in GIScience, is the ecological fallacy, the fallacy of reasoning from
the aggregate to the individual. The fallacy already appeared in the previous paragraph, since it would be wrong to infer from
a county-level correlation that individuals over 65 tend to vote Republican. In fact, in the extreme, Openshaw’s correlations could
exist in Iowa at the county level even though no person over 65 was a registered Republican.
Transferred to today’s computationally intensive simulations of different geographic environments, this point is akin to the
distinction between virtual reality and augmented reality: the former may focus upon simulation of multiple worlds that “look right”
in impressionistic terms but without any unique constellation of features that define a known place on the Earth’s surface; while the
latter retains recognizable characteristics of unique places as a framework for additional visual features. More generally still, this
resonates with geographers' long-established preoccupations with the concept of place. GISystems require named places to be repre-
resented either as precisely located points, polylines, or polygons. Yet while these constructs are very good for establishing locational
precision, the underpinning coordinate systems such as latitude and longitude are unfamiliar to individuals who nevertheless have
excellent recall of home street address and the neighborhood in which it is located. Recently there has been much interest in a platial
approach to geographic knowledge, emphasizing named places as referents, as a more human-centric alternative to the familiar
spatial approach with its emphasis on coordinate referents. New data sources, including social media, provide a rich basis for
exploring associations of place.

Big Data and Data Science


Recent years have seen the advent of huge volumes of “Big Data” that have vastly increased the range of geographic phenomena that
can be represented using GISystems, often with astonishing geographic precision alongside very frequent temporal refresh. Such
data present the potential to represent almost anything, anywhere, at any time. As discussed in the introduction, although often
used as close synonyms, there is a nuanced difference between “data” and “information” in that the latter implies selectivity and
preparation of raw data for a particular purpose, such as adjusting measures of ambient temperature for altitude, or standardizing an area's unemployment counts for the size of the working age population. For this reason, the term "Big Information" lacks resonance and is not used, and for the same reason Big Data place increased strain upon GISystems to evaluate and communicate data provenance: as with social media or smart travel card data, for example, Big Data are typically created as a by-product of the delivery of services or goods. The characteristics and activity patterns of individuals who use public transit are often fundamentally different from those of non-users, as are the characteristics of the self-selecting sub-samples of social media users who choose to reveal their geo-locations. The creation of Big Data can often be thought of as the outcome of stratified sifting of a population, where the stratification criteria are unknown, and possibly unknowable.
The terms “hard” and “soft” data have been coined to denote, respectively, data sources collected in accordance with a clearly
developed design, and data for which no design guided their collection. The design underpinning hard data sources, along with
practical issues such as those arising during the data collection process, are usually recorded in metadata ("data about data") that can be used to establish the provenance and value of a data source for a given application. Metadata are only rarely available for soft data sources (including most Big Data assemblages), save for instances in which such data have been properly ingested for research purposes. If Big Data selectivity bias can be estimated, for example by triangulation of a social media Big Data source with a conventional scientific survey such as a census, it may be possible to accommodate the estimated bias by post-stratification reweighting. Strictly speaking, this is poor science, but in practice it should be considered if it liberates vast, rich, and timely data sources that would otherwise be unusable in spatial analysis. The risk of encountering conflicting information when using
GISystems to bring together diverse data sources will be reduced if the representativeness of all constituent data sets is well
understood.
Metadata created for soft sources can suggest ways in which they may be "hardened," for example by reweighting with reference to well understood framework sources such as censuses. Attendant adjustments to the scope of an investigation (defined in terms of situations, processes, or objects) may be required if cross-validation suggests systematic unevenness in the coverage of a dataset, since this will otherwise likely limit the robustness of research findings.
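A minimal sketch of such hardening by post-stratification reweighting follows; the age bands, shares, and attribute values are all invented for illustration.

```python
import numpy as np

# Age-band shares in a "soft" geotagged sample versus a census benchmark
# (all figures invented for illustration).
bands = ["16-29", "30-44", "45-64", "65+"]
sample_share = np.array([0.45, 0.30, 0.18, 0.07])  # young users over-represented
census_share = np.array([0.22, 0.26, 0.32, 0.20])

weights = census_share / sample_share      # post-stratification weights
print(dict(zip(bands, weights.round(2))))  # >1 up-weights under-covered bands

# Reweighting a measured attribute (e.g. mean trips per day in each band)
# toward the population, assuming users resemble non-users within each band.
values = np.array([3.1, 2.4, 2.0, 1.5])
naive = (sample_share * values).sum()               # biased sample mean
hardened = (weights * sample_share * values).sum()  # reweighted estimate
print(naive, hardened)  # hardened equals the census-weighted mean
```

The crucial, and often untestable, assumption is flagged in the comments: that within each stratum the observed individuals resemble the unobserved ones.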
In undocumented and unhardened form, use of soft data sources should be restricted to exploratory analysis, for example in
order to aid hypothesis formulation. For example, crowd-sourced data on attitudes to local levels of atmospheric pollution may
help to shape hypotheses about remedial action, if it is clearly understood that those who voluntarily contribute may have quite different attitudes, backgrounds, and predilections from those who do not. This argues that GIScience methods should be used to render
big and conventional data sources robust and transparent through data hardening. The most obvious way in which this might be
achieved is by triangulation with conventional framework sources of known provenance, and documentation of any reweighting
that is necessary to further these goals. GIScience provides practical ways of surviving the big data deluge.
The observable world is of seemingly infinite complexity, and many of the attributes commonly processed in GISystems, such as
soil class or vegetation cover type, have an inherent degree of subjectivity and specificity to the geographic context in which they are
observed. The consequent uncertainty present in all geographic information means that few if any geographic datasets give the
researcher objective knowledge of the differences between the data set and the observable world. In addition to the triangulation
processes alluded to above, users must therefore also often rely on indirect measures such as map scale to understand the limitations
of the data.
Allied to the innovation of big data, data science has become of interest to GIScientists in recent years. In significant part,
techniques such as machine learning are in practice extensions of preexisting methods of data reduction and pattern detection, and limited effort has been devoted to rendering some more recent machine learning methods explicitly spatial. Artificial intelligence (AI) in its general sense concerns the theory and development of computer systems that perform tasks hitherto thought to require human intelligence, such as visual perception and decision-making, and there has been progress in spatial applications. It is less clear that AI applications in geography are in any sense autonomous, as implied by a stricter definition of
the field, and the predictive success of applications is often hamstrung by rather limited (human or machine) understanding of
the scope and limitations of the underpinning data. Academic programs have nevertheless been instituted in response to what is
perceived as a rapidly growing market for associated background data skills.
Prediction has been a major motivation underpinning the wider development of data science, and remarkable success has been
achieved when data mining is applied to comprehensive datasets, as with translation between languages or speech recognition.
However, where datasets are incomplete or selective with respect to spatial locations or time periods, there is evidence that the scope
of predictions may be limited to particular spaces and times. Tools of machine learning, including artificial neural nets and deep learning, can be very effective in mining successful predictions from Big Data. Yet despite its practical value, prediction has always
taken a back seat in science to explanation and understanding. The hidden layers of a trained neural network are difficult to interpret
within the kinds of hypothesis testing and theory confirmation that have underpinned much of science to date. It is difficult to see
how such techniques might achieve generalizability across study areas and time periods, especially when the underpinning data are
partial and piecemeal. Data driven geographies need to remain cognizant of the nature of geographic data and the structural
characteristics that underpin them.
The core organizing principles and concepts of information (as opposed to data) science encompass processes for storing and
retrieving information for analytical purposes. GIScience takes this further through explicit recognition of the properties of spatial
data, in terms of topological relations and interrelatedness of spatial attributes. Successful GIScience is effectively framed in terms of
spatial and temporal transferability, transparency, and robustness.

The Social Context


GISystems provide technical solutions to a set of well-defined tasks, but they nevertheless raise important issues of a social nature.
Are their databases objective representations of reality, or are they to some degree social constructions? Is it possible for GISystem
databases to be influenced by the agendas of their creators? If GISystems are expensive, do they inevitably reinforce the interests of
those in power and marginalize other interests? Are GISystems tools for increasing the dominance of the English language and
Western ideologies, and do they too often ignore the interests of minorities and other languages and cultures? Does the technology
of GISystems provide a basis for surveillance, and unwanted invasion of individual privacy?
Questions such as these have driven the emergence in recent years of an important new branch of GIScience under such names as
Critical GIS and AltGIS. There is not sufficient space in this entry for a detailed examination, but several important and accessible
texts are included in the references.

The Technology of Geographic Problem-Solving


CyberGIS, E-Infrastructure, and FAIR Data
Cyberinfrastructure and e-infrastructure are terms used to describe the kinds of computing infrastructure that are required to support
science. The networking of GISystems and the data sources that supply them are illustrative of the end-to-end computer infrastruc-
ture that services individuals and teams of investigators in scientific problem-solving, alone or in collaboration. The requirements of
accessing and processing huge volumes of data have led to the development of massively parallel computers and high-performance
computing (HPC) architectures to solve complex, large-scale problems. Parallel architectures are an inherently good fit to the nature of geographic space and its individual and community agents, all of which can be seen as semi-independent decision-makers acting in parallel rather than serially. Some authors have argued that geographic research and
problem-solving requires a specific form of cyberinfrastructure that addresses several key issues, and have coined the term cyberGIS.

A further defining characteristic of geographic data is the ease of disclosure of personal or sensitive personal data (terms that have precise meanings under the European General Data Protection Regulation) about human subjects. As a consequence, geographic detail is usually the first casualty of the disclosure control measures designed to protect confidentiality. This is most usually achieved through aggregation, opening up the results of geographic analysis to the potential ecological fallacy and modifiable areal unit effects discussed above. Recent years have seen a growth in the use of computer networks to support inter- and multi-disciplinary collaboration between individuals and teams of researchers, but the undoubted benefits need to be weighed against the possibility that collaboration may lead to disclosure of information about identifiable human subjects. Issues of data stewardship infrastructure are thus strategically important alongside physical data and computational infrastructure.
Partly as a response, and as part of a wider movement to ensure correct and ethical attribution of research resources and findings,
authentication, authorization, and accounting infrastructure (AAAI) has developed to facilitate the use of computation and data in
the research process. Core data and computational e-infrastructure and high-performance networks have developed rapidly in
recent years, enabling researchers to stretch their uses of e-infrastructure and thereby develop more ambitious approaches to collab-
orative research. This has enabled use of high-volume, high data-capture rate instruments and the capability to acquire and process
extremely large data sets. GIScience is well placed to link more and more research domains and their extensive arrays of datasets, and
to synthesize their growing complexity and richness. In consequence, the desire to share, collaborate and synthesize data derived
from combining data sets is also growing.
Today, data rich GIScience is about more than analyzing the small fraction of all datasets that happen to be available as conven-
tional statistical or other open sources. For example, data assembled as a by-product of business-to-customer interactions in the
supply of goods or services account for an increasing real share of all of the (ever-increasing) data that are collected about citizens
today. Data are, arguably, the world’s greatest resource, and geographic information technologies are pivotal to ensure that data
analysis conforms to FAIR principles of being findable, accessible, interoperable, and reusable; however, in the new multi-sector
big data economy, open data are but one component of the data resources that can be accessed under FAIR principles, and
AAAI is increasingly central to effective and ethical data resource exploitation.

Conclusion

Alongside developments in computation and computer architectures, the advance of GIScience is inextricably linked to vast and
rapid changes in the data economy and the facility to access, link, and share large datasets in ways that are efficient, effective,
and uncompromising with respect to disclosure control. Most fundamentally, however, advancement arises out of attendant
changes in scientific practice that may appear mundane but are nonetheless profound and far-reaching. Good science is relative
to what we have now, and improved understanding of data and their provenance is a necessary precursor to better analysis of spatial
distributions in today’s data- and computation-rich world.
The “geo-” prefix makes clear that GIScience is an applied science of the observable world, and that its continuing progress will
consequently be judged upon the success of its applications in that unique laboratory. Transparency of computationally rich
methods and techniques is undoubtedly of strategic importance to this ongoing quest, yet the experience of the last quarter century
suggests that there are rather few purely computational solutions to substantial real-world problems. The broader challenge is to
address the ontologies that govern our conception of real-world phenomena, and to undertake robust appraisal of the provenance
of data that are used to represent the world using GISystems.
The apparatus of science is in practice adaptable to the inherent vagaries of the ever-broadening range of available data and the
vicissitudes of the ways in which they are interpreted by humans. GIScience aligns fundamental principles and concepts with the
messy empirical domains of real world places and contexts. Scientific approaches to representing places will continue to benefit
from the widest possible availability of new data sources and novel applications of existing ones. An ever-broadening constituency
of researchers and end users will participate in the creation, maintenance, and evaluation of real world representations. All of this
will contribute to perhaps the most important goal of GIScience: to develop explicitly geographical representations of the accumu-
lated effects of historical and cultural processes upon unique places.

See Also: Cartographic Anxiety; Modifiable Areal Unit Problem; Quantitative Methodologies; Spatial Ontologies.

Further Reading

Anselin, L., 1995. Local indicators of spatial association – LISA. Geogr. Anal. 27 (2), 93–115.
Batty, M.J., Longley, P.A., 1994. Fractal Cities: A Geometry of Form and Function. Academic Press, San Diego, CA.
Clarke, K.C., Gaydos, L., 1998. Loose coupling a cellular automaton model and GIS: long-term growth prediction for San Francisco and Washington/Baltimore. Int. J. Geogr. Inf. Sci.
12 (7), 699–714.
Crampton, J.W., 2010. Mapping: A Critical Introduction to Cartography and GIS. Wiley-Blackwell, Chichester, UK.
Fotheringham, A.S., Brunsdon, C., Charlton, M., 2002. Geographically Weighted Regression: The Analysis of Spatially Varying Relationships. Wiley, Hoboken, NJ.
Goodchild, M.F., 1992. Geographical information science. Int. J. Geogr. Inf. Systems 6 (1), 31–45.
Goodchild, M.F., 2007. Citizens as sensors: the world of volunteered geography. GeoJournal 69 (4), 211–221.
Hey, A.J.G., Tansley, S., Tolle, K.M., 2009. The Fourth Paradigm: Data-Intensive Scientific Discovery. Microsoft Research, Redmond, WA.
Janelle, D.G., Goodchild, M.F., 2018. Territory, geographic information, and the map. In: Wuppuluri, S., Doria, F.A. (Eds.), The Map and the Territory: Exploring the Foundations of
Science, Thought and Reality. Springer, Dordrecht, pp. 609–628.
Kelleher, J.D., Tierney, B., 2018. Data Science. MIT Press, Cambridge, MA.
Longley, P.A., Goodchild, M.F., Maguire, D.J., Rhind, D.W., 2015. Geographic Information Science and Systems, fourth ed. Wiley, Hoboken, NJ.
Longley, P.A., Cheshire, J.A., Singleton, A.D., 2018. Consumer Data Research. UCL Press, London.
Openshaw, S., 1983. The Modifiable Areal Unit Problem. GeoBooks, Norwich, UK.
Openshaw, S., Veneris, Y., 2003. Numerical experiments with central place theory and spatial interaction modelling. Environ. Plan. A 35 (8), 1389–1403.
Schuurman, N., 2000. Trouble in the heartland: GIS and its critics in the 1990s. Progr. Human Geogr. 24 (4), 569–590.
Shreve, R.L., 1966. Statistical law of stream numbers. J. Geol. 74, 17–37.
Tobler, W.R., 1970. A computer movie simulating urban growth in the Detroit region. Econ. Geogr. 46 (2), 234–240.
UKRI, 2019. E-infrastructure Roadmap. UKRI, Swindon.
Wang, S., 2016. CyberGIS and spatial data science. GeoJournal 81 (6), 965–968.
Wilson, M.W., 2017. New Lines: Critical GIS and the Trouble of the Map. University of Minnesota Press, Minneapolis.
Yates, J., 2019. UKRI Data Infrastructure Roadmap. UKRI, Swindon.
Zhang, J.X., Goodchild, M.F., 2002. Uncertainty in Geographical Information. Taylor and Francis, New York.
