Glossary
Buffer Dilation of point, line, or area features by a defined distance.
Ecological fallacy Confusion of the characteristics of areas with those of the point events that occur within them. For example,
an investigation using census tract geography may suggest an association between increased mean age of voters and the local
share of the electorate voting Republican: although confirmed using the aggregated data, it may not in fact be the more elderly
voters within each zone who actually vote Republican. It may be inappropriate to stereotype individuals with the aggregate
characteristics of the areas in which they are observed.
FAIR Data Principles A set of guiding principles intended to make data findable, accessible, interoperable, and reusable, and to encourage data producers and data publishers to promote maximum use of research data.
Modifiable Areal Unit Problem (MAUP) The potentially distorting effects of scale and aggregation upon the analysis of
geographic phenomena. MAUP effects may occur when point-referenced events are aggregated into zones that are imperfectly
suited to the purpose of an investigation, and within-zone distributions are poorly understood. If analysis is undertaken of
associations or relationships between zonally aggregated attributes, the results may be more a manifestation of the zonal
schema used rather than the true underlying point distribution. The likelihood of MAUP effects increases if the delineation of
zones is unrelated to the goals of spatial analysis.
Spatial heterogeneity Unevenness in the concentration of point attributes within any given area.
The term “GIS” came into common parlance in the relatively data-poor years of the 1980s, when source data were typically digitally encoded from preexisting hard-copy sources and there was a clear advantage in encoding only the minimum necessary to achieve a purpose. Representations were guided by parsimony (expending the minimum effort in digital recreation of data to fulfill the requirements of a clearly framed research design), with due regard to balance in content and coverage. The frameworks provided by today’s Big Data may be very much more detailed in content, yet their coverage may be guided by some primary purpose that is unconducive or even detrimental to evenness. In many social and environmental realms, therefore, there should be an additional onus upon users of GISystems to ensure that Big Data adequately frame the applications for which they are intended.
“System” has at least two GIS-relevant connotations. First, the term achieved wide usage in Geography in the 1960s when,
borrowing from the emergent field of Systems Theory, it was used as a pragmatic means of geographically bounding a set of
elements of potential interest for analysis, as for example in isolating a city system in order to examine its properties. Given the rarity
of natural units of analysis in Geography this was a somewhat artificial analytical expedient, the more so because the vagaries of
statistical reporting units frequently led to under- or over-bounding the application. Today, the effects of what has been described
as the Modifiable Areal Unit Problem can be explored using computationally intensive methods. But there is no purely analytical
solution to this issue, and the analyst must remain the ultimate arbiter of what is a robust and defensible definition of the system of
interest.
This applications-based definition of a system of interest also has resonance in the configuration of GISystems as a technology of
problem-solving. The early GIS were bounded in hardware terms to single computer devices, but became linked first through the
intranets of large organizations in the late 1990s and then by the Internet from the early 2000s onwards. Today it no longer makes
sense to talk of any isolated GI “system” save for the rare instances in which it is a requirement (for reasons of information security or disclosure control, for example) that a hardware configuration be isolated from the rest of the globally networked GISystem. Such
instances are rare, although emergent e-infrastructure is becoming increasingly structured by expedients of data processing and
the requirements of data access protocols. The client–server model of the early 2000s, in which a GIS user accesses the system using
a client device but most or all of the information processing is carried out on a remote server, has gradually evolved into massively parallelized architectures in which geographically dispersed server farms retrieve multiple datasets and conduct analysis across different locations; the regulation of the jurisdictions in which different sources and types of data may legally be held has itself reshaped the data and GI services industry. Indeed, more generally, GISystems are becoming increasingly shaped by data access protocols and the ways in which datasets can be concatenated and conflated with due regard to data protection, especially where personal or other sensitive data may be deanonymized through data linkage and data-intensive processing.
Simulation in GIScience
The concepts of spatial dependence and spatial heterogeneity invite normative (idealized) conceptions of the nature, size, and
spacing of geographic phenomena. Thus, Walter Christaller was able to show that simple assumptions about the effects of distance
and behavior (using the nearest settlement that offers the required level of service in the settlement system) led to a hexagonal
geometric arrangement of settlements across a perfectly uniform (isotropic) plane. Similarly, William Morris Davis was able to theo-
rize about the development of topography through the process of erosion, but only by assuming a starting condition of a flat,
uplifted plateau of uniform structure and exposure to geophysical processes. Yet these controlled conditions very rarely exist in
the observable world, and so the ways in which such hypotheses play out can be difficult to predict, given the intrinsic heterogeneity
and complexity of the Earth’s surface. Research in geography thus tells us that the perfect theoretical patterns predicted almost never
arise in practice.
One way of addressing such issues is to assume that, amid the seemingly infinite complexity of the observable world, all patterns are equally likely to emerge, and that the properties we observe will be those that are most likely. This strategy enabled Alan Wilson
to demonstrate that the most likely form of distance decay in human interaction was the negative exponential; and Ronald Shreve
was able to show that the effect of random development of stream networks would be the laws previously postulated by Robert E.
Horton. Similar approaches have been applied to the statistical distribution of city size, or the patterning of urban form. However,
although the results often “look right” in terms of size, shape, scale, and dimension when viewed on a GISystem, vagaries inherent
in representing multiple physical processes or human agency limit the accuracy of predictions at specific locations. As such, the
results of simulations are not usually directed toward practical problem-solving, but rather are used to gauge the effects of simple
hypotheses about behavior on the uniquely complex landscapes of the geographic world. The value of such approaches lies in the
general hypotheses they advance about human behavior, landscape evolution, and the spatial patterning of geographic phenomena.
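Wilson’s result can be stated compactly. As a point of reference (this is the standard doubly constrained formulation of the entropy-maximizing spatial interaction model, not a quotation from any particular source), the most likely pattern of flows subject to known origin and destination totals takes the form:

```latex
\[
T_{ij} = A_i O_i \, B_j D_j \, e^{-\beta c_{ij}},
\qquad
A_i = \Bigl(\sum_j B_j D_j e^{-\beta c_{ij}}\Bigr)^{-1},
\qquad
B_j = \Bigl(\sum_i A_i O_i e^{-\beta c_{ij}}\Bigr)^{-1}.
\]
```

Here T_ij is the most likely flow between origin i and destination j, O_i and D_j are the known totals leaving i and arriving at j, c_ij is the cost (distance) of travel between them, and beta governs the rate of negative exponential distance decay.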
Such approaches fall into two major categories, depending on how the hypotheses about behavior are expressed. The approach
of cellular automata begins with a representation of the landscape as a raster grid, and implements a set of rules that determine how the state of each cell changes in response to conditions in the cell and its neighbors. The approach was originally popularized by John Conway in his Game of Life, in which he was able to show
that distinct patterns emerged through the playing out of simple rules on a uniform landscape. These patterns are known as emergent
properties, since they would be virtually impossible to predict through mathematical analysis. The cellular automata approach has
been used by Keith Clarke and others to simulate urban growth, based on simple rules that govern whether or not a cell will change
state from undeveloped to developed. These approaches allow for the testing of policy options, expressed in the form of modifica-
tions to the rules or to the landscape, and have been widely adopted by urban planners.
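As an illustrative sketch only (plain Python with NumPy, not drawn from any GIS package or from Clarke’s model), Conway’s rules can be played out on a raster grid as follows; urban-growth automata replace the survival and birth rules below with rules about development suitability and neighborhood pressure.

```python
import numpy as np

def life_step(grid: np.ndarray) -> np.ndarray:
    """One generation of Conway's Game of Life on a toroidal raster grid."""
    # Count the eight neighbours of every cell by summing shifted copies of the grid.
    neighbours = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # A live cell survives with 2 or 3 neighbours; a dead cell is born with exactly 3.
    return ((neighbours == 3) | ((grid == 1) & (neighbours == 2))).astype(int)

# A "glider": a small pattern whose emergent behaviour is to migrate across the grid.
grid = np.zeros((20, 20), dtype=int)
grid[1, 2] = grid[2, 3] = grid[3, 1] = grid[3, 2] = grid[3, 3] = 1
for _ in range(4):
    grid = life_step(grid)
print(grid.sum())  # the glider persists: five live cells, displaced across the raster
```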
A different approach centers on the concept of the agent, an entity that is enabled to move across the geographic landscape and
behave according to specified rules. This agent-based approach is thus somewhat distinct from the cell-based approach of cellular
automata. Agent-based models have been widely implemented in GIScience, for example, to study crowd management by simulating the behavior of the individuals within a crowd and examining scenarios that might trigger panic or cause mass injury.
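A minimal, hypothetical sketch of the agent-based idea follows; the Agent class, the crowding threshold, and the exit location are invented for illustration and do not correspond to any published crowd model.

```python
import random

class Agent:
    """A pedestrian that steps toward an exit unless its neighbourhood is too crowded."""
    def __init__(self, x, y):
        self.x, self.y = x, y

    def step(self, exit_xy, agents, crowd_radius=2, crowd_limit=5):
        # Count nearby agents; if the local area is congested, wait rather than push forward.
        near = sum(1 for a in agents
                   if a is not self and abs(a.x - self.x) + abs(a.y - self.y) <= crowd_radius)
        if near >= crowd_limit:
            return
        # Otherwise move one unit toward the exit on each axis.
        ex, ey = exit_xy
        self.x += (ex > self.x) - (ex < self.x)
        self.y += (ey > self.y) - (ey < self.y)

agents = [Agent(random.randint(0, 30), random.randint(0, 30)) for _ in range(200)]
for t in range(50):
    for a in agents:
        a.step(exit_xy=(0, 0), agents=agents)
```

Even in so crude a sketch, congestion near the exit is an emergent property of the interacting rules rather than anything specified directly.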
Any model is only as good as the rules and hypotheses about behavior on which it is based, and so it is unlikely that simulated
results will lead directly to a modification of the rules. It is more likely that, in the light of results, rules will be improved using
controlled experiments outside the context of the modeling. If patterns emerge that were unexpected, one might argue that scientific
knowledge has advanced, but on the other hand such patterns may be due to the specific details of the modeling, and may not
replicate anything that actually happens in the observable world.
Validation or verification of simulation is always problematic, since the results purport to represent a future that is still to come.
Hindcasting from the present-day state of a system to known past states is a useful technique, as is running the model forward from some known state in the past. But the predictions of the model will never replicate reality perfectly, forcing the investigator to consider the level of error in
prediction that is acceptable. It is possible and indeed likely that the rules and hypotheses about social behavior that drive the model
will change in the future. In this respect, models of physical processes may be more reliable than models of social processes.
In important respects, this analytical work differs from the approaches to simulation discussed in the Simulation in GIScience
section in that it is guided solely by repeated numerical simulation in the absence of any guiding hypothesis as to the most appro-
priate scale and zonal configuration at which a geographic phenomenon should manifest itself. Openshaw’s original case study
examined the 99 counties of Iowa to explore the relationship between percentage of the resident population over 65 and the
percentage of registered Republican voters. Different aggregations of elemental counties revealed different results. But what is
missing in this case is any well-defined hypothesis as to why any correlation should appear, and at what scale it should be man-
ifested. For example, if a process were to work at the individual level, and older people were more likely to vote Republican, the
hypothesis is best tested at the individual level. Conversely, the process might be ecological, in that a neighborhood in which
many residents are over 65 might attract many Republican voters, irrespective of their ages. In the latter case the appropriate scale
of analysis is that of neighborhoods, requiring their formal definition as a set of places, each comprising aggregations of elemental
finer-scale (e.g., census block) data. The general point is that we should be looking for statistics that are sensitive to the scale at which
the phenomenon is likely to manifest itself. Thus, the MAUP is not an empirical problem but rather is a theoretical requirement that
can be used to hone statistics to the explicitly geographic context in which they are applied.
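The scale and zonation effects described here can be illustrated with synthetic data (the figures below are invented, not Openshaw’s Iowa data, and the zoning scheme is deliberately arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in for 99 county-level observations: percentage over 65 and
# percentage registered Republican, weakly related at the county level.
pct_over_65 = rng.normal(18, 4, 99)
pct_republican = 0.4 * pct_over_65 + rng.normal(30, 6, 99)

def corr_after_aggregation(x, y, n_zones):
    """Correlation of zone means after grouping the 99 units into n_zones blocks."""
    zones = np.array_split(np.argsort(x), n_zones)  # one crude, arbitrary zoning scheme
    xm = np.array([x[z].mean() for z in zones])
    ym = np.array([y[z].mean() for z in zones])
    return np.corrcoef(xm, ym)[0, 1]

for n in (99, 33, 11, 5):
    print(n, round(corr_after_aggregation(pct_over_65, pct_republican, n), 3))
# With this zoning the correlation strengthens as zones grow larger; a different
# zoning scheme gives different answers: scale and zonation (MAUP) effects.
```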
A closely related fundamental problem, also well-recognized in GIScience, is the ecological fallacy, the fallacy of reasoning from
the aggregate to the individual. The fallacy already appeared in the previous paragraph, since it would be wrong to infer from
a county-level correlation that individuals over 65 tend to vote Republican. In fact, in the extreme, Openshaw’s correlations could
exist in Iowa at the county level even though no person over 65 was a registered Republican.
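The arithmetic of this extreme case can be made concrete with a small synthetic example (the counts are invented purely for illustration):

```python
import numpy as np

# Two synthetic counties in which NO voter over 65 is Republican, yet the
# county-level percentages are positively associated.
# County A: 1000 voters, 300 over 65, 400 Republican (all drawn from the under-65s).
# County B: 1000 voters, 100 over 65, 200 Republican (again all under 65).
over65 = np.concatenate([np.r_[np.ones(300), np.zeros(700)],
                         np.r_[np.ones(100), np.zeros(900)]])
repub = np.concatenate([np.r_[np.zeros(300), np.ones(400), np.zeros(300)],
                        np.r_[np.zeros(100), np.ones(200), np.zeros(700)]])

# County-level association between the two percentages is perfectly positive,
# while the individual-level association between age and party is negative.
print(np.corrcoef([30, 10], [40, 20])[0, 1])  # 1.0 (aggregate level)
print(np.corrcoef(over65, repub)[0, 1])       # negative (individual level)
```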
Transferred to today’s computationally intensive simulations of different geographic environments, this point is akin to the
distinction between virtual reality and augmented reality: the former may focus upon simulation of multiple worlds that “look right”
in impressionistic terms but without any unique constellation of features that define a known place on the Earth’s surface; while the
latter retains recognizable characteristics of unique places as a framework for additional visual features. More generally still, this
resonates with geographers’ long-established preoccupations with the concept of place. GISystems require named places to be rep-
resented either as precisely located points, polylines, or polygons. Yet while these constructs are very good for establishing locational
precision, the underpinning coordinate systems such as latitude and longitude are unfamiliar to individuals who nevertheless have
excellent recall of home street address and the neighborhood in which it is located. Recently there has been much interest in a platial
approach to geographic knowledge, emphasizing named places as referents, as a more human-centric alternative to the familiar
spatial approach with its emphasis on coordinate referents. New data sources, including social media, provide a rich basis for
exploring associations of place.
Voluntarily contributed data can help to shape hypotheses about remedial action, if it is clearly understood that those who voluntarily contribute may have quite different attitudes, backgrounds, and predilections from those who do not. This argues that GIScience methods should be used to render
big and conventional data sources robust and transparent through data hardening. The most obvious way in which this might be
achieved is by triangulation with conventional framework sources of known provenance, and documentation of any reweighting
that is necessary to further these goals. GIScience provides practical ways of surviving the big data deluge.
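A minimal sketch of what such reweighting against a framework source might look like, assuming pandas and wholly illustrative age-group shares:

```python
import pandas as pd

# Hypothetical age-group shares among voluntary contributors versus a framework
# source of known provenance (e.g., census estimates); all figures are invented.
sample_share = pd.Series({"16-34": 0.55, "35-64": 0.35, "65+": 0.10})  # contributors
census_share = pd.Series({"16-34": 0.30, "35-64": 0.45, "65+": 0.25})  # population

# Post-stratification weight per group: up-weight under-represented groups.
weights = census_share / sample_share
print(weights.round(2))
# Documenting the weighting scheme keeps the "hardening" step transparent and repeatable.
```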
The observable world is of seemingly infinite complexity, and many of the attributes commonly processed in GISystems, such as
soil class or vegetation cover type, have an inherent degree of subjectivity and specificity to the geographic context in which they are
observed. The consequent uncertainty present in all geographic information means that few if any geographic datasets give the
researcher objective knowledge of the differences between the data set and the observable world. In addition to the triangulation
processes alluded to above, users must therefore also often rely on indirect measures such as map scale to understand the limitations
of the data.
Allied to the innovation of big data, data science has become of interest to GIScientists in recent years. In significant part,
techniques such as machine learning are in practice extensions of preexisting methods of data reduction and pattern detection,
and limited effort has been devoted to rendering some more recent machine learning methods explicitly spatial. Artificial
intelligence (AI) in its general sense concerns the theory and development of computer systems that can perform tasks hitherto requiring human intelligence, such as visual perception and decision-making, and there has been progress in spatial
applications. It is less clear that AI applications in geography are in any sense autonomous, as implied by a stricter definition of
the field, and the predictive success of applications is often hamstrung by rather limited (human or machine) understanding of
the scope and limitations of the underpinning data. Academic programs have nevertheless been instituted in response to what is
perceived as a rapidly growing market for associated background data skills.
Prediction has been a major motivation underpinning the wider development of data science, and remarkable success has been
achieved when data mining is applied to comprehensive datasets, as with translation between languages or speech recognition.
However, where datasets are incomplete or selective with respect to spatial locations or time periods, there is evidence that the scope
of predictions may be limited to particular spaces and times. Tools of machine learning, including artificial neural nets and deep
learning, can be very effective in mining successful predictions from big data. Yet despite its practical value, prediction has always
taken a back seat in science to explanation and understanding. The hidden layers of a trained neural network are difficult to interpret
within the kinds of hypothesis testing and theory confirmation that have underpinned much of science to date. It is difficult to see
how such techniques might achieve generalizability across study areas and time periods, especially when the underpinning data are
partial and piecemeal. Data driven geographies need to remain cognizant of the nature of geographic data and the structural
characteristics that underpin them.
The core organizing principles and concepts of information (as opposed to data) science encompass processes for storing and
retrieving information for analytical purposes. GIScience takes this further, through explicit recognition of the properties of spatial
data, in terms of topological relations and interrelatedness of spatial attributes. Successful GIScience is effectively framed in terms of
spatial and temporal transferability, transparency, and robustness.
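What “explicitly spatial” means in practice can be sketched briefly, here assuming the widely used shapely library and invented coordinates: buffering (see Glossary) and topological predicates such as containment and intersection.

```python
from shapely.geometry import Point, Polygon

# Illustrative features only: a point of interest, its buffer, and an enclosing zone.
school = Point(3.0, 4.0)
catchment = school.buffer(2.5)  # dilation of the point by a defined distance
ward = Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])

print(ward.contains(school))                # topological containment
print(catchment.intersects(ward.exterior))  # does the buffer cross the ward boundary?
print(round(catchment.area, 2))             # approximate area of the buffered zone
```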
A further defining characteristic of geographic data is the ease of disclosure of personal or sensitive personal data (terms that have precise meanings under the European General Data Protection Regulation) about human subjects. As a consequence,
geographic detail is usually the first casualty of disclosure control measures designed to protect confidentiality. This is most usually
achieved through aggregation, opening up the results of geographic analysis to the potential ecological fallacy and modifiable areal
unit effects discussed above. Recent years have seen a growth in use of computer networks to support inter- and multi-disciplinary
collaboration between individuals and teams of researchers, but the undoubted benefits need to be weighed against the possibility that such collaboration may lead to disclosure of information about identifiable human subjects. Issues of data stewardship infrastructure are thus
strategically important alongside physical data and computational infrastructure.
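A minimal sketch of aggregation-based disclosure control follows, with an illustrative suppression threshold (real thresholds and methods vary between statistical agencies):

```python
import pandas as pd

# Invented individual records: each row is a person with a zone code and a sensitive flag.
records = pd.DataFrame({
    "zone": ["A", "A", "A", "B", "B", "C"],
    "condition": [1, 0, 1, 1, 1, 1],
})

# Collapse individuals to zones, then suppress counts in zones with fewer than 3 records.
counts = records.groupby("zone")["condition"].agg(["count", "sum"])
counts["sum"] = counts["sum"].mask(counts["count"] < 3)  # suppressed cells become NaN
print(counts)
# Geographic detail (the fine zone geography) is the first casualty of the protection step.
```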
Partly as a response, and as part of a wider movement to ensure correct and ethical attribution of research resources and findings,
authentication, authorization, and accounting infrastructure (AAAI) has developed to facilitate the use of computation and data in
the research process. Core data and computational e-infrastructure and high-performance networks have developed rapidly in
recent years, enabling researchers to stretch their uses of e-infrastructure and thereby develop more ambitious approaches to collab-
orative research. This has enabled use of high-volume, high data-capture rate instruments and the capability to acquire and process
extremely large data sets. GIScience is well placed to link more and more research domains and their extensive arrays of datasets, and
to synthesize their growing complexity and richness. In consequence, the desire to share data, to collaborate, and to synthesize data derived from combining datasets is also growing.
Today, data rich GIScience is about more than analyzing the small fraction of all datasets that happen to be available as conven-
tional statistical or other open sources. For example, data assembled as a by-product of business-to-customer interactions in the
supply of goods or services account for an increasing real share of all of the (ever-increasing) data that are collected about citizens
today. Data are, arguably, the world’s greatest resource, and geographic information technologies are pivotal to ensure that data
analysis conforms to FAIR principles of being findable, accessible, interoperable, and reusable; however, in the new multi-sector
big data economy, open data are but one component of the data resources that can be accessed under FAIR principles, and
AAAI is increasingly central to effective and ethical data resource exploitation.
Conclusion
Alongside developments in computation and computer architectures, the advance of GIScience is inextricably linked to vast and
rapid changes in the data economy and the facility to access, link, and share large datasets in ways that are efficient, effective,
and uncompromising with respect to disclosure control. Most fundamentally, however, advancement arises out of attendant
changes in scientific practice that may appear mundane but are nonetheless profound and far-reaching. Good science is relative
to what we have now, and improved understanding of data and their provenance is a necessary precursor to better analysis of spatial
distributions in today’s data- and computation-rich world.
The “geo-” prefix makes clear that GIScience is an applied science of the observable world, and that its continuing progress will
consequently be judged upon the success of its applications in that unique laboratory. Transparency of computationally rich
methods and techniques is undoubtedly strategically important to this ongoing quest, yet the experience of the last quarter century
suggests that there are rather few purely computational solutions to substantial real-world problems. The broader challenge is to
address the ontologies that govern our conception of real-world phenomena, and to undertake robust appraisal of the provenance
of data that are used to represent the world using GISystems.
The apparatus of science is in practice adaptable to the inherent vagaries of the ever-broadening range of available data and the
vicissitudes of the ways in which they are interpreted by humans. GIScience aligns fundamental principles and concepts with the
messy empirical domains of real world places and contexts. Scientific approaches to representing places will continue to benefit
from the widest possible availability of new data sources and novel applications of existing ones. An ever-broadening constituency
of researchers and end users will participate in the creation, maintenance, and evaluation of real world representations. All of this
will contribute to perhaps the most important goal of GIScience: to develop explicitly geographical representations of the accumu-
lated effects of historical and cultural processes upon unique places.
See Also: Cartographic Anxiety; Modifiable Areal Unit Problem; Quantitative Methodologies; Spatial Ontologies.
Further Reading
Anselin, L., 1995. Local indicators of spatial association – LISA. Geogr. Anal. 27 (2), 93–115.
Batty, M.J., Longley, P.A., 1994. Fractal Cities: A Geometry of Form and Function. Academic Press, San Diego, CA.
Clarke, K.C., Gaydos, L., 1998. Loose coupling a cellular automaton model and GIS: long-term growth prediction for San Francisco and Washington/Baltimore. Int. J. Geogr. Inf. Sci.
12 (7), 699–714.
Crampton, J.W., 2010. Mapping: A Critical Introduction to Cartography and GIS. Wiley-Blackwell, Chichester, UK.
Fotheringham, A.S., Brunsdon, C., Charlton, M., 2002. Geographically Weighted Regression: The Analysis of Spatially Varying Relationships. Wiley, Hoboken, NJ.
Goodchild, M.F., 1992. Geographical information science. Int. J. Geogr. Inf. Systems 6 (1), 31–45.
Goodchild, M.F., 2007. Citizens as sensors: the world of volunteered geography. GeoJournal 69 (4), 211–221.
Hey, A.J.G., Tansley, S., Tolle, K.M., 2009. The Fourth Paradigm: Data-Intensive Scientific Discovery. Microsoft Research, Redmond, WA.
Janelle, D.G., Goodchild, M.F., 2018. Territory, geographic information, and the map. In: Wuppuluri, S., Doria, F.A. (Eds.), The Map and the Territory: Exploring the Foundations of
Science, Thought and Reality. Springer, Dordrecht, pp. 609–628.
Kelleher, J.D., Tierney, B., 2018. Data Science. MIT Press, Cambridge, MA.
Longley, P.A., Goodchild, M.F., Maguire, D.J., Rhind, D.W., 2015. Geographic Information Science and Systems, fourth ed. Wiley, Hoboken, NJ.
Longley, P.A., Cheshire, J.A., Singleton, A.D., 2018. Consumer Data Research. UCL Press, London.
Openshaw, S., 1983. The Modifiable Areal Unit Problem. GeoBooks, Norwich, UK.
Openshaw, S., Veneris, Y., 2003. Numerical experiments with central place theory and spatial interaction modelling. Environ. Plan. A 35 (8), 1389–1403.
Schuurman, N., 2000. Trouble in the heartland: GIS and its critics in the 1990s. Progr. Human Geogr. 24 (4), 569–590.
Shreve, R.L., 1966. Statistical law of stream numbers. J. Geol. 74, 17–37.
Tobler, W.R., 1970. A computer movie simulating urban growth in the Detroit region. Econ. Geogr. 46 (2), 234–240.
UKRI, 2019. E-infrastructure Roadmap. UKRI, Swindon.
Wang, S., 2016. CyberGIS and spatial data science. GeoJournal 81 (6), 965–968.
Wilson, M.W., 2017. New Lines: Critical GIS and the Trouble of the Map. University of Minnesota Press, Minneapolis.
Yates, J., 2019. UKRI Data Infrastructure Roadmap. UKRI, Swindon.
Zhang, J.X., Goodchild, M.F., 2002. Uncertainty in Geographical Information. Taylor and Francis, New York.