
The Cartographic Journal Vol. 41 No. 3 pp. 217–228 December 2004


© The British Cartographic Society 2004

REFEREED PAPER

Alternative Visualization of Large Geospatial Datasets


Etien L. Koua and Menno-Jan Kraak
International Institute for Geoinformation Science and Earth Observation (ITC), PO Box 6, 7500 AA Enschede,
The Netherlands
Email: kraak@itc.nl
Published by Maney Publishing (c) The British Cartographic Society

Exploring large volumes of geospatial data is difficult. This paper presents an approach that combines visual and
computational analysis to make this process easier. This approach is based on the effective application of computational
algorithms, such as the Self-Organizing Map (SOM). These are used to uncover the structure, patterns, relationships and
trends in the data, and for the creation of abstractions where conventional methods may be limited. In addition, graphical
representations are applied to portray extracted patterns in a visual form that allows for better understanding of the derived
structures and possible geographical processes, and should facilitate knowledge construction.

1. INTRODUCTION

The development of visualization and exploration tools to deal with complex geospatial data is one of the major research areas in the GIScience discipline. Current geospatial data analysis techniques have their limitations, and should be improved, extended or replaced if one aims to fully realize the richness of large geographic datasets that may hold implicit but difficult to discern information. Multivariate analysis and exploratory data analysis techniques are commonly used to try to reveal the structure of such datasets. However, the effectiveness of these techniques is often limited in the case of very large volumes of data with no clear clustering structure and tendency.

Geographic visualization and knowledge discovery methods are increasingly used to try to understand processes related to such complex data. Some visualization tools have been recently developed to directly support data mining and knowledge construction (Keim and Kriegel, 1996; MacEachren et al., 1999; Gahegan, 2000). For spatio-temporal data specifically, some authors have proposed spatio-temporal modelling (Wachowicz, 2000; Roddick and Lees, 2001) for understanding space-time concepts and the modelling of abstractions, such as states, events and episodes as used in a specific knowledge domain. In this case, the understanding of the spatial, temporal and thematic aspects of the knowledge domain is crucial for the representation of variations and structure in the spatio-temporal phenomena.

New approaches in geographic representation, spatial analysis and visualization are needed for effective extraction of features in large geospatial data and for the representation of underlying structures and processes (Gahegan, 2000; Miller and Han, 2001). Data mining and knowledge discovery algorithms can be used for the extraction of patterns, and visualization techniques can support the representation of extracted patterns and processes. The combination of these techniques can be beneficial. The Self-Organizing Map (SOM) (Kohonen, 1995) is an Artificial Neural Network algorithm that can support this type of integration of both approaches. In general, Artificial Neural Networks have the ability to perform pattern recognition and classification. They are especially useful in situations where the data volumes are large and the relationships are unclear or even hidden. This is due to their ability to handle noisy data in difficult non-ideal contexts (Openshaw and Openshaw, 1997). Particular attention has been directed to using the Self-Organizing Map (SOM) as a means of organizing complex information spaces (Girardin, 1995; Chen, 1999; Fabrikant and Buttenfield, 2001). The SOM is also commonly recognized to be suitable for processing temporal data (Guimaraes, 2000). It has been applied, for example, in speech recognition (Behme et al., 1993), industrial process control (Alhoniemi et al., 1999) and pattern discovery (Ultsch and Siemon, 1990; Kaski and Kohonen, 1996; Gahegan, 2000; Koua and Kraak, 2004).

In this paper, we use the SOM as the basis for data mining and knowledge discovery, to process and extract patterns, relationships and trends from a spatio-temporal dataset related to the production of food (cereals) in Africa over the last 40 years. The objective is to represent and visualize underlying dynamics using multiple views that simultaneously present interactions between several variables over the attribute space of the SOM and time. The use of these techniques is intended to support the exploration of time-related geographical trends and patterns, improve data analysis and allow visual change detection in such processes.

2. DATA MINING AND KNOWLEDGE DISCOVERY FOR UNDERSTANDING GEOGRAPHICAL PROCESSES

One approach to the analysis of large amounts of data is to use data mining and knowledge discovery methods. The
DOI: 10.1179/000870404X13283

main goal of data mining is identifying valid, novel, potentially useful and ultimately understandable patterns in data (Fayyad et al., 1996). Generally, three main categories of data mining goals can be identified (Weldon, 1996): explanatory (to explain some observed events), confirmatory (to confirm a hypothesis) and exploratory (to analyse data for new or unexpected relationships). Typical tasks for which data mining techniques are often used include clustering, classification, generalization and prediction. These techniques vary from traditional statistics to artificial intelligence and machine learning. The most popular methods include decision trees (tree induction), value prediction and association rules often used for classification (Miller and Han, 2001).

Artificial Neural Networks are particularly used for exploratory analysis as non-linear clustering and classification techniques. Neural networks such as the Self-Organizing Map are a type of neural clustering technique, and neural architectures using backpropagation and feedforward are neural induction methods used for classification (supervised learning). The algorithms used in data mining are often integrated into Knowledge Discovery in Databases (KDD), a larger framework that aims at finding new knowledge from large databases. In general, KDD stands for discovering and visualizing the regularities, structures and rules from data (Miller and Han, 2001), discovering useful knowledge from data (Fayyad et al., 1996) and for finding new knowledge. It consists of several generic steps, namely data pre-processing, transformation (dimension reduction, projection), data mining (structure mining) and interpretation / evaluation.

Recent effort in data mining and KDD has provided a window for geographic data mining and knowledge discovery, which has become an established field in geographic visualization (Sibley, 1988; Weijan and Fraser, 1996; MacEachren et al., 1999; Gahegan et al., 2001; Liu et al., 2001; Miller and Han, 2001; Roddick and Lees, 2001). This framework has been used in geospatial data exploration (Openshaw et al., 1990; MacEachren et al., 1999; Wachowicz, 2000; Gahegan et al., 2001; Miller and Han, 2001) to discover unexpected correlation and causal relationships, and understand structures and patterns in complex geographical data. The promises inherent in the development of data mining and knowledge discovery processes for geospatial analysis include the ability to yield unexpected correlation and causal relationships.

We explore the SOM as a data mining tool, to extract patterns in large geospatial datasets, and represent the results using graphical representations to support visual exploration. In the next section, we provide a brief description of the SOM algorithm, and the framework for its use for the visualization of large geospatial datasets.

3. THE SELF-ORGANIZING MAP AND THE EXPLORATION OF LARGE GEOSPATIAL DATA

3.1. The Self-Organizing Map algorithm

The Self-Organizing Map (Kohonen, 1989) is an Artificial Neural Network used to map multidimensional data onto a low-dimensional space, usually a 2D representation space. The network consists of a number of neural processing elements (units or neurons) usually arranged on a rectangular or hexagonal grid, where each neuron is connected to the input. The goal is to group nodes close together in certain areas of the data value range. The resultant maps (SOMs) are organized in such a way that similar data are mapped onto the same node or to neighbouring nodes in the map. This leads to a spatial clustering of similar input patterns in neighbouring parts of the SOM and the clusters that appear on the map are themselves organized internally. This arrangement of the clusters in the map reflects the attribute relationships of the clusters in the input space. For example, the size of the clusters (the number of nodes allotted to each cluster) is reflective of the frequency distribution of the patterns in the input set. Actually, the SOM uses a distribution preserving property that has the ability to allocate more nodes to input patterns that appear more frequently during the training phase of the network configuration. The SOM also applies a topology preserving property, which comes from the fact that similar data are mapped onto the same node, or to neighbouring nodes in the map. In other words, the topology of the dataset in its n-dimensional space is captured by the SOM and reflected in the ordering of its nodes. This is an important feature of the SOM that allows the data to be projected onto the lower dimension space, while roughly preserving the order of the data in its original space.

Another important feature of the SOM for knowledge discovery in complex datasets is the fact that it is an unsupervised learning network, meaning that the training patterns have no category information accompanying them. Unlike supervised methods, which learn to associate a set of inputs with a set of outputs using a training data set for which both input and output are known, the SOM adopts a learning strategy where the similarity relationships between the data and the clusters are used to classify and categorize the data. The SOM algorithm can be useful as a knowledge discovery tool in database methodology, since it follows the probability density function of underlying data.

When applied in a geographical context, the term (SOM) map refers to the visualization of attribute space. Visual representations can be offered by the application of the algorithm to enable easy data exploration.

3.2. SOM computational analysis and visualization of geospatial data

Since the dimensionality of the dataset is very high, it is often ineffective to work in such high dimension space to search for patterns. We use the SOM algorithm as a data mining tool to project input data into an alternative measurement space based on similarities and relationships in the input data that can aid the search for patterns. It becomes possible to achieve better results in such similarity space rather than the original attribute space (Strehl and Ghosh, 2002). As described in the previous paragraph, the SOM adapts its internal structures to structural properties of the multidimensional input such as regularities, similarities and frequencies. These properties of the SOM can be used to search for structures in the multidimensional input. Graphical representations are then used to enable visual data exploration allowing the user to get insight into the data, evaluate, filter and map outputs. This is intended to support visual data mining (Keim, 2002) by allowing several variables
and their interactions to be inspected simultaneously, and receive feedback from the knowledge discovery process (Cabena et al., 1998) by means of interaction techniques that support the process.
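
The projection step described in Sections 3.1 and 3.2 can be illustrated with a minimal sketch of SOM training. This is not the implementation used in the study: the rectangular grid, the Gaussian neighbourhood, the linear decay schedules, the parameter defaults and the synthetic input are all assumptions chosen to keep the example short (the figures in this paper use hexagonal cells).

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid position (row, col) whose weight vector is closest to x."""
    return min(weights, key=lambda node: math.dist(weights[node], x))


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a small SOM with the online (sequential) update rule.

    Assumptions for illustration only: rectangular grid, Gaussian
    neighbourhood, linearly decaying learning rate and radius.
    """
    rng = random.Random(seed)
    sigma0 = max(rows, cols) / 2.0
    # One weight vector per node, initialised from randomly chosen samples.
    weights = {(r, c): list(rng.choice(data))
               for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):        # shuffled pass over the data
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays to 0
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # radius shrinks, floor of 0.5
            bmu = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                grid_d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])   # pull node towards the sample
            step += 1
    return weights


# Synthetic input: two well-separated groups of 3-attribute samples.
rng = random.Random(1)
group_a = [[rng.uniform(0.0, 0.1) for _ in range(3)] for _ in range(10)]
group_b = [[rng.uniform(0.9, 1.0) for _ in range(3)] for _ in range(10)]
weights = train_som(group_a + group_b, rows=4, cols=4)

bmu_a = best_matching_unit(weights, [0.05, 0.05, 0.05])
bmu_b = best_matching_unit(weights, [0.95, 0.95, 0.95])
print("BMU for a low-valued sample: ", bmu_a)
print("BMU for a high-valued sample:", bmu_b)
```

Each sample pulls its best matching unit and, more weakly, the units around it on the grid. This shared update is what places similar inputs on the same or neighbouring nodes, so the two synthetic groups end up on different nodes of the map.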
Figure 1. African countries included in the study with the total population estimates for 2000. The country codes displayed as labels are used in the SOM visualizations as geographic reference

The first level of the computational analysis described above provides a mechanism for extracting patterns from the data. The output of this computational process is depicted using graphical representations (information spaces) to facilitate human perception and cognitive processes (MacEachren, 1995; Card et al., 1999), by offering visualizations of the general structure of the dataset (clustering), as well as the exploration of relationships among attributes. Several graphical representations provide ways for representing similarity (patterns) and relationships, including a distance matrix representation (Figures 3a and 3d), 2D and 3D projections (Figures 3b and 3e), 2D and 3D surfaces (Figures 3c and 3f) and component planes visualization (Figures 5 and 6). They highlight different characteristics of the computational solution and integrate them with other graphics into multiple views to allow brushing, linking, zooming, and 3D rotation for exploratory analysis and knowledge discovery purposes, and enhance exploration. These information spaces suggest and take advantage of natural environment metaphor characteristics such as ‘near=similar, far=different’ (MacEachren et al., 1999), which is epitomized by Tobler’s first law of geography (Tobler, 1970). This is an example of spatialization, an approach discussed more generally by Fabrikant and Skupin (2003).
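
Two of the representations listed above, the distance matrix and the component planes, can be derived directly from the weight vectors of a trained SOM. The sketch below is illustrative only: the 2 x 3 codebook is a hypothetical hand-written example rather than output from the study, the grid is rectangular with four neighbours per node (not hexagonal), and summarizing each node by the mean distance to its neighbours is one common convention assumed here.

```python
import math


def u_matrix(codebook):
    """Mean distance from each node's weight vector to its grid neighbours."""
    rows, cols = len(codebook), len(codebook[0])
    result = []
    for r in range(rows):
        row = []
        for c in range(cols):
            dists = [
                math.dist(codebook[r][c], codebook[r + dr][c + dc])
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= r + dr < rows and 0 <= c + dc < cols
            ]
            row.append(sum(dists) / len(dists) if dists else 0.0)
        result.append(row)
    return result


def component_plane(codebook, k):
    """Value of attribute k at every node, laid out on the same grid."""
    return [[node[k] for node in row] for row in codebook]


# Hypothetical trained codebook: 2 x 3 grid, two attributes per node.
codebook = [
    [[0.0, 0.0], [0.0, 1.0], [3.0, 1.0]],
    [[0.0, 0.0], [0.0, 1.0], [3.0, 1.0]],
]
um = u_matrix(codebook)
plane0 = component_plane(codebook, 0)
for row in um:
    print([round(v, 2) for v in row])
```

In this toy codebook the third column lies far from the other two in attribute space, so the distance values rise towards it. In a coloured display that rise would appear as the dark band separating two clusters, while the component plane for the first attribute shows which attribute drives the separation.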
We integrate the graphical representations mentioned above and maps to represent the attribute space. Using multiple views, interactions between several variables can be presented simultaneously over the space of the SOM, maps and parallel coordinate plots. This can emphasize visual change detection and the monitoring of the variability through the attribute space. These alternative and different views on the data can help stimulate the visual thinking process that is characteristic for visual exploration. Four goals of the exploration are emphasized:

• Pattern discovery (through similarity representations)
• Correlations and relationships exploration for hypothesis generation
• Exploration of the distribution of the dataset on the map
• Detection of irregularities in the data

In the remaining sections, an application of the SOM for a large geospatial dataset is explored, and different visualization techniques are used to illustrate the exploration of (potential) patterns.

4. EXPLORATION OF A DATASET ON FOOD PRODUCTION IN AFRICA

4.1. The data

The method described in the previous section is applied to the annual food and agriculture statistics in Africa collected from 1961 to 2002. This dataset is provided by FAO for all African countries (see Figure 1) and consists of the production in metric tons for three main cereals (rice, maize and millet) for the last 42 years. It also includes socio-economic indicators such as the countries’ population (total population, male and female population, rural population, urban population, agricultural population, non-agricultural population) for 48 African countries. Finding patterns and understanding the variations in the food production in such a large dataset can be very complex. For example, understanding how aspects of population growth, climate or other factors have impacted the production of cereals, or how these factors relate to famine situations in parts of the region, requires a clear depiction of the processes (Turner et al., 1993; Turner and Schwarz, 1980).

The exploration of this dataset using the proposed approach and techniques is intended to allow the analyst to formulate hypotheses in the process of understanding the geographical patterns of shortages, famine and socio-economic changes. Further exploration and explanation of the results might need domain expertise in an agriculture-related discipline. Additionally, factors related to governmental policies and political instability, and other environmental factors such as droughts should be closely analysed in relation to the patterns found in the exploration of the dataset. Some traditional maps of the African rice, maize and millet production for selected years are presented in Figure 2.

4.2. SOM based visualization

Based on the SOM output, a number of visualization techniques can be explored. Non-linear dependencies between variables can be presented using three main categories of visualization and exploration techniques:

• Visualization of the overall structure of the dataset, clustering, patterns (similarities) and irregularities (such as important gaps). This includes a similarity

Figure 2. Some maps of the production of the three cereals (rice, maize and millet) for selected years (1961, 1991 and 2002). The entire period is 42
years (1961 to 2002)

representation (Figures 3a and 3d), projections in two or three-dimensional space (Figures 3b and 3e) and 2D and 3D surfaces (Figures 3c and 3f).
• Exploration of correlations and relationships. This is primarily based on component plane displays (Figures 4b, 5 and 6) in multiple views and allows for the visualization of very detailed information that can support hypothesis generation.
• Visualization of temporal patterns. Examples are ordered component displays and trajectories (Figure 7). Examples of techniques for the representation of spatio-temporal dynamics will be given in the next paragraphs.

4.2.1. Visualization of the overall structure of the dataset: similarity representation

The proposed approach offers a number of visualizations to show the clustering structure and similarity (patterns). These techniques use a distance matrix to show distances between neighbouring SOM network units. The most widely used distance matrix technique is the U-matrix (Ultsch and Siemon, 1990). It contains the distances from each unit centre to all of its neighbours. The neurons of the SOM network are represented by hexagonal cells (see Figure 3a). The distance between the adjacent neurons is calculated and presented with different colourings. A dark colouring between the neurons corresponds to a large

Figure 3. Similarity matrix representation of the production of rice (left) and maize (right), all years combined. The distance representations (a) for rice and (d) for maize show the similarity between the countries for the production of the two cereals. The same information is shown in projections (b) and (e) and surface plots (c) and (f). The 2D and 3D projections, as well as the 2D and 3D surfaces, can be interactively manipulated with zooming, rotation, and other motion-related interactions such as walkthrough (move through) to explore the level of detail of the information spaces, and enhance exploration

distance and thus represents a gap between the values in the input space. A light colouring between the neurons signifies that the vectors are close to each other in the input space. Thus light areas represent clusters and dark areas represent cluster separators (a gap between the values in the input space). This representation can be used to visualize the structure of the input space and to get an impression of otherwise invisible structures in a multidimensional data space. This distance matrix representation shows not only the values at map units but also the distances between map units. In Figure 3a, the structure of the data set is visualized in a distance matrix representation. In contrast to other projection methods in general, the SOM does not try to preserve the distances directly but rather the relations or local structure of the input data. While the distance matrix representation is a good method for visualizing clusters, it does not provide a very clear picture of the overall shape of the data space because the visualization is tied to the SOM grid. Alternative representations to the distance matrix representation can be used: 2D and 3D projections (using projection methods such as Sammon’s mapping and PCA to project SOM results), 2D and 3D surface plots, and component planes. These techniques will be described in the next paragraphs. In Figure 3b, the projection of the SOM offers a view of the clustering of the data with data items depicted as coloured nodes. Similar data items are grouped together with the same type or colour of markers. Size, position and colour of markers can be used to depict the relationships between the data items. This gives an informative picture of the global shape and the overall smoothness of the SOM in 2D or 3D space.

An example of a similarity representation is depicted in Figure 3 and reveals the clustering structure of the dataset explored in this experiment. This shows similarity among countries for the production of rice and maize for all production years. In Figure 3, all the variables of the dataset are included to analyse the patterns of the production of rice and maize. The position of the map units (countries) is relative to the overall similarity among countries, according to the multivariate attributes (all the 40 years of production) for rice production.

The countries are arranged on the SOM grid according to their production level for rice and maize for all years. This is a similarity representation that depicts the general structure of the data for all attributes. For example, Nigeria, Egypt, Madagascar and Côte d’Ivoire are the largest producers of rice. Similarity is observed among a number of other countries by the proximity in their position on the map. This is shown by the distance between their position on the map relative to the other countries and the colour of the cells separating them from the rest. There is also some gap in the production of these four countries. This is shown by the red colour cells separating the map units representing these countries. The production of maize is relatively important in countries such as Nigeria, Egypt, Kenya, Ethiopia, Zimbabwe, Tanzania, Zambia and the Democratic Republic of Congo.

4.2.2. Exploration of correlations and relationships

The spatial and temporal attributes can be explored using the component planes visualization. The component planes

Figure 4. Component display. Detailed exploration of the dataset using the SOM component visualization: all the components can be displayed to reveal the relationships between the variables and the spatial locations (countries) in (b). The variations in value (colour) indicate the relationship between the countries for the attribute represented in the plane. Selected components related to a specific hypothesis can be further explored to facilitate visual recognition of relationships among selected variables. Geographic maps of components corresponding to hypotheses found in the exploration can be displayed (c). The different axes used in the component plane display are described in (d), and indications for their reading are given in a text box (e)

(Figure 4) show the values of the map elements for different attributes and how each input vector varies over the space of the SOM units or hexagonal grid (here representing countries). Unlike standard choropleth maps, the position of the map units (which is the same for all displays) is determined during the training of the network, according to the characteristics of the data samples. A cell here can represent one or several political units according to the similarity in the data. Two variables that are correlated will be represented by similar displays. This can be appropriate for viewing and exploring correlations and relationships in the input data space. Compared with the maps, patterns and relationships among all the attributes can be easily examined in a single visual representation using the SOM component planes visualization (Figure 4b). Just like with a collection of maps or processing maps (Bertin, 1983), representing one attribute at the overall level, defining regions and geographical correlations, the component plane display answers the elementary question: at a given location, what is there in a given state? The resulting perception of similarity helps to determine geographical correlations, or define regions of a particular characteristic. They are easy to read and provide
an immediate answer to questions and are useful for relationships involving the entire dataset. This is an exploratory process that does not need an initial hypothesis to represent patterns and facilitate visual comparison and perception, since visual recognition of the elements of the graphic and the relationships among them can be easily compared with colour over the attributes for different locations (see Figure 4 for illustration of the reading of the component plane).

Since the SOM represents the similarity clustering of the multivariate attributes, the visual representation becomes more accessible and easy to use for exploratory analyses, to help in identifying causes and correlates (Cromley and McLafferty, 2002). This kind of spatial clustering has been an important hypothesis-generating tool in research and policy-making (Croner et al., 1992). In Figure 4b, all the components are displayed and a selection of one example attribute is made more visible for the analysis, with the name and position of the map units (countries). From the views in Figures 4b, 5a, 5b and 5c, correlations and relationships can be explored, and hypotheses can be made. To enhance visual detection of the relationships and correlations, the components can be ordered so that variables that are correlated are displayed next to each other. The kind of visual representation (imagery cues) provided in the SOM component planes visualization can be used as an effective tool to visually detect correlations among operating variables in a large volume of multivariate data; it facilitates visual detection and has an impact on knowledge construction (Keller and Keller, 1992). Individual component planes are shown in subplots linked together through similar position. In each component plane, a particular map unit (hexagon) in the SOM is always in the same place and the value of one variable is shown using colour-coding (see notes in Figure 4e on how to read the component planes visualization). By using the position and colour (value), relationships between different map units can be easily explored. This can be used to visualize the variations among the attributes of the input data (Figure 4 and Figure 5). Further analysis can be conducted by searching for correlations and interactions between different variables. This visualization reveals very detailed information. The actual values can be returned for every component (see the selected component display for agricultural population in 2001 in Figure 6), which allows comparison between correlated attributes and places (countries). For example, if we consider attributes such as total population, rural population, urban population, agricultural population and non-agricultural population, we can easily view relationships between them (see Figure 6). Figure 6 shows important changes in population patterns for Nigeria over the years. It shows the urbanization trend. Relatively more growth is observed in the urban population from the 80s (see green circles around Nigeria in Figure 6). This can be partly due to rural exodus, a persistent phenomenon in the 80s and 90s in most African countries. The agricultural population follows the reverse effect, dropping dramatically in 2001 (see Figure 6). This may be one of the reasons for the decline in the production of rice in this country during recent years (see Figure 6). New knowledge can be unearthed through this process of exploration, which can be followed by the identification of associations between attributes, and finally the formulation and ultimate testing of hypotheses. Geographic maps can be made to represent the result of this reasoning process for better geographical exploration and comparison (see Figure 4c).

4.2.3. Exploration of spatio-temporal patterns

Spatio-temporal representations are an important aspect of research in Geographic Information Science. A large part of the research effort in this area has been directed to data models. A representative of this approach is Peuquet (1994), who proposed TEMPEST (Temporal Geographic Information System) to integrate space and time data models in GIS. In this approach, the primary organization is based on time to represent processes by time line to show changes that occur. This approach suggests the key notions of location, time and object (where, when and what), the basic characteristics of geospatial data. On the representational level, two main models exist: models based on space representation and models focusing on time representation of data. The traditional approach in GIS focuses on spatial representation of entities based on the geometric and thematic properties. In such models, the main concept is the absolute view of space, and time is implicitly represented by changes that occur over the space. Time-based models focus on time representation as a fourth dimension or a parameter in the data, in which events that occur can be located. This time structure of the representation has often been organized according to intervals between events, points of occurrence of events, or both intervals and key points (Wachowicz, 2000). The greater promise of spatio-temporal GIS resides ultimately in their capacity to examine causal relationships and their effects for exploration, explanation, prediction and planning (Peuquet, 1994). In general, computer-based animation (Dorling and Openshaw, 1992) and visualization techniques rely on three strategies for depicting change: a sequence of discrete displays or snapshots at various times, dynamically and interactively modifying display elements as time goes along, and depicting change in specific locations or over the entire region. This was explored in the triad framework (Peuquet, 1994) in which information is stored relating to where (location-based view), what (object-based view) and when (time-based view).

The main representational technique for spatio-temporal patterns using the SOM is the visualization of component planes. Since the component planes can be displayed and ordered in sequence to represent time-related attributes, they can be used to relate attributes to locations, times and events, which are the primary entities in spatio-temporal representation (Galton, 2001). This can allow exploration of the interdependencies among the various attributes over time for different locations. The SOM grid can be adjusted to a set of variables of interest. In Figure 6, only the demographic variables were used to represent population patterns. The position of the map units is relative to these particular variables. Different spaces (SOM grids) can be used to explore sets of variables in each individual space. For example, the analysis of population patterns in Figure 6 can easily be related to the production of the cereals in Figure 5, by relating the value attached to the map units (countries) in each grid. For comparison purposes between the different years, the normalized values of the vectors were used. The actual values of the vectors can be returned

Figure 5. SOM component planes visualization for the production of rice (a), maize (b), and millet (c). The production values for the different
years were normalized to the range [0, 1]. A detailed component plane showing the production of rice in 2002 is also provided, with a geographic index in (d)
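
As a rough illustration of how component planes such as those in Figure 5 can be read off a trained SOM and compared, the sketch below assumes the map is stored as a grid of weight vectors. The `codebook` values, the attribute order and all function names are hypothetical and are not taken from the dataset or software used in this study.

```python
from math import sqrt


def component_plane(codebook, k):
    """Return attribute k of every map unit, keeping the grid layout."""
    return [[unit[k] for unit in row] for row in codebook]


def pearson(xs, ys):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def plane_correlation(codebook, i, j):
    """Correlate two component planes unit by unit."""
    flat_i = [v for row in component_plane(codebook, i) for v in row]
    flat_j = [v for row in component_plane(codebook, j) for v in row]
    return pearson(flat_i, flat_j)


# Made-up 2 x 2 map; each unit holds normalized (rice, maize, millet) weights.
codebook = [
    [(0.1, 0.2, 0.9), (0.3, 0.4, 0.7)],
    [(0.6, 0.7, 0.4), (0.9, 1.0, 0.1)],
]

rice_plane = component_plane(codebook, 0)
rice_vs_maize = plane_correlation(codebook, 0, 1)   # planes vary together
rice_vs_millet = plane_correlation(codebook, 0, 2)  # planes vary in opposition
```

Each plane is what would be coloured and displayed side by side; the correlation is only one possible numeric counterpart to the visual comparison described in the text.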

Figure 6. Population changes from 1961 to 2001. The figure shows a few selected years (1961, 1971, 1981, 1991 and 2001) for the rural, urban, agricultural
and non-agricultural population. The values were normalized between 0 and 1 for comparison. Information on the individual components can be
retrieved with the legend corresponding to the actual value in the input data space (see the example display on the right side for the agricultural population
in 2001)
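
The normalization to the range 0 to 1 and the return to actual values mentioned in the captions can be sketched as follows. This assumes simple min-max scaling applied per attribute; the exact formula used in the study is not stated, and the function names and numbers below are invented for illustration.

```python
def fit_min_max(values):
    """Return the (lowest, highest) value of one attribute across all samples."""
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("attribute is constant and cannot be rescaled")
    return lo, hi


def normalize(value, lo, hi):
    """Map an actual value onto the range [0, 1]."""
    return (value - lo) / (hi - lo)


def denormalize(scaled, lo, hi):
    """Map a normalized value back to the input data space."""
    return lo + scaled * (hi - lo)


# Made-up values of one attribute for four years.
urban_population = [2.0, 4.0, 6.0, 10.0]
lo, hi = fit_min_max(urban_population)

scaled = [normalize(v, lo, hi) for v in urban_population]
restored = [denormalize(s, lo, hi) for s in scaled]
```

Keeping `lo` and `hi` alongside the map is what makes it possible to label a legend with actual values while the planes themselves are drawn from normalized ones.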

when needed for detailed analysis. The idea is to compare a sequence of patterns for a set of
locations over time to determine how factors affect changes. The different component displays
representing the spatial distribution at different times (see Figures 5 and 6) can be compared in
multiple views. For example, in Figure 6 the urban population growth and agricultural population
changes between 1961 and 2001 can be explored to observe population dynamics

Figure 7. Scatter plots and trajectories of the selected data samples. Production of rice in Nigeria as a scatter plot (a) and as trajectories (b). Production
of maize in Zimbabwe as a scatter plot (c) and as trajectories (d)
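
One plausible way to obtain a trajectory like those in Figures 7b and 7d is to map each time-ordered sample to its best-matching unit on the SOM grid and link the units in chronological order. The sketch below assumes this reading; the one-row `codebook`, the time steps and the sample values are made up, and the function names are hypothetical.

```python
from math import inf


def best_matching_unit(codebook, sample):
    """Return the (row, column) of the unit closest to the sample."""
    best_position, best_distance = None, inf
    for r, row in enumerate(codebook):
        for c, unit in enumerate(row):
            # Squared Euclidean distance between unit weights and sample.
            distance = sum((u - s) ** 2 for u, s in zip(unit, sample))
            if distance < best_distance:
                best_position, best_distance = (r, c), distance
    return best_position


def trajectory(codebook, samples_by_time):
    """List (time, grid position) pairs in chronological order."""
    return [
        (time, best_matching_unit(codebook, sample))
        for time, sample in sorted(samples_by_time.items())
    ]


# Made-up 1 x 3 map over a single normalized attribute (low, medium, high).
codebook = [[(0.1,), (0.5,), (0.9,)]]

# Made-up normalized production values for four consecutive time steps.
samples_by_time = {3: (0.95,), 1: (0.85,), 2: (0.45,), 4: (0.15,)}

path = trajectory(codebook, samples_by_time)
```

Drawing line segments between consecutive positions in `path` gives the back-and-forth pattern that a series alternating between high and low values would produce.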

for Ethiopia’s rural population. There has been more growth in Ethiopia’s rural population over the
years than there has been for the urban population (see Figure 6 and the black circles for Ethiopia
for the selected years displayed). The agricultural population of this country has grown
substantially, showing a greater concentration of the agricultural population in rural areas. This
contrasts with the frequent food shortages and famine in this country. This kind of exploratory
analysis can be extended to include other factors that may play a role in the geographical process
under study. The situation in Ethiopia may be due to other environmental factors such as droughts,
rainfall and land degradation.

The component plane visualization is shown in Figure 5 for the production of rice, maize and millet
over the years. Information on the variations in the production of these cereals over the years can
be revealed, as well as other interactions between the different attributes. For example, it is easy
to see that Nigeria started to increase its production of rice in the 80s and has suffered an
important decrease in production in recent years. Zimbabwe was one of the largest producers of maize
in the 70s and 80s, but has had a dramatic decrease in production in the last few years. This kind
of visual representation (imagery cues) used with the component planes can facilitate visual
detection, and has an impact on knowledge construction (Keller and Keller, 1992). As such, the SOM
can be used as an effective correlation-hunting tool among operating variables. Modelling and
prediction can be made possible using the SOM as a nonlinear regression (Alhoniemi et al., 1999).

Visualization of trajectories

The different stages of the process of mapping the data on the SOM can be visualized as a trajectory
on the SOM grid, which makes it possible to track the process dynamics and enables interpretation of
the temporal relations among patterns at distinct levels (Guimaraes, 2000). A display of the process
as a trajectory linking the different moments in time can help visualize the process dynamics in the
data. This visualization can be used to study the behaviour of a phenomenon over time. To illustrate
the use of trajectories in the analysis of the behaviour of a process, a time series extracted from
the dataset explored in the experiment, and related to the production of rice in Nigeria and maize
in Zimbabwe over the last 40 years, is considered. For clarity, a simple view of the selected data
samples using scatter plots is provided (see Figures 7a and 7c).

A trajectory of these productions in the two countries is presented in Figures 7b and 7d.

The visualization of the trajectory in Figures 7b and 7d relates the different time states (years of
production) on top of a SOM component display representing a clustering of the years of production.
This reveals the different states of the production, and relates values and similarities between
the different years of production of rice in Nigeria (Figure 7b) and maize in Zimbabwe (Figure 7d).
In this SOM component plane, a clustering of the years of production is shown and one can easily see
similarities and differences between the different years of production. This provides an easy way to
visualize the production process: gaps between years of production, changes in production levels,
similarity among the years of production and patterns of production for different countries. For
example, the years where the production of rice in Nigeria was highest can be easily seen (1989,
1991, 1992, 1993, 1996, 1997, 1998, 1999, 2000, 2002) and compared to other years in between where
there was a drop (in 1990, 1994, 1995 and 2001). This can prompt the

analyst to search for more patterns for these years to understand these changes. The production of
maize in Zimbabwe has apparently had a few bad years every decade. This explains why the trajectory
constantly moves back and forth between the high values and the low values. The path linking the
years reveals the variability in the production over the years. Trajectories can be projected on top
of all component planes to compare patterns in the productions for the different countries.

A number of spatio-temporal representation techniques have been based on such visualizations of
paths or trajectories in recent years. Some recent work has been based on the space-time cube
concept (Hägerstrand, 1970; Hägerstrand, 1982) for the representation of geospatial processes
(Andrienko et al., 2003; Kraak, 2003).

5. CONCLUSION

In this paper, an approach has been presented that combines visual and computational analysis
techniques to deal with large volumes of geospatial data. The approach is based on the effective
application of computational algorithms to extract patterns and relationships in geospatial data,
and the visual representation of derived information, focusing on the effective use of visualization
to facilitate knowledge construction. An application to a large spatio-temporal dataset related to
the production of three main cereals in Africa was explored. The Self-Organizing Map (SOM) algorithm
was used to uncover the structure, patterns, relationships and trends in this dataset. Graphical
representations of the SOM output, including a distance matrix representation, projections, and
component planes visualization, were used to portray extracted patterns in a visual form that can
allow for better understanding of the derived structures and the geographical processes. Some
techniques to specifically address temporal representations were explored.

We used ordered component planes to simultaneously present interactions between time vectors over
the space of the SOM. This was to emphasize visual change detection and the monitoring of the
variability through the attribute space. A visualization of trajectories was used to understand
space–time dynamics in the data. One of the advantages of the SOM is that the algorithm is fast and
effective for extraction of patterns and relationships in very large datasets. Based on a similarity
analysis, the algorithm was found to be effective in searching for correlations among operating
variables. This can be achieved using the SOM component planes visualization, which supports the
understanding of processes through visual representation and allows several variables and their
interactions to be inspected simultaneously. Patterns, relationships, irregularities and
distributions can be effectively visualized. This method provides opportunities to improve
geographical analysis and to support exploration and knowledge discovery in the context of large
geospatial datasets.

One of the essential objectives in this work has been to find alternatives to explore patterns and
relationships in the data using similarity-based information spaces. Another goal has been to link
the non-geographic information spaces to the original geographic space. In this respect, the SOM
outcome (similarity matrix) representing the multivariate attributes can be used in parallel with
maps to represent the overall patterns in the data. The proposed SOM visualizations are not
necessarily better than traditional maps and diagrams, but especially when one deals with a
multitude of variables, they offer alternative views on the data that differ from what one is used
to or might expect, and as such stimulate the visual thinking process that is characteristic of
visual exploration.

To enhance exploration and provide more flexibility and control for spatial analysis purposes, a
user interface is being developed to integrate the SOM representations into a multiple views
environment linking other views, such as parallel coordinate plots, and the maps. Interaction is
central in this design. A number of interaction techniques are provided in the graphics and allow
for different viewpoints, rotation, panning, brushing, zooming and motion-related interactions.
A usability project is currently being planned to test the effectiveness of this approach.

REFERENCES

Alhoniemi, E., J. Hollmen, O. Simula and J. Vesanto (1999). ‘Process monitoring and modeling using the Self-Organizing Map’, Integrated Computer-Aided Engineering, 6, 3–14.
Andrienko, N., G. Andrienko and P. Gatalsky (2003). Visual Data Exploration Using Space–Time Cube, International Cartographic Conference, Durban, South Africa.
Behme, H., W. D. Brandt and H. W. Strube (1993). Speech Recognition by Hierarchical Segment Classification, International Conference on Artificial Neural Networks (ICANN 93), Springer Verlag, Amsterdam.
Bertin, J. (1983). Semiology of Graphics, University of Wisconsin Press, Madison, WI.
Cabena, P., P. Hadjnian, R. Stadler, J. Verhees and Z. Alessandro (1998). Discovering Data Mining: From Concept to Implementation, Prentice Hall, New Jersey.
Card, S. K., J. D. Mackinlay and B. Shneiderman (1999). Readings in Information Visualization. Using Vision to Think, Morgan Kaufmann Publishers, San Francisco.
Chen, C. (1999). Information visualization and Virtual Environments, Springer-Verlag, London.
Cromley, E. K. and S. L. McLafferty (2002). GIS and Public Health, The Guilford Press, New York.
Croner, C., L. Pickle, D. Wolf and A. White (1992). ‘A GIS approach to hypothesis generation in epidemiology’, in ASPRS/ACSM Technical Papers, vol. 3, pp. 275–83, Voss, Washington, DC.
Dorling, D. and S. Openshaw (1992). ‘Using computer animation to visualize space–time patterns’, Environment and Planning B: Planning and Design, 19, 639–50.
Fabrikant, S. I. and B. Buttenfield (2001). ‘Formalizing semantic spaces for information access’, Annals of the Association of American Geographers, 91, 263–80.
Fabrikant, S. I. and A. Skupin (2003). Cognitively Plausible Information Visualization. Exploring GeoVisualization, ed. by M. J. Kraak, Elsevier, Amsterdam.
Fayyad, U., G. Piatetsky-Shapiro and P. Smyth (1996). ‘From data mining to knowledge discovery in databases’, Artificial Intelligence Magazine, 17, 37–54.
Gahegan, M. (2000). ‘On the application of inductive machine learning tools to geographical analysis’, Geographical Analysis, 32, 113–39.
Gahegan, M., M. Harrower, T. M. Rhyne and M. Wachowicz (2001). ‘The integration of Geographic Visualization with Databases, Data mining, Knowledge Discovery Construction and Geocomputation’, Cartography and Geographic Information Science, 28, 29–44.
Galton, A. (2001). ‘Space, Time and the Representation of Geographical Reality’, TOPOI, 20, 173–87.

Girardin, L. (1995). Mapping the virtual geography of the World Wide Web, Fifth International World Wide Web Conference, Paris, France.
Guimaraes, G. (2000). ‘Temporal knowledge discovery with Self-Organizing Neural Networks’, International Journal of Computer Science and Systems.
Hägerstrand, T. (1970). ‘What about people in Regional Science?’ Papers of the Regional Science Association, 24, 7–21.
Hägerstrand, T. (1982). ‘Diorama, path and project’, Tijdschrift voor Economische en Sociale Geografie, 73, 323–39.
Kaski, S. and T. Kohonen (1996). Exploratory data analysis by the Self-Organizing Map: Structure of welfare and poverty in the world, Third International Conference on Neural Networks in the Capital market, London, England.
Keim, D. and H. Kriegel (1996). ‘Visualization technique for mining large database’, Journal of Computational and Graphical Statistics, 8, 923–38.
Keim, D. A. (2002). ‘Information Visualization and Visual Data Mining’, IEEE Transactions on Visualization and Computer Graphics, 7, 100–07.
Keller, P. and M. Keller (1992). Visual Clues: Practical Data Visualization, IEEE Computer Society Press, Los Alamitos, CA.
Kohonen, T. (1989). Self-Organization and Associative Memory, Springer-Verlag.
Kohonen, T. (1995). Self-Organizing Maps, Springer-Verlag.
Koua, E. L. and M. J. Kraak (2004). ‘Evaluating Self-organizing Maps for Geovisualization’, in Exploring Geovisualization, ed. by J. Dykes, A. M. MacEachren and M. J. Kraak, Elsevier, Amsterdam.
Kraak, M. J. (2003). The Space–Time Cube Revisited from a Geovisualization Perspective, 21st International Cartographic Conference (ICC 03), Durban, South Africa.
Liu, W., S. Gopal and C. Woodcock (2001). ‘Spatial data mining for classification, visualization and interpretation with ARTMAP Neural Network’, in Geographic Data Mining and Knowledge Discovery, ed. by H. J. Miller and J. Han, Taylor and Francis, London.
MacEachren, A. M. (1995). How Maps Work: Representation, Visualization, and Design, The Guilford Press, New York.
MacEachren, A. M., M. Wachowicz, R. Edsall, D. Haug and R. Masters (1999). ‘Constructing knowledge from multivariate spatiotemporal data: integrating geographical visualization with knowledge discovery in databases methods’, International Journal of Geographical Information Science, 13, 311–34.
Miller, H. J. and J. Han (2001). Geographic Data Mining and Knowledge Discovery, Taylor and Francis, London.
Openshaw, S., A. Cross and M. Charlton (1990). ‘Building a prototype geographical correlates machine’, International Journal of Geographical Information Systems, 4, 297–312.
Openshaw, S. and C. Openshaw (1997). Artificial Intelligence in Geography, Chichester, John Wiley & Sons.
Peuquet, D. J. (1994). ‘It’s About Time: A Conceptual Framework for the Representation of Temporal Dynamics in Geographic Information Systems’, Annals of the Association of American Geographers, 84, 441–61.
Roddick, J. F. and B. G. Lees (2001). ‘Paradigms for Spatial and Spatio-Temporal Data Mining’, in Geographic Data Mining and Knowledge Discovery, ed. by H. J. Miller and J. Han, pp. 33–49, Taylor and Francis, London.
Sibley, D. (1988). Spatial Applications of Exploratory Data Analysis, Geo Books, Norwich.
Strehl, A. and J. Ghosh (2002). ‘Relationship-based clustering and visualization for multidimensional data mining’, INFORMS Journal on Computing, 0, 1–23.
Tobler, W. (1970). ‘A Computer Movie Simulating Urban Growth in the Detroit Region’, Economic Geography, 46, 234–40.
Turner, B. L., G. Hyden and R. Kates (1993). Population Growth and Agricultural Change in Africa, University Press of Florida, Gainesville.
Turner, B. L. and H. Schwarz (1980). Trends and Interrelationships in Food, Population, and Energy in Eastern Africa: A Preliminary Analysis, Clark University.
Ultsch, A. and H. Siemon (1990). Kohonen’s Self-Organizing Feature Maps for Exploratory Data Analysis, Proceedings International Neural Network Conference INNC’90P, Dordrecht, The Netherlands.
Wachowicz, M. (2000). ‘The role of geographic visualization and knowledge discovery in spatio-temporal modeling’, Publications on Geodesy, 47, 27–35.
Weijan, W. and D. Fraser (1996). ‘Spatial and temporal Classification with Multiple Self-Organizing Maps’, Society of Photo-optical instrumentation, 2955, 307–14.
Weldon, J. L. (1996). ‘Data mining and visualization’, Database Programming and Design, 9.
