
Chapter 1

A Gentle Introduction to Spatiotemporal Data Mining

Abstract Spatiotemporal data mining refers to the extraction of knowledge,
regularly repeating relationships, and interesting patterns from data with spatial and
temporal aspects. In recent years, many spatiotemporal frequent pattern mining
algorithms have been developed for spatiotemporal event instances represented by a
series of region objects that evolve over time. These algorithms focus on the
discovery of spatiotemporal co-occurrence patterns and event sequences by
inspecting the spatiotemporal overlap and follow relationships. Before moving on to these
relationships, we will demonstrate different types of spatiotemporal knowledge to
place the relationships and methods in the greater context. This chapter provides
a bird's-eye view of the output of spatiotemporal data mining techniques in the
literature, gives the rationale for mining spatiotemporal patterns from evolving regions,
and explains the challenges of mining patterns from evolving region data.

The rapid advancements in satellite imagery technology (NASA’s SDO [93],
MODIS Terra and Aqua [107]), GPS-enabled devices, sensor networks, the Internet
of Things, location-based web services (Google Maps, Uber, Lyft, tracking services
from delivery companies), and social networks (Facebook, Twitter, Swarm) have caused
a proliferation of massive spatiotemporal data sets in the last two decades. Many
consumer-oriented applications such as social networks, location-based targeted
advertising, mobile routing services, and ride-sharing applications consume and
generate spatiotemporal location data [100]. Furthermore, there are many massive
spatiotemporal data repositories generated by scientific resources, either through
observation or simulation. Some example phenomena in these spatiotemporal data
repositories include solar events [108], migrating animals [20], and meteorological
phenomena [128].
The explosive growth in spatiotemporal data, as well as the emergence of
new technologies, emphasizes the need for automated discovery of spatiotemporal
knowledge. One particularly interesting knowledge discovery task is spatiotemporal
data mining from trajectory data. Discovering spatiotemporal knowledge from
trajectories comes in different forms such as destination and future route prediction
from personal movement data [30], real-time monitoring of water quality using
trajectories of live fish [71], analyzing the trajectories of migrating birds [111],
searching for similar trajectories in spatial networks [118], or understanding the
traffic flow using trajectories in road networks [86].

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2018
B. Aydin, R. A. Angryk, Spatiotemporal Frequent Pattern Mining
from Evolving Region Trajectories, SpringerBriefs in Computer Science,
https://doi.org/10.1007/978-3-319-99873-2_1

Discovering interesting, but implicit patterns from spatiotemporal datasets is
crucial for many scientific domains such as astronomy [57, 74], ecology [130],
meteorology [52], geophysics [105], and criminology [127]. The ever-growing
nature of data being generated and collected from various scientific sources makes
the data-driven knowledge discovery process very challenging for researchers in
these fields. Manual data analysis is no longer feasible given
the volume and velocity of massive spatiotemporal datasets. Thus, automated
discovery of relevant information from spatiotemporal datasets is important for many
organizations that employ these datasets in their decision-making processes [109].

1.1 Types of Spatiotemporal Knowledge

The eight categories of spatiotemporal knowledge discovery described
by Abraham et al. [1], Roddick et al. [102], and Shekhar et al. [109]
are: outlier, association (coupling), generalization (summarization), prediction,
clustering (partitioning), hotspot, evolution rule (change), and meta-rule. Table 1.1
describes these knowledge types in detail, with example data
mining applications from the literature. The tasks in frequent pattern discovery from
spatiotemporal data require mining of multiple types of knowledge from the
above-mentioned categories [37]. Examples of frequently occurring spatiotemporal
patterns can be seen in various scientific fields such as materials science, epidemiology,
biology, meteorology, ecology, and astronomy [52, 105, 110, 125, 127, 130].
For instance, identification of anomalous moving objects (outlier detection) can be
used in ecology for detecting outliers in bird migration. Another example is
spatiotemporal hotspot detection, which can be used for understanding the dynamics
of epidemics in a geographic region.
The spatiotemporal frequent patterns that will be described throughout this book
are related to finding the relationships between different event types. These patterns
fall under the category of spatiotemporal associations (couplings). The frequent
patterns in this book are formed by a set or a series of event types (also referred to as
feature types), whose instances frequently satisfy a spatiotemporal predicate defined
for evolving regions. The resulting spatiotemporal patterns signify the relationships
among the different event types and their strength in the datasets.
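To make the idea of instances "frequently satisfying a spatiotemporal predicate" more concrete, the following is a minimal sketch and not the method developed in this book. It assumes that each region snapshot is simplified to an axis-aligned bounding box, that the predicate is "the regions overlap at some shared timestamp", and that strength is measured as a simple fraction. The names `Instance`, `co_occur`, and `fraction_co_occurring` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass

# Assumption: a region snapshot is an axis-aligned box (xmin, ymin, xmax, ymax),
# a simplification of the polygon-based regions discussed in the text.
Box = tuple[float, float, float, float]


@dataclass
class Instance:
    """One event instance: an event type plus its region at each timestamp."""
    event_type: str
    snapshots: dict[int, Box]  # timestamp -> region at that time


def boxes_overlap(a: Box, b: Box) -> bool:
    """Return True if the two boxes share interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def co_occur(p: Instance, q: Instance) -> bool:
    """Illustrative predicate: the regions overlap at some shared timestamp."""
    shared = p.snapshots.keys() & q.snapshots.keys()
    return any(boxes_overlap(p.snapshots[t], q.snapshots[t]) for t in shared)


def fraction_co_occurring(instances: list[Instance], type_a: str, type_b: str) -> float:
    """Fraction of type_a instances that co-occur with at least one type_b instance."""
    a_list = [i for i in instances if i.event_type == type_a]
    b_list = [i for i in instances if i.event_type == type_b]
    if not a_list:
        return 0.0
    hits = sum(any(co_occur(a, b) for b in b_list) for a in a_list)
    return hits / len(a_list)


# Toy data (invented for illustration): two flares and one active region.
flare_1 = Instance("flare", {0: (0, 0, 2, 2), 1: (1, 0, 3, 2)})
flare_2 = Instance("flare", {0: (10, 10, 11, 11)})
region_1 = Instance("active_region", {1: (2, 1, 5, 4)})

strength = fraction_co_occurring([flare_1, flare_2, region_1], "flare", "active_region")
print(strength)  # 0.5: only flare_1 overlaps the active region
```

In this toy setting, a pattern over the two event types would be reported as frequent only if the computed fraction exceeded a user-chosen threshold.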

Table 1.1 Types of spatiotemporal knowledge

| Type | Description | Example |
|---|---|---|
| Outlier | Spatiotemporal objects whose non-spatiotemporal attributes significantly differ from those of other objects in their spatiotemporal neighborhood | Identification of anomalous moving objects [41], discovering flow anomalies in spatial networks [61] |
| Association (couplings in [109]) | Frequent patterns and association rules formed by feature types, where instances of participating types satisfy a complex or simple spatiotemporal predicate [124] | Discovering co-occurrence patterns [94], mining spatiotemporal sequential patterns [56] |
| Generalization (summarization in [109]) | Process of data aggregation using concept hierarchies to create a compact representation of spatiotemporal data [102, 109] | Summarization of network trajectories in K-primary corridors [39] |
| Prediction | Learning models that can predict a target variable dependent on spatiotemporal explanatory variables [109]. When the target variable is categorical, the task is also referred to as classification, otherwise it is called regression. | Dynamic spatiotemporal models with Bayesian hierarchical framework [33], spatiotemporal autoregressive regression [33] |
| Clusters (partitioning in [109]) | Task of grouping similar data items based on their spatial, temporal, or spatiotemporal attributes [63] | Spatiotemporal event clustering [18], trajectory data partitioning based on similarity [38] |
| Hotspot | Special clusters (or regions) where an attribute or the number of spatiotemporal objects is unexpectedly higher within particular time intervals [109] | Discovery of outbreaks of epidemic diseases [115] |
| Evolution rule (spatiotemporal change in [109]) | Explicit spatiotemporal evolution actions (variations in spatial and temporal footprints), which a particular set of objects frequently performs [35] | Identification of spatial changes between snapshots using raster-based spatial footprints [58], spatiotemporal volume change patterns [66] |
| Meta-rule | Rules derived from rules obtained by performing data mining on a set of discovered knowledge instead of datasets [102] | Tracking the differences between spatiotemporal association rules that change over different datasets [102] |

1.2 Motivation and Challenges

Spatiotemporal frequent pattern mining can be useful for the verification and
prediction of scientific phenomena in a broad range of scientific fields including
meteorology, geophysics, epidemiology, and astronomy [37]. The discovered
spatiotemporal patterns can be used for modeling various scientific phenomena (e.g.,
tornadoes, propagation of epidemics, clouds). These patterns can be utilized for
performing large-scale verification of current knowledge, as well as the prediction of
unknown spatiotemporal relationships among different event types (e.g., predicting
the spread of epidemics such as cholera, malaria, and West Nile virus [65],
verification of hurricane landfall precipitation models [36], discovery of the patterns
in wildlife migration [40], or prediction of blastocyst formation [132]). We present
three application domains where spatiotemporal frequent pattern mining can be
used for verifying, predicting, or potentially discovering the characteristics of
spatiotemporal relationships.

1.2.1 Solar Physics

One important application area for spatiotemporal frequent pattern mining is
space weather prediction. Solar physics researchers entered the big data era
with the launch of NASA’s Solar Dynamics Observatory (SDO) mission, which
captures approximately 70,000 high-resolution images every day, and generates 0.55
petabytes of raster data each year [74]. In addition to image data, many software
modules continuously process SDO’s image data to detect the locations of various
solar events. The detected solar events can be considered as vector-based objects
with spatial and temporal attributes [57].
A large-scale solar image dataset with labeled regions was published in [108],
and the tracking and interpolation algorithms were introduced in [19, 62] (See
Fig. 1.1 for two tracked coronal hole instances). The solar event tracking algorithm
uses the locations and corresponding image parameters [108] for linking the
polygon-based evolving regions. Then, it creates spatiotemporal trajectory objects
with extended geometric representations. The interpolation algorithms help fill the
gaps in the trajectory data by estimating the locations of the solar events [19].
In essence, there is an abundance of vector-based solar event data, which is in the
form of spatiotemporal trajectories of continuously evolving regions.
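The gap-filling idea can be illustrated with a small sketch. The interpolation algorithms of [19] are not described in this chapter, so the code below swaps in plain linear interpolation of bounding boxes between two observed snapshots. The box representation, the integer timestamps, and the function names `interpolate_box` and `fill_gaps` are all assumptions made for illustration.

```python
# Assumption: a region snapshot is an axis-aligned box (xmin, ymin, xmax, ymax).
Box = tuple[float, float, float, float]


def interpolate_box(t0: int, box0: Box, t1: int, box1: Box, t: int) -> Box:
    """Linearly interpolate each box coordinate at time t, with t0 < t < t1."""
    if not t0 < t < t1:
        raise ValueError("t must lie strictly between t0 and t1")
    w = (t - t0) / (t1 - t0)
    return tuple(a + w * (b - a) for a, b in zip(box0, box1))


def fill_gaps(snapshots: dict[int, Box], step: int) -> dict[int, Box]:
    """Insert estimated boxes every `step` time units between observed snapshots."""
    times = sorted(snapshots)
    filled = dict(snapshots)
    for t0, t1 in zip(times, times[1:]):
        t = t0 + step
        while t < t1:
            filled[t] = interpolate_box(t0, snapshots[t0], t1, snapshots[t1], t)
            t += step
    return dict(sorted(filled.items()))


# Toy trajectory (invented): the region is observed at t=0 and t=4 only.
observed = {0: (0, 0, 2, 2), 4: (4, 0, 6, 2)}
filled = fill_gaps(observed, step=2)
print(filled[2])  # (2.0, 0.0, 4.0, 2.0), the estimated location halfway between
```

Real evolving regions are polygons whose shape changes as well as their position, so an actual interpolation method has to estimate geometry rather than just four box coordinates.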
Spatiotemporal patterns frequently transpire among solar events such as active
regions, flares, and sunspots. Identifying these patterns appearing on the Sun can
help us better understand the implicit spatial and temporal relationships among solar
event types, and eventually lead to better modeling and forecasting of important
events such as coronal mass ejections and solar flares. Coronal mass ejections
and solar flares impact radiation in space, can reduce the safety of space and air
travel, disrupt intercontinental communication and GPS, and even damage power
grids [67].

Fig. 1.1 Polygon-based representations of two coronal holes reported to Heliophysics Event
Knowledgebase (HEK) [69] between ‘23 January 2012 07:00’ and ‘25 January 2012 07:00’

1.2.2 Biomedical Sciences

In vitro fertilization (IVF) is a complex series of procedures used to treat fertility or
genetic problems and assist with the conception of a child. IVF technology has allowed
us to view and analyze the early events of human fertilization and embryogenesis
[91]. Conventional embryo selection methods are still associated with a relatively
low IVF success rate, with a clinical pregnancy rate of approximately 30% per
transfer [84]. This often leads to the transfer of more than one embryo at a time, which
increases the risk of multiple pregnancies, and the associated neonatal complications
and maternal pregnancy-related health problems [126]. Improvements in methods to
select embryos for transfer would potentially enable further increases in pregnancy
rates, and facilitate broader acceptance and adoption of single embryo transfer [51].
Nevertheless, the basic pathways and events of early human embryo development
and the factors aiding the prediction of success and failure are not well known [132].
Time-lapse imaging is an emerging tool that allows the identification of
parameters that can potentially help predict the developmental potential of an embryo with
continuous monitoring [32]. Time-lapse observation presents an opportunity for
optimizing embryo selection based on morphological grading and it provides novel
kinetic parameters, which can further improve accurate selection of viable embryos
[77]. Time-lapse imaging can also aid in transforming the early embryo images into
spatiotemporal vector data, which can be used in spatiotemporal frequent pattern
mining. Figures 1.2 and 1.3 show two illustrations of embryo cells from [132] and [32],
which are tracked with automated image analysis software.
In [51], Herrero and Meseguer present their findings on the predictive markers
that influence the success rate of IVF. Those markers include spatial characteristics
of the early embryo stages such as appearance (shape) of pronuclei (nucleus of
sperm and egg), and temporal characteristics such as duration of first cleavage, and
time interval between first and second mitotic division. Conaghan et al. suggest that
slower blastocyst formation is associated with poorer embryo viability [32]. The
associated markers as well as the embryo cells can be modeled as moving objects
with evolving regions. The validity of these markers and predictors can be tested
with spatiotemporal frequent patterns by performing a verification task. Such
data analyses can help scientists better comprehend the relationships among
different procedures in the IVF process.

Fig. 1.2 In [132], Wong et al. illustrate their cell tracking results, and compare their accuracy with
manual image analysis performed by human experts. They argue that these two methods have
excellent agreement. The tracking software models the embryos as a collection of ellipses with
position, orientation, and overlap indices. Images in the top row show the frames from the original
time-lapse sequence. Images in the bottom row show the overlaid ellipses found after tracking. Wong et
al. claim that with these models, the duration of cytokinesis and time between mitoses can be
identified. (Image is copied from [132]; see Figure 3.a)

Fig. 1.3 In [32], Conaghan et al. present the results of the tracking software they used. The
primary features tracked by the software are the cell membranes. By using a data-driven
probabilistic framework, the software generates an embryo model that includes an estimate of
the number of blastomeres, as well as spatiotemporal attributes such as size, location, and shape,
as a function of time. (Image is copied from [32]—See Figure 2.a)

1.2.3 Epidemiology

It is commonly accepted that climate plays a role in the transmission of many
infectious diseases, some of which are among the most important causes of
mortality and morbidity in developing countries [65]. The early identification of
an epidemic of infectious disease is an important first step towards implementing
effective interventions to control the disease and reduce the resulting mortality and
morbidity in human populations. However, epidemics are usually well advanced
before the authorities are notified and epidemic control measures are prepared or
deployed [70].
Malaria shows significant seasonal patterns, with disease transmission
highest in the months of heavy rainfall and humidity [120]. The spatial distribution
of disease-transmitting insects is closely related to these phenomena: a
rise in temperature accelerates the reproduction rate of insects, and humid weather
conditions create desirable reproduction habitats for insects [76, 87]. Malaria
demonstrates its most catastrophic effects in sub-Saharan Africa, where it is one
of the largest causes of morbidity and mortality, creating a significant barrier to
economic development [103].
The areas influenced by epidemics caused by mosquito vectors [120], high
and low temperature areas [87], and rainfall anomaly zones [65] can be modeled
as spatiotemporal trajectories of moving regions. Spatiotemporal frequent pattern
mining can be helpful for prediction of epidemics by demonstrating the associations
between climatic risk factors and disease outbreaks.

1.3 Challenges

Spatiotemporal data is collected from vastly different application domains. The
first and possibly the most persistent challenge for spatiotemporal frequent pattern
mining comes from its inherent interdisciplinary nature. Creating mining
schemata and providing very efficient algorithms for problems that do not exist
in real life is meaningless. For a functioning knowledge discovery
process, the data mining task needs to be very well defined and task-relevant data must be
carefully curated. While spatial data instances with time annotations are abundant,
the availability of trajectory datasets with region-based geometric representations
is limited. There are two reasons for that: (1) many spatial algorithms use
point-based spatial representations as they are easier to process, and regionalization
of point-based vector data or a spatial raster is difficult; and (2) the trajectories of
regions can usually be obtained only via a dedicated tracking module.
Another challenge is posed by the characteristics of spatial and temporal
dimensions, in which real-life phenomena reside. Both spatial and temporal
spaces are continuous, but the evolving region trajectory data follows a temporal
snapshot model, where the locations of objects are recorded at particular times. This
can create a number of problems for both identifying spatiotemporal relationships
and obtaining meaningful results. One way to alleviate this is to sample the data at
finer temporal resolution; however, this poses a computational efficiency problem,
which we will discuss next.
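The trade-off between temporal resolution and cost can be seen in a toy example. This is a hedged sketch with invented numbers: two unit-width boxes move toward each other along the x-axis at constant speed, and the helper names `box_at` and `overlap_detected` are hypothetical. The regions overlap only for a short time, so a coarse snapshot schedule misses the relationship while a finer one detects it at the price of more overlap checks.

```python
# Assumption: a region snapshot is an axis-aligned box (xmin, ymin, xmax, ymax).
Box = tuple[float, float, float, float]


def box_at(t: int, x0: float, speed: float) -> Box:
    """Unit-size box that starts at x0 and moves along the x-axis at constant speed."""
    x = x0 + speed * t
    return (x, 0.0, x + 1.0, 1.0)


def boxes_overlap(a: Box, b: Box) -> bool:
    """Return True if the two boxes share interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def overlap_detected(sample_times) -> bool:
    """Check the overlap predicate only at the recorded snapshot times."""
    return any(
        boxes_overlap(box_at(t, 0.0, 1.0), box_at(t, 10.0, -1.0))
        for t in sample_times
    )


coarse = range(0, 12, 4)  # snapshots at t = 0, 4, 8
fine = range(0, 12, 1)    # snapshots at every integer t from 0 to 11

print(len(coarse), overlap_detected(coarse))  # 3 False: the overlap falls between snapshots
print(len(fine), overlap_detected(fine))      # 12 True: the overlap is caught at t = 5
```

Here the finer schedule needs four times as many predicate evaluations for a single pair of trajectories; with many instances and polygon geometry instead of boxes, that cost grows quickly.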
In transactional databases, the relations between the items are explicit: they are
given by what each transaction contains. For trajectory data, this is not the case. Firstly,
meaningful relationships need to be defined for a particular mining task. Secondly,
spatiotemporal relationships are implicit, and finding them requires computationally
heavy spatiotemporal operations. On top of those, finding a particular
relationship in isolation can result in the inclusion of spurious patterns. Understanding the
importance of a particular relationship, both among the trajectories and among event
types, is vital for the relevance of the discovered patterns. Last but not least,
efficient and effective mining algorithms for discovering the patterns are
needed.
