Machine Learning and Artificial Intelligence-Based Model Development For Geospatial Data
Spatial Data
▪ Spatial data, also known as
Geographic data and information,
are defined in the ISO/TC 211
series of standards as data and
information having an implicit or
explicit association with a location
relative to Earth
▪ Approximately 90% of government sourced data has a location component. Location information is stored in a geographic information system (GIS).
Visual representation of data layers or themes in a GIS. Credit: Government Accountability Office, 2012
Spatial Data
Spatial Data Sources and Types
Data Types
Spatial Data Representation
A data model is a way of defining and representing real-world surfaces and characteristics in GIS. There are two primary types of spatial data: Vector and Raster.
Vector Data: represents features as discrete points, lines, and polygons.
Raster Data: represents features as a square/rectangular matrix of square cells (pixels).
Data Types
Spatial Data Representation
Point
Line
Polygon
Raster data are described by a cell grid with one value per cell, while vector data are described by points, lines and polygons.
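The two representations can be sketched in a few lines of plain Python. The values, coordinates and helper names (`cell_value`, `polygon_area`) are hypothetical and only illustrate the idea of "one value per cell" versus coordinate-based geometries; a real GIS adds georeferencing and attribute tables on top.

```python
# Raster: a grid of cells with exactly one value per cell
# (hypothetical elevation values in metres).
raster = [
    [120, 125, 130],
    [118, 122, 128],
    [115, 119, 124],
]

# Vector: discrete geometries described by (x, y) coordinates.
point = (2.0, 1.0)                                          # e.g. a station
line = [(0.0, 0.0), (1.0, 2.0), (3.0, 3.0)]                 # e.g. a road
polygon = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]  # e.g. a parcel


def cell_value(grid, row, col):
    """Return the single value stored in one raster cell."""
    return grid[row][col]


def polygon_area(vertices):
    """Area of a simple polygon from its vertices (shoelace formula)."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0
```

Here `cell_value(raster, 1, 2)` looks up one cell of the grid, while `polygon_area(polygon)` works purely from the polygon's vertex coordinates.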
Data Types
Data Attributes
Feature tables for vector data
Hydro-meteorological stations.
Road network
Data Sources
Remote Sensing
▪ Remote sensing is the process of detecting and monitoring
the physical characteristics of an area by measuring its
reflected and emitted radiation at a distance (typically from
satellite or aircraft). Special cameras collect remotely
sensed images, which help researchers "sense" things about
the Earth. - USGS
▪ Applications
❖ Agriculture
❖ Urban monitoring
❖ Forest watch
❖ Oil and gas
❖ Maritime etc.
Source: https://www.fe-lexikon.info/
Data Sources
Remote Sensing Systems
Source: Soli 2013, Estimation of rainfall–runoff in a watershed using Remote sensing and GIS
Data Sources
Remote Sensing
Electromagnetic spectrum
False color combination: Red channel = NIR, Green channel = visible red (VR), Blue channel = Green (NIR, Red, Green)
Natural color combination: Red, Green, Blue
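As a minimal sketch, both combinations amount to choosing which band feeds each display channel. The 2x2 band arrays below are hypothetical reflectance values, and `composite` is an illustrative helper, not part of any particular remote sensing library.

```python
import numpy as np


def composite(red_channel, green_channel, blue_channel):
    """Stack three 2-D bands into an (H, W, 3) array for RGB display."""
    return np.dstack([red_channel, green_channel, blue_channel])


# Hypothetical 2x2 reflectance bands with values in [0, 1].
blue = np.array([[0.05, 0.06], [0.04, 0.05]])
green = np.array([[0.08, 0.09], [0.07, 0.08]])
red = np.array([[0.06, 0.07], [0.05, 0.30]])
nir = np.array([[0.45, 0.50], [0.40, 0.35]])

natural_color = composite(red, green, blue)  # Red, Green, Blue
false_color = composite(nir, red, green)     # NIR, Red, Green
```

In the false color composite, vegetation (high NIR reflectance) shows up in the red display channel.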
Major Spatial Data Sources
▪ Sentinel 1 A/B, 2 A/B, Landsat 1 – 8, MODIS, Hyperspectral
How many satellites are there in space? (as of May 1, 2021)
Total operating satellites: 4084 (Total sent: 8,378)
United States: 2505
Russia: 168
China: 431
Other: 980
Uniqueness of Geospatial (Big) Data
Compared with traditional GIS and RS data
▪ Most geospatial big data involves trajectory data and time series analysis (traditional GIS software lacks spatiotemporal analysis functions)
SDS Analytics Applications
Geo-referenced big data
Examples:
▪ GPS trajectories
▪ Check-in records
▪ Earth observation
imagery
▪ Spatial events, e.g. crimes, accidents, disease outbreaks
▪ Climate model simulations
Why does “Spatial” matter?
▪ Impacts everyday life
▪ Computational challenges
Covid-19 Risk Map - Disease Outbreak
*Some of the figures were taken from the internet
Challenges for RS based data
▪ Multi-source
▪ Variable noise
▪ Missing data
RS Processing Solutions
Timeline of technical solutions and their degree of interactivity (e.g. online processing, up- and
downloading of data). Overview of available systems and solutions dealing with Big Earth data.
Sudmanns et al. 2020. Big Earth data: disruptive changes in Earth observation data management and analysis?, International Journal of Digital Earth, 13:7, 832-850,
DOI: 10.1080/17538947.2019.1585976
Research Trend?
Research Trend
Research trend using machine
learning/artificial intelligence methods with
remote sensing dataset
Number of publications for different study targets
Distribution of overall accuracies for the classification sub study area (LULC classification, object detection, scene classification)
Ma et al. (2019). Deep learning in remote sensing applications: A meta analysis and review. ISPRS Journal of Photogrammetry and Remote Sensing. 152: 167 - 177
Research Trend
Distribution of application areas
Number of conference papers and articles in the Scopus database for a general search on [“deep learning” AND “remote sensing”]
Ma et al. (2019). Deep learning in remote sensing applications: A meta analysis and review. ISPRS Journal of Photogrammetry and Remote Sensing. 152: 167 - 177
Research Trend
The taxonomy containing four tasks: image processing, classification, change detection, and
accuracy assessment.
Model development
Deep Learning Model Development
Mishra et al. (2021). Methods in the spatial deep learning: current status and future direction. Spatial Information Research.
Model development
Input and Output Selection
▪ The most common ways to select inputs are a priori system knowledge or ad hoc choice.
▪ Excluding one or more significant inputs may result in an inefficient model that fails to capture the best possible input-output relationship.
Image pre-processing
▪ The selected input datasets should be pre-processed: all should be terrain corrected, atmospherically corrected, and brought to the same spatial resolution.
▪ Matching the temporal resolution
o Inconsistent inputs can result in an inefficient system that cannot achieve the best model.
o Too small an input dataset often results in overfitting.
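Bringing inputs to the same spatial resolution can be illustrated with a minimal nearest-neighbour sketch. The factor of 3 and the array values are hypothetical, and `upsample_nearest` is an illustrative helper; production workflows would normally use a dedicated reprojection and resampling tool that also handles georeferencing.

```python
import numpy as np


def upsample_nearest(coarse, factor):
    """Nearest-neighbour upsampling: each coarse cell becomes a
    factor x factor block of identical values on the finer grid."""
    return np.repeat(np.repeat(coarse, factor, axis=0), factor, axis=1)


# Hypothetical coarse-resolution band (2x2 cells).
coarse_band = np.array([[1.0, 2.0],
                        [3.0, 4.0]])

# Resample onto a grid three times finer so it can be stacked
# with a finer-resolution input of shape (6, 6).
fine_band = upsample_nearest(coarse_band, 3)
```

Nearest-neighbour resampling keeps the original values unchanged, which is why it is often preferred for categorical layers; continuous data may call for bilinear or cubic interpolation instead.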
Applications…
Image Fusion
Image fusion refers to the process of combining two or
more images into one composite image, which integrates the
information contained within the individual images. The result is
an image that has a higher information content compared to any of
the input images
▪ Pan-sharpening – fusion of a low-resolution multi-spectral (MS) image and a high-resolution panchromatic (PAN) image.
▪ Fusion of a low-resolution hyper-spectral (HS) image and a high-resolution MS image to generate a high-resolution hyper-spectral image.
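To make pan-sharpening concrete, here is a minimal sketch of a Brovey-style ratio method, a classical technique chosen for illustration and not the deep learning approach this deck is concerned with. The function name, the array values, the resolution factor of 2 and the use of the band mean as intensity are all assumptions.

```python
import numpy as np


def brovey_pansharpen(ms, pan, factor, eps=1e-6):
    """Brovey-style pan-sharpening sketch.

    ms:  (bands, h, w) low-resolution multi-spectral image
    pan: (h * factor, w * factor) high-resolution panchromatic image
    """
    # Bring the MS bands onto the PAN grid (nearest neighbour).
    ms_up = np.repeat(np.repeat(ms, factor, axis=1), factor, axis=2)
    # Intensity is taken here as the mean of the MS bands.
    intensity = ms_up.mean(axis=0)
    # Rescale every band so that the band mean follows the PAN detail.
    return ms_up * (pan / (intensity + eps))


# Hypothetical 3-band 2x2 MS image and 4x4 PAN image.
ms = np.array([
    [[0.2, 0.4], [0.6, 0.8]],
    [[0.1, 0.3], [0.5, 0.7]],
    [[0.3, 0.5], [0.7, 0.9]],
])
pan = np.linspace(0.1, 1.0, 16).reshape(4, 4)

sharpened = brovey_pansharpen(ms, pan, factor=2)
```

The result has the spatial grid of the PAN image and the band count of the MS image; band ratios within each pixel are preserved while the overall brightness follows the PAN band.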
Applications…
Image Registration
Applications…
Scene Classification and Object
Detection
▪ Scene classification is the procedure of determining image categories from numerous pictures, e.g. agriculture scenes, forest scenes, and beach scenes; the training samples are a series of labeled pictures.
▪ Object detection detects different objects in a single image, e.g. airplanes, cars, and urban clusters; the training samples are the pixels in a fixed-size window or patch.
▪ Models are most commonly trained on benchmark datasets such as the RSSCN7, UC-Merced and WHU-RS datasets.
▪ Augmentation techniques are applied to enlarge the training dataset and make it more effective.
▪ These tasks apply to very high spatial resolution datasets, and CNN is the most common method.
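The augmentation step mentioned above can be sketched with simple geometric transforms. This is one common choice (rotations by 90 degrees plus mirror flips), assumed here for illustration; the slides do not say which augmentations were used, and `augment_patch` is a hypothetical helper.

```python
import numpy as np


def augment_patch(patch):
    """Return 8 variants of an (H, W, C) patch: the four 90-degree
    rotations, each with and without a left-right flip."""
    variants = []
    for k in range(4):
        rotated = np.rot90(patch, k, axes=(0, 1))
        variants.append(rotated)
        variants.append(np.flip(rotated, axis=1))
    return variants


# Hypothetical 2x2 patch with 3 bands.
patch = np.arange(12).reshape(2, 2, 3)
augmented = augment_patch(patch)
```

Each labeled patch yields eight training samples with the same label, which is a cheap way to enlarge a small benchmark dataset.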
Applications…
LULC Classification
The process of sorting pixels into a finite number of individual classes, or
categories of data, based on their spectral response (the measured brightness of
a pixel across the image bands, as reflected by the pixel’s spectral signature).
▪ Deep learning is most popular with hyperspectral images or very high spatial resolution images. CNN is the most frequently used method, followed by DBN and GAN.
▪ Augmentation techniques are applied to enlarge the training dataset and make it more effective.
▪ Single-source images are often used, as are time-series/multi-temporal remote sensing images.
▪ Applications in urban, vegetation, forest, and wetland mapping are the most common.
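Sorting pixels into classes by spectral response can be illustrated with a minimum-distance-to-mean classifier. This is a classical baseline used here only as a sketch, not one of the deep learning methods listed above; the class signatures, band choice (red, NIR) and pixel values are hypothetical.

```python
import numpy as np


def classify_min_distance(image, centroids):
    """Assign each pixel to the class with the nearest spectral centroid.

    image:     (H, W, B) array of band values per pixel
    centroids: (K, B) array, one mean spectral signature per class
    returns:   (H, W) array of class indices
    """
    pixels = image.reshape(-1, image.shape[-1])  # (H*W, B)
    dists = np.linalg.norm(pixels[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1).reshape(image.shape[:2])


# Hypothetical class signatures in (red, NIR) reflectance.
class_names = ["water", "vegetation", "bare soil"]
centroids = np.array([[0.05, 0.02],
                      [0.05, 0.50],
                      [0.30, 0.35]])

# Hypothetical 2x2 image with the same two bands.
image = np.array([[[0.04, 0.03], [0.06, 0.45]],
                  [[0.28, 0.33], [0.05, 0.55]]])

labels = classify_min_distance(image, centroids)
```

A CNN replaces the fixed centroids with learned features and also uses the spatial neighbourhood of each pixel, but the output has the same form: one class index per pixel.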
Predictive Modeling
using ANN
Snow Cover prediction
Case study…
Case study…
Spatiotemporal data fusion
▪ Sentinel-2 is a European Earth observation satellite that acquires optical imagery at high spatial resolution (10 m).
▪ Landsat 8 is an American Earth observation satellite. It is the eighth satellite in the Landsat program, which started in 1972. It acquires optical imagery at moderate spatial resolution (30 m).
Mishra B. and Shahi T.B. (2020), Deep learning based spatiotemporal data fusion of Landsat 8 and Sentinel 2 Images, in preparation for submission to Geoscience and Remote Sensing Letters.
Case study…
NDVI
▪ Normalized Difference Vegetation Index (NDVI) is used to quantify vegetation greenness and is useful in understanding vegetation density and assessing changes in plant health. NDVI is calculated from the red (R) and near infrared (NIR) values as NDVI = (NIR - R) / (NIR + R), which ranges from -1 to 1.
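NDVI is conventionally computed as (NIR - R) / (NIR + R). A minimal NumPy sketch follows; the function name and the choice to return 0 where NIR + R is zero are assumptions made for this example.

```python
import numpy as np


def ndvi(nir, red):
    """Compute NDVI = (NIR - R) / (NIR + R) element-wise.

    Cells where NIR + R equals zero are set to 0 (an assumption of this
    sketch) instead of producing a division-by-zero warning.
    """
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    out = np.zeros_like(denom)
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out


# Hypothetical reflectance values: dense vegetation, bare soil, no data.
nir_band = np.array([0.50, 0.30, 0.0])
red_band = np.array([0.10, 0.25, 0.0])
ndvi_values = ndvi(nir_band, red_band)
```

Dense vegetation reflects strongly in NIR and absorbs red light, so it yields values close to 1, while bare soil and water stay near or below 0.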
Case study…
Spatio-temporal data fusion
▪ Possible scenarios
Sentinel 2A/B (10 m spatial resolution): t1_h and t3_h available, t2_h = ?
Landsat 8 (30 m spatial resolution): t1_l, t2_l and t3_l available
Scenarios for t3 relative to t1 and t2:
c1 = {t1<t2<t3}
c2 = {t1<t2=t3}
c3 (relation not given)
c4 = {t1<t2>t3}
c5 = {t1=t2=t3}
c6 = {t1=t2>t3}
c7 = {t1>t2<t3}
c8 = {t1>t2=t3}
c9 = {t1>t2>t3}
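The scenarios c1 to c9 above can be sketched as a lookup on the two pairwise relations (t1 versus t2, t2 versus t3). This assumes the inequalities compare values such as NDVI at the three dates, which the slide does not state explicitly. The slide gives no definition for c3; mapping it to t1=t2<t3, the one combination left over, is an assumption flagged in the code. The `tol` parameter is also hypothetical.

```python
def relation(a, b, tol=0.0):
    """Return '<', '=' or '>' for a compared with b, treating
    differences within tol as equal."""
    if abs(a - b) <= tol:
        return "="
    return "<" if a < b else ">"


# (relation of t1 to t2, relation of t2 to t3) -> scenario label.
SCENARIOS = {
    ("<", "<"): "c1",
    ("<", "="): "c2",
    ("=", "<"): "c3",  # ASSUMPTION: c3 is undefined on the slide
    ("<", ">"): "c4",
    ("=", "="): "c5",
    ("=", ">"): "c6",
    (">", "<"): "c7",
    (">", "="): "c8",
    (">", ">"): "c9",
}


def scenario(t1, t2, t3, tol=0.0):
    """Classify the trend of a value observed at three dates."""
    return SCENARIOS[(relation(t1, t2, tol), relation(t2, t3, tol))]
```

For example, `scenario(0.2, 0.4, 0.6)` falls in the steadily increasing case and `scenario(0.6, 0.2, 0.5)` in the case where the value dips at t2.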
Case study…
Study Area and Data Used
Region/climatic condition | Date of Landsat 8 (YYYYMMDD) | Date of Sentinel 2A (YYYYMMDD)
Set I: West Nepal – Subtropical climate | 20170523 | 20170523
 | 20170504 | 20170508
 | 20170415 | 20170423
Set II: Alberta – Canada – Temperate climate | 20190905 | 20190903
 | 20190921 | 20190921
 | 20191007 | 20191006
Practical Cases (Input / Output Set)
Case I: t2_h = f(t1_h, t3_h, t1_l, t2_l, t3_l)
Case II: t2_h = f(t1_h, t3_h, t2_l)
Process Flow
▪ CNN
Layers | Number of nodes/filters | Kernel Size | Activation
Convolution | 25 | 2x2 | ReLU
MaxPooling | - | 2x2 | -
Convolution | 50 | 2x2 | ReLU
Dense | 1 | - | Linear
▪ Random Forest
Best_parameters={'bootstrap': True, 'max_depth': 5,
'max_features': 'auto', 'min_samples_leaf': 1,
'min_samples_split': 2, 'n_estimators': 20}
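To get a feel for the size of the CNN listed above, the following pure-Python sketch traces tensor shapes and trainable parameter counts through the four layers. Several things are assumptions not given in the slides: a 9x9 input patch with five channels (five matches the five input images of Case I), valid padding with stride 1 for the convolutions, a pooling stride equal to the pool size, and a flatten step before the dense layer.

```python
def conv2d(shape, filters, kernel):
    """Valid padding, stride 1. Returns (output shape, parameter count)."""
    h, w, c = shape
    out_shape = (h - kernel + 1, w - kernel + 1, filters)
    params = (kernel * kernel * c + 1) * filters  # weights + one bias per filter
    return out_shape, params


def maxpool2d(shape, pool):
    """Non-overlapping pooling; it has no trainable parameters."""
    h, w, c = shape
    return (h // pool, w // pool, c), 0


def dense(shape, units):
    """Flatten the input, then apply a fully connected layer."""
    n_inputs = 1
    for dim in shape:
        n_inputs *= dim
    return (units,), (n_inputs + 1) * units


def trace(input_shape):
    """Return (layer name, output shape, parameters) for each layer."""
    rows = []
    shape, p = conv2d(input_shape, filters=25, kernel=2)
    rows.append(("Convolution", shape, p))
    shape, p = maxpool2d(shape, pool=2)
    rows.append(("MaxPooling", shape, p))
    shape, p = conv2d(shape, filters=50, kernel=2)
    rows.append(("Convolution", shape, p))
    shape, p = dense(shape, units=1)
    rows.append(("Dense", shape, p))
    return rows


# ASSUMPTION: 9x9 patch, 5 input channels (hypothetical values).
layers = trace((9, 9, 5))
total_params = sum(p for _, _, p in layers)
```

Under these assumptions the network is very small, which is consistent with a per-pixel or per-patch regression set-up; a different patch size or padding choice changes only the dense layer's input size.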
Case study…
Results
▪ Overall, LSTM performs better than the other methods.
▪ Random forest performs worst in comparison; however, the overall performance of all methods is very good.
Result: Site I case I – Five input images
Indicators C1 C3 C7 C9 Overall
Source: https://carto.com/what-is-spatial-data-science/#why-is-spatial-data-science-important-in-business
Future Perspective
Challenges
▪ The major challenges include the availability of sufficient
training samples, strong non-linearity, and low signal-to-noise
ratios.
▪ Handling multi-source, multi-temporal, multi-resolution and multi-platform datasets is always a big challenge.
▪ Making use of multi-modal datasets can ease data availability, which helps to increase the quality and reliability of DL methods, but handling them is challenging.
Future direction
▪ The majority of the work was found in segmentation, clustering, object detection, and scene detection, i.e. it focused on features rather than on the complete cycle of geospatial analysis.
▪ Digital mapping would be another promising application of DL in future.
▪ Smart digitization.
▪ Extracting topological relationships, geotagging, and supporting explainable digitization.
▪ Visualization of automatically extracted features (in different applications) has not been fully explored with spatial datasets.
▪ Semi-supervised or unsupervised learning is required to overcome the dependency on labeled datasets.
▪ An explainable DL model is required for widespread acceptance and building trust.
Thanks!