
Machine Learning and Artificial Intelligence for Geospatial Data: Status, Challenges and Perspective

Bhogendra Mishra, PhD


Policy Research Institute
Kathmandu Nepal
September 18, 2021
▪ Geospatial Data
▪ ML and AI Algorithms with Geospatial Data
▪ Applications
▪ Current Status and Challenges
▪ Future Perspective

Spatial Data
▪ Spatial data, also known as
Geographic data and information,
are defined in the ISO/TC 211
series of standards as data and
information having an implicit or
explicit association with a location
relative to Earth

▪ Approximately 90% of government sourced data has a location component. Location information is stored in a geographic information system (GIS).

Figure: Visual representation of data layers or themes in a GIS. Credit: Government Accountability Office, 2012
Spatial Data Sources and Types

Data Types
Spatial Data Representation
A data model is a way of defining and representing real world surfaces and characteristics in GIS. There are two primary types of spatial data: Vector and Raster.

Vector Data: data represents features as discrete points, lines, and polygons.
Raster Data: data represents features as a square/rectangular matrix of square cells (pixels).
Data Types
Spatial Data Representation

Point

Line

Polygon

Raster data are described by a cell grid with one value per cell, while vector data are described by points, lines and polygons.
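As a minimal sketch of the two data models, the snippet below holds a raster as a grid with one value per cell and vector features as coordinate lists. All values and names (`raster`, `point`, `line`, `polygon`, `cell_value`, `polygon_area`) are made up for illustration and are not taken from the slides.

```python
# Raster: a grid of cells, one value per cell (hypothetical elevations in metres).
raster = [
    [120, 125, 131],
    [118, 122, 129],
    [115, 119, 124],
]

# Vector: features described by coordinates (hypothetical x, y pairs).
point = (2.0, 1.0)
line = [(0.0, 0.0), (1.0, 1.5), (3.0, 2.0)]
polygon = [(0.0, 0.0), (3.0, 0.0), (3.0, 3.0), (0.0, 3.0)]


def cell_value(grid, row, col):
    """Look up the single value stored in one raster cell."""
    return grid[row][col]


def polygon_area(vertices):
    """Area of a simple polygon from its vertices (shoelace formula)."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0
```

A raster query is an index lookup, whereas a vector query works on geometry, which is why the two models suit different kinds of analysis.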
Data Types
Data Attributes
Feature tables for vector data

Value attribute tables for categorical grid data

Hydro-meteorological stations.

Nepal elevation map

Road network
Data Sources
Remote Sensing
▪ Remote sensing is the process of detecting and monitoring
the physical characteristics of an area by measuring its
reflected and emitted radiation at a distance (typically from
satellite or aircraft). Special cameras collect remotely
sensed images, which help researchers "sense" things about
the Earth. - USGS

▪ Remote sensing is the acquisition of information about an object or phenomenon without making physical contact with the object, in contrast to on-site observation; the term is applied especially to the Earth. - Wikipedia
Data Sources
Remote Sensing
▪ Science and art of obtaining information about an object, area
or phenomenon through an analysis of data acquired by a
device that is not in direct contact with the area, object or
phenomenon under investigation.

▪ Applications
❖ Agriculture
❖ Urban monitoring
❖ Forest watch
❖ Oil and gas
❖ Maritime etc.
Source: https://www.fe-lexikon.info/
Data Sources
Remote Sensing Systems

The Sun is the primary source of energy.
Satellites and drones are the most common platforms to mount the sensor.
A ground station (a set of antennas) receives the data from the sensors.
Various types of users process and use the acquired images.

Source: Soli 2013, Estimation of rainfall–runoff in a watershed using Remote sensing and GIS
Data Sources
Remote Sensing

Electromagnetic spectrum

We can capture a broad range of the spectrum and visualize it.


Data Sources
Remote Sensing

False color combination: red channel – NIR, green channel – visible red (VR), blue channel – green, i.e. (NIR, Red, Green)
Natural color combination: Red, Green, Blue
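The two band combinations can be sketched as follows. This is an illustrative example only: the reflectance values and the helper name `make_composite` are hypothetical, and each band is a plain nested list rather than a real image.

```python
def make_composite(bands, order):
    """Stack three named bands into per-pixel (R, G, B) display tuples."""
    first, second, third = (bands[name] for name in order)
    return [
        [(r, g, b) for r, g, b in zip(row_r, row_g, row_b)]
        for row_r, row_g, row_b in zip(first, second, third)
    ]


# Hypothetical 1 x 2 pixel reflectance bands.
bands = {
    "blue": [[0.05, 0.06]],
    "green": [[0.08, 0.10]],
    "red": [[0.06, 0.20]],
    "nir": [[0.50, 0.25]],
}

# Natural color: display R, G, B channels show the red, green and blue bands.
natural = make_composite(bands, ("red", "green", "blue"))

# False color: display R, G, B channels show the NIR, red and green bands.
false_color = make_composite(bands, ("nir", "red", "green"))
```

In the false color composite, a pixel with high NIR reflectance (typically vegetation) shows up as bright red.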
Major Spatial Data Sources
▪ Sentinel 1 A/B, 2 A/B, Landsat 1 – 8, MODIS, Hyperspectral
▪ Airborne Light Detection and Ranging (LiDAR): discrete return, full-waveform, multispectral and single photon
▪ Aerial photos and point clouds
▪ Terrestrial Laser Scanning
▪ Harvester data / Field Sensors
▪ Field-based reference data
▪ Crowd-sourced data (OpenStreetMap, Google Maps etc.)
▪ Personally collected data from Unmanned Aerial Vehicles (UAVs) and other sensors
High variety

Point cloud: set of data points in space.
Source: https://www.gim-international.com/content/news/point-clouds-photogrammetry-or-lidar
Satellites
Remote Sensing Data Sources by Country

How many satellites are there in space?
Total operating satellites: 4084 (Total sent: 8,378), as of May 1, 2021
United States: 2505
Russia: 168
China: 431
Other: 980

More than 150 Earth-observation satellites

Source: https://www.ucsusa.org/resources/satellite-database
Lie 2020. A Survey of remote-sensing big data, Frontiers in Environmental Science.
Remote Sensing Data Volume
Copernicus Sentinel Data Access

Between 1 December 2017 and 30 November 2018:
▪ 187,800 registered users in Sentinel systems
▪ 26,500 products published per day
▪ Daily download volume of 166TB
▪ 122.9 million products were downloaded
▪ Total data volume of 86.9PB

Between 1 December 2018 and 30 November 2019:
▪ 280,000 registered users in Sentinel systems
▪ 30,500 products published per day
▪ Daily download volume of 214TB
▪ 254 million products were downloaded, about 128 million of them during Y2019 alone
▪ Total data volume of 158.4 PB

Soilla et al. 2018, A versatile data-intensive computing platform for information retrieval from big geospatial data, Future Generation of Computer Systems
Data Volume
Megabyte (10^6)
Gigabyte (10^9)
Terabyte (10^12)
Petabyte (10^15): 5PB of new data per day are ingested in Facebook database
Exabyte (10^18): 7EB of data is created on the internet each day
Zettabyte (10^21): 1.32 of network traffic in 2016
Yottabyte (10^24): This is our digital universe today. Today data scientists use Yottabytes to describe how much government data the FBI has on people altogether.
Brontobyte (10^27): This will be our digital universe tomorrow. In the near future, Brontobyte will be the measurement to describe the type of sensor data that will be generated from the IoT.
Geopbyte (10^30)
Data science vs. Spatial Data Science

▪ Spatial Data Science (SDS) is a subset of Data Science that focuses on the special characteristics of spatial data, using modeling to know where and why things happen.
▪ It moves beyond simply looking at where things happen (GIS) to understanding why and when they happen (DS) there (SDS).
Uniqueness of Geospatial (Big) Data
Compared to traditional GIS and RS data

▪ Most involve trajectory data and time series analysis (traditional GIS software lacks spatiotemporal analysis functions).

▪ Unstructured data (NoSQL databases, social media data)

▪ Traditional GIS data are stored in relational databases and are well structured.

▪ Multi-level and dynamic scaling: how to aggregate point data into a meaningful scale level (census block, zip codes, country, city boundary)? Traditional GIS data are at a single scale.
SDS Analytics Applications
Geo-referenced big data
Examples:
▪ GPS trajectories
▪ Check-in records
▪ Earth observation imagery
▪ Spatial events, e.g. crimes, accidents, disease outbreaks
▪ Climate model simulations
Why does “spatial” matter?
▪ Impacts everyday life
▪ Computational challenges
Figure: Covid-19 Risk Map - Disease Outbreak
*Some of the figures were taken from the internet.
Challenges for RS based data
▪ Multi-source

▪ Variable spatial resolution

▪ Variable noise

▪ Missing data

▪ Relevance of geographic location

▪ Need for reference data for model building

This requires new ways to process remote sensing data.
What are the solutions?

RS Processing Solutions

Machine Learning and Artificial Intelligence

Timeline of technical solutions and their degree of interactivity (e.g. online processing, up- and
downloading of data). Overview of available systems and solutions dealing with Big Earth data.

Sudmanns et al. 2020. Big Earth data: disruptive changes in Earth observation data management and analysis?, International Journal of Digital Earth, 13:7, 832-850,
DOI: 10.1080/17538947.2019.1585976
Research Trend?

Research Trend
Research trend using machine learning/artificial intelligence methods with remote sensing dataset

Applied to various data sources: hyperspectral images, LiDAR point clouds, medium/low resolution images, very high resolution images, SAR data, airborne/UAV images etc.

Figure: Number of publications using deep learning methods with remote sensing dataset over the years.
Mishra et al. (2021). Methods in the spatial deep learning: current status and future direction. Spatial Information Research.
Research Trend
Research trend using machine learning/deep learning methods with remote sensing dataset

Figure: Distribution of deep learning models used in the studies based on remote sensing dataset.
Figure: Tag cloud. Highest frequency terms appearing in the title and abstract of the peer-reviewed literature are in a larger font size.

CNN is more popular than other DL models; the main focus of the studies is LULC classification, then fusion, segmentation, change detection, registration etc.

Ma et al. (2019). Deep learning in remote sensing applications: A meta analysis and review. ISPRS Journal of Photogrammetry and Remote Sensing. 152: 167 - 177
Research Trend
Research trend – application area and accuracy achieved in selected applications

Figure: Number of publications for different study targets.
Figure: Distribution of overall accuracies for the classification sub study area (LULC classification, object detection, scene classification).

Ma et al. (2019). Deep learning in remote sensing applications: A meta analysis and review. ISPRS Journal of Photogrammetry and Remote Sensing. 152: 167 - 177
Research Trend

Figure: Distribution of application areas.
Figure: Number of conference papers and articles in the Scopus database for a general search on [“deep learning” AND “remote sensing”].

Ma et al. (2019). Deep learning in remote sensing applications: A meta analysis and review. ISPRS Journal of Photogrammetry and Remote Sensing. 152: 167 - 177
Research Trend

The taxonomy containing four tasks: image processing, classification, change detection, and
accuracy assessment.

Model development
Figure: Deep Learning Model Development
Mishra et al. (2021). Methods in the spatial deep learning: current status and future direction. Spatial Information Research.
Model development
Input and Output Selection
▪ The most common ways to select the inputs are a priori system knowledge or ad hoc choices.
▪ Excluding one or more significant inputs may result in an inefficient model, one that may not develop the best possible input-output relationship.

Image pre-processing
▪ The selected input datasets should be pre-processed: all should be terrain corrected, atmospherically corrected and brought to the same spatial resolution.
▪ Matching the temporal resolution
o Inconsistent inputs could result in an inefficient system that cannot achieve the best model.
o Too small an input dataset often results in overfitting.

• Stepwise – constructive, pruning
• Sensitivity analysis
• Non-linearity analysis
Model development
Model Architecture Selection
A wide variety of architectures are available to match the different
use cases that applications might encounter in the real world.

The most common architectures
• Convolutional Neural Network (CNN)
• Autoencoders
• Recurrent Neural Network (RNN)
• Deep Belief Networks (DBNs)

• Image/video processing, object detection – CNN could be the default choice.
• Time series applications, text data, audio data – RNN is the obvious choice.

Accuracy, Speed and Size


Model development
Network Structure Selection
Model (network) structure, together with model (network) architecture,
defines the functional form of the relationship between model inputs and
output(s), f(•).
It involves identifying:
• A suitable number of nodes per layer
• Number of hidden layers
• Kernel size (if applicable)
• How to process incoming signals (e.g. transfer function)
• Optimization
MetaQNN
A meta-modeling algorithm based on reinforcement learning to automatically generate high-performing CNN architectures for a given learning task.
Model development
Model Evaluation
▪ Model evaluation is the process of determining which network structure is optimal: the performance of a calibrated model is evaluated against one or more criteria.
▪ It is often done with a quantitative error metric.
▪ The most common approaches are
• Squared – Root Mean Square Error (RMSE)
• Absolute – Mean Absolute Error (MAE)
• Relative – Normalized RMSE
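The three error metrics can be sketched in a few lines. This is an illustrative example, not code from the study; in particular, normalizing the RMSE by the observed range is only one common convention (dividing by the observed mean is another) and is an assumption here.

```python
import math


def rmse(observed, predicted):
    """Root mean square error: penalizes large errors more strongly."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error: average magnitude of the errors."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def nrmse(observed, predicted):
    """RMSE divided by the observed range (assumed normalization)."""
    return rmse(observed, predicted) / (max(observed) - min(observed))


# Hypothetical observed and predicted values.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 6.0]
print(rmse(observed, predicted), mae(observed, predicted), nrmse(observed, predicted))
```

With a single large error, RMSE (1.0) is twice the MAE (0.5), which illustrates why the squared metric is more sensitive to outliers.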

Applications…
Image Fusion
Image fusion refers to the process of combining two or
more images into one composite image, which integrates the
information contained within the individual images. The result is
an image that has a higher information content compared to any of
the input images
▪ Pan-sharpening – fusion of a low-resolution multi-spectral (MS) image and a high-resolution panchromatic (PAN) image.
▪ Fusion of a low-resolution hyper-spectral (HS) image and a high-resolution MS image to generate a high-resolution hyper-spectral image.
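To make pan-sharpening concrete, here is a minimal sketch using the classical Brovey transform. This is a simple, non-learning technique swapped in for illustration, not the deep learning approach discussed in this talk; the function names, the nearest-neighbour upsampling and the sample values are all assumptions.

```python
def upsample_nearest(band, factor):
    """Repeat each low-resolution cell factor x factor times."""
    out = []
    for row in band:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def brovey_pansharpen(ms_bands, pan, factor):
    """Scale each upsampled MS band by PAN / (sum of the MS bands)."""
    up = [upsample_nearest(b, factor) for b in ms_bands]
    rows, cols = len(pan), len(pan[0])
    sharpened = [[[0.0] * cols for _ in range(rows)] for _ in up]
    for r in range(rows):
        for c in range(cols):
            total = sum(b[r][c] for b in up)
            ratio = pan[r][c] / total if total else 0.0
            for i, b in enumerate(up):
                sharpened[i][r][c] = b[r][c] * ratio
    return sharpened


# Hypothetical data: three 1 x 1 MS bands and a 2 x 2 PAN image (factor 2).
ms_bands = [[[0.2]], [[0.3]], [[0.5]]]
pan = [[1.0, 2.0], [1.0, 2.0]]
sharp = brovey_pansharpen(ms_bands, pan, factor=2)
```

Each output band keeps the spectral proportions of the MS pixel while its brightness follows the finer PAN detail.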
Applications…
Image Registration

A method of aligning two or more images captured by different sensors, at different times or from different sources. This is very important for change detection, classification etc.
In general, image registration includes the following four steps:
(i) Feature extraction
(ii) Feature matching
(iii) Transformation model estimation
(iv) Image resampling
Applications…
Scene Classification and Object Detection
▪ Scene classification is a procedure to determine image categories from numerous pictures, e.g. agriculture scenes, forest scenes and beach scenes; the training samples are a series of labeled pictures.
▪ Object detection is the detection of different objects in a single image, e.g. airplanes, cars and urban clusters; the training samples are the pixels in a fixed-size window or patch.
▪ Benchmark datasets such as RSSCN7, UC-Merced and WHU-RS are most commonly used to train the models.
▪ Augmentation techniques were applied to increase the size and efficiency of the training dataset.
▪ This is applicable to very high spatial resolution datasets, and CNN is the most common method.
Applications…
LULC Classification
The process of sorting pixels into a finite number of individual classes, or categories of data, based on their spectral response (the measured brightness of a pixel across the image bands, as reflected by the pixel’s spectral signature).
▪ Deep learning is most popular with hyperspectral images or with very high spatial resolution images. CNN is the most frequently used method, followed by DBN and GAN.
▪ Augmentation techniques were applied to increase the size and efficiency of the training dataset.
▪ Studies often use single-source images, or time-series/multi-temporal remote sensing images as well.
▪ Applications in urban, vegetation, forest and wetland areas are the most common.
Predictive Modeling using ANN
Snow Cover prediction
Case study…

Figure: Hypsometric elevation zoning: zone I – snow free area, zone II – occasional snow cover, zone III – snow cover except summer, zone IV – perennial snow cover area

Objective: prediction of snow cover area across the Kaligandaki river basin in the climate change scenario at 2040.

Mishra et al. 2013, Artificial Neural Network-Based Snow Cover Predictive Modeling in the Higher Himalayas. Journal of Mountain Science 11(4)
Case study…
Why Snow
▪ Global maximum temperature has been increasing over the decades.
▪ Snow is highly sensitive to the maximum temperature.
▪ Most of the water in our rivers comes from snow.
▪ The livelihood of a large population will be affected as the temperature changes.
▪ Quantitative analysis of snow cover over the years is very important.

As a computer/data scientist, you should not focus only on developing methods (largely you don’t, do you?); you should apply them to real-world problems.
Interdisciplinary research work is very important for development.
Case study…
Data and Model
Historical Dataset
Data name                    Spatial resolution   Temporal resolution
MOD10CM – snow cover         500m                 Monthly
TRMM 3B43 – Precipitation    25km                 Monthly
MOD11C3 – LST                5km                  Monthly
Future Dataset
HadCM2 (temp, and ppt)       2.5×3.75km           Monthly

Model
Network                NARX
Activation layer       LogSig
Optimization           Gradient descent algorithm
Input dataset          Monthly maximum temperature, and precipitation
Time delay             3-months
No. of hidden nodes    4
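A NARX (nonlinear autoregressive with exogenous inputs) model predicts the current value from lagged values of the target and of the exogenous drivers. The sketch below shows how training samples with a 3-month delay could be assembled; the slides do not spell out the exact input layout, so the function name `make_narx_samples`, the ordering of the inputs and the sample values are assumptions.

```python
def make_narx_samples(target, exogenous, delay=3):
    """Build (inputs, output) pairs from lagged target and exogenous series.

    For each month t, the inputs are the previous `delay` target values
    followed by the previous `delay` values of every exogenous series.
    """
    samples = []
    for t in range(delay, len(target)):
        lagged_target = list(target[t - delay:t])
        lagged_exog = [v for series in exogenous for v in series[t - delay:t]]
        samples.append((lagged_target + lagged_exog, target[t]))
    return samples


# Hypothetical monthly series: snow cover, maximum temperature, precipitation.
snow = [10, 12, 15, 14, 11]
temp = [1, 2, 3, 4, 5]
ppt = [5, 4, 3, 2, 1]
samples = make_narx_samples(snow, [temp, ppt], delay=3)
```

Each sample then feeds a small network (four hidden nodes in the configuration listed for this case study) that maps the lagged inputs to the next snow cover value.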
Case study…
Outcome

Figure: Snow cover area simulated vs. observed (a) zone II, (b) zone III and (c) zone IV using NARX LogSig activation function.
Figure: Timeseries snow cover extent forecasting using NARX and HadCAM A1B scenario 2011 to 2040.
Case study…
Predicted results
Performance indicator of the models in three elevation zones.

▪ Zone III is more vulnerable than any other area.
▪ Zone IV is the least affected by global warming.
▪ The magnitude of the reduction in snow cover is higher in this transition area.
▪ The seasonal dynamics of the snow in the lower elevation zones are not affected as the scenario becomes more extreme.

Figure: Monthly snow cover extent changes, 2000–2010 vs. 2030–2040.
Data Fusion Using Deep Learning
Case study…
Spatiotemporal data fusion
▪ Sentinel-2 is a European Earth observation satellite that acquires optical imagery at high spatial resolution (10m).
▪ Landsat 8 is an American Earth observation satellite. It is the eighth satellite in the Landsat program, which started in 1972. It acquires optical imagery at moderate spatial resolution (30m).

Mishra B. and Shahi T.B. (2020), Deep learning based spatiotemporal data fusion of Landsat 8 and Sentinel 2 Images, Preparation to submit: Geoscience and Remote Sensing Letters.
Case study…
NDVI
▪ Normalized Difference Vegetation Index (NDVI) is used to quantify vegetation greenness and is useful in understanding vegetation density and assessing changes in plant health. NDVI is calculated as a normalized ratio of the red (R) and near infrared (NIR) values: NDVI = (NIR − R) / (NIR + R).
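In its standard normalized-difference form, NDVI = (NIR − R) / (NIR + R), so values range from −1 to 1. A minimal per-pixel sketch follows; the function name `ndvi`, the zero-denominator handling and the reflectance values are assumptions made for illustration.

```python
def ndvi(nir, red):
    """Per-pixel NDVI = (NIR - R) / (NIR + R); 0.0 where NIR + R is zero."""
    out = []
    for nir_row, red_row in zip(nir, red):
        out.append([
            (n - r) / (n + r) if (n + r) != 0 else 0.0
            for n, r in zip(nir_row, red_row)
        ])
    return out


# Hypothetical reflectances: a vegetated pixel and a bare pixel.
nir_band = [[0.5, 0.2]]
red_band = [[0.1, 0.2]]
result = ndvi(nir_band, red_band)
```

The vegetated pixel (high NIR, low red) scores about 0.67, while the pixel with equal NIR and red reflectance scores 0.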

Case study…
Spatio-temporal data fusion
▪ Possible scenarios
Figure: Sentinel 2A/B images (10m spatial resolution) t1_h and t3_h, and Landsat 8 images (30m spatial resolution) t1_l, t2_l and t3_l; the high-resolution image at t2 is unknown (t2_h = ?).
Scenarios for the values at t1, t2 and t3:
c1 = {t1<t2<t3}
c2 = {t1<t2=t3}
c3
c4 = {t1<t2>t3}
c5 = {t1=t2=t3}
c6 = {t1=t2>t3}
c7 = {t1>t2<t3}
c8 = {t1>t2=t3}
c9 = {t1>t2>t3}
Case study…
Study Area and Data Used
Region/climatic condition                       Date of Landsat 8 (YYYYMMDD)   Date of Sentinel 2A (YYYYMMDD)
Set I: West Nepal – Subtropical climate         20170523                       20170523
                                                20170504                       20170508
                                                20170415                       20170423
Set II: Alberta – Canada – Temperate climate    20190905                       20190903
                                                20190921                       20190921
                                                20191007                       20191006

Practical Cases       Case I                                    Case II
Input / Output Set    t1_h = f(t1_h, t3_h, t1_l, t2_l, t3_l)    t1_h = f(t1_h, t3_h, t2_l)
Case study…
Process Flow
▪ CNN
Layers         Number of nodes/filters   Kernel Size   Activation
Convolution    25                        2x2           ReLu
MaxPooling     -                         2x2           -
Convolution    50                        2x2           ReLu
Dense          1                         -             Linear

▪ LSTM
Parameters                       Values
Number of Node in Input Layer    Number of input features
Number of Epoch                  100 with Early Stopping Criteria of 10 epoch delay
Batch Size                       1000
Hidden layer                     one LSTM layer with 30 units
Activation Function              tanh
Dropout layer                    1 with (0.2 dropout rate)
Output Layer                     1

▪ Random Forest
Best_parameters={'bootstrap': True, 'max_depth': 5,
'max_features': 'auto', 'min_samples_leaf': 1,
'min_samples_split': 2, 'n_estimators': 20}
Case study…
Results
▪ Overall, LSTM performs better than the others.
▪ Random forest performs worst in comparison; however, the overall performance of all methods is very good.
Result: Site I case I – Five input images
Model           Indicators   C1      C3      C7      C9      Overall
CNN             R2           0.873   0.867   0.891   0.951   0.904
CNN             RMSE         0.034   0.060   0.038   0.043   0.036
LSTM            R2           0.888   0.85    0.87    0.95    0.91
LSTM            RMSE         0.032   0.063   0.041   0.040   0.035
Random Forest   R2           0.869   0.859   0.88    0.925   0.876
Random Forest   RMSE         0.035   0.062   0.039   0.036   0.045

Figure: NDVI map of study site I (a) Sentinel 2 NDVI, (b) NDVI from Random Forest, (c) NDVI from DNN, (d) NDVI from SVM.
Case study…
Results
▪ Overall, LSTM performs better than the others.
▪ Random forest performs worst in comparison; however, the overall performance of all methods is very good.
Result: Site II case I – Five input images
Model           Indicators   C1      C3      C7      C9      Overall
CNN             R2           0.838   0.723   0.44    0.913   0.812
CNN             RMSE         0.047   0.079   0.077   0.050   0.066
LSTM            R2           0.854   0.755   0.34    0.85    0.804
LSTM            RMSE         0.045   0.075   0.083   0.065   0.067
Random Forest   R2           0.769   0.667   0.496   0.875   0.765
Random Forest   RMSE         0.057   0.088   0.073   0.06    0.073

Figure: NDVI map of study site I (a) Sentinel 2 NDVI, (b) NDVI from Random Forest, (c) NDVI from DNN, (d) NDVI from SVM.
Results
Result: Site I case II
Model           Indicators   C1      C3      C7      C9      Overall
CNN             R2           0.831   0.833   0.862   0.903   0.872
CNN             RMSE         0.039   0.067   0.043   0.061   0.042
LSTM            R2           0.85    0.832   0.882   0.95    0.902
LSTM            RMSE         0.036   0.068   0.040   0.042   0.037
Random Forest   R2           0.829   0.829   0.81    0.885   0.849
Random Forest   RMSE         0.035   0.069   0.049   0.036   0.045

Result: Site II case II
Model           Indicators   C1      C3      C7      C9      Overall
CNN             R2           0.779   0.634   0.435   0.809   0.729
CNN             RMSE         0.055   0.092   0.078   0.075   0.078
LSTM            R2           0.85    0.832   0.882   0.95    0.902
LSTM            RMSE         0.036   0.068   0.040   0.042   0.037
Random Forest   R2           0.767   0.667   0.481   0.871   0.762
Random Forest   RMSE         0.057   0.088   0.074   0.062   0.074
Skills and Technology
What skills & technologies are used in Spatial
Data Science?

▪ Python & R are the most commonly used programming languages in the community.

Source: https://carto.com/what-is-spatial-data-science/#why-is-spatial-data-science-important-in-business
Future Perspective

Challenges
▪ The major challenges include the availability of sufficient training samples, strong non-linearity, and low signal-to-noise ratios.
▪ Handling multi-source, multi-temporal, multi-resolution and multi-platform datasets is always a big challenge.
▪ Making use of multi-modal datasets can ease data availability, which helps to increase the quality and reliability of DL methods, but handling them is challenging.
Future direction
▪ The majority of the work was found in segmentation, clustering, object detection and scene detection, i.e. focused on features but not on the complete cycle of geospatial analysis.
▪ Digital mapping would be another promising application of DL in the future.
▪ Smart digitization.
▪ Extracting topological relationships, geotagging, and supporting explainable digitization.
▪ Visualization of automatically extracted features in different applications has not been fully explored with spatial datasets.
▪ Semi-supervised or unsupervised learning is required to overcome the dependency on label-based datasets.
▪ An explainable DL model is required for widespread acceptance and building trust.
Thanks!

[Bhogendra Mishra] [ESI]
