
Teachers: Gorica Bratic and Maria Antonia Brovelli

gorica.bratic@polimi.it; maria.brovelli@polimi.it

Assistants: Francesco Bosso, Thomas Martinoli, Juan Pablo Duque Ordoñez, Mathilde Puche and Mousa Sondoqah
GEOlab – Politecnico di Milano
1
Before we start…
• Make sure that you fulfill all the requirements for following the workshop:
  • QGIS Desktop 3.10.x with GRASS
  • scikit-learn library for Python 3 in QGIS
  • Data unzipped
  • Clip.tif raster from Satellite Data Analysis and Machine Learning Classification with QGIS – Part 1

Only if you did not download the data for the complete workshop, or if you do not have Clip.tif, download the data for the second part of the workshop from one of the links below:
• Google Drive: link1
• Zenodo: link2
• WeTransfer: link3 (valid until 13/05/2021)
• WeTransfer: link4 (valid until 13/05/2021)

2
Run QGIS with GRASS as Administrator

Right-click on the icon of QGIS Desktop with GRASS
Select Run as Administrator

3
Load vectors (1)

Load the vector by selecting:


1. Layer menu
2. Add Layer
3. Add Vector Layer

4
Load vectors (2)

1. Navigate to the folder with the data and select roi.shp
2. Add Layer
3. Switch to Raster tab

5
Load rasters

1. Navigate to the GHS folder and select all .tif files
2. Click on Add Layer
1. Navigate to the folder of
Clip.tif (e.g., output of the
1st part of the workshop)
2. Click on Add Layer
3. Close the Data Source
Manager

6
Install QGIS plugins

To install a plugin:
1. Go to the Plugins menu
2. Select Manage and Install Plugins
3. Go to the tab All
4. Type the name of the plugin
5. Select the plugin in the list
6. Click on Install Plugin

Please follow the procedure above to install the following plugins by entering their names in the search field (step 4):
• dzetsaka: Classification tool
• QuickMapServices

7
Introduction to supervised classification

8
What is supervised classification?

Supervised classification uses a training set and a classification algorithm to predict the class.
The training set is a set of areas or points in the region of interest for which the class is known (from field survey, photo-interpretation, etc.).
The classification algorithm uses the training set as input to "learn" to recognize similar values in the satellite imagery.
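As an illustration of "learning" from a training set, here is a minimal K-Nearest Neighbors sketch in plain Python. The two-band pixel values are hypothetical, and this is not how dzetsaka implements the classifier; it only shows the idea that a pixel receives the class of the training samples it most resembles.

```python
from collections import Counter
from math import dist

# Hypothetical training set: (band values) -> class label.
# 1 = non-built-up, 2 = built-up (the coding used later in this workshop).
training = [
    ((0.10, 0.60), 1),
    ((0.12, 0.55), 1),
    ((0.15, 0.58), 1),
    ((0.40, 0.20), 2),
    ((0.45, 0.25), 2),
    ((0.42, 0.18), 2),
]


def knn_predict(pixel, training, k=3):
    """Return the majority class among the k training samples closest to pixel."""
    nearest = sorted(training, key=lambda sample: dist(sample[0], pixel))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Pixels whose class is unknown (hypothetical values).
unknown_pixels = [(0.11, 0.57), (0.43, 0.22)]
predicted = [knn_predict(p, training) for p in unknown_pixels]
print(predicted)  # [1, 2]
```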

Simplified schema of classification: Classification of fruits


https://www.tutorialandexample.com/wp-content/uploads/2020/11/Supervised-Machine-Learning-1.png
9
QGIS plugins – dzetsaka: Classification tool

For the classification you will use the dzetsaka: Classification tool plugin.
It requires a training set composed of polygons with numerical values denoting classes.
We will use 3 classification algorithms:
• Gaussian Mixture Model - GMM
• Random Forest - RF
• K-Nearest Neighbors - KNN

10
Training set creation

11
Training set

The training set is a set of land cover class samples for each class expected in the classification output.
There are no universal guidelines for training set extraction; the approach varies depending on:
• Classification algorithm
• Number of classes
• Desired accuracy
• Budget

Sampling theory is often the basis for estimating a sample size that adequately characterizes the spectral signatures.
In other cases the recommendation is to have a minimum of 10p to 30p samples per class for training,
where p is the number of bands used.
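The rule of thumb above is simple arithmetic; a minimal sketch follows. The function name and the 4-band example are assumptions made for illustration, not values taken from the workshop data.

```python
def training_samples_per_class(n_bands, low=10, high=30):
    """Return the (lower, upper) bound of the 10p to 30p rule for p bands."""
    if n_bands < 1:
        raise ValueError("n_bands must be at least 1")
    return low * n_bands, high * n_bands


# Hypothetical example: an image with 4 bands.
print(training_samples_per_class(4))  # (40, 120)
```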
Whatever the approach, the training samples must be correct, so we need a reliable source of
reference information:
• in situ data collection
• photo-interpretation of very high-resolution satellite imagery

12
Supervised classification - Create training set (1)

To create a training set you need to create a new polygon shapefile with an attribute field that assigns each polygon to one of the classes:
✓ 1 = non-built-up
✓ 2 = built-up

To create a new shapefile:
1. Go to Layer menu
2. Select Create Layer
3. Select New Shapefile Layer

NOTE: dzetsaka does not accept the value 0 as a class; therefore the (integer) values in the training set must be larger than 0.
13
Supervised classification - Create training set (2)

Define the properties of the new shapefile:
1. Define the File name (roi.shp) and
output folder
2. Select the Geometry type
as Polygon
3. Select CRS to be EPSG:32637 –
WGS 84 / UTM zone 37N (the same
as the CRS of the Sentinel-2 image)
4. Click on OK to create the new
shapefile layer

14
Supervised classification - Create training set – predefine possible values (1)

It is possible to define in advance which values the features can take, i.e. to create the attributes form. In this example, the predefined values are:
✓ 1 = non-built-up
✓ 2 = built-up
This is done from the layer properties.
To open the layer properties:
1. Right-click on the training vector
layer (e.g., in our case, roi.shp)
2. Select Properties

15
Supervised classification - Create training set – predefine possible values (2)

1. Go to the Attributes Form tab


2. Select id
3. Select Value Map for Widget Type
4. Insert Value and Description by
double-clicking on a cell to activate
the editing, then type the cell
content. In our case, we will put:
Value Description
1 Non built-up
2 Built-up

5. Click on OK to conclude the form

16
Supervised classification – Activate base map – QuickMapServices (1)

To create the training samples, we need to load a base map that helps us determine whether a sample belongs to the non built-up or to the built-up class.

In this example we will use the Bing Satellite base map from the QMS plugin as a reference.

To activate the Bing base map:
1. Go to Web menu
2. Select QuickMapServices
3. Select Bing
4. Select Bing Satellite

17
Supervised classification – Activate base map – QuickMapServices (2)

If you do not see Bing in the list of imagery, you need to expand the list of available base maps by getting the contributed pack.
To do so:
1. Go to Web menu
2. Select QuickMapServices
3. Select Settings
4. Go to More Services tab
5. Click on Get contributed pack
6. Save changes
Then repeat the procedure of the previous slide to load Bing Satellite into QGIS.

18
Supervised classification - Create training set – Start editing training vector

Now, based on the Bing Satellite base map, we can start digitizing the polygons of the training set. To do so, we first need to enter editing mode:
1. Select the training vector (e.g., roi.shp) in the Layers panel
2. Enter editing mode by clicking on the Toggle Editing tool
3. Select the Add Polygon Feature tool

19
Supervised classification - Create training set – Add Non built-up class features

To create a polygon for the non built-up class, find an area without buildings, then:
1. In the Map panel left-click to create the
vertices of the polygons. Right-click on
the initial vertex to finish the polygon
drawing. The polygon area will be
displayed in semi-transparent red color
2. Select Non built-up from predefined
values
3. Click on OK to assign the value to the
polygon feature

20
Supervised classification - Create training set – Add Built-up class features

To create a polygon for the built-up class, find an area with buildings, then:
1. In the Map panel left-click to create
the vertices of the polygons. Right-click
on the initial vertex to finish the
polygon drawing. The polygon area will
be displayed in semi-transparent red
color
2. Select Built-up from predefined values
3. Click on OK to assign the value to the
polygon feature

21
Supervised classification - Create training set – conclude editing

When enough samples have been added, conclude editing. To do so:
1. Click on the Save Layer Edits
button to save all the polygon
features added
2. Stop editing by clicking on
the Toggle Editing button

22
Supervised classification

23
Supervised classification – Activate dzetsaka classification tool

To activate the dzetsaka classification tool:
1. Go to the Plugins menu
2. Select dzetsaka
3. Select classification dock

24
Supervised classification - Classification with dzetsaka (1)

The first step in the classification is to specify the input data:

1. Specify the name of the image to be classified (e.g., Clip.tif)
2. Specify the name of the
training set vector layer (e.g.,
roi.shp)
3. Specify the attribute of the
training set layer that contains
information regarding the
classes (must be numerical,
e.g., id)

25
Supervised classification - Classification with dzetsaka (2)

Moreover, we must specify the outputs and the classification settings.

1. Specify the name of the output
2. Open dzetsaka settings
3. Choose the Classifier (classification algorithm) (e.g., Random Forest)
4. In the Optional parameters, tick Save matrix and specify the path where the error matrix will be saved
5. Set Split to 80% so that 80% of the samples of the roi.shp file are used for training the algorithm and the rest for cross-validation
6. Press Perform the classification
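The idea behind the Split setting can be sketched in a few lines of Python. This is only an illustration of an 80/20 hold-out split; how dzetsaka draws the split internally (per pixel or per polygon, stratified or not) is not stated in these slides, so the function below is an assumption.

```python
import random


def split_samples(samples, train_fraction=0.8, seed=0):
    """Shuffle the samples and split them into training and validation subsets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed for a reproducible split
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


samples = list(range(100))  # stand-ins for 100 labelled training samples
train, validation = split_samples(samples)
print(len(train), len(validation))  # 80 20
```

The validation subset is never shown to the classifier during training, which is what makes the resulting error matrix informative.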

26
Supervised classification - Classification with dzetsaka (3)

Repeat the procedure for the Gaussian Mixture Model and K-Nearest Neighbors by changing the Classifier (3), the output name (1) and the error matrix name (4).

27
Further information about classification algorithms included in dzetsaka

• Dzetsaka GitHub repository:
https://github.com/nkarasiak/dzetsaka

• Scikit-learn - Random Forest:
https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier

• Scikit-learn - K-Nearest Neighbors:
https://scikit-learn.org/stable/modules/neighbors.html#classification

• Gaussian Mixture Model:
https://doi.org/10.1109/JSTARS.2015.2441771

28
Validation

29
Assessing the success of the classification - validation

Using a portion of the training data for validation is just one approach to validation (internal validation – cross-validation).
Often validation is conducted independently of the training data set.
Validation must ensure:
• Appropriate sample size - enough samples with minimum cost
  • Statistical – calculate the number of samples based on the binomial or normal approximation to the binomial distribution (Cochran, 1977)*
  • Empirical – the number of samples is driven by the available budget (not suggested)
• Sample allocation:
  • Random – samples are distributed randomly in the area of interest
  • Stratified random sampling – split the area of interest into strata and then select samples in each stratum
    • Equal sample size per stratum
    • Number of samples per stratum adjusted according to the stratum size
  • Other
• Confident source of reference information:
  • In situ data collection
  • Photo-interpretation of higher resolution imagery

In QGIS there is the AcATaMa plugin, designed for the accuracy estimation of land cover maps.
It supports:
• different steps in the creation of training samples (e.g., sample size definition and sample allocation)
• a user-friendly interface for photo-interpretation of samples and
• the calculation of multiple accuracy indexes

*https://hwbdocuments.env.nm.gov/Los%20Alamos%20National%20Labs/General/14447.pdf
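A minimal sketch of the statistical approach and of the two stratified allocation options, assuming the common form of the normal-approximation formula n = z² · p · (1 − p) / d². The default values (expected accuracy 0.5, margin 0.05, z = 1.96 for 95% confidence) and the stratum areas are illustrative assumptions, not values prescribed by the workshop.

```python
from math import ceil


def cochran_sample_size(expected_accuracy=0.5, margin=0.05, z=1.96):
    """Sample size from the normal approximation to the binomial distribution."""
    p = expected_accuracy
    return ceil(z ** 2 * p * (1 - p) / margin ** 2)


def allocate(n_samples, strata_areas, proportional=True):
    """Distribute n_samples over strata, equally or proportionally to their area."""
    if not proportional:
        return {name: n_samples // len(strata_areas) for name in strata_areas}
    total = sum(strata_areas.values())
    return {name: round(n_samples * area / total)
            for name, area in strata_areas.items()}


n = cochran_sample_size()
print(n)  # 385

# Hypothetical stratum areas (any consistent unit).
areas = {"non built-up": 800.0, "built-up": 200.0}
print(allocate(n, areas))                      # proportional to stratum size
print(allocate(n, areas, proportional=False))  # equal size per stratum
```

Note that rounding in the proportional case may leave the allocated total one or two samples away from n for other inputs; a real sampling design would also set a minimum number of samples for small strata.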
30
Error matrix and accuracy indexes

Error matrix produced by dzetsaka (classification with Random Forest; lines = reference, columns = prediction):

19309     15
   26   3821

Error matrix interpretation

                              Prediction/classification
Reference (ground truth)   1 Non built-up   2 Built-up   Sum reference   PA
1 Non built-up                      19309           15           19324   99.9%
2 Built-up                             26         3821            3847   99.3%
Sum prediction                      19335         3836           23171
UA                                  99.9%        99.6%                   99.8% (OA)

• Producer's accuracy (PA) of a class is the probability that the class present on the ground is also captured by the classification in the thematic raster.

\[ PA_{\text{Non built-up}} = \frac{19309}{19324} \times 100 = 99.9\% \]
\[ PA_{\text{Built-up}} = \frac{3821}{3847} \times 100 = 99.3\% \]

• User's accuracy (UA) of a class shows how often a user of the classified map can expect to find the class on the ground.

\[ UA_{\text{Non built-up}} = \frac{19309}{19335} \times 100 = 99.9\% \]
\[ UA_{\text{Built-up}} = \frac{3821}{3836} \times 100 = 99.6\% \]

• Overall accuracy (OA) is the proportion of correctly classified pixels out of the total number of pixels.

\[ OA = \frac{3821 + 19309}{23171} \times 100 = 99.8\% \]
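The indexes above can be recomputed from the four counts of the error matrix. The sketch below assumes the dzetsaka layout stated on the slide (lines = reference, columns = prediction); the function name is illustrative.

```python
# Error matrix from the slide: lines = reference, columns = prediction.
matrix = [
    [19309, 15],   # reference: 1 Non built-up
    [26, 3821],    # reference: 2 Built-up
]


def accuracy_indexes(matrix):
    """Return (PA per class, UA per class, OA) as fractions between 0 and 1."""
    n = len(matrix)
    row_sums = [sum(row) for row in matrix]                          # sum reference
    col_sums = [sum(row[j] for row in matrix) for j in range(n)]     # sum prediction
    total = sum(row_sums)
    pa = [matrix[i][i] / row_sums[i] for i in range(n)]
    ua = [matrix[i][i] / col_sums[i] for i in range(n)]
    oa = sum(matrix[i][i] for i in range(n)) / total
    return pa, ua, oa


pa, ua, oa = accuracy_indexes(matrix)
print([round(100 * v, 1) for v in pa])  # [99.9, 99.3]
print([round(100 * v, 1) for v in ua])  # [99.9, 99.6]
print(round(100 * oa, 1))               # 99.8
```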

31
Inter-comparison

32
Assessing the success of the classification – inter-comparison (1)

The classification success can be estimated by comparison with other maps with the
same theme by computing the error matrix and the accuracy indexes like Overall
accuracy, PA and UA. Instead of PA and UA, the Commission error (1-UA) and Omission
error (1-PA) can also be used. Moreover, another index, the Kappa index, is often
computed.

For this exercise, we will use GHS-BUILT (Sentinel-1), which contains an information layer on built-up presence derived from Sentinel-1 image collections (2016). This dataset is produced by the Joint Research Centre of the European Commission. The CRS of this dataset is WGS 84 / Pseudo-Mercator (EPSG:3857).

33
Assessing the success of the classification – inter-comparison (2)

In order to be compared, the two datasets must have:
✓ the same CRS
✓ the same resolution
✓ the same extent
✓ the same values for the same classes

Parameters   Classification output      GHS-BUILT S1
CRS          EPSG:32637                 EPSG:3857
Resolution   10 m                       20 m
Extent       1 tile (X: 1622 Y: 1074)   4 tiles (X: 1367 Y: 91)
Classes      0 - Non-built-up           NULL - Non-built-up
             1 - Built-up               1 - Built-up

34
Graphical modeler for multiple step processes

As you can see, many parameters of the two datasets need to be homogenized, and this can be done more efficiently with the help of the graphical modeler.
The graphical modeler allows you to create complex models using a simple and easy-to-use interface. It is particularly useful for repetitive processing.

To activate the Graphical modeler:
1. Go to the Processing menu
2. Select Graphical Modeler

35
Graphical modeler - workflow

We can create a model to harmonise GHS BUILT S1 with the classified raster, as shown in the figure.
The classified raster requires only one preprocessing operation before the comparison. For GHS BUILT S1, by contrast, we will apply:
• Merge, to merge the 4 tiles
• Warp (reproject), to reproject it
• Clip raster by extent, to clip it to the extent of the classified raster
• r.null, to replace null values with 0, since in GHS the non built-up values are NULL by default
After all the preprocessing, r.kappa can be executed to compute the error matrix and the accuracy indexes.
Please note that r.kappa automatically adjusts the resolution of the classified raster to the resolution of the reference raster (e.g. GHS BUILT S1); therefore this operation was not explicitly included in the model.
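To make the value-level steps of this workflow concrete, here is a toy sketch in plain Python on two tiny grids. It assumes the grids already share CRS, extent and cell size (which Merge, Warp and Clip take care of in the model); the pixel values are invented, and the code only mimics what the Raster Calculator, r.null and the cross-tabulation inside r.kappa do conceptually.

```python
# Toy classification output with classes 1 (non built-up) and 2 (built-up).
classified = [
    [1, 1, 2],
    [1, 2, 2],
]
# Toy reference grid in the GHS style: None stands for NULL, 1 for built-up.
reference = [
    [None, None, 1],
    [None, 1, None],
]

# Raster Calculator step: subtract 1 so the classes become 0 and 1.
recoded = [[value - 1 for value in row] for row in classified]

# r.null step: replace NULL with 0 in the reference grid.
filled = [[0 if value is None else value for value in row] for row in reference]

# Cross-tabulation: matrix[predicted][reference] counts pixel pairs.
matrix = [[0, 0], [0, 0]]
for pred_row, ref_row in zip(recoded, filled):
    for pred, ref in zip(pred_row, ref_row):
        matrix[pred][ref] += 1

print(matrix)  # [[3, 0], [1, 2]]
```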

36
Graphical modeler – Define input data (1)

Let's start by saving the model:

1. Specify the name of the model (e.g., Accuracy assessment)
2. Click on the save icon and save
the file of the model by
defining its destination path
and its name (e.g., Accuracy
assessment)

37
Graphical modeler – Define input data (2)
Input parameters are the first to be
defined in the model. When defining
input parameters, we should keep in
mind the expected data type that will be
used in the model.
In our case the first input layer
is the classification output raster we
produced before.
1. Go to Inputs tab
2. Double click on Raster Layer
3. Insert the name you want to assign
to this Raster Layer in the
Description field (e.g., Classified
Raster)
4. Click on OK

At this point only the type of input data is defined; the data to be processed will be specified when running the model.
38
Graphical modeler – Define input data (3)

The second input is the group of raster tiles with which we are going to compare the classification outputs we produced before.

1. Go to Inputs tab
2. Double click on Multiple Input
3. Insert the input parameter name in the Parameter name field (e.g., Raster tiles)
4. Select Raster Data type
5. Click on OK

At this point only the type of input data is defined; the data to be processed will be specified when running the model.

39
Graphical modeler – Change values of classification output

The first processing step consists of changing the values of the classification output from 1 and 2 to 0 and 1, respectively.
It is a simple algebraic operation: we use the Raster Calculator to subtract 1 from the classified raster.

1. Go to Algorithms tab
2. Search for Raster calculator in the
search bar
3. Double click on Raster Calculator in
Raster analysis
4. Define Expression (e.g. "Classified Raster@1" - 1)
5. Open Reference layer Multiple selection
window
6. Select Classified Raster so that output
CRS, extent, and cell size are adjusted
according to this layer
7. Click on OK

40
Graphical modeler – Merge multiple raster into one

The next processing operations adapt GHS BUILT S1 to the classification output. The first operation is to merge the 4 tiles of GHS BUILT S1 into a single tile.

1. Go to Algorithms tab
2. Search for Merge in the search bar
3. Double click on Merge in GDAL → Raster miscellaneous
4. To select the Input layers, open the Multiple selection window
5. Select input layers (e.g., Raster tiles)
6. Click on OK to confirm selection
7. Select Output data type to be integer
(e.g., Int32)
8. Click on OK

41
Graphical modeler – Reproject

The next step is to reproject GHS-BUILT S1 to the CRS of the classification output.

1. Go to Algorithms tab
2. Search for Warp (reproject) in the search bar
3. Select Warp (reproject) in GDAL → Raster projections
4. Select the Input layer from the drop-down menu (e.g., outcome of the merge operation denoted as 'Merged' from algorithm 'Merge')*
5. Select Source CRS (e.g., EPSG:3857)
6. Select Target CRS (e.g., EPSG:32637)
7. Click on OK

* Note that with QGIS 3.16 you also have to click on the 123 button of Input layer and select Algorithm Output.

42
Graphical modeler – Reproject - QGIS 3.16

The next step is to reproject GHS-BUILT S1 to the CRS of the classification output.

1. Go to Algorithms tab
2. Search for Warp (reproject) in the search bar
3. Select Warp (reproject) in GDAL → Raster projections
4. Select Algorithm Output as the source of input data*
5. Select the Input layer from the drop-down menu (e.g., outcome of the merge operation denoted as 'Merged' from algorithm 'Merge')
6. Select Source CRS (e.g., EPSG:3857)
7. Select Target CRS (e.g., EPSG:32637)
8. Click on OK

* Note that with QGIS 3.16 step 4 is additional with respect to QGIS 3.10.

43
Graphical modeler – Clip raster by extent
Now we clip GHS-BUILT S1 to the extent of the
classification output

1. Go to Algorithms tab
2. Search for Clip raster by extent in the
search bar
3. Select Clip raster by extent in
GDAL→Raster extraction
4. Select the Input layer from the drop-down menu (e.g., the outcome of reprojection denoted as 'Reprojected' from algorithm 'Warp (reproject)')
5. Select in Clipping extent the layer based on which the clipping extent will be calculated (e.g., Extent of Classified Raster)
6. Click on OK

NOTE: With QGIS 3.16 the Input layer must be searched among the Algorithm output (see previous slide) and the Clipping extent among the Model input.

44
Graphical modeler – Fill no data values

Finally, we need to replace the NULL values of GHS BUILT S1 with 0 to make it consistent with the classes of the classification output.

1. Go to Algorithms tab
2. Search for r.null in the search bar
3. Select r.null in GRASS → Raster (r.*)
4. Select Name of raster map for which to edit null values from the drop-down menu (e.g., outcome of clipping denoted as 'Clipped (extent)' from algorithm 'Clip raster by extent')
5. Insert The value to replace the null value (e.g., 0.0)
6. Click on OK

45
Graphical modeler – Compute accuracy indexes
Now the data are ready to be compared by means of the error matrix and accuracy indexes. For that, we
use the r.kappa algorithm.
1. Go to Algorithms tab
2. Search for r.kappa in the search bar
3. Select r.kappa in GRASS→Raster(r.*)
4. Select Raster layer containing classification
result from drop-down menu (e.g.,
preprocessed classification output denoted
as ‘Output’ from algorithm ‘Raster
calculator’)
5. Select Raster layer containing reference
classes from drop-down menu (e.g.,
preprocessed comparison dataset output
denoted as ‘NullRaster’ from algorithm
‘r.null’)
6. As this is the final goal of the model, specify a name for Error matrix and Kappa so that the outcome
of this operation becomes an output parameter (e.g., Error_matrix)
7. Click on OK
46
Graphical modeler – Save model and run it

The model is ready now!


1. Click on Save icon to save last
changes to the model
2. Click on Run button to Run the
model

47
Graphical modeler – Load and run existing model

1. Click on the Open icon to navigate your local file system and open an existing graphical model. QGIS graphical models are saved in the .model3 file format
2. Click on Run button to Run the
model

48
Graphical modeler – Run the model

The first step when we want to run the model is to specify the input and output parameters of the model.
1. From drop-down menu select
a raster for Classified Raster
parameter (e.g., RF)
2. Open Multiple Selection for
Raster Tiles
3. Select Raster Tiles (e.g., the 4
GHS_BU_S1_x)
4. Click on OK to confirm
selection
5. Define the path and the
filename for
the Error_matrix (e.g.,
EM_RF.txt)
6. Click on Run button to Run the
model

49
Graphical modeler for multiple step processes

                                  Reference
Prediction/classification   1 Non built-up   2 Built-up   Sum prediction   UA
0 Non built-up                      484810        53069           537879   90.13%
1 Built-up                           36818        90631           127449   71.11%
Sum reference                       521628       143700           665328
PA                                  92.94%       63.07%                    86.49% (OA)

Result of r.kappa:
• % Commission (error)=1-UA
• % Omission (error)=1-PA
• % Observed correct =Overall accuracy
• Kappa
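The quantities reported by r.kappa can be reproduced from the error matrix above. The sketch below uses the standard definition of Cohen's Kappa, kappa = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement (overall accuracy) and p_e the agreement expected by chance; the function name is illustrative and the layout (rows = prediction, columns = reference) follows the table on this slide.

```python
# Error matrix from the slide: rows = prediction, columns = reference.
matrix = [
    [484810, 53069],   # predicted non built-up
    [36818, 90631],    # predicted built-up
]


def kappa_report(matrix):
    """Return commission and omission errors per class, observed agreement and Kappa."""
    n = len(matrix)
    row_sums = [sum(row) for row in matrix]                          # sum prediction
    col_sums = [sum(row[j] for row in matrix) for j in range(n)]     # sum reference
    total = sum(row_sums)
    observed = sum(matrix[i][i] for i in range(n)) / total
    expected = sum(row_sums[i] * col_sums[i] for i in range(n)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    commission = [1 - matrix[i][i] / row_sums[i] for i in range(n)]  # 1 - UA
    omission = [1 - matrix[i][i] / col_sums[i] for i in range(n)]    # 1 - PA
    return commission, omission, observed, kappa


commission, omission, observed, kappa = kappa_report(matrix)
print([round(100 * v, 2) for v in commission])  # [9.87, 28.89]
print([round(100 * v, 2) for v in omission])    # [7.06, 36.93]
print(round(100 * observed, 2))                 # 86.49
print(round(kappa, 2))                          # 0.58
```

A Kappa noticeably lower than the overall accuracy, as here, reflects the fact that a large share of the agreement on the dominant non built-up class could also occur by chance.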

50
Maria Antonia Brovelli and Gorica Bratic
maria.brovelli@polimi.it; gorica.bratic@polimi.it

Assistants: Francesco Bosso, Thomas Martinoli, Juan Pablo Duque Ordoñez, Mathilde Puche and Mousa Sondoqah

GEOlab – Politecnico di Milano
51


This work is licensed under the following license:

Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)

You are free to:
• Share – copy and redistribute the material in any medium or format
• Adapt – remix, transform, and build upon the material
• The licensor cannot revoke these freedoms as long as you follow the license terms.

Under the following terms:
• Attribution – You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
• NonCommercial – You may not use the material for commercial purposes.
• ShareAlike – If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.

52
