Teachers: Gorica Bratic and Maria Antonia Brovelli
gorica.bratic@polimi.it; maria.brovelli@polimi.it
Assistants: Francesco Bosso, Thomas Martinoli, Juan Pablo Duque Ordoñez, Mathilde Puche and Mousa Sondoqah
GEOlab – Politecnico di Milano 1
Before we start…
• Make sure that you fulfill all the requirements for following the workshop:
  • QGIS Desktop 3.10.x with GRASS
  • scikit-learn library for Python 3 in QGIS
  • Data unzipped
  • Clip.tif raster from Satellite Data Analysis and Machine Learning Classification with QGIS – Part 1
• Only if you did not download the data for the complete workshop, or if you do not have Clip.tif, download the data for the second part of the workshop from one of the links below:
  • Google Drive: link1
  • Zenodo: link2
  • WeTransfer: link3 (valid until 13/05/2021)
  • WeTransfer: link4 (valid until 13/05/2021)
2
Run QGIS with GRASS as Administrator
Right-click on the icon of QGIS Desktop with GRASS
Select Run as Administrator
3
Load vectors (1)
Load the vector by selecting:
1. Layer menu
2. Add Layer
3. Add Vector Layer
4
Load vectors (2)
1. Navigate to the folder with the data and select roi.shp
2. Click on Add Layer
3. Switch to the Raster tab
5
Load rasters
1. Navigate to the GHS folder and select all .tif files
2. Click on Add Layer
3. Navigate to the folder of Clip.tif (e.g., the output of the 1st part of the workshop)
4. Click on Add Layer
5. Close the Data Source Manager
6
Install QGIS plugins
To install a plugin:
1. Go to the Plugins menu
2. Select Manage and Install Plugins
3. Go to the All tab
4. Type the name of the plugin
5. Select the plugin in the list
6. Click on Install Plugin
Follow the procedure above to install the following plugins, typing their names in the search field (step 4):
• dzetsaka: Classification tool
• QuickMapServices
7
Introduction to supervised classification
8
What is supervised classification?
Supervised classification uses a training set and a classification algorithm to predict the class of each pixel.
The training set is a set of areas or points in the region of interest for which the class is known (from field survey, photo-interpretation, etc.).
The classification algorithm uses the training set as input to "learn" to recognize similar values in the satellite imagery.
Simplified schema of classification: classification of fruits
https://www.tutorialandexample.com/wp-content/uploads/2020/11/Supervised-Machine-Learning-1.png
9
QGIS plugins – dzetsaka: Classification tool
For the classification you will use the dzetsaka: Classification tool plugin.
It requires a training set composed of polygons with numerical values denoting the classes.
We will use 3 classification algorithms:
• Gaussian Mixture Model - GMM
• Random Forest - RF
• K-Nearest Neighbors - KNN
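To give an intuition of how a classifier "learns" from a training set, here is a minimal pure-Python sketch of the K-Nearest Neighbors idea. It is not the dzetsaka or scikit-learn implementation; the function name `knn_predict` and the two-band sample values are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import dist


def knn_predict(training, pixel, k=3):
    """Return the majority class among the k training samples closest to pixel."""
    # training is a list of (band_values, class_id) pairs
    nearest = sorted(training, key=lambda sample: dist(sample[0], pixel))[:k]
    votes = Counter(class_id for _, class_id in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-band values; 1 = non-built-up, 2 = built-up
training = [
    ((0.05, 0.40), 1), ((0.06, 0.45), 1), ((0.04, 0.38), 1),
    ((0.20, 0.22), 2), ((0.25, 0.24), 2), ((0.22, 0.20), 2),
]

label = knn_predict(training, (0.21, 0.23))
print(label)
```

Each unknown pixel receives the class that is most frequent among its k most similar training samples, which is why the training polygons must be labelled correctly.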
10
Training set creation
11
Training set
The training set is a set of land cover samples for each class expected in the classification output.
There are no universal guidelines for training set extraction; the approach varies depending on:
• Classification algorithm
• Number of classes
• Desired accuracy
• Budget
Sampling theory is often the basis for estimating a sample size that adequately characterizes the spectral signatures.
In other cases the recommendation is to have a minimum of 10p to 30p training samples per class, where p is the number of bands used.
Whatever the approach, the training samples must be correct, so we need a reliable source of reference information:
• in situ data collection
• photo-interpretation of very high-resolution satellite imagery
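The 10p to 30p rule of thumb can be written as a short calculation. The sketch below is only an illustration; the function name is hypothetical and the band count of 4 is an assumption, since the number of bands of Clip.tif is not stated here.

```python
def training_samples_per_class(n_bands, low=10, high=30):
    """Rule-of-thumb range: between 10p and 30p samples per class for p bands."""
    if n_bands < 1:
        raise ValueError("n_bands must be at least 1")
    return low * n_bands, high * n_bands


# Assumption: an image with 4 bands
minimum, maximum = training_samples_per_class(4)
print(f"{minimum} to {maximum} samples per class")
```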
12
Supervised classification - Create training set (1)
To create a training set you need to create a new polygon shapefile containing an attribute field that assigns each polygon to a class:
✓ 1 = non-built-up
✓ 2 = built-up
To create a new shapefile:
1. Go to Layer menu
2. Select Create Layer
3. Select New Shapefile Layer
NOTE: dzetsaka does not accept the value 0 as a class, so the (integer) class values in the training set must be larger than 0.
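A quick way to check this constraint before running the classification is sketched below. The function name and the list of values are hypothetical; in practice the values would come from the class attribute of the training polygons.

```python
def invalid_class_values(values):
    """Return the values that break the rule: non-integers or integers <= 0."""
    return [
        v for v in values
        if isinstance(v, bool) or not isinstance(v, int) or v <= 0
    ]


# Hypothetical attribute values read from the class field of the training set
ids = [1, 2, 2, 0, 1]
bad = invalid_class_values(ids)
print(bad)
```

An empty result means every polygon carries a usable class value.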
13
Supervised classification - Create training set (2)
Define the properties of the new shapefile:
1. Define the File name (roi.shp) and output folder
2. Select Polygon as the Geometry type
3. Select the CRS EPSG:32637 – WGS 84 / UTM zone 37N (the same as the CRS of the Sentinel-2 image)
4. Click on OK to create the new shapefile layer
14
Supervised classification - Create training set – predefine possible values (1)
It is possible to define in advance which values the features can take, i.e. to create an attribute form. In this example, the predefined values are:
✓ 1 = non-built-up
✓ 2 = built-up
This is done from the layer properties.
To open the layer properties:
1. Right-click on the training vector layer (in our case, roi.shp)
2. Select Properties
15
Supervised classification - Create training set – predefine possible values (2)
1. Go to the Attributes Form tab
2. Select id
3. Select Value Map for Widget Type
4. Insert Value and Description by double-clicking on a cell to activate editing, then typing the cell content. In our case, we will enter:
   Value   Description
   1       Non built-up
   2       Built-up
5. Click on OK to complete the form
16
Supervised classification – Activate base map – QuickMapServices (1)
To create the training samples, we need to load a base map that helps us determine whether a sample belongs to the non built-up or to the built-up class.
In this example we will use the Bing Satellite base map from the QMS plugin as a reference.
To activate the Bing base map:
1. Go to Web menu
2. Select QuickMapServices
3. Select Bing
4. Select Bing Satellite
Supervised classification – Activate base map – QuickMapServices (2)
If you do not see Bing in the list of imagery, you need to expand the list of available base maps by getting the contributed pack.
To do so:
1. Go to Web menu
2. Select QuickMapServices
3. Select Settings
4. Go to the More Services tab
5. Click on Get contributed pack
6. Save changes
Repeat the procedure of the previous slide to load Bing Satellite into QGIS.
18
Supervised classification - Create training set – Start editing training vector
Now, based on the Bing Satellite base map, we can start digitizing the polygons of the training set. To do so, first enter editing mode:
1. Select the training vector (e.g., roi.shp) in the Layers panel
2. Enter editing mode by clicking on the Toggle Editing tool
3. Select the Add Polygon Feature tool
19
Supervised classification - Create training set – Add Non built-up class features
To create a polygon for the non built-up class, find an area without buildings, then:
1. In the Map panel, left-click to create the vertices of the polygon. Right-click on the initial vertex to finish drawing the polygon. The polygon area will be displayed in semi-transparent red
2. Select Non built-up from the predefined values
3. Click on OK to assign the value to the polygon feature
20
Supervised classification - Create training set – Add Built-up class features
To create a polygon for the built-up class, find an area with buildings, then:
1. In the Map panel, left-click to create the vertices of the polygon. Right-click on the initial vertex to finish drawing the polygon. The polygon area will be displayed in semi-transparent red
2. Select Built-up from the predefined values
3. Click on OK to assign the value to the polygon feature
21
Supervised classification - Create training set – conclude editing
When enough samples have been added, conclude editing:
1. Click on the Save Layer Edits button to save all the polygon features added
2. Stop editing by clicking on the Toggle Editing button
22
Supervised classification
23
Supervised classification – Activate dzetsaka classification tool
To activate the dzetsaka classification tool:
1. Go to the Plugins menu
2. Select dzetsaka
3. Select classification dock
24
Supervised classification - Classification with dzetsaka (1)
The first step in the classification is to specify the input data:
1. Specify the name of the image to be classified (e.g., Clip.tif)
2. Specify the name of the training set vector layer (e.g., roi.shp)
3. Specify the attribute of the training set layer that contains the class information (must be numerical, e.g., id)
25
Next, we must specify the outputs and the classification settings:
1. Specify the name of the output
2. Open dzetsaka settings
3. Choose the Classifier (classification algorithm) (e.g., Random Forest)
4. In the Optional parameters, tick Save matrix and specify the path where the error matrix will be saved
5. Set Split to 80 % so that 80% of the samples of the roi.shp file are used for training the algorithm and the rest for cross-validation
6. Press Perform the classification
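The effect of the Split setting can be illustrated with a small sketch. This is not how dzetsaka implements the split internally (that is not described here); the function name, the fixed seed and the use of plain integers as stand-ins for samples are all assumptions made for the example.

```python
import random


def split_samples(samples, train_fraction=0.8, seed=0):
    """Shuffle the samples and split them into training and validation parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# 100 hypothetical sample identifiers
samples = list(range(100))
train, validation = split_samples(samples)
print(len(train), len(validation))
```

The validation part is never shown to the classifier during training, so the error matrix computed on it is not biased by samples the algorithm has already seen.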
26
Repeat the procedure for Gaussian Mixture Model and K-Nearest Neighbors by changing the Classifier (3), the output name (1) and the error matrix name (4).
27
Further information about classification algorithms included in dzetsaka
• dzetsaka GitHub repository:
  https://github.com/nkarasiak/dzetsaka
• Scikit-learn - Random Forest:
  https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier
• Scikit-learn - K-Nearest Neighbors:
  https://scikit-learn.org/stable/modules/neighbors.html#classification
• Gaussian Mixture Model:
  https://doi.org/10.1109/JSTARS.2015.2441771
28
Validation
29
Assessing the success of the classification - validation
Using a portion of the training data for validation is just one approach (internal validation, cross-validation).
Often validation is conducted independently of the training data set.
Validation must ensure:
• Appropriate sample size: enough samples at minimum cost
  • Statistical – calculate the number of samples based on the binomial distribution or the normal approximation to the binomial distribution (Cochran, 1977)*
  • Empirical – the number of samples is driven by the available budget (not suggested)
• Sample allocation:
  • Random – samples are distributed randomly in the area of interest
  • Stratified random sampling – split the area of interest into strata and then select samples in each stratum
    • Equal sample size per stratum
    • Number of samples per stratum adjusted according to the stratum size
  • Other
• Reliable source of reference information:
  • In situ data collection
  • Photo-interpretation of higher resolution imagery
In QGIS there is the AcATaMa plugin, designed for the accuracy estimation of land cover maps. It supports:
• different steps in the creation of training samples (e.g., sample size definition and sample allocation)
• a user-friendly interface for photo-interpretation of samples
• the calculation of multiple accuracy indexes
*https://hwbdocuments.env.nm.gov/Los%20Alamos%20National%20Labs/General/14447.pdf
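As an illustration of the statistical approach, the sketch below uses the common normal-approximation formula n = z² · p · (1 − p) / e², followed by equal and proportional allocation to strata. The default values (expected accuracy 0.5, margin of error 0.05, z = 1.96 for 95% confidence), the function names and the strata sizes are assumptions for the example, not values prescribed by the workshop or by AcATaMa.

```python
from math import ceil


def sample_size(expected_accuracy=0.5, margin=0.05, z=1.96):
    """Sample size from the normal approximation to the binomial distribution."""
    return ceil(z**2 * expected_accuracy * (1 - expected_accuracy) / margin**2)


def allocate_equal(n, strata):
    """Same number of samples in every stratum."""
    return {name: ceil(n / len(strata)) for name in strata}


def allocate_proportional(n, strata):
    """Number of samples per stratum proportional to the stratum size."""
    total = sum(strata.values())
    return {name: round(n * size / total) for name, size in strata.items()}


# Hypothetical strata sizes in pixels
strata = {"non built-up": 800000, "built-up": 200000}

n = sample_size()
equal = allocate_equal(n, strata)
proportional = allocate_proportional(n, strata)
print(n, equal, proportional)
```

Proportional allocation leaves small classes with few samples, which is one reason an equal allocation per stratum is sometimes preferred for rare classes.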
30
Error matrix and accuracy indexes
Error matrix produced by dzetsaka (classification with Random Forest):

# Lines=reference
# Columns=prediction
19309 15
26 3821

Error matrix interpretation (rows = reference, i.e. ground truth; columns = prediction/classification):

                  1 Non built-up   2 Built-up   Sum reference   PA
1 Non built-up    19309            15           19324           99.9%
2 Built-up        26               3821         3847            99.3%
Sum prediction    19335            3836         23171
UA                99.9%            99.6%                        OA 99.8%

• Producer's accuracy (PA) of a class is the probability that the class present on the ground is also captured by the classification in the thematic raster.
\[ PA_{Non\ built-up} = \frac{19309}{19324} \times 100 = 99.9\% \]
\[ PA_{Built-up} = \frac{3821}{3847} \times 100 = 99.3\% \]
• User's accuracy (UA) of a class shows how often a user of the classified map can expect to find the class on the ground.
\[ UA_{Non\ built-up} = \frac{19309}{19335} \times 100 = 99.9\% \]
\[ UA_{Built-up} = \frac{3821}{3836} \times 100 = 99.6\% \]
• Overall accuracy (OA) is the proportion of correctly classified pixels out of the total number of pixels.
\[ OA = \frac{3821 + 19309}{23171} \times 100 = 99.8\% \]
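The same indexes can be recomputed from the raw matrix with a few lines of Python. This is a minimal sketch; the function name is hypothetical and the matrix is assumed to have reference classes in rows and predicted classes in columns, as in the dzetsaka output above.

```python
def accuracy_indexes(matrix):
    """matrix[i][j] = pixels of reference class i predicted as class j."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_sums = [sum(row) for row in matrix]                           # sum reference
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]  # sum prediction
    pa = [matrix[i][i] / row_sums[i] for i in range(n)]  # producer's accuracy
    ua = [matrix[j][j] / col_sums[j] for j in range(n)]  # user's accuracy
    oa = sum(matrix[i][i] for i in range(n)) / total     # overall accuracy
    return pa, ua, oa


rf_matrix = [[19309, 15], [26, 3821]]
pa, ua, oa = accuracy_indexes(rf_matrix)
print([round(100 * v, 1) for v in pa], [round(100 * v, 1) for v in ua], round(100 * oa, 1))
```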
31
Inter-comparison
32
Assessing the success of the classification – inter-comparison (1)
The success of the classification can be estimated by comparing it with other maps of the same theme, computing the error matrix and accuracy indexes such as Overall accuracy, PA and UA. Instead of PA and UA, the Commission error (1-UA) and the Omission error (1-PA) can also be used. Another index, the Kappa index, is also often computed.
For this exercise, we will use GHS-BUILT (Sentinel-1), which contains an information layer on built-up presence derived from Sentinel-1 image collections (2016). This dataset is produced by the Joint Research Centre of the European Commission. The CRS of this dataset is WGS 84 / Pseudo-Mercator (EPSG:3857).
33
Assessing the success of the classification – inter-comparison (2)
In order to be compared, the two datasets must have:
✓ the same CRS
✓ the same resolution
✓ the same extent
✓ the same values for the same classes

Parameters   Classification output      GHS-BUILT S1
CRS          EPSG:32637                 EPSG:3857
Resolution   10 m                       20 m
Extent       1 tile (X: 1622 Y: 1074)   4 tiles (X: 1367 Y: 91)
Classes      0 - Non-built-up           NULL - Non-built-up
             1 - Built-up               1 - Built-up
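The comparison of the two datasets can be expressed as a simple checklist in code. The sketch below is only illustrative: the dictionary keys and the function name are hypothetical, and the values are copied from the table rather than read from the rasters.

```python
def mismatches(a, b, keys=("crs", "resolution_m", "extent", "classes")):
    """Return the parameters in which two dataset descriptions differ."""
    return [key for key in keys if a[key] != b[key]]


classified = {
    "crs": "EPSG:32637",
    "resolution_m": 10,
    "extent": "1 tile",
    "classes": {0: "Non-built-up", 1: "Built-up"},
}
ghs_built = {
    "crs": "EPSG:3857",
    "resolution_m": 20,
    "extent": "4 tiles",
    "classes": {None: "Non-built-up", 1: "Built-up"},  # None stands for NULL
}

to_harmonise = mismatches(classified, ghs_built)
print(to_harmonise)
```

All four parameters differ, so each of them has to be harmonised before the two maps can be compared.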
34
Graphical modeler for multiple step processes
As you can see, many parameters of the two datasets need to be harmonised, and this can be done more efficiently with the help of the graphical modeler.
The graphical modeler allows you to create complex models using a simple and easy-to-use interface. It is particularly useful for repetitive processing.
To activate the Graphical modeler:
1. Go to the Processing menu
2. Select Graphical Modeler
35
Graphical modeler - workflow
We can create a model to harmonise GHS-BUILT S1 with the classified raster, as shown in the figure on the right.
The classified raster requires only one preprocessing operation before the comparison. In contrast, for GHS-BUILT S1 we will apply:
• Merge, to merge the 4 tiles
• Warp (reproject), to reproject it
• Clip raster by extent, to clip it to the extent of the classified raster
• r.null, to replace null values with 0, since in GHS-BUILT the non built-up values are NULL by default
After all the preprocessing, r.kappa can be executed to compute the error matrix and the accuracy indexes.
Please note that r.kappa automatically adjusts the resolution of the classified raster to the resolution of the reference raster (e.g., GHS-BUILT S1), so this operation is not explicitly included in the model.
36
Graphical modeler – Define input data (1)
Let's start by saving the model:
1. Specify the name of the model (e.g., Accuracy assessment)
2. Click on the save icon and save the model file by defining its destination path and its name (e.g., Accuracy assessment)
37
Input parameters are the first to be defined in the model. When defining input parameters, we should keep in mind the data type expected by the model.
In our case the first input layer is the classification output raster we produced before.
1. Go to Inputs tab
2. Double click on Raster Layer
3. Insert the name you want to assign to this Raster Layer in the Description field (e.g., Classified Raster)
4. Click on OK
At this point only the type of input data is defined; the data to be processed will be specified when running the model.
38
The second input is the group of raster tiles against which we are going to compare the classification outputs we produced before.
1. Go to Inputs tab
2. Double click on Multiple Input
3. Insert the name of the input in the Parameter name field (e.g., Raster tiles)
4. Select Raster as Data type
5. Click on OK
At this point only the type of input data is defined; the data to be processed will be specified when running the model.
39
Graphical modeler – Change values of classification output
The first processing step changes the values of the classification output from 1 and 2 to 0 and 1, respectively.
It is a simple algebraic operation: we use the Raster Calculator to subtract 1 from the classified raster.
1. Go to Algorithms tab
2. Search for Raster calculator in the search bar
3. Double click on Raster Calculator in Raster analysis
4. Define the Expression (e.g., "Classified Raster@1" - 1)
5. Open the Reference layer Multiple selection window
6. Select Classified Raster so that the output CRS, extent, and cell size are set according to this layer
7. Click on OK
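What the Raster Calculator expression does to each cell can be mimicked on a tiny grid. The sketch below uses a nested list as a stand-in for the raster; the function name and the sample values are hypothetical.

```python
def subtract_one(raster):
    """Recode every cell: 1 (non built-up) -> 0, 2 (built-up) -> 1."""
    return [[value - 1 for value in row] for row in raster]


# Hypothetical 2 x 3 classified raster with dzetsaka class values
classified = [[1, 2, 2],
              [1, 1, 2]]

recoded = subtract_one(classified)
print(recoded)
```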
40
Graphical modeler – Merge multiple raster into one
The next processing operations adapt GHS-BUILT S1 to the classification output. The first operation merges the 4 tiles of GHS-BUILT S1 into a single raster.
1. Go to Algorithms tab
2. Search for Merge in the search bar
3. Double click on Merge in GDAL→Raster miscellaneous
4. To select the Input layers, open the Multiple selection window
5. Select the input layers (e.g., Raster tiles)
6. Click on OK to confirm the selection
7. Select an integer Output data type (e.g., Int32)
8. Click on OK
41
Graphical modeler – Reproject
The next step is to reproject GHS-BUILT S1 to the CRS of the classification output.
1. Go to Algorithms tab
2. Search for Warp (reproject) in the search bar
3. Select Warp (reproject) in GDAL→Raster projections
4. Select the Input layer from the drop-down menu (e.g., the outcome of the merge operation, denoted as 'Merged' from algorithm 'Merge')*
5. Select Source CRS (e.g., EPSG:3857)
6. Select Target CRS (e.g., EPSG:32637)
7. Click on OK
* Note that in QGIS 3.16 you also have to click on the 123 button of Input layer and select Algorithm Output.
42
Graphical modeler – Reproject - QGIS 3.16
The next step is to reproject GHS-BUILT S1 to the CRS of the classification output.
1. Go to Algorithms tab
2. Search for Warp (reproject) in the search bar
3. Select Warp (reproject) in GDAL→Raster projections
4. Select Algorithm Output as the source of input data*
5. Select the Input layer from the drop-down menu (e.g., the outcome of the merge operation, denoted as 'Merged' from algorithm 'Merge')
6. Select Source CRS (e.g., EPSG:3857)
7. Select Target CRS (e.g., EPSG:32637)
8. Click on OK
* Note that in QGIS 3.16 step 4 is additional with respect to QGIS 3.10.
43
Graphical modeler – Clip raster by extent
Now we clip GHS-BUILT S1 to the extent of the classification output.
1. Go to Algorithms tab
2. Search for Clip raster by extent in the search bar
3. Select Clip raster by extent in GDAL→Raster extraction
4. Select the Input layer from the drop-down menu (e.g., the outcome of the reprojection, denoted as 'Reprojected' from algorithm 'Warp (reproject)')
5. In Clipping extent, select the layer from which the clipping extent will be calculated (e.g., Extent of Classified Raster)
6. Click on OK
NOTE: In QGIS 3.16 the Input layer must be searched among the Algorithm outputs (see previous slide) and the Clipping extent among the Model inputs.
44
Graphical modeler – Fill no data values
Finally, we need to replace the NULL values of GHS-BUILT S1 with 0 to make it consistent with the classes of the classification output.
1. Go to Algorithms tab
2. Search for r.null in the search bar
3. Select r.null in GRASS→Raster (r.*)
4. Select the Name of raster map for which to edit null values from the drop-down menu (e.g., the outcome of clipping, denoted as 'Clipped (extent)' from algorithm 'Clip raster by extent')
5. Insert The value to replace the null value (e.g., 0.0)
6. Click on OK
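The effect of this step on the cell values can be sketched on a small grid, with `None` standing in for NULL cells. This only mimics the outcome of the null replacement and is not the GRASS implementation; the function name and sample values are hypothetical.

```python
def fill_null(raster, fill_value=0):
    """Replace NULL cells (None) with fill_value and keep all other cells."""
    return [[fill_value if value is None else value for value in row]
            for row in raster]


# Hypothetical GHS-BUILT S1 excerpt: None = non built-up, 1 = built-up
ghs = [[None, 1, 1],
       [None, None, 1]]

filled = fill_null(ghs)
print(filled)
```

After this step both rasters use 0 for non built-up and 1 for built-up, so their cells can be compared one to one.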
45
Graphical modeler – Compute accuracy indexes
Now the data are ready to be compared by means of the error matrix and the accuracy indexes. For that, we use the r.kappa algorithm.
1. Go to Algorithms tab
2. Search for r.kappa in the search bar
3. Select r.kappa in GRASS→Raster (r.*)
4. Select the Raster layer containing classification result from the drop-down menu (e.g., the preprocessed classification output, denoted as 'Output' from algorithm 'Raster calculator')
5. Select the Raster layer containing reference classes from the drop-down menu (e.g., the preprocessed comparison dataset, denoted as 'NullRaster' from algorithm 'r.null')
6. As this is the final goal of the model, specify a name for Error matrix and Kappa so that the outcome of this operation becomes an output parameter of the model (e.g., Error_matrix)
7. Click on OK
46
Graphical modeler – Save model and run it
The model is ready now!
1. Click on the Save icon to save the last changes to the model
2. Click on the Run button to run the model
47
Graphical modeler – Load and run existing model
1. Click on the Open icon to navigate your local file system and open an existing graphical model. QGIS graphical models are saved in the .model3 file format
48
Graphical modeler – Run the model
The first step when we want to run the model is to specify its input and output parameters.
1. From the drop-down menu, select a raster for the Classified Raster parameter (e.g., RF)
2. Open Multiple Selection for Raster Tiles
3. Select the raster tiles (e.g., the 4 GHS_BU_S1_x)
4. Click on OK to confirm the selection
5. Define the path and the filename for the Error_matrix (e.g., EM_RF.txt)
49
Graphical modeler for multiple step processes
Error matrix (rows = prediction/classification; columns = reference):

                  0 Non built-up   1 Built-up   Sum prediction   UA
0 Non built-up    484810           53069        537879           90.13%
1 Built-up        36818            90631        127449           71.11%
Sum reference     521628           143700       665328
PA                92.94%           63.07%                        OA 86.49%

Result of r.kappa:
• % Commission (error) = 1 - UA
• % Omission (error) = 1 - PA
• % Observed correct = Overall accuracy
• Kappa
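The quantities listed in the r.kappa report can be reproduced from the error matrix. The sketch below uses the standard Cohen's kappa formula, (observed agreement − chance agreement) / (1 − chance agreement); the function name is hypothetical and the code is an illustration, not the GRASS implementation.

```python
def kappa_report(matrix):
    """matrix[i][j] = pixels predicted as class i whose reference class is j."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    pred_sums = [sum(row) for row in matrix]
    ref_sums = [sum(row[j] for row in matrix) for j in range(n)]
    observed = sum(matrix[i][i] for i in range(n)) / total
    # Agreement expected by chance, from the marginal totals
    expected = sum(pred_sums[i] * ref_sums[i] for i in range(n)) / total**2
    commission = [1 - matrix[i][i] / pred_sums[i] for i in range(n)]  # 1 - UA
    omission = [1 - matrix[i][i] / ref_sums[i] for i in range(n)]     # 1 - PA
    kappa = (observed - expected) / (1 - expected)
    return commission, omission, observed, kappa


matrix = [[484810, 53069], [36818, 90631]]
commission, omission, observed, kappa = kappa_report(matrix)
print([round(100 * c, 2) for c in commission],
      [round(100 * o, 2) for o in omission],
      round(100 * observed, 2), round(kappa, 3))
```

Kappa is lower than the overall accuracy because it discounts the agreement that two maps dominated by the non built-up class would reach by chance.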
50
Maria Antonia Brovelli and Gorica Bratic
maria.brovelli@polimi.it; gorica.bratic@polimi.it
Assistants: Francesco Bosso, Thomas Martinoli, Juan Pablo Duque Ordoñez, Mathilde Puche and Mousa Sondoqah
GEOlab – Politecnico di Milano 51

This work is licensed under the following license:
Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
You are free to:
• Share: copy and redistribute the material in any medium or format
• Adapt: remix, transform, and build upon the material
• The licensor cannot revoke these freedoms as long as you follow the license terms.
Under the following terms:
• Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
• NonCommercial: You may not use the material for commercial purposes.
• ShareAlike: If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
52
