Ali Santacruz
R-Spatialist at amsantac.co
Key ideas
· Machine learning classifiers often fail to cope with imbalanced training datasets
· The performance of ML classifiers may become biased towards the majority class
· Common strategies for treating imbalanced datasets include undersampling, oversampling, synthetic data generation and cost-sensitive learning
· Metrics such as precision, recall, or F-score are preferred over overall accuracy as performance measures when dealing with imbalanced datasets (see the sketch below)
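To make the last point concrete, here is a minimal sketch of per-class precision, recall and F-score computed from a confusion matrix (predictions in rows, reference classes in columns). The helper name prf_by_class and the toy confusion matrix are illustrative, not part of the original example:

# Per-class precision, recall and F1 from a confusion matrix
# (rows = predicted class, columns = reference class)
prf_by_class <- function(cm){
  tp <- diag(cm)
  precision <- tp / rowSums(cm)   # of all pixels predicted as this class, how many are correct
  recall    <- tp / colSums(cm)   # of all reference pixels of this class, how many were found
  f1        <- 2 * precision * recall / (precision + recall)
  data.frame(precision, recall, f1)
}

# Toy two-class example with a dominant class "2": overall accuracy is 0.93,
# yet recall for the minority class "1" is only about 0.62
cm_toy <- matrix(c(8, 2,
                   5, 85), nrow = 2, byrow = TRUE,
                 dimnames = list(pred = c("1", "2"), ref = c("1", "2")))
prf_by_class(cm_toy)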
Remote sensing example
First, let's import the image to be classified and the shapefile with the training data
library(rgdal)
library(raster)
library(caret)

set.seed(123)

# Stack the surface-reflectance bands into a single multi-band raster
img <- brick(stack(as.list(list.files("data/", "sr_band", full.names = TRUE))))
names(img) <- c(paste0("B", 1:5), "B7")

# Training polygons with the response stored in the "class" attribute
trainData <- shapefile("data/training_15.shp")
responseCol <- "class"
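A quick visual check, not in the original slides, helps confirm that the image and the training polygons line up; the 4-3-2 band combination assumes a Landsat-style band layout:

# Optional sanity check: false-colour composite with the training polygons overlaid
plotRGB(img, r = 4, g = 3, b = 2, stretch = "lin")
plot(trainData, add = TRUE, border = "yellow")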
Extract data from image bands
# Extract the pixel values under each training polygon and label them with their class
dfAll <- data.frame(matrix(vector(), nrow = 0, ncol = length(names(img)) + 1))
for (i in 1:length(unique(trainData[[responseCol]]))){
  category <- unique(trainData[[responseCol]])[i]
  categorymap <- trainData[trainData[[responseCol]] == category, ]
  dataSet <- extract(img, categorymap)
  dataSet <- sapply(dataSet, function(x){cbind(x, class = rep(category, nrow(x)))})
  df <- do.call("rbind", dataSet)
  dfAll <- rbind(dfAll, df)
}
dim(dfAll)
[1] 80943 7
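Before modelling, it is worth looking at how the extracted pixels are distributed across classes; this short check uses only the dfAll object built above and already hints at the imbalance treated below:

# Class distribution of the extracted pixels
table(dfAll$class)
round(prop.table(table(dfAll$class)), 3)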
Create a partition into training and test sets
inBuild <- createDataPartition(y = dfAll$class, p = 0.7, list = FALSE)
training <- dfAll[inBuild, ]
testing <- dfAll[-inBuild, ]
dim(training)
[1] 56662 7
dim(testing)
[1] 24281 7
table(training$class)
1 2 3 5 6 7
4753 21626 14866 8093 3535 3789
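Since createDataPartition samples within each class, the split should preserve the class proportions; a quick check (not shown in the original slides):

# Stratified split: training and testing should show roughly the same class proportions
round(prop.table(table(training$class)), 3)
round(prop.table(table(testing$class)), 3)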
Model using imbalanced dataset
training_imb <- training[sample(1:nrow(training), 2400), ]
table(training_imb$class)
1 2 3 5 6 7
212 900 613 353 149 173
mod1_imb <- train(as.factor(class) ~ B3 + B4 + B5, method = "rf", data = training_imb)
note: only 2 unique complexity parameters in default grid. Truncating the grid to 2 .
mod1_imb$results[, 1:2]
mtry Accuracy
1 2 0.979454
2 3 0.977318
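The note about the truncated grid comes from caret's default random-forest grid with only three predictors. If you prefer explicit control over the resampling scheme and the mtry values tried, something along these lines should work; the 5-fold cross-validation setup and the object names ctrl and mod1_cv are assumptions, not what the slides used:

# Explicit resampling scheme and tuning grid instead of caret's defaults
ctrl <- trainControl(method = "cv", number = 5)
mod1_cv <- train(as.factor(class) ~ B3 + B4 + B5, method = "rf",
                 data = training_imb, trControl = ctrl,
                 tuneGrid = data.frame(mtry = c(1, 2, 3)))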
Balancing a dataset by undersampling
# Randomly drop samples from each class until every class has nsamples_class rows
undersample_ds <- function(x, classCol, nsamples_class){
  for (i in 1:length(unique(x[, classCol]))){
    class.i <- unique(x[, classCol])[i]
    if ((sum(x[, classCol] == class.i) - nsamples_class) != 0){
      x <- x[-sample(which(x[, classCol] == class.i),
                     sum(x[, classCol] == class.i) - nsamples_class), ]
    }
  }
  return(x)
}
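For comparison, caret ships downSample() (and upSample()) helpers that achieve a similar balancing; this minimal sketch, in which the object name training_ds is illustrative, keeps as many samples per class as the smallest class and returns the labels in a Class column:

# Equivalent balancing with caret's built-in helper
training_ds <- downSample(x = training[, names(img)],
                          y = as.factor(training$class))
table(training_ds$Class)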
Balance training dataset
(nsamples_class <- 400)
[1] 400
training_bc <- undersample_ds(training, "class", nsamples_class)
table(training_bc$class)
1 2 3 5 6 7
400 400 400 400 400 400
Model using balanced dataset
mod1_bc <- train(as.factor(class) ~ B3 + B4 + B5, method = "rf", data = training_bc)
note: only 2 unique complexity parameters in default grid. Truncating the grid to 2 .
mod1_bc$results[, 1:2]
mtry Accuracy
1 2 0.9797371
2 3 0.9766507
Evaluate accuracy of the two models using the testing set
# Imbalanced data
pred1_imb <- predict(mod1_imb, testing)
confusionMatrix(pred1_imb, testing$class)$overall[1]
Accuracy
0.9829496
# Balanced data
pred1_bc <- predict(mod1_bc, testing)
confusionMatrix(pred1_bc, testing$class)$overall[1]
Accuracy
0.9788312
Evaluate sensitivity of the two models using the testing set
# Imbalanced data
confusionMatrix(pred1_imb, testing$class)$byClass[, 1]
Class: 1 Class: 2 Class: 3 Class: 5 Class: 6 Class: 7
0.9951644 0.9794283 0.9806938 0.9945213 1.0000000 0.9809816
# Balanced data
confusionMatrix(pred1_bc, testing$class)$byClass[, 1]
Class: 1 Class: 2 Class: 3 Class: 5 Class: 6 Class: 7
0.9941973 0.9662191 0.9759849 0.9904844 1.0000000 0.9975460
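Sensitivity alone ignores false alarms; per-class precision and F-score can be pulled from the same confusion matrices, for instance with the illustrative prf_by_class helper sketched after the key ideas:

# Per-class precision/recall/F1 for both models
# (confusionMatrix tables have predictions in rows and reference classes in columns)
prf_by_class(confusionMatrix(pred1_imb, testing$class)$table)
prf_by_class(confusionMatrix(pred1_bc, testing$class)$table)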
Further resources
· For a detailed explanation, please see:
This post on my blog (includes a link for downloading the sample data and source code)
And this video on my YouTube channel
· Also check out these useful resources:
Practical guide to deal with imbalanced classification problems in R
8 tactics to combat imbalanced classes in your machine learning dataset