
Improving image classification accuracy through model stacking


Ali Santacruz
R-Spatialist at amsantac.co
Key ideas
Model ensembling increases accuracy by combining the predictions of multiple models together
Approaches for model ensembling:
Use similar classifiers and combine them together
Combine different classifiers using model stacking

The separate predictors must have low correlation to allow the combined predictor to get the best
from each model
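The low-correlation point can be made concrete with a toy base-R simulation (not part of the original deck, all names made up): three independent classifiers, each right about 70% of the time, combined by a simple majority vote.

```r
set.seed(42)
n <- 10000
truth <- sample(0:1, n, replace = TRUE)

# Three independent "weak" classifiers, each correct with probability 0.7
predict_weak <- function() ifelse(runif(n) < 0.7, truth, 1 - truth)
p1 <- predict_weak(); p2 <- predict_weak(); p3 <- predict_weak()

# Combine the three predictions by majority vote
vote <- as.integer(p1 + p2 + p3 >= 2)

mean(p1 == truth)    # ~0.70: each individual model
mean(vote == truth)  # ~0.78: the ensemble beats every member
```

With independent errors the expected vote accuracy is 0.7^3 + 3 * 0.7^2 * 0.3 = 0.784; highly correlated predictors would make the same mistakes and the vote would gain almost nothing.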

Model stacking example: Image classification
Let's import the image to be classified (Landsat 7 ETM+, path 7 row 57, acquired on 2000-03-16,
converted to surface reflectance and provided by USGS) and the shapefile with training data:

library(rgdal)
library(raster)
library(caret)

set.seed(123)

img <- brick(stack(as.list(list.files("data/", "sr_band", full.names = TRUE))))
names(img) <- c(paste0("B", 1:5), "B7")

trainData <- shapefile("data/training_15.shp")
responseCol <- "class"

Extract data from image bands
dfAll <- data.frame(matrix(vector(), nrow = 0, ncol = length(names(img)) + 1))
for (i in 1:length(unique(trainData[[responseCol]]))){
  category <- unique(trainData[[responseCol]])[i]
  categorymap <- trainData[trainData[[responseCol]] == category, ]
  dataSet <- extract(img, categorymap)
  dataSet <- sapply(dataSet, function(x){cbind(x, class = rep(category, nrow(x)))})
  df <- do.call("rbind", dataSet)
  dfAll <- rbind(dfAll, df)
}

Create partitions for training, test and validation sets
# Create validation dataset
inBuild <- createDataPartition(y = dfAll$class, p = 0.7, list = FALSE)
validation <- dfAll[-inBuild, ]
buildData <- dfAll[inBuild, ]

# Create training and testing datasets
inTrain <- createDataPartition(y = buildData$class, p = 0.7, list = FALSE)
training <- buildData[inTrain, ]
testing <- buildData[-inTrain, ]
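For intuition, the same nested 70/30 split can be sketched in base R on a made-up data frame (a hypothetical example; caret's createDataPartition additionally stratifies the sampling by class, which plain sample() does not):

```r
set.seed(123)
dfAll <- data.frame(class = rep(c("a", "b", "c"), each = 100))  # 300 toy rows

# 70% for model building, 30% held back for final validation
inBuild <- sample(nrow(dfAll), round(0.7 * nrow(dfAll)))
validation <- dfAll[-inBuild, , drop = FALSE]
buildData  <- dfAll[inBuild, , drop = FALSE]

# The building set is split again: 70% training, 30% testing
inTrain  <- sample(nrow(buildData), round(0.7 * nrow(buildData)))
training <- buildData[inTrain, , drop = FALSE]
testing  <- buildData[-inTrain, , drop = FALSE]

c(nrow(training), nrow(testing), nrow(validation))  # 147 63 90
```

The overall proportions come out to roughly 49% training, 21% testing and 30% validation, so the validation rows stay untouched until the very last step.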

Balancing a dataset by undersampling
source("undersample_ds.R")
undersample_ds

function(x, classCol, nsamples_class)
{
  for (i in 1:length(unique(x[, classCol]))){
    class.i <- unique(x[, classCol])[i]
    if ((sum(x[, classCol] == class.i) - nsamples_class) != 0){
      x <- x[-sample(which(x[, classCol] == class.i),
                     sum(x[, classCol] == class.i) - nsamples_class), ]
    }
  }
  return(x)
}
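To see what the function does, here is a self-contained run on a made-up, imbalanced data frame (the snippet repeats the undersample_ds body so it runs on its own; note the function assumes every class has at least nsamples_class rows):

```r
undersample_ds <- function(x, classCol, nsamples_class){
  for (i in 1:length(unique(x[, classCol]))){
    class.i <- unique(x[, classCol])[i]
    if ((sum(x[, classCol] == class.i) - nsamples_class) != 0){
      # Drop randomly chosen rows until the class has nsamples_class rows
      x <- x[-sample(which(x[, classCol] == class.i),
                     sum(x[, classCol] == class.i) - nsamples_class), ]
    }
  }
  return(x)
}

set.seed(123)
toy <- data.frame(class = rep(c("water", "forest", "urban"), times = c(50, 30, 10)),
                  B3 = runif(90))
table(toy$class)        # forest: 30, urban: 10, water: 50
balanced <- undersample_ds(toy, "class", 10)
table(balanced$class)   # every class reduced to 10 rows
```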

Balance training dataset
nsamples_class <- 600

training_bc <- undersample_ds(training, "class", nsamples_class)

Build separate models on the training data and examine their correlation
# Random Forests model
set.seed(123)
mod.rf <- train(as.factor(class) ~ B3 + B4 + B5, method = "rf", data = training_bc)
pred.rf <- predict(mod.rf, testing)

# SVM model
set.seed(123)
mod.svm <- train(as.factor(class) ~ B3 + B4 + B5, method = "svmRadial", data = training_bc)
pred.svm <- predict(mod.svm, testing)

results <- resamples(list(mod1 = mod.rf, mod2 = mod.svm))
modelCor(results)

           mod1       mod2
mod1 1.00000000 0.02574656
mod2 0.02574656 1.00000000

Create a new dataset combining the two predictors
predDF <- data.frame(pred.rf, pred.svm, class = testing$class)
predDF_bc <- undersample_ds(predDF, "class", nsamples_class)

Fit a stacked model to relate the class variable to the two predictions:

set.seed(123)
combModFit.gbm <- train(as.factor(class) ~ ., method = "gbm", data = predDF_bc,
                        distribution = "multinomial")
combPred.gbm <- predict(combModFit.gbm, predDF)

Overall accuracy based on test dataset
# RF model accuracy
confusionMatrix(pred.rf, testing$class)$overall[1]

Accuracy
0.9812897

# SVM model accuracy
confusionMatrix(pred.svm, testing$class)$overall[1]

Accuracy
0.967816

# Stacked model accuracy
confusionMatrix(combPred.gbm, testing$class)$overall[1]

Accuracy
0.9838786

Validation
pred1V <- predict(mod.rf, validation)
pred2V <- predict(mod.svm, validation)
predVDF <- data.frame(pred.rf = pred1V, pred.svm = pred2V)
combPredV <- predict(combModFit.gbm, predVDF)

Overall accuracy based on validation dataset
accuracy <- rbind(confusionMatrix(pred1V, validation$class)$overall[1],
                  confusionMatrix(pred2V, validation$class)$overall[1],
                  confusionMatrix(combPredV, validation$class)$overall[1])
row.names(accuracy) <- c("RF", "SVM", "Stack")
accuracy

       Accuracy
RF    0.9817141
SVM   0.9658993
Stack 0.9830320

Producer's accuracy based on validation dataset
prod_acc <- rbind(confusionMatrix(pred1V, validation$class)$byClass[, 1],
                  confusionMatrix(pred2V, validation$class)$byClass[, 1],
                  confusionMatrix(combPredV, validation$class)$byClass[, 1])
row.names(prod_acc) <- c("RF", "SVM", "Stack")
round(prod_acc, 4)

      Class: 1 Class: 2 Class: 3 Class: 5 Class: 6 Class: 7
RF      0.9927   0.9748   0.9769   0.9867        1   0.9982
SVM     0.9913   0.9439   0.9615   0.9905        1   0.9914
Stack   0.9927   0.9752   0.9779   0.9963        1   0.9914

Further resources
For a detailed explanation please see:

This post on my blog (includes a link for downloading the sample data and source code)
And this video on my YouTube channel

Also check out these useful resources:

Kaggle ensembling guide
How to build an ensemble of Machine Learning algorithms in R (ready-to-use boosting,
bagging and stacking)
