Visual-based trash detection and classification system for smart trash bin robot

Irfan Salimi, Bima Sena Bayu Dewantara, Iwan Kurnianto Wibowo
Informatics and Computer Department, Electronic Engineering Polytechnic Institute of Surabaya, Surabaya, Indonesia
irfansalimi@ce.student.pens.ac.id, bima@pens.ac.id, eone@pens.ac.id

Abstract—This paper presents a trash detection and classification system that will be implemented on a social-education trash bin robot. The robot is expected to be deployed in public facilities such as airports, railway stations, and halls, where many people potentially produce waste. We use the Haar-Cascade method to first detect any objects on the floor. Then, the Gray-Level Co-Occurrence Matrix (GLCM) and the Histogram of Oriented Gradient (HOG) are combined to obtain a set of features. A Support Vector Machine (SVM) is used to classify the features into organic waste, non-organic waste, and non-waste. Offline testing of the classification system using the 5-fold Cross Validation method obtains 82.7% accuracy. Online testing of the detection and classification system obtains 63.5% accuracy, with the best operating range achieved when the camera is tilted down to -40°, giving a minimum detection distance of 80 cm and a maximum of 200 cm. This robot is expected to help instill the habit of disposing of garbage in the right place. The purpose of this research is to make people aware of handling their waste in the right way and, hopefully, to reduce the waste problem.

Keywords – trash, detection, classification, gray level co-occurrence matrix, histogram of oriented gradient, support vector machines

I. INTRODUCTION

Trash is a residual object that is no longer used. Usually it is the by-product of some process, whether human activity or the natural ecosystem. The trash management problem is ignored by many people; in 2016 Indonesia declared a waste emergency [1]. A waste emergency occurs when large amounts of waste spread everywhere and cannot be recycled. This condition can cause disasters such as floods, pollution, and many diseases.

There are many ways to classify trash; one of them is the division into organic and non-organic waste. Organic waste is the residue of a natural process, or of another process, that is easily decomposed by organisms; typical examples are leaves and animal carcasses. Non-organic waste, on the other hand, is hard for organisms to decompose; examples are plastics, bottles, glass, iron, and many more.

Whether aware of it or not, humans are the biggest trash contributors on earth. The rapid growth of the human population makes human needs grow fast, and those needs in turn produce a rapid growth of waste when that growth is not matched by good trash-handling habits.

A lot of research on trash robots has been conducted recently. One such project was led by Prof. Paolo Dario of the Scuola Superiore Sant'Anna CRIM Lab in Pisa, Italy, in 2006-2009 [2]. They proposed a robot named Dustbot that collects garbage door to door in response to a call from a registered customer. Dustbot is 1.5 meters tall, weighs 70 kilograms, can carry about 30 kilograms of garbage, and travels around 16 kilometers on its battery. Another project, by Immersive Robotics in collaboration with SNCF, developed a robot named Baryl that follows a person who is predicted to throw garbage [3]; Baryl follows that person until they throw their garbage into it. In a third project, Daniel Deutsch, manager of the Real Simple Ideas company, developed an interactive robot named PUSH that communicates with people who throw their garbage into it. This robot was placed in the Disneyland amusement park and is driven by a puppeteer nearby [4].

In this research, we create a social-education trash bin robot by detecting trash and classifying it into organic and non-organic waste. The robot autonomously travels around public facilities scanning for trash. Once a piece of trash is visually detected, the robot calculates the distance, moves closer to the trash object, and produces a voice to attract people to come closer, pick up the trash the robot found, and throw it into the trash bin mounted on the robot.

There are various methods to detect an object visually using a camera. One of them is Viola-Jones object detection, proposed by Paul Viola and Michael Jones in 2001 [5]. Their approach decomposes the object's image into elementary characteristics such as shape and texture to form a robust descriptor. Our proposed descriptor is built using the Gray-Level Co-occurrence Matrix (GLCM) for texture analysis and the Histogram of Oriented Gradient (HOG) for shape analysis [6].

The descriptor is then fed to a Support Vector Machine (SVM), trained on a set of example images, to classify a detected object as organic waste, non-organic waste, or non-waste.

The rest of this paper is organized as follows. Section II describes our trash detection and classification system. Section III details the construction of our descriptor for validating and classifying trash. Section IV presents the experimental results and discussion. Section V concludes our work and outlines possible future work.
II. TRASH DETECTION AND CLASSIFICATION SYSTEM

We design our detection and classification system as shown in Fig. 1. The task starts by capturing an image of trash with the camera. A Kalman-based tracker is applied to enhance the stability of object detection. The detected object is then enhanced in a preprocessing step, and its characteristics are extracted in the feature extraction step. Using the extracted feature values, the classifier determines whether the object is organic waste, non-organic waste, or non-waste.

Fig. 1 The design of the trash detection and classification system

2.1 Trash Detection and Tracking

We applied the Haar-Cascade technique proposed in [7] to first detect the presence of trash around the robot. We collected a set of positive and a set of negative images for training purposes. Images of cans, bottles, paper bundles, plastic wrap, drink boxes, and leaves are treated as positive examples, while other images are used as negatives.

Fig. 2 Examples of training images: (a) positive images, (b) negative images

The training method consists of four steps. The first is feature selection, in which candidate Haar-like features are evaluated by summing pixel intensities over several rectangular regions of the image.

Fig. 3 Haar feature extraction from stage 0 to stage 10

The second step is to create an integral image, which stores cumulative pixel sums so that the sum over any rectangular feature area can be computed quickly. The third step is AdaBoost training (short for adaptive boosting), where the system is trained using the previously collected object data. The last stage is the cascade classifier, a cascaded or layered filter [7]; at this stage it is determined whether a found object is recognized by the system or not. The result of object detection is shown in Fig. 4.

Fig. 4 Detection result
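To make the detection step concrete, here is a minimal sketch of how such a trained cascade could be applied with OpenCV. The cascade file name and the detection parameters are illustrative assumptions, not values from the paper.

```python
# Minimal sketch: running a trained Haar cascade with OpenCV.
# "trash_cascade.xml" is a hypothetical file produced by cascade training;
# scaleFactor/minNeighbors/minSize are illustrative, not the paper's values.
import cv2

cascade = cv2.CascadeClassifier("trash_cascade.xml")
frame = cv2.imread("scene.jpg")
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# The cascaded classifier is slid over the image at multiple scales.
detections = cascade.detectMultiScale(gray, scaleFactor=1.1,
                                      minNeighbors=5, minSize=(24, 24))
for (x, y, w, h) in detections:
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
```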

Tracking is applied in this system to improve the stability of object detection, which otherwise fluctuates between frames. With tracking, fluctuations in the detection results are minimized so that the detected bounding box stays relatively fixed on the detected object. Fig. 5 shows how tracking works in the detection system.

Fig. 5 Tracking on the detection system

From Fig. 5, we have the location of the detected object from the detection system. From that location, we can estimate the next object location with a Kalman filter using the equations in Table 1.

Table 1 Kalman filter tracking equations

  Time update (prediction):
    $\hat{x}^-_k = \hat{x}_{k-1}$
    $P^-_k = P_{k-1}$

  Measurement update (correction):
    $K_k = P^-_k / (P^-_k + R)$
    $\hat{x}_k = \hat{x}^-_k + K_k (z_k - \hat{x}^-_k)$
    $P_k = (1 - K_k) P^-_k$

In the time update, $\hat{x}^-_k$ is the prior state estimate and $P^-_k$ is the prior error covariance. In the correction step, the Kalman gain $K_k$ is computed; it is needed to calculate $\hat{x}_k$ (the state estimate at time k) and $P_k$, which in turn are needed for the k+1 (future) estimate. The result of tracking with the Kalman filter is shown in Fig. 6.

Fig. 6 Result of tracking with the Kalman filter
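The equations in Table 1 describe a scalar Kalman filter with an identity motion model. A direct transcription might look as follows, with one instance run per tracked coordinate (e.g., the x and y of the detected bounding-box centre); the noise values are placeholders.

```python
# Scalar Kalman filter transcribing Table 1; one instance per tracked
# coordinate (e.g., the x and y of the detected bounding-box centre).
class ScalarKalman:
    def __init__(self, x0, p0=1.0, r=1.0):
        self.x = x0    # posterior estimate x_k
        self.p = p0    # posterior error covariance P_k
        self.r = r     # measurement noise R (placeholder value)

    def predict(self):
        # Time update: x_k^- = x_{k-1},  P_k^- = P_{k-1}
        self.x_prior, self.p_prior = self.x, self.p
        return self.x_prior

    def correct(self, z):
        # Measurement update: K_k = P_k^- / (P_k^- + R)
        k_gain = self.p_prior / (self.p_prior + self.r)
        # x_k = x_k^- + K_k (z_k - x_k^-),  P_k = (1 - K_k) P_k^-
        self.x = self.x_prior + k_gain * (z - self.x_prior)
        self.p = (1.0 - k_gain) * self.p_prior
        return self.x
```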
2.2 Extracting the Features

Once the system has detected the trash, it processes the image data of the trash object to obtain its characteristic features. Several kinds of features can be used as data for classification; the most commonly used are texture and shape.

Texture is one of the basic elements of an image: properties contained in the image that form a pattern repeated at certain intervals of distance and direction over most or all of the image area. In this research, we adopt the Gray-Level Co-Occurrence Matrix (GLCM) [8] to obtain texture characteristics that can be drawn from the statistics of gray intensity values in the image.

Another commonly used type of feature is shape. One way to extract shape characteristics is the Histogram of Oriented Gradient (HOG) method, a feature descriptor used in computer vision and image processing that counts the occurrences of gradient orientations in localized portions of an image. The appearance and shape of local objects can often be well characterized by the distribution of intensity gradients or edge directions; the HOG feature is calculated by taking the histogram of edge orientations in a local area [9]. The HOG features are combined with the texture features and then used as training data input in the classification stage.

2.3 Reducing the Vector Size of the HOG Feature

Feature extraction from a trash object image using HOG yields a very large number of feature values, quite unlike GLCM feature extraction, which yields only 6 feature values. HOG values computed in regions without edges are redundant (less important), so it is necessary to reduce the HOG feature data to obtain only the values that are really required for training the subsequent classifier. PCA can be defined as the orthogonal projection of the data onto a lower-dimensional (principal) linear subspace such that the variance of the projected samples is maximized. Equivalently, PCA is a linear projection of the data that minimizes the average squared distance between each data point and its projection [9].

2.4 Classifying Trash and Non-Trash

For classifying the objects, we use a Support Vector Machine (SVM). SVM is a machine learning method that works on the principle of Structural Risk Minimization (SRM) to find the best hyperplane separating two or more classes in the input space. The SVM technique seeks the best classifier/hyperplane function among an infinite number of candidate functions [9]. The best hyperplane is the one located halfway between the two sets of object classes; finding it is equivalent to maximizing the margin, the distance between the two sets of objects from different classes.
III. BUILDING A DESCRIPTOR FOR VALIDATING AND CLASSIFYING TRASH

3.1 Gray-Level Co-Occurrence Matrix (GLCM)

The GLCM is computed by counting the co-occurrences of neighboring gray-level values in the image and collecting them in a new matrix $p(i,j)$.

Fig. 7 GLCM calculation process

After the matrix is formed, the value of each texture feature is calculated. The texture features that can be extracted from this matrix are energy, contrast, homogeneity, Inverse Difference Moment (IDM), entropy, and mean square error.

1. Contrast measures the spatial frequency of the image and the moment difference of the GLCM; it quantifies the variation in the gray levels of the image pixels and is calculated as

$\text{Contrast} = \sum_{i,j} (i - j)^2 \, p(i,j)$  (Eq. 1)

2. IDM (Inverse Difference Moment) measures local similarity; it yields larger values for windows showing less contrast. IDM is calculated as

$\text{IDM} = \sum_{i,j} \frac{p(i,j)}{1 + (i - j)^2}$  (Eq. 2)

3. Energy measures the uniformity of the texture of an object, highlighting its geometry and the smoothness of the surface. Energy is calculated as

$\text{Energy} = \sum_{i,j} p(i,j)^2$  (Eq. 3)

4. Entropy measures the irregularity or complexity of an object. Entropy is calculated as

$\text{Entropy} = -\sum_{i,j} p(i,j) \log p(i,j)$  (Eq. 4)

5. Homogeneity measures the uniformity of the object. Homogeneity is calculated as

$\text{Homogeneity} = \sum_{i,j} \frac{p(i,j)}{1 + |i - j|}$  (Eq. 5)

6. Mean Square Error (MSE) measures the mean squared deviation of the values of the extracted matrix from their mean $\mu$, over the $N$ matrix entries:

$\text{MSE} = \frac{1}{N} \sum_{i,j} \left( p(i,j) - \mu \right)^2$  (Eq. 6)

The result of texture extraction using the GLCM matrix is shown in Fig. 8.

Fig. 8 Texture feature extraction

As Fig. 8 shows, the image is reduced to the co-occurrence statistics of neighboring gray values; from those values the features are calculated using Eqs. 1-6.
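As a sketch of Eqs. 1-6, the six features could be computed from a normalized co-occurrence matrix as below. scikit-image's graycomatrix builds the matrix; the distance/angle choice is an assumption, and Eq. 6 is read here as the mean squared deviation of the matrix entries.

```python
# Sketch of the six GLCM texture features (Eqs. 1-6). The input must be a
# uint8 grayscale image; distance=1, angle=0 are assumed parameters.
import numpy as np
from skimage.feature import graycomatrix

def glcm_features(gray_img, distance=1, angle=0):
    glcm = graycomatrix(gray_img, [distance], [angle], levels=256, normed=True)
    p = glcm[:, :, 0, 0]                      # normalized co-occurrence matrix
    i, j = np.indices(p.shape)
    eps = 1e-12                               # avoids log(0) in the entropy term
    return {
        "contrast":    np.sum((i - j) ** 2 * p),           # Eq. 1
        "idm":         np.sum(p / (1.0 + (i - j) ** 2)),   # Eq. 2
        "energy":      np.sum(p ** 2),                     # Eq. 3
        "entropy":    -np.sum(p * np.log(p + eps)),        # Eq. 4
        "homogeneity": np.sum(p / (1.0 + np.abs(i - j))),  # Eq. 5
        "mse":         np.mean((p - p.mean()) ** 2),       # Eq. 6 (our reading)
    }
```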

3.2 Histogram of Oriented Gradient (HOG)

The Histogram of Oriented Gradient (HOG) is a feature descriptor used in computer vision that counts the occurrences of gradient orientations in a localized portion of an image. Before the HOG calculation, the captured object is cropped and resized to 32x32 pixels (preprocessing). The purpose of resizing the object is to reduce the calculation time so that the system is not overloaded. The preprocessed object image is shown in Fig. 9.

Fig. 9 Preprocessed object image

An object that has been cropped and resized is divided into several cells of 8x8 pixels each, so a 32x32 object image has 16 cells. Every cell in the image is convolved with the gradient kernels. From this operation we obtain $G_x$ and $G_y$, where $G_x$ is the response of the horizontal gradient kernel and $G_y$ is the response of the vertical gradient kernel. Using $G_x$ and $G_y$, the gradient magnitude and direction are calculated with the following equations:

$G = \sqrt{G_x^2 + G_y^2}, \quad \theta = \arctan\left(\frac{G_y}{G_x}\right)$  (Eq. 7)

From the object image we can then calculate the histogram of oriented gradients for each cell. The HOG calculation process is shown in Fig. 10.

Fig. 10 HOG calculation process

The 32x32-pixel object image has 16 cells, and each cell yields a 9-bin HOG vector, so the HOG feature extraction of a single object produces 144 values. From these values the gradient orientations can be visualized as shown in Fig. 11.

Fig. 11 HOG visualization
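A sketch of the per-cell HOG described above, using the paper's sizes (32x32 patch, 8x8 cells, 9 bins, so 16 x 9 = 144 values). Sobel kernels stand in for the gradient kernels, which the text does not specify.

```python
# Sketch of the per-cell HOG: 32x32 grayscale patch, 8x8 cells, 9 bins.
# Sobel kernels are an assumed stand-in for the unspecified gradient kernels.
import cv2
import numpy as np

def hog_144(patch):
    patch = cv2.resize(patch, (32, 32)).astype(np.float32)
    gx = cv2.Sobel(patch, cv2.CV_32F, 1, 0)          # horizontal gradient G_x
    gy = cv2.Sobel(patch, cv2.CV_32F, 0, 1)          # vertical gradient G_y
    mag = np.sqrt(gx ** 2 + gy ** 2)                 # Eq. 7: magnitude
    ang = np.degrees(np.arctan2(gy, gx)) % 180.0     # Eq. 7: unsigned direction
    feats = []
    for cy in range(0, 32, 8):
        for cx in range(0, 32, 8):
            m = mag[cy:cy + 8, cx:cx + 8].ravel()
            a = ang[cy:cy + 8, cx:cx + 8].ravel()
            # 9-bin orientation histogram, weighted by gradient magnitude.
            hist, _ = np.histogram(a, bins=9, range=(0, 180), weights=m)
            feats.append(hist)
    return np.concatenate(feats)                     # 16 cells * 9 bins = 144
```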
3.3 PCA (Principal Component Analysis)

The input of the classification is a combination of the texture and shape feature extraction. Compared to the texture features, HOG produces far more feature values. If the shape and texture features were used directly as the classification input, the GLCM result would be drowned out by the HOG result. Using PCA, the HOG result is reduced to the size of the GLCM feature extraction result. The first step in reducing the HOG feature data with PCA is to calculate the mean (average) of each HOG feature dimension. Because the data to be reduced is two-dimensional, the means are calculated with the following equation:

$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, \quad \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$  (Eq. 8)

where n is the number of HOG values and $x_i$ and $y_i$ are the data of the first and second dimensions. After obtaining the averages $\bar{x}$ and $\bar{y}$, the data values are centered by subtracting the averages:

$M = \begin{bmatrix} x_1 - \bar{x} & y_1 - \bar{y} \\ \vdots & \vdots \\ x_n - \bar{x} & y_n - \bar{y} \end{bmatrix}$  (Eq. 9)

After the centered x and y data are obtained, the next step is to calculate the covariance matrix, which contains information about the distribution of the data:

$C = \frac{1}{n - 1} M^T M$  (Eq. 10)

where $T$ denotes the matrix transpose. The matrix $C$ is m x m, where m is the number of data dimensions. Having obtained the covariance matrix, its eigenvectors and eigenvalues are calculated. The principal components are the eigenvectors of the covariance matrix: the first principal component is the eigenvector corresponding to the largest eigenvalue, the second principal component corresponds to the second-largest eigenvalue, and so on.
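A NumPy sketch of the reduction in Eqs. 8-10, generalized from the two-dimensional case above to m dimensions. Keeping six components (to match the GLCM feature count) follows the text; the 1/(n-1) covariance normalization is an assumption.

```python
# Sketch of PCA reduction per Eqs. 8-10: centre the data, form the
# covariance matrix, keep the eigenvectors with the largest eigenvalues.
import numpy as np

def pca_reduce(X, n_components=6):
    # X: (n_samples, n_features) matrix of HOG vectors.
    mean = X.mean(axis=0)                     # Eq. 8: per-dimension mean
    M = X - mean                              # Eq. 9: mean-centred data
    C = (M.T @ M) / (len(X) - 1)              # Eq. 10: covariance (m x m)
    eigvals, eigvecs = np.linalg.eigh(C)      # eigh: C is symmetric
    order = np.argsort(eigvals)[::-1]         # descending eigenvalue order
    components = eigvecs[:, order[:n_components]]
    return M @ components                     # projection onto the components
```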
3.4 Combining Features to Create a Descriptor

For training, the GLCM feature extraction data and the reduced HOG calculation results are concatenated to create a descriptor that is used as the input of the classifier training. The combination of the GLCM and HOG features is shown in Fig. 12.

Fig. 12 Combining the HOG and GLCM features
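For illustration, a minimal sketch of this concatenation, assuming the GLCM features come first (the ordering is not stated in the paper):

```python
# Sketch: concatenating the six GLCM features with the PCA-reduced HOG
# values to form the final training descriptor. The ordering is an assumption.
import numpy as np

def build_descriptor(glcm_feats, hog_reduced):
    keys = ("contrast", "idm", "energy", "entropy", "homogeneity", "mse")
    glcm_vec = np.array([glcm_feats[k] for k in keys], dtype=np.float32)
    return np.concatenate([glcm_vec, np.float32(hog_reduced)])
```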
3.5 Classification

We use a Support Vector Machine (SVM) to classify the objects found in the captured frame. We train the classifier with the C-SVC type parameter for multiclass classification (organic, non-organic, and non-waste) and a linear kernel. A dataset of 1000 images is prepared to train the classifier, including images of organic waste objects, non-organic waste objects, and non-waste objects. The performance of the classifier is measured as the percentage of correctly classified samples. K-fold Cross Validation (CV) is used to evaluate the classifier performance; Fig. 13 shows how the classifier is tested with K-fold Cross Validation, with the data split into training and testing subsets.

Fig. 13 K-fold Cross Validation
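OpenCV's ml module exposes exactly the C-SVC type and linear kernel named above, so a training sketch could look like the following; the 0/1/2 label encoding is an assumption.

```python
# Sketch of training the classifier: OpenCV SVM with C-SVC type and a
# linear kernel. Labels 0/1/2 standing for non-waste, organic and
# non-organic waste are an assumed encoding.
import cv2
import numpy as np

def train_svm(descriptors, labels):
    # descriptors: (n_samples, n_features) float32; labels: (n_samples,) int32
    svm = cv2.ml.SVM_create()
    svm.setType(cv2.ml.SVM_C_SVC)             # multiclass C-SVC
    svm.setKernel(cv2.ml.SVM_LINEAR)          # linear kernel
    svm.train(np.float32(descriptors), cv2.ml.ROW_SAMPLE, np.int32(labels))
    return svm

# Prediction on one descriptor: _, pred = svm.predict(np.float32(d.reshape(1, -1)))
```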
IV. EXPERIMENTAL RESULT

4.1 Testing the Detection System

We validate our detection system using offline testing: images are loaded from a folder and the detection result of each is written down. The table below summarizes the offline test examples of the detection system.

[Detection-result examples omitted: fourteen sample images with their status labels, of which eleven are marked "correct" and three are marked "miss".]

From the results of the detection test on the waste images, out of 1330 waste images there are 83 miss-detections. From that we can calculate the accuracy of the object detection as

$\text{accuracy} = \frac{1330 - 83}{1330} \times 100\% = 93.76\%$
4.2 Testing the Classification System

To test the classification system, we use the k-fold Cross Validation output. K-fold CV is a statistical method for evaluating the performance of a model or training result, in which the data is separated into two subsets: learning data and evaluation data. The model or algorithm is trained on the learning subset and validated on the validation subset. In this test, we use 5-fold CV. The results of testing the classification system with 5-fold CV on 1000 test object images are shown in Table 2.

Table 2. 5-fold Cross Validation results

No   Subset Data   Miss   Correct   Accuracy (%)
1         1         29      171        85.5
2         2         21      179        89.5
3         3         35      165        82.5
4         4         30      170        85.0
5         5         58      142        71.0

From the table, the average accuracy of the classification system is about 82.7%.
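A sketch of this 5-fold protocol, where each fold trains on 800 of the 1000 descriptors and tests on the remaining 200. scikit-learn's KFold and SVC stand in for the OpenCV classifier used in the paper, so its numbers would not necessarily match Table 2.

```python
# Sketch of 5-fold cross validation over the 1000-image descriptor set.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.svm import SVC

def cross_validate(X, y, folds=5):
    accuracies = []
    for train_idx, test_idx in KFold(n_splits=folds, shuffle=True).split(X):
        clf = SVC(kernel="linear").fit(X[train_idx], y[train_idx])
        accuracies.append(np.mean(clf.predict(X[test_idx]) == y[test_idx]))
    return float(np.mean(accuracies))   # the per-fold mean, as in Table 2
```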
4.3 Testing Classification Using a Confusion Matrix

A confusion matrix is used to test the classifier system. The classification test with the confusion matrix uses streaming images from the camera capturing the environment around the robot. Each frame captured by the camera is labeled by the system, so we can determine whether the system failed or succeeded in classifying the object in the frame; we use this for failure analysis. The result of this test is shown in Table 3.

Table 3. Confusion matrix of waste object classification (n = 83)

                         Predicted
                   Non-waste    Waste    Total
Actual Non-waste     TN: 10     FP: 12      22
Actual Waste         FN: 10     TP: 51      61
Total                    20         63      83

In this confusion matrix there are two possible predicted classes and two possible actual classes, where "actual" and "predicted" refer to the presence of a waste object in the captured frame. The basic terms are: true positive (TP), where the system predicts an object as waste and it actually is waste; true negative (TN), where the system predicts an object as non-waste and it actually is non-waste; false positive (FP), where the system predicts an object as waste but it is actually non-waste; and false negative (FN), where the system predicts an object as non-waste but it is actually waste. From these terms we can determine the reliability of our classification system using the following measures:

1. Accuracy: how often the classifier puts an object in its correct class: $(TP + TN)/n = 61/83 = 73.49\%$.

2. Miss-classification: how often the classifier fails to put an object in its correct class: $(FP + FN)/n = 22/83 = 26.51\%$.

3. Precision: when the classifier predicts an object as waste, how often the prediction is correct: $TP/(TP + FP) = 51/63 = 80.95\%$.

4. Prevalence: how often the waste condition actually occurs in our sample: $(TP + FN)/n = 61/83 = 73.49\%$.
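The four measures above reduce to simple ratios over the Table 3 counts; a minimal worked computation:

```python
# The four reliability measures computed from the Table 3 counts.
TN, FP, FN, TP = 10, 12, 10, 51
n = TN + FP + FN + TP                 # 83 captured frames

accuracy = (TP + TN) / n              # 61 / 83 ~= 0.7349
misclassification = (FP + FN) / n     # 22 / 83 ~= 0.2651
precision = TP / (TP + FP)            # 51 / 63 ~= 0.8095
prevalence = (TP + FN) / n            # 61 / 83 ~= 0.7349
```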
V. CONCLUSION

We have developed a detection and classification system to determine whether an object is organic trash, non-organic trash, or not trash. The combination of Viola-Jones object detection and tracking with a Kalman filter is proposed to detect and track a trash object. A support vector machine (SVM) is used to classify the objects found by the detection system into three classes: organic, non-organic, and not trash. Experimental results show that our detection system is suitable for detecting trash, with classification accuracy reaching 82.7% for 5-fold Cross Validation and 73.49% for the confusion-matrix validation test. These results show that our system is promising for detecting and classifying trash objects. In future work, we will implement our system on the real robot.

REFERENCES
[1] A. Yuniadhi, "Indonesia Darurat Sampah," Kompas.com, 2016. Accessed: 20 August 2017. http://properti.kompas.com/read/2016/01/27/121624921/Indonesia.Darurat.Sampah.
[2] A. Espingardeiro, "How Recycling Robots Could Help Us Clean the Planet," Scuola Superiore Sant'Anna CRIM Lab, 2010. Accessed: 25 July 2017. http://spectrum.ieee.org/automaton/robotics/industrialrobots/042110-recycling-robots.
[3] Robot man, "Baryl Trash Can Robot Keeps Public Places Clean," Bio-inspired robots. Accessed: 25 July 2017. http://www.roboticgizmos.com/baryl-trash-robot/.
[4] M. Walsh, "Disney World PUSHes out talking trash can after contract ends with robot's maker," New York Daily News, 2014. Accessed: 20 August 2017. http://www.nydailynews.com/news/world/disney-world-pushes-talking-trash-article-1.1608988.
[5] P. Viola and M. J. Jones, "Robust Real-Time Face Detection," International Journal of Computer Vision, 2004.
[6] B. S. B. Dewantara and J. Miura, "Estimating Head Orientation using a Combination of Multiple Cues," IEICE Transactions on Information and Systems, Vol. E99-D, No. 6, pp. 1603-1614, 2016.
[7] P. Viola and M. J. Jones, "Robust Real-Time Face Detection," International Journal of Computer Vision, 2004.
[8] A. Salmador, J. Pérez Cid, and I. Rodríguez Novelle, "Intelligent Garbage Classifier," IJIMAI, Vol. 1, 2008.
[9] T. Kobayashi, A. Hidaka, and T. Kurita, "Selection of Histograms of Oriented Gradients Features for Pedestrian Detection," ICONIP, 2007.
