
2019 International Seminar on Research of Information Technology and Intelligent Systems (ISRITI)

Bottleneck RGB Features for Tea Clones Identification

R. Sandra Yuwana∗, Endang Suryawati∗, Ana Heryana∗, Vicky Zilvan∗, Dadan Rohdiana† and Heri Syahrian K†
∗ Research Center for Informatics
Indonesian Institute of Sciences - LIPI, Bandung, Indonesia
Email:{rade014,enda029,anah002,vick001}@lipi.go.id
† Research Institute for Tea and Cinchona
Indonesian Agency for Agricultural Research and Development, Gambung, Indonesia
Email:{rohdiana,heri.syahrian}@ritc.id

Abstract—As each tea clone may produce a different quality of tea, it is important to have them identified in the field. Tea clone identification is one application of ICT in agriculture. Because tea clones can have very similar characteristics, a good amount of data is required to train machine learning-based classifiers that perform well. However, in many cases we have to deal with a small amount of data. To overcome this, we propose an encoder-based feature reduction that produces RGB-based bottleneck features. The output features are then fed into an SVM classifier. We evaluate our features on the classification of two tea clones of the Gambung Assamica (GMB) series. Our experimental results show that the proposed features achieve better performance than full-dimension RGB.

Index Terms—Dimensionality reduction, bottleneck features, Support Vector Machine (SVM), Tea clones.

I. INTRODUCTION

Indonesia is one of the tea-producing countries of the world, and there are currently many varieties of tea. In Indonesia, the Research Institute for Tea and Cinchona (RITC) in Gambung, West Java, is one of the centers of excellence for tea and quinine studies. RITC has produced a series of superior tea clones: 11 Gambung Assamica (GMB) clones and 5 Gambung Sinensis (GMBS) clones. The clones have almost the same characteristics, yet different clones produce different quality and quantity of tea. The high physical similarity between clones makes it difficult for non-experts such as farmers to distinguish them, so expertise is needed to differentiate between clone varieties. Currently, there are not many experts who can identify these clones, and they still identify them manually. Automatic identification would help to resolve this problem. Research on identifying plant types using neural networks has shown many successes [1]–[5]. These studies use the Multi-Layer Perceptron (MLP) approach and deep learning.

Both MLP and deep learning have been applied in agriculture. Among machine learning approaches, Kusumo et al. [6] proposed the detection of disease in maize using SVM, Decision Tree (DT), Random Forest (RF), and Naive Bayes (NB) with various image processing features. Their results show that RGB has the best accuracy when compared to other extracted features. This result serves as a reference for our experiment, in which we use RGB data as the baseline for comparison. Specifically for tea plants, Zhang et al. [7] conducted real-time monitoring of tea leaf harvest time, and Gejima et al. [8] studied tea leaf quality using the RGB model of a tea leaf image. In many of these studies, large and varied datasets are needed to achieve good results.

Deep learning for plants was studied by Sun et al. [9], who used a CNN to identify tea leaf disease and obtained an accuracy of 93.75%. Pardede et al. [10] used a convolutional autoencoder for the detection of plant diseases. Their results show that conventional autoencoders with more hidden layers give better results and that using convolutional networks improves performance.

One reliable machine learning method for classification is the Support Vector Machine (SVM). SVM is a supervised method, that is, an algorithm for labeled data, used for classification and regression [11]. Its advantages are that it works effectively in high-dimensional spaces, that it remains effective when the number of dimensions is greater than the number of samples, and that it is memory efficient because its decision function uses only a subset of the training points, called support vectors [11]. Research on plants using SVM includes Saranka et al. [12], who monitored the fermentation stage of black tea processing using SVM as a classifier.

In recent years, data have become very valuable because they can provide useful information. For tea leaves, there is currently no specific digital dataset for GMB and GMBS tea leaf clones. In this study, we collected the Gambung Assamica (GMB) series tea leaf dataset manually by photographing tea leaves directly. However, collecting data from scratch requires effort and high costs. In addition, original image data has high dimensions. This is a challenge because it affects the size of the dataset that is needed: if the data has many dimensions, machine learning techniques may not train optimally. Dimension reduction is a way to overcome this problem.

978-1-7281-4520-4/19/$31.00 ©2019 IEEE


Dimensionality reduction (DR) is the reduction of data from high to low dimensions by transforming it into a meaningful representation [13]. There are two approaches to dimensionality reduction: feature selection (FS) and feature extraction (FE). FS aims to find the most relevant features in the dataset while keeping a dataset schema whose variables remain relevant to the original dataset [14], whereas FE maps high-dimensional data into low-dimensional data [15]. Both have the same goal: to streamline data processing and improve data quality by reducing the dimensions of the dataset. Research on DR has been carried out in various fields with good results, such as text [16]–[21].

One type of DR is the autoencoder. An autoencoder is a deep neural network with two main parts, an encoder and a decoder. Research on autoencoders for DR was carried out by Wang et al. [22], who applied an autoencoder to the MNIST and Olivetti faces datasets, focusing on the process of reducing the dimensions of the dataset. Their result is that autoencoder-based DR produces better visualization.

In this paper, we propose a feature reduction (FR) scheme for images that uses an encoder to produce an RGB-based bottleneck feature, with SVM as the classifier, so that the identification of images becomes more effective.

The rest of this paper is organized as follows. Section II explains the dimension reduction method that we propose for GMB 3 and GMB 9 tea clone identification. Section III describes our experimental setup. In Section IV we present our results and discussion, and finally this work is concluded in Section V.

II. THE PROPOSED METHOD

Besides text and speech, image data currently shows very promising potential as a source of information to support applications in many fields. Digital image datasets are now widely available, both open access and not, labeled and unlabeled. Today, processing digital image data is much easier thanks to the increasing capability of multimedia computing. However, some tasks that use image data may be limited by data availability. To make maximum use of the available data, we need a way to reduce the dimensions of the dataset to smaller sizes while still retaining the useful information.

The original image data (RGB) that is to be classified has fairly large dimensions, which affects both processing time and storage size. Fig. 1 is a schematic diagram of the proposed method. Before an image is classified, it is first processed by an encoder, which reduces the image dimensions by reducing its features. The result is an RGB-based bottleneck feature, which is then classified using SVM.

Fig. 1. The Schema of Proposed Method

The input to the encoder is a 64x64x3 RGB image of a GMB 3 or GMB 9 tea leaf. The flattened input therefore has 12,288 dimensions, and we vary the size of the bottleneck feature to 20% (2457), 50% (6144), and 80% (9830) of the original dimensions. In the encoder stage, the image is compressed from high to low dimensions. The encoder design has 2 convolution layers and 2 pooling layers, which are CNN layers, followed by flatten and dense layers, which are MLP layers. The input to the first convolutional layer is a 64x64 tea leaf image, which the first pooling layer reduces to 32x32. In the second pooling layer, the image is reduced again to a resolution of 11x11. The pooling layers thus decrease the dimensions of the input data. The SVM classifier is then used to assign images to the GMB 3 or GMB 9 class, as shown in Fig. 2.

Fig. 2. The process of identifying GMB 3 and GMB 9 tea leaves

As a comparison for our proposed method, we use the red-green-blue (RGB) image as input and classify it directly with an SVM classifier, without any reduction in dimensions. We use a linear kernel for the SVM. The penalty parameter (C) is set to 1; this parameter controls how strongly outliers and misclassified training samples are penalized.
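The dimensions quoted in this section can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the pooling window sizes (2, then 3) and the ceiling division are assumptions, chosen because they reproduce the 64 → 32 → 11 reduction described above; the paper does not state the window sizes or the padding. All function names are hypothetical.

```python
import math

# Input image shape from the text: 64 x 64 pixels, 3 colour channels (RGB).
HEIGHT, WIDTH, CHANNELS = 64, 64, 3


def flattened_size(height, width, channels):
    """Number of values in one flattened RGB image."""
    return height * width * channels


def bottleneck_size(input_dim, percent):
    """Bottleneck dimension as a whole-number percentage of the input, rounded down."""
    return input_dim * percent // 100


def pooled_side(side, window):
    """Side length after pooling with stride equal to the window size.

    ASSUMPTION: ceiling division (padding so that no pixels are dropped);
    the paper does not state the window size or the padding mode.
    """
    return math.ceil(side / window)


input_dim = flattened_size(HEIGHT, WIDTH, CHANNELS)
sizes = {p: bottleneck_size(input_dim, p) for p in (20, 50, 80)}

# ASSUMED pooling windows (2, then 3) that reproduce 64 -> 32 -> 11.
after_pool1 = pooled_side(HEIGHT, 2)
after_pool2 = pooled_side(after_pool1, 3)

print(input_dim)                 # 12288
print(sizes)                     # {20: 2457, 50: 6144, 80: 9830}
print(after_pool1, after_pool2)  # 32 11
```

Under these assumptions the three bottleneck sizes match the values 2457, 6144, and 9830 given in the text.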

III. EXPERIMENTAL SETUP


In this experiment, we used tea leaf data from the Research Institute for Tea and Cinchona (RITC) in Gambung, West Java, Indonesia. According to Prawira-Atmaja et al. [23], RITC has two superior varieties of tea: Camellia sinensis var. assamica, from Assam in India, and Camellia sinensis var. sinensis, from China. At present, for Camellia sinensis var. sinensis, RITC has five classes of tea leaf clones, namely GMBS 1, GMBS 2, GMBS 3, GMBS 4, and GMBS 5 [24]. For Camellia sinensis var. assamica, there are 11 classes of Gambung clones (GMB), namely GMB 1 to GMB 11. These clone classes have quite similar characteristics. In this experiment, we focus on only two classes, GMB 3 and GMB 9, because they have high productivity. The differences between these classes are listed in Table I.

TABLE I
THE DESCRIPTION OF GMB 3 AND GMB 9

Descriptions     GMB 3                          GMB 9
Color            dark green                     yellowish-green
Leaf thickness   0.19 mm                        0.23 mm
Resistance       resistant to smallpox leaves   less resistant to smallpox
Production       4.25 tons/year                 4.7 tons/year
Area             medium-high                    low-high

The dataset of GMB clone tea leaf images in this experiment was collected in mid-December 2018 using several types of smartphones and DSLR cameras. Some examples of GMB 3 and GMB 9 are shown in Figs. 3 and 4.

Fig. 4. Sample Images of GMB 9

In this experiment, we used 1297 images of tea leaves, consisting of 598 images for GMB 3 and 699 for GMB 9, as shown in Table II. The dataset was randomly divided into approximately 80% training data, 10% validation data, and 10% testing data. We varied the number of training epochs over 50, 100, 150, and 200. ReLU activation is used in the encoder. The experiment was implemented with the deep learning libraries TensorFlow and Keras in Python 3.7.

TABLE II
THE DETAILS DISTRIBUTION OF DATA IN THE EXPERIMENTS

Clones   Training   Validation   Testing   Total
GMB 3    483        59           56        598
GMB 9    548        86           65        699
Total    1031       145          121       1297

IV. RESULTS AND DISCUSSIONS

The best accuracy, around 91%, is obtained at epochs 150 and 200. At epochs 50 and 100, the best accuracy reaches only about 88%; see Table III. Overall, feature reduction with 50, 100, 150, and 200 epochs gives a minimum accuracy of 83.4%, which is still quite good.

Compared with RGB data at epoch 200, the features reduced to 20%, 50%, and 80% are all at least 3 points more accurate than full RGB, as shown in Table IV. Even using only 20% of the dimensions is slightly better than using the full RGB features, which achieve an accuracy of only 80.1%.
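To make the comparison concrete, the gain of each reduced-feature setting over the RGB baseline at epoch 200 can be computed directly from the accuracies reported in Table IV. This is a small sketch with illustrative variable names; the differences are in percentage points.

```python
# Accuracies (%) at epoch 200, copied from Table IV.
accuracy = {"FR 20%": 83.4, "FR 50%": 91.3, "FR 80%": 90.9}
rgb_baseline = 80.1

# Gain of each reduced-feature setting over full RGB, in percentage points.
gains = {name: round(acc - rgb_baseline, 1) for name, acc in accuracy.items()}

# Setting with the highest accuracy.
best = max(accuracy, key=accuracy.get)

for name, gain in gains.items():
    print(f"{name}: +{gain} points over RGB")
print("best setting:", best)
```

The smallest gain is 3.3 points (FR 20%) and the largest is 11.2 points (FR 50%).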
Fig. 3. Sample Images of GMB 3

TABLE III
THE PERFORMANCE OF PROPOSED METHODS WHEN THE NUMBER OF EPOCHS IS VARIED

                   Feature Reduction (accuracy, %)
Number of epochs   FR 20%   FR 50%   FR 80%
50                 86.7     85.9     88.4
100                85.1     88.4     86.7
150                90.9     91.7     88.4
200                83.4     91.3     90.9

TABLE IV
THE PERFORMANCE COMPARISON OF SVM CLASSIFIER USING INPUT FR (I.E., FR 20%, 50%, AND 80%) AND WITHOUT FR (RGB) WHEN THE NUMBER OF EPOCHS IS 200

Input for Classifier   Epoch   Accuracy (%)
FR 20%                 200     83.4
FR 50%                 200     91.3
FR 80%                 200     90.9
RGB data               200     80.1

V. CONCLUSIONS AND FUTURE WORKS

In this paper, we applied an encoder to produce bottleneck features for identifying and classifying two classes of tea leaves, the GMB 3 and GMB 9 clones. Using an encoder for dimension reduction, we extract and reduce the full RGB features, and the output features are then used as inputs to an SVM classifier. The results show that taking a feature of around 50% of the size of the image produces an accuracy of 91.3% at epoch 200. This is better than the features taken at 20% and 80% of the image size, and it is 11.2 percentage points better than using full RGB images.

In the future, we would also like to identify and classify all classes of the GMB clone series, which consists of 11 classes. We will also focus on using other features such as texture, shape, and leaf angle. Because RGB images are sensitive to lighting, using more robust features may produce better classification results.

ACKNOWLEDGMENT

The authors would like to thank Hilman F. Pardede for discussions and inputs. This paper is partially funded by Insinas Grant 2019 (Contract Number: 091/P/PRL-LIPI/INSINAS-1/II/2019) from the Indonesian Ministry of Research, Technology, and Higher Education. The experiments in this research were conducted on the High Performance Computing (HPC) facilities of the Research Center for Informatics, Indonesian Institute of Sciences (LIPI). We thank our fellow researchers in the Research Center for Informatics, LIPI, who provided assistance in this study.

REFERENCES

[1] B. C. Karmokar, M. S. Ullah, M. K. Siddiquee, and K. M. R. Alam, “Tea leaf diseases recognition using neural network ensemble,” International Journal of Computer Applications, vol. 114, no. 17, 2015.
[2] E. Suryawati, R. Sustika, R. S. Yuwana, A. Subekti, and H. F. Pardede, “Deep structured convolutional neural network for tomato diseases detection,” in 2018 International Conference on Advanced Computer Science and Information Systems (ICACSIS), pp. 385–390, IEEE, 2018.
[3] D. Moshou, E. Vrindts, B. De Ketelaere, J. De Baerdemaeker, and H. Ramon, “A neural network based plant classifier,” Computers and Electronics in Agriculture, vol. 31, no. 1, pp. 5–16, 2001.
[4] M. Dyrmann, H. Karstoft, and H. S. Midtiby, “Plant species classification using deep convolutional neural network,” Biosystems Engineering, vol. 151, pp. 72–80, 2016.
[5] S. G. Wu, F. S. Bao, E. Y. Xu, Y.-X. Wang, Y.-F. Chang, and Q.-L. Xiang, “A leaf recognition algorithm for plant classification using probabilistic neural network,” in 2007 IEEE International Symposium on Signal Processing and Information Technology, pp. 11–16, IEEE, 2007.
[6] B. S. Kusumo, A. Heryana, O. Mahendra, and H. F. Pardede, “Machine learning-based for automatic detection of corn-plant diseases using image processing,” in 2018 International Conference on Computer, Control, Informatics and its Applications (IC3INA), pp. 93–97, IEEE, 2018.
[7] L. Zhang, H. Zhang, Y. Chen, S. Dai, X. Li, I. Kenji, Z. Liu, and M. Li, “Real-time monitoring of optimum timing for harvesting fresh tea leaves based on machine vision,” International Journal of Agricultural and Biological Engineering, vol. 12, no. 1, pp. 6–9, 2019.
[8] Y. Gejima, M. Nagata, et al., “Basic study on kamairicha tea leaves quality judgment system,” Basic study on Kamairicha tea leaves quality judgment system, pp. 1–10, 2000.
[9] X. Sun, S. Mu, Y. Xu, Z. Cao, and T. Su, “Image recognition of tea leaf diseases based on convolutional neural network,” arXiv preprint arXiv:1901.02694, 2019.
[10] H. F. Pardede, E. Suryawati, R. Sustika, and V. Zilvan, “Unsupervised convolutional autoencoder-based feature learning for automatic detection of plant diseases,” in 2018 International Conference on Computer, Control, Informatics and its Applications (IC3INA), pp. 158–162, IEEE, 2018.
[11] M. S. Hossain, R. M. Mou, M. M. Hasan, S. Chakraborty, and M. A. Razzak, “Recognition and detection of tea leaf’s diseases using support vector machine,” in 2018 IEEE 14th International Colloquium on Signal Processing & Its Applications (CSPA), pp. 150–154, IEEE, 2018.
[12] S. Saranka, T. Kartheeswaran, D. Wanniarachchi, and W. Wanniarachchi, “Monitoring fermentation of black tea with image processing techniques,” 2016.
[13] L. Van Der Maaten, E. Postma, and J. Van den Herik, “Dimensionality reduction: a comparative,” J Mach Learn Res, vol. 10, no. 66-71, p. 13, 2009.
[14] W.-L. Chao, “Dimensionality reduction,” Graduate Institute of Communication Engineering, National Taiwan University, Tech. Rep, 2011.
[15] M. D. Patil and S. S. Sane, “Dimension reduction: A review,” International Journal of Computer Applications, vol. 92, no. 16, pp. 23–29, 2014.
[16] G. Sasikala, R. Kowsalya, and M. Punithavalli, “A comparative study of dimension reduction techniques for content-based image retrieval,” The Int. J. of Multimedia & Its Applications, vol. 2, no. 3, 2010.
[17] Y. Mao, K. Balasubramanian, and G. Lebanon, “Dimensionality reduction for text using domain knowledge,” in Proceedings of the 23rd International Conference on Computational Linguistics: Posters, pp. 801–809, Association for Computational Linguistics, 2010.
[18] H. Kim, P. Howland, and H. Park, “Dimension reduction in text classification with support vector machines,” Journal of Machine Learning Research, vol. 6, no. Jan, pp. 37–53, 2005.
[19] M. Shafiei, S. Wang, R. Zhang, E. Milios, B. Tang, J. Tougas, and R. Spiteri, “Document representation and dimension reduction for text clustering,” in 2007 IEEE 23rd International Conference on Data Engineering Workshop, pp. 770–779, IEEE, 2007.
[20] C. Ding, X. He, H. Zha, and H. D. Simon, “Adaptive dimension reduction for clustering high dimensional data,” in 2002 IEEE International Conference on Data Mining, 2002. Proceedings., pp. 147–154, IEEE, 2002.
[21] W. Zhao and S. Du, “Spectral–spatial feature extraction for hyperspectral image classification: A dimension reduction and deep learning approach,” IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 8, pp. 4544–4554, 2016.
[22] Y. Wang, H. Yao, and S. Zhao, “Auto-encoder based dimensionality reduction,” Neurocomputing, vol. 184, pp. 232–242, 2016.
[23] M. Prawira-Atmaja, H. Khomaini, H. Maulana, S. Harianto, D. Rohdiana, et al., “Changes in chlorophyll and polyphenols content in camellia sinensis var. sinensis at different stage of leaf maturity,” in IOP Conference Series: Earth and Environmental Science, vol. 131, p. 012010, IOP Publishing, 2018.
[24] B. Sriyadi and H. Khomaeni, “Klon teh sinensis unggul gmbs 1, gmbs 2, gmbs 3, gmbs 4, dan gmbs 5,” in Prosiding Seminar Nasional Pertemuan Teknis Teh Nasional: Teknologi Terkini Untuk Mendukung Sustainable Tea, pp. 7–24, 2009.

