
Computers and Electronics in Agriculture 188 (2021) 106359


Original papers

Oil palm fresh fruit bunch ripeness classification on mobile devices using
deep learning approaches
Suharjito b,*, Gregorius Natanael Elwirehardja a, Jonathan Sebastian Prayoga a

a Computer Science Department, BINUS Graduate Program – Master of Computer Science, Bina Nusantara University, Jakarta 10480, Indonesia
b Computer Science Department, BINUS Online Learning, Bina Nusantara University, Jakarta 11480, Indonesia

* Corresponding author. E-mail address: suharjito@binus.edu (Suharjito).

https://doi.org/10.1016/j.compag.2021.106359
Received 14 March 2021; Received in revised form 15 June 2021; Accepted 26 July 2021; Available online 9 August 2021
0168-1699/© 2021 Elsevier B.V. All rights reserved.

ARTICLE INFO

Keywords:
Oil palm
Transfer learning
Data augmentation
Deep learning
Mobile devices

ABSTRACT

The implementations of deep learning combined with other methods such as transfer learning and data augmentation in oil palm fresh fruit bunch (FFB) ripeness classification have been researched throughout the years. However, most of these methods require devices with high computational resources and therefore cannot be implemented in mobile applications. To overcome this problem, this research focuses on creating a mobile application to classify the ripeness levels of oil palm FFB using lightweight Convolutional Neural Networks (CNN). We implemented ImageNet transfer learning on 4 lightweight CNN models with a novel data augmentation method named "9-angle crop", and further optimized the models using post-training quantization. Transfer learning with 3 unfrozen convolution blocks and 9-angle crop successfully increased the classification accuracy of MobileNetV1, and when compared to the other lightweight models, EfficientNetB0 performed best with 0.898 test accuracy on Keras. Float16 quantization also proved to be the most suitable post-training quantization method for this model, halving the size of EfficientNetB0 with the least increase in image classification time and an accuracy drop of only 0.005 after the model was converted to a TensorFlow Lite interpreter. In conclusion, the best model created in this research is EfficientNetB0, trained with the combination of transfer learning, 9-angle crop, and float16 quantization, which enabled it to achieve an overall test accuracy of 0.893 on TensorFlow Lite with 96 ms classification time per image, far surpassing the other 3 compared models; the second-best model was MobileNetV1 with 0.811 accuracy. The model achieved similar results when implemented in an Android application classifying oil palm FFB images obtained through live camera input.

1. Introduction

Oil palm is considered a highly influential plant in the field of agriculture, with a continually increasing production rate since 2012 (Shahbandeh, 2021). Countries such as Indonesia, Malaysia, and Guatemala have exported large quantities of oil palm fruits throughout the world (Index Mundi, 2021). This huge production rate, along with a wide variety of derivative products from oil palm plants, indicates that oil palm is a highly prominent international commodity. Demand may continue to rise, and oil palm farmers will require methods to facilitate more efficient production of oil palm Fresh Fruit Bunches (FFB), as one of the current existing problems is the usage of manual ripeness sorting, which can be considered time-consuming and less efficient. Farmers may also have limited access to reliable classification systems. Studies have been conducted to address this challenge, including by simplifying the ripeness grading process using computer vision.

In oil palm fruit ripeness classification, various methods including image processing, machine learning, and deep learning have been implemented to achieve high classification accuracy using image data. Some implementations include the usage of image segmentation with the Canny edge detection algorithm in classifying oil palm fruits (Septiarini et al., 2020). In 2014, machine learning using a Multi-Layer Perceptron (MLP) was used to classify images of oil palm FFB captured using spectral cameras (Bensaeed et al., 2014). Despite the high accuracy, this method requires the usage of a spectral camera to capture individual oil palm images, as well as various preprocessing processes (background removal, pixel discrimination, and noise reduction).
Other studies involve the usage of MLP with Principal Component Analysis (PCA) and Stepwise Discriminant Analysis (SDA) to classify the ripeness level of oil palm FFB based on hue values (Fadilah & Mohamad-Saleh, 2014), and a Support Vector Machine (SVM) using RGB and gray color features (Septiarini et al., 2019).

However, some drawbacks exist in the usage of machine learning. One such drawback is the need for a feature extraction process outside of the model, which means that researchers need to identify the important classifying features first and implement a program to extract these features (Dargan et al., 2019). This method would consume a lot of time in selecting influential features of the oil palm fruits. As classifying oil palm FFB would be challenging due to its complex features, such as color gradations among the fruits, the thorns, and the shape of the FFB, deep learning would be more efficient in terms of implementation. Ibrahim et al. have also compared the performance of deep learning and machine learning in classifying oil palm FFB, in which the deep learning method managed to outperform the machine learning methods (Ibrahim et al., 2018). Another shortcoming in previous studies on this topic is the requirement of high-tech devices to support the computer vision methods, such as a spectral camera or other input devices for specific data. Such methods may prove less reliable, as these devices may not always be available to oil palm farmers, and an external feature extraction program may take more time to produce outputs. Therefore, we narrowed the scope of our proposed method to simple, ubiquitous mobile devices which take more general forms of real-time input, such as images from a camera, as these devices are not only easy to use and easy to find, but also capable of executing computer vision based tasks. An example of such devices is the smartphone.

Unlike machine learning, deep learning utilizes the concept of representation learning, which enables the models to use raw input data in various tasks, including classification. This means that deep learning models extract input features inside the model itself to form a high-level abstraction of the data. Convolution and pooling are used to extract these features, and the output is fed into neural networks. This architecture is called a Convolutional Neural Network (CNN) (LeCun et al., 2015). For computer vision-based classification tasks on mobile devices, this method is more suitable for its simplicity, as it does not necessarily require data preprocessing. Previous studies with CNN managed to produce practical classification models to be used in mobile applications by taking real-time inputs for the model to present real-time outputs, such as in classifying freshwater fishes (Suharto et al., 2020) and plant diseases (Syamsuri & Kusuma, 2019).

Deep learning has also been used several times in past studies on oil palm FFB ripeness classification. Several CNN architectures have been implemented and tested, such as AlexNet (Ibrahim et al., 2018), ResNet-152 (Harsawardana et al., 2020), and DenseNet (Herman et al., 2020). Ibrahim et al. managed to achieve 100% accuracy using AlexNet on their test dataset. In 2019, Harsawardana et al. implemented ResNet-152 on a smart crane grabber for classifying palm oil ripeness levels with a 0.7134 F1 score. In 2020, Herman et al. applied residual attention on DenseNet and obtained a 0.6929 F1 score in palm oil ripeness level classification. In these past experiments, transfer learning and data augmentation were used and managed to improve classification results. However, these models cannot be considered lightweight due to their huge numbers of parameters. In other words, the used models are potentially unable to be used on simple devices with low computational resources, such as smartphones. In order to solve this problem, our research focused on training lightweight CNNs to classify ripeness levels of oil palm FFB.

One notable lightweight CNN architecture called MobileNet has been used in various research fields involving deep learning, as it requires light computational resources while producing excellent results. It was first introduced in 2017 as a lightweight CNN for mobile applications with approximately 4.2 million parameters (Howard et al., 2017); this MobileNet architecture would be known as MobileNetV1. In 2018, a newer version of MobileNet was introduced as MobileNetV2. This version of MobileNet implemented the usage of bottleneck blocks, consisting of convolution with ReLU6 (Rectified Linear Unit 6), depthwise convolution, and 1x1 convolution without non-linearity, with fewer (approximately 3.4 million) parameters and a lower computational cost (Sandler et al., 2018). These 2 versions of MobileNet have been widely implemented in various fields, such as recognizing American Sign Language (ASL) (Rathi, 2018) and handprint recognition systems (Michele et al., 2019). However, they are yet to be tested in the field of agriculture. Other lightweight CNN architectures such as EfficientNetB0 (Tan & Le, 2019) and NASNet (Neural Architecture Search Network) Mobile (Tan et al., 2019) can also be studied for this field, as they possess similar numbers of parameters to those of MobileNet and are comparable in terms of performance (Saxen et al., 2019).

One shortcoming of deep learning methods is that they require a vast amount of training data in order to achieve high accuracy and avoid overfitting (a condition where a neural network performs too well on a particular set of data, and poorly on other sets). To deal with this weakness, transfer learning can be utilized, as it has been proven capable of enhancing the performance of CNNs in cases where large amounts of data are unavailable (Hentschel et al., 2016; Alzubaidi et al., 2020). Transfer learning is done by copying a base knowledge, or previously trained weights of a neural network in the case of deep learning, to the used model's weights and freezing several of the layers, or in other words, rendering those layers' weights untrainable (Torrey and Shavlik, 2010). In particular, ImageNet has been used in transfer learning to transfer knowledge of common objects (Deng et al., 2009). This way, a CNN would have basic knowledge about object appearance. MobileNet transfer learning with ImageNet has proven to be highly accurate in various studies, including in classifying brain Magnetic Resonance Imaging (MRI) (Lu et al., 2020), welding defects (Pan et al., 2020), and even skin cancer (Velasco et al., 2019).

Another method that may be used to avoid overfitting is data augmentation. It is a technique to expand the size of the used datasets and improve their quality by creating additional data from existing data. There are various methods of augmentation, such as geometric transformations (translation, rotation, flipping, etc.), kernel filters, random erasing, generative adversarial networks (GAN), and many more. Among these methods, cropping has proven capable of achieving distinctively higher accuracy than other simple geometric transformations (Shorten & Khoshgoftaar, 2019). In previous studies, cropping techniques such as Ten Crop were used to improve CNN performance in classifying oil palm FFB ripeness levels and achieved high accuracy (Harsawardana et al., 2020; Herman et al., 2020).

This paper focuses on creating a lightweight CNN to classify oil palm FFB by their ripeness levels which is usable on devices with low computational resources, particularly smartphones. In this paper, a novel data augmentation method which we named "9-angle crop" is proposed to be implemented with transfer learning on lightweight CNNs. The CNNs are then compared to discover the best model in terms of accuracy and speed in Keras, and further optimized using post-training quantization before being compared in TensorFlow Lite. Afterwards, the best models were implemented in a mobile application. All in all, the main contributions of this research include:

– testing the effect of transfer learning and the number of unfrozen blocks on oil palm FFB classification using lightweight CNNs,
– introducing a novel image augmentation method named "9-angle crop" and testing its influence in deep learning for oil palm FFB classification,
– comparing the effects of post-training quantization methods, used to optimize the models, on the models' accuracy, speed, and size,
– comparing the performances of 4 lightweight CNN architectures with the proposed method to determine the best model to be deployed in a mobile application,
– creating a mobile application to classify ripeness levels of oil palm FFB.

This proposed method proved capable of improving classification accuracy while maintaining the fast processing speed of the tested CNNs. Therefore, it may be studied further in other related research.

2. Materials and methods

2.1. Dataset

The dataset consisted of 653 images taken from an oil palm mill in Central Kalimantan during the grading process. Samples of images from each category are presented in Fig. 1. The images were taken from 6 categories of oil palm FFB, which are: unripe (Fig. 1(A)), under-ripe (Fig. 1(B)), ripe (Fig. 1(C)), over-ripe (Fig. 1(D)), abnormal (Fig. 1(E)), and empty fruit bunch (EFB) (Fig. 1(F)), using a smartphone digital camera in various outdoor locations as background during sunny days. Specifications of the smartphone and its camera settings can be viewed in Table 1 and Table 2 respectively. The images were saved in JPG (Joint Photographic Experts Group) format with 3120 × 4160 pixel resolution and labelled manually at the mill. There were 110 images for each category, with the exception of the abnormal class, which had 103 images. The images were split with a ratio of 5:2:3 into training, validation, and testing sets respectively.

Fig. 1. Examples of the 6 oil palm FFB categories: (A) unripe, (B) under-ripe, (C) ripe, (D) over-ripe, (E) abnormal, (F) empty fruit bunch.

Table 1
Specifications of the mobile device used.

Device | Specification
OS | Android 9.0 (Pie)
CPU | Octa-core (4 × 2.3 GHz Cortex-A53 & 4 × 1.8 GHz Cortex-A53)
Chipset | Mediatek MT6765 Helio P35 (12 nm)
Camera | V1902 triple camera

Table 2
Settings of the mobile device's camera used to capture the images.

Camera Setting | Value
Camera | Main camera
F-stop | f/2.2
Focal length | 4 mm
Flash mode | None
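The 5:2:3 split described above can be sketched as follows; this is a minimal illustration assuming the images are organized in one folder per class, and all paths and helper names here are ours, not taken from the authors' code.

```python
# A minimal sketch of the 5:2:3 train/validation/test split, assuming one
# folder per class; paths and helper names are illustrative assumptions.
import random
from pathlib import Path

def split_dataset(root, ratios=(0.5, 0.2, 0.3), seed=42):
    """Return {class_name: (train_files, val_files, test_files)}."""
    rng = random.Random(seed)
    splits = {}
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        files = sorted(class_dir.glob("*.jpg"))
        rng.shuffle(files)
        n_train = int(len(files) * ratios[0])
        n_val = int(len(files) * ratios[1])
        splits[class_dir.name] = (
            files[:n_train],                    # training set (50%)
            files[n_train:n_train + n_val],     # validation set (20%)
            files[n_train + n_val:],            # testing set (30%)
        )
    return splits
```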


2.2. The proposed method

The proposed method consisted of 2 main steps and 1 implementation step. The 2 main steps consisted of data pre-processing and deep learning, while the implementation step involved the development of an Android application for the created CNN model to be used in real life. An illustration of the proposed method can be seen in Fig. 2, which describes the whole process of the proposed method.

Fig. 2. Overview of the proposed method.

2.2.1. Pre-processing

Overall, there are 4 steps involved in pre-processing: (1) localization and square cropping, (2) Gaussian blur, (3) 9-angle crop, and (4) resizing. Localization was used to ensure that the features in the images would mostly belong to the fruits. It was done manually by cropping the images in a 1:1 aspect ratio and excluding the background as much as possible. The next process was image filtering with Gaussian blur, which was applied to the whole dataset. Blurring was used to reduce image noise or degradations from image acquisition, transmission, or storage; blur is one of the most common image degradation phenomena from cameras. As the objective of the research is to build a mobile application for image classification, and the mobile devices used may not have the best camera quality, the CNN needed to train with blurred images (Dodge & Karam, 2016).

Blurring was also used to soften the images before they are resized (Flusser et al., 2016). Gaussian blur is an image enhancing technique based on the Gaussian distribution. The formula of the two-dimensional Gaussian distribution is as follows:

G(x, y) = (1 / (2πσ²)) · e^(−(x² + y²) / (2σ²))    (1)

where x and y are the horizontal and vertical axis distances from the origin pixel and σ is the standard deviation of the distribution, or the intensity of the blurring among those pixels. This function is applied to a convolution kernel which is then convolved over the image. Each new pixel value is then a weighted average (based on the Gaussian distribution) of the pixel's neighborhood. This filter is capable of reducing noise while keeping the image sharpness better than other filters. However, extreme blur degradation is unlikely in practice, and blurring too much would reduce the accuracy of the model. As such, the Gaussian filter used was a 5 × 5 kernel, with the value of σ calculated using formula (2):

σ = 0.3 · ((k − 1) · 0.5 − 1) + 0.8    (2)

where k is the kernel size, by which we get σ = 0.95.

The third step, 9-angle crop, was used as a data augmentation technique in order to handle the limited amount of data. This process was only done on the training and validation sets. 9-angle crop is a modification of the five-crop data augmentation technique. The augmentation process of five-crop involves cropping a rectangular region from each corner of the image and from the middle of the image (Kairanbay et al., 2016). Meanwhile, 9-angle crop additionally crops the middle region of each side (the region between adjacent corners) of the image. A total of 9 crops and the original image would be used for each image in the training dataset; hence, each single image was augmented into 10 images. This novel data augmentation technique was named "9-angle crop". An illustration of the cropping process can be viewed in Fig. 3.

Fig. 3. Illustrations of: (A) five-crop and (B) 9-angle crop.

In order to compare the performance of 9-angle crop, five-crop was also used in this experiment, with an additional rotation process to create more images: the images were rotated by 20, 40, 60, and 80 degrees and added to the dataset augmented with five-crop. Both 9-angle crop and five-crop with rotation were selected, as cropping itself has proven to be the best geometric transformation augmentation technique (Shorten & Khoshgoftaar, 2019). Afterwards, all of the processed images were resized to 224 × 224.
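To make the automated pre-processing steps concrete, the following is a minimal sketch assuming OpenCV is used; the exact crop size of 9-angle crop is not stated in the paper, so the 75% fraction below is an illustrative assumption.

```python
# Minimal sketch of the automated pre-processing steps, assuming OpenCV;
# CROP_FRACTION is an illustrative assumption (the paper does not state
# the exact crop size used by 9-angle crop).
import cv2

CROP_FRACTION = 0.75  # assumed fraction of the square image kept per crop

def preprocess(image):
    """Blur, apply 9-angle crop, and resize to the 224x224 network input."""
    # 5x5 Gaussian kernel; sigma=0 makes OpenCV derive sigma from the kernel
    # size via formula (2): sigma = 0.3*((k - 1)*0.5 - 1) + 0.8.
    blurred = cv2.GaussianBlur(image, (5, 5), 0)

    h, w = blurred.shape[:2]
    ch, cw = int(h * CROP_FRACTION), int(w * CROP_FRACTION)
    ys = [0, (h - ch) // 2, h - ch]      # top, middle, bottom anchors
    xs = [0, (w - cw) // 2, w - cw]      # left, middle, right anchors
    # The 3x3 grid yields 4 corners + 4 side middles + center: the 9 angles.
    crops = [blurred[y:y + ch, x:x + cw] for y in ys for x in xs]
    crops.append(blurred)                # keep the original as the 10th image

    return [cv2.resize(c, (224, 224)) for c in crops]
```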

2.2.2. Deep learning

2.2.2.1. Architecture of the proposed model. The proposed model consisted of 3 parts: (1) feature extraction, (2) Global Average Pooling (GAP), and (3) a fully connected layer. For the feature extraction, 4 different CNN models from Keras were deployed and compared: (1) MobileNetV1, (2) MobileNetV2, (3) EfficientNetB0, and (4) NASNet Mobile. These models were used for feature extraction, and their outputs were passed through a Global Average Pooling (GAP) layer and a fully connected classifier. We replaced the original classifier of the Keras models with a custom classifier, as the original models were trained to classify images from the ImageNet dataset with 1 Dense layer. The modified classifier consisted of 2 hidden Dense layers, 1 output Dense layer, and a Dropout layer between the 2 hidden layers. The hidden layers contained 432 and 216 neurons respectively, while the output layer contained 6 neurons representing the 6 ripeness levels. The Dropout layer has a rate of 0.5. The architecture of the proposed deep learning model is presented in Fig. 4.

Fig. 4. Illustration of the CNN used in this research.

The input layer was set to 224 × 224 × 3 for all models, in reference to the original MobileNet and EfficientNet architectures (Howard et al., 2017; Sandler et al., 2018; Tan & Le, 2019). For the feature extraction process, NASNet Mobile and EfficientNetB0 were chosen because their numbers of parameters are much closer to those of MobileNet than other CNNs'. A comparison of the 4 models can be viewed in Table 3. All of the chosen lightweight models have less than 5 million parameters; in other words, they would not take too much inference time and memory usage on the device (Reddy et al., 2018). As such, MobileNetV1 may be faster than NASNet Mobile and EfficientNetB0, while MobileNetV2 may be even faster. In this research, this hypothesis will also be tested.

Table 3
Comparison of parameters for the 4 chosen CNNs.

Model Architecture | Total Parameters | Trainable Parameters | Untrainable Parameters
MobileNetV1 | 3,890,594 | 3,868,706 | 21,888
MobileNetV2 | 3,055,906 | 3,021,794 | 34,112
NASNet Mobile | 4,948,470 | 4,911,732 | 36,738
EfficientNetB0 | 4,847,493 | 4,805,470 | 42,023

Afterwards, the extracted features are sent as input of size 1024 to a GAP layer. GAP was used in order to reduce the number of parameters, capture possible discriminative features, and prevent overfitting (Zhou et al., 2016). The pooling returns 1024-sized parameters to be classified by the fully connected layer. In the fully connected layer, 2 hidden layers were deployed in order to enable the CNN to not only represent linear decisions, but also estimate functions with continuous mapping from one finite space to another (Heaton, 2008). The usage of 2 hidden layers has been proven to work better than 1, but too many hidden layers may result in the CNN being too complex (Thomas et al., 2017). A Dropout layer between the 2 hidden layers was implemented in order to help the network prevent overfitting (Srivastava et al., 2014). The output of the proposed deep learning model is a list of 6 scores, each representing the confidence score of the input image belonging to a category. The index of the maximum confidence score would be used to label oil palm FFB images in the mobile application.
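As an illustration, the model described above can be assembled with the Keras applications API as follows; MobileNetV1 is used as the example backbone, and the ReLU activations of the hidden layers are an assumption, as the text does not name the activation function.

```python
# Sketch of the proposed model in Keras, with MobileNetV1 as the example
# backbone; the hidden-layer ReLU activations are an assumption.
import tensorflow as tf

def build_model(num_classes=6):
    base = tf.keras.applications.MobileNet(
        input_shape=(224, 224, 3),   # input size used for all 4 backbones
        include_top=False,           # drop the original ImageNet classifier
        weights="imagenet",          # transfer-learning initialization
    )
    x = tf.keras.layers.GlobalAveragePooling2D()(base.output)  # 1024 features
    x = tf.keras.layers.Dense(432, activation="relu")(x)       # hidden layer 1
    x = tf.keras.layers.Dropout(0.5)(x)                        # dropout rate 0.5
    x = tf.keras.layers.Dense(216, activation="relu")(x)       # hidden layer 2
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(base.input, outputs), base
```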
2.2.2.2. Model training. The training of the CNN was done using Python 3.7.3 with TensorFlow library version 2.3 on a Personal Computer (PC). The specifications of the PC are displayed in Table 4.

Table 4
Specifications of the used PC.

PC | Specification
OS | Windows 10 64-bit
CPU | Intel Core i3-8100 3.60 GHz
RAM | 8192 MB
GPU | Intel(R) UHD Graphics 630

For the hyperparameters, the Stochastic Gradient Descent (SGD) optimizer with a 0.0001 learning rate and 0.9 momentum was used. SGD was selected as the optimizer since it was known to work better for similar data between classes (Herman et al., 2020). The learning rate was set to 0.0001 to help the model find better solutions. As a smaller learning rate was set, a larger number of epochs, specifically 150, was used in order to enable the model to reach a state of convergence. The hyperparameter configuration can be viewed in Table 5.

Table 5
Hyperparameter configuration of the model training process.

Hyperparameter | Value
Learning Rate (LR) | 0.0001
Momentum | 0.9
Optimizer | SGD
Batch Size | 8
Epoch | 150
Momentum was used in order to assist the trained model in reaching convergence (Qian, 1999). It was done by modifying the back-propagation formula of SGD. The original formula of the back-propagation weight update in SGD is as follows:

w_t = w_(t−1) − α · ∂L/∂w_(t−1)    (3)

where w_t is the new updated weight and w_(t−1) is the current weight before the update. α is a positive decimal number known as the learning rate, while L is the loss value, which indicates how incorrect the model's prediction is. The usage of momentum requires a modification of formula (3), and the new formula is as follows:

w_t = w_(t−1) − α · Δw_t    (4)

Δw_t = ρ · Δw_(t−1) − (1 − ρ) · ∂L/∂w_(t−1)    (5)

where ρ is the momentum value. From formula (5), it can be seen that the previous gradient (Δw_(t−1)) has an impact on the weight update. This helps the model avoid being stuck in a locally best solution and converge faster.
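Under the configuration of Table 5, the training setup can be sketched as follows, continuing the build_model() sketch above; the categorical cross-entropy loss and the names of the data pipelines are assumptions, as the paper does not spell them out.

```python
# Sketch of the Table 5 training configuration; train_ds and val_ds are
# assumed tf.data pipelines already batched with the batch size of 8, and
# the cross-entropy loss is an assumption.
import tensorflow as tf

model, base = build_model()
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.0001, momentum=0.9),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
history = model.fit(train_ds, validation_data=val_ds, epochs=150)
```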
In addition to Dropout and 9-angle crop, regularization was also implemented to reduce the possibility of overfitting. As a model's complexity increases, regularization penalizes the model in order to focus on the most relevant features and avoid overfitting. One of the most common generalization methods is L1 regularization, also known as Lasso Regression, which is capable of filtering out redundant and irrelevant features. In this method, the weights of less important features are shrunk towards 0, causing only the relevant features to be selected (Demir-Kavuk et al., 2011). The penalty is added to the loss function, which modifies the loss function from formula (6) to formula (7):

Loss = Σ_(i=0)^N (y_i − Σ_(j=0)^M x_ij · w_j)²    (6)

Loss = Σ_(i=0)^N (y_i − Σ_(j=0)^M x_ij · w_j)² + λ · Σ_(j=0)^M |w_j|    (7)

where λ is the regularization factor (Taunk, 2020).
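In Keras, this penalty can be attached to a layer's weights as sketched below; the paper does not report the value of λ, so the factor used here is purely illustrative.

```python
# Attaching the L1 penalty of formula (7) to a Dense layer's weights;
# the regularization factor (1e-4) is an illustrative assumption.
from tensorflow.keras import layers, regularizers

hidden = layers.Dense(432, activation="relu",
                      kernel_regularizer=regularizers.l1(1e-4))
```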
We also utilized early stopping to reduce the possibility of overfitting. This method means that the training process is halted when a halting condition occurs before the maximum number of epochs. When a neural network is trained, a phenomenon may occur where the training loss continually decreases while the validation loss suddenly increases. Such a case is an example of overfitting, and it is the reason behind the usage of the validation dataset, which is to monitor the training process and identify the signs of overfitting. In some early stopping methods, validation loss is used as the indicator to halt the training process: training ends if the validation loss does not decrease after a specified number of epochs. If this criterion occurs, the weights of the model from the epoch with the lowest validation loss are restored to the model (Prechelt, 1998).

Aside from validation loss, different combinations of train accuracy, train loss, or validation accuracy can also be used as the stopping criteria. In order to ensure that only the best models on both datasets are used, we implemented early stopping with both training and validation loss as the indicators, with a patience of 10 epochs. In other words, suppose that we have trained a model for n epochs and both losses do not improve after another 10 epochs; the training process would then be stopped and the model's weights restored to its weights at the n-th epoch. This way, neither too many computational resources nor training time would be wasted when the model has converged before the final epoch.
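One way to realize this double-criterion early stopping in Keras is with two standard EarlyStopping callbacks, one per monitored loss; note that this only approximates the authors' rule, since training stops as soon as either criterion fires rather than when both do.

```python
# Early stopping on both training and validation loss with a patience of 10
# epochs; weights are restored to the best observed epoch. Two independent
# callbacks stop training when EITHER criterion fires, which approximates
# the "both losses" condition described in the text.
import tensorflow as tf

callbacks = [
    tf.keras.callbacks.EarlyStopping(monitor="loss", patience=10,
                                     restore_best_weights=True),
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                     restore_best_weights=True),
]
history = model.fit(train_ds, validation_data=val_ds, epochs=150,
                    callbacks=callbacks)
```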
2.2.2.3. Model optimization. As the model would be used on mobile devices, optimization methods such as pruning, quantization, and knowledge distillation had to be considered due to the limited computation resources of such devices. Among these optimization methods, quantization has been widely used and is known to be hardware-friendly (Chen et al., 2020). By default, Keras models in TensorFlow use 32-bit floating point numbers as parameters or weights. Quantization is done by converting the model's parameters to lower bit-depth representations such as float16 or int8, which results in the model having a smaller size and less memory usage at the cost of a drop in its accuracy. Generally, there are 2 usable methods for quantizing models: (1) post-training quantization and (2) quantization-aware training. In some cases, post-training quantization may be preferable to quantization-aware training, because the latter requires model annotation and another training process on the converted model.

The techniques that can be used for post-training quantization are float16 quantization and dynamic range quantization. Float16 quantization is capable of halving the model's size, while dynamic range quantization compresses the model to a quarter of its original size, albeit disabling GPU acceleration for the model (TensorFlow, n.d.). This means that when models converted with float16 quantization are tested on a PC with GPU acceleration, they would be faster than models converted with dynamic range quantization, although the same may not hold on mobile devices. In this experiment, we implemented and compared the results of float16 quantization and dynamic range quantization, as quantization-aware training consumes more time in training the quantization-aware models.
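Both post-training quantization variants map directly onto the TFLiteConverter API of TensorFlow 2.x, as sketched below; the output filename is an illustrative assumption.

```python
# The two post-training quantization variants compared in this work.
import tensorflow as tf

# Dynamic range quantization: weights converted to 8-bit integers.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_dynamic = converter.convert()

# Float16 quantization: weights stored as 16-bit floats, roughly halving
# the model size while keeping GPU delegation possible.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
tflite_float16 = converter.convert()

with open("model_f16.tflite", "wb") as f:   # assumed output path
    f.write(tflite_float16)
```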
2.2.3. Mobile application development

The trained models were converted to TensorFlow Lite models, called interpreters, in the format of ".tflite" files to enable them to be used on mobile devices. The mobile application was developed for Android OS using the Kotlin programming language with the TensorFlow Lite library, to enable the usage of TensorFlow Lite models on mobile phones. Additionally, RenderScript was used to denoise input images. These 2 libraries were implemented in order to enable the application to take live camera image feeds, preprocess them, and use the interpreter to classify them.

The first step in developing the application was preparing the models and labels. The TensorFlow Lite interpreters were first loaded in the application. The Keras base model's output was a list of 6 confidence scores, each implying how likely the image belongs to each class; the same output is produced by the interpreter when it takes an image as input. The application takes the index of the highest confidence score and displays the label with the same index. In order to process the images more efficiently, the images were converted to byte buffers before being fed into the interpreters.

The next step was to construct the process flow. As the interpreter requires some time to classify the inputs, a selection process needed to be applied to the image input stream from the camera. This selection was done using a flag variable: if an image is still being processed, the captured frames from the camera input stream are skipped and not used as input for the interpreter, although they are still displayed in the application. By default, the images captured by the camera are represented by YUV channel pixel values; therefore, images that pass through the selection process are converted to RGB values. Denoising is then applied to these images using the ScriptIntrinsicBlur module from RenderScript, which is similar to Gaussian blurring. Afterwards, the image is resized to 224 × 224 using nearest-neighbor interpolation to meet the interpreter's input size, and converted to byte buffers for the interpreter to classify. An overview of the mobile application's algorithm can be viewed in Fig. 5.

Fig. 5. Algorithm of the developed mobile application in classifying oil palm FFB images.
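Although the application itself is written in Kotlin, a converted interpreter can be sanity-checked on a desktop in Python before deployment, mirroring the byte-buffer flow described above; the preprocessed input and label order below are assumptions for illustration.

```python
# Desktop sanity check of a converted ".tflite" interpreter; `preprocessed`
# is an assumed 224x224x3 RGB array prepared like the training images.
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model_f16.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

batch = preprocessed[np.newaxis].astype(inp["dtype"])  # add batch dimension
interpreter.set_tensor(inp["index"], batch)
interpreter.invoke()
scores = interpreter.get_tensor(out["index"])[0]       # the 6 confidence scores

labels = ["unripe", "under-ripe", "ripe", "over-ripe", "abnormal", "EFB"]
print(labels[int(np.argmax(scores))])                  # highest-confidence label
```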

3. Results and discussions

3.1. Evaluation method

The performance of each model was compared on 2 specific metrics: (1) classification accuracy and (2) image classification time, although the former was more influential in selecting the best model. The classification time metric would be observed in the final step of the evaluation. Overall, there were 4 steps of experiments and evaluation to test the impact of transfer learning, data augmentation, and quantization for oil palm FFB classification using deep learning. Based on these data, the best model in this research was determined. Detailed information on the experiment process is written in Table 6.

Table 6
Experiment step details.

Step | Goal | Used Model | Used Dataset
1 | Testing the effect of transfer learning and how many convolution blocks should be left unfrozen | MobileNetV1 | Localization + Gaussian Blur
2 | Testing the effect of data augmentation and the performance of 9-angle crop | MobileNetV1 | (1) Localization + Gaussian Blur; (2) Localization + Gaussian Blur + five-crop with rotation; (3) Localization + Gaussian Blur + 9-angle crop
3 | Testing the performance of 4 different CNNs to determine the best model in Keras | MobileNetV1, MobileNetV2, EfficientNetB0, NASNet Mobile | (1) Localization + Gaussian Blur + 9-angle crop; (2) Localization + Gaussian Blur
4 | Testing the performance of 4 different quantized TFLite models to determine the best model in TensorFlow Lite | MobileNetV1, MobileNetV2, EfficientNetB0, NASNet Mobile | (1) Localization + Gaussian Blur + 9-angle crop; (2) Localization + Gaussian Blur

The first step of the experiment involved 6 models of MobileNetV1 with different numbers of unfrozen convolution blocks after transfer learning with ImageNet. This CNN architecture was chosen as the base model due to its frequent usage in studies involving CNNs in mobile applications, such as classification of skin diseases (Velasco et al., 2019), freshwater fishes (Suharto et al., 2020), welding defects (Pan et al., 2020), and breast cancer (Ansar et al., 2020). This step was done in order to analyze the effect of transfer learning and how the number of unfrozen blocks affects a model's performance. The MobileNetV1 model used in this research consisted of 13 convolution blocks, and the experiment was conducted on 6 models of this type, with 13, 12, 9, 6, 3, and 0 unfrozen convolution blocks respectively. For most of the models, some of the blocks were left unfrozen in order to enable the model to adapt to the used dataset in extracting features and not be solely focused on extracting features from the ImageNet dataset. As such, the model with 0 unfrozen convolution blocks was also used in order to evaluate the validity of this theory. All of the models were tested on a test dataset without data augmentation.

The results of the first step, specifically the number of convolution blocks that should be left unfrozen, were used in the next steps. In the second step of the experiment, MobileNetV1 was also used as the base model for evaluating the effects of the image augmentation methods. In this step, the novel 9-angle crop was tested using the best model from the first step and compared to another augmentation method, namely five-crop with additional rotation augmentation. Afterwards, the best method would be used to augment the dataset in the third step of the experiment, which compares the results of MobileNetV1 with MobileNetV2, NASNet Mobile, and EfficientNetB0 for this problem. The fourth step of the experiment was quantizing the models and comparing the classification results of the TensorFlow Lite models using no quantization, dynamic range post-training quantization, and float16 post-training quantization.
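As a concrete reference for the first experiment step, the block freezing can be sketched as follows; the sketch keys off the layer-name prefixes of the Keras MobileNetV1 implementation ("conv_dw_<n>"), so the boundary name is an assumption tied to that implementation.

```python
# Freeze all MobileNetV1 layers before a chosen depthwise-separable block,
# leaving the remaining blocks (and the custom classifier) trainable.
def freeze_until_block(base, first_unfrozen_block):
    trainable = False
    for layer in base.layers:
        if layer.name.startswith(f"conv_dw_{first_unfrozen_block}"):
            trainable = True          # this and all later layers stay trainable
        layer.trainable = trainable

freeze_until_block(base, 11)          # 10 frozen blocks, 3 unfrozen blocks
```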

Accuracy and average classification time were used as the evaluation metrics for the 4 experiment steps. The accuracy score was calculated through the following formula:

Accuracy = (True positives + True negatives) / Number of test data    (8)

The classification time metric uses the average time required for the model to predict an image's class. This was measured by using a timer at the start and at the end of the testing process, and the following formula was used to calculate the classification time:

Classification time = Test evaluation time / (Number of steps · Batch size)    (9)

as the test dataset was used in a batched Dataset format in Python. In this format, the images are grouped according to the batch size hyperparameter. In this experiment, a batch consisted of 8 images.
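Formulas (8) and (9) translate into a simple measurement harness such as the sketch below; the timing code is ours rather than the authors' exact harness, and test_ds is an assumed batched test pipeline.

```python
# Measuring test accuracy (formula (8)) and per-image classification time
# (formula (9)); test_ds is an assumed tf.data pipeline with batch size 8.
import time

start = time.perf_counter()
_, test_accuracy = model.evaluate(test_ds)    # accuracy via Keras metrics
test_evaluation_time = time.perf_counter() - start

number_of_steps = sum(1 for _ in test_ds)     # number of batches
classification_time = test_evaluation_time / (number_of_steps * 8)
print(f"accuracy={test_accuracy:.3f}, {classification_time * 1000:.2f} ms/image")
```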
3.2. The evaluation performance of the proposed method

The proposed method was tested on the 6 classes of oil palm FFB ripeness levels. In the first experiment, the 6 MobileNetV1 models were trained using the same hyperparameters and classifier. Meanwhile, the training dataset used was not augmented, in order to focus on observing the effect of transfer learning using ImageNet. The training graphs of the 6 models in the first experiment are presented in Fig. 6.

Fig. 6. Performance graphs during the training process of the first experiment: (A) untrained, (B) transfer learning with 1 frozen block, (C) 4 frozen blocks, (D) 7 frozen blocks, (E) 10 frozen blocks, and (F) 13 frozen blocks.

The results of the training process show that the six models, namely: (1) untrained (Fig. 6(A)), (2) transfer learning with 12 unfrozen blocks (Fig. 6(B)), (3) transfer learning with 9 unfrozen blocks (Fig. 6(C)), (4) transfer learning with 6 unfrozen blocks (Fig. 6(D)), (5) transfer learning with 3 unfrozen blocks (Fig. 6(E)), and (6) transfer learning with 0 unfrozen blocks (Fig. 6(F)), managed to reach high training and validation accuracy. Through early stopping with train and validation loss monitoring with a patience of 10, models (2) and (4) converged after approximately 50 epochs, models (1) and (3) converged after roughly 60 epochs, model (6) required at least 70 epochs, while model (5) required more than 80 epochs. However, model (6) performed noticeably differently, as it struggled to reach 90% train accuracy. On the other hand, model (4) struggled to attain 80% validation accuracy.

As models (1), (2), (3), and (5) performed similarly on the training and validation sets based on the graph visualizations, a more detailed observation needed to be done. Details of the results are written in Table 7, with average train, average validation, and test accuracy. Based on the results, it was clear that the number of frozen blocks affected the performance of the models, but freezing too few or too many blocks would result in a drop in the model's performance. This conclusion was inferred based on the performance of models (2) and (6). Model (5) with 3 unfrozen blocks was named the best model, obtaining the best training and test accuracy scores and ranking third in validation accuracy, slightly behind model (2) and model (1) with a difference of only 0.016 and 0.005 respectively. Additionally, it can be inferred from Fig. 6 that all of the models suffered from overfitting, based on the high difference between the training and validation accuracy. Moreover, the test accuracy of the 4 models was still low, as most of the models' test accuracies were between 70 and 80%. Therefore, data augmentation was tested in the second step to increase the accuracy of the models as well as reduce the difference between training and validation accuracy.

Table 7
Result of experiment step 1.

Transfer Learning | Frozen Blocks | Unfrozen Blocks | Average Train Accuracy | Average Validation Accuracy | Test Accuracy
No | 0 | 13 | 0.842 | 0.782 | 0.776
Yes | 1 | 12 | 0.852 | 0.793 | 0.755
Yes | 4 | 9 | 0.853 | 0.746 | 0.781
Yes | 7 | 6 | 0.813 | 0.737 | 0.786
Yes | 10 | 3 | 0.853 | 0.777 | 0.806
Yes | 13 | 0 | 0.765 | 0.763 | 0.75

As the results presented in Table 7 showed that transfer learning on MobileNetV1 with 3 unfrozen convolution blocks managed to outperform the others, the following experiment, which evaluated the effects of data augmentation, used the same setting: transfer learning using ImageNet with 3 unfrozen convolution blocks. From the training results shown in Fig. 7, it was clear that the usage of data augmentation could increase the validation accuracy. Models trained with the five-crop with rotation (Fig. 7(B)) and 9-angle crop (Fig. 7(C)) datasets outperformed the model trained without augmentation (Fig. 7(A)) with roughly the same number of epochs. Without data augmentation, the model only managed to obtain an average validation accuracy of 0.777 with 0.853 training accuracy.

Fig. 7. Performance graph during training process of the second experiment: (A) without data augmentation, (B) five-crop with rotation, and (C) 9 angle crop.

On the other hand, the usage of 9-angle crop managed to increase the training accuracy by 0.064 and the validation accuracy by 0.134. With a 0.002 difference between training and validation accuracy, the result of the model trained on the dataset preprocessed using 9-angle crop proved that data augmentation can be considered one of the solutions to reduce overfitting and increase validation accuracy. 9-angle crop brought about promising results, with both test and validation accuracy even higher than the other 2 methods, achieving a score of 0.811 accuracy on the test dataset. Detailed results of the second step of the experiment can be viewed in Table 8.

Table 8
Result of experiment step 2.

Dataset | Average Train Accuracy | Average Validation Accuracy | Test Accuracy
Localization + Gaussian Blur | 0.853 | 0.777 | 0.806
Localization + Gaussian Blur + five-crop with rotation | 0.905 | 0.909 | 0.796
Localization + Gaussian Blur + 9-angle crop | 0.917 | 0.915 | 0.811

Based on the results displayed in Table 8, the hypothesis of this study was proven correct, as data augmentation and transfer learning managed to improve the performance of MobileNetV1. The novel 9-angle crop method proved to perform best in training, validation, and testing. Therefore, it was used in the third step of the experiment to determine the best model to be implemented in the mobile application. The third step of the experiment used 4 different CNNs with the same dataset and 3 unfrozen convolution blocks. The training results are displayed in Fig. 8. From the graphs, MobileNetV1 (Fig. 8(A)), MobileNetV2 (Fig. 8(B)), and NASNet Mobile (Fig. 8(C)) seemed to achieve similar results. Meanwhile, EfficientNetB0 (Fig. 8(D)) performed differently, as it required more time to converge and constantly reached validation accuracy scores higher than its train accuracy; its training accuracy never once surpassed its validation accuracy. This result indicated that EfficientNetB0 did not overfit. Meanwhile, MobileNetV2 was the fastest to converge, requiring less than 50 epochs, followed by NASNet Mobile and MobileNetV1 respectively. EfficientNetB0 took the maximum number of 150 epochs in the training process. Detailed results on the test dataset, needed to determine the best model, are displayed in Table 9.

Fig. 8. Performance graphs during the training process of the third experiment: (A) MobileNetV1, (B) MobileNetV2, (C) NASNet Mobile, and (D) EfficientNetB0.

Table 9
Result of experiment step 3.

Model | Frozen Blocks | Average Train Accuracy | Average Validation Accuracy | Test Accuracy | Time per Image
MobileNetV1 (63 layers) | 10 | 0.917 | 0.915 | 0.811 | 18.24 ms
MobileNetV2 (116 layers) | 13 | 0.907 | 0.889 | 0.811 | 24.82 ms
NASNet Mobile (518 layers) | 10 | 0.87 | 0.828 | 0.796 | 42.75 ms
EfficientNetB0 (184 layers) | 13 | 0.869 | 0.916 | 0.898 | 43.84 ms

Based on the results displayed in Table 9, EfficientNetB0 managed to outperform the other models in terms of test accuracy, boasting an exceptional score of 0.898. However, it was also the slowest model, requiring an average time of 43.84 ms to classify an image. MobileNetV1 required the least time among the compared models to classify the images; as such, it was fitting to be named the most efficient model. Its accuracy was also quite high, being the second most accurate in classifying the validation and test datasets. However, these models were evaluated as Keras models. To deploy these models in mobile applications, we optimized all of them using post-training quantization and converted them to TensorFlow Lite models.

In order to determine the most suitable quantization method in this case, the fourth experiment step was conducted. We used dynamic-range and float16 quantization, the results of which were then compared together with a non-quantized model in TensorFlow Lite. The test was conducted on the same test dataset. Once again, EfficientNetB0 proved superior to the others, with 0.893 test accuracy on all 3 versions of its TensorFlow Lite models.

One distinguishable difference in this experiment step is that the EfficientNetB0 models performed faster than NASNet Mobile, although they were still two times slower than MobileNetV1 and MobileNetV2. Dynamic range quantization proved capable of compressing up to 70% of the model size, while float16 quantization compressed up to 50% of the model size. Speed is what differentiated these 2 methods, as float16 quantized models took far less time in classifying the images because float16 quantization keeps GPU acceleration enabled. Taking into account all 3 of these aspects, namely accuracy, model size, and speed, float16 quantized EfficientNetB0 is concluded to be the best model in this experiment step. Detailed results of the fourth experiment are displayed in Table 10.

Table 10
Result of experiment step 4.

CNN Architecture | Keras Test Accuracy | Quantization Type | Test Accuracy | Model Size | Time per Image
MobileNetV1 | 0.811 | No Quantization | 0.806 | 14,601 KB | 38 ms
MobileNetV1 | 0.811 | Dynamic Range | 0.806 | 3,915 KB | 3,726 ms
MobileNetV1 | 0.811 | Float16 | 0.811 | 7,317 KB | 39 ms
MobileNetV2 | 0.811 | No Quantization | 0.806 | 11,184 KB | 40 ms
MobileNetV2 | 0.811 | Dynamic Range | 0.791 | 3,212 KB | 1,922 ms
MobileNetV2 | 0.811 | Float16 | 0.776 | 5,625 KB | 41 ms
NASNet Mobile | 0.796 | No Quantization | 0.77 | 18,829 KB | 120 ms
NASNet Mobile | 0.796 | Dynamic Range | 0.791 | 5,661 KB | 3,397 ms
NASNet Mobile | 0.796 | Float16 | 0.776 | 9,621 KB | 121 ms
EfficientNetB0 | 0.898 | No Quantization | 0.893 | 18,185 KB | 97 ms
EfficientNetB0 | 0.898 | Dynamic Range | 0.893 | 5,308 KB | 2,460 ms
EfficientNetB0 | 0.898 | Float16 | 0.893 | 9,165 KB | 96 ms

3.3. The developed application

Since each of the models has different advantages and disadvantages in terms of accuracy and classification speed, all four of the models were implemented inside the application. We used the float16 quantized models as the interpreters, with the exception of NASNet Mobile, which achieved higher test accuracy using dynamic range quantization. Users are able to choose which interpreter to use. Screenshots of the Android application are presented in Fig. 9. When the application is launched, 4 buttons are displayed, each bearing the name of a model for the user to choose (Fig. 9(A)).

After users have selected a model, they need to aim the camera at the oil palm fruit that they want to classify. The label and confidence level of the classification are displayed in the panel at the bottom of the screen (Fig. 9(B, C, D, E)). The confidence level is shown as a percentage, which implies the image's likeliness of belonging to the displayed class. If a captured frame has not yet been classified by the interpreter, the subsequent frames captured by the camera are skipped and not fed into the interpreter.

Fig. 9. User interface design of the developed mobile application: (A) the main menu, and the classification activity of (B) EfficientNetB0, (C) MobileNetV1, (D) MobileNetV2, and (E) NASNet Mobile.

3.4. Conclusion

This research produced an Android application capable of classifying the ripeness levels of oil palm FFB into 6 categories (unripe, under-ripe, ripe, over-ripe, abnormal, and empty fruit bunch). The deep learning models used were trained using transfer learning and a novel data augmentation method called "9-angle crop". Both of these methods proved capable of increasing the accuracy of MobileNetV1 when tested on the validation and test datasets. Transfer learning with 3 unfrozen blocks allowed MobileNetV1 to reach 0.777 validation accuracy and 0.806 test accuracy, with a further increase to 0.915 validation accuracy and 0.811 test accuracy using 9-angle crop. The novel 9-angle crop also helped in preventing overfitting, as it reduced the difference between train and validation accuracy on MobileNetV1 by 0.074. Further experiments on MobileNetV2, NASNet Mobile, and EfficientNetB0 brought about the conclusion that EfficientNetB0 was superior to the other 3 models, with 0.898 test accuracy on Keras.

The models were further optimized using post-training quantization, with dynamic range quantization providing the best result for NASNet Mobile, and float16 quantization providing the best results for MobileNetV1, MobileNetV2, and EfficientNetB0. Float16 quantized EfficientNetB0 managed to reach the highest accuracy, achieving a 0.893 accuracy score on the test dataset. On the other hand, MobileNetV1 proved not only to be the most efficient CNN, requiring only 39 ms of image classification time, but also to be highly accurate, with 0.811 test accuracy. Despite MobileNetV2 having the fewest parameters, it was slightly inferior to MobileNetV1 in this research, both in terms of accuracy and classification time.

This result indicated that when the difference in the number of parameters is not too large, fewer parameters do not necessarily yield faster processing time. When the four models were used in a mobile application, they performed well, albeit with different accuracy and processing times for each model. They were capable of classifying ripeness levels of oil palm FFB on devices with low computational resources and required little processing time. In the future, the results of this research may be studied and improved further to classify objects using deep learning neural networks with higher accuracy and faster processing time.

CRediT authorship contribution statement

Suharjito: Conceptualization, Resources, Data curation, Writing - review & editing, Supervision. Gregorius Natanael Elwirehardja: Methodology, Formal analysis, Investigation, Data curation. Jonathan Sebastian Prayoga: Methodology, Software, Investigation, Data curation, Writing - original draft.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

The authors would like to express their gratitude to BINUS University, through internal research grant no. 050/VII/VR.RTT/2020, and to the oil palm mill for supporting the preparation of the image data.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

References

Alzubaidi, L., Al-Shamma, O., Fadhel, M.A., Farhan, L., Zhang, J., Duan, Y., 2020. Optimizing the performance of breast cancer classification by employing the same domain transfer learning from hybrid deep convolutional neural network model. Electronics 9 (3), 445. https://doi.org/10.3390/electronics9030445.
Ansar, W., Shahid, A.R., Raza, B., Dar, A.H., 2020. Breast cancer detection and localization using MobileNet based transfer learning for mammograms. In: International Symposium on Intelligent Computing Systems 2020: Intelligent Computing Systems 1187, 11–21. https://doi.org/10.1007/978-3-030-43364-2_2.
Bensaeed, O.M., Shariff, A., Mahmud, A.B., Shafri, H.Z., Alfatni, M.S., 2014. Oil palm fruit grading using a hyperspectral device and machine learning algorithm. In: IOP Conference Series: Earth and Environmental Science 20. IOP Publishing Ltd, Kuala Lumpur. https://doi.org/10.1088/1755-1315/20/1/012017.
Chen, Y., Zheng, B., Zhang, Z., Wang, Q., Shen, C., Zhang, Q., 2020. Deep learning on mobile and embedded devices: state-of-the-art, challenges, and future directions. ACM Comput. Surveys 53 (4), 1–37. https://doi.org/10.1145/3398209.
Dargan, S., Kumar, M., Ayyagari, M.R., Kumar, G., 2019. A survey of deep learning and its applications: a new paradigm to machine learning. Arch. Computat. Methods Eng. 27 (4), 1071–1092. https://doi.org/10.1007/s11831-019-09344-w.
Demir-Kavuk, O., Kamada, M., Akutsu, T., Knapp, E.-W., 2011. Prediction using step-wise L1, L2 regularization and feature selection for small data sets with large number of features. BMC Bioinform. 12 (412), 1–10. https://doi.org/10.1186/1471-2105-12-412.
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L., 2009. ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Miami, pp. 248–255. https://doi.org/10.1109/CVPR.2009.5206848.
Dodge, S., Karam, L., 2016. Understanding how image quality affects deep neural networks. In: 2016 Eighth International Conference on Quality of Multimedia Experience (QoMEX). IEEE, Lisbon, pp. 1–6. https://doi.org/10.1109/QoMEX.2016.7498955.
Fadilah, N., Mohamad-Saleh, J., 2014. Color feature extraction of oil palm fresh fruit bunch image for ripeness classification. In: 13th International Conference on Applied Computer and Applied Computational Science (ACACOS'14), pp. 51–55.
Flusser, J., Farokhi, S., Höschl, C., Suk, T., Zitová, B., Pedone, M., 2016. Recognition of images degraded by Gaussian blur. IEEE Trans. Image Proc. 25 (2), 790–806. https://doi.org/10.1109/TIP.2015.2512108.
Harsawardana, Rahutomo, R., Mahesworo, B., Cenggoro, T.W., Budiarto, A., Suparyanto, T., Pardamean, B., 2020. AI-based ripeness grading for oil palm fresh fruit bunch in smart crane grabber. In: IOP Conference Series: Earth and Environmental Science 426. IOP Publishing Ltd, Solo. https://doi.org/10.1088/1755-1315/426/1/012147.
Heaton, J., 2008. Introduction to Neural Networks with Java. Heaton Research, Chesterfield.
Hentschel, C., Wiradarma, T.P., Sack, H., 2016. Fine tuning CNNs with scarce training data - adapting ImageNet to art epoch classification. In: 2016 IEEE International Conference on Image Processing (ICIP). IEEE, Phoenix, pp. 3693–3697. https://doi.org/10.1109/ICIP.2016.7533049.
Herman, Susanto, A., Cenggoro, T.W., Suharjito, Pardamean, B., 2020. Oil palm fruit image ripeness classification with computer vision using deep learning and visual attention. J. Telecommun. Electronic Comput. Eng. 12 (2), 21–27.
Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Adam, H., 2017. MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861.
Ibrahim, Z., Sabri, N., Isa, D., 2018. Palm oil fresh fruit bunch ripeness grading recognition using convolutional neural network. J. Telecommun. Electronic Comput. Eng. 10, 109–113.
Index Mundi, 2021. Palm oil exports by country in 1000 MT. Retrieved February 18, 2021, from Index Mundi: https://www.indexmundi.com/agriculture/?commodity=palm-oil&graph=exports.
Kairanbay, M., See, J., Wong, L.-K., 2016. Aesthetic evaluation of facial portraits using compositional augmentation for deep CNNs. In: Asian Conference on Computer Vision. Springer, pp. 462–474. https://doi.org/10.1007/978-3-319-54427-4_34.
LeCun, Y., Bengio, Y., Hinton, G., 2015. Deep learning. Nature 521, 436–444. https://doi.org/10.1038/nature14539.
Lu, S.-Y., Wang, S.-H., Zhang, Y.-D., 2020. A classification method for brain MRI via MobileNet and feedforward network with random weights. Pattern Recognition Lett. 140, 252–260. https://doi.org/10.1016/j.patrec.2020.10.017.
Michele, A., Colin, V., Santika, D.D., 2019. MobileNet convolutional neural networks and support vector machines for palmprint recognition. Procedia Computer Science 157, 110–117. https://doi.org/10.1016/j.procs.2019.08.147.
Pan, H., Pang, Z., Wang, Y., Wang, Y., Chen, L., 2020. A new image recognition and classification method combining transfer learning algorithm and MobileNet model for welding defects. IEEE Access 8, 119951–119960. https://doi.org/10.1109/ACCESS.2020.3005450.
Prechelt, L., 1998. Early stopping - but when? In: Neural Networks: Tricks of the Trade 1524, 55–69. https://doi.org/10.1007/3-540-49430-8_3.
Qian, N., 1999. On the momentum term in gradient descent learning algorithms. Neural Networks 12 (1), 145–151. https://doi.org/10.1016/S0893-6080(98)00116-6.
Rathi, D., 2018. Optimization of transfer learning for sign language recognition targeting mobile platform. Int. J. Recent Innov. Trends Comput. Commun. 6 (4), 198–203.
Reddy, N., Rattani, A., Derakhshani, R., 2018. Comparison of deep learning models for biometric-based mobile user authentication. In: 2018 IEEE 9th International Conference on Biometrics Theory, Applications and Systems (BTAS). IEEE, Redondo Beach, pp. 1–6. https://doi.org/10.1109/BTAS.2018.8698586.
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.-C., 2018. MobileNetV2: inverted residuals and linear bottlenecks. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4510–4520. https://doi.org/10.1109/CVPR.2018.00474.
Saxen, F., Werner, P., Handrich, S., Othman, E., Dinges, L., Al-Hamadi, A., 2019. Face attribute detection with MobileNetV2 and NASNet-Mobile. IEEE, Dubrovnik, pp. 176–180. https://doi.org/10.1109/ISPA.2019.8868585.
Septiarini, A., Hamdani, H., Hatta, H.R., Kasim, A.A., 2019. Image-based processing for ripeness classification of oil palm fruit. IEEE, Yogyakarta, pp. 23–26. https://doi.org/10.1109/ICSITech46713.2019.8987575.
Septiarini, A., Hamdani, H., Hatta, H.R., Anwar, K., 2020. Automatic image segmentation of oil palm fruits by applying the contour-based approach. Scientia Horticulturae 261. https://doi.org/10.1016/j.scienta.2019.108939.
Shahbandeh, M., 2021, January 27. Production volume of palm oil worldwide from 2012/13 to 2020/21. Retrieved February 18, 2021, from Statista: https://www.statista.com/statistics/613471/palm-oil-production-volume-worldwide/.
Shorten, C., Khoshgoftaar, T.M., 2019. A survey on image data augmentation for deep learning. J. Big Data 6 (60). https://doi.org/10.1186/s40537-019-0197-0.
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R., 2014. Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958.
Suharto, E., Suhartono, Widodo, A.P., Sarwoko, E.A., 2020. The use of MobileNet V1 for identifying various types of freshwater fish. J. Phys.: Conference Series 1524, 012105. https://doi.org/10.1088/1742-6596/1524/1/012105.
Syamsuri, B., Kusuma, G.P., 2019. Plant disease classification using lite pretrained deep convolutional neural network. Int. J. Innov. Technol. Exploring Eng. 9 (2), 2796–2804. https://doi.org/10.35940/ijitee.B6647.129219.
Tan, M., Le, Q.V., 2019. EfficientNet: rethinking model scaling for convolutional neural networks. In: Proceedings of the 36th International Conference on Machine Learning, Long Beach, pp. 6105–6114.
Tan, M., Chen, B., Pang, R., Vasudevan, V., Sandler, M., Howard, A., Le, Q.V., 2019. MnasNet: platform-aware neural architecture search for mobile. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, pp. 2820–2828.
Taunk, D., 2020, March 15. L1 vs L2 regularization: the intuitive difference. Retrieved May 23, 2021, from Medium: https://medium.com/analytics-vidhya/l1-vs-l2-regularization-which-is-better-d01068e6658c.
TensorFlow, n.d. Model optimization. Retrieved May 23, 2021, from TensorFlow: https://www.tensorflow.org/lite/performance/model_optimization#types_of_optimization.
Thomas, A.J., Petridis, M., Walters, S.D., Gheytassi, S.M., Morgan, R.E., 2017. Two hidden layers are usually better than one. In: Boracchi, G., Iliadis, L., Jayne, C., Likas, A. (Eds.), Communications in Computer and Information Science 744, 279–290. https://doi.org/10.1007/978-3-319-65172-9_24.
Torrey, L., Shavlik, J., 2010. Transfer learning. In: Olivas, E.S., Guerrero, J.D., Martinez-Sober, M., Magdalena-Benedito, J.R., López, A.J. (Eds.), Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods, and Techniques. Information Science Reference, Hershey, Pennsylvania, pp. 242–264. https://doi.org/10.4018/978-1-60566-766-9.ch011.
Velasco, J., Pascion, C., Alberio, J.W., Apuang, J., Cruz, J.S., Gomez, M.A., Jorda, R.J., 2019. A smartphone-based skin disease classification using MobileNet CNN. Int. J. Adv. Trends Comput. Sci. Eng. 8 (5), 2632–2637. https://doi.org/10.30534/ijatcse/2019/116852019.
Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A., 2016. Learning deep features for discriminative localization. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, Las Vegas, pp. 2921–2929. https://doi.org/10.1109/CVPR.2016.319.