
Computers and Electronics in Agriculture 181 (2021) 105938

Contents lists available at ScienceDirect

Computers and Electronics in Agriculture


journal homepage: www.elsevier.com/locate/compag

Original papers

Machine learning techniques for classifying the sweetness of watermelon using acoustic signal and image processing
Ketsarin Chawgien, Supaporn Kiattisin *
Information Technology Management, Faculty of Engineering, Mahidol University, Nakhon Pathom, Thailand

A R T I C L E I N F O

Keywords:
Sweetness
Non-destructive detection
Watermelon image processing
Acoustic signal processing
Machine learning

A B S T R A C T

Sweetness is an essential factor for assessing the internal quality of fresh watermelon. In this paper, a fusion non-destructive method for classifying watermelon sweetness based on acoustic signal and image processing techniques is proposed. Tapping sound signals, watermelon rind patterns, and weight are considered as features. The use of these three features is inspired by techniques that farmers use to estimate watermelon maturity. Machine learning (ML) techniques are applied to develop sweetness classification models. Eight classification-based ML techniques are used: Naïve Bayes, K-nearest neighbors, Decision tree, Random forest, Artificial neural network, Logistic regression, Support vector machine, and Gradient boosted trees. The classification performance of the applied ML models is evaluated using accuracy, precision, recall, F-measure, and the area under the receiver operating characteristic curve (AUC). The results show that the proposed method can reliably classify watermelon sweetness. The highest classification accuracy, 92%, is obtained by Gradient boosted trees.

1. Introduction

Watermelon is a sweet fruit that is popularly consumed fresh around the world. Watermelon is categorised as a non-climacteric fruit, which means that its quality (sweetness, firmness, flesh colour, and citrulline) improves gradually during the maturity period, and the process of physiological maturation finishes at harvesting (Kyriacou et al., 2016; Kyriacou et al., 2018). As a consequence of non-climacteric ripening, watermelon needs to be harvested at an appropriate maturity level to reach its optimum maturity stage. However, it is difficult to estimate the maturity level of watermelon using non-destructive detection equipment.

Traditionally, farmers use several non-destructive methods to estimate the maturity level of watermelon in order to harvest it properly. Tapping sounds, watermelon rind patterns, and weight are the main watermelon features that have been used. For example, farmers listen to the tapping sounds, observe the rind patterns, and weigh a watermelon, and then they integrate the information from these features to estimate its maturity level. However, as these methods depend on the experience and physical abilities of the farmer, classification errors can easily occur due to personal judgement, resulting in low classification accuracy. In particular, detection by humans cannot efficiently predict watermelon sweetness. This lack of sweetness classification accuracy can cause low export competitiveness on the global fruit market.

It is important to apply modern non-destructive detection technologies to improve the accuracy of watermelon sweetness classification. Today, there are existing non-destructive detection technologies for estimating sweetness, such as nuclear magnetic resonance and near infrared spectroscopy. However, nuclear magnetic resonance is expensive, while near infrared spectroscopy is difficult to set up so as to obtain robust and accurate estimation results. These detection technologies also require prior knowledge to calibrate the model (Arendse et al., 2018; Jie et al., 2014). For these reasons, it is difficult for small businesses, farmers, and customers to access non-destructive detection technologies for watermelon sweetness.

In contrast, tapping sound signals and watermelon rind patterns have been widely used by farmers to estimate the maturity level. These features are simple and convenient to collect and use. Two relevant detection technologies for these features can be identified: acoustic signal processing and image processing. Today, these two modern detection technologies are widely used to extract features from watermelon. Acoustic property technology usually considers the maximum frequency (Fmax) of the impulse response in the form of sound

* Corresponding author.
E-mail address: supaporn.kit@mahidol.ac.th (S. Kiattisin).

https://doi.org/10.1016/j.compag.2020.105938
Received 30 April 2020; Received in revised form 8 October 2020; Accepted 1 December 2020
Available online 7 January 2021
0168-1699/© 2020 Elsevier B.V. All rights reserved.
K. Chawgien and S. Kiattisin Computers and Electronics in Agriculture 181 (2021) 105938

waves or vibrations. The technology is effectively applied for estimating maturity, internal defects, and firmness. In image processing technology, the watermelon rind is commonly used to estimate the maturity and internal damage of watermelon (Mohd Ali et al., 2017; Jie and Wei, 2018).

The reliability and accuracy of sweetness classification are low when either of the two technologies is used on its own, especially the acoustic signal technique. This is because, after harvesting, the sweetness hardly changes while the signal wave properties change continuously during the storage and transportation period (Menon, 2012; Zhang and Ge, 2016; Zhu et al., 2017). For this reason, the signal wave properties alone cannot guarantee whether the watermelon is sweet or not. Thus, it is necessary to combine the tapping sound with the watermelon rind and weight to increase the performance of sweetness estimation. As for the watermelon rind, because the physiological maturation of a non-climacteric fruit finishes at harvesting (Kyriacou et al., 2016; Kyriacou et al., 2018), the rind patterns make it possible to distinguish whether a watermelon was harvested prematurely. Moreover, weight is correlated with maturity level (Kyriacou et al., 2018), so it is reasonable to include weight as a feature.
Based on the above discussion, and inspired by the methods that farmers use to estimate the maturity level of watermelon, we propose a fusion non-destructive method for classifying watermelon sweetness using tapping sound signals, watermelon rind patterns, and weight. Acoustic signal and image processing technologies are applied to extract features from tapping sounds and rind patterns, respectively. The maximum frequency (Fmax) is extracted from the tapping sound signals using acoustic signal analyses, while the entropy value (Gouveia et al., 2011; De and Sil, 2012) is extracted from the watermelon rind patterns using image processing techniques. The three features (Fmax, Entropy value, weight) are used as input features or predictors for classifying sweetness. In this study, sweetness is indicated by soluble solids content (SSC), assuming that most soluble solids in fruit juice are simple sugars (Collins et al., 2007; Akashi et al., 2017; Muhammad Jawad et al., 2020). Machine learning (ML) methods are used to develop models for classifying the sweetness. Eight classification-based ML techniques are applied: Naïve Bayes, K-nearest neighbors (KNN), Decision tree, Random forest, Artificial neural network (ANN), Logistic regression, Support vector machine (SVM), and Gradient boosted trees (GBT). The classification performance of the applied ML models is evaluated using accuracy, precision, recall, F-measure, and the area under the receiver operating characteristic curve (AUC) (El-Bendary et al., 2015; Zaki and Meira, 2020) to assess the reliability of the proposed method. In addition, the effects of different combinations of the input features on the classification performance of the applied ML model are examined.

2. Materials and methods

2.1. Fruit material and storage

This section presents how the watermelon samples were collected and stored. In this study, a set of Kinnaree cultivar watermelons was collected from a local watermelon farm in Chonburi, Thailand. The watermelons were picked by experienced farmers, and bruised and damaged watermelons were not considered. The weight of the watermelons used was between 2 and 4 kg. This weight range represents the Kinnaree watermelon product commonly found in the market. All watermelons were stored and tested in an air-conditioned laboratory (26 °C).

2.2. Experimental system

An overview of this framework is illustrated in Fig. 1. There are three main steps: pre-processing, modelling, and performance evaluation. In step 1, the considered features were extracted from the collected watermelons. As mentioned earlier, the acoustic response signals, rind patterns, and weight of all watermelons were collected. Then, the soluble solids content (SSC) was measured. The duration of this experiment was 1 day.

Fig. 1. An overview of the framework.

Once the SSC of all the watermelons had been measured, the watermelons were divided into a sweet group and an unsweet group. An SSC of 9 °Brix was used as the sweetness threshold to classify the watermelons. This SSC value is the minimum requirement of watermelon sweetness for both domestic and international trade according to the National Bureau of Agricultural Commodity and Food Standards (ACFS), Thailand Ministry of Agriculture and Cooperatives. Hence, the watermelons with an SSC greater than or equal to 9 °Brix were defined as sweet cases, whereas those with an SSC lower than 9 °Brix were defined as unsweet cases. After the watermelons were grouped, 100 watermelon samples in the best condition (less damage and clearer rind patterns) were selected from each group. The total number of watermelon samples was 200 (Table 1). In step 2, all features were used to develop the ML models, and the classification performance of the models was then evaluated in step 3.

Table 1
Terms used for measuring the classification.

| Terms of SSC (°Brix) | Watermelon samples | Meaning |
|---|---|---|
| <9 | 100 | SSC < 9 °Brix classified as unsweet |
| ≥9 | 100 | SSC ≥ 9 °Brix classified as sweet |

2.3. Data pre-processing

2.3.1. Acoustic impulse response detection
Fig. 2 shows the process used to record and analyse the acoustic response signals in this study. The acoustic waves were recorded by a permanently polarised microphone (RODE, smartLav+) while a watermelon sample was hit by a hitting ball in a noise reduction box. The operating frequency range of the microphone is from 20 Hz to 20 kHz. The microphone was installed above the top of the watermelon samples. The distance between the microphone and the top surface of the watermelon samples was approximately 4 to 5 cm, depending on watermelon shape and size. The hitting ball used in this study is made of steel with a diameter of 2.54 cm (1 inch). The major advantage of the steel ball is that the large elastic modulus of steel makes the contact time between the ball and the watermelon short; the resonant frequency of the


Fig. 2. Processing and schematic of acoustic response setup.

obtained acoustic waves is consistent (Mao et al., 2016).


To begin the acoustic response detection, the hitting ball was released from rest (zero initial velocity) and fell under gravity to hit the middle zone of a watermelon sample at one site. The acoustic response signals were recorded at 24-bit resolution (51.2 kbps sampling rate) and stored on a computer as .wav files via a data acquisition card. The best three waves of each watermelon sample were selected and set to zero mean before normalising. After the selection, noise was reduced in each signal by cutting it at a threshold of 0.1 of its amplitude, and 4096 points were stored. Then, the noise-reduced signal was converted from the time domain into the frequency domain using the Fast Fourier Transform (FFT).

FFT is an algorithm for transforming a signal from the time domain to the frequency domain. It reduces the computational cost of the Discrete Fourier Transform (DFT) from the order of N^2 operations to the order of N log2 N, where N is the size of the data. In general, a radix-2 FFT, which restricts the number of samples in the sequence to a power of two, is applied, so the FFT takes less time to analyse data than a direct DFT. The FFT computes the same quantity as the DFT, defined as:
$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi n k / N} \qquad (1)$$

where k = 0,1, … , N-1.


Fig. 3. Time-domain response signals of (a) an unsweet and (b) sweet watermelon.

In this study, the index starts at n = 1, so Eq. (1) becomes Eq. (2), with N = 4096 in this experiment.

$$X(k) = \sum_{n=1}^{N} x(n)\, e^{-j 2\pi (k-1)(n-1)/N} \qquad (2)$$

where k = 1,2, … , N.
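The processing chain described above (zero mean, normalisation, amplitude thresholding, an N-point radix-2 FFT, and selection of the dominant frequency) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the Matlab code used in the study: the function names are hypothetical, the 0.1 threshold is interpreted here as dropping leading samples whose normalised magnitude is below 0.1, short signals are zero-padded, and the test tone is synthetic.

```python
import cmath
import math


def fft_radix2(x):
    """Recursive radix-2 FFT of Eq. (1); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def preprocess(signal, n_points, threshold=0.1):
    """Zero-mean, scale the peak magnitude to 1, trim the quiet start, fix the length.

    Assumption: the trimming rule is one possible reading of the 0.1 threshold.
    """
    mean = sum(signal) / len(signal)
    centred = [s - mean for s in signal]
    peak = max(abs(s) for s in centred) or 1.0
    norm = [s / peak for s in centred]
    start = next((i for i, s in enumerate(norm) if abs(s) >= threshold), 0)
    trimmed = norm[start:start + n_points]
    return trimmed + [0.0] * (n_points - len(trimmed))  # zero-pad if short


def dominant_frequency(signal, sample_rate, n_points=4096):
    """Return the frequency (Hz) of the largest non-DC spectral peak (Fmax)."""
    spectrum = fft_radix2(preprocess(signal, n_points))
    half = n_points // 2
    mags = [abs(c) for c in spectrum[1:half]]  # skip the DC bin
    k = 1 + mags.index(max(mags))
    return k * sample_rate / n_points


# Synthetic example: a 40 Hz tone sampled at 1024 Hz (hypothetical values).
tone = [math.sin(2 * math.pi * 40 * i / 1024) for i in range(512)]
print(dominant_frequency(tone, sample_rate=1024, n_points=256))
```

With a 256-point window at 1024 Hz the bin spacing is 4 Hz, so the printed value should be close to 40 Hz. The frequency resolution in any real setup depends on the actual sampling rate and on N.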
The acoustic signal analyses were carried out using Matlab R2019a. Only the maximum frequency (Fmax) was considered as the natural frequency of each sample. Fig. 3 and Fig. 4 show time-domain and frequency-domain response signals, respectively. Fig. 3 illustrates the time-domain response signals of the two cases of watermelon samples. All response signals were normalised; the maximum value was transformed into 1 while the others were transformed into decimals between −1 and 1. Fig. 4 shows the frequency-domain response signals of two watermelon samples. Fmax differed between the unsweet and sweet cases, which had different SSC levels. The Fmax of the unsweet case (7.8 °Brix) was 16 Hz while the Fmax of the sweet case (9.3 °Brix) was 20 Hz.

2.3.2. Image detection
The schematic of the image setup is illustrated in Fig. 5. The images used in this study were in Joint Photographic Experts Group (JPEG) format with a resolution of 2832x2420 pixels. The first step was to capture the watermelon rind patterns using a digital camera (Sony a5100) under a controlled environment. A lighting box was used to control the light intensity. The light source was a ring-shaped LED light module attached at the top of the lighting box. The use of ring-shaped


Entropy is a statistical measurement based on randomness. The entropy can be applied to characterise the textures of an input image (Krawozyk and Turowski, 1987; Lee and Xiang, 2001; De and Sil, 2012). The entropy values range between 0 and 1 and are calculated by computing all local entropies of an image. The local entropy can be defined as the product of the probability (P) of a class and the base-2 logarithm of that probability, as in Eq. (3).

$$\mathrm{Entropy}(x_i) = -P(x_i) \log_2 P(x_i) \qquad (3)$$

where x_i is a class feature of an image that can have more than 1 class. As mentioned, the entropy can be determined by summing the local entropy over all classes, which can be written in the general form of Eq. (4). A high Entropy value indicates a high level of information that differs from the average content of an image. Because the entropy reflects an image's textures, the entropy function can be used to characterise the rind patterns of an input watermelon image.

$$H(X) = -\sum_{i=1}^{n} P(x_i) \log_2 P(x_i) \qquad (4)$$
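As a minimal sketch of Eq. (4), the entropy of a grayscale image can be computed from the relative frequency of each gray level. The function name and the flat list of pixel values are hypothetical, and dividing by 8 bits to map an 8-bit image into the range 0 to 1 is an assumption made for this example; the source does not state how its values were scaled.

```python
import math
from collections import Counter


def shannon_entropy(pixels, normalise=True):
    """Shannon entropy of gray levels, H(X) = -sum P(x_i) log2 P(x_i).

    pixels: flat iterable of integer gray levels (e.g. 0..255).
    normalise: assumption for this sketch, divide by log2(256) = 8 bits
    so that an 8-bit image yields a value between 0 and 1.
    """
    pixels = list(pixels)
    total = len(pixels)
    if total == 0:
        raise ValueError("no pixels given")
    counts = Counter(pixels)
    # p * log2(1/p) is the same quantity as -p * log2(p).
    h = sum((c / total) * math.log2(total / c) for c in counts.values())
    return h / 8.0 if normalise else h


# Hypothetical inputs: a flat patch and a patch with two equally likely levels.
print(shannon_entropy([120] * 100, normalise=False))
print(shannon_entropy([0, 255] * 50, normalise=False))
```

A uniform patch carries no texture information and gives zero entropy, while more varied gray levels raise the value. In practice the masked background pixels would probably be excluded before counting, which is also an assumption here.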

Fig. 4. Frequency-domain response signals of (a) an unsweet and (b) sweet watermelon.

light provides more stable light intensity, resulting in more reliable rind pattern data (Ahmad Syazwan et al., 2012). The light intensity at the floor of the lighting box was 762 ± 3 lux, recorded using a Mini Light Meter (UNI-T UT383). The camera was placed above the samples at a 90° angle inside the light ring. The distance between the camera and the lighting box floor was 46.5 cm. For the camera settings, the ISO (International Organization for Standardization) sensitivity was set at 160, the shutter speed was 1/20, the aperture was F5.6, and the zoom was 38 mm.

Once the rind patterns were captured, the original images were stored on a computer. Matlab R2019a was used for performing the image analysis. To analyse the rind patterns, the images were first processed based on the shape of the samples, using morphological operations. After this stage, the images were converted into grayscale and an image mask was created. This masking process sets the background pixels to a value of 0 (black). The rind patterns were then extracted from the masked images using the Canny technique. Following these extraction processes, the Entropy values of the images were measured.

Fig. 5. Processing and schematic of image setup.

2.3.3. Measurement of weight and soluble solids content (SSC)

After collecting the acoustic signal responses and rind patterns, the watermelons were weighed and their SSC was then measured. The SSC (°Brix) was measured using hand-held refractometers (Atago Co., PR-1 °Brix-Meter, Tokyo, Japan) from the watermelon juice exuded from five selected points (Fig. 6). The average SSC of the five points was calculated to represent the sweetness of a sample. Table 2 summarises the statistical ranges of all features used in this study.

Fig. 7 illustrates watermelons of the two different statuses. The rind patterns, the flesh colour, and the other important values of the two cases are shown. While the weight of the two cases was approximately equal, their Fmax, Entropy value, and SSC differed. The lower values of Fmax and Entropy value were detected for the immature (unsweet) case, while the larger values were detected for the mature case.

2.4. Modelling

Classification-based ML models were developed using the considered features, which were the input predictors (Fmax, Entropy value, weight) and the target output (SSC). The ML models were Naïve Bayes, KNN, Decision tree, Random forest, ANN, Logistic regression, SVM, and Gradient boosted trees. The hyper-parameters of each ML model were tuned to perform an optimised analysis. All the ML models were developed in
RapidMiner. 5-fold cross validation (CV) was used to generalise the ML model and avoid overfitting. In 5-fold CV, all the data are divided into 5 folds; 4 folds are used for training while one fold is kept for validation. The training and testing processes are performed 5 times using a different held-out fold each time to determine the accuracy for each fold, and the five values are then averaged to represent the classification performance of an ML model.

Table 2
Statistical ranges of features.

| Parameter | Min | Max | Mean | SD |
|---|---|---|---|---|
| Entropy value | 0.10 | 0.39 | 0.26 | 0.08 |
| Fmax (Hz) | 15.00 | 39.00 | 21.49 | 5.65 |
| Weight (kg) | 2.01 | 4.06 | 2.61 | 0.42 |
| SSC (°Brix) | 6.45 | 12.70 | 8.94 | 1.61 |

Fig. 6. Five-selected positions for measuring SSC.

2.5. Performance measures

The performance of each applied ML model was evaluated using confusion matrices as shown in Table 3. A confusion matrix consists of two classes, positive and negative. There are four possible outcomes in the confusion matrix: true positive (TP), true negative (TN), false positive (FP), and false negative (FN). This confusion matrix consisted of two classes: class 1 with SSC ≥ 9 °Brix, and class 2 with SSC < 9 °Brix. The rows and the columns represent the number of samples in an actual class and a predicted class, respectively. TP and TN were the correct classifications of class 1 (SSC ≥ 9 °Brix) and class 2 (SSC < 9 °Brix), respectively, while FN and FP were the misclassifications of class 1 and class 2, respectively.

Table 3
Two class confusion matrix.

| Actual condition | Predicted class 1 with SSC ≥ 9 °Brix | Predicted class 2 with SSC < 9 °Brix |
|---|---|---|
| Actual class 1 with SSC ≥ 9 °Brix | True positive (TP) | False negative (FN) |
| Actual class 2 with SSC < 9 °Brix | False positive (FP) | True negative (TN) |

Based on the confusion matrix, four performance measures can be identified: accuracy, precision, recall, and F-measure, as defined in Eqs. (5)–(8).

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (5)$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (6)$$

Fig. 7. Comparison of watermelon samples with different Entropy values.


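$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (7)$$

$$F\text{-measure} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (8)$$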
Moreover, the area under the receiver operating characteristic curve (AUC) was also considered as an index for evaluating the performance of the applied ML models in this study. An AUC value is the area under the ROC curve, where the ROC curve is a graphical plot of recall against the false positive rate (FPR). FPR is defined in Eq. (9).

$$\mathrm{FPR} = \frac{FP}{FP + TN} \qquad (9)$$
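To make Eqs. (5)–(9) concrete, the short Python sketch below computes the measures from the four cells of a two-class confusion matrix. The function name and the example counts are hypothetical and are not taken from the results of this study; the sketch also assumes that no denominator is zero.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute Eqs. (5)-(9) from confusion-matrix counts.

    Assumes every denominator is non-zero (both classes occur and at
    least one sample is predicted positive).
    """
    accuracy = (tp + tn) / (tp + tn + fp + fn)                 # Eq. (5)
    recall = tp / (tp + fn)                                    # Eq. (6)
    precision = tp / (tp + fp)                                 # Eq. (7)
    f_measure = 2 * precision * recall / (precision + recall)  # Eq. (8)
    fpr = fp / (fp + tn)                                       # Eq. (9)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f_measure": f_measure,
        "fpr": fpr,
    }


# Hypothetical counts for 100 sweet (positive) and 100 unsweet (negative) samples.
metrics = classification_metrics(tp=90, fn=10, fp=8, tn=92)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because recall equals the true positive rate and 1 − FPR equals the true negative rate, the same four counts also yield the TPR and TNR values discussed later.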
Fig. 8. TNR and TPR of the applied ML models.

The AUC value varies between 0 and 1, where 1 represents a perfect prediction while values below 0.5 show insufficient prediction by a model. An ML model with an AUC value above 0.9 is outstanding, 0.8–0.9 is excellent, and 0.7–0.8 is acceptable (Hosmer and Lemeshow, 2000).
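One way to compute the AUC without drawing the ROC curve uses the fact that the area equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The Python sketch below follows that pairwise definition; the function name, the scores, and the 1/0 label encoding are assumptions for illustration, and the study itself obtained its AUC values in RapidMiner.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    scores: classifier scores, higher meaning more likely positive (sweet).
    labels: 1 for the positive class, 0 for the negative class (assumed encoding).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical scores: one sweet sample is ranked below one unsweet sample.
print(roc_auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))
```

The double loop is quadratic in the number of samples, which is fine for a data set of a few hundred fruits; a rank-based formulation would scale better for large data sets.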
3. Results and discussion

3.1. Comparison of the machine learning models

The classification performance of the eight classification-based ML techniques is evaluated. The confusion matrices of each model based on the dataset are summarised in Table 4. First, the true positive rate (TPR) and true negative rate (TNR) are employed to evaluate the model performance. High TPR and TNR demonstrate the capability of the ML models for the sweetness classification. TPR is the proportion of actual sweet watermelon samples (SSC ≥ 9 °Brix) which are predicted as sweet. TNR is the proportion of actual unsweet samples (SSC < 9 °Brix) which are predicted as unsweet.

Table 4
Confusion matrices of different ML models.

| ML models | Actual condition | Predicted condition: Unsweet | Predicted condition: Sweet |
|---|---|---|---|
| Gradient boosted trees (GBT) | Unsweet | 91 | 7 |
| | Sweet | 9 | 93 |
| Support vector machine (SVM) | Unsweet | 90 | 7 |
| | Sweet | 10 | 93 |
| Logistic regression | Unsweet | 90 | 8 |
| | Sweet | 10 | 92 |
| Artificial neural network (ANN) | Unsweet | 91 | 11 |
| | Sweet | 9 | 89 |
| Random forest | Unsweet | 92 | 9 |
| | Sweet | 8 | 91 |
| Decision tree | Unsweet | 88 | 13 |
| | Sweet | 12 | 87 |
| K-nearest neighbors (KNN) | Unsweet | 81 | 19 |
| | Sweet | 10 | 90 |
| Naïve Bayes | Unsweet | 86 | 7 |
| | Sweet | 14 | 93 |

Fig. 8 shows the TPR and TNR of the applied ML models. The highest TPR was obtained by Gradient boosted trees, SVM and Naïve Bayes with a TPR of 0.93, indicating that 93 out of 100 sweet samples were classified correctly. On the other hand, the highest TNR of 0.92 (92 out of 100 unsweet samples were classified correctly) was obtained by Random forest, closely followed by Gradient boosted trees and ANN. Decision tree was found to be the model with the lowest classification performance. Moreover, although Naïve Bayes had the largest TPR, the model was not recommended for the classification because of its low TNR.

Fig. 9 shows the bar charts of the accuracy, precision, recall, and F-measure of the applied ML models. In terms of accuracy, all models obtained good performance, with accuracy over 85%. The result indicates that the proposed method, which employs Fmax, Entropy, and weight as input predictors, can successfully classify the sweetness of watermelon. The ML models with accuracy over 90% were Gradient boosted trees, SVM, Logistic regression, and Random forest. Among these highly accurate models, Gradient boosted trees had the highest accuracy (92%). The accuracy of SVM and Random forest was slightly lower than that of Gradient boosted trees, with the same value of 91.5%. For precision and recall, the highest precision was obtained by SVM, Gradient boosted trees and Naïve Bayes while the highest recall was obtained by Gradient boosted trees, Random forest, and ANN. It is worth noting that models which have high precision generally have low recall, as seen, for example, for Naïve Bayes and K-nearest neighbors. This is because precision and recall are in tension, which means that increasing one indicator tends to decrease the other. Therefore, to comprehensively evaluate the performance of each model, the F-measure, which jointly considers the two indicators, is used to examine the classification performance. As shown, Random forest provided better performance with the highest F-Measure of 91.7%, closely followed by SVM (91.4%) and Gradient boosted trees (91.0%). In contrast, KNN and Decision tree had relatively low values in terms of accuracy and F-measure, showing that the two models did not perform well in the classification.

Fig. 9. Bar charts of Accuracy, Precision, Recall, and F-Measure.

AUC values of the eight ML models are provided in Fig. 10. Except for Decision tree, the AUC values of all the ML models were larger than 0.90, which could be considered outstanding (Hosmer and Lemeshow, 2000). The high AUC values confirm the reliability of the proposed method for classifying watermelon sweetness. The three highest AUC values were obtained by Random forest, Gradient boosted trees and SVM.

Based on the comparison using various indicators, it was found that Gradient boosted trees, Random forest, and SVM had superior performance in terms of accuracy, F-measure and AUC. The results show that these three models can be successfully used to classify the sweetness of watermelon based on the input features. On the other hand, K-nearest neighbors and Decision tree are not recommended for the classification since these models have a comparatively low level of classification performance. Finally, all the performance measures discussed above are shown in Table 5.

3.2. Classification performance using different combined features

This section presents the influence of using different combinations of the input features on the performance of the sweetness classification. For the discussion in this section, the classification model obtained by Gradient boosted trees was used, as this model provided the highest accuracy. Two cases of different combined features are shown in Table 6. Accuracy and AUC were used to evaluate the performance of each case.

It was observed that the model using all three features had an accuracy of 92% while the model using only Fmax and Entropy value had an accuracy of 87%. The results showed that the classification performance was lower when the number of input variables was reduced. Thus, all three features are recommended for use in this proposed method in order to obtain the highest classification performance. However, it was also observed that the method can achieve high accuracy using only two features (Fmax, Entropy value). The AUC of the classification model with the two features reached 0.92, which was still considered outstanding discrimination (Hosmer and Lemeshow, 2000). This demonstrates that even when the watermelon weight is unknown, the classification performance remains highly accurate. The results support the possibility of applying the proposed method in portable instruments such as smartphones, for which it may be difficult to use devices to measure watermelon weight.

Fig. 10. AUC of the applied models.

Table 5
Performance of the eight ML models.

| Classifier | Accuracy (%) | AUC | Precision (%) | Recall (%) | F-measure (%) |
|---|---|---|---|---|---|
| Gradient boosted trees (GBT) | 92.0 | 0.97 | 93.0 | 92.0 | 91.0 |
| Support vector machine (SVM) | 91.5 | 0.97 | 93.1 | 90.0 | 91.4 |
| Logistic regression | 91.0 | 0.96 | 92.3 | 90.0 | 91.9 |
| Artificial neural network (ANN) | 90.0 | 0.96 | 89.9 | 91.0 | 90.2 |
| Random forest | 91.5 | 0.97 | 91.8 | 92.0 | 91.7 |
| Decision tree | 87.5 | 0.85 | 87.8 | 88.0 | 87.7 |
| K-nearest neighbors (KNN) | 85.5 | 0.92 | 83.0 | 90.0 | 86.1 |
| Naïve Bayes | 89.5 | 0.96 | 92.7 | 86.0 | 89.0 |

Table 6
Accuracy and AUC with different combinations of input features.

| Feature | Accuracy (%) | AUC |
|---|---|---|
| Fmax, Entropy value, Weight | 92.0 | 0.97 |
| Fmax, Entropy value | 87.0 | 0.92 |

4. Conclusion

In this paper, we propose a non-destructive detection method for classifying watermelon sweetness using tapping sound signals, watermelon rind patterns, and weight. Acoustic signal and image processing technologies are applied to extract features from tapping sounds and watermelon rind patterns, respectively. The maximum frequency (Fmax) from acoustic response signals, the Entropy value from the images of watermelon rind patterns, and the watermelon weight are used as input features for classifying sweetness as indicated by SSC. The SSC value of 9 °Brix is used as the threshold to classify the sweetness. Kinnaree cultivar watermelon is used as a case study. This proposed method is simple, low-cost, and easy to use by small businesses, farmers, and customers while providing high accuracy. Eight classification-based ML models are applied to develop the sweetness classification models, namely Naïve Bayes, KNN, Decision tree, Random forest, ANN, Logistic regression, SVM, and GBT. The models are validated using 5-fold cross validation, and their performance is evaluated using accuracy, precision, recall, F-measure, and AUC. Moreover, the effects of different combined features on the classification performance are explored. The main findings based on the results obtained by this study are as follows:

- This proposed method is efficient for classifying the sweetness of watermelon. Fmax (from acoustic signal processing), Entropy value (from image processing), and watermelon weight can be successfully used to classify watermelon sweetness via ML techniques. The highest classification accuracy, 92%, is obtained by Gradient boosted trees.
- Gradient boosted trees, Support vector machine, and Random forest are the most successful models for classifying watermelon sweetness. On the other hand, K-nearest neighbors and Decision tree are not recommended for the classification.


- A good classification capability can be obtained even when only Fmax and Entropy value are used in this proposed method. The use of two features is suitable for portable detection instruments, which may have limitations for weighing watermelons. However, all three features are recommended in order to achieve higher accuracy.

Although the results of this study are based on a sample set from one farm in Thailand with a single classification point, the results indicate that the proposed method can be used to classify the sweetness of watermelon effectively. However, the results of this study may not be adequate to conclude that the most effective ML technique found here applies to all watermelon varieties. To apply this proposed method to other watermelon varieties, modifications such as tuning the hyper-parameters of the ML models might be needed. More samples are recommended to improve the generalisation of the classification model. Moreover, deep-learning techniques such as Convolutional neural networks (CNN) could also be trained on the image dataset. However, sufficient learning samples need to be ensured to achieve high-quality results.

As this proposed method does not require advanced technologies, it is convenient to implement in portable and commercial on-line detection equipment. Moreover, the concept of this proposed method can be applied to classifying other fruits or objects which emit sound signals when hit and have surface details/textures, such as melon, or gems and mineral aggregates. In addition, this method can easily be implemented as a mobile application on smartphones so that everyone can use it to classify watermelon sweetness. The improvements of the camera and

watermelon fruits. J. Sci. Food Agric. 97, 479–487. https://doi.org/10.1002/jsfa.7749.
Arendse, E., Fawole, O.A., Magwaza, L.S., Opara, U.L., 2018. Non-destructive prediction of internal and external quality attributes of fruit with thick rind: A review. J. Food Eng. 217, 11–23. https://doi.org/10.1016/j.jfoodeng.2017.08.009.
Collins, J.K., Wu, G., Perkins-Veazie, P., Spears, K., Claypool, P.L., Baker, R.A., Clevidence, B.A., 2007. Watermelon consumption increases plasma arginine concentrations in adults. Nutrition 23, 261–266. https://doi.org/10.1016/j.nut.2007.01.005.
De, I., Sil, J., 2012. Entropy based fuzzy classification of images on quality assessment. J. King Saud University - Computer and Information Sciences 24, 165–173. https://doi.org/10.1016/j.jksuci.2012.05.001.
El-Bendary, N., El Hariri, E., Hassanien, A.E., Badr, A., 2015. Using machine learning techniques for evaluating tomato ripeness. Expert Syst. Appl. 42, 1892–1905. https://doi.org/10.1016/j.eswa.2014.09.057.
Gouveia, L.T.D., Costa, F., Senger, L.J., Albertini, M.K., Mello, R.F.D., 2011. Entropy-Based Approach to Analyze and Classify Mineral Aggregates. J. Comput. Civil Eng. 25, 75–84. https://doi.org/10.1061/(ASCE)CP.1943-5487.0000071.
Hosmer, D.W., Lemeshow, S., 2000. Applied Logistic Regression, 2nd ed. Wiley, New York. https://doi.org/10.1080/00401706.1992.10485291.
Jie, D., Wei, X., 2018. Review on the recent progress of non-destructive detection technology for internal quality of watermelon. Comput. Electron. Agric. 151, 156–164. https://doi.org/10.1016/j.compag.2018.05.031.
Jie, D., Xie, L., Rao, X., Ying, Y., 2014. Using visible and near infrared diffuse transmittance technique to predict soluble solids content of watermelon in an on-line detection system. Postharvest Biol. Technol. 90, 1–6. https://doi.org/10.1016/j.postharvbio.2013.11.009.
Krawozyk, A., Turowski, J., 1987. The Mathematical Theory of Communication. IEEE Trans. Magn. 23, 3032–3037. https://doi.org/10.1109/TMAG.1987.1065451.
Kyriacou, M.C., Leskovar, D.I., Colla, G., Rouphael, Y., 2018. Watermelon and melon fruit quality: The genotypic and agro-environmental factors implicated. Sci. Hortic. 234, 393–408. https://doi.org/10.1016/j.scienta.2018.01.032.
Kyriacou, M.C., Soteriou, G.A., Rouphael, Y., Siomos, A.S., Gerasopoulos, D., 2016. Configuration of watermelon fruit quality in response to rootstock-mediated harvest maturity and postharvest storage. J. Sci. Food Agric. 96, 2400–2409. https://doi.
org/10.1002/jsfa.7356.
microphone in smartphones allow users to interact with the mobile Lee, W., Xiang, D., 2001. Information-theoretic measures for anomaly detection. Proc.
devices more conveniently. Instead of selecting watermelons based on IEEE Computer Society Symposium on Research in Security and Privacy 130–143.
human judgement, the mobile application can provide a better way to https://doi.org/10.1109/secpri.2001.924294.
Mao, J., Yu, Y., Rao, X., Wang, J., 2016. Firmness prediction and modeling by optimizing
estimate the sweetness of watermelon more precisely. acoustic device for watermelons. J. Food Eng. 168, 1–6. https://doi.org/10.1016/j.
jfoodeng.2015.07.009.
Declaration of Competing Interest Menon, S. V, Rao, • T V Ramana, Doshi, B.R., 2012. Enzyme Activities during the
Development and Ripening of Watermelon (Citrullus lanatus (Thunb.) Matsum. &
Nakai) Fruit.
The authors declared that they have no known competing financial Mohd Ali, M., Hashim, N., Bejo, S.K., Shamsudin, R., 2017. Rapid and nondestructive
interests or personal relationships that could have appeared to influence techniques for internal and external quality evaluation of watermelons: A review.
the work reported in this paper. Sci. Hortic. 225, 689–699. https://doi.org/10.1016/j.scienta.2017.08.012.
Muhammad Jawad, U., Gao, L., Gebremeskel, H., Safdar, L.B., Yuan, P., Zhao, S.,
Xuqiang, L., Nan, H., Hongju, Z., Liu, W., 2020. Expression pattern of sugars and
Appendix A. Supplementary data organic acids regulatory genes during watermelon fruit development. Sci. Hortic.
265, 109102 https://doi.org/10.1016/j.scienta.2019.109102.
Zaki, M.J., Jr. Wagner Meira, 2020. Data Mining and Analysis: Fundamental Concepts
Supplementary data to this article can be found online at https://doi. and Algorithms, 2nd ed, Personality and Social Psychology Bulletin. Cambridge
org/10.1016/j.compag.2020.105938. University Press. https://doi.org/10.1145/3054925.
Zhang, H., Ge, Y., 2016. Dynamics of sugar-metabolic enzymes and sugars accumulation
during watermelon (Citrullus lanatus) fruit development. Pak. J. Bot. 48,
References 2535–2538.
Zhu, Q., Gao, P., Liu, S., Zhu, Z., Amanullah, S., Davis, A.R., Luan, F., 2017. Comparative
Ahmad Syazwan, N., Shah Rizam, M.S.B., Nooritawati, M.T., 2012. Categorization of transcriptome analysis of two contrasting watermelon genotypes during fruit
watermelon maturity level based on rind features. Procedia Eng. 41, 1398–1404. development and ripening. BMC Genomics 18, 1–20. https://doi.org/10.1186/
https://doi.org/10.1016/j.proeng.2012.07.327. s12864-016-3442-3.
Akashi, K., Mifune, Y., Morita, K., Ishitsuka, S., Tsujimoto, H., Ishihara, T., 2017. Spatial
accumulation pattern of citrulline and other nutrients in immature and mature
