
Computers and Electronics in Agriculture 156 (2019) 307–311

Contents lists available at ScienceDirect

Computers and Electronics in Agriculture


journal homepage: www.elsevier.com/locate/compag

Original papers

Raw sugarcane classification in the presence of small solid impurity amounts using a simple and effective digital imaging system

Wesley Nascimento Guedes (a), Fabíola Manhas Verbi Pereira (a,b)
(a) Bioenergy Research Institute (IPBEN), Institute of Chemistry, São Paulo State University (UNESP), Araraquara, São Paulo 14800-060, Brazil
(b) Department of Chemistry, Idaho State University, Pocatello, ID 83209, United States

ARTICLE INFO

Keywords: Solid impurities; Sugarcane; Bioenergy; Digital images; Chemometrics

ABSTRACT

Specific amounts of solid impurities in raw sugarcane need to be detected before raw materials are carried into mills. Solid impurities come from the plant, e.g., green and dry leaves, and from soil. This study proposed to classify sugarcane via a new strategy using a well-established method that combines digital images converted into ten color-scale histograms, namely red (R), green (G) and blue (B), RGB; hue (H), saturation (S) and value (V), HSV; relative colors of RGB, rgb; and luminosity (L), with multivariate classification methods. Sampling was performed using a mixture design that comprised 122 different combinations of sugarcane stalks, vegetal plant parts and soil to achieve 100 wt% for evaluating the desirable and undesirable situations for the solid impurity amounts. Classical algorithms, such as soft independent modeling of class analogy (SIMCA), partial least squares discriminant analysis (PLS-DA) and k nearest neighbors (kNN), were used to perform the calculations. Receiver operating characteristic (ROC) curves revealed the high sensitivity and specificity of the three algorithms using the color histogram data. The outstanding result was the ability to classify sugarcane content higher than 85 wt%, which is considered high-quality raw material by cane mills.

1. Introduction

The presence of the impurities that form during the sugarcane industrial process in refineries reduces the quality of the extracted juice, adds undesirable substances, such as starch, to the sugar and compromises ethanol production (Eggleston et al., 2010, Thai and Doherty, 2012, Cole et al., 2013, Bakir et al., 2016). However, the subject of this study is the solid impurities that commonly aggregate before raw materials are carried into mills (Castro et al., 2014, Everingham et al., 2016).

The solid impurity sources in sugarcane are vegetal parts from the plant tops (green and dry leaves) and soil, which are inherent to the harvesting process, i.e., harvesting green or burnt cane (Castro et al., 2014, Norris et al., 2015, Lisboa et al., 2018). For sugar and ethanol production, the relevant and desired part of the sugarcane plant is the stalk, which can be used for effective juice extraction via a roller mill (Norris et al., 2015).

Solid impurities in raw sugarcane are a noteworthy subject, and sugar and ethanol refineries need methods to evaluate impurities to avoid low-quality final products and determine the appropriate use of several chemicals to correct the production process. In one of our previous studies (Andrade et al., 2017, Andrade et al., 2018), the chemical elements related to these solid impurities were detected using a laser-induced breakdown spectroscopy (LIBS) system on leached solutions immobilized in a polyvinyl alcohol (PVA) polymer substrate (Andrade et al., 2017). The most important elements were Ca, Mg and K.

In this study, a new method was used to detect high sugarcane stalk contents by converting robust digital images into a numerical matrix related to color histograms. The solid sample compositions included the sugarcane stalks, vegetal parts of the plant and soil. A simple system was developed to acquire the digital images. These images were converted into ten color-scale descriptors: red (R), green (G) and blue (B), i.e., the RGB system; hue (H), saturation (S) and value (V), i.e., the HSV system; relative colors of RGB, i.e., rgb; and luminosity (L) (Santos et al., 2012, Camargo et al., 2018). Applying this method to an agricultural and industrial issue in a novel manner results in a very simple, effective and feasible method that can be used in cane refineries to detect raw sugarcane in the presence of solid impurities. The advantages of this method are its high analytical frequency due to its simplicity, fast image capture speed, lack of chemical use and low cost. Another advantage is the portability of the devices used to capture the


Corresponding author at: Bioenergy Research Institute (IPBEN), Institute of Chemistry, São Paulo State University (UNESP), Araraquara, São Paulo 14800-060,
Brazil.
E-mail address: fabiola.verbi@unesp.br (F.M.V. Pereira).

https://doi.org/10.1016/j.compag.2018.11.039
Received 24 July 2018; Received in revised form 15 October 2018; Accepted 26 November 2018
Available online 01 December 2018
0168-1699/ © 2018 Elsevier B.V. All rights reserved.
W.N. Guedes, F.M.V. Pereira Computers and Electronics in Agriculture 156 (2019) 307–311

images, which allows for local analysis and easy coupling with in-line systems.

RGB image analysis has been used in fuel science to measure the wax appearance temperature (Belati and Cajaiba, 2018). The researchers used a webcam sensor, and each image represented the RGB patterns as an analytical response. At the beginning of crystallization, the enhanced brightness promotes an increase in the RGB values. Good results were demonstrated, and the real-time analysis results agreed with the calorimetric measurements.

In this study, we propose a powerful method that can be applied by using an available and easy-to-use computational tool to convert images into color scales. The sugarcane content assessment was performed using multivariate classification models (Lavine and Rayens, 2009) with the aim of creating an analytical method to measure the quality of raw materials before introducing them into the industrial stream at sugarcane mills.

2. Materials and methods

2.1. Samples

Sugarcane stalks, vegetal plant parts and soil were collected at a crop field located in Ibaté, São Paulo State, Brazil. The material was not washed, so that the levels of solid impurities could be detected before the raw material is introduced into an industrial stream. The compositions of the samples were outlined by a mixture design comprising 122 different combinations of sugarcane stalk (SC, desired material), vegetal plant parts, i.e., green and dry leaves (vegetal impurities, VI), and soil (S) to cover the desirable and undesirable situations for the solid impurity amounts, as shown in Fig. 1. The number of samples in the range between 90 and 100 wt% SC was increased by 24 samples for a total of 146 samples. The components of each sample were separately weighed and combined to achieve 100 wt%. The samples were divided in the following manner: range 1 (SC from 85 to 100 wt%) with 46 samples, range 2 (SC from 66 to 83 wt%) with 51 samples and range 3 (SC from 41 to 65 wt%) with 49 samples, as shown in Table 1S in the Supplementary material. Range 1 presents the ideal composition for a sugarcane refinery.

2.2. Image acquisition setup

Each solid mixture was placed in a paper tray (26.5 cm × 21.5 cm) inside a black box for image acquisition using a Nikon digital camera (COOLPIX S3500, Tokyo, Japan) with 20.1-megapixel resolution, as shown in Fig. 2. The black box was chosen to avoid possible absorption or reflection of light from the environment. The images were recorded with the tray in a horizontal position with a 1600 × 1200 pixel size (width × height) and 300 × 300 dpi (dots per inch) resolution. The image acquisition setup was configured to mimic real conditions at sugarcane mills. The focal distance of the camera was 10 mm with a maximum aperture of 3.5, and the region of interest (ROI) corresponded to 100% of the original image, as shown in Fig. 2. During the acquisition of the images, the automatic adjustments of the camera software were intentionally disabled. Five images were recorded per sample for a total of 730 images; the samples were shaken after each image recording.

Fig. 2. Setup for image acquisition using a box and digital camera on a tripod. In the paper tray is one of the solid mixture samples. The distance between the camera and the tray sample is 40 cm.
[Figure labels: black plastic box, digital camera, tripod, sample, paper tray, 40 cm.]

2.3. Data analysis

The first step is to read the image using the function ‘imread’ in Matlab R2015b (The MathWorks, Natick, MA, USA). Afterwards, the original images were converted into color histograms using the function ‘imhist’ in Matlab and into averaged color values, both of which comprised ten color-scale descriptors: R (red), G (green), B (blue), their relative colors (r, g and b), H (hue), S (saturation), V (value) and L (luminosity), using a laboratory-made Matlab code available in the reference

Fig. 1. Diagram of the mixture design outlined for sugarcane stalk (SC), vegetal impurity (VI) and soil (S) contents in samples. The orange diamonds represent samples with a range between 85 and 100 wt% SC; the gray squares represent the range from 66 to 83 wt% SC; and the gray circles represent the range from 41 to 65 wt% SC. The enlarged diagram is the target range of this study. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
[Ternary diagram with axes SC (wt%), VI (wt%) and S (wt%), each from 0 to 100; the enlarged diagram spans 85 to 100 wt% SC.]


[Figure 3: panels A–D show color histograms (x-axis: color scale, 0 to 2560; y-axis: counts) and panels E–H show averaged color values (x-axis: descriptors 1 to 10; y-axis: counts). Panel titles: A and E, 85 wt% (SC), 5 wt% (VI) and 10 wt% (S); B and F, 85 wt% (SC), 10 wt% (VI) and 5 wt% (S); C and G, 85 wt% (SC) and 15 wt% (S); D and H, 85 wt% (SC) and 15 wt% (VI).]

Fig. 3. Profiles of the color histograms for samples with 85 wt% SC and different amounts of impurities: (a) 5 wt% vegetal impurity (VI) and 10 wt% soil (S), (b) 10 wt% VI and 5 wt% S, (c) 15 wt% S and (d) 15 wt% VI; the colors are represented by 1–256 for red (R), 257–512 for green (G), 513–768 for blue (B), 769–1024 for hue (H), 1025–1280 for saturation (S), 1281–1536 for intensity (I) (or value), 1537–1792 for relative red (r), 1793–2048 for relative green (g), 2049–2304 for relative blue (b), and 2305–2560 for luminosity (L). Averaged color values for the same samples (e)–(h); the numbers from 1 to 10 are averaged values for R, G, B, H, S, V, r, g, b and L. The images correspond to the sample composition of the plots. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
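The column layout described in the Fig. 3 caption (ten 256-bin histograms placed side by side to give 2560 variables) can be sketched as follows. This is a minimal illustration in Python, not the authors' Matlab code: the function names are hypothetical, the scaling of H, S, V, r, g and b onto 256 bins is an assumption, and the luminosity formula (Rec. 601 luma weights) is also an assumption because the text does not define L. Indices here are zero-based, so the caption's columns 1–256 correspond to positions 0–255.

```python
import colorsys

N_BINS = 256
CHANNELS = ["R", "G", "B", "H", "S", "V", "r", "g", "b", "L"]


def to_bin(x):
    """Map a value in [0, 1] to an integer bin in 0..255."""
    return min(N_BINS - 1, int(x * N_BINS))


def pixel_descriptors(R, G, B):
    """Return the ten descriptors of one 8-bit RGB pixel as bin indices."""
    h, s, v = colorsys.rgb_to_hsv(R / 255, G / 255, B / 255)
    total = R + G + B
    # Relative colors; a black pixel is assigned 0 here (assumption).
    r, g, b = ((c / total) if total else 0.0 for c in (R, G, B))
    # Luminosity: assumed Rec. 601 weights, not confirmed by the source.
    lum = (0.299 * R + 0.587 * G + 0.114 * B) / 255
    return [R, G, B, to_bin(h), to_bin(s), to_bin(v),
            to_bin(r), to_bin(g), to_bin(b), to_bin(lum)]


def color_histogram(pixels):
    """Concatenate ten 256-bin histograms into one 2560-element vector."""
    hist = [0] * (N_BINS * len(CHANNELS))
    for R, G, B in pixels:
        for k, bin_idx in enumerate(pixel_descriptors(R, G, B)):
            hist[k * N_BINS + bin_idx] += 1
    return hist


# Two toy pixels (pure red and pure green), not data from the paper.
hist = color_histogram([(255, 0, 0), (0, 255, 0)])
print(len(hist))
```

Each block of 256 positions sums to the number of pixels, so every image of the same size contributes the same total count per descriptor.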

[Figure 4: panel A plots PC2 (25%) against PC1 (45%) for the 85–100, 66–83 and 41–65 wt% SC ranges in the training and validation sets; panel B plots the loading values for PC1 against the color scale (0 to 2560), with the H, S, r, g and b regions labeled.]

Fig. 4. Plots from the PCA for (a) scores representing the three ranges of SC in wt% for the training (orange diamonds) and validation (white diamonds) data sets and (b) loading values for PC1 with 45% explained variance calculated for the color histogram data. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
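The scores, loadings and explained variance reported in Fig. 4 are standard PCA outputs. As a rough illustration only, the sketch below extracts the first principal component of a small mean-centered matrix with NIPALS, an iterative algorithm commonly used in chemometrics. The source does not state which algorithm the software applies, so NIPALS is an assumption here, as are the function names and the toy data.

```python
def mean_center(X):
    """Subtract the column means from a list-of-rows matrix."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[x - m for x, m in zip(row, means)] for row in X]


def nipals_pc1(Xc, tol=1e-12, max_iter=500):
    """Return (scores, loadings, explained fraction) of PC1 for centered Xc."""
    n_cols = len(Xc[0])
    # Start from the column with the largest sum of squares.
    t = list(max(zip(*Xc), key=lambda col: sum(v * v for v in col)))
    for _ in range(max_iter):
        tt = sum(v * v for v in t)
        # Loadings: project every column onto the current scores.
        p = [sum(ti * row[j] for ti, row in zip(t, Xc)) / tt
             for j in range(n_cols)]
        norm = sum(v * v for v in p) ** 0.5
        p = [v / norm for v in p]
        # Scores: project every sample (row) onto the unit loading vector.
        t_new = [sum(x * pj for x, pj in zip(row, p)) for row in Xc]
        change = sum((a - b) ** 2 for a, b in zip(t_new, t))
        t = t_new
        if change <= tol * sum(v * v for v in t):
            break
    total = sum(v * v for row in Xc for v in row)
    return t, p, sum(v * v for v in t) / total


# Toy data: 4 samples x 3 variables (not the paper's data).
X = [[2.0, 0.0, 1.0], [4.0, 1.0, 1.0], [6.0, 1.0, 2.0], [8.0, 2.0, 2.0]]
scores, loadings, explained = nipals_pc1(mean_center(X))
print(round(explained, 3))
```

In this sketch, `scores` has one value per sample (what a scores plot shows along PC1) and `loadings` has one value per variable (what a loadings plot shows); variables with large absolute loadings contribute most to the component.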

(Camargo et al., 2018). Then, two data matrices were generated, one for the color histograms and the other for the averaged color values, and separately evaluated (Fig. 3).

For the matrices, each sample was represented by an average of five data images. Then, the data matrices were organized as follows: (1) 146 rows × 2560 columns for the ten color histograms (256 intensity values for each color) and (2) 146 rows × 10 columns for the averaged values of the 10 colors. The rows comprised the samples, and the columns were the colors. The response (independent variables) is the frequency at which each color is repeated and is represented by the counts in the color histograms; in the case of the averaged colors, each response corresponds to the average of the color interval, e.g., for R color, the average of values from 1 to 256 (Fig. 3A–D). For the calculations, the independent variables of the color histograms were mean-centered, and the averaged color values were auto-scaled. The different preprocessing methods are related to the nature of the data; i.e., the color histograms have zero values, and the averaged color values have wide variations in magnitude and no zero values, as shown in Fig. 3.

The two types of preprocessing, mean-centering and auto-scaling, ensure the mean of the independent variables equals zero, but the auto-scaled data also have a standard deviation equal to one.
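The two preprocessing steps can be written out in a few lines. The sketch below is an illustration in Python with hypothetical function names, not the authors' implementation; it uses the sample standard deviation (n − 1 denominator), which is an assumption because the text does not say which form was applied.

```python
from statistics import mean, stdev


def mean_center(X):
    """Subtract each column's mean so that every column has mean zero."""
    means = [mean(col) for col in zip(*X)]
    return [[x - m for x, m in zip(row, means)] for row in X]


def autoscale(X):
    """Mean-center each column, then divide by its standard deviation."""
    cols = list(zip(*X))
    means = [mean(c) for c in cols]
    # Sample standard deviation (n - 1); assumed, not stated in the source.
    # A constant column (e.g. a bin that is zero in every sample) has zero
    # standard deviation and would raise ZeroDivisionError below.
    sds = [stdev(c) for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in X]


# Toy matrix: 3 samples x 2 variables with very different magnitudes.
X = [[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]]
centered = mean_center(X)
scaled = autoscale(X)
print(centered[0], scaled[0])
```

The comment on constant columns points to one practical consideration: auto-scaling is undefined for a variable that never changes, whereas mean-centering is always defined.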


[Figures 5 and 6: three panels each (A, B and C) plotting sensitivity against 1 − specificity, both from 0.0 to 1.0.]

Fig. 5. ROC curves (orange) for the training set of the color histogram data calculated using (a) SIMCA, (b) PLS-DA and (c) kNN representing the area under the curve. The black dashed-dotted line is the reference line for the 0, 0.5 and 1.0 values of sensitivity and specificity. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 6. ROC curves (orange) for the validation set of the color histogram data calculated using (a) SIMCA, (b) PLS-DA and (c) kNN representing the area under the curve. The black dashed-dotted line is the reference line for the 0, 0.5 and 1.0 values of sensitivity and specificity. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
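The area under a ROC curve such as those in Figs. 5 and 6 can be computed from a list of classifier scores and true class labels. The sketch below is a generic illustration with hypothetical function names and toy numbers; how the scores were derived from the SIMCA, PLS-DA and kNN models is not described in the text and is not reproduced here.

```python
def roc_points(scores, labels):
    """Return (1 - specificity, sensitivity) points for decreasing thresholds.

    labels: 1 for the positive class (e.g. range 1), 0 otherwise.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the curve by the trapezoidal rule."""
    return sum((fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
               for k in range(1, len(fpr)))


# Toy example: four samples, two positives and two negatives.
fpr, tpr = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
print(auc(fpr, tpr))
```

An area of 1 corresponds to a score that ranks every positive sample above every negative one, and an area of 0.5 corresponds to the diagonal reference line.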

The chemometric tools used, i.e., soft independent modeling of class analogy (SIMCA), PLS for discriminant analysis (PLS-DA) and k nearest neighbor (kNN), are available in Pirouette 4.5 rev. 1 software (Infometrix, Bothell, WA, USA). The well-established Kennard and Stone algorithm (Kennard and Stone, 1969) was applied to select 20% of the samples for the validation set. The performance of the classification models was measured using the receiver operating characteristic (ROC) curve, which provides the general discriminating power of a parameter for identifying positive and negative events; the ideal decision corresponds to an area equal to 1 (Brown and Davis, 2006).

3. Results and discussion

The mixture design outlined for this study furnished a good assessment of the range from 41 to 100 wt% for sugarcane stalks with different amounts of added impurities. Fig. 1 shows the distribution of the sample compositions, and the enlarged diagram is the target range from 85 to 100 wt%, which represents high-quality raw sugarcane. In general, when a cane refinery receives raw material, the solid impurity amounts are unpredictable and impossible to visualize by the naked eye because tons of material are shipped hourly. A real example is shown in the images in Fig. 3, and the colors of the raw sugarcane and solid impurities are very similar. Three ranges were created. The desirable


stalk content value (85–100 wt%) is denoted as range 1, and the other two ranges have a balanced number of samples.

The image system was built based on our previous experience with methods involving digital images and chemometric tools (Camargo et al., 2018, Pereira et al., 2011). The general concept is to employ a simple, easy-to-use system that can be configured and used in any location. Note that the resolution of the image was not high (300 × 300 dpi), which allows the use of other simple devices, such as a cellphone or webcam, instead of a digital camera. The images were captured in a manner (Fig. 2) that did not require the use of sophisticated software to crop or select the useful image area for conversion into a color histogram.

Fig. 3 shows the information from the images after their conversion into color histograms (Fig. 3A–D) and the averaged color values (Fig. 3E–H). The preliminary evaluation reveals that the highest color variations occur for 769–1024 (H) and 1025–1280 (S) and the relative colors 1537–1792 (r), 1793–2048 (g) and 2049–2304 (b). The averaged color values show slight variations. To better evaluate the data, a principal component analysis (PCA) was separately performed on both matrices. The color histogram data in Fig. 4 show the information related to the ranges. Range 1 (SC from 85 to 100 wt%) was not clustered with the other ranges, as shown in Fig. 4A. Fig. 4B shows that the variables with the highest influence on the clustering were H, S, r, g and b due to their high loading values. The same analysis was performed on the averaged color values, and the samples in range 1 overlapped with the samples in range 2 from 66 to 83 wt% (Fig. 1SA in Supplementary material). The variables H, S, r, g and b do not have the same influence on the averaged color values. The loading values for H, S, V, r, g and b for the PC1 and PC2 visualization were near zero (Fig. 1SB, Supplementary material). Thus, these variables do not have a large influence on the discrimination of raw sugarcane with small impurity amounts for the averaged color values. A variable selection was performed, but no improvements were verified for either data set, which confirmed that the entire color histogram profile provides complete information for sample clustering.

The averaged color value data did not show good specificity and sensitivity values for range 1. However, a small dataset with 10 variables was tested and compared with the color histogram data containing 2560 variables. ROC curves for the data training set are shown in Fig. 2SA, 2SB and 2SC in the Supplementary material. The areas were 0.55, 0.63 and 0.88 for the SIMCA, PLS-DA and kNN model training sets, respectively, and the areas were 0.44, 0.38 and 0.69 for the validation set, as shown in Fig. 3SA, 3SB and 3SC, respectively, in the Supplementary material. These values were not sufficiently accurate to propose a classification model since a consensus among the values was not achieved.

The color histograms showed more potential, and the parameters selected for the models are described as follows. SIMCA required 3 principal components (PCs) for each class with an explained variance from 85 to 91%. After selecting the 3 PCs, the monotonic variation was verified, and no improvement was shown for the models. For PLS-DA, 6 latent variables (LVs) resulted in 91% of the explained variance for each class and standard error of cross-validation (SECV) values of 0.2 for range 1 and 0.3 for ranges 2 and 3. The SECV using 6 LVs showed the best predictive ability of the model. For kNN, 1 neighbor was selected for the votes. In this case, 1, 2, 3 or 4 nearest neighbors (NN) showed the same number of misclassifications. To verify the performance of the models, ROC curves were also applied for the SIMCA (Fig. 5A), PLS-DA (Fig. 5B) and kNN (Fig. 5C) models for samples in range 1 with a high sugarcane content from 85 to 100 wt%. According to these results, SIMCA and kNN showed maximum area values equal to 1 for either quality parameter, i.e., sensitivity and specificity, and PLS-DA achieved a value of 0.97. The black dashed-dotted line is the reference for values 0, 0.5 and 1 in the plots of Fig. 5. Afterwards, the validation samples were used to evaluate the predictability of the models. The area results for the validation samples are shown in Fig. 6A–C and were 0.81, 0.81 and 0.92 for SIMCA, PLS-DA and kNN, respectively. These values were accurate and confirmed that the PC, LV and NN selections did not overfit the data. The noteworthy result is that misclassifications occurred between ranges 2 and 3 but did not occur for the samples in range 1 (the desired range). In other words, in the model results, no sample in range 1 was misclassified as a member of range 2 or 3. For PLS-DA, the samples were not classified as members of any range. In addition, the three classifiers showed similar performances for both the training and validation sets.

4. Conclusions

An exploratory analysis using PCA for the color histograms revealed the tendency of samples with 15 wt% impurities to cluster. The image method combined with three different multivariate models can classify raw sugarcane content above 85 wt% in the presence of different solid impurity amounts. The ROC curves confirmed sensitivity and specificity areas above 0.97 for the models using the color histogram data.

Acknowledgments

The authors are grateful to the São Paulo Research Foundation (FAPESP, 2016/00779-6 and 2017/05550-0) and the Coordination for the Improvement of Higher Education Personnel (CAPES, W.N.G. grant fellowship).

Appendix A. Supplementary material

Supplementary data to this article can be found online at https://doi.org/10.1016/j.compag.2018.11.039.

References

Eggleston, G., Grisham, M., Antoine, A., 2010. Clarification properties of trash and stalk tissues from sugar cane. J. Agric. Food Chem. 58, 366–373.
Thai, C.C.D., Doherty, W.O.S., 2012. Characterisation of sugarcane juice particles that influence the clarification process. Int. Sugar J. 114, 719–724.
Cole, M., Eggleston, G., Gilbert, A., Rose, I., Andrzejewski, B., St Cyr, E., Stewart, D., 2013. The presence and implication of soluble, swollen, and insoluble starch at the sugarcane factory and refinery. Int. Sugar J. 115, 844–851.
Bakir, H., Zhang, Z., Zbik, M.S., Harrison, M.D., Doherty, W.O.S., 2016. Understanding flocculation properties of soil impurities present in the factory sugarcane supply. J. Food Eng. 189, 55–63.
Castro, S.G.Q., Franco, H.C.J., Mutton, M.A., 2014. Harvest managements and cultural practices in sugarcane. Rev. Bras. Ci. Solo 38, 299–306.
Everingham, Y., Sexton, J., Skocaj, D., Inman-Bamber, G., 2016. Accurate prediction of sugarcane yield using a random forest algorithm. Agron. Sustain Dev. 36, 27.
Norris, C.P., Norris, S.C., Landers, G.P., 2015. A new paradigm for enhanced industry profitability: Post-harvest cane cleaning. Int. Sugar J. 117, 222–428.
Lisboa, I.P., Cherubin, M.R., Lima, R.P., Cerri, C.C., Satiro, L.S., Wienhold, B.J., Schmer, M.R., Jin, V.L., Cerri, C.E.P., 2018. Sugarcane straw removal effects on plant growth and stalk yield. Ind. Crops Prod. 111, 794–806.
Andrade, D.F., Sperança, M.A., Pereira-Filho, E.R., 2017. Different sample preparation methods for the analysis of suspension fertilizers combining LIBS and liquid-to-solid matrix conversion: determination of essential and toxic elements. Anal. Methods 9, 5156–5164.
Andrade, D.F., Guedes, W.N., Pereira, F.M.V., 2018. Detection of chemical elements related to impurities leached from raw sugarcane: use of laser-induced breakdown spectroscopy (LIBS) and chemometrics. Microchem. J. 137, 443–448.
Santos, P.M., Wentzell, P.D., Pereira-Filho, E.R., 2012. Scanner digital images combined with color parameters: a case study to detect adulterations in liquid cow’s milk. Food Anal. Methods 5, 89–95.
Camargo, V.R., Santos, L.J., Pereira, F.M.V., 2018. A proof of concept study for the parameters of corn grains using digital images and a multivariate regression model. Food Anal. Methods 11, 1852–1856.
Belati, A., Cajaiba, J., 2018. Measurement of wax appearance temperature using RGB image analysis and FBRM. Fuel 220, 264–269.
Lavine, B.K., Rayens, W.S., 2009. Classification: basic concepts. In: Brown, S.D., Tauler, R., Walczak, B. (Eds.), Comprehensive chemometrics: chemical and biochemical data analysis. Elsevier: Amsterdam, The Netherlands, vol. 3, pp. 507–515 (Chapter 3.15).
Kennard, R., Stone, L.A., 1969. Computer aided design of experiments. Technometrics 11, 137–148.
Brown, C.D., Davis, H.T., 2006. Receiver operating characteristics curves and related decision measures: a tutorial. Chemom. Intell. Lab. Syst. 80, 24–38.
Pereira, F.M.V., Milori, D.M.B.P., Pereira-Filho, E.R., Venâncio, A.L., Russo, M.S.T., Cardinali, M.C.B., Martins, P.K., Freitas-Astúa, J., 2011. Laser-induced fluorescence imaging method to monitor citrus greening disease. Comput. Electron. Agric. 79, 90–93.
