
Accepted Manuscript

Optimization of rice amylose determination by NIR-spectroscopy using PLS chemometrics algorithms

Pedro Sousa Sampaio, Andreia Soares, Ana Castanho, Ana Sofia Almeida, Jorge
Oliveira, Carla Brites

PII: S0308-8146(17)31513-3
DOI: http://dx.doi.org/10.1016/j.foodchem.2017.09.058
Reference: FOCH 21727

To appear in: Food Chemistry

Received Date: 14 February 2017


Revised Date: 11 August 2017
Accepted Date: 12 September 2017

Please cite this article as: Sampaio, P.S., Soares, A., Castanho, A., Almeida, A.S., Oliveira, J., Brites, C.,
Optimization of rice amylose determination by NIR-spectroscopy using PLS chemometrics algorithms, Food
Chemistry (2017), doi: http://dx.doi.org/10.1016/j.foodchem.2017.09.058

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers
we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and
review of the resulting proof before it is published in its final form. Please note that during the production process
errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
TITLE: Optimization of rice amylose determination by NIR-spectroscopy using PLS chemometrics
algorithms

Authors: Pedro Sousa Sampaio1,2*, Andreia Soares1, Ana Castanho1, Ana Sofia Almeida1, Jorge Oliveira3, Carla Brites1

Affiliation:

1
Instituto Nacional de Investigação Agrária e Veterinária (INIAV)

Av. da República,

Quinta do Marquês

2780-157 Oeiras

Portugal

2
Faculty of Engineering, Lusophone University of Humanities and Technology
Campo Grande, 376
1749-019 Lisbon
Portugal

3
University College Cork, School of Engineering

Ireland

E-mail: pnsampaio@gmail.pt

Tel: (+351) 214403684

Fax: (+351) 214416011

Abstract

Determining amylose content in rice by near-infrared (NIR) spectroscopy, combined with a suitable multivariate
regression method, is both feasible and relevant for the rice business, since it enables Process Analytical Technology
applications for this critical factor, but it has not been fully exploited. Because amylose determination by
conventional methods is time-consuming and prone to experimental errors, it is urgent to develop a low-cost,
nondestructive and 'on-line' method able to provide high accuracy and reproducibility. Different rice varieties and
specific chemometrics tools, such as partial least squares (PLS), interval-PLS, synergy interval-PLS and moving
windows-PLS, were applied to develop an optimal regression model for rice amylose determination. The model
performance was evaluated by the root mean square error of prediction (RMSEP) and the correlation coefficient (R).
The high performance of the siPLS method (R=0.94; RMSEP=1.938; 8941–8194 cm-1; 5592–5045 cm-1; and 4683–4335 cm-1)
shows the feasibility of NIR technology for determining amylose with high accuracy.

Keywords: Multivariate models; Process Analytical Technologies; PLS; iPLS; siPLS; mwPLS.

1. Introduction

Rice (Oryza sativa L.), the world's main food crop, consists mainly of starch. Starch is a complex polysaccharide
composed exclusively of α-D-glucose units. These units are joined by a sequence of α-D-(1,4)-glucosidic linkages,
giving rise to linear or helical chains referred to as amylose. Although α-(1,6)-glucosidic linkages are much less
frequent, they form branch points between the chains, thereby creating highly branched domains termed amylopectin
(Pandey, Rani, Madhav, Sundaram, Varaprasad, Bohra, & Kumar, 2012). Starch biosynthesis in higher plants, including
rice, is catalysed by four classes of enzymes, namely ADP-Glc pyrophosphorylase (AGPase), starch synthase, starch
branching enzymes and starch debranching enzymes. The enzyme granule-bound starch synthase-I controls the synthesis of
amylose in the rice endosperm, while soluble starch synthase, starch branching enzyme and starch debranching enzymes
together control the synthesis of amylopectin (Bao, Sun, & Corke, 2002; Zhang, Cheng, Zhang, Guo, Su, & Jiang, 2011).
Amylose is considered to be the most important determinant of the eating quality of rice, and based on its content
rice varieties can be classified as waxy (0-2%), very low (3-12%), low (13-20%), intermediate (21-25%) and high (>26%)
(Juliano, Perez, Blakeney, Castillo, Kongseree, & Laignelet, 1981). The fine structure of amylose, in terms of both
molecular size and chain-length distribution, is also a significant factor in the hardness of cooked rice (Li,
Prakash, Nicholson, Fitzgerald & Gilbert, 2016). Amylose content is correlated with retrogradation behaviour,
influencing the textural properties of cooked rice and the dynamic viscoelasticity of rice starch gel (Lu, Sasaki, Li,
Yoshihashi, Li, & Kohyama, 2009).

The classical method for amylose and amylopectin determination relies on the coloured complex formed by reaction
with iodine, coupled with potentiometric or amperometric titration. The method is based on the inherent capacity of
amylose to accommodate polyiodide ions, chiefly I5-, within its helical structure. As amylopectin is unable to form
such complexes, because its short chains and branch linkages interfere with the formation of stable structures, these
complexes are specific for the amylose fraction (Hizukuri, 1996). However, the iodine affinity varies within species,
which compromises the accuracy of this method. A survey conducted by the International Network for Quality Rice (INQR)
showed that five different versions of the iodine binding method are currently in use and that the reproducibility
was high within laboratories but low between laboratories (Fitzgerald, Bergman, Resurreccion, Moller, Jimenez, &
Reinke, 2009). There are also other methods, such as differential scanning calorimetry (Sievert, & Holm, 1993),
potentiometry (Banks, Greenwood, & Muir, 1971), spectrophotometry (Morrison, & Laignelet, 1983), and chromatography
(Matheson & Welsh, 1988; Yun, Li, & Wood, 2013). Amylose can also be evaluated by the enzymatic method developed by
Megazyme (Gibson, Solah, & McCleary, 1997). However, this method has some drawbacks, such as the relatively high cost
per sample and, mainly, the difficulty of testing a large number of samples (Hu, Burton, & Yang, 2010; Soong, Quek, &
Henry, 2015). Despite the existence of other procedures, the colorimetric method is still commonly used, and its
accuracy has been improved by using standards from specific rice varieties carrying the alleles of the Waxy gene
responsible for amylose synthesis, together with calibration values obtained by separating amylose according to
hydrodynamic volume and molecular weight by size exclusion chromatography (ISO 6647-1,2, 2015).

Near-infrared (NIR) spectroscopy is a promising and widely accepted technique that is fast, easy to use and
nondestructive, requiring minimal or no sample preparation (Bart, Himmelsbach, McClung, & Champagne, 2007). It has
become particularly popular in recent years in the pharmaceutical industry to assist the development of online Process
Analytical Technologies (PAT) to achieve Quality by Design in manufacturing. However, its prediction accuracy depends
on sample physical status, chemical components, temperature, colour, cleanliness, quantity used for measurement and,
above all, the statistical model used (Bagchi, Sharma, & Chattopadhyay, 2016). Apparent amylose content has been
predicted by NIR spectroscopy using milled rice flour (Bao, Cai, & Corke, 2001; Delwiche, Bean, Miller, Webb, &
Williams, 1995), milled whole grain (Delwiche et al., 1995; Windham, Lyon, Champagne, Barton, Webb, & McClung, 1997;
Shu, Wu, Xia, Gao, & McClung, 1999), or amylose and proteins in rice flour (Xie, Tang, Chen, Luo, Jiao, Shao, Wei, &
Hu, 2014). However, those studies faced several drawbacks concerning the rice amylose reference data and the model
performance. The main difficulty of NIR spectroscopy with multivariate analysis is related to wavenumber or spectral
region selection, especially when the spectra display unresolved peaks or important features cannot be identified.
Several methods have been studied to select the optimal variables for multivariate calibration, in order to remove
irrelevant spectral variables and improve model performance. Multivariate calibration builds a predictive model
relating measured quantities (wavenumbers) to properties of interest (concentration data). A variety of linear
regression methods based on latent variables (LVs), such as partial least squares (PLS), have been developed to address
this problem; however, because of drawbacks such as noise in the spectral data, the calibration and prediction errors
can be high and the model can be affected (Wold, & Sjostrom, 2001). Meanwhile, spectral region selection using
appropriate algorithms was reported to considerably improve the performance of full-spectrum calibration techniques,
avoiding non-modeled interferences and building a well-fitted model (Friedel, Patz, & Dietrich, 2013; Lee, Bawn, &
Yoon, 2012; Nørgaard, Saudland, Wagner, Nielsen, Munck, & Engelsen, 2000). Subsequent studies showed that it is
fundamental to select the spectral regions responsible for the property of interest in order to increase the
prediction performance (Kalivas, 1997; Spiegelman, McShane, Goetz, Motamedi, Yue, & Coté, 1998). These methods can be
classified into two classes: single wavelength selection and wavelength interval selection. Several approaches have
been proposed for selecting the optimal set of spectral regions, such as interval PLS (iPLS), synergy interval PLS
(siPLS) and moving window PLS (mwPLS) (Friedel et al., 2013; Leardi, & Noorgard, 2004; Ma, Wang, Chen, Cheng, & Lai,
2017). The principle of iPLS consists of splitting the spectra into equal-width intervals and developing a sub-PLS
model for each one. The sub-intervals with the lowest root mean squared error of prediction (RMSEP) are deemed to be
the best. Many methods based on iPLS were developed to optimise the combination of the selected intervals, such as
siPLS (Leardi & Noorgard, 2004). The main advantage of this kind of method is that it uses a graphical display to focus
on the choice of better sub-intervals and to compare the prediction performance of the local models with that of the
full-spectrum model. Testing only a series of adjacent but non-overlapping intervals can miss some more informative
ones, and mwPLS was proposed to overcome this drawback. It builds a series of PLS models in a window that moves through
the whole spectrum and then chooses the informative intervals with low model complexity and a low sum of residuals.
mwPLS can be thought of as a series of diagnostic PLS regressions based on all contiguous windows of size "H" in the
parent data set. In effect, a window of size H is "moved" across the data set to collect modelling information. The
model quality and number of LVs required for model production during this process can then be used to find the best
spectral region(s) of size H. mwPLS is a promising procedure for consecutive wavelength selection when building an
optimal calibration model, and it has proven effective for waveband selection in the analysis of many objects (Chen,
Yin, Tang, & Pan, 2017; Yun et al., 2013).

The objective of this study was to test the various methods proposed for developing multivariate models and to
select the most appropriate one for obtaining reliable and accurate measurements of amylose in rice. A large set of
rice varieties was used to challenge the various models. PLS, iPLS, siPLS and mwPLS procedures for NIR quantitative
analysis of amylose were investigated and compared. The different steps required for model calibration were analysed.
The number of PLS factors and the number of region intervals were optimised according to the root mean square error in
the calibration set. The performance of the final model was evaluated according to the RMSEP and the correlation
coefficient (R) with the prediction set. The model thus created can be considered a way to obtain a fast,
non-destructive, accurate and reproducible methodology for amylose determination in different rice varieties (after a
suitable milling procedure), providing a modern gold standard for laboratory and industrial analysis that is amenable
to the development of PATs for the rice industry.

2. Materials and methods

2.1. Rice sample

For this study, sixteen rice varieties (including Indica and Japonica subspecies) from a Portuguese Rice Breeding
Program were grown at three sites in the basins of three different rivers with very different micro-climates (Alcácer
do Sal, Salvaterra de Magos and Montemor-o-Velho, Portugal) over four seasons (2012-2015), providing 168 samples. In
addition, 11 standard rice varieties with different amylose contents, sourced from the International Rice Research
Institute (IRRI), Los Baños, Philippines, were used: IR 65; IR 24; IR 64; WU BAI LI; IRRI109; IRRI134; IRRI138; IRRI148;
IRRI149 and IRRI151.

2.2. Rice flour sample preparation

About 20 g of rice was ground to flour in a Cyclone Sample Mill (Falling number 3100, Perten, Sweden) equipped with

a 0.8 mm screen.

2.3. Amylose determination

Amylose of rice was determined using the standard iodine colorimetric method according to ISO 6647-2 (2015). The

absorbance was measured using a spectrophotometer (Hitachi; Japan) at 720 nm. Amylose content was quantified using

a standard curve created from absorbance values of 4 calibrated samples from standard rice varieties carrying one of the

five alleles of the Waxy gene, which is the gene responsible for amylose synthesis (IR8, IR24, IR64, IR65) obtained

from IRRI. Pure amylose (potato origin) (Sigma-Aldrich, Germany) was also evaluated. The amylose content was

evaluated in duplicate for each sample of rice, and the reference value corresponds to the average.

2.4. Instrumentation and Measurements

The samples containing approximately 25 cm3 of rice flour were loaded in a circular sample cup and pressed slightly to

obtain a similar packing density. Sample spectra were collected using an NIR transflection MPA equipment (Bruker

Optics, Germany). For each rice sample, 16 successive scans were performed, over a wavenumber range (12,000 – 4000

cm-1), at 16 cm-1 of resolution. For each rice sample, two spectra were obtained.

2.5. Principal component analysis (PCA)

Principal component analysis is a linear pattern recognition technique that allows the reduction of the dimensionality of

multivariate data to n principal components. All samples were considered for analysis to enable inferring how sample

variability may affect possible trends from the direct observation of the scores plot. The outliers were identified using

PCA analysis. PCA was performed using MATLAB® 7.9.0 software.
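
The PCA screening described above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the MATLAB routine used in this work. The function names, the synthetic data and the cutoff of three standard deviations on the score distance are assumptions made for the example; in the study itself the outliers were judged from the scores plot.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Return PCA scores and explained-variance ratios via SVD of mean-centred X."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]

def flag_outliers(scores, n_std=3.0):
    """Flag samples whose standardised score distance exceeds n_std (assumed cutoff)."""
    z = (scores - scores.mean(axis=0)) / scores.std(axis=0)
    dist = np.sqrt((z ** 2).sum(axis=1))
    return dist > n_std

# Synthetic stand-in for a spectra matrix (rows = spectra, columns = wavenumbers).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 40))
X[0] += 25.0  # one artificial, strongly deviating spectrum
scores, explained = pca_scores(X)
outliers = flag_outliers(scores)
```

A numerical cutoff of this kind only approximates a visual inspection of the scores plot, so any threshold would need to be checked against the plot itself.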

2.6. Data and Multivariate analysis

The raw NIR spectra obtained after outlier elimination were treated by different data preprocessing techniques, such
as standard normal variate (SNV) transformation, multiplicative scatter correction (MSC) and smoothing derivatives, to
obtain reliable qualitative classification and quantitative calibration models. After SNV and MSC, the spectra were
treated using first and second derivatives. The Savitzky-Golay smoothing method was used to eliminate noise such as
baseline drift, tilt, reverse, and so forth (Savitzky & Golay, 1964; Xie, Xiang, Yu, & Deng, 2009).
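
As an illustration of the two scatter corrections named above, a minimal NumPy sketch is given here. The function names and the synthetic spectra are assumptions for the example, and the mean spectrum is assumed as the MSC reference, which is a common choice but is not stated in this work.

```python
import numpy as np

def snv(X):
    """Standard normal variate: centre and scale each spectrum (row) individually."""
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, ddof=1, keepdims=True)
    return (X - mean) / std

def msc(X, reference=None):
    """Multiplicative scatter correction against a reference (default: mean spectrum)."""
    ref = X.mean(axis=0) if reference is None else reference
    corrected = np.empty_like(X, dtype=float)
    for i, spectrum in enumerate(X):
        # Fit spectrum = intercept + slope * ref, then remove both effects.
        slope, intercept = np.polyfit(ref, spectrum, 1)
        corrected[i] = (spectrum - intercept) / slope
    return corrected

# Synthetic check: one band shape recorded with different offsets and scalings.
base = np.sin(np.linspace(0.0, 3.0, 50)) + 2.0
X = np.vstack([a + b * base for a, b in [(0.1, 1.0), (0.5, 1.3), (-0.2, 0.8)]])
X_snv = snv(X)
X_msc = msc(X)
```

In this synthetic case the three spectra differ only by an additive offset and a multiplicative factor, so both corrections map them onto a single curve.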

2.7. Partial least squares (PLS) regression

The PLS regression was performed after outlier identification. The matrix X, containing the data provided by the NIR
spectra, and the vector Y, containing the amylose content, were employed to build the regression model. The performance
of the final PLS model was evaluated according to the RMSEP and the correlation coefficient (R). RMSEP is defined as:

$$\mathrm{RMSEP} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n}} \qquad (1)$$

and the correlation coefficient is:

$$R = \sqrt{1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \qquad (2)$$

where n is the number of samples in the validation test set, yi is the experimentally measured reference result for
sample i and ŷi is the model estimate for the corresponding test sample i (Eq. 1). The correlation coefficient (R)
between the predicted and the measured values was calculated for both the calibration and the validation test sets
with Eq. 2, where ȳ is the mean of the reference measurement results for all samples in the calibration and test set.
The best combination of spectral regions and preprocessing techniques was selected by picking the PLS model with a
small RMSEP, a high R and a low number of latent variables (LV) covering enough data variance.
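
The two figures of merit can be computed as in the following sketch. It assumes that Eq. 2 takes the usual form R = sqrt(1 - SS_res/SS_tot) and, for simplicity, uses the mean of the reference values passed to the function as ȳ. The function names and the three-sample example are illustrative only and are not data from this study.

```python
import numpy as np

def rmsep(y_ref, y_pred):
    """Root mean square error of prediction (Eq. 1)."""
    y_ref, y_pred = np.asarray(y_ref, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_ref - y_pred) ** 2)))

def corr_coef(y_ref, y_pred):
    """R = sqrt(1 - SS_res / SS_tot), the form assumed here for Eq. 2."""
    y_ref, y_pred = np.asarray(y_ref, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_ref - y_pred) ** 2)
    ss_tot = np.sum((y_ref - y_ref.mean()) ** 2)
    # Clip at zero so a model worse than the mean does not produce NaN.
    return float(np.sqrt(max(0.0, 1.0 - ss_res / ss_tot)))

# Made-up values: residuals are -2, 1, 1, so SS_res = 6 and SS_tot = 200.
y_ref = [10.0, 20.0, 30.0]
y_pred = [12.0, 19.0, 29.0]
err = rmsep(y_ref, y_pred)      # sqrt(6 / 3) = sqrt(2)
r = corr_coef(y_ref, y_pred)    # sqrt(1 - 6 / 200) = sqrt(0.97)
```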

2.8. Wavenumber selection

The iPLS and siPLS were applied to remove irrelevant spectral variables and to improve PLS model performance. The

iPLS models were built on the spectral division into 10, 20, 25 and 50 intervals with a similar width. The iPLS routine

generates graphical information indicating the optimum number of LV used in each interval model and RMSEP values.

In this case, the subinterval that presented the lowest RMSEP values was selected. The siPLS models were constructed

with the spectral set divided into 10, 20, 25 and 50 intervals and combinations from 2 to 3 intervals. The combined

subintervals that presented the lowest RMSEP values were selected. The mwPLS model is a modelling technique that

can be thought of as a series of diagnostic PLS regressions based on all continuous window size ‘H’ in the parent

dataset. In effect, a window of size H is “moved” across the data set to collect modelling information. The model

quality and number of LVs required for model production during this process can then be used to find the best spectral

region(s) of size H.
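
The interval and moving-window searches described in this section can be sketched as below. This is a simplified NumPy illustration and not the iToolbox implementation: the PLS1 routine is a plain NIPALS version, the number of latent variables is fixed rather than optimised per interval, and all function names and the synthetic data are assumptions for the example.

```python
import numpy as np

def pls1_fit(X, y, n_lv):
    """Fit PLS1 by NIPALS; return the regression vector and centring terms."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = E.T @ f
        w /= np.linalg.norm(w)          # weight vector
        t = E @ w                       # scores
        tt = t @ t
        p = E.T @ t / tt                # X loadings
        c = f @ t / tt                  # y loading
        E = E - np.outer(t, p)          # deflate X
        f = f - c * t                   # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)
    return b, x_mean, y_mean

def pls1_predict(model, X):
    b, x_mean, y_mean = model
    return (X - x_mean) @ b + y_mean

def equal_intervals(n_vars, n_intervals):
    """Contiguous, near-equal, non-overlapping blocks of variable indices (iPLS)."""
    return np.array_split(np.arange(n_vars), n_intervals)

def moving_windows(n_vars, width):
    """All contiguous windows of size `width`, shifted one variable at a time (mwPLS)."""
    return [np.arange(s, s + width) for s in range(n_vars - width + 1)]

def scan(X_cal, y_cal, X_val, y_val, regions, n_lv=3):
    """Fit one local PLS model per region; return (lowest RMSEP, region index)."""
    results = []
    for k, idx in enumerate(regions):
        model = pls1_fit(X_cal[:, idx], y_cal, min(n_lv, len(idx)))
        resid = y_val - pls1_predict(model, X_val[:, idx])
        results.append((float(np.sqrt(np.mean(resid ** 2))), k))
    return min(results)

# Synthetic data: only variables 40-49 carry information about y.
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 100))
y = X[:, 40:50].sum(axis=1) + 0.1 * rng.normal(size=120)
X_cal, y_cal, X_val, y_val = X[:80], y[:80], X[80:], y[80:]

best_rmsep, best_k = scan(X_cal, y_cal, X_val, y_val, equal_intervals(100, 10))
mw_rmsep, mw_start = scan(X_cal, y_cal, X_val, y_val, moving_windows(100, 10))
```

The same `scan` function serves both methods because iPLS and mwPLS differ only in how the candidate regions are generated: non-overlapping blocks in the first case and overlapping windows of size H in the second.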

2.9. Software and algorithms

PLS, iPLS, siPLS and mwPLS models were performed using MATLAB software (The Mathworks, Natick, MA, USA).

The iToolbox for MATLAB available from (http://www.models.life.ku.dk/itoolbox) was used for calculation of interval

selection by iPLS, siPLS and mwPLS. The statistical analysis (ANOVA) of the calibration and prediction set

characterization was performed using Excel software (MS-Office2010).

3. Results and discussion

3.1. PCA Analysis

A Principal Component Analysis (PCA) was performed after pre-processing for a preliminary examination of the NIR
spectra, to provide an overview of the data, reveal the similarities and differences among all the samples and
consequently identify outliers. PCA is a popular variable reduction technique that replaces the measured variables by
Principal Components (PCs), which are mutually orthogonal linear combinations of them determined sequentially in such
a way that each PC explains the highest possible percentage of the total variance of the data still unexplained. It is
one of the most frequently used chemometric tools, allowing data to be projected from a higher to a lower dimensional
space. A data matrix composed of 354 raw spectra from rice samples, represented by 1154 variables (i.e. wavenumbers),
was submitted to PCA, allowing the selection and elimination of outlier spectra that could interfere negatively with
the model construction. The samples that plotted away from the main cluster in the PC graphs were eliminated, their
position being considered evidence of very significant differences from the other samples. PCA also revealed
differences within the whole sample set. The main cluster was formed by two small groups, each characterised by samples
harvested in different years. Thus, the use of a supervised classification method, with initial knowledge about the
classes to be modelled, is required. After that, the 313 raw NIR spectra retained were treated by different
preprocessing tools, such as smoothing derivatives, standard normal variate (SNV) transformation, multiplicative
scatter correction (MSC) and the Savitzky-Golay filter, to obtain reliable qualitative classification and quantitative
calibration models.

3.2. Full spectrum PLS model

To avoid bias in the sub-set division, all samples were placed in ascending order of amylose content, and the
calibration set was selected to cover the full range of concentrations. After outlier elimination, the 313 spectra of
all samples analysed were divided into two subsets: the calibration set (203), used to build the model, and the
validation set (110), used to test the robustness of the model. Both subsets, randomly constituted, covered similar
amylose content ranges (calibration: 0-33.75%; test set: 2.72-33.65%) and means (calibration: 19.70%; test set:
20.27%). The variability of the samples due to rice varieties could pose quite a challenge for the development of a
universal and robust regression model.
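
One way to obtain calibration and validation subsets that span the same concentration range is sketched below. The exact selection rule used in this work is not stated, so the rule shown here (sort by reference value, then send every third sample to validation while keeping both extremes in calibration) is an assumption, as are the function name and the synthetic amylose values.

```python
import numpy as np

def ordered_split(y, every=3):
    """Sort by reference value; send every `every`-th sorted sample to validation."""
    order = np.argsort(y)
    val_mask = np.zeros(len(y), dtype=bool)
    val_mask[1::every] = True   # positions in sorted order; the lowest stays in calibration
    val_mask[-1] = False        # keep the highest value in calibration as well
    return order[~val_mask], order[val_mask]

rng = np.random.default_rng(2)
y = rng.uniform(0.0, 34.0, size=313)   # synthetic amylose contents (%), not the study data
cal_idx, val_idx = ordered_split(y)
```

With this rule the validation range is always contained in the calibration range, which mirrors the ranges reported above, although the resulting subset sizes differ from the 203/110 split used in the study.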

The raw NIR spectra (12,000-4000 cm-1) of rice flour samples are plotted in Figure 1-A. A group of atoms in a
molecule may have multiple modes of oscillation caused by stretching and bending motions of the amylose group. The
strongest absorption band, observed at 5184 cm-1, is related to the combination of stretching and bending of the O-H
group of amylose, while the peak at 6835 cm-1 is related to the combination of the first overtone of O-H
anti-symmetric stretching and O-H symmetric stretching of the amylose molecule. The weak absorption band at 8316 cm-1
may be due to the second overtone of the symmetric stretching of the C-H bonds of methyl (-CH3) groups. The O-H and C-H
bond vibrations are caused by compounds such as amylose, proteins and water (Pandiselvam, Thirupathi, & Vennila, 2016).
The same authors also obtained five absorption peaks between 10,792 and 6872 cm-1, mainly due to the C-H second
overtone and combination bands that correspond to amylose. In the studies performed by Bagchi et al. (2016), two
absorption peaks between 6872 and 5058 cm-1 were obtained, related to the C=O stretch, the O-H and N-H stretch and also
the C-H stretch first overtone associated with the protein present in rice (Burns & Curczak, 1992).

The spectrum of pure amylose makes it possible to evaluate the similarities between the bands of the rice samples
and those of amylose (Fig. 1-B). The NIR spectrum of amylose also presents major peaks at 4633 cm-1, 4996 cm-1, 5184
cm-1, 6834 cm-1 and 8316 cm-1. The amylose prediction model was first developed using full-spectrum PLS models with and
without data preprocessing (Table 1). The PLS model built on the raw spectra, without any preprocessing, is
characterised by a low R (0.70) and a high RMSEP (3.909), due to the significant noise in the spectra. As can be seen,
the spectral profiles present some trends and noise and, therefore, suitable spectral preprocessing is necessary to
highlight the differences between rice varieties according to their amylose contents, which cannot be distinguished by
the naked eye when these differences are small.

Furthermore, to make full use of the informative data and to eliminate noise present in the spectra, data
pretreatment is often needed before establishing the calibration model. Particle size, for example, determines the
spectral path length, which can have a substantial effect on the resultant spectrum and consequently on the model
(Mark, 2001). To minimise the influence of these parameters, the raw spectra are usually subjected to preprocessing
before developing calibration models. Pre-treatments recommended to obtain reliable, accurate and stable models were
applied, namely smoothing derivatives (R=0.76, RMSEP=3.571 for the 1st derivative and R=0.73, RMSEP=3.761 for the 2nd
derivative), SNV transformation (R=0.69, RMSEP=4.018), MSC (R=0.71, RMSEP=3.863), the Savitzky-Golay filter (SG
(69.4.4): R=0.87, RMSEP=2.678) and SNV+SG (69.4.4) (R=0.90, RMSEP=2.435), to remove noise and highlight the differences
that exist between samples (Fig. 1-C) (Table 1).

The SG filter method was also applied for model optimisation. The SG filter offers many different smoothing
modes. Its smoothing parameters, namely the polynomial degree (PD), the derivative order of the polynomial (DOP) and
the number of smoothing points (NSP), have a strong influence on the result. A too-small NSP is prone to cause
calculation error, resulting in decreased model precision, while a large NSP would over-smooth the spectral data,
leading to decreased accuracy. A reasonable choice of NSP is therefore essential for SG smoothing. The NSP can be
selected according to the PLS model prediction result, in combination with the selection of PLS latent variables. For
that reason, an optimisation study of the SG filter was previously carried out to determine the PD, DOP and NSP that
provided the best results. Based on this preliminary study, the optimum parameters (PD=69; DOP=4; and NSP=4) were
obtained. The PLS model built using these parameters was characterised by R=0.87 and RMSEP=2.678. These results showed
that it was possible to extract significant information contained in the spectral data, thereby improving the PLS
model.
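
To make the role of the three SG parameters concrete, the filter weights can be derived from a local least-squares polynomial fit, as in the sketch below. The function names are hypothetical, and the window length and polynomial order used in the example (5 points, degree 2) are chosen only because their weights are well known; they are not the settings reported above, and no mapping of the "(69.4.4)" triple onto these arguments is implied.

```python
import math
import numpy as np

def savgol_coeffs(window, polyorder, deriv=0):
    """Savitzky-Golay convolution weights for unit-spaced points."""
    if window % 2 == 0 or polyorder >= window or deriv > polyorder:
        raise ValueError("need odd window, polyorder < window, deriv <= polyorder")
    half = window // 2
    z = np.arange(-half, half + 1)
    A = np.vander(z, polyorder + 1, increasing=True)   # columns z^0, z^1, ...
    # Row `deriv` of the pseudo-inverse yields the fitted coefficient a_deriv;
    # the deriv-th derivative at the window centre equals deriv! * a_deriv.
    return math.factorial(deriv) * np.linalg.pinv(A)[deriv]

def savgol_apply(y, window, polyorder, deriv=0):
    """Filter y, dropping the (window - 1) / 2 edge points on each side."""
    c = savgol_coeffs(window, polyorder, deriv)
    # np.convolve flips its kernel, so reverse the weights to obtain a correlation.
    return np.convolve(y, c[::-1], mode="valid")

# A quadratic is reproduced exactly by a degree-2 filter, which gives a simple check.
x = np.arange(20, dtype=float)
y = 2.0 + 3.0 * x + 0.5 * x ** 2
smoothed = savgol_apply(y, window=5, polyorder=2)
first_deriv = savgol_apply(y, window=5, polyorder=2, deriv=1)
```

Widening the window averages over more points and therefore smooths more strongly, which is the trade-off between noise removal and over-smoothing discussed above.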

Given these results, the combined application of spectral pretreatment methods (MSC or SNV plus SG smoothing)
with PLS models was found to be more accurate: MSC+SG (69.4.4), R=0.88 and RMSEP=2.650; and SNV+SG (69.4.4), R=0.90
and RMSEP=2.435 (Table 1, Fig. 1-D).

Near-infrared (NIR) spectroscopy is characterised by excessive background noise and weak analytical signals due to
near-infrared overtones and combinations. The NIR spectrum of solid samples is often accompanied by scattering noise
due to non-uniform particle size, as in rice grain that was previously ground. To make full use of the informative data
and to eliminate noise, data pretreatment is often necessary before establishing the calibration model. Savitzky-Golay
(SG) smoothing is a widely used pre-treatment method that can effectively remove noise such as baseline drift, tilt,
reverse, and so forth (Gorry, 1990; Xie, Xiang, Yu, & Deng, 2009; Delwiche & Reeves 2010; Chen, Pan, Chen, & Lu, 2011).
To overcome scattering interference, multiplicative scatter correction (MSC) is also applied to the spectral data,
since it can separate the informative absorbance of the analyte from the scattering signal (Barnes et al. 1989; Silva,
Ferreira, Braga, & Sena, 2012). This practical procedure eliminates the spectral differences within the same batch of
samples due to non-uniform particle size. SG smoothing and MSC are thus both spectral pretreatment methods with much
potential. Indeed, the model performance can differ considerably when SG smoothing and MSC are used separately or in
combination. Moreover, the most suitable smoothing mode should be selected during pretreatment optimisation. This
requires a large number of computer runs, establishing different NIR spectroscopy analysis models corresponding to
different pretreatment parameters, so that a reasonable model can be determined by comparing their prediction
performance. This is an important way to improve the predictive ability of NIR spectroscopy analysis, especially for
samples of complex systems (Chen, Song, Tang, Feng, & Lin, 2013).

Based on these results, the PLS models differed when SG smoothing and the MSC/SNV methods were used separately or
combined. SNV normalises spectra when the effective path length varies among samples. Such path length variation can
occur when measuring the spectra of powdery samples, as in this study, because of variation in particle size, as well
as colour, between samples. MSC can be considered a suitable method when working with samples made up of solid
particles of different sizes and structures. In flour obtained from rice, the particle size distribution varies
according to grain hardness, so the samples lack uniformity and their NIR diffuse reflectance spectrum is accompanied
by scattering noise. MSC eliminates the spectral differences within the same batch of samples caused by non-uniform
particle size (Fig. 1-C). Spectral data preprocessing removes the irrelevant information (noise) that cannot be handled
properly by regression techniques, and MSC is the most popular normalization technique used to preprocess NIR spectral
data (Næs, Isaksson, Fearn, & Davies, 2002) to compensate for additive (baseline shift) and multiplicative (tilt)
effects (Martens, & Stark, 1991). The PLS models obtained after each pretreatment were improved compared with the PLS
model of the full spectrum without preprocessing. Nevertheless, based on the correlation coefficient and RMSEP of all
full-spectrum PLS models, it was not possible to establish a suitable and robust quantitative relationship between the
spectral data and the amylose content of the rice. These poor models may be due to regions of the spectra that contain
non-modeled information (noise) and should, therefore, be excluded from the model. For that reason, it is important to
develop a calibration model that focuses on spectral region selection.

3.3. iPLS model

Spectral interval selection was first accomplished by the interval PLS (iPLS) algorithm created by Norgaard et al.
(2000). The principle of this algorithm is to split the full spectrum into smaller equidistant regions and to develop a
PLS regression model for each sub-interval. The R and RMSEP of every sub-interval are then determined, and the region
with the lowest RMSEP is chosen to build the calibration model. The prediction accuracy of the established iPLS model
was evaluated by external test validation. The full spectrum was split into 10, 20, 25 and 50 intervals. The optimal
iPLS models obtained for 20 intervals were MSC + 2nd derivative (R=0.84 and RMSEP=2.885) and SNV + 2nd derivative
(R=0.84 and RMSEP=3.012); for 25 intervals, the optimal models were obtained with the Savitzky-Golay filter (69.4.4)
(R=0.92 and RMSEP=2.133), MSC + SG (69.4.4) (R=0.89 and RMSEP=2.475) and SNV + SG (69.4.4) (R=0.91 and RMSEP=2.330)
(Table 2).

The scatter plot shows a good correlation between the reference measurements and the NIR-predicted values in the

calibration set for the iPLS model (Fig. 2C). The best iPLS model was achieved after SG filter preprocessing and was

characterised by 7 PLS components, R=0.92 and RMSEP=2.133, using the interval, out of 25, that corresponds to

wavenumbers in the range 4651–4304 cm-1 (Fig. 2A-B, Table 2). The models built after MSC+SG and SNV+SG

preprocessing for the same spectral region (4651-4304 cm-1) were also suitable, with a high R (0.90) and low

RMSEP values (2.475 and 2.330, respectively). NIR spectroscopy records spectral bands that mainly correspond to

overtones and combinations of C-H, O-H and N-H vibrations, and an NIR method was constructed to identify the

origin and biochemical characteristics of rice varieties. The selected spectral region is characterised by

combination bands of the methyl group (-CH3) (C-H stretching and C-H bending) and a CH2 combination band that are

specific to the amylose molecule. Compared with the full-spectrum PLS models, the iPLS models have lower RMSEP

values because selecting a specific spectral interval automatically eliminates weak spectral information; in the

full-spectrum PLS, according to other studies, the information is spread over the whole spectral range

(Pataca, Borges Neto, Marcucci, & Poppi, 2007). Excluding uninformative and/or interfering variables avoids the

inclusion of spectral regions that contain residual information or noise that affect the final regression model.

Based on these results, dividing the full spectrum into small intervals allowed the most suitable region to be

selected for an optimised regression model, characterised by a low prediction error and consequently high accuracy.
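As a rough illustration of the interval search described above, the following Python sketch splits a synthetic spectral matrix into equal intervals, fits a PLS model on each interval and keeps the one with the lowest RMSEP on a held-out set. It is not the implementation used in this work: the NIPALS-based `pls1_fit`, the helper names, the synthetic data and the choice of two latent variables are all assumptions made for the example.

```python
import numpy as np

def pls1_fit(X, y, n_comp):
    """Fit a single-response PLS model by NIPALS (illustrative helper)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = Xr.T @ yr
        w = w / np.linalg.norm(w)      # weight vector
        t = Xr @ w                     # scores
        tt = t @ t
        p = Xr.T @ t / tt              # X loadings
        c = (yr @ t) / tt              # y loading
        Xr = Xr - np.outer(t, p)       # deflate X
        yr = yr - c * t                # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # regression vector
    return x_mean, y_mean, coef

def rmsep(model, X, y):
    """Root mean square error of prediction on an external set."""
    x_mean, y_mean, coef = model
    pred = (X - x_mean) @ coef + y_mean
    return float(np.sqrt(np.mean((y - pred) ** 2)))

def ipls(X_cal, y_cal, X_test, y_test, n_intervals, n_comp):
    """Fit one PLS model per interval and return the interval with lowest RMSEP."""
    intervals = np.array_split(np.arange(X_cal.shape[1]), n_intervals)
    errors = [
        rmsep(pls1_fit(X_cal[:, idx], y_cal, n_comp), X_test[:, idx], y_test)
        for idx in intervals
    ]
    best = int(np.argmin(errors))
    return best, intervals[best], errors

# Synthetic example (assumption): only variables 40-49 carry information on y.
rng = np.random.default_rng(0)
y = rng.normal(20.0, 5.0, size=80)
X = rng.normal(0.0, 1.0, size=(80, 100))
X[:, 40:50] += y[:, None]

best, idx, errors = ipls(X[:60], y[:60], X[60:], y[60:], n_intervals=10, n_comp=2)
print("best interval:", best, "variables", idx[0], "to", idx[-1])
print("RMSEP of best interval:", round(errors[best], 3))
```

In this toy case the search recovers the interval that was made informative by construction; with real spectra the number of latent variables would normally be tuned per interval rather than fixed.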

3.4. siPLS

In the full spectrum, many uninformative variables could negatively affect the calibration. Accordingly, a judicious

selection of spectral regions would improve the predictive ability of the PLS model. The synergy interval PLS (siPLS)

algorithm used in this work was also developed by Nørgaard et al. (2000). The basic principle of this algorithm is

similar to that of iPLS. Initially, the spectra are split into a specific number of intervals (variable-wise); PLS

regression models are then developed for all possible combinations of two and three intervals. RMSEP is

evaluated for every combination of intervals, and the combination with the lowest RMSEP value is selected.
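Because siPLS fits one model per combination of intervals, the number of candidate models grows quickly with the number of intervals. The Python sketch below simply enumerates the two- and three-interval combinations for the interval counts used in this study; the function name `sipls_candidates` and the 1-based interval numbering are assumptions made for illustration.

```python
from itertools import combinations
from math import comb

def sipls_candidates(n_intervals, sizes=(2, 3)):
    """Yield every combination of two or three interval numbers (1-based)."""
    for k in sizes:
        yield from combinations(range(1, n_intervals + 1), k)

# Number of PLS models siPLS has to evaluate for each interval count.
counts = {n: sum(1 for _ in sipls_candidates(n)) for n in (10, 20, 25, 50)}

for n, n_models in counts.items():
    # Cross-check the enumeration against the closed-form binomial count.
    assert n_models == comb(n, 2) + comb(n, 3)
    print(f"{n} intervals: {n_models} candidate models")
```

For 25 intervals this gives 2600 candidate models and for 50 intervals 20825, which is why the search is restricted to small combinations.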

Therefore, spectral interval selection by siPLS was implemented to verify whether combining more than one

interval would yield models with better predictive capacity. Table 3 shows the best results of the siPLS model

calibration when the full spectrum was split into different numbers of intervals. All models were built after the

spectrum had been divided into 10, 20, 25 and 50 equal intervals, which were then combined. The best siPLS models for

amylose were obtained with 25 and 50 intervals, characterised by low RMSEP and a high determination coefficient (R).

The best models resulted from a combination of 3 intervals out of 25, corresponding to

the wavenumber ranges 8941–8194 cm-1, 5592–5045 cm-1 and 4683–4335 cm-1 (Fig. 3-A).

Both regression models, obtained after SG filter and SNV + SG filter pre-processing, used 9 latent variables (LV) and

were characterised, respectively, by R=0.94 and RMSEP=1.938, and by R=0.93 and RMSEP=1.979. Based on these

parameters, the regression models present high accuracy and can be considered suitable for amylose determination in

a wide variety of rice germplasm (Fig. 3-B). The regression model had a higher determination coefficient than that

obtained by Xie et al. (2014). According to these models, a higher number of intervals allowed a more efficient

selection of the spectral ranges that contain the most complete spectral information and, consequently, a strong

model with a high correlation coefficient and low prediction error. The selected NIR regions (Fig. 3-C-E) include

the region characterised by the second overtone of the anti-symmetric stretching of the methyl group (–CH3)

(8941-8194 cm-1), which is close to the interval between 8183 and 6850 cm-1, where mainly the C-H second overtone and

combination bands are responsible for the absorption corresponding to amylose (Bagchi et al., 2016). The selected

spectral range 5592-5054 cm-1 is close to the interval 5875–5495 cm-1 that, according to Fertig, Podczeck, Jee, &

Smith (2004) and Vichasilp & Kawano (2015), might also be linked to the vibration of amylose. The bands between

5149 and 5050 cm-1 correspond to the O-H stretch and O-H bend combination and the H-O-H deformation combination,

which represent the starch content (Aenugu, Kumar, Parthiban, Ghose, & Banji, 2011), and N-H/C-H in-plane bending

occurs at 4878–4830 cm-1 (Burns & Curczak, 1992). The selected spectral range 4683-4335 cm-1 can also be related

to starch bands (4760 cm-1) and to the protein band at 4587 cm-1, according to Vichasilp & Kawano (2015).

The regression models created with the siPLS and iPLS methodologies selected the same spectral regions,

indicating that both chemometric techniques permit a more confident extraction of the biomolecular information

present.

3.5. mwPLS model

The function of the mwPLS model can be briefly described as the selection of informative regions and the

approximation of latent factors (Du, Chen, Zhong, Wang, Yu, Nordon, Littlejhon, & Holden, 2011). The informative

regions can be optimised by using different moving window sizes. The window size in this study was set to 31.

The mwPLS models were built only for the preprocessing options that gave low RMSEP and a high determination

coefficient (R). The plot obtained with the mwPLS algorithm shows that the spectral regions selected, characterised

by low RMSEP values (6303-6079 cm-1; 5863-5747 cm-1; and 4737-4443 cm-1), coincide with those from the earlier

iPLS and siPLS analyses. The mwPLS has the advantage of showing the evolution of RMSEP along the full spectrum and

thus helps to identify clearly which region is most suitable for developing the PLS model for analytical

determination using NIR technology. The optimal model was based on the spectral region 5932-4497 cm-1 determined

by the mwPLS method. This spectral region includes the wavenumber of the strongest absorption band (5184 cm-1),

associated with the combination of the O-H stretching and O-H bending of the amylose molecule. Based on these

results, mwPLS can be considered a practicable method for selecting the most appropriate spectral region,

characterised by low error (RMSEP).
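The moving-window idea can be sketched in Python as follows: a window of fixed width slides along the spectrum one variable at a time, a PLS model is fitted inside each window, and the resulting RMSEP curve shows which regions are informative. This is an illustrative sketch only. The window width of 31 follows the text, but the NIPALS-based `pls1_fit`, the helper names, the two latent variables and the synthetic data are assumptions and do not reproduce the software used in this work.

```python
import numpy as np

def pls1_fit(X, y, n_comp):
    """Fit a single-response PLS model by NIPALS (illustrative helper)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = Xr.T @ yr
        w = w / np.linalg.norm(w)
        t = Xr @ w
        tt = t @ t
        p = Xr.T @ t / tt
        c = (yr @ t) / tt
        Xr = Xr - np.outer(t, p)
        yr = yr - c * t
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return x_mean, y_mean, W @ np.linalg.solve(P.T @ W, q)

def rmsep(model, X, y):
    """Root mean square error of prediction on an external set."""
    x_mean, y_mean, coef = model
    return float(np.sqrt(np.mean((y - ((X - x_mean) @ coef + y_mean)) ** 2)))

def mwpls(X_cal, y_cal, X_test, y_test, window, n_comp):
    """Return window start positions and the RMSEP obtained in each window."""
    starts = np.arange(0, X_cal.shape[1] - window + 1)
    errors = np.array([
        rmsep(pls1_fit(X_cal[:, s:s + window], y_cal, n_comp),
              X_test[:, s:s + window], y_test)
        for s in starts
    ])
    return starts, errors

# Synthetic example (assumption): only variables 90-120 carry information on y.
rng = np.random.default_rng(1)
y = rng.normal(20.0, 5.0, size=90)
X = rng.normal(0.0, 1.0, size=(90, 200))
X[:, 90:121] += y[:, None]

starts, errors = mwpls(X[:60], y[:60], X[60:], y[60:], window=31, n_comp=2)
best_start = int(starts[np.argmin(errors)])
print("lowest RMSEP for window", best_start, "to", best_start + 30)
```

Plotting `errors` against `starts` gives the kind of RMSEP profile along the spectrum that the text describes.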

3.6. Comparison of models

Comparing the results from the PLS, iPLS and siPLS models, the siPLS models showed better predictive ability. The

experimental results lead to the following conclusions: i) for PLS models, all variables from the full spectral region

were used for calibration, including many noisy and uninformative variables that inevitably weaken the performance

of the models; ii) iPLS models can reduce noise by selecting a definite spectral interval, but because only one

interval is chosen to calibrate the PLS model, some useful variables are discarded. The overall performance of the

model is weakened because relevant information outside the selected interval is not considered, which is why

iPLS models can give weaker results in the validation sets; iii) in contrast, siPLS retains the benefits of iPLS

while overcoming its disadvantages: by combining two or three intervals, it yields models with fewer

variables (removing noisy spectral information) and better predictive capacity (without loss of information); iv)

mwPLS was not a suitable method for amylose prediction, as its RMSEP values were very poor compared with the other

methods. This may be related to its sensitivity or to the presence of poor or non-significant spectral regions.
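The gain from interval selection can be expressed in relative terms using the lowest RMSEP values reported for each approach: 2.435 for full-spectrum PLS with SNV+SG (Table 1), 2.133 for iPLS (Table 2) and 1.938 for the siPLS model discussed in Section 3.4. The short Python calculation below only summarises those reported numbers; the variable names are hypothetical.

```python
# Lowest RMSEP reported for each approach (values taken from the text and tables).
best_rmsep = {"PLS": 2.435, "iPLS": 2.133, "siPLS": 1.938}

baseline = best_rmsep["PLS"]

# Relative RMSEP reduction (%) with respect to the full-spectrum PLS model.
reduction = {
    name: 100.0 * (baseline - value) / baseline
    for name, value in best_rmsep.items()
}

for name, pct in reduction.items():
    print(f"{name}: RMSEP {best_rmsep[name]:.3f}, reduction {pct:.1f}%")
```

With these values the reduction relative to full-spectrum PLS is about 12% for iPLS and about 20% for siPLS.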

4. Conclusion

A robust calibration was obtained using different combinations of derivatives, preprocessing and regression methods,

regardless of sample type. The proposed analytical methodology can accurately quantify the amylose present in

rice varieties using NIR combined with chemometric tools. Compared with the PLS, iPLS and mwPLS algorithms, the

variable selection performed by siPLS led to models with higher predictive ability than full-spectrum PLS models

for the different pre-processing methods used. The spectral regions selected by siPLS were 8941–8194 cm-1

(CH3, methyl group, 2nd overtone of anti-symmetric and symmetric stretching), 5592–5045 cm-1 (1st O-H stretch and O-H

bend combination and the H-O-H deformation combination) and 4683–4335 cm-1 (related to some starch and protein

bands), all of which are related to the starch content and consequently to amylose. Thus, the regions selected by

siPLS can increase the prediction ability of the models. The PLS method was validated, as shown by the

satisfactory results obtained for all estimated figures of merit, with no systematic errors. The proposed method

presents significant advantages over conventional laboratory analysis: it is simpler, cheaper, faster, produces less

chemical waste, is nondestructive and is suitable for ‘on-line’ analysis. These results suggest that the combination

of NIR spectroscopy and chemometric techniques is a simple, fast and reliable method for amylose quantification in

quality control along the rice chain.

Acknowledgments

Funding for this research was received from the Portuguese Fundação para a Ciência e Tecnologia (FCT) under

grant agreement number RECI/AGR-TEC/0285/2012 (BEST-RICE-4-LIFE project). P.N. Sampaio acknowledges the

financial support of the post-doctoral research grant included in this project.

References

Aenugu, H.P.R., Kumar, D.S., Parthiban, N., Ghose, S.S., & Banji, D. (2011). Near-infrared spectroscopy - An

overview. International Journal of ChemTech Research, 3, 825–836.

Bagchi, T. B., Sharma, S., & Chattopadhyay, K. (2016). Development of NIRS models to predict protein and amylose

content of brown rice and proximate compositions of rice bran. Food Chemistry, 191, 21–27.

Banks, W., Greenwood, C.T., & Muir, D.D. (1971). The characterization of starch and its components. Part 3. The

technique of semimicro, differential, potentiometric titration, and the factors affecting it. Starch/Stärke, 23, 118–127.

Bao, J.S., Cai, Y. Z., & Corke, H. (2001). Prediction of rice starch quality parameters by near infrared reflectance

spectroscopy. Journal of Food Science, 66, 936–939.

Bao, J.S., Sun, M., & Corke, H. (2002). Analysis of the genetic behaviour of some starch properties in Indica rice

(Oryza sativa L.): thermal properties, gel texture, swelling value. Theoretical and Applied Genetics, 104, 408–413.

Barnes, R.J., Dhanoa, M.S., & Lister, S.J. (1989). Standard normal variate transformation and de-trending of

near-infrared diffuse reflectance spectra. Applied Spectroscopy, 43, 772–777.

Bart, M.N., Katrien, B., Els, B., Ann, P., Wouter, S., Karen, I. T., et al. (2007). Nondestructive measurement of fruit

and vegetable quality by means of NIR spectroscopy: A review. Journal of Postharvest Biology and Technology, 46,

99–118.

Burns, D. A., & Curczak, E. W. (1992). Handbook of near-infrared analysis. Practical spectroscopy series (Vol. 13) (pp.

393–395), New York: Marcel Dekker Inc.

Chen, H.Z., Pan, T., Chen, J.M., & Lu, Q.P. (2011). Waveband selection for NIR spectroscopy analysis of soil organic

matter based on SG smoothing and MWPLS methods. Chemometrics and Intelligent Laboratory Systems, 107, 139–146.

Chen, H., Song, Q., Tang, G., Feng, Q., & Lin, L. (2013). The combined optimisation of Savitzky-Golay smoothing and

multiplicative scatter correction for FT-NIR PLS models. Hindawi Publishing Corporation. ISRN Spectroscopy,

ID642190.

Chen, J., Yin, Z., Tang, Y., & Pan, T. (2017). Vis-NIR spectroscopy with moving-window PLS method applied to rapid

analysis of whole blood viscosity. Analytical and Bioanalytical Chemistry, 409, 2737-2745.

Delwiche, S. R., Bean, M. M., Miller, R. E., Webb, B. D., & Williams, P. C. (1995). Apparent amylose content of

milled rice by near infrared reflectance spectrophotometry. Cereal Chemistry, 72, 182–187.

Delwiche, S.R., & Reeves, J.B. (2010). A graphical method to evaluate spectral preprocessing in multivariate regression

calibrations: example with Savitzky-Golay filters and partial least squares regression. Applied Spectroscopy, 64, 73–82.

Du, W., Chen, Z., Zhong, L., Wang, S., Yu, R., Nordon, A., Littlejhon, D., & Holden, M. (2011). Maintaining the

predictive abilities of multivariate calibration models by spectral space transformation. Analytica Chimica Acta, 690,

64-70.

Fertig, C. C., Podczeck, F., Jee, R. D., & Smith, M. R. (2004). Feasibility study for the rapid determination of the

amylose content in starch by near-infrared spectroscopy. European Journal of Pharmaceutical Sciences, 21, 155–159.

Fitzgerald, M. A., Bergman, C. J., Resurreccion, A. P., Moller, J., Jimenez, R., Reinke, R. F., et al. (2009). Addressing

the dilemmas of measuring amylose in rice. Cereal Chemistry, 86, 492-498.

Friedel, M., Patz, C. D., & Dietrich, H. (2013). Comparison of different measurement techniques and variable selection

methods for FT-MIR in wine analysis. Food Chemistry, 141, 4200-4207.

Gibson, T.S., Solah, V.A., & McCleary, B.V. (1997). A procedure to measure amylose in cereal starches and flours

with concanavalin A. Journal of Cereal Science, 25, 111–119.

Gorry, P.A. (1990). General least-squares smoothing and differentiation by the convolution (Savitzky-Golay) method.

Analytical Chemistry, 62, 570–573.

Hizukuri, S. (1996). Starch: analytical aspects. In: Eliasson, A.C. (Ed.), Carbohydrates in Food. (pp. 347–429). Marcel

Dekker, New York.

Hu, G., Burton, C., & Yang, C. (2010). Efficient measurement of amylose content in cereal grains. Journal of Cereal

Science, 51, 35–40.

ISO 6647-1:2015 Rice — Determination of amylose content — Part 1: Reference method.

ISO 6647-2:2015 Rice - Determination of amylose content - Part 2: Routine method.

Juliano, B.O., Perez, C.M., Blakeney, A.B., Castillo, D.T., Kongseree, N., Laignelet, B., et al. (1981). International

co-operative testing on the amylose content of milled rice. Starch/Stärke, 33, 157–162.

Kalivas, J.H. (1997). Two data sets for near infrared spectra. Chemometrics and Intelligent Laboratory Systems, 37,

255–259.

Leardi, R., & Nørgaard, L. (2004). Sequential application of backward interval partial least squares and genetic

algorithms for the selection of relevant spectral regions. Journal of Chemometrics, 18, 486–497.

Lee, H.W., Bawn, A., & Yoon S. (2012). Reproducibility, complementary measure of predictability for robustness

improvement of multivariate calibration models via variable selections. Analytica Chimica Acta, 757, 11-18.

Li, H., Prakash, S., Nicholson, T.M., Fitzgerald, M.A., & Gilbert, R.G. (2016). The importance of amylose and

amylopectin fine structure for textural properties of cooked rice grains. Food Chemistry, 196, 702-711.

Lu, Z.-H., Sasaki, T., Li, Y.-L., Yoshihashi, T., Li, L.-T., & Kohyama, K. (2009). Effect of amylose content and rice

type on dynamic viscoelasticity of a composite rice starch gel. Food Hydrocolloids, 23, 1712–1719.

Ma, H., Wang, J., Chen, Y., Cheng, J., & Lai, Z. (2017). Rapid authentication of starch adulteration in ultrafine granular

powder of Shanyao by near-infrared spectroscopy coupled with chemometric methods. Food Chemistry, 215, 108-115.

Mark, H. (2001). Fundamentals of near-infrared spectroscopy. In: Raghavachan, R. (Ed.), Near-infrared Applications in

Biotechnology. (pp. 293–321). Marcel Dekker, New York.

Martens, H., & Stark, E. (1991). Extended multiplicative signal correction and spectral interference subtraction: new

preprocessing methods for near infrared spectroscopy. Journal of Pharmaceutical and Biomedical Analysis, 9, 625-635.

Matheson, N.K. & Welsh, L.A. (1988). Estimation and fractionation of the essentially unbranched amylose and

branched amylopectin component of starches with concanavalin A. Carbohydrate Research, 180, 301-313.

Morrison, W. R., & Laignelet, B. (1983). An improved colorimetric procedure for determining apparent and total

amylose content in cereals and other starches. Journal of Cereal Science, 1, 9-20.

Naes, T., Isaksson, T., Fearn T. & Davies, A. (2002). A User-Friendly Guide to Multivariate Calibration and

Classification. NIR Publications, Chichester, UK.

Nørgaard, L., Saudland, A., Wagner, J., Nielsen, J.P., Munck, L., & Engelsen, S.B. (2000). Interval partial least-squares

regression (iPLS): A comparative chemometric study with an example from near-infrared spectroscopy. Applied

Spectroscopy, 54, 413–419.

Pandey, M.K., Rani, N.S., Madhav, M.S., Sundaram, R.M., Varaprasad, G.S., Sivaranjani, A.K.P., Bohra, A., Kumar,

G.R., & Kumar, A. (2012). Different isoforms of starch-synthesizing enzymes controlling amylose and amylopectin

content in rice (Oryza sativa L.). Biotechnology Advances, 30, 1697–1706.

Pandiselvam, R., Thirupathi, V., & Vennila, P. (2016). Fourier Transform – near infrared spectroscopy for rapid and

nondestructive measurement of amylose content of paddy. Scientific Journal Agricultural Engineering, 2, 93 – 100.

Pataca, L.C., Borges Neto, W., Marcucci, M. C., & Poppi, R.J. (2007). Determination of apparent reducing sugars,

moisture and acidity in honey by attenuated total reflectance-Fourier transform infrared spectrometry. Talanta, 71,

1926–1931.

Savitzky, A., & Golay, M.J.E. (1964). Smoothing and differentiation of data by simplified least squares procedures.

Analytical Chemistry, 36, 1627–1639.

Shu, Q.Y., Wu, D.X., Xia, Y.W., Gao, M.W., & McClung, A. (1999). Calibration optimization for rice apparent

amylose content by near infrared reflectance spectroscopy (NIRS). Journal of Zhejiang University (Agriculture & Life

Science), 25, 343–346.

Silva, M., Ferreira, M.H., Braga, J.W., & Sena, M. (2012). Development and analytical validation of a multivariate

calibration method for determination of amoxicillin in suspension formulations by near infrared spectroscopy. Talanta,

89, 342–351.

Sievert, D., & Holm, J. (1993). Determination of amylose by differential scanning calorimetry. Methods, 45, 136-139.

Soong, Y.Y., Quek, R.Y.C., & Henry, C.J. (2015). Glycemic potency of muffins made with wheat, rice, corn, oat and

barley flours: a comparative study between in vivo and in vitro. European Journal of Nutrition, 54, 1281–1285.

Spiegelman, C.H., McShane, M.J., Goetz, M.J., Motamedi, M., Yue, Q.L., & Coté, G.L. (1998). Theoretical

justification of wavelength selection in PLS calibration: development of a new algorithm. Analytical Chemistry, 70, 35–

44.

Vichasilp, C., & Kawano, S. (2015). Prediction of starch content in meatballs using near infrared spectroscopy (NIRS).

International Food Research Journal, 22, 1501–1506.

Windham, W., Lyon, B.G., Champagne, E.T., Barton, F.E., Webb, B.D., McClung, A.M., Moldenhauer, K.A.,

Linscombe, S., & McKenzie, K.S. (1997). Prediction of cooked rice texture quality using near-infrared reflectance

analysis of wholegrain milled samples. Cereal Chemistry, 74, 626–632.

Wold, S., & Sjostrom, M. (2001). PLS-regression: a basic tool of chemometrics. Chemometrics and Intelligent

Laboratory Systems, 58, 109–130.

Xie, S.F., Xiang, B.R., Yu, L.Y., & Deng, H.S. (2009). Tailoring noise frequency spectrum to improve NIR

determinations. Talanta, 80, 895–902.

Xie, L.H., Tang, S.Q., Chen, N., Luo, J., Jiao, G.A., Shao, G.N., Wei, X.J., & Hu, P.S. (2014). Optimisation of near-

infrared reflectance model in measuring protein and amylose content of rice flour. Food Chemistry, 142, 92-100.

Yun, Y., Li, H., Wood, L.R., et al. (2013). An efficient method of wavelength interval selection based on random

frog for multivariate spectral calibration. Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, 111,

31-36.

Zhang, G., Cheng, Z., Zhang, X., Guo, X., Su, N., Jiang, L., Mao, L., & Wan, J. (2011). Double repression of soluble

starch synthase genes SSIIa and SSIIIa in rice (Oryza sativa L.) uncovers interactive effects on the physicochemical

properties of starch. Genome, 54, 448-459.

FIGURE CAPTIONS

Figure 1-NIR spectra without any pre-treatment (A); and the pure amylose spectra (B). NIR spectra plot obtained after

MSC (C); and SNV+SG (69.4.4) (D) pre-processing treatment.

Figure 2-iPLS model: spectra intervals construction (A); Spectra interval selection (B); Scatter plot obtained from iPLS

model (C).

Figure 3-siPLS model: Spectra interval selection (A); Scatter plot obtained from siPLS model (B). NIR spectral region

used for the regression model obtained after SG filter preprocessing and siPLS algorithm: Spectral range (8941-8194

cm-1) (C); Spectral range (5592-5054 cm-1) (D) and; Spectral range (5875–5495 cm-1) (E).

Table 1-Analysis of several PLS models using the full spectra with and without preprocessing methods such as
multiplicative scatter correction (MSC), standard normal variate (SNV) and Savitzky-Golay filter (SG). RMSEC: root mean
square error of calibration; RMSEP: root mean square error of prediction; Rcal and Rpred: the corresponding
determination coefficients.

Pre-processing                   Rcal   RMSEC   Rpred   RMSEP
Raw spectra without processing   0.77   3.360   0.70    3.909
MSC                              0.81   3.142   0.71    3.863
SNV                              0.81   3.048   0.69    4.018
1st Derivative                   0.87   2.656   0.76    3.571
2nd Derivative                   0.87   2.674   0.73    3.761
MSC + 1st Derivative             0.87   2.591   0.71    3.848
MSC + 2nd Derivative             0.88   2.536   0.64    4.216
SNV + 1st Derivative             0.87   2.569   0.79    3.403
SNV + 2nd Derivative             0.88   2.536   0.70    3.921
SG (69.4.4)                      0.92   2.116   0.87    2.678
MSC+SG (69.4.4)                  0.92   2.045   0.88    2.650
SNV+SG (69.4.4)                  0.92   2.032   0.90    2.435

Table 2-Results for the iPLS models: root mean square error of prediction (RMSEP), root mean square error of
calibration (RMSEC) and the corresponding determination coefficients (Rcal and Rpred) for all preprocessing methods
and numbers of spectral intervals tested.

Processing / Intervals   Spectral region (cm-1)   PLS components   Rcal   RMSEC   Rpred   RMSEP

Without Preprocessing
  10                     6249-5369                4                0.55   4.346   0.55    4.565
  20                     8462-8022                4                0.53   4.494   0.51    4.695
  25                     6071-5724                7                0.64   3.925   0.74    3.685
  50                     5894-5724                5                0.64   3.910   0.68    4.005
MSC + 2nd Derivative
  10                     6249-5359                7                0.75   3.366   0.80    3.290
  20                     4467-4035                5                0.86   2.602   0.84    2.885
  25                     4652-4305                5                0.81   3.022   0.82    3.141
  50                     4474-4305                7                0.82   3.016   0.79    3.352
SNV + 2nd Derivative
  10                     6249-5369                6                0.75   3.366   0.80    3.274
  20                     4467-4035                5                0.86   2.560   0.84    3.012
  25                     4652-4305                4                0.81   3.018   0.81    3.203
  50                     4474-4305                7                0.82   3.012   0.81    3.219
SG (69.4.4)
  10                     5361-4482                8                0.89   2.334   0.87    2.720
  20                     4906-4474                9                0.88   2.400   0.88    2.561
  25                     4651-4304                7                0.90   2.228   0.92    2.133
  50                     4651-4482                8                0.78   3.215   0.74    3.701
MSC + SG (69.4.4)
  10                     5361-4482                9                0.87   2.488   0.86    2.796
  20                     4906-4474                9                0.87   2.519   0.88    2.652
  25                     4651-4304                6                0.90   2.212   0.89    2.475
  50                     4651-4482                8                0.77   3.253   0.73    3.768
SNV + SG (69.4.4)
  10                     7136-6256                5                0.61   4.066   0.57    4.499
  20                     4906-4474                9                0.87   2.512   0.88    2.650
  25                     4651-4305                7                0.90   2.236   0.91    2.330
  50                     4651-4482                8                0.78   3.254   0.73    3.736
MSC–Multiplicative scatter correction; SNV-Standard normal variate and; SG – Savitzky-Golay filter.

Table 3-Results for the siPLS models: root mean square error of prediction (RMSEP), root mean square error
of calibration (RMSEC) and the corresponding determination coefficients (Rcal and Rpred) for all preprocessing methods
and combinations of spectral intervals tested. Only the best model is shown for each case.

Processing / Intervals   Combined intervals   PLS components   Rcal   RMSEC   Rpred   RMSEP

Without Preprocessing
  10                     4, 7, 8              10               0.77   3.254   0.77    3.254
  20                     13, 18, 19           10               0.91   2.070   0.95    1.795
  25                     6, 23, 24            9                0.90   2.262   0.91    2.248
  50                     32, 46, 47           10               0.89   2.304   0.92    2.174
MSC + 2nd Derivative
  10                     4, 8                 9                0.75   3.451   0.74    3.680
  20                     10, 11, 19           9                0.90   2.333   0.86    2.757
  25                     22, 24               8                0.85   2.706   0.86    2.828
  50                     22, 46, 47           7                0.90   2.231   0.89    2.461
SNV + 2nd Derivative
  10                     4, 8                 9                0.75   3.458   0.76    3.561
  20                     10, 11, 19           9                0.87   2.240   0.89    2.544
  25                     21, 22, 24           9                0.89   2.358   0.88    2.653
  50                     46, 47               8                0.88   2.410   0.91    2.312
SG (69.4.4)
  10                     3, 5, 9              10               0.90   2.256   0.90    2.353
  20                     18, 19               10               0.92   1.886   0.93    2.000
  25                     11, 21, 23           9                0.93   1.920   0.94    1.938
  50                     44, 46, 48           6                0.92   2.010   0.93    1.993
MSC+SG (69.4.4)
  10                     5, 9                 9                0.90   2.246   0.82    3.162
  20                     9, 17, 19            10               0.92   1.933   0.92    2.164
  25                     21, 23               8                0.92   2.012   0.93    2.077
  50                     40, 45, 46           8                0.92   1.954   0.92    2.108
SNV+SG (69.4.4)
  10                     3, 8, 9              5                0.58   4.167   0.57    4.509
  20                     10, 18, 19           9                0.92   2.044   0.93    2.080
  25                     11, 21, 23           9                0.92   1.946   0.93    1.979
  50                     40, 45, 46           8                0.92   1.950   0.93    2.073
MSC-Multiplicative scatter correction; SNV-Standard normal variate and; SG – Savitzky-Golay filter.

Highlights

• Optimization of model for rice amylose determination using NIR spectroscopy

• PLS, iPLS, siPLS and mwPLS algorithms showed high accuracy for amylose prediction

• siPLS yielded the model with the highest accuracy and the lowest error

• NIR spectroscopy and chemometrics can be suitable techniques for fast, ‘on-line’ and accurate amylose determination.
