Food Control 115 (2020) 107302

Contents lists available at ScienceDirect

Food Control
journal homepage: www.elsevier.com/locate/foodcont

Comparison of different sample preparation techniques for NIR screening and their influence on the geographical origin determination of almonds (Prunus dulcis MILL.)
Maike Arndt (a,1), Marc Rurik (b,f,1), Alissa Drees (a), Katharina Bigdowski (a), Oliver Kohlbacher (b,c,d,e,f), Markus Fischer (a,∗)
a Hamburg School of Food Science, Institute of Food Chemistry, University of Hamburg, Grindelallee 117, 20146, Hamburg, Germany
b Applied Bioinformatics, Center for Bioinformatics, Sand 14, 72076, Tübingen, Germany
c Quantitative Biology Center, University of Tübingen, Auf der Morgenstelle 10, 72076, Tübingen, Germany
d Biomolecular Interactions, Max Planck Institute for Developmental Biology, Max-Planck-Ring 5, 72076, Tübingen, Germany
e Translational Bioinformatics, University Medical Center Tübingen, Sand 14, 72076, Tübingen, Germany
f Institute for Bioinformatics and Medical Informatics, University of Tübingen, Sand 14, 72076, Tübingen, Germany

ARTICLE INFO

Keywords: Almonds; NIR; Sample preparation; Geographical origin; Support vector machine; Chemometrics

ABSTRACT

The aim of the present study was to conduct a systematic and in-depth comparison of four different sample preparation techniques for near-infrared (NIR) spectroscopy analysis of almonds and evaluation of their suitability for the geographical origin determination. Although it is generally known that the sample preparation has an impact on the NIR screening, there is no scientific consensus on a commonly accepted procedure. In this work, 64 almond samples from six countries were analyzed as whole and bisected nuts as well as in a ground and freeze-dried state (after grinding). In order to assess the suitability for future applications, both the labor effort for sample preparation and the classification accuracy were evaluated. Using support vector machine (SVM) classification we obtained a classification accuracy of 80.2% (±1.9%) on the validation set for the determination of origin of freeze-dried almonds. The other three sample preparations result in at least 8.3 percentage points lower classification accuracies. Nonetheless, the analysis of whole and bisected almonds is more suitable for an initial rapid screening due to a lower overall required work effort. The results confirm the influence of the sample preparation techniques in NIR screening and pave the way for future widespread analytical applications.

1. Introduction

Fourier transform near-infrared (FT-NIR) spectroscopy is a highly versatile and powerful method in verifying the authenticity of food raw materials. It has been used to address a wide range of food-related questions such as the determination of adulteration in olive oil or honey, of the geographical origin of fish, to distinguish varieties of wine and quality issues such as mycotoxin contamination (Cozzolino, 2016; Rodriguez-Saona, Giusti, & Shotts, 2016). In general, FT-NIR spectroscopy is a quick analytical method since no extraction is required prior to analysis. Due to its simple handling and rapid analysis, FT-NIR spectroscopy is a cost-efficient, non-polluting and thus green analytical method (Armenta, Garrigues, & de la Guardia, 2008; Gałuszka, Migaszewski, & Namieśnik, 2013). Therefore, various analytical methods based on NIR spectroscopy, e.g. the determination of water content in meat (Porep, Kammerer, & Carle, 2015), are already established in industrial routine laboratories.

Even though FT-NIR spectroscopy is generally a fast method, the sample preparation can differ significantly. Depending on the chosen sample preparation technique – i.e. for nuts simply using whole or bisected nuts or more elaborately prepared materials like ground or freeze-dried nuts – time requirement can vary from almost immediate spectra recording to around 72 h (Richter, Rurik, Gurk, Kohlbacher, & Fischer, 2019; Zhang, Jiang, Liu, Mei, & Huang, 2017). Currently, however, there is no universally established form of sample preparation of food materials for the determination of origin. While in recent studies Biancolillo et al. and Moscetti et al. analyzed whole hazelnuts, Vitale et al. bisected pistachios prior to analysis (Biancolillo et al., 2018; Moscetti, Radicetti, Monarca, Cecchini, & Massantini, 2015; Vitale et al., 2013). In other laboratories even ground walnuts (Gu et al.,


∗ Corresponding author.
E-mail address: markus.fischer@chemie.uni-hamburg.de (M. Fischer).
1 These authors contributed equally to this work.

https://doi.org/10.1016/j.foodcont.2020.107302
Received 2 February 2020; Received in revised form 5 April 2020; Accepted 6 April 2020
Available online 11 April 2020
0956-7135/ © 2020 Elsevier Ltd. All rights reserved.

2018) or freeze-dried (after grinding) asparagus were utilized for FT-NIR measurements (Richter et al., 2019). NIR screening of freeze-dried (after grinding, hereinafter referred to as just freeze-dried) nuts has – to the best of our knowledge – never been applied, despite it being a promising sample preparation technique due to the removal of potentially superimposing water bands in the resulting spectra.

This study shall provide a systematic and in-depth comparison and evaluation of the above-mentioned forms of sample preparation (whole, bisected, ground, and freeze-dried) in terms of their suitability for the determination of origin of almonds (Prunus dulcis MILLER). The prediction of the geographical origin of almonds is of great economic interest as the country of harvest determines the market price. While almonds from the USA cost about 5 US$/kg, almonds from the Mediterranean region can cost up to 9 US$/kg (related to worldwide export 2018, UN Comtrade Database, 2018). Although a financial profit could be generated by misdeclaration of the origin of almonds (Manning, 2016), there is still no reliable and routinely applicable method of origin determination to counteract this type of food fraud. The potential of NIR analysis for this application has, however, been hinted at in a recent publication (Firmani, Bucci, Marini, & Biancolillo, 2019) presenting an FT-NIR-based method for differentiating “Avola Almonds” from other Italian almonds.

The aim of this study is therefore to investigate the influence of various sample preparations on the FT-NIR analysis of almonds. Whole, bisected, ground, and freeze-dried almonds are compared regarding their processing effort, analysis time, required sample volume, reproducibility, and, in particular, their ability to classify the geographical origin. To this end, almonds from various economically relevant production countries were acquired and analyzed in all four sample preparations via a non-targeted FT-NIR spectroscopy approach. The data were evaluated by multivariate analysis, especially principal component analysis (PCA) and support vector machine (SVM) classification.

2. Materials and methods

2.1. Almond sample acquisition

In order to assess the suitability for geographical origin determination, a total of 64 authentic almond samples from six different countries were analyzed. The samples were acquired directly from the producers and exporters. With the exception of the Australian samples, which stem from 2019 crops, all almonds were harvested in 2018. Due to the seasonal differences in the southern hemisphere, however, a six-month discrepancy of the harvest time is inevitable. In an attempt to cover a broad spectrum of almonds, 17 different varieties (e.g. Non Pareil, Padre, Butte, Tuono) and additionally even bitter almonds were chosen.

2.2. Sample preparation

100–1000 g of the acquired almond raw material, shelled and unshelled, were shock frosted in liquid nitrogen for 5 min each and subsequently stored at −20 °C. If necessary, the almonds were cracked manually under dry ice cooling (−78.5 °C) to safely remove remaining endocarp. The samples were stored at −20 °C. Each of the 64 acquired authentic almond samples was eventually analyzed after undergoing the four sample preparation techniques which are listed in the following sections.

2.2.1. Preparations for analyzing whole almonds
Five almonds with intact tegument (brown skin) were selected from the available sample material. If the morphology of the seeds visually differed greatly within the sample, almonds were chosen which also differed in shape and/or color in order to cover the sample's variance as best as possible. The almonds were thawed at room temperature (22 °C ± 2 °C) before the FT-NIR analysis. After the analysis, the almonds were bisected to check for visible damage due to the ability to detect internal damages via NIR-spectroscopy (Nakariyakul, 2014).

2.2.2. Preparations for analyzing bisected almonds
The selection of the five almonds was conducted as described in the previous section. After thawing (22 °C ± 2 °C), the almonds were manually bisected widthways resulting in a plane contact surface applicable for FT-NIR analysis.

2.2.3. Preparations for analysis of ground almonds
At least 100 g of whole frozen almonds were ground to homogeneous powder by using a knife mill (Grindomix GM 300, Retsch, Haan, Germany). In order to prevent frictional heat, the grinding was carried out using dry ice (at least twice the sample mass). The ground material was stored at −20 °C for a minimum of 24 h to ensure quantitative dry ice sublimation. Prior to FT-NIR analysis, 1.25 g (±0.1 g) of the ground material was thawed at 22 °C (±2 °C) in closed glass vials (52.0 mm × 22 mm × 1.2 mm, Nipro Diagnostics Germany GmbH, Ratingen, Germany).

2.2.4. Preparations for analysis of freeze-dried almonds
The ground almonds (obtained after the grinding process described in Section 2.2.3) were freeze-dried using a freeze dryer (Beta 1–8 LSCplus, Martin Christ Freeze Dryers GmbH, Osterode, Germany) to decrease the water content to about 1% by weight. About 60 g ground almonds (incl. dry ice) were freeze-dried for 24 h. Subsequently, the ground material was manually stirred and freeze-dried for an additional 24 h. Thereafter, 1.25 g (±0.1 g) of freeze-dried almond powder was thawed at 22 °C (±2 °C) in closed glass vials (52.0 mm × 22 mm × 1.2 mm, Nipro Diagnostics Germany GmbH, Ratingen, Germany).

2.3. NIR spectroscopy

The thawed samples were analyzed using an FT-NIR spectrometer with an integration sphere (TANGO, Bruker Optics, Bremen, Germany). The spectra were acquired in reflectance mode with 50 scans per spectrum with a resolution of 2 cm⁻¹. A wavenumber range of 11,550 to 3950 cm⁻¹ was selected. Data acquisition was carried out via OPUS software (Bruker Optics, Bremen, Germany).

Spectra of the ground and freeze-dried samples were acquired inside glass vials (see Section 2.2.3), while the whole and bisected samples were put directly on the instrument's surface.

Depending on the form of sample preparation, different numbers of replicates are required for a sufficient coverage of the almonds’ basic populations. While the ground and freeze-dried samples required only five recorded spectra, analysis of whole and bisected almonds was conducted with six spectra per almond (amounting to a total of 30 spectra per sample). A more detailed description of the four different procedures can be taken from Table 1. All samples were measured at room temperature (22 °C ± 2 °C).

2.4. Spectra pre-processing

To avoid overfitting and enable comparability, multiplicative scatter correction (MSC), smoothing, first order derivative, reduction of variables, and median formation were applied as data pre-processing steps.

At first, MSC was used in order to equalize additive and multiplicative scattering effects using the mean spectrum of each sample preparation technique as reference. The mentioned scattering effects lead to differences in the spectra caused by inhomogeneous physical effects and not – as is desired – by geographical origin. To put it more precisely, the morphology of whole and bisected almonds can result in different scattering as the shape of the seeds or the surface structure of the tegument varies. Moreover, different morphologies result in a


different amount of reflected light causing baseline shifting. Scattering effects caused by the inhomogeneous physical effects are also observed when analyzing ground or freeze-dried samples as the particle size is not perfectly uniform.

Table 1
Spectra acquisition of the four sample preparation techniques for representing the almonds’ basic population.

| | whole | bisected | ground | freeze-dried |
|---|---|---|---|---|
| required spectra | 30 | 30 | 5 | 5 |
| sample volume | 5 almonds | 5 almonds | 1.25 g ± 0.1 g in glass vials | 1.25 g ± 0.1 g in glass vials |
| procedure | three spectra per side per almond, 120° turn between every measurement | three spectra per almond half, 120° turn between every measurement | thorough shake (vertically and horizontally) between the replicates, uniform filling level | thorough shake (vertically and horizontally) between the replicates, uniform filling level |

In addition, the first-order gap-segment derivatives were calculated to eliminate offset and baseline drifts (gapDer function implemented in the R package prospectr 0.1.3, using a derivative order m of 1, a smoothing window s of 11 and a window size w of 11; Stevens & Ramirez-Lopez, 2014). Gap-segment derivatives use a window-based approach that first averages the points in a smoothing window centered around the current measurement point, then calculates the derivative by taking the difference between two points separated by a given gap size (Norris & Williams, 1984; Rinnan, Van Den Berg, & Engelsen, 2009).

The high resolution of 2 cm⁻¹ used during spectra acquisition resulted in 3725 variables that often contain similar information for adjacent wavenumbers. For this reason, the average of five contiguous wavenumbers was taken (binning function implemented in the R package prospectr 0.1.3), reducing 3693 (after the first derivative) to 739 variables.

The final step of data pre-processing was to take the median of each wavenumber of all measured spectra for the same sample (30 spectra for whole and bisected samples and five spectra for ground and freeze-dried samples). The median was used instead of the arithmetic average in order to minimize the influence of potential outliers.

2.5. Multivariate data analysis and classification models

Multivariate data analysis was conducted separately for each almond sample preparation. Each preparation leads to one data set containing the results of all 64 acquired almond samples.

First, a principal component analysis (PCA) was applied to visualize the present data. The PCAs were performed on the pre-processed and centered data (mean = 0). Subsequently, the supervised learning algorithm support vector machine (SVM) was applied for classifying the acquired data regarding the almond samples' geographical origin. SVM is a state-of-the-art classification method that can be used to train both linear and non-linear classifiers (Cortes & Vapnik, 1995). In this study SVM was performed using LIBSVM (Chang & Lin, 2011), a publicly available library that implements methods for the training and classification of SVMs. LIBSVM was accessed using the e1071 interface (R Core Team, 2019, R version 3.6.0; Meyer et al., 2019, ‘e1071’ version 1.7–2). As SVM is a binary classifier, for multiclass classification the one-versus-one approach implemented in LIBSVM was used, i.e. a binary classifier is created for each pair of classes. A new data point is then classified by taking the most frequent class label.

In order to avoid overfitting and simultaneously optimize the model parameters, a nested cross-validation was conducted (Krstajic, Buturovic, Leahy, & Thomas, 2014; Meyer et al., 2019; Varma & Simon, 2006). This approach uses an inner cross-validation to optimize model parameters and an outer cross-validation to assess the model performance. First, the data set was separated into five equal parts by stratified random sampling regarding the geographical origin. Four parts were used to train the model in an inner cross-validation and one was set aside as a test set. This process was repeated five times so that each part operated as a test set once (5-fold outer cross-validation). For each fold of the outer cross-validation, an inner 10-fold cross-validation was performed within the respective training set to optimize the model parameters that are then validated using the set-aside test set of the outer cross-validation. The entire nested cross-validation was repeated 20 times with different random stratified splits into training and test set to offset potential bias from their composition.

3. Results and discussion

3.1. Spectra interpretation

Fig. 1 compares the median spectra of the four processing forms after MSC (9000–3950 cm⁻¹, median value of all samples of each sample preparation method, see Fig. S1 for whole and unprocessed spectra). A precise peak assignment cannot be achieved, since peak overlaps are unavoidable when complex matrices like almonds are analyzed. Nevertheless, the spectra of the bisected, ground as well as the freeze-dried almonds show similar trends in absorbance which can mainly be attributed to the predominant lipids. The HC=CH band occurs at approximately 8550 cm⁻¹ (C–H, second overtone) and is most likely caused by the carbon–carbon double bonds of the almonds’ unsaturated fatty acids. The aliphatic hydrocarbon part of the lipids absorbs in various areas: at about 5800 cm⁻¹, the C–H stretch first overtone of methylene is located while the symmetric CH₂ bond vibrations (C–H, stretch first overtone) appear at around 5680 cm⁻¹. Another lipid associated absorbance is located at around 4330 cm⁻¹ which is caused by the C–H bending (second overtone). Besides lipids, some other characteristic bands were observed. At around 4855 cm⁻¹, protein vibrations are located due to the amide combination band of CONH₂. The broad absorbance band in a range of 7100–6100 cm⁻¹ is composed of the N–H stretching (first overtone) of proteins and the water associated absorbance (O–H symmetric and asymmetric stretching combination, first overtone). Furthermore, a combination band of water appears at a wavenumber of around 5155 cm⁻¹ due to the O–H stretching and H–O–H bending combination (Buijs & Choppin, 1963; Shenk, Workman, & Westerhaus, 2001; Weyer & Workman, 2012).

Examining the NIR spectra of Fig. 1, it becomes evident that the median spectrum of the whole almonds' analysis differs from the other spectra: apart from the overall lower absorbance values, the spectrum also exhibits a partially disparate absorbance pattern. While the slightly weaker lipid bands are still observed, the absorbance in the range of 4960–4510 cm⁻¹ is significantly increased. When visually comparing the whole almond samples with samples of the other preparations, the most apparent difference is the tegument. As in NIR spectroscopy the sample's surface majorly impacts the measurement, the tegument's contribution to the spectra is – for obvious reasons – at its maximum in the spectra recorded from whole almonds. Hence, it seems likely that the band from 4960 to 4510 cm⁻¹ derives from composition differences of the tegument. In order to verify this explanatory approach, FT-NIR analysis of different compartments was conducted. Subsequently, one whole almond was blanched and the removed tegument as well as the blanched almond were analyzed separately (see Fig. S2). In these spectra, the intensity increase of the band from 4960 to 4510 cm⁻¹ is only observable in the tegument and the whole almond. While the tegument has a fiber content of about 46 wt%, the whole almonds merely exhibit 11–14 wt% fiber. The fiber fraction consists mainly of cellulose, hemicellulose and lignin – biological macromolecules which all form complex spectra independently – and might result in marked absorbance deviations due to the O–H bending and C–O stretching


combination in the range of 4960–4510 cm⁻¹ (Blackwell, 1977; Ruggeri, Cappelloni, Gambelli, Nicoli, & Carnovale, 1998; Yada, Lapsley, & Huang, 2011).

Fig. 1. MSC-corrected spectra of whole, bisected, ground, and freeze-dried almonds.

As previously mentioned, the whole almonds’ spectra exhibit overall lower absorbance values caused by the smaller amount of reflected light. This might be due to the fact that light can pass by the rounded almond. Moreover, the porous structure of the tegument allows the light to penetrate deeper and thus reduces the intensity of the reflected light (Socias, Kodad, Alonso, & Gradziel, 2007).

While the median spectra absorbance values of bisected almonds are the highest overall, the absorbance values show a relative decrease in a range of 4370–4250 cm⁻¹ compared to the spectra of homogenized almonds. This could also be caused by the reflection of light since the diameter of bisected almonds is usually slightly smaller than the light beam width of the spectrometer (6 mm). A coverage of the light beam by increasing the almond's surface can theoretically be achieved by bisecting the almonds lengthways. Yet, this approach is less easy to apply and may result in insufficiently thick and highly frangible almond halves. Overall, the MSC-corrected spectra of the bisected almonds (see Fig. 1) are comparable to those of ground almonds, since in both cases mainly the lipid-rich kernel is analyzed.

There are significant absorbance differences between MSC-corrected spectra of the ground and freeze-dried samples – as expected – regarding the water bands (about 6900 cm⁻¹ and 5155 cm⁻¹) since the water content is reduced in the freeze-dried samples. The lipid-associated bands of the freeze-dried and ground samples show clear similarities since lipids are not strongly influenced by the additional lyophilization.

In conclusion, the four sample preparation techniques show comparable absorbance trends; however, the expression of the bands differs in the recorded spectra. For this reason, the suitability regarding the geographical origin determination for each preparation is discussed separately in the following.

3.2. Suitability for the classification of geographical origin

Fig. 2 illustrates the derivative spectra (median over the countries of origin, 9000–4000 cm⁻¹, scatter corrected) for the four different sample preparation techniques. Herein, intensity deviations with regard to the geographical origin become apparent. The lipid associated band at about 4330 cm⁻¹, for example, shows clear intensity differences depending on the sample's geographical origin. In numerous studies lipids have already been mentioned as potential markers for the determination of the geographical origin, as they can be influenced by exogenous conditions (Amorello, Orecchio, Pace, & Barreca, 2016; Klockmann, Reiner, Bachmann, Hackl, & Fischer, 2016). The extent of these intensity differences varies greatly depending on the chosen form of sample preparation. In order to depict these differences, the pre-processed data (according to Section 2.4) were visualized utilizing a principal component analysis (PCA). In Fig. 3, the first principal component (PC1) is plotted against the second (PC2) for each preparation technique. None of the shown PCAs allows for visual differentiation of the individual countries entirely. For whole almonds, the first two PCs explain 82.2% and 6.6% of the total variance (88.8% in total). In contrast to the PCAs of the other sample preparations which represent

Fig. 2. Median almond spectra after MSC and first derivative for (a) whole, (b) bisected, (c) ground, and (d) freeze-dried almonds.


Fig. 3. PCA score plots showing differentiation for (a) whole, (b) bisected, (c) ground, and (d) freeze-dried almonds.

only about 70% of the variance in the first two PCs, cluster trends are observable slightly better in the whole almond PCA plot. However, the Mediterranean almonds (Spain and Italy) form a cluster in all shown PCAs and can thus be distinguished from other countries of origin. The separation of the Mediterranean almonds is a relevant concern – especially from an economic perspective – as these are the most expensive almonds compared to the almonds from the other origins (UN Comtrade Database, 2018). Higher-order PCs may also contain information that contributes to the determination of geographical origin (see Figs. S3–S6) since the explained variance increases up to 20 percentage points by adding the third and the fourth PC.

In order to compare the four sample preparation techniques quantitatively, a support vector machine (SVM) was used to classify the almonds regarding their geographical origin. A one-versus-one classifier with a Gaussian radial basis function (RBF) kernel was trained via LIBSVM (R package ‘e1071’) and validated using repeated nested cross-validation, which allows the model parameters to be optimized and an unbiased estimate of the generalization performance to be obtained. The hyperparameter optimization was performed using a grid search in the inner cross-validation loop (cost C from 10⁻⁵ to 10⁵, gamma γ from 10⁻⁶ to 10⁻¹).

Further comparison of the classification results is possible since the same almond samples were used for each sample preparation. As already presumed from the spectra, the validation accuracies (hereinafter referred to as just classification accuracy) differ depending on the chosen preparation (see Table 2). Comparatively, the freeze-dried almonds show the highest classification accuracy of 80.2% (±1.9%). The superiority of this preparation is, most likely, due to an effective removal of water which can cause unwanted signal overlay possibly resulting in information loss. While the analysis of ground and bisected almonds leads to accuracies of 71.9% (±3.5%) and 64.5% (±3.5%), respectively, the most easily and quickly feasible analysis of whole almonds still achieves accuracies of 62.6% (±2.8%). The comparably low accuracies of the whole and bisected almond analysis are most probably explained by the insufficient coverage of the light beam resulting in significant loss of information. Additionally, the classification accuracy of whole almonds could be decreased by the influence of the tegument. As the tegument is formed in an earlier growth phase, changes in external conditions have a greater impact on the inner kernel which grows in a later phase. For example, water deficiency in a crucial growth phase can lead to irregular kernel development (e.g. wrinkled almonds) where only the tegument is fully developed (Hawker & Buttrose, 1980).
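The pre-processing chain of Section 2.4, which produces the spectra compared in this section, can be sketched in plain Python as follows. This is a minimal illustration and not the authors' R code: the function names are hypothetical, the mapping of the parameters s and w from Section 2.4 onto `segment` and `gap` is an assumption (both are 11 in the study, so the result length is unaffected), and constant scaling factors of the derivative are omitted.

```python
from statistics import median


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum."""
    n = len(spectra[0])
    ref = [sum(s[j] for s in spectra) / len(spectra) for j in range(n)]
    ref_mean = sum(ref) / n
    ref_ss = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = sum(s) / n
        # Least-squares fit s = offset + slope * ref, then invert the fit.
        slope = sum((r - ref_mean) * (v - s_mean) for r, v in zip(ref, s)) / ref_ss
        offset = s_mean - slope * ref_mean
        corrected.append([(v - offset) / slope for v in s])
    return corrected


def gap_segment_derivative(x, segment=11, gap=11):
    """First derivative: difference of two segment means separated by a gap.

    Assumption: unscaled variant; a library implementation may divide by
    additional constants.
    """
    span = 2 * segment + gap
    out = []
    for i in range(len(x) - span + 1):
        left = sum(x[i:i + segment]) / segment
        right = sum(x[i + segment + gap:i + span]) / segment
        out.append(right - left)
    return out


def bin_average(x, size=5):
    """Average contiguous groups of `size` points (last group may be shorter)."""
    return [sum(x[i:i + size]) / len(x[i:i + size]) for i in range(0, len(x), size)]


def median_spectrum(spectra):
    """Point-wise median over the replicate spectra of one sample."""
    return [median(s[j] for s in spectra) for j in range(len(spectra[0]))]
```

Under these assumptions, a spectrum of 3725 points loses 2·11 + 11 − 1 = 32 points in the derivative step (3693 remain), and averaging groups of five yields 739 variables, which agrees with the counts given in Section 2.4.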

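The repeated nested cross-validation described in Section 2.5 (5-fold outer, 10-fold inner, 20 repeats, stratified by origin) can be sketched as a split generator. This is an illustrative outline, not the authors' implementation: the helper names are hypothetical, the class sizes are inferred from the row sums of Table 2, and the grid is assumed to step in powers of ten because the text only states its end points. Training the SVM itself is left out.

```python
import random
from collections import defaultdict

# Assumption: samples per origin, inferred from the row sums of Table 2.
CLASS_SIZES = {"AU": 9, "ES": 15, "IR": 6, "IT": 22, "MA": 5, "US": 7}

# Assumption: powers of ten between the end points stated in the text.
C_GRID = [10.0 ** e for e in range(-5, 6)]
GAMMA_GRID = [10.0 ** e for e in range(-6, 0)]


def stratified_folds(labels, k, rng):
    """Split sample indices into k folds, spreading each class across folds."""
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    folds = [[] for _ in range(k)]
    pos = 0
    for lab in sorted(by_class):
        members = by_class[lab]
        rng.shuffle(members)
        for idx in members:
            folds[pos % k].append(idx)  # round-robin keeps fold sizes balanced
            pos += 1
    return folds


def nested_cv_splits(labels, outer_k=5, inner_k=10, repeats=20, seed=0):
    """Yield (repeat, train indices, test indices, inner folds) for each outer fold."""
    rng = random.Random(seed)
    for rep in range(repeats):
        outer = stratified_folds(labels, outer_k, rng)
        for t, test_idx in enumerate(outer):
            train_idx = [i for f, fold in enumerate(outer) if f != t for i in fold]
            train_labels = [labels[i] for i in train_idx]
            # Inner folds are built on the training part only and mapped
            # back to the original sample indices.
            inner = [[train_idx[j] for j in fold]
                     for fold in stratified_folds(train_labels, inner_k, rng)]
            yield rep, train_idx, test_idx, inner


LABELS = [origin for origin, n in CLASS_SIZES.items() for _ in range(n)]
```

In each outer fold, every (C, γ) pair from the grid would be scored on the inner folds, the best pair refit on the full training part, and the resulting model evaluated once on the set-aside test part.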

Table 2
Mean confusion matrix across all 20 repeats of the nested cross-validation of all sample preparation techniques used. Confusion values are given as count values, classification accuracies as percentage. Rows give the actual origin, columns the predicted origin.

| preparation (accuracy) | actual | AU | ES | IR | IT | MA | US |
|---|---|---|---|---|---|---|---|
| whole (62.6%) | AU | 4.90 | 1.70 | 0.00 | 1.90 | 0.40 | 0.10 |
| | ES | 1.35 | 11.55 | 0.15 | 1.20 | 0.70 | 0.05 |
| | IR | 0.85 | 1.55 | 1.45 | 0.00 | 0.70 | 1.45 |
| | IT | 0.70 | 1.65 | 0.00 | 19.60 | 0.05 | 0.00 |
| | MA | 0.05 | 2.95 | 1.10 | 0.10 | 0.55 | 0.25 |
| | US | 1.55 | 1.80 | 1.65 | 0.00 | 0.00 | 2.00 |
| bisected (64.5%) | AU | 6.60 | 0.05 | 0.00 | 1.10 | 0.00 | 1.25 |
| | ES | 1.65 | 7.55 | 0.00 | 5.75 | 0.00 | 0.05 |
| | IR | 0.05 | 2.00 | 2.90 | 0.20 | 0.00 | 0.85 |
| | IT | 0.10 | 2.70 | 0.20 | 18.25 | 0.00 | 0.75 |
| | MA | 0.00 | 0.00 | 0.15 | 0.00 | 4.70 | 0.15 |
| | US | 3.10 | 0.05 | 0.05 | 1.40 | 1.10 | 1.30 |
| ground (71.9%) | AU | 6.65 | 0.10 | 0.00 | 0.00 | 0.55 | 1.70 |
| | ES | 0.00 | 11.20 | 0.00 | 3.45 | 0.00 | 0.35 |
| | IR | 0.00 | 0.50 | 4.30 | 0.00 | 0.95 | 0.25 |
| | IT | 0.00 | 3.05 | 0.00 | 18.95 | 0.00 | 0.00 |
| | MA | 0.35 | 1.30 | 1.75 | 0.00 | 0.95 | 0.65 |
| | US | 1.80 | 0.40 | 0.10 | 0.20 | 0.55 | 3.95 |
| freeze-dried (80.2%) | AU | 5.45 | 2.15 | 0.00 | 0.15 | 0.00 | 1.25 |
| | ES | 1.05 | 11.35 | 0.00 | 2.05 | 0.00 | 0.55 |
| | IR | 0.05 | 0.00 | 5.70 | 0.00 | 0.25 | 0.00 |
| | IT | 0.15 | 0.95 | 0.00 | 20.90 | 0.00 | 0.00 |
| | MA | 0.00 | 0.00 | 0.10 | 0.00 | 4.90 | 0.00 |
| | US | 1.05 | 1.90 | 0.00 | 1.00 | 0.00 | 3.05 |

In prior studies, the amount of analyzed almonds was evaluated in order to exclude a potential influence of the basic population at this point. There is no consensus on the required quantity of almonds regarding the NIR screening of whole samples: in some cases one nut is analyzed per sample (Moscetti et al., 2015), while in other studies five nuts (Pannico et al., 2015) were included for different analytical approaches. Analogous to Pannico et al., five almonds per sample were measured for whole and bisected almonds, providing a good trade-off between the required work effort and an accurate representation of the analyzed almond population.

In order to verify the determination of the almonds' origin via FT-NIR spectroscopy, the data set requires extension. As more samples per group increase the statistical power, a larger data set enhances the model's robustness. Additionally, the influence of different harvest years cannot be assessed based on the present data. Since the metabolome, which is closest to the phenotype, may vary due to different climatic conditions, the harvest year has an influence on the non-targeted FT-NIR analysis (Lee et al., 2010; Richter et al., 2019). However, as the present study is focused on the optimization of the sample preparation, a comparability is preferred with samples of one harvest period.

3.3. Comprehensive comparison of the different sample preparation techniques

Apart from the classification accuracy, there remain other parameters of fundamental importance for the proper selection of the most appropriate preparation technique. A summary of these parameters can be obtained from Table 3 which additionally presents the required work time – both active and passive – and the required sample quantity.

As the feasibility, short processing time and low overall required work effort are highly advantageous – especially from an economic point of view – the analysis of whole and bisected almonds prima facie clearly stands out. However, this is not always the only relevant concern. For example, while ground and freeze-dried material from a sufficient amount of almonds (here min. 100 g) might enable future precise analyses of almond admixtures, using whole or bisected almonds is less suitable for such purposes. Another benefit of grinding as sample preparation is that the subsequent analysis is easily transferred to almond flour, which is being consumed more and more in recent years (FMI, 2018).

While freeze-drying of the almonds leads to the most accurate classification, it is also the most time-consuming sample preparation. Nevertheless, the lyophilization time implemented in this study (48 h) could be reduced by using less than 60 g of sample material. If very little time is available for the analysis, using whole almonds might be the most reasonable method, e.g. for inspecting incoming material. Apart from lower accuracies, analysis of whole almonds has one additional major setback: due to the naturally present water content in whole almonds, obtained spectra show broad absorbance bands at wavenumbers of about 5155 cm⁻¹ and 6880 cm⁻¹. As these water-related peaks could show different absorbance values depending on the geographical origin (see Fig. 2), they might alter the obtained results as water content may vary within the same sample due to variety, storage or transport. This issue could be mitigated by the exclusion of the water bands (6500–5500 cm⁻¹) during the data processing according to Teye et al. (Teye, Huang, Dai, & Chen, 2013). Yet, the loss of information reduces the obtained classification accuracy to around 54.6% (±3.5%) and thus the sensitivity of the method (see Table S2).

In conclusion, despite the high practical effort, the most promising sample preparation technique for the determination of the geographical origin of almonds turned out to be freeze-drying subsequent to grinding. Classification of freeze-dried almonds is more robust, most likely since varying water contents, e.g. caused by transport conditions, can no longer affect the spectra. Nonetheless, the analysis of whole or bisected almonds is more suitable for an initial rapid screening. This work unprecedentedly shows the significant impact of comprehensive

Table 3
Comprehensive comparison of the four sample preparations regarding work time (active and passive), sample quantity and classification accuracy via support vector machine (SVM).

                                                          whole      bisected   ground                                                  freeze-dried
estimated active work time (measurement and preparation)  10 min     10 min     30 min                                                  50 min (including treatments during freeze-drying)
estimated passive work time (storage or freeze-drying)    –          –          24 h (for sublimation of dry ice) and min. 4 h thawing  48 h (freeze-drying) and min. 4 h thawing
sample quantity                                           5 almonds  5 almonds  min. 100 g                                              min. 100 g
classification accuracy via SVM                           62.6%      64.5%      71.9%                                                   80.2%
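
The exclusion of the water bands (6500–5500 cm⁻¹) mentioned in the discussion amounts to dropping all spectral points inside that wavenumber window before classification. The following is a minimal Python sketch of that masking step only; it is not the authors' implementation (the reference list points to R packages), and the function name `drop_band`, the toy spectrum and the choice to treat the window limits as inclusive are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Window quoted in the text (6500-5500 cm^-1). Treating both limits as
# inclusive is an assumption; the source does not specify this.
WATER_BAND = (5500.0, 6500.0)


def drop_band(
    wavenumbers: Sequence[float],
    absorbance: Sequence[float],
    band: Tuple[float, float] = WATER_BAND,
) -> Tuple[List[float], List[float]]:
    """Return the spectrum without the points whose wavenumber lies in `band`."""
    if len(wavenumbers) != len(absorbance):
        raise ValueError("wavenumbers and absorbance must have the same length")
    low, high = min(band), max(band)
    # Keep only the points that fall outside the excluded window.
    kept = [(w, a) for w, a in zip(wavenumbers, absorbance) if not (low <= w <= high)]
    return [w for w, _ in kept], [a for _, a in kept]


# Hypothetical toy spectrum (values invented for illustration only).
wavenumbers = [7000.0, 6500.0, 6000.0, 5500.0, 5000.0, 4500.0]
absorbance = [0.31, 0.35, 0.42, 0.38, 0.55, 0.61]

masked_w, masked_a = drop_band(wavenumbers, absorbance)
print(masked_w)  # [7000.0, 5000.0, 4500.0]
print(masked_a)  # [0.31, 0.55, 0.61]
```

Masking in this way removes the variables entirely instead of setting them to zero, so the classifier sees a shorter feature vector. As the text notes, the information lost with these variables lowered the reported classification accuracy.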


optimization of the sample preparation prior to FT-NIR screening.

In order to facilitate novel food profiling approaches like determination of the almond variety, new considerations and additional tests are required. Since the composition of the tegument, especially its thickness, is strongly influenced by the genotype (Hawker & Buttrose, 1980), it is conceivable that the analysis of whole almonds would be more suitable in this case. However, as such applications are yet to be developed, this work might serve as a starting point for promising analytical works in the future.

4. Conclusions

Four sample preparation techniques have been compared in terms of their suitability for geographical origin determination of almonds. In addition to the multivariate classification via SVM, the work time and the mandatory sample size of each preparation are also discussed and evaluated. Despite the considerable work and time involved, freeze-drying in combination with prior grinding turned out to be the most reliable preparation technique for origin determination of almonds.
Due to the highly similar composition of other nuts or seeds, this procedure could possibly be transferred to these matrices after a manageable adaptation and validation. The present study shows that FT-NIR screening is suitable for an initial rapid determination of the origin of almonds, for example during incoming goods inspections. In order to improve the significance of the results, further investigations should be carried out. In this context, further work would be important to shed light on the influence of the harvest year and, in addition, other countries of origin, e.g. France, Germany or China, would round off the picture.

CRediT authorship contribution statement

Maike Arndt: Conceptualization, Investigation, Methodology, Writing - original draft. Marc Rurik: Methodology, Software, Formal analysis, Writing - original draft, Visualization. Alissa Drees: Investigation, Writing - review & editing. Katharina Bigdowski: Investigation, Writing - review & editing. Oliver Kohlbacher: Conceptualization, Writing - review & editing, Supervision. Markus Fischer: Conceptualization, Resources, Writing - review & editing, Supervision.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

In particular, we would like to thank Marie Sophie Oberpottkamp, Christian Ahlers and Kerstin Blum for their support in sample preparation. We also want to thank Doreen Teske and Nils Neumann for the help in sample acquisition. Additionally, we thank Bernadette Richter, Torben Segelke and Tilman Eckert for providing expertise.
This work was funded by the University of Hamburg and was strongly influenced by ideas developed within the scientific joint project “Food Profiling” (funding code: 2816500914) funded since 2016 by means of the Federal Ministry of Food and Agriculture (BMEL) by a decision of the German Bundestag (parliament). Project support is provided by the Federal Institute for Agriculture and Food (BLE) within the scope of the program for promoting innovation. The focus of Food Profiling lies on developments in the area of instrumental analysis for the authentication of foodstuffs. Therefore, we would like to thank all working groups involved in this project as well as the financial supporter.

Abbreviations

AU          Australia
ES          Spain
FT-NIR      Fourier transform near-infrared
IT          Italy
IR          Iran
MA          Morocco
MSC         multiplicative scatter correction
PC          principal component
PCA         principal component analysis
RBF         radial basis function
sd          standard deviation
SVM         support vector machine
US or USA   United States of America

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.foodcont.2020.107302.

References

Amorello, D., Orecchio, S., Pace, A., & Barreca, S. (2016). Discrimination of almonds (Prunus dulcis) geographical origin by minerals and fatty acids profiling. Natural Product Research, 30(18), 2107–2110.
Armenta, S., Garrigues, S., & de la Guardia, M. (2008). Green analytical chemistry. TRAC Trends in Analytical Chemistry, 27(6), 497–511.
Biancolillo, A., De Luca, S., Bassi, S., Roudier, L., Bucci, R., Magrì, A. D., et al. (2018). Authentication of an Italian PDO hazelnut (“Nocciola Romana”) by NIR spectroscopy. Environmental Science and Pollution Research, 28780–28786.
Blackwell, J. (1977). Infrared and Raman spectroscopy of cellulose. ACS Publications, 14, 206–218.
Buijs, K., & Choppin, G. (1963). Near‐infrared studies of the structure of water. I. Pure Water. The Journal of Chemical Physics, 39(8), 2035–2041.
Chang, C.-C., & Lin, C.-J. (2011). Libsvm: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2(3), 27.
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297.
Cozzolino, D. (2016). Near infrared spectroscopy and food authenticity. Advances in food traceability techniques and technologies. Woodhead Publishing.
Firmani, P., Bucci, R., Marini, F., & Biancolillo, A. (2019). Authentication of “Avola almonds” by near infrared (NIR) spectroscopy and chemometrics. Journal of Food Composition and Analysis, 82, 103235.
FMI (2018). Almond flour processors to capitalize on the gluten-free trend. Almond flour market - Global production-Supply analysis 2019.
Gałuszka, A., Migaszewski, Z., & Namieśnik, J. (2013). The 12 principles of green analytical chemistry and the SIGNIFICANCE mnemonic of green analytical practices. TRAC Trends in Analytical Chemistry, 50, 78–84.
Gu, X., Zhang, L., Li, L., Ma, N., Tu, K., Song, L., et al. (2018). Multisource fingerprinting for region identification of walnuts in Xinjiang combined with chemometrics. Journal of Food Process Engineering, e12687.
Hawker, J., & Buttrose, M. (1980). Development of the almond nut (Prunus dulcis (Mill.) DA Webb). Anatomy and chemical composition of fruit parts from anthesis to maturity. Annals of Botany, 46(3), 313–321.
Klockmann, S., Reiner, E., Bachmann, R., Hackl, T., & Fischer, M. (2016). Food fingerprinting: Metabolomic approaches for geographical origin discrimination of hazelnuts (corylus avellana) by UPLC-QTOF-MS. Journal of Agricultural and Food Chemistry, 64(48), 9253–9262.
Krstajic, D., Buturovic, L. J., Leahy, D. E., & Thomas, S. (2014). Cross-validation pitfalls when selecting and assessing regression and classification models. Journal of Cheminformatics, 6(1), 10.
Lee, J.-E., Lee, B.-J., Chung, J.-O., Hwang, J.-A., Lee, S.-J., Lee, C.-H., et al. (2010). Geographical and climatic dependencies of green tea (camellia sinensis) metabolites: A 1H NMR-based metabolomics study. Journal of Agricultural and Food Chemistry, 58(19), 10582–10589.
Manning, L. (2016). Food fraud: Policy and food chain. Current Opinion in Food Science, 10, 16–21.
Meyer, D., Dimitriadou, E., Hornik, K., Weingessel, A., Leisch, F., et al. (2012). Package ‘e1071’. The Comprehensive R Archive Network. http://cran.r-project.org/web/packages/e1071/index.html, Accessed date: 15 November 2019.
Moscetti, R., Radicetti, E., Monarca, D., Cecchini, M., & Massantini, R. (2015). Near infrared spectroscopy is suitable for the classification of hazelnuts according to Protected Designation of Origin. Journal of the Science of Food and Agriculture, 95(13), 2619–2625.
Nakariyakul, S. (2014). Internal damage inspection of almond nuts using optimal near-infrared waveband selection technique. Journal of Food Engineering, 126, 173–177.
Norris, K. H., & Williams, P. C. (1984). Optimization of mathematical treatments of raw near-infrared signal in the measurement of protein in hard red spring wheat. I. Influence of particle size. Cereal Chemistry, 61(2), 158–165.
Pannico, A., Schouten, R., Basile, B., Romano, R., Woltering, E., & Cirillo, C. (2015). Non-destructive detection of flawed hazelnut kernels and lipid oxidation assessment using NIR spectroscopy. Journal of Food Engineering, 160, 42–48.
Porep, J. U., Kammerer, D. R., & Carle, R. (2015). On-line application of near infrared (NIR) spectroscopy in food production. Trends in Food Science & Technology, 46(2), 211–230.
R Core Team (2019). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. http://www.r-project.org/, Accessed date: 25 November 2019.
Richter, B., Rurik, M., Gurk, S., Kohlbacher, O., & Fischer, M. (2019). Food monitoring: Screening of the geographical origin of white asparagus using FT-NIR and machine learning. Food Control, 104, 318–325.
Rinnan, Å., Van Den Berg, F., & Engelsen, S. B. (2009). Review of the most common pre-processing techniques for near-infrared spectra. TRAC Trends in Analytical Chemistry, 28(10), 1201–1222.
Rodriguez-Saona, L. E., Giusti, M. M., & Shotts, M. (2016). Advances in infrared spectroscopy for food authenticity testing. Advances in food authenticity testing. Woodhead Publishing.
Ruggeri, S., Cappelloni, M., Gambelli, L., Nicoli, S., & Carnovale, E. (1998). Chemical composition and nutritive value of nuts grown in Italy. Italian Journal of Food Science, 3, 243–252.
Shenk, J. S., Workman, J. J., & Westerhaus, M. O. (2001). Application of NIR spectroscopy to agricultural products. Practical Spectroscopy Series, 27, 419–474.
Socias, R., Kodad, O., Alonso, J., & Gradziel, T. (2007). Almond quality: A breeding perspective. Horticultural Reviews, 34, 197–238.
Stevens, A., & Ramirez-Lopez, L. (2014). An introduction to the prospectr package. R Package Vignette, Report No.: R Package Version 0.1.3.
Teye, E., Huang, X., Dai, H., & Chen, Q. (2013). Rapid differentiation of Ghana cocoa beans by FT-NIR spectroscopy coupled with multivariate classification. Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, 114, 183–189.
UN Comtrade Database (2018). Almond (shelled) export trade value and netweight. New York, USA: United Nations Publications Board. https://comtrade.un.org/data/, Accessed date: 2 November 2019.
Varma, S., & Simon, R. (2006). Bias in error estimation when using cross-validation for model selection. BMC Bioinformatics, 7(1), 91.
Vitale, R., Bevilacqua, M., Bucci, R., Magrì, A. D., Magrì, A. L., & Marini, F. (2013). A rapid and non-invasive method for authenticating the origin of pistachio samples by NIR spectroscopy and chemometrics. Chemometrics and Intelligent Laboratory Systems, 121, 90–99.
Weyer, L., & Workman, J., Jr. (2012). Practical guide and spectral atlas for interpretive near-infrared spectroscopy. CRC Press.
Yada, S., Lapsley, K., & Huang, G. (2011). A review of composition studies of cultivated almonds: Macronutrients and micronutrients. Journal of Food Composition and Analysis, 24(4–5), 469–480.
Zhang, H., Jiang, H., Liu, G., Mei, C., & Huang, Y. (2017). Identification of Radix puerariae starch from different geographical origins by FT-NIR spectroscopy. International Journal of Food Properties, 20(sup2), 1567–1577.
