Microchemical Journal
journal homepage: www.elsevier.com/locate/microc
Article info

Article history:
Received 29 November 2011
Received in revised form 27 February 2012
Accepted 9 March 2012
Available online 16 March 2012

Keywords:
Banknotes
Counterfeits
Raman spectroscopy
Chemometric
Uncertainty estimation

Abstract

In the present work, Raman spectroscopy and chemometric tools were explored as an analytical method to discriminate authentic and counterfeit Real banknotes. The analysis was based on the characterization of the inks used to produce the banknotes. Multivariate analysis was required for data analysis, since the colors present in the banknotes are a mixture of pigments and the Raman spectra are complex and not totally resolved. Original and counterfeit R$ 50 banknotes were analyzed by Raman spectroscopy without any sample preparation and three different areas were selected for study: chalcographic, orange and red inks. In this study, only the results for the chalcographic ink are presented. The classification method PLS-DA was employed to discriminate authentic and counterfeit banknotes, as well as the counterfeit type. The reliability of the results was calculated using the re-sampling bootstrap technique. The samples classified as counterfeit banknotes by the PLS-DA model had been apprehended by local authorities and classified as fake by classical forensic approaches, based on sensory tests and optical inspection by a specialist. PLS-DA was used to develop a procedure that could be used by non-specialist operators and can also analyze new samples of R$ 50 banknotes, classifying them with reliability and estimating uncertainty. In the proposed method all fake and not fake banknotes used to validate the analysis were correctly classified. The procedure could be used as a complementary method to classical forensic inspection, offering fast, non-destructive, robust analyses with the possibility of in situ analysis using a portable instrument.
© 2012 Elsevier B.V. All rights reserved.
0026-265X/$ – see front matter © 2012 Elsevier B.V. All rights reserved.
doi:10.1016/j.microc.2012.03.006
M.R. de Almeida et al. / Microchemical Journal 109 (2013) 170–177 171
lack of adaptations and validation of developed models. Robust estimates of the accuracy and precision of results are crucial for any method to be adopted in routine forensic analysis [15].

Metrological activities are fundamental to ensure the quality of scientific and forensic activities. Measurement results must be valid, comparable, and reproducible, and their uncertainties are the quantitative expression of their quality. In accordance with the ISO/IEC 17025:2005 standard [16], all calibration or testing laboratories must have and apply procedures to evaluate uncertainty in measurements. Establishing fitness-for-purpose is necessary before analytical results can be relied on for important legal decisions. This is particularly true in forensic measurements for identifying counterfeit banknotes.

Due to the importance of uncertainty estimation in analytical data, multiple proposals have been published in the literature to estimate uncertainty in multivariate analysis: linearization-based methods, re-sampling methods, and U-deviation, among others [17]. However, few of these have been applied to determine the uncertainty of multivariate calibration models, such as Partial Least Squares (PLS) [18–21]. Olivieri et al. [22] discuss in a review paper the principal methods for uncertainty estimation of multivariate calibration.

The number of applications of pattern recognition methods, such as Principal Component Analysis (PCA), Partial Least Squares for Discriminant Analysis (PLS-DA) and Soft Independent Modeling of Class Analogy (SIMCA), in the literature is vast. However, most studies only attribute object classifications; only a few papers evaluate the uncertainty estimation in these pattern recognition methods. Approaches to uncertainty estimation in unsupervised (PCA) and supervised (PLS-DA and SIMCA) techniques have been reported by Preisner et al. [23] for discrimination among pathogenic microorganisms. In this case, the authors implemented different re-sampling methodologies, jackknife and bootstrap, to assess the uncertainty of bacteria discrimination models using infrared data. The re-sampling methods generated new data sets from the available one by an artificial perturbation [20]. From this new data set, the unknown distribution of a parameter could be estimated by mimicking the random mechanism through re-sampling of the data set.

In this work we propose the use of Raman spectroscopy, an analytical technique with great potential for investigations of forensic cases, for the characterization of the inks used to produce authentic and counterfeit Real banknotes. Multivariate analysis was required, since the colors of the banknotes come from a mixture of pigments and the Raman spectra are complex and not resolved. The classification method PLS-DA was employed for this purpose, with evaluation of the reliability of the results using the re-sampling bootstrap technique. The main advantage of PLS-DA is that the sources of variability in the data are modeled by latent variables; the associated PLS scores are then calculated and plotted pairwise, allowing a visual assessment of group separation [24]. The model also calculates the probability of a sample belonging to the class being modeled. While the SIMCA classification method provides good results in the development of classification models only when classes are well defined by the PCA, this is not necessary in the PLS-DA method; SIMCA also requires more development time to optimize designs for each class.

2. Theory

2.1. Partial Least Squares Discriminant Analysis (PLS-DA)

PLS-DA is considered a supervised classification method, which requires prior knowledge of the classes of the sample set. The classes are defined based on a priori information of the system or by an exploratory analysis, for example, using Principal Component Analysis. Barker and Rayens [24] compared PLS-DA with LDA (Linear Discriminant Analysis) and presented some advantages of PLS-DA such as the selection of variables and noise reduction.

The PLS-DA model is developed from algorithms for Partial Least Squares (PLS) regression. PLS is an inverse multivariate calibration method which seeks a direct relationship between the instrumental response and the property of interest (qualitative and quantitative). The two data matrices are: matrix X (N×J) representing the instrumental response, where N is the number of samples and J the number of variables; and matrix Y (N×M), which corresponds to the property of interest, with M being the number of properties. These matrices are decomposed by factors (or latent variables), in order to reduce the size of the data:

$$X = TP^{T} + E \tag{1}$$

$$Y = UQ^{T} + F \tag{2}$$

where T and U are the score matrices containing orthogonal rows; P is the loading matrix of X; E is the residue of the X matrix; Q is the loading matrix of Y and F is the error for the Y matrix.

In PLS there is a compromise between the explanation of the variance in X and its correlation with Y. Usually the number of latent variables is small. The T scores are orthogonal and estimated as a linear combination of the original variables with weighting coefficients.

$$T = XW^{*} \tag{3}$$

The T scores are good predictors of Y, assuming that Y and X are modeled by the same latent variables.

$$Y = TQ^{T} + F \tag{4}$$

The Y residues, F, express the deviations between the observed and modeled response.

Eqs. (3) and (4) can be rewritten as:

$$Y = XW^{*}Q^{T} + F \tag{5}$$

$$Y = X\beta + F \tag{6}$$

The regression coefficient, β, can be written as:

$$\beta = W^{*}Q^{T} \tag{7}$$

$$W^{*} = W\left(P^{T}W\right)^{-1} \tag{8}$$

where $W^{*}$ is defined in terms of a set of weighting loadings, W, that maximize the covariance between X and Y [25]. More detail about PLS regression is given by Wold et al. [26].

As PLS-DA is a classification method, the matrix or vector Y (property of interest) is coded as 0 or 1 when there are two classes (C = 2). For more than two classes, one can build several models with 0 and 1 encoding, or use the PLS2 algorithm by constructing a matrix (N×C), where each column represents a class [27].

A fundamental step in building a PLS-DA model is the determination of the correct number of latent variables. This choice is commonly made by cross-validation of the calibration samples, where some samples are separated into a validation set and the models are built with the others. The prediction errors are calculated for the samples that were separated, using different numbers of latent variables. The process is repeated until all samples have been predicted.

The value obtained by the PLS-DA model is a number given by Eq. (5), which is not exactly 0 or 1. Thus it is necessary to establish threshold values to define the class limits. The threshold is estimated in many routines by the Bayesian theorem [27] or by establishing confidence limits for each object classified. These confidence intervals can be calculated by re-sampling techniques, such as bootstrap.
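The two-class coding and thresholding described in Section 2.1 can be illustrated with a minimal sketch. It assumes a single response (PLS1) fitted by the NIPALS algorithm in plain Python; the toy data, the function names `pls1_fit`, `pls1_predict` and `classify`, and the fixed 0.5 threshold are illustrative assumptions, not the authors' implementation (the paper estimates the threshold with Bayes' theorem). Prediction is done by sequential deflation rather than by forming the regression vector of Eqs. (6) and (7) explicitly.

```python
# Minimal PLS1 sketch with 0/1 class coding (pure Python, illustrative only).

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def column_means(X):
    n = len(X)
    return [sum(row[j] for row in X) / n for j in range(len(X[0]))]

def pls1_fit(X, y, n_lv):
    """Fit PLS1 by NIPALS; returns the centring terms and (w, p, q) per latent variable."""
    x_mean = column_means(X)
    y_mean = sum(y) / len(y)
    E = [[v - m for v, m in zip(row, x_mean)] for row in X]  # centred X, deflated below
    f = [v - y_mean for v in y]                              # centred y, deflated below
    comps = []
    for _ in range(n_lv):
        # Weights: direction in X with maximal covariance with y.
        w = [dot([row[j] for row in E], f) for j in range(len(x_mean))]
        norm = dot(w, w) ** 0.5
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]                       # scores on the deflated X
        tt = dot(t, t)
        p = [dot([row[j] for row in E], t) / tt for j in range(len(w))]  # X loadings
        q = dot(f, t) / tt                                   # y loading
        # Remove the part of X and y explained by this latent variable.
        E = [[e - ti * pj for e, pj in zip(row, p)] for row, ti in zip(E, t)]
        f = [fi - ti * q for fi, ti in zip(f, t)]
        comps.append((w, p, q))
    return x_mean, y_mean, comps

def pls1_predict(model, x):
    """Predict the (continuous) class value for one sample."""
    x_mean, y_mean, comps = model
    e = [v - m for v, m in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in comps:
        t = dot(e, w)
        y_hat += t * q
        e = [ei - t * pj for ei, pj in zip(e, p)]
    return y_hat

def classify(y_hat, threshold=0.5):
    # The 0.5 threshold is an assumption made for this sketch.
    return 1 if y_hat >= threshold else 0

# Toy "spectra": three variables, class 0 = counterfeit, class 1 = authentic.
X = [[1.0, 0.2, 0.1], [0.9, 0.3, 0.2], [1.1, 0.1, 0.1],
     [0.2, 1.0, 0.3], [0.3, 0.9, 0.2], [0.1, 1.1, 0.3]]
y = [0, 0, 0, 1, 1, 1]

model = pls1_fit(X, y, n_lv=1)
predicted_classes = [classify(pls1_predict(model, row)) for row in X]
```

The predicted values are continuous numbers scattered around 0 and 1, which is why a threshold (or a confidence interval per sample) is needed before a class can be assigned.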
2.2. Bootstrap

Bootstrap [28,29] is a generalization of the ideas behind cross-validation, a simple and trustworthy method to estimate prediction errors. Again, the idea is to generate multiple data sets that, after analysis, shed light on the variability of the statistics of interest as a result of different training set compositions.

Bootstrap was introduced by Efron in 1979 [30] to estimate confidence intervals for certain parameters that could not be obtained by other techniques, mainly when the number of samples was small. Although more than three decades have passed, only a few papers in the chemistry area use the bootstrap method [17–20,23].

There are two types of bootstrap to estimate the uncertainty in

$$\mathrm{MSEC} = \frac{\sum \left(Y - Y_{PLS}\right)^{2}}{N} \tag{12}$$

$$\mathrm{MSECV} = \frac{\sum \left(Y - Y_{PLS\text{-}CV}\right)^{2}}{N} \tag{13}$$

where $Y_{PLS}$ is the value predicted by the PLS model and $Y_{PLS\text{-}CV}$ is the value predicted by internal validation.

The bootstrap samples are generated by random sampling, with replacement, of the corrected values. The bootstrap residues (F*) are added to the $Y_{PLS}$ values, generating a matrix Y*:

$$Y^{*} = Y_{PLS} + F^{*} \tag{14}$$

A new PLS model can be calculated from Y*, giving the bootstrap regression coefficient (β*), which allows the calculation of the new values of Ŷ*, and then new residuals are calculated:

$$\hat{F}^{*} = Y_{PLS} - \hat{Y}^{*} \tag{15}$$

The quantiles of the distribution of $\hat{F}^{*}$ are used to estimate the confidence interval. For the confidence interval of each sample, the percentile method was used; the confidence intervals are asymmetric and specific, and the upper and lower limits are defined by:

$$\hat{F}^{*}_{\beta\,\alpha/2} \le y_{a} \le \hat{F}^{*}_{\beta\,(1-\alpha/2)} \tag{16}$$

Fig. 1. Illustrative Brazilian R$ 50 banknotes with areas analyzed by Raman spectroscopy.
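The residual bootstrap of Section 2.2 can be sketched in a few lines. This is only an illustration under stated assumptions: a straight-line least-squares fit stands in for the PLS model, the raw residuals are resampled without the correction mentioned in the text, and the percentile interval is taken directly on the bootstrap predictions for each sample. The names `fit_line`, `predict`, `percentile` and `residual_bootstrap_ci` and the toy data are hypothetical.

```python
import random

def fit_line(x, y):
    """Ordinary least-squares line; a stand-in for the PLS model (assumption)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

def predict(model, x):
    slope, intercept = model
    return [slope * a + intercept for a in x]

def percentile(values, p):
    """Empirical quantile by linear interpolation, with 0 <= p <= 1."""
    s = sorted(values)
    pos = p * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])

def residual_bootstrap_ci(x, y, n_boot=500, alpha=0.05, seed=0):
    """Percentile confidence interval for each fitted value."""
    rng = random.Random(seed)
    y_fit = predict(fit_line(x, y), x)                  # plays the role of Y_PLS
    residuals = [b - f for b, f in zip(y, y_fit)]       # calibration residuals F
    replicates = [[] for _ in x]
    for _ in range(n_boot):
        f_star = [rng.choice(residuals) for _ in x]     # resample with replacement
        y_star = [f + e for f, e in zip(y_fit, f_star)] # Y* = Y_PLS + F*, as in Eq. (14)
        y_hat_star = predict(fit_line(x, y_star), x)    # refit and predict again
        for reps, value in zip(replicates, y_hat_star):
            reps.append(value)
    return [(percentile(r, alpha / 2), percentile(r, 1 - alpha / 2))
            for r in replicates]

# Toy data, for illustration only.
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0.05, 0.10, 0.30, 0.35, 0.60, 0.70, 0.85, 1.02]
intervals = residual_bootstrap_ci(x, y)
```

Each sample gets its own lower and upper limit, and the limits need not be symmetric around the fitted value, which is the property the percentile method is chosen for.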
the X (Raman intensities) and Y (class) blocks, and the Raman inten-
sities were normalized with SNV (Standard Normal Variate). First,
Principal Component Analysis was employed for exploratory
analyses and the number of principal components was selected
based on the captured variance. For classification analyses, PLS-DA
was applied; the data set was randomly split into two subsets for
training and validation. A dummy matrix Y was created with 0 for
the street counterfeit samples and 1 for the authentic banknotes in
the first model. Other models were developed for the homemade
counterfeit banknotes and classified according to printer type. The
number of latent variables for PLS-DA models was chosen by leave-
one-out cross-validation. The threshold for the class was calculated
by Bayes' Theorem employing the plsthres function present in PLS
Toolbox software, version 4.2.1, from Eigenvector Research [27].
The confidence interval estimations for each sample were obtained
with bootstrap residual [33], according to the flowchart shown in
Fig. 2. For calculation of uncertainty, the number of pseudo degrees
of freedom [31] was estimated using Eq. (11).
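The SNV normalization mentioned above treats each spectrum independently: the mean intensity of the spectrum is subtracted and the result is divided by the standard deviation of that spectrum. A minimal sketch follows; the function name `snv`, the toy intensities and the use of the sample standard deviation (n − 1 in the denominator) are assumptions made for illustration.

```python
from statistics import mean, stdev

def snv(spectrum):
    """Standard Normal Variate: centre one spectrum and scale it to unit standard deviation."""
    m = mean(spectrum)
    s = stdev(spectrum)  # sample standard deviation (n - 1), an assumption of this sketch
    return [(v - m) / s for v in spectrum]

# Two toy spectra with the same shape but a tenfold difference in intensity.
spectra = [[10.0, 12.0, 15.0, 11.0], [100.0, 120.0, 150.0, 110.0]]
normalized = [snv(s) for s in spectra]
```

After SNV the two toy spectra coincide, which shows how the transformation removes multiplicative intensity differences between measurements before modeling.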
Fig. 3. Raman spectra of the analyzed areas of R$ 50 banknotes: chalcographic ink (A);
orange ink (B) and red ink (C).
4. Results and discussion
Fig. 5. Scores plot of PC1 × PC2 × PC4 of the PCA analysis of authentic, street counterfeit and lab-made counterfeit banknotes.
Fig. 6. Graph of the loadings of (A) PC 1, (B) PC 2 and (C) PC 4 versus wavenumbers (variables) for the PCA model.
Four latent variables were selected; they explain 88.8% of the variance in X and 98.9% of the variance in Y, and give the lowest RMSECV, equal to 0.0644. The predicted values from the model were 0 for counterfeit samples and 1 for original samples. The RMSEC (Root Mean Squared Error of Calibration) value for the calibration set was 0.0589, calculated according to Eq. (18):
$$\mathrm{RMSEC} = \sqrt{\frac{\sum_{i=1}^{I}\left(y_{ref} - y_{cal}\right)^{2}}{N - DF}} \tag{18}$$
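As a worked illustration of Eq. (18), the short sketch below computes RMSEC for a handful of made-up reference and calibration values. The function name `rmsec`, the toy numbers and the value DF = 2 are assumptions chosen only to make the arithmetic easy to follow; they are not taken from the paper.

```python
from math import sqrt

def rmsec(y_ref, y_cal, n_df):
    """RMSEC as in Eq. (18): n_df is the DF term subtracted from the number of samples N."""
    n = len(y_ref)
    sse = sum((r - c) ** 2 for r, c in zip(y_ref, y_cal))  # sum of squared residuals
    return sqrt(sse / (n - n_df))

# Toy class codes (0/1) and hypothetical model outputs.
y_ref = [0, 0, 1, 1, 1, 0]
y_cal = [0.1, -0.1, 0.9, 1.1, 1.0, 0.0]
value = rmsec(y_ref, y_cal, n_df=2)
```

Here four residuals of magnitude 0.1 give a squared-error sum of 0.04; dividing by N − DF = 4 and taking the square root yields an RMSEC of about 0.1.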
Acknowledgments
The authors thank CAPES and INCTBio for financial support and the Technical–Scientific Police Superintendency of the State of São Paulo, Brazil, for providing the street counterfeit banknotes.
References
[25] H. Martens, M. Martens, Modified jack-knife estimation of parameter uncertainty in bilinear modelling by partial least squares regression (PLSR), Food Qual. Prefer. 11 (2000) 5–16.
[26] S. Wold, M. Sjöström, L. Eriksson, PLS-regression: a basic tool of chemometrics, Chemom. Intell. Lab. Syst. 58 (2001) 109–130.
[27] B.M. Wise, N.B. Gallagher, R. Bro, J.M. Shaver, W. Windig, R.S. Koch, Chemometrics Tutorial for PLS_Toolbox and Solo, Eigenvector Research, Inc., 3905 West Eaglerock Drive, Wenatchee, WA 98801 USA, 2006.
[28] B. Efron, R.J. Tibshirani, An Introduction to the Bootstrap, Chapman and Hall, New York, 1993.
[29] A.C. Davison, D.V. Hinkley, Bootstrap Methods and Their Applications, Cambridge University Press, Cambridge, 1997.
[30] B. Efron, Bootstrap methods: another look at the jackknife, Ann. Stat. 7 (1979) 1–26.
[31] H. van der Voet, Pseudo-degrees of freedom for complex predictive models: the example of partial least squares, J. Chemom. 13 (1999) 195–208.
[32] R. Wehrens, H. Putter, L.M.C. Buydens, The bootstrap: a tutorial, Chemom. Intell. Lab. Syst. 54 (2000) 35–52.
[33] A.M. Zoubir, B. Boashash, The bootstrap and its application in signal processing, IEEE Signal Process. Mag. 15 (1998) 55–76.
[34] K.W.C. Poon, I.R. Dadour, A.J. McKinley, In situ chemical analysis of modern organic tattooing inks and pigments by micro-Raman spectroscopy, J. Raman Spectrosc. 39 (2008) 1227–1237.
[35] D.R. Tackley, G. Dent, W.E. Smith, Phthalocyanines: structure and vibrations, Phys. Chem. Chem. Phys. 3 (2001) 1419–1426.