
Chemometrics and Intelligent Laboratory Systems 64 (2002) 169 – 179

www.elsevier.com/locate/chemometrics

Uncertainty estimation for multivariate regression coefficients


Nicolaas (Klaas) M. Faber
Department of Production and Control Systems, ATO, PO Box 17, 6700 AA Wageningen, The Netherlands
Received 18 June 2002; received in revised form 22 August 2002; accepted 27 August 2002

Abstract

Five methods are compared for assessing the uncertainty in multivariate regression coefficients, namely, an approximate
variance expression and four resampling methods (jack-knife, bootstrapping objects, bootstrapping residuals, and noise
addition). The comparison is carried out for simulated as well as real near-infrared data. The calibration methods considered are
ordinary least squares (simulated data), partial least squares regression, and principal component regression (real data). The
results suggest that the approximate variance expression is a viable alternative to resampling.
© 2002 Elsevier Science B.V. All rights reserved.

Keywords: Multivariate calibration; Regression vector; Uncertainty estimation; Resampling; Jack-knife; Bootstrap; Monte Carlo simulation;
OLS; PLSR; PCR; NIR

E-mail address: n.m.faber@ato.wag-ur.nl (N.M. Faber).

0169-7439/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved.
PII: S0169-7439(02)00102-8

1. Introduction

Typically, applications of multivariate models are concerned with the prediction of a property of interest. To achieve an acceptable predictive ability, the uncertainty in the model parameters, i.e., the regression coefficients, should not be too large. In keeping with this principle, Centner et al. [1] eliminated variables for which the regression coefficients carry a relatively large uncertainty. They used jack-knifing to estimate this uncertainty when partial least squares regression (PLSR) is used for calibration. In the chemometrics literature, two alternatives to the jack-knife have been proposed for assessing the uncertainty in multivariate regression coefficients. Wehrens and Van der Linden [2] used the bootstrap in connection with principal component regression (PCR). By contrast, Faber and Kowalski [3] derived approximate variance expressions for PLSR and PCR (see their Sections 3.3.3, 3.3.4, and 3.3.5). These expressions account for all sources of measurement error and accommodate heteroskedastic as well as correlated noise.

The jack-knife and bootstrap are resampling methods [4]. Briefly, resampling amounts to generating new data sets from the available one by introducing an artificial perturbation. The desired uncertainty estimate follows from the spread in the results obtained for the new data sets. This approach essentially assumes that the artificial perturbation mimics the effect of the real perturbation already present in the original data set. Another resampling method, which has received less attention in the chemometrics literature, is the noise addition method. Excellent discussions of this method are available. Press et al. [5] treat confidence limits on estimated model parameters, whereas Carrol et al. [6] focus on bias estimation. It was used by Derks et al. [7] to assess the uncertainty in the output of artificial neural networks (ANNs), while Duewer et al. [8] and Dable and Booksh [9] reported successful application to pseudo-rank estimation. The versatility of noise addition is further illustrated by the work of del Río et al. [10], who used the method to validate expression-based prediction intervals in linear regression with errors on both axes.

The purpose of this study is to investigate the relative merits of various resampling methods and an approximate variance expression. The resampling methods under investigation are the jack-knife, bootstrapping objects, bootstrapping residuals, and noise addition. The different approaches are compared for simulated as well as real near-infrared (NIR) data. The simulated data are modeled using ordinary least squares (OLS). This allows one to study the methods under idealized circumstances. For example, the approximate variance expression specializes to a well-known exact one under these circumstances. It is believed that these simulations yield insight that can be used to better interpret the results obtained for real NIR data modeled by PLSR or PCR.

2. Theory

2.1. Model assumptions

The multiple linear regression model is assumed, i.e.,

y = X b + e    (1)

where y (I x 1) is the true predictand (property of interest); X (I x J) is the true predictor matrix (e.g., spectra); b (J x 1) is the true regression vector; e (I x 1) is a vector of residuals; and I and J denote the number of training samples and predictor variables (e.g., wavelengths), respectively. The actual modeling may be based on realizations of y and X that are corrupted by non-negligible measurement errors. However, the presence of measurement errors is not indicated by additional notation, to simplify the presentation.

Underlying all resampling methods is the assumption that the resampled entity is independently identically distributed (iid). The validity of this assumption depends, among others, on the experimental set-up. For a correct treatment of resampling methods, it is, therefore, useful to distinguish between two fundamentally different experimental set-ups, namely random and controlled calibration [11]. In the first case, the training set predictor variables (rows of X) are randomly observed, whereas in the latter, they are fixed by design. A design leads to exact relationships in the training set data because measurements are taken at special points. As a result, a certain iid assumption will be violated. Clustering of the data, which is often the case in quantitative structure-activity relationship (QSAR) work, may have similar consequences. Score plots are convenient for visualizing relationships in the data.

2.2. Estimation of regression coefficients

The OLS estimate for b is:

b_OLS = (X^T X)^{-1} X^T y    (2)

where the superscripts '-1' and 'T' denote matrix inversion and transposition, respectively. For the OLS solution to exist, X must be of full column rank.

The F-factor PLSR estimate for b can be expressed under the SIMPLS formalism [12] as:

b_PLSR = (R R^T) X^T y    (3)

where R (J x F) is a matrix of weights; when applied to the predictor variables, one obtains the scores as T = XR. Eq. (3) includes Eq. (2) as a special case because the full-factor PLSR model reproduces the OLS solution. The PLSR estimate also exists if X is rank-deficient (as long as the number of factors does not exceed the rank of X) or even if J > I, which is the 'underdetermined' case often encountered in spectroscopy. Similar to Eq. (3), the PCR estimate is given by:

b_PCR = (V Λ^{-1} V^T) X^T y    (4)

where V and Λ contain a subset of the eigenvectors and eigenvalues of X^T X, respectively. Eq. (4) can be brought in the form of Eq. (3) by a simple rescaling of the eigenvectors, i.e., R = V Λ^{-1/2}.

2.3. Uncertainty in regression coefficients

In the current study, the uncertainty in the coefficient estimates is quantified by a standard error (square root of a variance):

σ(b_j) = [V(b)]_jj^{1/2},    j = 1, ..., J    (5)

where V(·) symbolizes the covariance matrix of a vector quantity and b stands for b_OLS, b_PLSR, or b_PCR, respectively. It is important to note that the standard error fully accounts for the uncertainty only when OLS is applied in connection with errorless predictors (e.g., spectra). With measurement errors in X, the OLS solution will be biased [13]. The relative importance of this bias depends on the size of the errors in X. Often, the signal-to-noise ratio is rather high for spectroscopic data (≫10), so that we can safely neglect this bias. The situation is further complicated when using PLSR or PCR because these methods owe much of their popularity to the bias-variance trade-off. It is well known that the number of factors is selected as a compromise between bias (too few factors) and variance (too many factors). However, a successful bias-variance trade-off implies the bias to be relatively unimportant.

2.3.1. Approximate formula

An approximate covariance matrix of the OLS regression coefficients is given by:

V(b_OLS) = MSEC · (X^T X)^{-1}    (6)

where MSEC denotes the mean squared error of calibration estimated as:

MSEC = Σ_{i=1}^{I} (y_i − y_fit,i)^2 / (I − df)    (7)

in which y_i is the predictand for the ith training sample; y_fit,i is the corresponding fitted value; and df denotes the degrees of freedom consumed by the model parameters. For OLS, each parameter takes away a degree of freedom from the data, likewise a potential intercept. Eq. (6) is approximate [13], unless X is without error.

De Jong [12] noticed that the analogy between Eqs. (2) and (3) suggests that R R^T is proportional to an approximate covariance matrix for the PLSR coefficients. Faber and Kowalski [3] further worked out this observation for PLSR as well as PCR (see their Sections 3.3.3, 3.3.4, and 3.3.5). Recalling that the notation R R^T also applies to OLS (full-factor PLSR) and PCR (R = V Λ^{-1/2}) yields:

V(b) = MSEC · (R R^T)    (8)

where b stands for b_OLS, b_PLSR, or b_PCR, and R is estimated using the appropriate method.

Two comments seem to be in order. First, the correct estimation of MSEC requires an adequate number of degrees of freedom. Like OLS, PCR consumes a single degree of freedom for each factor when the factors are chosen without reference to the predictand vector (e.g., in the order of their corresponding eigenvalues). By contrast, the appropriate number of degrees of freedom for PLSR is not a trivial matter, since the construction of the factors includes the predictand vector. The rigorous study of van der Voet [14] has clearly established that the conventional number, i.e., a single degree of freedom for each factor, is too small for the early factors and too large for the latter ones. A sound alternative can be calculated using the results of leave-one-out cross-validation (see Eq. (26) in Ref. [14]). Second, MSEC in Eq. (8) may contain a bias term (see Denham [15] for more details). Thus, Eq. (8) yields a mean squared error in the regression coefficients, rather than a variance. Because we have assumed bias to be relatively small (successful bias-variance trade-off), this distinction will not be made explicit, to simplify the presentation.

It is important to note that the validity of the approximate variance formulas (6) and (8) does not depend on the experimental set-up leading to the data (random or fixed calibration). Clearly, the variance in the parameter estimates depends on the data only, not on how they are obtained. The 'design' of the training set is reflected in the matrix (X^T X)^{-1} or R R^T, which determines the amount of error propagation. Obviously, error propagation is minimized by a proper design, but in many applications, e.g., when dealing with natural produce, it is impossible to construct a design. The main assumption underlying Eqs. (6) and (8) is that the noise in y and X is adequately accounted for by MSEC.
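Eqs. (5), (7), and (8) combine into one short computation. The sketch below is illustrative (function name mine); for OLS it reduces to Eq. (6) when R is any matrix with R R^T = (X^T X)^{-1}:

```python
import numpy as np

def standard_errors(X, y, R, df):
    """Eqs. (5), (7) and (8): sigma(b_j) = [MSEC * (R R')]_jj^{1/2}.

    R is the weight matrix: R R' = (X'X)^{-1} for OLS (full-factor
    PLSR) and R R' = V Lambda^{-1} V' for PCR; df is the number of
    degrees of freedom consumed by the model parameters.
    """
    b = R @ R.T @ X.T @ y                   # coefficients, Eq. (3)
    resid = y - X @ b
    msec = resid @ resid / (len(y) - df)    # Eq. (7)
    cov = msec * (R @ R.T)                  # Eq. (8)
    return np.sqrt(np.diag(cov))            # Eq. (5)
```

A convenient check: supplying a Cholesky factor of (X^T X)^{-1} as R makes the result coincide with the direct OLS formula, Eq. (6).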
2.3.2. Jack-knife

The jack-knife generates reduced data sets by deleting objects, i.e., an element of y and the corresponding row of X (cf. Fig. 1a and b for a univariate model):

y_{-i} = (y_1, ..., y_{i-1}, y_{i+1}, ..., y_I)^T,    i = 1, ..., I
X_{-i} = (x_1^T, ..., x_{i-1}^T, x_{i+1}^T, ..., x_I^T)^T,    i = 1, ..., I.    (9)

For these reduced data sets, coefficient vectors (b_{-i}) are estimated. Combining these estimates with the estimate for the entire data set (b) yields so-called pseudo-values [4]:

b_i^pseudo = I b − (I − 1) b_{-i},    i = 1, ..., I.    (10)

The desired covariance matrix follows from the spread in the pseudo-values as:

V(b) = [1 / (I(I − 1))] Σ_{i=1}^{I} (b_i^pseudo − b̄)(b_i^pseudo − b̄)^T    (11)

where b̄ denotes the average of the pseudo-values. When comparing Eq. (11) with the common expression for a covariance matrix (see Eq. (14) below), the additional division by I is noteworthy. The reason for this additional division is that the covariance matrix for the mean of the pseudo-values estimates the desired covariance matrix [4]. Martens and Martens [16] introduced a modification of the jack-knife that yields similar results.

It is seen that this procedure does not make any assumption about the noise in y or X. The resampled entities are (y_i, x_i)-pairs, so they should form a random sample from some multivariate distribution. This implies that the data should not be designed or, maybe even worse, clustered.

2.3.3. Bootstrapping objects

In the regression context, two modes of the bootstrap exist, namely, bootstrapping objects (also known as bootstrapping pairs or observations) and bootstrapping residuals (see Chapter 9 in Ref. [4]). The method working with residuals will be explained in Section 2.3.4. Bootstrapping objects proceeds as follows. New data sets are generated by randomly drawing objects with replacement (cf. Fig. 1a and c for a univariate model):

(y_i^b, x_i^b) = (y_{n_i^b}, x_{n_i^b}),    i = 1, ..., I;  b = 1, ..., B    (12)

where

n_i^b = int[U(0-1) · I] + 1,    i = 1, ..., I;  b = 1, ..., B    (13)

in which int[·] symbolizes the integer part of the associated number and U(0-1) is a random number that is uniformly distributed between zero and unity. The use of random numbers effectively makes bootstrapping a Monte Carlo simulation technique. The procedure is repeated B times, where B should be selected large enough to yield precise estimates for the desired standard error. Eq. (13) ensures that for each draw, any of the (y_i, x_i)-pairs is selected with probability I^{-1}. As a result, some of the objects will be present more than once, whereas others are not selected at all. The desired covariance matrix follows from the common formula for a covariance matrix of independent vectors:

V(b) = [1 / (B − 1)] Σ_{b=1}^{B} (b^b − b̄)(b^b − b̄)^T    (14)

where b̄ denotes the average of the bootstrapped values of b. Bootstrapping objects is similar to the jack-knife in the sense that it does not make any assumption about the noise in y or X, but the (y_i, x_i)-pairs should form a random sample from some multivariate distribution (random calibration).

2.3.4. Bootstrapping residuals

Bootstrapping residuals starts off from the regression model for which one attempts to estimate the uncertainty. First, residuals are calculated as:

e_i = (y_i − y_fit,i) / (1 − df/I)^{1/2},    i = 1, ..., I.    (15)

The 'raw' residual in the numerator is corrected for degrees of freedom because the difference between observed and fitted data is consistently smaller than the deviation from the expected values (E[y_i]; i = 1, ..., I). Alternatively, one could adjust the 'raw' residuals by means of the associated leverage. Next, new residual vectors are generated by randomly drawing residuals with replacement. Finally, new data sets are constructed by adding these new residual vectors to the fitted predictand vector (cf. Fig. 1a and d for a univariate model):

y_i^b = y_fit,i + e_{n_i^b},    i = 1, ..., I;  b = 1, ..., B    (16)

where n_i^b is as defined in Eq. (13). The procedure is repeated B times and the desired covariance matrix follows from Eq. (14). Unlike jack-knifing and bootstrapping objects, resampling residuals alters the data in much the same way as the original perturbation would (e.g., measurement noise). The procedure assumes that the order in which the residuals are drawn is immaterial (exchangeability), which is the case when the noise is iid. As discussed by Efron and Tibshirani [4], bootstrapping residuals yields better results for the classic linear regression model (full column rank X) than bootstrapping objects when this condition is met. However, since the results depend critically on this iid condition [4], bootstrapping objects is the preferred mode [17].

Finally, it is noted that the (y_i, x_i)-pairs need not form a random sample from some multivariate distribution. The reason for this is that the simulations are performed conditional on the model. This conditioning effectively fixes the x_i, so that it is immaterial whether the data are designed (fixed calibration) or not (random calibration).

Fig. 1. Illustration of resampling methods for univariate straight-line fit of single x versus y. (a) Original data points (o), model (—), fitted points (•), and residuals (···). (b) Original data points (o) and model (—) for objects {1,2,4,5} selected by jack-knife (×). (c) Original data points (o) and model (—) for objects {4,4,4,5,2} selected by bootstrap (×). (d) Fitted points for original data (•) and model (—) for objects obtained by adding bootstrap-selected residuals {3,2,3,1,4} to the fitted points (×). Resampling objects works directly with the original data points (o), whereas resampling residuals works with the fitted points (•) and estimated residuals (– – –).

2.3.5. Add noise to original data

Similar to Eq. (16), N new data sets are generated according to:

y_i^n = y_i + MSEC^{1/2} · F(0,1),    i = 1, ..., I;  n = 1, ..., N    (17)

where F(0,1) symbolizes a random number generated from a distribution with mean zero and standard deviation unity. The covariance matrix follows from the equivalent of Eq. (14).

Noise addition and bootstrapping residuals have in common that the (y_i, x_i)-pairs need not form a random sample. However, the noise addition method is more versatile because it can deal with heteroskedastic and correlated noise.

3. Experimental

3.1. Simulated NIR data

Fearn [18] published a NIR data set that was collected for the prediction of protein content in ground wheat samples. Because wheat samples cannot be designed, the experimental set-up conforms to random calibration. Hence, no resampling methods can be excluded from the start. The reference values were obtained using the Kjeldahl method. The training set consists of 24 objects (I = 24). The NIR spectra are digitized at six wavelengths in the range 1680-2310 nm (J = 6). This data set has been used extensively in the chemometrics literature for method testing (see Refs. [3,15] and references therein). The simulated data are generated by the following two steps:

1. The 'true' y, X, and b in Eq. (1) are the OLS fit of the experimental y, the experimental X, and b_OLS, respectively.
2. 'Experimental' realizations of y and X are constructed by artificially adding noise to y and X. The noise added to y is iid with standard deviation 0.2% (m/m), which is the estimated uncertainty of the Kjeldahl method [19]. The noise added to X is either iid or proportional. The standard deviation of the iid noise takes the values 0%, 0.25%, 0.5%, 0.75%, and 1% of the maximum value of X. Similarly, the proportional noise has standard deviation 0%, 0.25%, 0.5%, 0.75%, and 1% of the associated value of X. It is believed that the level of the noise in X is unrealistically high for certain spectroscopies (e.g., NIR), but it may be adequate for testing the validity of uncertainty estimates.

A single 'experimental' realization suffices to calculate estimates of standard error in the regression coefficients. However, these estimates contain an uncertainty themselves. Clearly, an unavoidable source of the uncertainty in error estimates is that some 'experimental' realizations are noisier than others by chance alone. To quantify the total uncertainty, the error estimates are calculated for 100 independent 'experimental' realizations. Bootstrapping and noise addition are based on 1000 replicates constructed for a single 'experimental' realization (B = N = 1000). This large number is chosen to ensure that the uncertainty of the error estimates is mainly determined by the variability among the 'experimental' realizations [20].

The 'ideal' estimate of standard error is obtained from the spread in the regression vectors obtained for 1000 'experimental' realizations. Obviously, one cannot calculate this 'ideal' estimate in practice because only few realizations are available—most often just a single one.
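The replicate loop of Section 2.3.4 is compact in code. The sketch below is an illustrative reconstruction for an OLS model (not the author's MATLAB code; the function name is mine): it applies the degrees-of-freedom correction of Eq. (15), resamples residuals per Eq. (16), and summarizes the spread of the B coefficient vectors as in Eq. (14).

```python
import numpy as np

def bootstrap_residuals_se(X, y, B=1000, seed=0):
    """Bootstrap-residuals standard errors for OLS coefficients,
    following Eqs. (15), (16) and (14) with df = J."""
    rng = np.random.default_rng(seed)
    I, J = X.shape
    b_hat = np.linalg.solve(X.T @ X, X.T @ y)
    y_fit = X @ b_hat
    e = (y - y_fit) / np.sqrt(1.0 - J / I)    # Eq. (15)
    boot = np.empty((B, J))
    for k in range(B):
        idx = rng.integers(0, I, size=I)      # draw with replacement, Eq. (13)
        y_b = y_fit + e[idx]                  # new data set, Eq. (16)
        boot[k] = np.linalg.solve(X.T @ X, X.T @ y_b)
    # square roots of the diagonal of Eq. (14)
    return boot.std(axis=0, ddof=1)
```

Because X is held fixed inside the loop, the sketch makes explicit that the simulations are conditional on the model, as discussed in Section 2.3.4.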


However, there is a way to approach the 'truth' in practice if the spectra of all constituents are known. In that case, Monte Carlo simulations can be performed to validate the various approaches in a similar fashion. Keller et al. [21] have detailed how to conduct these simulations for two industrial applications (NIR and visible spectra).

3.2. Real NIR data

This NIR data set was measured to predict the mass fraction percent O delivered by oxygenates in gasoline. The prediction was carried out for the oxygenates methyl tert-butyl ether (MTBE) and ethanol (EtOH), as well as water (H2O), which is an unavoidable major contaminant of EtOH. Full details concerning the spectral data acquisition, peak identification, reference method (gravimetry), and data pretreatment (multiplicative signal correction) are presented elsewhere [22]. Briefly, training and test sets consist of 40 samples each (I = 40). NIR absorbances are taken at 391 wavenumbers evenly spaced in the region 6000-9000 cm^-1 (J = 391). Cross-validation and external prediction testing using the test set lead to the following optimum dimensionalities for MTBE, EtOH, and H2O, respectively: 5, 8, and 7 (calibration method is PLSR) and 6, 12, and 10 (calibration method is PCR).

To support the validity of certain resampling methods, it is necessary to investigate whether relationships exist among the training set data. It is noted that the data set was designed with a number of practical considerations in mind, in particular:

1. The target values of 2.0% O and 2.7% O (total amount) should be well covered.
2. EtOH, which is typically added to a gasoline at the point of delivery, is likely to be the minor component of a mixed oxygenate gasoline.
3. The H2O concentration should not be unrealistically high.

The analyte concentrations initially proposed by the design were adjusted to facilitate the preparation of the samples. As a result, no clear relationships exist among the training set data (Fig. 2). Thus, it is safe to assume that the random calibration assumption is not severely violated, although, in a strict sense, the training set is designed.

Fig. 2. Distribution of samples in plane spanned by PC scores 1 and 2: training set (×) and test set (o).

3.3. Calculations

All calculations are performed in MATLAB (The Mathworks, Natick, MA).

4. Results and discussion

4.1. Simulated NIR data

Only results obtained when adding proportional noise to X are presented because the iid results are very similar. All results are divided by the 'ideal' results, which are obtained by 1000 realizations. This normalization facilitates the interpretation because the target value after normalization is unity. The order of the calculation is as follows. First, the mean variances are calculated. Then, these means are divided by the optimum value. Finally, the square root is taken. Recall that Eq. (6) is exact with error-free X, not approximate. The means of 100 normalized uncertainty estimates obtained using the exact formula, bootstrapping residuals, and adding noise are close to the target value (Table 1). By contrast, the jack-knife and bootstrapping objects overestimate the true standard error by approximately 20%.
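For reference, the jack-knife estimate whose behavior is examined here (Section 2.3.2, Eqs. (9)-(11)) can be sketched for an OLS model as follows; this is an illustrative reconstruction with a function name of my choosing, not the code used in the study:

```python
import numpy as np

def jackknife_se(X, y):
    """Jack-knife standard errors for OLS coefficients: delete each
    object in turn (Eq. (9)), form pseudo-values (Eq. (10)), and take
    the spread with the extra division by I (Eq. (11))."""
    I, J = X.shape
    b = np.linalg.solve(X.T @ X, X.T @ y)
    pseudo = np.empty((I, J))
    for i in range(I):
        keep = np.arange(I) != i                    # reduced data set
        Xi, yi = X[keep], y[keep]
        b_i = np.linalg.solve(Xi.T @ Xi, Xi.T @ yi)
        pseudo[i] = I * b - (I - 1) * b_i           # pseudo-values
    dev = pseudo - pseudo.mean(axis=0)
    V = dev.T @ dev / (I * (I - 1))                 # Eq. (11)
    return np.sqrt(np.diag(V))
```

The extra factor I in the denominator reflects that Eq. (11) estimates the covariance of the coefficient estimate itself, not of a single pseudo-value.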
Apart from overestimating the true standard error, the latter estimates are more variable (Table 2). This result is consistent with Denham's observation that formula-based estimates of root mean squared error of prediction (RMSEP) are more stable than the results obtained for certain resampling methods [15].

Table 1
Mean estimates of standard error in OLS coefficients for simulated data set with proportional noise in X (100 runs). All results are normalized with respect to the ideal results.

Noise in X (%)  Method               Variable
                                     1     2     3     4     5     6
0               Exact formula        0.98  0.98  0.98  0.98  0.98  0.98
                Jack-knife           1.24  1.21  1.21  1.25  1.23  1.26
                Bootstrap objects    1.22  1.19  1.18  1.24  1.19  1.25
                Bootstrap residuals  0.98  0.98  0.98  0.98  0.98  0.98
                Noise addition       0.98  0.98  0.98  0.98  0.98  0.98
1               Approximate formula  1.02  1.04  1.01  1.05  1.27  1.13
                Jack-knife           1.19  1.21  1.27  1.32  1.64  1.22
                Bootstrap objects    1.15  1.17  1.19  1.26  1.54  1.22
                Bootstrap residuals  1.02  1.04  1.01  1.04  1.26  1.13
                Noise addition       1.03  1.04  1.01  1.05  1.26  1.14

Table 2
Standard deviation in normalized estimates of standard error in OLS coefficients for simulated data set with proportional noise in X (100 runs). All standard error estimates are normalized with respect to the ideal results.

Noise in X (%)  Method               Variable
                                     1     2     3     4     5     6
0               Exact formula        0.18  0.18  0.18  0.18  0.18  0.18
                Jack-knife           0.35  0.36  0.34  0.33  0.33  0.36
                Bootstrap objects    0.26  0.24  0.24  0.25  0.26  0.25
                Bootstrap residuals  0.18  0.18  0.18  0.18  0.18  0.18
                Noise addition       0.18  0.19  0.19  0.18  0.18  0.18
1               Approximate formula  0.25  0.24  0.25  0.21  0.23  0.26
                Jack-knife           0.40  0.41  0.45  0.38  0.44  0.42
                Bootstrap objects    0.32  0.34  0.34  0.30  0.36  0.35
                Bootstrap residuals  0.25  0.24  0.25  0.21  0.23  0.26
                Noise addition       0.25  0.24  0.25  0.21  0.23  0.26

The standard deviation in the formula-based result is close to what is expected. This can be understood as follows. Without noise in X, the square of the standard error estimate is distributed proportional to a chi-squared variable. For a chi-squared variable with ν degrees of freedom, the mean is ν while the variance is 2ν. The relative standard deviation in the estimate of standard error follows as 1/(2ν)^{1/2}. This is an approximation (because the square root is a nonlinear function) that works quite well for large degrees of freedom. Inserting the degrees of freedom 24 − 6 = 18 yields 0.17, which is close to the corresponding numbers in Table 2. The standard deviations in the mean values presented in Table 1 follow by multiplying the numbers in Table 2 by the factor 1/√100 = 0.1. In this way, it can be inferred that the results obtained by the exact formula do not deviate significantly from unity.

When adding 1% proportional noise, the approximate formula slightly overestimates the true standard error, but remains consistent with bootstrapping residuals and adding noise (Table 1). The alternative resampling methods yield results that are too high by approximately 20%. It is remarkable that for all methods, the standard error estimate has degraded for variable 5. Currently, there is no explanation for this phenomenon. One of the reviewers noted that the degradation of variable 5 is presumably connected with the correlation structure between the predictors. This correlation structure can be investigated using diagnostics such as variance inflation factors. However, these diagnostics did not reveal anything particular for variable 5. There is an overall increase in variability in the estimates for standard error (Table 2), which is a logical consequence of the increased variability in the data. The results obtained for the intermediate noise levels in X (0.25%, 0.5%, and 0.75%) are similar to the ones presented for 1% (not shown).

The main assumption underlying Eq. (6), namely, that MSEC accounts for all noise in y and X, is known to be valid for iid noise in X in connection with OLS [13]. The generalization to proportional noise seems to work well.

4.2. Real NIR data

Only PLSR results are shown because the PCR results are very similar. As expected from the simulations, the resampling methods fall into two categories that yield comparable results: jack-knife and bootstrapping objects versus bootstrapping residuals and noise addition. Consequently, we will restrict ourselves to the discussion of the two modes of bootstrapping. Unlike bootstrapping objects, bootstrapping residuals is consistent with Eq. (8) (see Fig. 3). As noted by one of the reviewers, this just indicates consistency between the methods and not whether the approaches produce the correct estimates of standard error, which is the important question. However, Eq. (8) is consistent with a formula for sample-specific standard error of prediction that gave promising results for NIR [22-24] as well as fluorescence data [25]. These all lead to a convenient error analysis for multivariate calibration models. The standard errors obtained when bootstrapping objects are believed to be overly pessimistic. This conjecture is supported by calculating the reliabilities of the regression coefficients as [1]:

c_j = b_j / σ(b_j),    j = 1, ..., J.    (18)

Fig. 3. Standard errors in PLSR coefficients for oxygenate data (×10^-2): formula-based versus the ones obtained by bootstrapping objects (top) and bootstrapping residuals (bottom).

Fig. 4. Reliabilities of PLSR coefficients for oxygenate data: bootstrapping objects (top) and bootstrapping residuals (bottom). The dotted line (···) indicates the value corresponding to a 90% confidence interval including zero.

Wehrens and Van der Linden [2] removed wavelengths for which the 90% confidence interval includes zero. Assuming t-statistics, this condition translates to demanding the reliability to exceed 1.69 for all three analytes. The largest difference between the two bootstrap methods is observed for the H2O calibration (Fig. 4). Bootstrapping objects suggests 142 coefficients to be insignificant versus only 28 for the residual-based method. However, this large number of insignificant coefficients (142 of 391) contradicts the smoothness of the regression vector (Fig. 5). Moreover, on the basis of 142 insignificant coefficients, one would expect a much larger number of actual zero crossings. This is further illustrated by plotting only the insignificant coefficients for a limited region (Fig. 6). Similar observations can be made for Figs. 5-8 in Ref. [2].

Fig. 5. PLSR coefficients for H2O.

Fig. 6. PLSR coefficients for H2O between 7600 and 8200 cm^-1 for which the 90% confidence interval includes zero: bootstrapping objects (□) and bootstrapping residuals (*).

5. Conclusions

The results presented in this work suggest that the best resampling methods for uncertainty estimation explicitly work with noise, rather than objects. Besides more closely resembling the probability mechanism underlying the observed data [4], this also leads to more stable uncertainty estimates. The latter is an added bonus that concurs with results presented by Denham [15]. Because the approximate formula, i.e., Eq. (8), performed as well as bootstrapping residuals and noise addition, it seems to offer a viable alternative to resampling.

It is important to note that this finding strongly disagrees with the main conclusion of a study where the uncertainty in second-order bilinear predictions using the generalized rank annihilation method (GRAM) was assessed [26]. In that specific context, noise addition seems to be the most versatile method. The reason for this discrepancy is that the approximate variance expression for GRAM requires detailed knowledge about the measurement noise, which is not always available. By contrast, Eq. (8) requires only the MSEC as input.

Acknowledgements

The National Institute of Standards and Technology (NIST) is thanked for making the oxygenate data available for this study. The critical remarks by Frank Schreutelkamp, Age Smilde, and two reviewers are appreciated by the author.

References

[1] V. Centner, D.L. Massart, O.E. de Noord, S. de Jong, B.G.M. Vandeginste, C. Sterna, Anal. Chem. 68 (1996) 3851.
[2] R. Wehrens, W.E. Van der Linden, J. Chemom. 11 (1997) 157.
[3] K. Faber, B.R. Kowalski, J. Chemom. 11 (1997) 181.
[4] B. Efron, R.J. Tibshirani, An Introduction to the Bootstrap, Chapman and Hall, London, 1993.
[5] W.H. Press, B.P. Flannery, S.A. Teukolsky, W.T. Vetterling, Numerical Recipes. The Art of Scientific Computing, Cambridge Univ. Press, Cambridge, 1988 (Section 14.5).
[6] R.J. Carrol, D. Ruppert, L.A. Stefanski, Measurement Error in Nonlinear Models, Chapman and Hall, London, 1995 (Chap. 4).
[7] E.P.P.A. Derks, M.S. Sánchez Pastor, L.M.C. Buydens, Chemometr. Intell. Lab. Syst. 28 (1995) 49.
[8] D.L. Duewer, B.R. Kowalski, J.L. Fasching, Anal. Chem. 48 (1976) 2002.
[9] B.K. Dable, K.S. Booksh, J. Chemom. 15 (2001) 591.
[10] F.J. del Río, J. Riu, F.X. Rius, J. Chemom. 15 (2001) 773.
[11] P.J. Brown, Measurement, Regression, and Calibration, Clarendon Press, Oxford, 1993 (Chap. 5).
[12] S. de Jong, Chemometr. Intell. Lab. Syst. 18 (1993) 251.
[13] S.D. Hodges, P.G. Moore, Appl. Stat. 21 (1972) 185.
[14] H. van der Voet, J. Chemom. 13 (1999) 195.
[15] M.C. Denham, J. Chemom. 14 (2000) 351.
[16] H. Martens, M. Martens, Food Qual. Prefer. 11 (2000) 5.
[17] R. Wehrens, H. Putter, L.M.C. Buydens, Chemometr. Intell. Lab. Syst. 54 (2000) 35.
[18] T. Fearn, Appl. Stat. 32 (1983) 73.
[19] H. Martens, T. Naes, Multivariate Calibration, Wiley, Chichester, 1989.
[20] J.S. Alper, R.I. Gelb, Talanta 40 (1993) 355.
[21] H.R. Keller, J. Röttele, H. Bartels, Anal. Chem. 66 (1994) 937.
[22] N.M. Faber, D.L. Duewer, S.J. Choquette, T.L. Green, S.N. Chesler, Anal. Chem. 70 (1998) 2972.
[23] R. Boqué, M.S. Larrechi, F.X. Rius, Chemometr. Intell. Lab. Syst. 45 (1999) 397.
[24] J.A. Fernández Pierna, L. Jin, F. Wahl, N.M. Faber, D.L. Massart, Chemometr. Intell. Lab. Syst., accepted for publication.
[25] A.C. Olivieri, J. Chemom. 16 (2002) 207.
[26] N.M. Faber, Anal. Chim. Acta 439 (2001) 193.
