
Published in Trends in Analytical Chemistry 25 (2006) 1081-1096;

http://dx.doi.org/10.1016/j.trac.2006.09.001

USE AND ABUSE OF CHEMOMETRICS IN CHROMATOGRAPHY

Michał Daszykowski and Beata Walczak

Institute of Chemistry, The University of Silesia,


9 Szkolna Street, 40-006 Katowice, Poland
e-mail: beata@us.edu.pl

KEYWORDS

mixture analysis, signals alignment, warping, denoising, wavelets

ABSTRACT

This paper presents the reader with a panorama of important issues that emerge at the interface
between chromatography and chemometrics. In the first part of the paper we present the
advantages, and also the drawbacks, of applying signal enhancement, warping, and mixture
analysis methods. In the second part, typical examples of the misuse and abuse of chemometrics
that can occur to those less familiar with data processing approaches are discussed. Finally, we
conclude that close collaboration between the communities of chromatographers and
chemometricians will give deeper insight into the analyzed chromatographic systems and permit
new chromatographic problems to be solved in an efficient and elegant manner.


INTRODUCTION

Chemometrics is considered a part of analytical chemistry. In its arsenal we can find methods
that help analytical chemists deal with all steps of analytical procedures, from the design of an
experiment, through the extraction of information, to the final decision making. A majority of
chemometric methods are general, i.e., they can be applied to any type of analytical experiment
and to any type of instrumental signal. However, there are problems associated with specific
types of instrumental signals, or with particular analytical techniques, that need special
treatment. Only with knowledge of the system studied and of the principles of the measurements
performed can well-suited methods be chosen. What methods are specific to chromatography?
Mainly, the mixture analysis approaches and warping. Other chemometric approaches are of
general use, but their possible applications to chromatography are virtually unlimited; it is
enough to mention all the methods of data compression, visualization, calibration,
classification, etc. [1].
We cannot, however, forget that chemometrics is a relatively new sub-discipline, and many
methods included in its arsenal today were successfully applied to chromatography decades ago.
Here it is enough to mention the experimental design and optimization methods.
Like any other analytical technique, chromatography adopts from other fields what is necessary
and useful for its development, and the speed of these adaptations is determined by the
complexity of the problems to be solved, by the instrumentation currently in use, and by the
amount of data to be processed. For its part, chemometrics attempts to cope with the ongoing
challenges and to develop new tools for the new problems. There are, however, old
chromatographic problems that can only be solved efficiently now, thanks to the increasing
power of computers and to the progress in the computer-related fields of knowledge.
This paper presents the reader with a panorama of important issues that emerge at the interface
between chromatography and chemometrics. The invitation to write this particular paper was
directed to us as authors well familiar with chemometrics and also acquainted with certain
chromatographic problems, and yet it proved a rather demanding and even challenging task. The
first step was to develop our own vision of the questions under discussion and to group them
into thematic "building blocks", each then to be handled less generally and with greater ease.
Our first thematic "building block" is addressed to all chromatographers who are still not fully
convinced that chemometrics can make their life easier. In this section we present the
advantages (and also certain drawbacks) of applying the signal enhancement, warping, and
mixture analysis methods. In the second "building block" we discuss typical examples of the
misuse and abuse of chemometrics that can occur to those less familiar with data processing
approaches. Even though certain examples were taken directly from the literature, our intention
is certainly not to criticize, but to provide instruction for the future. In the third "building
block" we focus on the most up-to-date challenges for chemometrics in the service of
chromatography. The paper ends with conclusions directly derived from the three preceding
sections.
As an opening to the issues of interest, we performed a literature search in the Scopus system.
Our search focused on paper titles and keywords that contained chemometric notions in
combination with the notion "chromatography". The results of this search are given in Table 1.
The overall picture of chemometric applications to solving chromatographic problems seems
rather promising. Even though certain keywords have a relatively high score, there is still a
vast space left for many possible uses of chemometric methods for the enhancement, processing,
and analysis of chromatographic data in everyday chromatographic practice.

Table 1

Now let us start our story with a presentation of the standard arsenal of chemometric methods
that can be widely and profitably applied in chromatography.

WHAT CAN CHEMOMETRICS OFFER CHROMATOGRAPHY?

Signal enhancement

The quality of the chromatogram(s) determines the final results of the chromatographic
analysis, and a properly performed preprocessing step is of crucial importance. Like any other
instrumental signal, chromatograms contain three major components: signal, noise, and
background (see Fig. 1). These components differ in their frequency: noise is the
highest-frequency component, background is the lowest-frequency component, and the frequency of
the signal is usually intermediate.

Fig. 1

Chromatogram enhancement can be achieved by eliminating the noise and background components.

De-noising

To eliminate undesired frequencies from a signal without distorting the frequency region that
contains the crucial information, the processed signal ought to be treated with digital
filters. Usually, digital filtering can be performed either in the time domain or in the
frequency domain (there is an equivalence between direct time-domain and indirect
frequency-domain noise filtering).
To analyze a signal in both the time and the frequency domain, the windowed Fourier Transform,
FT, is often used. The main idea of the windowed FT is to study the frequencies of a signal
segment by segment. This approach has, however, a serious disadvantage: the smaller the window,
the better localized sudden changes (peaks) are, but the less is known about the
lower-frequency components of the studied signal. If a wider window is applied, more of the low
frequencies are observed, but the localization in time is worsened.
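This time-frequency trade-off can be sketched numerically (a minimal illustration of our own with plain NumPy, not taken from the paper): the signal is cut into non-overlapping segments, and each segment is Fourier-transformed; a narrow window yields many time segments but few frequency bins, and vice versa.

```python
import numpy as np

def windowed_ft(x, width):
    """Windowed Fourier transform with non-overlapping rectangular
    windows: returns the magnitude spectrum of each segment."""
    n_seg = len(x) // width
    segments = x[: n_seg * width].reshape(n_seg, width)
    return np.abs(np.fft.rfft(segments, axis=1))  # shape (n_seg, width//2 + 1)

x = np.sin(2 * np.pi * 50 * np.linspace(0, 1, 1024))

narrow = windowed_ft(x, 32)   # 32 time segments, only 17 frequency bins
wide = windowed_ft(x, 256)    # 4 time segments, 129 frequency bins
```

The shapes of the two results make the trade-off explicit: the narrow window gives good time localization at the cost of coarse frequency resolution, the wide window the opposite.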
In the arsenal of de-noising tools there is, however, one approach well suited to
non-stationary signals, i.e., signals with components of very different frequencies, namely,
de-noising in the wavelet domain. Wavelets automatically adapt to the different components of a
signal, using a narrow window to look at the high-frequency components and a wide window to
look at the long-lived, low-frequency components. In other words, the signal can be studied at
different resolution levels. At a coarse resolution we can get an overall picture of the
analyzed signal, and at the consecutive higher levels we can see its increasingly finer
details.


Fig. 2

Due to the unique properties of wavelets as basis functions (they are orthogonal and local),
non-stationary signals can be processed very efficiently, and the multi-resolution theory
provides a simple and very fast method for decomposing a signal, whose length equals an integer
power of two, into its components at different scales (see Fig. 2). The time-frequency analysis
is performed by repeated filtering of the signal. At each filtering step, the frequency domain
is cut in the middle using a pair of filters, a low-pass filter and a high-pass filter. From
the raw data, the first step produces n/2 low-frequency coefficients and n/2 high-frequency
coefficients. In each consecutive step, the high-frequency coefficients are kept, and the same
filters are used to further subdivide the low frequencies, until only one point is left [2].

Fig. 3

Wavelet coefficients at the first level of signal decomposition are associated with the basis
functions of the highest frequency, and they can be used to estimate the noise level. Namely,
for a normalized signal, the cut-off value, t, can be calculated, for instance, as [3]:

t = σ √(2 log n)  Eq. 1

where n is the length of the signal, and σ is the standard deviation of the noise, estimated on
the basis of the wavelet coefficients at the first level of resolution (d1) as:

σ = median (| d1 |) / 0.6745 Eq. 2

Depending on the applied thresholding policy, wavelet coefficients with amplitudes lower than
the cut-off are replaced by zeros (hard thresholding) or, additionally, the remaining
coefficients are shrunk by the cut-off value (so-called soft thresholding) [3].
The inverse wavelet transform of the pretreated signal allows reconstruction of the signal in
the time domain, now free from the noise component (see Fig. 4).
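The whole de-noising scheme of Eqs. 1-2 can be sketched with a plain-NumPy Haar transform (a minimal sketch under our own choices of basis and decomposition level, not the authors' implementation; in practice a dedicated wavelet library and a smoother basis would be preferable):

```python
import numpy as np

def haar_dwt(x):
    """One level of the orthonormal Haar transform (x of even length)."""
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of one Haar transform level."""
    x = np.empty(2 * approx.size)
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x

def denoise(signal, levels=4):
    """Soft-threshold wavelet de-noising; len(signal) must be 2**k."""
    x = np.asarray(signal, dtype=float)
    details = []
    for _ in range(levels):
        x, d = haar_dwt(x)
        details.append(d)
    # Eq. 2: noise standard deviation from first-level coefficients d1
    sigma = np.median(np.abs(details[0])) / 0.6745
    # Eq. 1: universal threshold
    t = sigma * np.sqrt(2.0 * np.log(len(signal)))
    # soft thresholding: zero small coefficients, shrink the rest by t
    details = [np.sign(d) * np.maximum(np.abs(d) - t, 0.0) for d in details]
    for d in reversed(details):
        x = haar_idwt(x, d)
    return x

# illustration on a synthetic chromatographic peak with added noise
chromatogram = np.exp(-(np.arange(256) - 128.0) ** 2 / 400.0)
noisy = chromatogram + np.random.default_rng(1).normal(0, 0.05, 256)
denoised = denoise(noisy)
```

Because the Haar transform is orthonormal and thresholding only shrinks coefficients, the de-noised signal never gains energy relative to the noisy one.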


Fig. 4

Background elimination

Elimination of the chromatogram baseline is of great importance, e.g., for peak detection or
for a comparison of different chromatographic signals, because a varying baseline can strongly
influence the similarity measures (e.g., the correlation coefficient or the Euclidean distance)
of the compared signals.
Among the different approaches to baseline approximation (and elimination), we favor the
approach proposed by Eilers [4], which employs the asymmetric least squares method. Namely, the
objective function of the method, Q, is defined as:

Q = ∑i vi (yi − ŷi)² + λ ∑i (∆²ŷi)²  Eq. 3

where y is the experimental signal (e.g., a chromatogram), ŷ is the smooth trend (in the
discussed case, the baseline approximation), vi are the prior weights, λ is a positive
parameter weighting the second term in eq. 3, and ∆² denotes the second-order differences of ŷ.
The first term of eq. 3 represents the weighted squared residuals, whereas the second term is
the roughness penalty.

Choosing the weights in an asymmetric way:

vi = p if yi > ŷi  Eq. 4
vi = 1 − p if yi < ŷi  Eq. 5
with 0 < p < 1,


we can weight the positive and the negative deviations from the trend ŷ differently. E.g., if
p = 0.01, all data points with a positive deviation from the approximation ŷ will exert a very
small influence on the baseline approximation.
The problem is how to determine ŷ with unknown weights. This can be done in an iterative way.
Starting with vi = 1 for all the data points, we can calculate the first approximation of the
signal, then, for all the points above this first approximation, take p = 0.01 and find the
second approximation, and so on.
The consecutive approximations of the chromatogram baseline are shown in consecutive panels
of Fig. 5. Once the baseline is approximated in a satisfactory manner, it can be subtracted from
the studied signal.
This approach requires optimization of the parameter λ and, possibly, of the order of the differences.
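The iterative scheme above can be sketched with dense NumPy matrices (our own minimal illustration; for long chromatograms sparse matrices would be used, as in Eilers' original implementation, and the parameter values below are arbitrary choices):

```python
import numpy as np

def asls_baseline(y, lam=1e5, p=0.01, n_iter=10):
    """Asymmetric least-squares baseline estimation (Eilers, Eq. 3)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # D is the second-order difference operator, so D @ z gives the
    # second differences penalized in Eq. 3
    D = np.diff(np.eye(n), 2, axis=0)          # shape (n - 2, n)
    P = lam * D.T @ D
    v = np.ones(n)                              # start with all weights = 1
    for _ in range(n_iter):
        z = np.linalg.solve(np.diag(v) + P, v * y)
        # asymmetric weights (Eqs. 4-5): points above the trend get weight p
        v = np.where(y > z, p, 1.0 - p)
    return z

# illustration: a linear baseline plus one chromatographic peak
x = np.arange(200, dtype=float)
chrom = 0.01 * x + 4.0 * np.exp(-(x - 100.0) ** 2 / 20.0)
baseline = asls_baseline(chrom, lam=1e5, p=0.01)
corrected = chrom - baseline
```

Because the peak points receive the small weight p, the estimated trend follows the linear baseline and ignores the peak, which survives the subtraction almost intact.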

Fig. 5

After some modifications [5], the discussed approach can also be applied to background
estimation for two-dimensional signals, such as 2D gel electropherograms (see Fig. 6).

Fig. 6

Pitfalls of the presented approaches

We hope to have sufficiently illustrated the practical value of the signal enhancement methods.
Now the time has come to consider why their applications in chromatography are rather limited.
Let us start with wavelets. They are a very flexible tool, but in reality this means that the
user is faced with many choices. Just to mention a handful of basic ones: selection of the
wavelet transform, of the wavelet basis, of the decomposition level, of the thresholding
policy, and of the threshold criterion. As these choices are data- and problem-dependent,
implementation of fixed rules does not seem a proper way of dealing with the problem: with the
rules fixed, the power of wavelets as a flexible tool is lost in a given application. The
background correction approach mentioned earlier also requires


optimization of the input parameters: the user has to select a proper order of the differences
and the penalty parameter, and both depend on the chromatograms at hand.
In other words, it is not easy to implement these methods as black-box approaches. The user
ought to be aware of their principles and limitations. However, one can hardly expect a regular
chromatographer to have the advanced chemometric knowledge necessary to select a proper
approach and to optimize the input parameters of the method.
Moreover, although there are many public-domain programs (e.g., in Matlab code), they are not
implemented in any standard chromatographic software. It seems that the software producers
still have a big role to play in popularizing the new chemometric approaches.

The situation is no better in the case of the other chemometric approaches, particularly those
that we consider to have been developed especially for chromatographic purposes (i.e., the
warping approaches and the mixture analysis methods).

Alignment of chromatograms by means of warping techniques

Chromatograms of complex mixtures can be treated as their fingerprints and used in the same way
as other instrumental signals for further data analysis (calibration, classification, etc.).
This, however, requires a uniform representation of the studied signals in matrix form. To
construct such a representation for m chromatograms (representing m samples), it is necessary
to synchronize their time axes. Synchronization of the time axes, called warping or alignment,
is not a trivial problem. There are different approaches to dealing with signal shifts
[6,7,8,9], and correlation optimized warping [10,11] is the most popular and efficient among
them. Examples of the application of alignment methods to different chromatographic signals can
be found in [12,13,14,15,16,17,18,19,20,21]. Let us briefly present one of the most interesting
warping approaches, namely correlation optimized warping.

Correlation Optimized Warping

The Correlation Optimized Warping (COW) algorithm aims to correct peak shifts in a
chromatogram P with respect to a target chromatogram, T [10]. This is achieved by means of
piecewise linear stretching and compression of chromatogram P, such that the correlation
between P and T is maximal. Two input parameters are required, and they are responsible for the
quality of the alignment. The first parameter is the number of sections, N, into which the
chromatograms are divided. The second parameter, t, the so-called warping parameter, defines
the possible end positions of a section and is responsible for the degree of alignment
(flexibility). Namely, for larger values of t, larger time shifts can be corrected. By
piecewise stretching and compression of the corresponding sections in the two chromatograms,
the signals are aligned such that the overall correlation between them is maximal.
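The core idea can be sketched in a deliberately reduced form (our own illustration, not the published algorithm: only N = 2 sections with one movable boundary, searched exhaustively; the full COW optimizes many boundaries simultaneously by dynamic programming):

```python
import numpy as np

def cow_two_sections(p, target, t=5):
    """Minimal COW sketch: shift the single interior boundary of p by
    -t..t slack points, linearly stretch/compress each section onto the
    corresponding target section, and keep the warp that maximizes the
    correlation with the target."""
    n = len(target)
    mid = n // 2
    best_corr, best = -2.0, None
    for shift in range(-t, t + 1):
        b = mid + shift                          # candidate section end in p
        # resample p[:b] to length mid and p[b:] to length n - mid
        left = np.interp(np.linspace(0, b - 1, mid), np.arange(b), p[:b])
        right = np.interp(np.linspace(b, n - 1, n - mid),
                          np.arange(b, n), p[b:])
        warped = np.concatenate([left, right])
        c = np.corrcoef(warped, target)[0, 1]
        if c > best_corr:
            best_corr, best = c, warped
    return best, best_corr

# illustration: the same peak, shifted by three points in the sample
x = np.arange(128, dtype=float)
target = np.exp(-(x - 50.0) ** 2 / 18.0)
sample = np.exp(-(x - 53.0) ** 2 / 18.0)
warped, corr = cow_two_sections(sample, target, t=8)
```

Even this one-boundary version improves the correlation with the target, because stretching the left section moves the peak toward its target position.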
To illustrate the performance of the COW approach, two pairs of chromatograms were chosen: the
peak shifts along the time axis were nearly linear for the first pair (see Fig. 7, first
column) and highly non-linear for the second pair (see Fig. 7, second column).

Fig. 7

The alignment presented for the first pair of chromatograms is satisfactory when the
chromatograms are divided into 35 sections and the warping parameter t is set to 4. The initial
correlation between the two chromatograms is equal to 0.59, whereas for the aligned
chromatograms the correlation is ca. 1.00. It should be stressed that the COW algorithm makes
it possible to obtain an optimal alignment, but the selection of the warping parameters is
crucial. The computational time of the algorithm grows steeply with high t values. Although
parameter t is responsible for the flexibility of the alignment, and more drastic retention
shifts can be corrected with higher t values, some extra flexibility can also be gained by
increasing the number of sections at relatively low t values. When the alignment is
unsatisfactory, larger t values should be considered. Nevertheless, in a majority of
applications it is possible to achieve good alignment at low t values, thus ensuring a
reasonable computational time.
In the second example, where complex retention shifts are observed, the COW algorithm gives an
acceptable alignment with 25 sections and t = 5. In this case, dividing the chromatograms into
more sections and using low t values does not lead to a better alignment. For the selected


warping parameters, the initial correlation of the two chromatograms is considerably improved,
from 0.05 to 0.98.

The COW algorithm can also be used to warp LC-MS signals along the time axis. Among the
available approaches to warping two-dimensional signals [22], we would like to mention the
iterative fuzzy warping algorithm, which was developed for the alignment of 2D gel
electropherograms. It allows automated global and local warping of two images with unidentified
components of a studied mixture (see Fig. 8). It is suitable for the alignment of
one-dimensional signals as well [23], and it is particularly useful for warping signals that
contain thousands of data points [24], for which the use of the COW algorithm is not feasible.

Fig. 8

Deconvolution of chromatographic signals

Deconvolution is used to enhance the selectivity of a chromatographic technique when improved
separation cannot be achieved by optimizing the separation conditions. There are a number of
chemometric approaches that help to deconvolute chromatographic signals measured with a
mono-channel detector. In principle, however, most of them require advanced knowledge, and thus
they are usually not used in everyday practice. Just to give an impression of the complexity of
this problem: a chromatographer is usually faced with signal de-noising, peak detection,
selection of a deconvolution method, choosing a peak model, and specifying the range of
characteristics for the parameters of the peak model. In order to facilitate deconvolution and
to encourage chromatographers to use it, an automatic program was developed [25,26]. The main
idea behind this program was to make the task of deconvolution easy for non-experienced users
with little knowledge of the implemented chemometric tools and of the analyzed samples.

Mixture Analysis (MA)

Nowadays, the so-called hyphenated chromatographic techniques have become a standard analytical
tool. Among the most popular hyphenated techniques are high-performance liquid chromatography
with diode array detection (HPLC-DAD), HPLC or gas chromatography coupled with Fourier
Transform infrared spectrometric detection (HPLC-FTIR, GC-FTIR), and liquid or gas
chromatography with mass spectrometric detection (LC-MS, GC-MS). The interest in hyphenated
techniques arose from the surplus of information about the analyzed samples that they can
provide compared with the standard techniques. From a practical point of view, it is possible
to detect co-eluting compounds, i.e., to identify substances present in one peak, or, in other
words, to evaluate the purity of chromatographic peaks. This information is of great importance
when studying, e.g., the purity/contamination of pharmaceutical/chemical products.
Typical data obtained with the aid of a hyphenated technique can be presented as a two-way data
table, X. An example of such data, obtained by means of the HPLC-DAD technique, is given in
Fig. 9.

Fig. 9

The columns of X are the chromatograms of a sample registered using a single wavelength channel
of the detector, whereas the rows are the spectra at a given elution time. This type of data
has a bi-linear structure. This means that the chromatographic data can be decomposed into two
matrices containing the concentration and spectral profiles, C and A, respectively:

X = CAT Eq. 7

Decomposition of bi-linear data does not have a unique solution. Only when good estimates of
the spectral profiles are known and additional constraints are introduced does a unique
solution of the decomposition of X exist. To identify initial concentration and spectral
profiles, approaches such as the Orthogonal Projection Approach [27] or the NEEDLE algorithm
[28] can be used. For the data decomposition under certain constraints, Alternating Least
Squares (ALS) can be applied [29].
The collection of mixture analysis approaches is very diverse. Among them one can find such
methods as Evolving Factor Analysis [30,31], Heuristic Evolving Factor Analysis [32,33],
Iterative Target Transformation Factor Analysis [34], SIMPLISMA [35], Window Factor


Analysis [36], and many more. A good overview of the different mixture analysis approaches is
given in [37].
Let us briefly discuss two approaches used in mixture analysis, i.e. the Orthogonal Projection
Approach and the Alternating Least Squares.

Initial estimation of the pure concentration and of spectral profiles

Orthogonal Projection Approach

The Orthogonal Projection Approach (OPA) aims to determine the most dissimilar spectral or
concentration profiles [27]. This is done in a stepwise manner, by finding the pure profiles
one by one, as dissimilar to each other as possible. To score the dissimilarity of the spectra
(or of the concentration profiles), a dissimilarity criterion, d, is introduced in OPA. The
dissimilarity between a set of profiles, s1, s2, ..., sm, being the column vectors, is defined
as the determinant of the so-called dispersion matrix (YTY), with the normalized s1, s2, ..., sm
in the columns of Y:

d = det (YTY)  Eq. 8

In order to determine the first pure profile, i.e., the one most dissimilar to the remaining
ones, the profiles are compared with a reference profile, being the normalized mean profile.
The dissimilarity of a candidate pure profile is scored by the determinant of the dispersion
matrix, where the columns of Y are the normalized mean profile and the currently tested
profile. The first pure profile is the spectrum with the largest dissimilarity measure. The
second pure profile is found by calculating the dissimilarity of all the profiles with respect
to the first pure profile (i.e., by calculating the determinant of the dispersion matrix, where
the columns of Y hold the normalized first pure profile and the profile considered). The
consecutive pure profiles are found as the most dissimilar to those already selected. To
fulfill this condition, matrix Y holds in its columns the normalized pure profiles already
determined at the earlier steps of the procedure, and the currently considered profile.
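This stepwise selection can be sketched in a few lines of NumPy (a minimal illustration of our own, with synthetic Gaussian spectra; the real method is applied to measured HPLC-DAD data):

```python
import numpy as np

def opa(X, n_pure):
    """OPA sketch: stepwise selection of the n_pure most mutually
    dissimilar rows (spectra) of X, scored by d = det(Y^T Y), Eq. 8."""
    S = X / np.linalg.norm(X, axis=1, keepdims=True)  # normalized spectra
    mean = X.mean(axis=0)
    refs = [mean / np.linalg.norm(mean)]  # initial reference: mean profile
    selected = []
    for _ in range(n_pure):
        d = np.array([np.linalg.det(np.column_stack(refs + [s]).T
                                    @ np.column_stack(refs + [s]))
                      for s in S])
        selected.append(int(np.argmax(d)))
        # after the first step, Y holds only the pure profiles found so far
        refs = [S[j] for j in selected]
    return selected

# synthetic example: three Gaussian pure spectra and ten mixtures,
# with pure samples placed in rows 0, 4 and 9
wl = np.arange(100.0)
pure = np.exp(-(wl[None, :] - np.array([[30.0], [50.0], [70.0]])) ** 2 / 200.0)
conc = np.array([[1.0, 0.0, 0.0], [0.6, 0.3, 0.1], [0.3, 0.4, 0.3],
                 [0.2, 0.6, 0.2], [0.0, 1.0, 0.0], [0.1, 0.5, 0.4],
                 [0.3, 0.3, 0.4], [0.1, 0.2, 0.7], [0.2, 0.1, 0.7],
                 [0.0, 0.0, 1.0]])
X = conc @ pure
```

On this data, `opa(X, 3)` recovers the three rows containing the pure spectra, since mixtures lie inside the simplex spanned by the pure profiles and therefore score lower determinants.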


Information about the number of individual components in the mixture can be obtained by
examining the so-called dissimilarity plot. It presents the dissimilarity of all spectra
calculated with respect to the pure profiles already found at each step of the procedure. If
the dissimilarity plot resembles a random profile, all pure profiles in the data have been
identified. An example of the dissimilarity plots constructed for a three-component mixture of
pesticides (containing two known and one unknown component), in order to determine their pure
concentration profiles, is shown in Fig. 10. A more detailed description of the data and of the
experiment itself can be found in reference [38]. The concentration and spectral profiles, and
the spectra of the two known pure components, are shown in Fig. 11.

Fig. 10
Fig. 11

The first dissimilarity plot suggests that the first initial spectral profile corresponds to
spectrum no. 25 (see Fig. 10a). The second initial spectral profile, according to the
dissimilarity plot, is spectrum no. 61, and the third one is spectrum no. 38 (see Fig. 10c).

Alternating Least Squares

The ALS approach is a so-called self-modeling curve resolution technique, and it aims to
provide the pure concentration and spectral profiles [29].
Once the initial pure spectra are estimated, for instance with the aid of the OPA algorithm,
the concentration and spectral profiles are obtained in an iterative way with a least-squares
procedure. In each iteration of the algorithm, the concentration and spectral profiles are
calculated as follows:

C = XA(ATA)–1 Eq. 12

A = XTC(CTC)–1 Eq. 13


Hence, after each step of the algorithm, a better solution is obtained. The procedure is
continued until convergence is reached, i.e., until the squared differences between the
original data and the reconstructed data (X and CAT) are smaller than a predefined limit.
It is possible to implement certain constraints within the steps of ALS. Depending on the type
of data, the most often applied constraints are unimodality, non-negativity, and closure. The
unimodality constraint forces the chromatograms to have a unimodal shape, i.e., one peak only.
The non-negativity constraint assumes that the concentration and spectral profiles are
positive, and closure ensures that the total concentration of the analytes remains unchanged.
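The alternating updates of Eqs. 12-13 can be sketched as follows (a minimal NumPy illustration of our own, with synthetic rank-two data and the non-negativity constraint imposed by simple clipping; real implementations use more careful constrained least squares):

```python
import numpy as np

def als(X, A_init, n_iter=100):
    """ALS iterations of Eqs. 12-13; negative values are clipped after
    each update to impose the non-negativity constraint."""
    A = A_init.copy()
    C = None
    for _ in range(n_iter):
        C = X @ A @ np.linalg.inv(A.T @ A)      # Eq. 12
        C = np.clip(C, 0.0, None)               # non-negativity
        A = X.T @ C @ np.linalg.inv(C.T @ C)    # Eq. 13
        A = np.clip(A, 0.0, None)
    return C, A

# synthetic two-component HPLC-DAD-like data: X = C_true A_true^T
t = np.arange(40.0)
wl = np.arange(30.0)
C_true = np.column_stack([np.exp(-(t - 15.0) ** 2 / 20.0),
                          np.exp(-(t - 22.0) ** 2 / 20.0)])
A_true = np.column_stack([np.exp(-(wl - 8.0) ** 2 / 30.0),
                          np.exp(-(wl - 20.0) ** 2 / 30.0)])
X = C_true @ A_true.T

# start from a perturbed spectral estimate, as OPA would provide
C_est, A_est = als(X, A_true + 0.05)
```

For this exact rank-two, non-negative data set, the reconstruction C_est A_estᵀ converges to X up to the clipping of near-zero negatives.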
Let us briefly present an application of ALS to resolving the chromatograms of co-eluting
compounds. Fig. 12 shows the pure concentration profiles and the corresponding spectral
profiles obtained for the mixture of pesticides with the ALS procedure. The initial spectral
profiles used in ALS were identified with the OPA approach.

Fig. 12

Now let us focus on the more exciting applications of the MA techniques to chromatography. The
large collection of hyphenated techniques provides an almost unlimited number of potential
applications of the MA techniques to resolving overlapping peaks. An excellent review of the
latest trends in multivariate resolution techniques (together with many practical examples,
including their use in chromatography) is given in [39]. The advantages of studying data
obtained from hyphenated techniques, along with the use of various chemometric techniques, are
highly appreciated in the quality control of herbal medicines [40]. Recently, the use of the MA
techniques to analyze data obtained from various hyphenated techniques (such as on-flow LC-NMR
[41], LC-LC [42], GC-MS [43], GC-SIM [44], and LC-MS [45]) has been reported in the literature.
Mixture analysis plays an important role in environmental studies, e.g., in the analysis of
environmental samples [46], or in the monitoring of different pollutants (such as various
organic contaminants, e.g., polycyclic aromatic hydrocarbons, pesticides, herbicides and their
metabolites in waters, e.g., [38,47]). Also in metabolomic analyses, MA has proved valuable for
extracting chemical information from complex LC-MS and GC-MS data (see, e.g., [48,49]). With
the aid of the MA techniques it is also possible to determine the kinetic parameters of enzymes
(the Michaelis constant, the maximum velocity,


and the inhibition constants) based on chromatographic data, as reported in [50], to control
the purity of drugs [51], and to characterize reversed-phase liquid chromatographic stationary
phases [52,53].
The list of possible applications of MA to chromatography could be extended further; however,
let us stop at this point. As shown, MA has already found many interesting applications, and
many more are yet to be published. We could certainly expect intensive applications of MA,
e.g., in the area of drug purity assessment, where their use could strongly limit the need to
apply the costly "orthogonal systems".

MISUSE OF THE CHEMOMETRICS APPROACHES IN CHROMATOGRAPHY

Typical examples of misuse and abuse of the chemometrics methods

There is another important problem that we now ought to discuss, namely the improper use of
chemometric methods and/or an improper choice of method for the problem at hand.

Application of improper methods

For the sake of example, let us mention one of the most common mistakes. Let us assume that we
have at our disposal the chromatograms of nominally fully analogous products of different
origin (e.g., samples of olive oil originating from different countries), and that we are
interested in tracing differences in the composition of these samples caused by the place of
their origin. In order to study the effect of sample origin on composition, we ought to apply a
classification or discrimination approach. This type of approach is known as supervised
learning (both the sample composition and its origin are known a priori). However, in many
applications unsupervised methods, such as Principal Component Analysis, are used for data
analysis instead of a supervised approach. In order not to leave this statement unfounded, let
us refer our readers, e.g., to paper [54], where the sentence "Principal component


analysis (PCA) recently became a popular technique in data analysis for classification" appears in
a repeated manner, or to paper [55], where the authors conclude that "The results show that the
artificial neural network technique is better than the principal component analysis for the
classification of healthy persons and cancer patients based on nucleoside data".
If the data are well structured, the PCA score plots alone can reveal the groups of samples of different origin, although the lack of such groups in the PCA space does not necessarily mean that there is no statistically significant difference between the samples and that their classification or discrimination is impossible. PCA by definition maximizes the description of the data variance, but the main variance may not be associated with the effect studied (in our case, with sample origin). Of course, PCA can be used for exploration (compression, visualization, etc.) of any data, but it should not be confused with supervised classification methods such as SIMCA (where PCA is applied to each class of objects separately and the PCA models are used for sample classification) or with supervised discrimination approaches such as Linear Discriminant Analysis (LDA), Discriminant Partial Least Squares (D-PLS), Neural Networks (NN) or Support Vector Machines (SVM).
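A minimal numerical sketch (synthetic data in plain NumPy; all names and values are our own illustration) makes the point concrete: when the dominant variance is unrelated to sample origin, the first principal component misses the class structure that a supervised direction recovers easily.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two classes separated along axis 0 (small spread), with large irrelevant
# variance along axis 1 -- mimicking chromatographic data whose main
# variance is unrelated to sample origin.
n = 200
class_a = np.column_stack([rng.normal(-1.0, 0.3, n), rng.normal(0, 10.0, n)])
class_b = np.column_stack([rng.normal(+1.0, 0.3, n), rng.normal(0, 10.0, n)])
X = np.vstack([class_a, class_b])
y = np.array([0] * n + [1] * n)

# PCA: the first principal component is the top eigenvector of the covariance.
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
pc1 = eigvecs[:, -1]                    # direction of maximal variance

# LDA direction: inv(within-class scatter) @ (mean_b - mean_a).
Sw = np.cov(class_a, rowvar=False) + np.cov(class_b, rowvar=False)
w = np.linalg.solve(Sw, class_b.mean(axis=0) - class_a.mean(axis=0))

# Classify by thresholding the scores on each direction at zero.
acc_pca = max(np.mean((Xc @ pc1 > 0) == y), np.mean((Xc @ pc1 < 0) == y))
acc_lda = max(np.mean((Xc @ w > 0) == y), np.mean((Xc @ w < 0) == y))
print(f"PC1 accuracy: {acc_pca:.2f}, LDA accuracy: {acc_lda:.2f}")
```

On such data the PC1 scores classify at roughly chance level, while the supervised direction separates the two classes almost perfectly; only when the discriminating variance happens to dominate do PCA score plots reveal the groups.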

As the application of an improper method to a given problem is quite common, let us cite another example. In paper [56] the authors used PLS "to select the most significant variables from a large set of descriptors" to model gas chromatographic retention indices, and in the Conclusions we find that "PLS failed as a variable selection method".
PLS is not a variable selection method. We agree that, based on the final PLS model, certain conclusions can be drawn about the significance of the individual variables in model construction, but this can only be done (from the values of the regression coefficients) if the data were standardized, which is not the case in the cited study. In the same paper, the following approach was introduced as an alternative to PLS for variable selection: construct MLR models containing two parameters, i.e. the boiling point and a topological index X; select the indices for which the individual two-parameter models were the best; and construct the final MLR model with these particular parameters. For those familiar with basic statistical methods it should be obvious that this is not the best approach, because the individual variables are considered independently of one another. It would be much better to apply stepwise MLR, a basic method available in any statistical software package. In stepwise MLR, it is possible to construct the MLR model with the boiling point, and then find the variable(s) that best model the residuals from the previous model. There are also other approaches, such as Uninformative Variable Elimination PLS (UVE-PLS) or Classification and Regression Trees (CART), which could be a proper choice, well suited for the purpose of variable selection.
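The forward-stepwise idea can be sketched in a few lines (plain NumPy; the data, descriptor names and coefficients are our own illustrative assumptions, not those of the cited paper): fit the boiling-point model first, then rate each candidate descriptor by how well it models the residuals rather than the raw response.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: a retention index depends on a "boiling point" bp
# and on one of several candidate topological descriptors.
n = 60
bp = rng.normal(0, 1, n)
descriptors = rng.normal(0, 1, (n, 5))          # five candidate indices
y = 2.0 * bp + 1.5 * descriptors[:, 3] + rng.normal(0, 0.2, n)

def fit(x_cols, y):
    """Least-squares fit with intercept; returns coefficients and residuals."""
    A = np.column_stack([np.ones(len(y))] + x_cols)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef, y - A @ coef

# Step 1: model with the boiling point only.
_, resid = fit([bp], y)

# Step 2: choose the descriptor that best models the *residuals*
# (rather than rating each descriptor in isolation).
best = int(np.argmax([abs(np.corrcoef(descriptors[:, j], resid)[0, 1])
                      for j in range(descriptors.shape[1])]))

# Step 3: final model with the boiling point plus the selected descriptor.
coef, resid2 = fit([bp, descriptors[:, best]], y)
print(f"selected descriptor: {best}, final residual SD: {resid2.std():.3f}")
```

Because the second variable is judged on what the first one failed to explain, the selection accounts for the variables jointly, which is exactly what the independent two-parameter ranking cannot do.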

Improper (or lack of) model validation

Another example of a popular abuse of chemometrics methods is improper validation (or even a lack of validation) of the presented models. It does not seem obvious to everyone that model fit has nothing in common with predictive power, and that it is possible to fit almost any type of data, especially with flexible non-linear modeling techniques. This problem appears quite often when data are modeled with Neural Networks (e.g. [57], where an NN is used to model the retention of nine phenols as a function of the mobile phase composition). There is no good reason to optimize the architecture of an NN (i.e. the number of nodes in the hidden layer) based on the Root Mean Square Error for the training set. It is necessary to apply a monitoring set (i.e. a set used to trace the predictive power of the NN, but not used for its training) or to use a Cross-Validation (CV) procedure (if the NN is stable). In the discussed paper, the CV procedure is mentioned, yet not applied. Finally, the predictive power of the NN ought to be estimated with an independent test set. The roles of the monitoring and test sets are not the same: the monitoring set is already used for optimization of the NN architecture, so it can no longer be considered independent. There is also no reason to determine the number of learning epochs based on the Root Mean Square Error of prediction for the test set; this ought to be done with the monitoring set. The final NN model, proposed in this paper for modeling the retention of nine phenols as a function of the mobile phase composition, contains 129 weights (i.e., 129 parameters). Taking into account the fact that the NN architecture was not properly optimized and that these 129 parameters were calculated from only 25 experimental examples, we can be sure that this model will be useless for practical applications.
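The danger is easy to demonstrate with a toy example (plain NumPy, entirely synthetic data of our own): a model with as many parameters as training examples fits the training set almost perfectly and still predicts poorly.

```python
import numpy as np

rng = np.random.default_rng(7)

# 25 noisy training examples of a smooth relationship -- loosely mimicking
# 25 experiments used to fit a heavily over-parameterized model.
x_train = np.linspace(-1.0, 1.0, 25)
y_train = np.sin(3 * x_train) + rng.normal(0, 0.1, 25)
x_test = np.linspace(-0.98, 0.98, 50)      # new points, same relationship
y_test = np.sin(3 * x_test)

def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

errors = {}
for degree in (3, 24):                      # 4 vs 25 fitted parameters
    coef = np.polyfit(x_train, y_train, degree)
    errors[degree] = (rmse(y_train, np.polyval(coef, x_train)),   # fit
                      rmse(y_test, np.polyval(coef, x_test)))     # prediction
print(errors)   # degree 24: near-zero fit error, much larger prediction error
```

The degree-24 polynomial interpolates the 25 training points (near-zero fit error) while its prediction error on new points is far worse than that of the modest 4-parameter model; only an independent test set exposes this.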
The list of papers with NNs containing hundreds of weights (parameters) trained with a very limited number of examples could be really long. For instance, in [58], an NN model constructed to predict an optimal separation of the analytes was trained with only 21 examples to calculate 773 weights (the net architecture was determined to be three input parameters, two hidden layers comprising 20 nodes each, and 13 output parameters).
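The quoted weight count can be verified directly: with one bias per hidden and output node, a fully connected 3-20-20-13 net has (3+1)*20 + (20+1)*20 + (20+1)*13 = 773 adjustable parameters.

```python
# Parameter count for a fully connected net with layer sizes 3-20-20-13,
# counting one bias for every non-input node (matching the 773 quoted above).
layers = [3, 20, 20, 13]
n_params = sum((fan_in + 1) * fan_out          # +1 for the bias term
               for fan_in, fan_out in zip(layers[:-1], layers[1:]))
print(n_params)   # 773 parameters, to be estimated from only 21 examples
```

With 21 examples and 773 free parameters, the system is under-determined by more than a factor of thirty.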

Problems with data representativity

The lack of model validation is often accompanied by a lack of representativity of the data used for model construction. For instance, in [59], Linear Discriminant Analysis is applied to distinguish among 12 classes of oils on the basis of chromatographic data, where some individual classes are represented by only two or three samples, and the model is, of course, not validated. One does not need to be a chemometrician to realize that two or three samples are not enough to draw any relevant conclusions about the class to which they belong: the possible sources of data variance outnumber the samples used to estimate the class variability. As the constructed models are statistical in nature, they ought to be built on representative data sets.
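How unstable such class estimates are can be seen in a two-line simulation (plain NumPy, synthetic normal data with a true standard deviation of 1): the spread of the standard deviation estimated from three samples dwarfs that from thirty.

```python
import numpy as np

rng = np.random.default_rng(4)

# Spread of the sample SD when a "class" is represented by 3 samples,
# versus 30 samples, drawn from the same population (true SD = 1).
sd3 = np.array([rng.normal(0, 1, 3).std(ddof=1) for _ in range(2000)])
sd30 = np.array([rng.normal(0, 1, 30).std(ddof=1) for _ in range(2000)])
print(f"n=3:  SD estimates range {sd3.min():.2f}-{sd3.max():.2f}")
print(f"n=30: SD estimates range {sd30.min():.2f}-{sd30.max():.2f}")
```

Any covariance-based classifier (LDA included) built on two or three samples per class inherits this instability, and its apparent class boundaries are largely an artifact of the particular draw.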

Unfair comparison of different methods

Let us add yet another example to our list of popular abuses. A number of papers can be found in which the performance of two different modeling methods is compared. Quite often one of the methods is linear and the other non-linear. Applied to a particular data set, they can perform differently, depending on the complexity of the modeled data. If the problem at hand is non-linear, the non-linear method obviously outperforms the linear one. But this does not allow one to draw any general conclusion that, e.g., "the study shows that ANN can give better prediction results than MLR" [60], or that "The results obtained using ANNs were compared with the experimental values as well as with those obtained using regression models and showed the superiority of ANNs over regression models" [61]. The only well-founded conclusion is that, for the data studied, a non-linear modeling technique such as an NN is necessary.
In the aforementioned paper we also found intriguing statements such as the following: "Before training, the network was optimized for the number of nodes in the hidden layer, learning rates and momentum. Then the network was trained using the training set to optimize the values of weights and biases".


Similar "mysterious" things were found in paper [62], in which the authors managed to construct
a neural network that performed much worse than the multi-linear regression model.

Chemometrics approaches as black-box tools

Although we are genuinely interested in the popularization of chemometrics in different fields of analytical chemistry, we do not recommend treating its methods as black-box approaches. Carefully studying papers on applications of chemometrics to chromatography, we often found sentences such as "the computer program calculated", or "the computer programs available also do not differentiate among numerical values of logP for organic isomers" [63]. It sounds as if the authors personified their computer programs and, due to that, could not bear full responsibility for a failed outcome of these undisciplined creatures. It also sounds similar to the sentence "Mrs X measured the concentration of calcium in the studied samples". It is not important who measured the concentration of the element of interest; we need to know how it was measured. In other words, it is necessary to describe the applied approach, its principles and its input parameters.

Useless efforts

A relatively significant number of abuses can be found in the area of modeling chromatographic retention with topological indices.
In J. Liq. Chrom. & Rel. Technol. we found a series of papers in which several well-known topological indices were intensely explored and several new topological indices were proposed. For example, in [64], two new indices, named "optical topological indexes" and "valence optical indexes", were introduced to differentiate between the L and D amino acids. Introduced completely out of the blue, they have the same predictive power as any arbitrarily chosen pair of numbers, higher for isomer L and lower for isomer D. Namely, the vertices corresponding to the symmetric atoms and the asymmetric atom in isomer L are denoted as '+δi', whereas the respective asymmetric carbon atom in isomer D is denoted as '-δi'. The new index is defined as the product of the distance matrix and a vector z whose elements are the above-mentioned δ parameters. This absolutely trivial trick of course leads to a lower value of the proposed index for isomer D than for isomer L, thus differentiating between the two isomers. It sounds very much like time-consuming nonsense.
To mention other curious examples: in a few places in [56] (so it does not seem to be a misprint) we found a negative value of R2, i.e. of the so-called coefficient of determination, which is defined as the ratio of the explained variation to the total variation and by definition satisfies 0 ≤ R2 ≤ 1. In the same paper there are some other "mysterious" statements as well, such as "Model building using PLS was performed on the test set, and the external validation sets using the selected descriptors plus BP and TE gave very good results". We can only wonder what the authors did with their training set.
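If R2 is computed as the ratio of explained to total variation for a least-squares fit with intercept, a negative value is indeed impossible. Negative values can only arise from the prediction form R2 = 1 - SSres/SStot applied to predictions that are worse than simply quoting the mean of the data; such a result should be reported as a failure of the model, not as a coefficient of determination. A minimal sketch (synthetic data of our own):

```python
import numpy as np

rng = np.random.default_rng(2)

def r2(y, y_pred):
    """R^2 in the 1 - SS_res/SS_tot form used for prediction."""
    return 1 - np.sum((y - y_pred) ** 2) / np.sum((y - y.mean()) ** 2)

y_test = rng.normal(0, 1, 50)
bad_pred = -y_test                # a "model" predicting worse than the mean
print(r2(y_test, bad_pred))      # clearly negative (about -3 here)
```

For the fit of a least-squares model with intercept to its own training data, the two formulas coincide and the value lies in [0, 1]; they diverge only out of sample.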
A similar "mystery" can be found in [65], where the term "qualitative correlation" is proposed to describe the correlation between two objects (isomers) described by two parameters. Taking into account the fact that through two points in the plane (i.e. in the two-dimensional parameter space) a straight line can always be drawn, we wonder what the real meaning of this "qualitative correlation", or of any other type of correlation, could be.

We would like to encourage chromatographers to apply chemometrics methods, but it ought to be obvious to everyone that any method must be applied properly.

In the same way as repeated weighing of a sample on an uncalibrated balance will not help you estimate its real mass, wrongly applied chemometrics methods will not help you draw any reasonable conclusions.

To be sure, the aforementioned examples of abuse and/or misuse of chemometrics methods happen in every subfield of chemistry, chemometrics included.

ONGOING CHALLENGES

As an analytical technique, chromatography is challenged by the analysis of complex biological samples. Let us focus on LC-MS, which has rapidly emerged as the method of choice for large-scale proteomic analysis. LC-MS systems can be used to identify and evaluate the relative abundance of thousands of molecules (in proteomic profiling, the molecules in question are peptides derived by proteolysis of the intact proteins). For a very complex sample, such as a blood sample, the peptide mixture is resolved by chromatographic separation prior to its injection into the mass spectrometer, so the data generated during the analysis consist of both the unique retention times and the m/z ratios of the individual peptides (see Fig. 13).

Fig. 13

In any description of state-of-the-art LC-MS systems, one can read that LC-MS is "an information-rich technique". This is of course true, but there is a long way from the massive experimental data to useful information about the complex biological systems and events studied. Data analysis is rapidly becoming the major obstacle in the conversion of experimental measurements into valid conclusions. The main problems are that LC-MS systems are subject to considerable noise and to not fully characterized variability (see the examples of individual mass chromatographic profiles presented in Fig. 14), that the elution time axis varies across different experiments (see Fig. 14, in which two total ion chromatograms are presented for illustration), that confounding overlap of peptides across the experimental space is usually observed, and, additionally, that differences in the overall sample composition can affect the signal intensities of the individual peptides [66].

Fig. 14

To address all these issues, many challenging tasks need to be resolved. Already at the stage of organizing the data into matrix form, we face many problems. In raw form, the full-scan spectra obtained from an LC-MS experiment consist of a table of the following values: a) the scan number; b) the LC retention time, and c) the ion abundance. Depending on the resolution of the MS instrument, retaining all possible values can lead to an intractably huge matrix, so the matrix representation of LC-MS data usually involves binning to nominal m/z values. An optimal bin width should be large enough to keep the matrix tractable and not too sparse, but small enough that the individual m/z values remain informative. Until now, however, no methods have been reported for evaluating an optimal bin width, or for determining its influence on the calculated features.
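The binning step itself is straightforward to sketch (plain NumPy; the record layout, m/z range and 1 Da bin width are our own illustrative assumptions): the raw (scan, m/z, abundance) triples are accumulated into a retention-time-by-bin matrix.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical raw LC-MS records: (scan number, m/z, ion abundance).
n_scans, bin_width, mz_min, mz_max = 50, 1.0, 100.0, 400.0
scans = rng.integers(0, n_scans, 5000)
mz = rng.uniform(mz_min, mz_max, 5000)
abundance = rng.exponential(1.0, 5000)

# Bin m/z values and accumulate abundances into a scans x bins matrix.
bins = ((mz - mz_min) // bin_width).astype(int)
X = np.zeros((n_scans, int((mz_max - mz_min) // bin_width)))
np.add.at(X, (scans, bins), abundance)           # sum abundances per cell

print(X.shape)      # (50, 300): tractable, at the price of m/z resolution
```

Widening the bins shrinks the matrix but merges neighbouring ions into one column, which is exactly the trade-off for which no optimality criterion has been reported.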
Yet there are other approaches that can avoid the binning problem altogether. Namely, the variables in the data matrix can correspond to peaks characterized by their retention time and m/z. This type of data organization preserves the high resolution of MS, but it also requires a certain kind of peak pre-selection in order to avoid an enormous dimensionality of the data. In any such approach, signal enhancement, peak detection and peak alignment are unavoidable. All these methods are involved in the data pre-processing step only, which is followed by data exploration, e.g., data classification, marker identification, model validation, etc. Chemometrics offers all the tools needed for such start-to-finish approaches. All the methods for data compression, clustering, visualization, feature selection, classification or discrimination, and resolution are at our disposal for the analysis of this type of data, and analytical chemists ought to benefit from this fact.
It ought to be stressed that, depending on the overall strategy of data modeling, a different type of preprocessing is required, and only with a good overview of the problem at hand can the optimal strategy be chosen. For instance, if a pilot study suggests that classification can be successfully performed on the total mass chromatographic profiles alone, there is no need for enhancement of the individual mass spectra or the individual mass chromatograms. As random noise in the total mass chromatograms is reduced to a high degree (by the summation of many individual mass chromatograms), only background correction is necessary. Classification based on the total mass chromatograms, extended by feature selection and perhaps also by mixture analysis, allows us to return to the pure mass spectra of the identified significant components and to benefit from their original high resolution.
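The noise-reduction claim is easy to check numerically (synthetic channels of our own, assuming independent white noise on each mass chromatogram): summing m channels grows a shared elution peak m-fold, while the baseline noise grows only by the square root of m.

```python
import numpy as np

rng = np.random.default_rng(6)

t = np.linspace(0, 1, 500)
peak = 3.0 * np.exp(-0.5 * ((t - 0.5) / 0.02) ** 2)   # shared elution profile

def snr(trace):
    """Crude SNR: peak height over baseline noise (the first 100 points)."""
    return trace.max() / trace[:100].std()

m = 100                                    # number of individual mass channels
channels = peak[None, :] + rng.normal(0, 1.0, (m, len(t)))
print(f"single channel SNR ~ {snr(channels[0]):.1f}")
print(f"summed (TIC) SNR   ~ {snr(channels.sum(axis=0)):.1f}")
```

In this sketch the summed trace improves the signal-to-noise ratio several-fold over a single channel, which is why background correction is often the only enhancement the total profile needs.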
Several research groups have developed systems that address the main problems associated with LC-MS data processing (e.g., [67-69]), but all these systems are far from flexible enough to deal with the different problems associated with LC-MS data and, moreover, each assumes one particular data organization out of the many possible ones. The optimal approach depends on the data structure and complexity, and it cannot be defined a priori. For each problem, the simplest and simultaneously the most stable approach ought to be selected individually.


Another aspect of the effective modeling of massive and noisy LC-MS data that needs to be addressed is the application of robust methods of data pretreatment and modeling. We have at our disposal a growing number of robust approaches (e.g., robust PLS [70], robust LS-SVMs [71], robust SIMCA [72], etc.), which allow the construction of stable models that describe the data majority well, i.e., that are not influenced by outlying observations.
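As an illustration of what "robust" buys (a generic sketch of our own, not any of the cited algorithms), compare an ordinary least-squares slope with the simple Theil-Sen estimate on data containing one gross outlier.

```python
import numpy as np

rng = np.random.default_rng(5)

# Straight-line data y = 2x with one gross outlying observation.
x = np.arange(20.0)
y = 2.0 * x + rng.normal(0, 0.1, 20)
y[19] = 200.0                                    # the outlier

# Ordinary least-squares slope (with a fitted intercept).
slope_ols = np.polyfit(x, y, 1)[0]

# Theil-Sen slope: the median of all pairwise slopes -- a simple robust fit.
i, j = np.triu_indices(20, k=1)
slope_ts = np.median((y[j] - y[i]) / (x[j] - x[i]))

print(f"OLS slope: {slope_ols:.2f}, robust slope: {slope_ts:.2f}")
```

One bad observation drags the least-squares slope far from the true value of 2, while the median-based estimate stays on target; this is the behavior the robust PLS, LS-SVM and SIMCA variants generalize to multivariate models.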

Massive chromatographic data sets require intensive chemometrical treatment.

CONCLUSIONS

Although some analytical chemists are somewhat critical of chemometrics, this review demonstrates not only the various advantages of data processing, but also the need for mathematical evaluation of the data in order to better understand the results in an era of increasing sample complexity.
As demonstrated in this review, chemometrics can efficiently deal with signal enhancement and with the other specific tasks associated with chromatographic signals. However, the reader will already have perceived that each of the presented methods requires optimization of its input parameters and in the majority of cases cannot be run fully automatically. The choice of a particular method and of its parameters is by no means a trivial task: it requires an understanding of the method's principles and of the meaning of the individual input parameters. What is relatively easy for those involved in chemometrics usually seems too complicated for others working in a particular sub-discipline of analytical chemistry.
For many approaches, software is available in the public domain, yet this advantage does not seem to greatly influence the popularity of the methods. Although the main goal of chemometrics is to provide analytical chemists with efficient data processing tools, it seems that these methods are used mainly by chemometricians, and we are still miles away from their daily use in analytical laboratories. Today's laboratories are dominated by the methods implemented in the software delivered as part of a given instrument's equipment.


Summing up, it seems that there is a huge gap between the communities of chemometricians and analytical chemists. Chemometricians keep extending the arsenal of possible approaches, but the analytical community is not profiting from this fact to the extent that it could.

Both communities ought to undertake some effort and collaborate more efficiently.

Both communities could profit from closer collaboration. There are many things to be learnt by chemometricians about modern separation techniques and the problems inherently associated with them, and there are many chemometrical approaches that could make the life of chromatographers easier. Moreover, chemometricians should care more about the popularization of their approaches and publish more examples of the practical application of chemometrics methods in the more analytical-chemistry-oriented journals, but chromatographers also ought to make an effort to study the basic principles of the chemometrics approaches.

The main message from chemometricians to chromatographers could sound like this:

Chemometrical methods are being invented to make your life easier.


Table 1 Results of a keyword search in the SCOPUS system, using a quick search ("keyword(s)" and chromatography)

No.  Keyword(s)                       Score
1    multivariate curve resolution       44
2    alternating least squares           34
3    MCR-ALS                             18
4    chemometrics                       403
5    experimental design                605
6    multivariate analysis              275
7    pattern recognition                280
8    classification                    1029
9    PCA                                556
10   QSPR                                51
11   QSAR                               111
12   topological indices                 38
13   topological descriptors             11
14   modeling retention                   5
15   fingerprints                       802
16   clustering                         219
17   peak shifts                         16
18   deconvolution                      244
19   background correction               21
20   denoising                            9
21   noise reduction                     17
22   signal enhancement                  43
23   preprocessing                       45
24   mixture analysis                    82
25   alignment                          383
26   warping                             11
27   peak matching                       18
28   peak detection                      54
29   wavelets                            32
     Total                             5456


FIGURE CAPTIONS

Fig. 1 Components of analytical signal


Fig. 2 Illustration of multiresolution properties of the wavelet transform
Fig. 3 a) Mallat's pyramid algorithm for discrete wavelet decomposition (five levels
decomposition of the signal); b) final representation of the signal in the wavelet domain for five
levels of decomposition
Fig. 4 a) Original signal; b) signal in the wavelet domain; c) signal in the wavelet domain after thresholding, and d) reconstructed signal
Fig. 5 Consecutive estimates of the background and the chromatogram after background
subtraction
Fig. 6 a) Original 2D gel electropherogram; b) estimated background, and c) 2D gel
electropherogram after background elimination
Fig. 7 Alignment of chromatograms: a) the original chromatograms before alignment, and b) the
chromatograms aligned by Correlation Optimized Warping
Fig. 8 Fuzzy warping of 2D gel electropherograms
Fig. 9 Illustration of bi-linear chromatographic data. Columns of matrix X contain the
concentration profiles (chromatograms) and rows contain the spectral profiles
Fig. 10 The dissimilarity plots with marked maxima for a) the first component; b) the second
component, and c) the third component
Fig. 11 a) Concentration profiles; b) spectral profiles; and c) pure spectra of the two known
pesticides (fenitrothion and azinphos-ethyl)
Fig. 12 a) Concentration profiles and b) spectral profiles obtained by the Alternating Least Squares approach with the non-negativity and unimodality constraints and the initial spectral profiles found by means of the OPA approach
Fig. 13 Color image of the log values of the LC-MS data
Fig. 14 Examples of individual LC-MS time profiles

FIGURES

Figs. 1-14 (graphical content not reproduced here; see the figure captions above)
37
Published in Trends in Analytical Chemistry 25 (2006) 1081-1096;
http://dx.doi.org/10.1016/j.trac.2006.09.001

REFERENCES

[1] D.L. Massart, L. Buydens, Chemometrics in pharmaceutical analysis, Journal of


Pharmaceutical and Biomedical Analysis 6 (1988) 535-545
[2] S.G. Mallat, A theory for multiresolution signal decomposition: the wavelet representation,
IEEE Trans. Pattern Anal. Machine Intell. 11 (1989) 674-693
[3] D.L. Donoho, Denoising by soft thresholding, IEEE Trans. Information Theory 41 (1995)
613-627
[4] P.H.C. Eilers, A Perfect Smoother, Analytical Chemistry, 75 (2003) 3631-3636
[5] K. Kaczmarek, B. Walczak, S. De Jong, B.G.M. Vandeginste, Baseline reduction in two
dimensional gel electrophoresis images, Acta Chromatographica 15 (2005) 82-96
[6] A. Kassidas, J.F. MacGregor, P.A. Taylor, Synchronization of Batch Trajectories Using
Dynamic Time Warping, AICHE J. 44 (1998) 864-875
[7] P. Eilers, Parametric time warping, Analytical Chemistry 76 (2004) 404-411
[8] J. Forshed, I. Schuppe-Koistinen, S.P. Jacobsson, Peak alignment of NMR signals by means
of a genetic algorithm, Analytica Chimica Acta 487 (2003) 189-199
[9] B. Walczak, W. Wu, Fuzzy warping of chromatograms, Chemometrics and Intelligent
Laboratory Systems 77 (2005) 173-180
[10] N.-P. V. Nielsen, J.M. Carstensen, J. Smedsgaard, Aligning of single and multiple
wavelength chromatographic profiles for chemometric data analysis using correlation optimised
warping, Journal of Chromatography A 805 (1998) 17-35
[11] V. Pravdova, B. Walczak, D.L. Massart, A comparison of two algorithms for warping of
analytical signals, Analytica Chimica Acta 456 (2002) 77-92
[12] R. Bro, C. Andersson, H.A.L. Kiers, PARAFAC2 - Part II, Modeling chromatographic data
with retention time shifts, Journal of Chemometrics 13 (1999) 295-309
[13] B. Grung, O.M. Kvalheim, Retention time shifts adjustments of two-way chromatograms
using Bessel's inequality, Analytica Chimica Acta 304 (1995) 57-66
[14] K.M. Ǻberg, R.J.O. Torgrip, S.P. Jacobsson, Extensions to peak alignment using reduced set
mapping: classification of LC/UV data from peptide mapping, Journal of Chemometrics 18
(2004) 465-473

38
Published in Trends in Analytical Chemistry 25 (2006) 1081-1096;
http://dx.doi.org/10.1016/j.trac.2006.09.001

[15] R.J.O. Torgrip, K.M. Ǻberg, B. Kalberg, S.P. Jacobsson, Peak alignment using reduced set
mapping, Journal of Chemometrics 17 (2003) 573-582
[16] N.-P. Vest Nielsen, J. Smedsgaard, J.C. Frisvad, Full second-order
chromatographic/spectrometric data matrices for automated sample identification and component
analysis by non-data-reducing image analysis, Analytical Chemistry 71 (1999) 727-735
[17] G. Malmquist, R. Danielsson, Alignment of chromatographic profiles for principal
component analysis: a prerequisite for fingerprinting methods, Journal of Chromatography A 687
(1994) 71-88
[18] E. Reiner, L.E. Abbey, T.F. Moran, P. Papamichalis, R.W. Schafer, Characterization of
normal human cells by pyrolisis gas chromatography mass spectrometry, Biomedical Mass
Spectrometry 6 (1979) 491-498
[19] R. Andersson, M.D. Hämäläinen, Simplex focusing of retention times and latent variable
projections of chromatographic profiles, Chemometrics and Intelligent Laboratory Systems 22
(1994) 49-61
[20] D. Bylund, R. Danielsson, G. Malmquist, K.E. Markides, Chromatographic alignment by
warping and dynamic programming as a pre-processing tool for PARAFAC modelling of liquid
Chromatography-mass spectrometry data, Journal of Chromatography A 961 (2002) 237-244
[21] K.M. Pierce, J.L. Hope, K.J. Johnson, B.W. Wright, R.E. Synovec, Classification of gasoline
data obtained by gas chromatography using a piecewise alignment algorithm combined with
feature selection and principal component analysis, Journal of Chromatography A 1096 (2005)
101-110
[22] K. Kaczmarek, B. Walczak, S. de Jong, B.G.M. Vandeginste, Feature based fuzzy matching
of 2D gel electrophoresis images, Journal of Chemical Information and Computer Sciences 42
(2002) 1431-1442
[23] B. Walczak, W. Wu, Fuzzy warping of chromatograms, Chemometrics and Intelligent
Laboratory Systems 77 (2005) 173-180
[24] W. Wu, M. Daszykowski, B. Walczak, B.C. Sweatman, S.C. Connor, J.N. Haselden, D.J.
Crowther, R.W. Gill, M.W. Lutz, Peak alignment of urine NMR spectra using fuzzy warping,
Journal of Chemical Information and Modeling 46 (2006) 863-875

39
Published in Trends in Analytical Chemistry 25 (2006) 1081-1096;
http://dx.doi.org/10.1016/j.trac.2006.09.001

[25] G. Vivó-Truyols, J.R. Torres-Lapasió, A.M. van Nederkassel, Y. Vander Heyden, D.L.
Massart, Automatic program for peak detection and deconvolution of multi-overlapped
chromatographic signals: Part I: Peak detection, Journal of Chromatography A 1096 (2005) 133-
145
[26] G. Vivó-Truyols, J.R. Torres-Lapasió, A.M. van Nederkassel, Y. Vander Heyden, D.L.
Massart, Automatic program for peak detection and deconvolution of multi-overlapped
chromatographic signals: Part II: Peak model and deconvolution algorithms, Journal of
Chromatography A 1096 (2005) 146-155
[27] F. Cuesta Sánchez, J. Toft, B. van den Bogaert, D.L. Massart, Orthogonal Projection
Approach applied to peak purity assessment, Analytical Chemistry 68 (1996) 79-85
[28] A. de Juan, B. van den Bogaert, F. Cuesta Sánchez, D.L. Massart, Application of the needle
algorithm for exploratory analysis and resolution of HPLC-DAD data, Chemometrics and
Intelligent Laboratory Systems 33 (1996) 133-145
[29] R. Tauler, D. Barceló, Multivariate curve resolution applied to liquid chromatography diode
array detection, Trends in Analytical Chemistry 12 (1993) 319-327
[30] M. Maeder, Evolving factor analysis for the resolution of overlapping chromatographic
peaks, Analytical Chemistry 59 (1987) 527-530
[31] M. Maeder, A. Zilian, Evolving factor analysis, a new multivariate technique in
chromatography, Chemometrics and Intelligent Laboratory Systems 3 (1988) 205-213
[32] O.M. Kvalheim, Y.-Z. Liang, Heuristic evolving latent projections: resolving two way
multicomponent data. 1. Selectivity, latent-projective graph, data scope, local rank, and unique
resolution, Analytical Chemistry 64 (1992) 936-946
[33] Y.-Z. Liang, O.M. Kvalheim, Heuristic evolving latent projections: resolving two way
multicomponent data. 2. Detection and resolution of minor constituents, Analytical Chemistry 64
(1992) 946-953
[34] B.G.M. Vandeginste, W. Derks, G. Kateman, Multicomponent self-modelling curve
resolution in high-performance liquid chromatography by iterative target transformation analysis,
Analytica Chimica Acta 173 (1985) 253-264
[35] W. Windig, J. Guilment, Interactive self-modeling mixture analysis, Analytical Chemistry
63 (1991) 1425-1432

40
Published in Trends in Analytical Chemistry 25 (2006) 1081-1096;
http://dx.doi.org/10.1016/j.trac.2006.09.001

[36] E.M. Malinowski, Widow factor analysis: theoretical derivation and application to flow
injection analysis data, Journal of Chemometrics 6 (1992) 15-43
[37] F.C. Sánchez, B. van den Bogaert, S.C. Rutan, D.L. Massart, Multivariate peak purity
approaches, Chemometrics and Intelligent Laboratory Systems 34 (1996) 139-171
[38] R. Tauler, S. Lacorte, D. Barceló, Application of multivariate self-modeling curve resolution
to the quantitation of trace levels of organophosphorus pesticides in natural waters from
interlaboratory studies, Journal of Chromatography A 730 (1996) 177-183
[39] A. de Juan, R. Tauler, Chemometrics applied to unravel multicomponent processes and
mixtures. Revisiting latest trends in multivariate resolution, Analytica Chimica Acta 500 (2003)
195-210
[40] Y.-Z. Liang, P. Xie, K. Chan, Quality control of herbal medicines, Journal of
Chromatography B 812 (2004) 53-70
[41] M. Wasim, R. Brereton, Application of multivariate curve resolution to on-flow LC-NMR,
Journal of Chromatography A 1096 (2005) 2-15
[42] C.G. Fraga, C.A. Corley, The chemometric resolution and quantification of overlapped
peaks from comprehensive two-dimensional liquid chromatography, Journal of Chromatography
A 1096 (2005) 40-49
[43] M. Jalali-Heravi, B. Zekavat, H. Sereshti, Characterization of essential oil components of
Iranian geranium oil using gas chromatography-mass spectrometry combined with chemometric
resolution techniques, Journal of Chromatography A 1114 (2006) 154-163
[44] C.G. Fraga, Chemometric approach for the resolution and quantification of unresolved peaks
in gas chromatography-selected-ion mass spectrometry data, Journal of Chromatography A 1019
(2003) 31-42
[45] W. Windig, W.F. Smith, W.F. Nichols, Fast interpretation of complex LC/MS data using
chemometrics, Analytica Chimica Acta 446 (2001) 467-476
[46] E. Peré-Trepat, S. Lacorte, R. Tauler, Solving liquid chromatography mass spectrometry
coelution problems in the analysis of environmental samples by multivariate curve resolution,
Journal of Chromatography A 1096 (2005) 111-122


[47] R. Tauler, D. de Almeida Azevedo, S. Lacorte, P. Viana, D. Barceló, Organic pollutants in
surface waters from Portugal using chemometric interpretation, Environmental Technology 22
(2001) 1043-1053
[48] H. Idborg, L. Zamani, P.-O. Edlund, I. Schuppe-Koistinen, S.P. Jacobsson, Metabolic
fingerprinting of rat urine by LC/MS. Part 2. Data pretreatment methods for handling of complex
data, Journal of Chromatography B 828 (2005) 14-20
[49] P. Jonsson, A.I. Johansson, J. Gullberg, J. Trygg, J. A, B. Grung, S. Marklund, M.
Sjöström, H. Antti, T. Moritz, High-throughput data analysis for detecting and identifying
differences between samples in GC/MS-based metabolomic analyses, Analytical Chemistry 77
(2005) 5635-5642
[50] R. Sánchez-Ponce, S.C. Rutan, Steady state kinetic model constraint for
multivariate curve resolution-alternating least squares analysis, Chemometrics and Intelligent
Laboratory Systems 77 (2005) 50-58
[51] D. Lincoln, A.F. Fell, N.H. Anderson, D. England, Assessment of chromatographic peak
purity of drugs by multivariate analysis of diode-array and mass spectrometric data, Journal of
Pharmaceutical and Biomedical Analysis 10 (1992) 837-844
[52] S. Nigam, M. Stephens, A. de Juan, S.C. Rutan, Characterization of the polarity of reversed-
phase liquid chromatographic stationary phases in the presence of 1-propanol using
solvatochromism and multivariate curve resolution, Analytical Chemistry 73 (2001) 290-297
[53] S. Nigam, A. de Juan, V. Cui, S.C. Rutan, Characterization of reversed-phase liquid
chromatographic stationary phases using solvatochromism and multivariate curve resolution,
Analytical Chemistry 71 (1999) 5225-5234
[54] K. Heberger, K. Milczewska, A. Voelkel, Principal component analysis of polymer–solvent
and filler–solvent interactions by inverse gas chromatography, Colloids and Surfaces A:
Physicochemical and Engineering Aspects 260 (2005) 29-37
[55] J. Yang, G. Xu, H. Kong, Y. Zheng, T. Pang, Q. Yang, Artificial neural network
classification based on high-performance liquid chromatography of urinary and serum
nucleosides for the clinical diagnosis of cancer, Journal of Chromatography B 780 (2002) 27-33
[56] O. Farkas, K. Heberger, I.G. Zenkevich, Chemometrics and Intelligent Laboratory Systems
72 (2004) 173-184


[57] T. Vasiljevic, A. Onjia, D. Cokesa, M. Lausevic, Optimization of artificial neural network
for retention modeling in high-performance liquid chromatography, Talanta 64 (2004) 785-790
[58] P. Zakaria, M. Macka, P.R. Haddad, Mixed-mode electrokinetic chromatography of
aromatic bases with two pseudo-stationary phases and pH control, Journal of Chromatography A
997 (2003) 207-218
[59] A. Jakab, K. Heberger, E. Forgacs, Comparative analysis of different plant oils by high-
performance liquid chromatography – atmospheric pressure chemical ionization mass
spectrometry, Journal of Chromatography A 976 (2002) 255-263
[60] H. Li, Y.X. Zhang, L. Xu, The study of the relationship between the new topological
index Am and the gas chromatographic retention indices of hydrocarbons by artificial neural
network, Talanta 67 (2005) 741-748
[61] M.H. Fatemi, Quantitative structure-property relationship studies of migration index in
microemulsion electrokinetic chromatography using artificial neural networks, Journal of
Chromatography A 1002 (2003) 221-229
[62] K. Brudzewski, A. Kesik, K. Kolodziejczyk, U. Zborowska, J. Ulaczyk, Gasoline quality
prediction using gas chromatography and FTIR spectroscopy: An artificial intelligence approach,
Fuel 85 (2006) 553-558
[63] M. Stefaniak, A. Niestroj, J. Klupsch, J. Sliwiok, A. Pyka, Use of RP-TLC to determine the
logP values of isomers of organic compounds, Chromatographia 62 (2005) 87-89
[64] A. Pyka, Topological indices for evaluation of the separation of D and L amino acids by
TLC, Journal of Liquid Chromatography & Related Technologies 22 (1999) 41-50
[65] A. Pyka, K. Bober, On the importance of topological indices in research of α- and
γ-terpinene as well as α- and β-pinene separated by TLC, Journal of Liquid Chromatography &
Related Technologies 25 (2002) 1301-1315
[66] J. Listgarten, A. Emili, Statistical and computational methods for comparative proteomic
profiling using liquid chromatography-tandem mass spectrometry, Molecular & Cellular
Proteomics 4 (2005) 419-434
[67] P. Kearney, P. Thibault, Bioinformatics meets proteomics – bridging the gap between mass
spectrometry data analysis and cell biology, Journal of Bioinformatics and Computational
Biology 1 (2003) 183-200


[68] W. Wang, H. Zhou, H. Lin, S. Roy, T.A. Shaler, L.R. Hill, S. Norton, P. Kumar, M.
Anderle, C.H. Becker, Quantification of proteins and metabolites by mass spectrometry without
isotopic labeling or spiked standards, Analytical Chemistry 75 (2003) 4818-4826
[69] D. Radulovic, S. Jelveh, S. Ryu, T.G. Hamilton, E. Foss, Y. Mao, A. Emili, Informatics
platform for global proteomic profiling and biomarker discovery using liquid chromatography-
tandem mass spectrometry, Molecular & Cellular Proteomics 3 (2004) 984-997
[70] S. Serneels, C. Croux, P. Filzmoser, P.J. Van Espen, Partial robust M-regression,
Chemometrics and Intelligent Laboratory Systems 79 (2005) 55-64
[71] S. Serneels, Linear robust classification by the robust least squares support vector classifier,
Journal of Chemometrics, submitted
[72] K. Vanden Branden, M. Hubert, Robust classification in high dimensions based on the
SIMCA method, Chemometrics and Intelligent Laboratory Systems 79 (2005) 10-21

