
Journal of Pharmaceutical and Biomedical Analysis 147 (2018) 341–349

Contents lists available at ScienceDirect

Journal of Pharmaceutical and Biomedical Analysis


journal homepage: www.elsevier.com/locate/jpba

Review

The analytical process to search for metabolomics biomarkers


M.D. Luque de Castro a,b,c,∗, F. Priego-Capote a,b,c

a Department of Analytical Chemistry, Annex Marie Curie Building, Campus of Rabanales, University of Córdoba, E-14071, Córdoba, Spain
b Institute of Biomedical Research Maimónides (IMIBIC), Reina Sofía University Hospital, University of Córdoba, E-14004, Córdoba, Spain
c CIBER Fragilidad y Envejecimiento Saludable (CIBERfes), Instituto de Salud Carlos III, Spain

Article info

Article history:
Received 15 June 2017
Received in revised form 19 June 2017
Accepted 19 June 2017
Available online 6 July 2017

Keywords:
Metabolomics
Biomarkers
Sample preparation
Blood
Urine
Saliva
Sweat
Exhaled breath condensate

Abstract

The scant number of available metabolomics biomarkers does not reflect the extent of the research in this field. Looking for the reasons for this failure, the authors, as analytical chemists, critically discuss each of the steps of the analytical process that require improvement, and find that insufficient information is reported about how the experimental part is developed. After revising the steps from sampling to obtaining the analytical sample (from typical samples such as blood and urine to less common ones such as sweat or saliva), the need for the data and metadata required either to reproduce a given study or to take it as a starting point after biomarker discovery is critically discussed. The separation and analysis steps are also revised, as is data treatment. Once the sources of error in the analytical process are overcome, the subsequent steps in the implementation of biomarkers toward the final aim of clinical adoption should be supported as required.

© 2017 Elsevier B.V. All rights reserved.

Contents

1. Introduction
2. From sampling to the analytical sample
3. Sample preparation dependence on the subsequent analytical strategy
4. Analysis in the search for metabolomics biomarkers
5. Statistical analysis in the search for metabolomics biomarkers
6. Foreseeable and desirable trends in the search for metabolomics biomarkers
Acknowledgments
References

Abbreviations: CA, clustering analysis; EBC, exhaled breath condensate; KNN, k-nearest neighbors algorithm; LC, liquid chromatograph; MS, mass spectrometry; MSTUS, Mass Spectrometry Total Useful Signal; NMR, nuclear magnetic resonance; O-PLS, orthogonal partial least squares; PCA, principal component analysis; PLS-DA, partial least squares discriminant analysis; QC, quality control; ROC, receiver operating characteristic; SIMCA, soft independent modelling of class analogy; SPE, solid-phase extraction; TOF, time-of-flight.

∗ Corresponding authors at: Department of Analytical Chemistry, Annex Marie Curie Building, Campus of Rabanales, 14071 Córdoba, Spain.
E-mail addresses: qa1lucam@uco.es (M.D. Luque de Castro), q72prcaf@uco.es (F. Priego-Capote).

http://dx.doi.org/10.1016/j.jpba.2017.06.073
0731-7085/© 2017 Elsevier B.V. All rights reserved.

1. Introduction

It is increasingly accepted that most of our chronic diseases (e.g., diabetes, cardiovascular diseases, cancer) are polygenic [1]. This seems to be the reason why approaches such as genome-wide association studies have so far explained only a small number of these diseases, and have contributed only to a small extent to intervention strategies based on the mechanisms involved [2,3]. The need to test the relationship between genetics and xeno-factors (e.g., diet, the gut microbiome, drugs, physical activity) makes the use of metabolomics as a bridge almost mandatory, and proteomics and transcriptomics can also benefit from this bridge. The key role of the "omics of the small molecules" (as is

also known metabolomics) is a consequence of both the location of the metabolome in the flowchart of biological information (the metabolome is the "ome" closest to the phenotype) and the nature of these small molecules (the metabolites, which have well-known characteristics as compared with the objects of older omics).

In fact, metabolites are the most reliable indicators of phenotypes, as they are located downstream of genomic, transcriptomic and proteomic changes; therefore, metabolomic changes are frequently more significant than those taking place at upstream levels, and they are fast indicators of both a non-homeostatic state and the adaptation of the organism to a new homeostatic state [4]. In addition, metabolomics data can easily be translated across species because most metabolic pathways have hardly been altered through evolution, which is crucial to interrelate findings from experiments with laboratory animals and human studies [5]; moreover, most of the metabolic processes in the body, such as energy metabolism and amino acid catabolism, are common to all living cells.

The fact that metabolomics makes use of a variety of easily accessible biological samples such as blood, urine, sweat, etc. is also a favorable aspect that reduces cost and analysis time, thus facilitating the analysis of a high number of samples and, as a result, contributing to better-assessed studies.

Hence, metabolomics analysis offers a sensitive measure of biological status in health or disease, as some metabolic pathways are up- or down-regulated in diseased cells. The altered metabolic fingerprints, which are unique to every individual, offer novel avenues to better understand systems biology, to detect or identify potential risks for various diseases and, ultimately, to help achieve the goal of "personalized medicine" [6].

Broadly, metabolomics investigations focused on human diseases can be grouped into three categories:

(i) A major group of investigations is focused on understanding the molecular basis of the pathogenesis of diseases and identifying altered metabolic pathways.
(ii) The second group is focused on identifying metabolite biomarkers to classify diseases with sensitivity and specificity as high as possible. The number of such studies is increasing exponentially owing to the urgent need for sensitive biomarkers that can improve disease diagnostics, mainly cancer biomarkers (e.g., breast cancer biomarkers based on investigations of breast tissue metabolites) [7–10]; prostate cancer biomarkers based on low citrate concentrations [11,12]; ovarian carcinoma biomarkers, including those involved in purine, pyrimidine and glycerolipid metabolism [13]; colon cancer biomarkers that include lactate, pyruvate, malic acid and long-chain polyunsaturated fatty acids [14,15]; and lung cancer biomarkers using samples as different as urine [16], exhaled breath condensate [17,18] or sweat [19,20].
(iii) The third group of investigations describes translational opportunities and applications in metabolomics. These investigations are far fewer, but they are also increasing exponentially [21].

As they are the subject matter of this review, the different types of biomarkers deserve to be defined. They can be:

(a) Features, which consist of measurable biological components (e.g., a metabolite or, more commonly, a panel of metabolites) or states of components (e.g., a primary or secondary defense metabolite, or one resulting from gene demethylation) that can be analyzed as candidate biomarkers.
(b) Biomarkers as such, or features indicative of disease states, responses to therapeutic treatments, or other relevant biological states.
(c) Biosignatures, or collections of features which together define given biomarkers.
(d) Risk biomarkers, which identify patients who are likely to develop given diseases.
(e) Diagnostic biomarkers, for detection of an early disease state, classification into disease subtypes or characterization of response to treatment.
(f) Prognostic biomarkers, for prediction of disease progression, prediction of disease recurrence or identification of patients who are likely to respond to a treatment.

After the establishment of solid definitions and extensive research in this field for more than a decade, it must be admitted that we cannot name a single resulting metabolomics biomarker routinely used in the clinical field, apart from those used for many years in typical clinical tests (creatinine or glucose in blood, and uric acid, chloride, ammonia, creatine or glucose in urine). This situation can be explained by the lack of either analytical validation or any other step towards clinical implementation. Looking for the reasons for this failure, authors working in this field have reviewed the different steps involved in obtaining clinically accessible biomarkers that have clinical utility and inform clinical decision-making to improve patient outcomes [22,23]. Thus, the steps from biomarker discovery, through biomarker assay development and analytical validation and validation of clinical utility, to clinical implementation have been reviewed [24]; they are shown in Fig. 1. Recently, more steps have been included by considering not only the metabolites themselves, but also the pathways in which they are involved and the mechanisms from which they result. The publication by Johnson et al. [25] provides a detailed workflow of what the authors consider, rightly, a holistic approach to the steps involved, as Fig. 2 shows. In this figure, pathways that relate the metabolites to each other and within interconnected biological pathways, as well as the combination of metabolomic and orthogonal biological analysis and isotope-assisted deciphering of pathways, have been taken into account. Nevertheless, one step that most researchers in this field consider insignificant, namely the analytical process that starts with sampling and finishes with data treatment to provide the information for "untargeted metabolomics" or "high-throughput profiling" in Fig. 2, is crucial, as its development has a cascading effect on the final results. Hardly any attention is paid to a description complete enough to reproduce all the steps and sub-steps of the analytical process, such as sampling, deproteination, centrifugation, sample preparation, and so on. These steps are critically discussed below.

Fig. 1. Conventional scheme for biomarkers discovery with special emphasis on statistical analysis and validation.

Fig. 2. Metabolomics workflow for biomarkers discovery. The workflow involves metabolomics analysis using two different approaches: untargeted analysis or high-throughput profiling, pathways analysis, validation, orthogonal biological analysis and mechanistic interpretation by connection to the phenotype. Reproduced with permission of Macmillan Publishers Limited, Ref. [25].

2. From sampling to the analytical sample

The establishment of an appropriate experimental design and the use of patient questionnaires with subsequent population stratification are mandatory prior to sampling. This is one of the critical steps to guarantee success in biomarker discovery. Control of external variability factors such as drug administration, and the identification of exclusion criteria, are crucial to associate metabolite patterns with pathological states.

Concerning sampling, the physical state of the sample conditions this and the subsequent steps leading to the analytical sample. The latter is the fraction of the original or raw sample that remains after the different necessary steps (e.g., solid–liquid extraction, protein precipitation, centrifugation, solid-phase extraction (SPE) or liquid–liquid extraction) and is ready to be introduced into either the detector or high-resolution equipment, such as a gas or liquid chromatograph on-line coupled to the detector (usually a mass spectrometer).

The types of samples usual in metabolomics studies are mainly biofluids such as blood (either plasma or serum) and urine and, to a lesser extent, saliva, tears, sweat or even exhaled breath (either condensate or constituted only by volatile components). All of them are characterized by a less invasive and simpler sample retrieval than tissue, and provide a representative analysis or a 'snapshot' of metabolism [26]. However, the concentrations at which metabolites are present in biofluids, as compared to tissue extracts, may be disadvantageous. Other samples infrequently used in metabolomics studies are

cerebrospinal fluid [27], seminal fluid [28] or even bronchoalveolar lavage fluid [29].

The main problems that biomarker researchers face from sampling to obtaining the analytical sample are related to the absence in publications of crucial information (sometimes consisting of metadata), which hinders not only taking the results of a given publication as the starting point for subsequent research, but even reproducing its experiments. By discussing the steps involved from sampling of typical samples to obtaining the analytical sample, examples of what must be thoroughly explained, and of what results from previous publications must be taken into account, are given below.

Blood is the most common biofluid in metabolomics [30], which is justified by its minimally invasive sampling [31] and its homogeneity as compared to urine, sweat or saliva, all of which are strongly influenced by the collected volume [32]. The type of sample obtained from blood (plasma or serum) depends on blocking or allowing coagulation, respectively. The impact of the different commercial tubes for collection of serum or plasma on the concentration of metabolites (both primary and secondary) in plasma or serum has been widely investigated [33,34]; nevertheless, the type of tube used for sample collection is rarely included in the "experimental" section. This information can be crucial even when dealing with metabolites from the same precursor (vitamin D and its mono- and dihydroxymetabolites), for which very different behavior has been reported, depending on the given metabolite, when serum or plasma was used as sample. Thus, statistical analysis revealed similar physiological levels of vitamin D3, 24,25-dihydroxyvitamin D3 and 25-hydroxyvitamin D3 in serum and plasma, but significantly different levels of 1,25-dihydroxyvitamin D3 [35]. Sample preparation has also proved to be crucial for some vitamin D3 metabolites, as is the case with 1,25-dihydroxyvitamin D3: this metabolite was not detected after deproteination, but it was quantified after SPE [35]. Thus, close adherence to the optimum procedure in all steps from sampling to obtaining the analytical sample is mandatory.

Urine is also a common biofluid in biomarker discovery by metabolomic approaches [36]. Starting with sampling, urine can be collected as random samples, timed samples, 24-h samples, and even over longer sampling times when searching for biomarkers of pharmacokinetic behavior [36]. Unlike serum or plasma, urine does not require containers with special characteristics, as bare polypropylene containers of the required volume usually suffice. Critical discussions involve the doubtful necessity of a quenching step, preservative addition, volume correction by normalization using sometimes controversial approaches, pH adjustment, deproteination, and stability and storage conditions, among the most influential factors. All these steps are commonly detailed in the experimental section, but the material of the sample container has never been specified in this section, even though non-specific adsorption to surfaces causes metabolite losses; therefore, ensuring the use of containers of the same type is not a trivial aspect. The presence of urinary sediment in the urine sample has led to spurious results in the sarcosine–creatinine ratio relative to the urine supernatant [37], a fact that makes reporting the g value in centrifugation steps of paramount importance.

Saliva is an infrequent sample in the search for metabolomics biomarkers, even though metabolites may be transferred into saliva by different cells in the oral cavity and salivary glands, including cancer cells if present [38]. Saliva is at present successfully emerging as a tool to diagnose or screen oral cancer [38], after previous application in leukoplakia and Sjögren's syndrome studies [39,40]. Subsequent research to validate promising biomarkers requires a solid establishment of the conditions under which each step is developed, including all data and metadata that can contribute to creating a work protocol as reproducible as possible.

Semen is also an infrequent sample in metabolomics, mainly devoted to infertility studies [41], with a recent and interesting contribution to comprehensive metabolite identification by both nuclear magnetic resonance (NMR) and mass spectrometry (MS) [28]. While the latter contribution provides extensive information that allows almost complete reproduction of the experiments, the information on the sample containers (sterile containers) is not enough to check possible interactions between the container's wall and some potentially interesting metabolites. Harder to reproduce is the preliminary study by Chen et al. on seminal plasma from infertile males [42], as the authors do not provide information about sample containers, storage, etc. In addition, the authors express the centrifugation conditions in rpm, which is not enough to reproduce the experiment (the relative centrifugal force, or g value, also depends on the rotor radius) and poorly expresses the effectiveness of the separation achieved in this step, which the authors consider a "purification step".

Tissue samples are subjected to multiple steps to obtain the target analytical sample. Information about each of these steps should be better established after the discovery of metabolomics biomarkers for appropriate biomarker validation, clinical validation and clinical assay. Most sampling and sample preparation protocols lack the data required to reproduce a given experiment [38,43].

Two samples that deserve separate discussion as sources of lung cancer biomarkers are exhaled breath condensate (EBC) [44] and sweat [45]. Despite the excellent sensitivity and selectivity values they have provided [18,20,46,47], implementing the subsequent steps (validation, etc.) makes better and reproducible sampling devices and protocols, checking of containers, and exhaustive stability studies, among others, mandatory.

Duplicate efforts in reproducibility are required when a biomarker panel is based on two different samples, as is the case with saliva and tissue samples that provide an integrated panel for oral cancer [38]. After entering the validation step, an exhaustive study of sample stability and of target metabolite features (such as thermolability and/or volatility, which require modification of the working conditions to ensure stability) is mandatory.

Special working conditions and tools are required when dealing with small sample volumes, as is the case with EBC, sweat or saliva samples. Solid-phase extraction (SPE) using centrifugal micro-spin cartridges with different sorbents, selected as a function of the metabolite characteristics, and sample lyophilization have proved useful as sample preparation steps for small sample volumes [48]. The protocols required in these cases are even more restrictive, as errors increase as the sample volume decreases.

3. Sample preparation dependence on the subsequent analytical strategy

The three metabolomics strategies (fingerprinting, untargeted and targeted analyses) can be used on the way from biomarker discovery to clinical use, according to the available instrumentation and methodologies. Each strategy requires a different sample preparation. As they must be used in the following order, fingerprinting and/or untargeted analysis first and targeted analysis afterwards, the discussion of the specific sample treatment follows the same order.

Metabolomics fingerprinting uses a global screening approach to classify samples in terms of metabolite patterns (fingerprints) that change in response to a specific factor. This approach is frequently used prior to a more comprehensive strategy to find discrimination patterns among groups of samples. This preliminary approach is performed by NMR, MS or even infrared or Raman spectroscopy, which leads to a low identification rate in dealing with complex

samples as compared to untargeted metabolomics analysis. However, metabolomics fingerprinting with an NMR detector based on direct analysis of samples also contributes to biomarker discovery, with key benefits such as direct analysis in most cases, high throughput in 1H NMR, and non-destructive analysis, of special interest when dealing with valuable samples or microsamples [49]. Examples of the use of this approach are the study carried out by Wu et al. to find candidate biomarkers of functional dyspepsia [50] and that developed by Paixao de Santana-Filho et al. to find melanoma biomarkers by analysis of murine melanocyte and melanoma cell lines [51]. It is worth mentioning that, although NMR fingerprinting metabolomics is useful for biomarker discovery, it generally needs a confirmatory analysis before validation to obtain a more comprehensive view of the biological effect–biomarker relationship.

Untargeted metabolomics analysis, also called global metabolomics analysis or metabolomics profiling, aims at detecting/identifying the broadest number of metabolites present in an analytical sample without a priori knowledge of the metabolome (which is particularly achieved when several analytical platforms are combined). This approach assumes that the significant metabolites are by definition unknown prior to analysis, as are their physico-chemical characteristics, which usually span a wide range. The main aims of the sample preparation step in this case are therefore to keep the sample as intact as possible and to avoid fractionation and/or losses. Nevertheless, potentially important diagnostic differences might remain undetected due to confounding factors (e.g., instability of the analytical system, chemical and biochemical differences that are unrelated to the condition under study).

An ideal sample preparation procedure to be subsequently used to search for metabolomics biomarkers should be: (i) unselective, avoiding from the very beginning a preference for some metabolites that could lead to ignoring others that may be key or complementary biomarkers of the final panel; (ii) simple and fast, with the minimum number of steps, as each additional step also increases losses of potential biomarkers, with the danger of lowering their concentration below detectable levels, or even removing them completely; (iii) reproducible, which requires the thorough application of the protocols without forgetting metadata (e.g., the g value if centrifugation is used; ultrasound frequency, power and duty cycle if this type of energy is applied); and (iv) inclusive of a metabolism-quenching step, if required. In short, the sample preparation procedure prior to untargeted metabolomics should reproducibly transform the sample into a format compatible with the given analytical platform while maintaining the original metabolite composition of the sample as much as possible. The main risks of failure come from: (a) the removal or loss of key metabolites by inappropriate preparation of the analytical sample; (b) the wide diversity of chemical families of metabolites, which makes it impossible to obtain a complete metabolomics profile with only one analytical method; (c) the detection of significant metabolites present at different concentration levels in the same sample, which makes it difficult to determine them in a single analysis; (d) the integration of data provided by combining methods based on different techniques; and (e) the lack of databases offering information on many metabolites.

Targeted metabolomics analysis aims at the qualitative and quantitative study of one metabolite or, more frequently, a small group of chemically similar metabolites. This analysis provides higher sensitivity and selectivity than its untargeted counterpart, but the target metabolites are analyzed on the basis of a priori information. Therefore, the physico-chemical characteristics of the metabolites are known, and an exhaustive separation of them from the matrix is usually required for a quantitation as free of interferents as possible. Thus, in addition to the requirements common to untargeted metabolomics analysis (viz., reproducibility and quenching), the targeted approach requires a sample preparation procedure as selective as possible for the metabolite(s) that constitute the selected panel. Preconcentration and cleanup steps are usually involved in sample preparation for targeted metabolomics analysis. The main causes of failure can be attributed to insufficient cleanup of the sample, which results in ionization suppression. While this effect can be overcome by the use of isotopically labeled standards when ionization suppression is partial [52], the absence of signals caused by total ionization suppression makes the use of isotopically labeled standards useless.

An optimal way to develop sample preparation when the sample volume is small, particularly for targeted metabolomics analysis, is by on-line connection with the subsequent step (either detection or high-resolution separation). In this way, the whole analytical sample resulting from sample preparation is inserted into the detector or the chromatograph, respectively [53,54].

4. Analysis in the search for metabolomics biomarkers

While analysis has traditionally been the name applied to the whole analytical process, it is presently applied to the process occurring after insertion of the analytical sample either into the detector or into the high-resolution equipment online connected to the detector.

Direct insertion of the analytical sample into the detector to search for metabolomics biomarkers has mainly involved NMR instruments [27,38], with very few cases dealing with connection of the instrument to a liquid chromatograph (LC) [55,56]. In contrast, mass detectors have traditionally been connected to chromatographs and, to a lesser extent, to capillary electrophoresis devices. The type of analyzer differs as a function of the metabolomics strategy: time-of-flight (TOF) and Orbitrap analyzers (usually provided with a quadrupole unit) are used in profiling analysis, while the triple quadrupole is the preferred choice for quantitation and confirmatory analysis. In this way, metabolites can be tentatively identified with a resolution that is a function of the equipment used. After checking the usefulness of the panels (viz., their appropriate values of both sensitivity and specificity), the metabolites involved in them are unequivocally identified with the corresponding standards, provided they are commercially available or easy to synthesize. After selection of a given panel, with the consequent reduction of the number of analytes under study, application of a targeted approach is desirable after optimization of the sample preparation procedure and the analytical platform. In this way, accuracy, repeatability, and intra-day and inter-day variance can be evaluated, followed by clinical validation. This last step should involve as many samples as possible, from patients at different stages of disease development. The ideal case for a metabolomics biomarker (either a panel or an individual biomarker) is that in which the alteration of the involved metabolite(s) is detected at the very incipient stage of the disease, thus allowing early diagnosis.

Aspects attributable to the analysis step that can jeopardize the desired development of metabolomics biomarkers are related to: (i) the sensitivity of the detector, which may not be enough to reach the detection limit of the target metabolites (e.g., the amount of sperm cells necessary for gas chromatography (GC)–MS analysis is 15 million, but 150 million are needed for a 1H NMR experiment [28]); (ii) the absence of quality control (QC) samples, which should be periodically analyzed throughout the analytical run to check the reproducibility of the data produced by the instrument, thus providing quality assessment for each metabolite peak; (iii) the lack of selectivity to discriminate isomers when only one of them possesses biomarker capability; (iv) the unavailability of commercial standards of the biomarkers for preparation of calibration models, which also applies to internal standards, isotopically labeled

or not; and (v) the presence of matrix effects, with a direct influence on accuracy and, therefore, on the final results.

Fig. 3. Random forest analysis provided by analysis of EBC from current smokers (CS) and never smokers (NS). Reproduced with permission of Nature, Ref. [18], http://creativecommons.org/licenses/by/4.0/.

5. Statistical analysis in the search for metabolomics biomarkers

Apart from the experimental plan and the analytical method, there is one other step that requires scientific rigor to ensure the quality of the final results: statistical analysis, which means the application of univariate and multivariate tools to reveal the biomarker capability of molecular entities or metabolites, provided they have been previously identified.

The conventional scheme for statistical analysis in the search for biomarkers typically involves four main steps, as Fig. 1 shows: data preconditioning, pretreatment, treatment and validation. Although this scheme seems very well defined, its main limitation is the huge variability of implementable software, algorithms and statistical operations. Thus, it is difficult to find common protocols for biomarker discovery in the literature. This lack of standardization particularly affects data preconditioning and pretreatment. One example is normalization, which can be carried out before or after data acquisition to ensure comparability among groups of samples [57]. It is well known that normalization should be compatible with the experimental design, research purpose and

[…] accurate enough to identify the appropriate normalization method for some datasets. However, the normalization methods considerably influenced the kind and number of potential biomarkers. These results led the authors to propose a two-part strategy to determine the suitability of the normalization method. The first part was aimed at comprehensively understanding the compatibility between data and normalization approaches, while the second part was used to handle biomarkers by comparison of normalization approaches [59]. The same conclusions could be drawn about the applicability of other data transformation operations, such as centering or scaling, which transform the original data sets to ensure comparisons. These pretreatment operations depend on the data distribution, thus hindering the establishment of guided workflows.

The data treatment step is generally initiated by testing unsupervised and supervised analysis. PCA and clustering analysis (CA) are the two main alternatives to reveal discrimination patterns between cases and controls. Partial least squares discriminant analysis (PLS-DA) is the most frequent supervised approach to prove that the studied factor explains a representative part of the total variability. Other frequent alternatives are orthogonal PLS (O-PLS), the k-nearest neighbors algorithm (KNN), soft independent modelling of class analogy (SIMCA), neural network-based algorithms, logistic regression, support vector machines
and random forest analysis [60–62]. Since all these supervised
data mining methods [58]. However, the selection of an appropri-
approaches generate classification models, they can be subjected
ate method is still confusing, the prerequisites for a normalization
to internal and external validation. The former, typically cross-
method are difficult to confirm in some cases, and evaluation
validation, is essentially selected when the number of samples used
of normalization results is unclear [59]. Extended practices are
to create the models is limited. Therefore, it is preferably tested in
the MSTUS (Mass Spectrometry Total Useful Signal) approach,
a screening phase. External validation is a more robust approach
QCs correction or logarithmic transformations. Nevertheless, few
since the samples of the validation set are not used to develop the
researchers have evaluated the influence of normalization proto-
classification models. The highest variability level is to use sam-
cols on the final biomarker candidates. Recently, Chen et al. tested
ples from different locations and combine them in training and
five representative normalization methods in three real datasets
validation sets.
obtained by GC–MS affected by different variability sources. The
Once discrimination patterns are revealed, parametric or non-
tested approaches were MSTUS, Median, Probabilistic Quotient
parametric tests (ANOVA, t-test versus Kruskal-Wallis test) allow
Normalization, Remove Unwanted Variation-random and System-
highlighting the metabolites more statistically affected by the fac-
atic Ratio Normalization. Typical evaluation parameters such as
tor under study. The selection of these tests is influenced by the
the values of relative standard variation of the metabolites among
success in the normalization process. An extended alternative is the
QC samples, principal component analysis (PCA), within-group and
analysis of fold change ratios to check the concentration variations
across-group relative log abundance plots were applied to compare
that contribute to differentiate groups of samples. However, these
normalization effects. Surprisingly, these evaluation tests were not
variations are not statistically supported. In any case, statistically significant metabolites or compounds showing a relevant concentration change must not be labeled as candidate markers until their predictive capability has been assessed.

Assessing predictive capability is one of the most critical tasks in biomarker discovery. This information can be deduced from any of the supervised approaches indicated above. Nevertheless, the approach most used for this purpose is random forest analysis, which provides a list of the top-n metabolites ranked by predictive capability. One example is Fig. 3, which shows the top-12 metabolite candidates found in EBC for the discrimination of current smokers versus never smokers, ranked by their individual predictive capability [18]. Complementarily, receiver operating characteristic (ROC) analysis quantifies the marker capability of each compound in terms of sensitivity and specificity, that is, the correct identification of true positive and true negative cases. ROC analysis is considered an essential tool in the identification of biomarkers for all protocols. In this respect, it is quite difficult to find a single biomarker with high predictive performance in terms of sensitivity and specificity. The strategy is therefore generally directed to multivariate analysis, combining biomarkers in panels. Metabolites can be combined to enhance biomarker performance in two ways. One is a targeted approach that selects the metabolites with the highest prediction capability. The other is an iterative approach that configures panels of n compounds from the list of significant metabolites. Within this second strategy, software such as PanelomiX® [63] deserves mention because it considerably accelerates the search. Additionally, panels can be configured according to the final aim of maximizing accuracy, sensitivity and/or specificity. This approach has recently been applied to find candidate biomarkers in EBC to discriminate lung cancer patients, risk factor individuals and control individuals. Fig. 4 shows ROC curves for the discrimination of risk factor and control individuals. One curve corresponds to the combination of five metabolites (monopalmitin, benzyl alcohol, monostearin, 2,4-diphenyl-4-methyl-2-E-pentene and p-cresol). The other corresponds to indole, the best individual marker, which, curiously, was not included in the panel. As can be seen, performance was considerably enhanced when several metabolites were combined, as compared with indole alone [64].

Fig. 4. ROC curves of the panel to discriminate control and risk factor groups (black) and of the best individual marker, indole (grey). The panel included five metabolites: monopalmitin, monostearin, benzyl alcohol, 2,4-diphenyl-4-methyl-2-E-pentene and p-cresol. Reproduced with permission of IOP Publishing, Ref. [64].

The final step in the workflow is the validation of biomarkers in the analytical context. This step focuses on the analytical characterization of the biomarkers, which are quantified in absolute terms to find differences between groups of samples and controls. Clinical validation is beyond the scope of this review, as it typically involves a large-scale study with cohorts subjected to different sources of variability.

6. Foreseeable and desirable trends in the search for metabolomics biomarkers

The numerous successful developments in metabolomics have shown the power of this discipline. Its scope ranges from biomarker discovery to understanding both the mechanisms underlying phenotypes and the key contribution of the microbiota to the individual state.

Once the analytical process for metabolomics biomarker discovery has been developed and reported with the required data and metadata, any laboratory should be able to reproduce it exactly. This will facilitate the participation of as many research groups as possible. It would be followed by the accurate development of the steps that follow biomarker discovery (see Fig. 1). Therefore:

(i) Biomarker validation by different laboratories will give robustness to this step.
(ii) Clinical validation by multiple hospitals, from which a large number of samples can be obtained, will help establish the lowest concentrations of the panel metabolites that are relevant for disease detection.
(iii) Further improvement can even be achieved by polishing the analytical process after checking for the presence of interferents or masking agents when the panel components are at very low concentrations.
(iv) Integration of orthogonal omics [25] can provide insight into the mechanisms of the given disease by providing upstream and downstream information [48].
(v) Development of workflows to assign biological meaning to metabolites, moving towards finding disease mechanisms by determining the metabolic pathways perturbed by the aberrant conditions.
(vi) Targeting of metabolic pathways to influence or interfere with metabolite levels can even be directed to genes, for instance by silencing gene expression.
(vii) Metabolic pathways can also be influenced at the protein level by using antimetabolites.
(viii) Manipulation of the sources of exposure to different stimuli can also influence the metabolome, providing further mechanistic insights.

All subsequent research must be supported by a solid analytical process for which all the data required for its reproduction are known.

Acknowledgments

The Spanish Ministerio de Economía y Competitividad (MINECO), Junta de Andalucía and FEDER program are gratefully acknowledged for financial support (Projects “Development of methods for early cancer detection”, FQM-1602, and CTQ2015-68813-R). CIBER de Fragilidad y Envejecimiento Saludable (CIBERfes), an initiative of ISCIII, Spain, is also acknowledged.

References

[1] C.B. Newgard, Metabolomics and metabolic diseases: where do we stand? Cell Metab. 25 (2017) 43–56.
[2] S. O’Rahilly, Human genetics illuminates the paths to metabolic disease, Nature 462 (2009) 307–314.
[3] C.B. Newgard, A.D. Attie, Getting biological about the genetics of diabetes, Nat. Med. 16 (2010) 388–391.
[4] C.W.W. Beecher, The human metabolome, in: G.G. Harrigan, R. Goodacre (Eds.), Metabolic Profiling: Its Role in Biomarker Discovery and Gene Function Analysis, Springer, New York, 2003, pp. 311–319.
[5] E. Trushina, M.M. Mielke, Recent advances in the application of metabolomics to Alzheimer’s disease, Biochim. Biophys. Acta 1842 (2014) 1232–1239.
[6] E. Holmes, I.D. Wilson, J.K. Nicholson, Metabolic phenotyping in health and disease, Cell 134 (2008) 714–717.
[7] J.L. Spratlin, N.J. Serkova, S.G. Eckhardt, Clinical applications of metabolomics in oncology: a review, Clin. Cancer Res. 15 (2009) 431–440.
[8] B. Sitter, S. Lundgren, T.F. Bathen, J. Halgunset, H.E. Fjosne, I.D. Gribbestad, Comparison of HR MAS MR spectroscopic profiles of breast cancer tissue with clinical parameters, NMR Biomed. 19 (2006) 30–40.
[9] K. Glunde, C. Jie, Z.M. Bhujwalla, Molecular causes of the aberrant choline phospholipid metabolism in breast cancer, Cancer Res. 64 (2004) 4270–4276.
[10] L. Bartella, S.B. Thakur, E.A. Morris, D.D. Dershaw, et al., Enhancing nonmass lesions in the breast: evaluation with proton (1H) MR spectroscopy, Radiology 245 (2007) 80–87.
[11] E.E. Kline, E.G. Treat, T.A. Averna, M.S. Davis, et al., Citrate concentrations in human seminal fluid and expressed prostatic fluid determined via 1H nuclear magnetic resonance spectroscopy outperform prostate specific antigen in prostate cancer detection, J. Urol. 176 (2006) 2274–2279.
[12] E.M. De Feo, C.L. Wu, W.S. McDougal, L.L. Cheng, A decade in prostate cancer: from NMR to metabolomics, Nat. Rev. Urol. 8 (2011) 301–311.
[13] C. Denkert, J. Budczies, T. Kind, W. Weichert, et al., Mass spectrometry based metabolic profiling reveals different metabolite patterns in invasive ovarian carcinomas and ovarian borderline tumors, Cancer Res. 66 (2006) 10795–10804.
[14] Y. Qiu, G. Cai, M. Su, T. Chen, et al., Urinary metabonomic study on colorectal cancer, J. Proteome Res. 9 (2010) 1627–1634.
[15] S.A. Ritchie, J. Tonita, R. Alvi, D. Lehotay, et al., Low-serum GTA-446 anti-inflammatory fatty acid levels as a new risk factor for colon cancer, Int. J. Cancer 132 (2013) 355–362.
[16] J. Carrola, C.M. Rocha, A.S. Barros, A.M. Gil, et al., Metabolic signatures of lung cancer in biofluids: NMR-based metabonomics of urine, J. Proteome Res. 10 (2011) 221–230.
[17] A. Peralbo-Molina, M. Calderón-Santiago, F. Priego-Capote, et al., Metabolomics analysis of exhaled breath condensate for discrimination between lung cancer patients and risk factor individual, J. Breath Res. 10 (2016) 016011.
[18] A. Peralbo-Molina, M. Calderón-Santiago, B. Jurado-Gámez, M.D. Luque de Castro, et al., Exhaled breath condensate to discriminate individuals with different smoking habits by GC–TOF/MS, Sci. Rep. 7 (2017) 1421.
[19] M. Calderón-Santiago, F. Priego-Capote, B. Jurado-Gámez, M.D. Luque de Castro, Optimization study for metabolomics analysis of human sweat by liquid chromatography–tandem mass spectrometry in high resolution mode, J. Chromatogr. A 1333 (2014) 70–78.
[20] M.M. Delgado-Povedano, M. Calderón-Santiago, F. Priego-Capote, B. Jurado-Gámez, et al., Recent advances in human sweat metabolomics for lung cancer screening, Metabolomics 12 (2016) 166.
[21] G.A. Nagana Gowda, D. Raftery, Biomarker discovery and translation in metabolomics, Curr. Metabolom. 1 (2013) 227–240.
[22] D.R. Parkinson, R.T. McCormack, S.M. Keating, S.I. Gutman, et al., Evidence of clinical utility: an unmet need in molecular diagnostics for patients with cancer, Clin. Cancer Res. 20 (2014) 1428–1444.
[23] C.L. Sawyers, L.J. van’t Veer, Reliable and effective diagnostics are keys to accelerating personalized cancer medicine and transforming cancer care: a policy statement from the American Association for Cancer Research, Clin. Cancer Res. 20 (2014) 4978–4981.
[24] N. Goossens, S. Nakagawa, X. Sun, Y. Hoshida, Cancer biomarker discovery and validation, Transl. Cancer Res. 4 (2015) 256–269.
[25] C.H. Johnson, J. Ivanisevic, G. Siuzdak, Metabolomics: beyond biomarkers and towards mechanisms, Nat. Rev. Mol. Cell Biol. 17 (2016) 451–459.
[26] D.B. Kell, M. Brown, H.M. Davey, W.B. Dunn, I. Spasic, S.G. Oliver, Metabolic footprinting and systems biology: the medium is the message, Nat. Rev. Microbiol. 3 (2005) 557–565.
[27] R. González-Domínguez, Á. Sayago, Á. Fernández-Recamales, Metabolomics in Alzheimer’s disease: the need of complementary analytical platforms for the identification of biomarkers to unravel the underlying pathology, J. Chromatogr. B (2017), http://dx.doi.org/10.1016/j.jchromb.2017.02.008.
[28] C. Paiva, A. Amaral, M. Rodriguez, N. Canyellas, et al., Identification of endogenous metabolites in human sperm cells using proton nuclear magnetic resonance (1H-NMR) spectroscopy and gas chromatography–mass spectrometry (GC–MS), Andrology 3 (2015) 496–505.
[29] C.R. Esther, L. Turkovic, T. Rosenow, M.S. Muhlebach, et al., Metabolomic biomarkers predictive of early structural lung disease in cystic fibrosis, Eur. Respir. J. 48 (2016) 1612–1621.
[30] Z. Yu, G. Kastenmuller, Y. He, P. Belcredi, et al., Differences between human plasma and serum metabolite profiles, PLoS One 6 (2011) e21230.
[31] K.A. Lawton, A. Berger, M. Mitchell, K.E. Milgram, et al., Analysis of the adult human plasma metabolome, Pharmacogenomics 9 (2008) 383–397.
[32] R. Lundblad, Considerations for the use of blood plasma and serum for proteomic analysis, Internet J. Genom. Proteom. 1 (2003) 2.
[33] B. Jorgenrud, S. Jantti, I. Mattila, P. Poho, et al., The influence of sample collection methodology and sample preprocessing on the blood metabolic profile, Bioanalysis 7 (2015) 991–1006.
[34] M.A. López-Bascón, F. Priego-Capote, Á. Peralbo-Molina, M. Calderón-Santiago, et al., Influence of the collection tube on metabolomic changes in serum and plasma, Talanta 150 (2016) 681–689.
[35] A. Mena-Bravo, F. Priego-Capote, M.D. Luque de Castro, Study of blood collection and sample preparation for analysis of vitamin D and its metabolites by liquid chromatography tandem mass spectrometry, Anal. Chim. Acta 879 (2015) 69–76.
[36] M.A. Fernández-Peralbo, M.D. Luque de Castro, Preparation of urine samples prior to targeted or untargeted metabolomics mass-spectrometry analysis, Trends Anal. Chem. 41 (2012) 75–85.
[37] J. Schalken, Is urinary sarcosine useful to identify patients with significant prostate cancer? The trials and tribulations of biomarker development, Eur. Urol. 58 (2010) 19–20.
[38] S. Ishikawa, M. Sugimoto, K. Kitabatake, A. Sugano, et al., Identification of salivary metabolomic biomarkers for oral cancer screening, Sci. Rep. 6 (2016) 31520.
[39] J. Wei, G. Xie, Z. Zhou, P. Shi, et al., Salivary metabolite signatures of oral cancer and leukoplakia, Int. J. Cancer 129 (2011) 2207–2217.
[40] J.J. Mikkonen, M. Herrala, P. Soininen, R. Lappalainen, et al., Metabolic profiling of saliva in patients with primary Sjogren’s syndrome, Metabolom. Open Access 3 (2013) 128.
[41] F. Deepinder, H.T. Chowdary, A. Agarwal, Role of metabolomics analysis of biomarkers in the management of male infertility, Expert Rev. Mol. Diagn. 7 (2007) 351–358.
[42] X. Chen, C. Hu, J. Dai, L. Chen, Metabolomics analysis of seminal plasma in infertile males with kidney-yang deficiency: a preliminary study, Evid.-Based Complement. Alternat. Med. 2015 (2015) 892930.
[43] M.A. Fernández-Peralbo, F. Priego-Capote, M.D. Luque de Castro, A. Casado-Adam, et al., LC–MS/MS quantitative analysis of paclitaxel and its major metabolites in serum, plasma and tissue from women with ovarian cancer after intraperitoneal chemotherapy, J. Pharm. Biomed. Anal. 91 (2014) 131–137.
[44] M.A. Fernández-Peralbo, M.D. Luque de Castro, Analytical methods based on exhaled breath for early detection of lung cancer, Trends Anal. Chem. 32 (2012) 13–20.
[45] A. Mena-Bravo, M.D. Luque de Castro, Sweat: a sample with limited present applications and promising future in metabolomics, J. Pharm. Biomed. Anal. 90 (2014) 139–147.
[46] A. Peralbo-Molina, M. Calderón-Santiago, F. Priego-Capote, B. Jurado-Gámez, et al., Identification of metabolomics panels for potential lung cancer screening by analysis of exhaled breath condensate, J. Breath Res. 10 (2016) 026002.
[47] M. Calderón-Santiago, F. Priego-Capote, N. Turck, X. Robin, et al., Human sweat metabolomics for lung cancer screening, Anal. Bioanal. Chem. 407 (2015) 5381–5392.
[48] M.A. Fernández-Peralbo, M. Calderón Santiago, F. Priego-Capote, M.D. Luque de Castro, Study of exhaled breath condensate sample preparation for metabolomics analysis by LC–MS/MS in high resolution mode, Talanta 144 (2015) 1360–1369.
[49] A. Smolinska, L. Blanchet, L.M.C. Buydens, S.S. Wijmenga, NMR and pattern recognition methods in metabolomics: from data acquisition to biomarker discovery. A review, Anal. Chim. Acta 750 (2012) 82–97.
[50] Q. Wu, M. Zou, M. Yang, S. Zhou, et al., Revealing potential biomarkers of functional dyspepsia by combining 1H NMR metabonomics techniques and an integrative multi-objective optimization method, Sci. Rep. 6 (2016) 18852.
[51] A. Paixao de Santana-Filho, T. Jacomasso, D. Suss Riter, A. Barison, et al., NMR metabolic fingerprints of murine melanocyte and melanoma cell lines: application to biomarker discovery, Sci. Rep. 7 (2017) 42324.
[52] M. Suzuki, S. Nishiumi, A. Matsubara, T. Azuma, et al., Metabolome analysis for discovering biomarkers of gastroenterological cancer, J. Chromatogr. B 966 (2014) 59–69.
[53] M.A. Fernández-Peralbo, F. Priego-Capote, J.G. Galache-Osuna, M.D. Luque de Castro, Targeted analysis of omega-6-derived eicosanoids in human serum by SPE–LC–MS/MS for evaluation of coronary artery disease, Electrophoresis 34 (2013) 2901–2909.
[54] A. Mena, F. Priego-Capote, M.D. Luque de Castro, Two-dimensional liquid chromatography coupled to tandem mass spectrometry for vitamin D metabolite profiling including the C3-epimer-25-monohydroxyvitamin D3, J. Chromatogr. A 1451 (2016) 50–57.
[55] R. Hammerl, O. Frank, T. Hofmann, Differential off-line LC–NMR (DOLC–NMR) metabolomics to monitor tyrosine-induced metabolome alterations in Saccharomyces cerevisiae, J. Agric. Food Chem. 65 (2017) 3230–3241.
[56] E. Appiah-Amponsah, K. Owusu-Sarfo, G.A. Nagana Gowda, T. Ye, et al., Combining hydrophilic interaction chromatography (HILIC) and isotope tagging for off-line LC–NMR applications in metabolite analysis, Metabolites 3 (2013) 575–591.
[57] Y.M. Wu, L. Li, Sample normalization methods in quantitative metabolomics, J. Chromatogr. A 1430 (2016) 80–95.
[58] E. Saccenti, H.C.J. Hoefsloot, A.K. Smilde, J. Westerhuis, et al., Reflections on univariate and multivariate analysis of metabolomics data, Metabolomics 10 (2014) 361–374.
[59] J. Chen, P. Zhang, M. Lv, H. Guo, et al., Influences of normalization method on biomarker discovery in gas chromatography–mass spectrometry based untargeted metabolomics: what should be considered? Anal. Chem. (2017), http://dx.doi.org/10.1021/acs.analchem.6b05152.
[60] J.E. McDermott, J. Wang, H. Mitchell, B. Webb-Robertson, et al., Challenges in biomarker discovery: combining expert insights with statistical analysis of complex omics data, Expert Opin. Med. Diagn. 7 (2013) 37–51.
[61] B. Xi, H. Gu, H. Baniasadi, D. Raftery, Statistical analysis and modeling of mass spectrometry-based metabolomics data, Methods Mol. Biol. 1198 (2014) 333–353.
[62] A. Alonso, S. Marsal, A. Julià, Analytical methods in untargeted metabolomics: state of the art in 2015, Front. Bioeng. Biotechnol. 3 (2015) 23.
[63] X. Robin, N. Turck, A. Hainard, N. Tiberti, et al., PanelomiX: a threshold-based algorithm to create panels of biomarkers, Transl. Proteom. 1 (2013) 57–64.
[64] A. Peralbo-Molina, M. Calderón-Santiago, F. Priego-Capote, B. Jurado-Gámez, et al., Identification of metabolomics panels for potential lung cancer screening by analysis of exhaled breath condensate, J. Breath Res. 10 (2016) 026002.
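
The normalization and pretreatment operations discussed in Section 5 include MSTUS, probabilistic quotient normalization, logarithmic transformation, centering and scaling. They can be illustrated with the minimal, dependency-free Python sketch below. It assumes an intensity matrix with one row per sample and one column per feature. Two simplifications are assumptions of this sketch and not the implementations evaluated in [59]. First, the "useful signal" for MSTUS is taken as the summed intensity of features detected (> 0) in every sample. Second, PQN is applied without the optional initial total-area step. All function names are hypothetical.

```python
import math
from statistics import mean, median, stdev

def mstus_normalize(samples):
    """MSTUS-style normalization (simplified assumption: the 'useful' signal is
    the summed intensity of the features detected, i.e. > 0, in every sample)."""
    n_features = len(samples[0])
    shared = [j for j in range(n_features) if all(s[j] > 0 for s in samples)]
    normalized = []
    for s in samples:
        total = sum(s[j] for j in shared)
        normalized.append([x / total for x in s])
    return normalized

def pqn_normalize(samples):
    """Probabilistic quotient normalization against the median sample profile."""
    n_features = len(samples[0])
    reference = [median(s[j] for s in samples) for j in range(n_features)]
    normalized = []
    for s in samples:
        # Quotients only where both the sample and the reference are positive.
        quotients = [s[j] / reference[j] for j in range(n_features)
                     if s[j] > 0 and reference[j] > 0]
        dilution = median(quotients)  # most probable dilution factor
        normalized.append([x / dilution for x in s])
    return normalized

def log_autoscale(samples, offset=1.0):
    """log2(x + offset), then mean-centre and unit-variance scale each feature."""
    logged = [[math.log2(x + offset) for x in s] for s in samples]
    columns = []
    for col in zip(*logged):
        mu, sd = mean(col), stdev(col)
        # Constant features (sd ~ 0) carry no information; set them to zero.
        columns.append([(v - mu) / sd if sd > 1e-12 else 0.0 for v in col])
    # Transpose back to one row per sample.
    return [list(row) for row in zip(*columns)]
```

A typical chain would be `log_autoscale(pqn_normalize(raw_matrix))`. Which normalization suits a given data set is precisely the open question raised in [59], so such a chain is a starting point to be compared against alternatives, not a recommendation.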

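The univariate step described in Section 5 combines fold change ratios with statistical tests. It can be sketched in plain Python as a log2 fold change plus a two-group Kruskal–Wallis test. The sketch makes three stated assumptions. The fold change is the ratio of group means. Tied values receive average ranks, and the usual tie correction is applied. The p-value uses the chi-square approximation with one degree of freedom, which is only approximate for very small groups. Function names are hypothetical.

```python
import math
from statistics import mean

def log2_fold_change(cases, controls):
    """log2 of the ratio of group means (cases / controls); both must be positive."""
    return math.log2(mean(cases) / mean(controls))

def average_ranks(values):
    """1-based ranks in which tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def kruskal_wallis_two_groups(group_a, group_b):
    """Tie-corrected Kruskal-Wallis H for two groups and its approximate p-value."""
    pooled = list(group_a) + list(group_b)
    n, n_a, n_b = len(pooled), len(group_a), len(group_b)
    ranks = average_ranks(pooled)
    r_a, r_b = sum(ranks[:n_a]), sum(ranks[n_a:])
    h = 12.0 / (n * (n + 1)) * (r_a ** 2 / n_a + r_b ** 2 / n_b) - 3 * (n + 1)
    # Tie correction: divide H by 1 - sum(t^3 - t) / (n^3 - n).
    tie_sizes = {}
    for v in pooled:
        tie_sizes[v] = tie_sizes.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in tie_sizes.values()) / (n ** 3 - n)
    if correction <= 0:  # every value identical: no evidence of a difference
        return 0.0, 1.0
    h = max(h / correction, 0.0)
    # Chi-square survival function with 1 degree of freedom: erfc(sqrt(h / 2)).
    return h, math.erfc(math.sqrt(h / 2))
```

For example, `kruskal_wallis_two_groups([4, 5, 6], [1, 2, 3])` gives H = 27/7 with p close to 0.05. As stressed in Section 5, such a result alone does not qualify a metabolite as a candidate marker before its predictive capability has been assessed.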
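The ROC analysis and panel strategy discussed in Section 5 can be sketched as follows. The AUC is computed as the probability that a case scores higher than a control, with ties counting as one half; this equals the area under the empirical ROC curve. The panel rule used here averages per-metabolite z-scores. It is a deliberately simple, hypothetical stand-in and not the threshold-based PanelomiX algorithm [63]. The two-metabolite data set is invented for illustration only.

```python
from statistics import mean, stdev

def roc_auc(case_scores, control_scores):
    """AUC as P(case score > control score), counting ties as one half."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

def sensitivity_specificity(case_scores, control_scores, threshold):
    """Classify a sample as positive when its score is >= threshold."""
    tp = sum(1 for s in case_scores if s >= threshold)
    tn = sum(1 for s in control_scores if s < threshold)
    return tp / len(case_scores), tn / len(control_scores)

def roc_curve(case_scores, control_scores):
    """(1 - specificity, sensitivity) points, one per observed threshold."""
    thresholds = sorted(set(case_scores) | set(control_scores), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        sens, spec = sensitivity_specificity(case_scores, control_scores, t)
        points.append((1.0 - spec, sens))
    return points

def panel_scores(*metabolites):
    """Hypothetical panel rule: mean of per-metabolite z-scores per sample."""
    z_columns = []
    for values in metabolites:
        mu, sd = mean(values), stdev(values)
        z_columns.append([(v - mu) / sd for v in values])
    return [mean(sample) for sample in zip(*z_columns)]

if __name__ == "__main__":
    # Invented toy data: first four samples are cases, last four controls.
    met_a = [5, 6, 7, 3, 4, 2, 1, 2]
    met_b = [3, 2, 4, 6, 1, 5, 2, 1]
    panel = panel_scores(met_a, met_b)
    print(roc_auc(met_a[:4], met_a[4:]))   # 0.9375
    print(roc_auc(met_b[:4], met_b[4:]))   # 0.78125
    print(roc_auc(panel[:4], panel[4:]))   # 1.0
```

In this toy case neither metabolite separates the groups perfectly on its own, but the combined score does. This mirrors, qualitatively, the advantage of panels over single markers described for Fig. 4. With real data, any panel would still require the internal and external validation discussed above.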