Abstract
Introduction
Vibrational spectroscopic techniques can be applied in different fields due to their versatility,
simplicity and low cost per analysis. Within these techniques, mid-infrared (MIR) spectroscopy
parameters in the same analysis, environmentally friendly, rapid, avoids sample preparation, has
a low cost per analysis and can be applied in situ [1]. This technique was first applied in the
agro-food sector, but it has since been extended to the health sector because of the
within the health area [2, 3]. MIR spectroscopy measures the fundamental vibrations within
4000 to 400 cm-1. It is considered a fingerprint technique and, for this reason, it is applied in the
identification and characterization of different types of samples [1, 3, 4]. The final spectrum is
complex, reflecting several absorption bands that can be weak and overlapping, and therefore
it is essential to apply chemometric tools to extract useful information. The need of using
This technique has already been explored in the analysis of gingival crevicular fluid (GCF), but
only two works are described in the literature [5, 6]. In 2010, Xiang et al. [6] collected GCF
from several patients to assess whether this technique could discriminate between healthy
patients and patients with periodontitis. The GCF samples were dried and then analysed. The
result obtained, 93% of correct predictions for the validation set through linear discriminant
analysis, demonstrated that this technique was very accurate for the discrimination of patients
with and without periodontitis. Moreover, the authors suggested that the molecular
components responsible for this discrimination were lipids, proteins and DNA. However, the
results might have been better if the authors had optimized the spectral region and the
pre-processing. A few years later, in 2013, Xiang et al. [5] investigated the capacity of
infrared spectroscopy to discriminate between patients with and without diabetes mellitus.
Several GCF samples collected from different sites in each patient were analysed. The
samples were classified by linear discriminant analysis, and an overall
accuracy of 87% of correct classifications was obtained for the validation set. An algorithm was
applied to select the best spectral regions, which included the molecular vibrations
of proteins, glycogen, oligosaccharides and glycolipids. Therefore, the authors attributed the
discrimination of patients with and without diabetes mellitus to differences in the content of
these compounds in the GCF. Again, the authors did not test different pre-processing
techniques, which could have improved the good results obtained. Both these works reveal the
Therefore, this manuscript explores the use of mid-infrared spectroscopy to detect
eating disorders in GCF. Moreover, this work also explores the potential of this technique to
discriminate patients, sampling site, medication intake, type of eating disorder, the presence
of other pathologies and vomiting induction. To the best of our knowledge, this is the first
Patients/Subjects
Spectrum BX FTIR System spectrophotometer (Waltham, USA) with a DTGS detector and a PIKE
Technologies GladiATR accessory. The spectra were acquired in diffuse reflectance mode from
4000 to 600 cm-1, with a resolution of 4 cm-1 and 32 co-added scans. Each perio-paper strip
was analysed on both sides at the bottom and compressed with a pressure of 150 N cm-2. Thus,
a total of 224 (28 x 8) spectra were obtained. The ATR crystal was cleaned and a background
MIR spectra were modelled with chemometric tools, namely principal component
analysis (PCA) to assist outlier detection and partial least squares discriminant analysis (PLSDA)
to develop discrimination models [7, 8]. The spectra were mean centred before any
data analysis. For the PLSDA, the spectral data were randomly divided into two data sets, one for
calibration (70%) and the other for validation (30%). This division was performed ensuring the
same proportion of patient classes in both sets, to avoid unbalanced classes [9]. The
PLSDA models were optimized by selecting the optimal number
of latent variables, the best spectral region and the best pre-processing technique (using only the
calibration set). The optimal number of latent variables (LV) was estimated through a
leave-one-sample-out cross-validation procedure using only the calibration set. The assessment of the
best spectral region involved dividing the MIR spectra into 5 regions and testing all
these regions individually and in combination. The regions were the following: from
3982 to 2652 cm-1 (region 1), from 2650 to 1862 cm-1 (region 2), from 1860 to 1182 cm-1
(region 3), from 1180 to 922 cm-1 (region 4) and from 920 to 620 cm-1 (region 5). The
best pre-processing technique was selected by testing different techniques
individually and in combination, namely standard normal variate (SNV) and the Savitzky-Golay
filter (with different filter widths, polynomial orders, and first and second derivatives). After
model optimization, the validation set was used to test the accuracy of the optimized models.
This was performed by projecting the validation set onto the models, and the results were arranged
in the form of confusion matrices. The confusion matrices express the percentages of correct
predictions for each patient class, and the total percentage of correct predictions was obtained
by adding the diagonal elements of the confusion matrix [9]. The regression coefficient vectors
of the PLSDA models for eating disorders and medication were analysed to understand which
specific wavenumbers were most important and to relate them to possible compounds
Matlab version 8.6 (MathWorks, Natick, USA) and PLS Toolbox version 8.2.1 (Eigenvector
Research Inc., Wenatchee, USA) software were used to perform all chemometric analyses.
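The class-balanced 70/30 division and the scoring by confusion matrix described above can be illustrated with a minimal Python sketch. It is not the Matlab/PLS Toolbox code used in this work: the function names, the random seed, the example labels and the orientation of the matrix (rows as predicted class, columns as real class) are assumptions made only for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, calibration_fraction=0.7, seed=0):
    """Return (calibration_idx, validation_idx) keeping the class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    calibration, validation = [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # random assignment within each class
        n_cal = round(calibration_fraction * len(indices))
        calibration.extend(indices[:n_cal])
        validation.extend(indices[n_cal:])
    return sorted(calibration), sorted(validation)


def confusion_percentages(real, predicted, classes):
    """Confusion matrix in % of all samples (assumed layout: matrix[predicted][real])."""
    share = 100.0 / len(real)
    matrix = {p: {r: 0.0 for r in classes} for p in classes}
    for r, p in zip(real, predicted):
        matrix[p][r] += share
    return matrix


def total_correct(matrix):
    """Total percentage of correct predictions: the sum of the diagonal."""
    return sum(matrix[c][c] for c in matrix)


# Hypothetical labels, one per spectrum (illustrative counts only).
labels = ["ED"] * 144 + ["no ED"] * 80
calibration_idx, validation_idx = stratified_split(labels)
print(len(calibration_idx), len(validation_idx))
```

Because each cell is expressed as a percentage of all validation samples, adding the diagonal cells gives the total percentage of correct predictions directly.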
Results and discussion
The raw spectra of the gingival crevicular fluid are shown in Figure 1. As mentioned above, a PCA
was performed to assist outlier detection and no outliers were identified. After this, several
PLSDA models were developed to verify whether GCF spectra contain information related to the
medication intake and vomit induction. For all these discrimination models, different
pre-processing techniques and different spectral regions were tested individually and in
regions made it possible to understand which were more appropriate for the respective
discrimination and thereby to ascertain the compounds that may be responsible.
The PLSDA models were built considering three different strategies: total spectral data
(strategy one), the mean of both sides of each perio-paper strip (strategy two) and the mean
of all the spectra collected from each patient/subject (strategy three) for each discrimination
model. Therefore, for each discrimination model (e.g. patient/subject, sampling site, presence
of eating disorders, presence of other pathologies, medication intake and vomit induction),
three total percentages of correct predictions were obtained. However, it should be noted that for each
discrimination model and strategy more than 150 (31 spectral region combinations x 5 pre-
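The figure of 31 spectral region combinations follows from taking every non-empty subset of the 5 regions (2^5 - 1 = 31). The short Python sketch below enumerates them using the region limits given in the methods; the function names are hypothetical, and the factor of 5 is taken as written in the text without assuming what it counts.

```python
from itertools import combinations

# Region limits in cm-1 (upper, lower), as listed in the methods.
REGIONS = {
    1: (3982, 2652),
    2: (2650, 1862),
    3: (1860, 1182),
    4: (1180, 922),
    5: (920, 620),
}


def region_combinations(region_ids):
    """All non-empty subsets of the regions, smallest subsets first."""
    ids = sorted(region_ids)
    return [combo
            for size in range(1, len(ids) + 1)
            for combo in combinations(ids, size)]


def select_columns(wavenumbers, combo):
    """Indices of the wavenumbers that fall inside any region of the combination."""
    limits = [REGIONS[r] for r in combo]
    return [i for i, w in enumerate(wavenumbers)
            if any(low <= w <= high for high, low in limits)]


combos = region_combinations(REGIONS)
print(len(combos))      # 31 = 2**5 - 1
print(len(combos) * 5)  # 155, consistent with "more than 150"
```

Each combination can then be passed to `select_columns` to keep only the corresponding variables before a model is fitted.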
Patient/subject
First, GCF spectra were modelled through PLSDA to verify whether MIR spectra contained
specific information related to each patient/subject. The results obtained (a total of correct
predictions below 20% for all the strategies) indicated that it was not possible to discriminate
these patients through GCF. With these results it was not possible to determine which was the
best spectral region and pre-processing technique. We believe that the GCF composition of
each patient varies (perhaps insert a reference) but possibly these variations could be so
Sampling site
The GCF spectra were also modelled against the sampling site. It was important to
understand whether GCF is influenced by the sampling site (it would be better to elaborate here
on whether or not variation in GCF composition is to be expected). Again, the PLSDA
results revealed that it was not possible to discriminate the sampling site (a total of correct predictions below
30% for all the strategies, except for the mean of all spectra collected for each patient, which was
not performed) or to determine the best spectral region(s) and pre-
insert references)
Vomit induction
several PLSDA models were developed using all the strategies, considering patients with
induced vomiting practices in one group and patients without vomiting practices in another
group. The best result, 75% of correct predictions (with 13 LV), was obtained when strategy
one was used. The best spectral region was 3982 to 2652 cm-1 (region 1),
applying SNV followed by a Savitzky-Golay filter (15-point filter width, second-order polynomial
and second derivative). The correct predictions obtained with
the other strategies were lower (approximately 65%). This was not expected,
since averaging the spectra should smooth slight random variations present in the spectra of
Thus, it can be concluded that MIR spectra are capable of detecting the chemical differences in
the composition of GCF between these two groups of patients/subjects. However, further
studies (including a larger number of patients) are needed to confirm the robustness of this
technique.
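The pre-processing named above (SNV followed by a Savitzky-Golay filter with a 15-point width, second-order polynomial and second derivative) can be sketched in Python with NumPy and SciPy. This is an illustration only and not the PLS Toolbox implementation used here: the synthetic spectra, the function names and the use of the sample standard deviation (ddof=1) in SNV are assumptions.

```python
import numpy as np
from scipy.signal import savgol_filter


def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) individually."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)  # ddof=1 is an assumed convention
    return (spectra - mean) / std


def preprocess(spectra, window=15, polyorder=2, deriv=2):
    """SNV followed by a Savitzky-Golay derivative along the wavenumber axis."""
    return savgol_filter(snv(spectra), window_length=window,
                         polyorder=polyorder, deriv=deriv, axis=1)


# Synthetic stand-in for measured spectra: 4 spectra x 200 points (not real data).
rng = np.random.default_rng(0)
spectra = rng.normal(size=(4, 200)).cumsum(axis=1)
processed = preprocess(spectra)
print(processed.shape)
```

SNV is applied first so that multiplicative and offset differences between spectra are reduced before the derivative emphasises narrow bands.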
Eating disorders
The primary objective of this manuscript was to assess whether it is possible to discriminate patients
with and without eating disorders (ED) through the GCF spectra. Again, several PLSDA models
were tested to find the best spectral region and pre-processing technique. This was
done for all the strategies. In all the strategies, the best results were obtained when selecting the spectral region
within 920 to 620 cm-1 (region 5) and applying SNV followed by a Savitzky-Golay
filter (15-point filter width, second-order polynomial and second derivative).
The raw and pre-processed spectra in the spectral region within 920 to
620 cm-1 are depicted in Figure x. The raw spectra are very similar between each group of
Figure x- Mean of raw (a) and pre-processed spectra (b) using the best spectral region for the
A total of 80.1% of correct predictions was obtained using 6 LV and adopting strategy three.
Strategy two yielded a total of 77.3% of correct predictions with 8 LV, while strategy one
yielded a total of 76.0% of correct predictions with 10 LV. The results demonstrated that
averaging the spectra increases the accuracy of the PLSDA models and reduces the number of
LV. This was expected, since averaging the spectra should smooth slight random variations
present in the spectra of perio-paper strips. Table X shows the confusion matrix obtained
through strategy three (the confusion matrices obtained using the other strategies are very
similar and for that reason are not shown). It can be seen that the worst predictions
involved the group of patients/subjects without ED. In fact, slightly more than half of this
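The three averaging strategies compared above can be illustrated with a small NumPy sketch. It assumes that the 224 spectra are ordered by subject, then strip, then side, and that the 8 spectra per subject correspond to 4 strips measured on both sides; neither the ordering nor the number of strips is stated explicitly in the text, and the spectra below are synthetic.

```python
import numpy as np

# 28 subjects x 8 spectra each; 4 strips x 2 sides and 100 points are assumptions.
n_subjects, n_strips, n_sides, n_points = 28, 4, 2, 100

rng = np.random.default_rng(0)
# Synthetic spectra, assumed to be ordered as (subject, strip, side).
all_spectra = rng.normal(size=(n_subjects * n_strips * n_sides, n_points))

cube = all_spectra.reshape(n_subjects, n_strips, n_sides, n_points)

strategy_one = all_spectra                              # every spectrum
strategy_two = cube.mean(axis=2).reshape(-1, n_points)  # mean of both sides of each strip
strategy_three = cube.mean(axis=(1, 2))                 # mean of all spectra per subject

print(strategy_one.shape, strategy_two.shape, strategy_three.shape)
```

Averaging reduces the number of rows available for modelling (224, 112 and 28 under these assumptions), which is the trade-off against the reduction of random variation.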
Table x. Confusion matrix for the discrimination of patients/subjects with and without ED
based on the GCF spectra through strategy three (80.1% of correct predictions and 6 LV).
Strategy three
% Subjects group(real)
Thus, it is important to explore the regression coefficient vectors to understand which spectral
absorptions showed the highest contribution and thereby attempt to establish a possible
relation with the compounds present in the GCF. Figure x shows the regression coefficient
vector of the PLSDA model adopting strategy three; the wavenumbers that showed the
highest contribution were 862, 836, 684, 654, 642 and 635 cm-1. (explore the compounds that
could be changing in GCF and relate them with the possible absorptions)
[Figure x: regression coefficient vector of the PLSDA model (strategy three), with labelled peaks at 862, 836, 684, 654, 642 and 635 cm-1]
Type of eating disorders
Within the patients/subjects with ED, 10 presented restrictive-type anorexia nervosa, 4
purgative-type anorexia nervosa, 1 restrictive-type bulimia nervosa and 3 purgative-type bulimia
nervosa. Thus, PLSDA models using the three strategies were developed considering five
groups (the four mentioned above plus the group of patients/subjects without ED).
The total percentage of correct predictions was below 50% in all the strategies. The spectral region and
pre-processing technique that yielded the best results were the same as for discriminating the
presence of ED. Although the results obtained were not good, it seems that GCF spectra
contain some information regarding the ED type. Further studies increasing the number of
The results indicate that it is not possible to identify other pathologies (e.g. osteoporosis,
having few samples for each type of pathology (we only have one sample for each)
(perhaps due to the small number of samples and/or the similarity of the disorders in terms
The results indicate that it is possible to identify whether patients are medicated (70-80% of
correct predictions). (insert figures of the spectra in the region used to highlight the
Figure x- Mean of raw (a) and pre-processed spectra (b) using the best spectral region for the
PLSDA results
Conclusion
References
1. dos Santos, C.A.T., R.N. Páscoa, and J.A. Lopes, A review on the application of
vibrational spectroscopy in the wine industry: From soil to bottle. TrAC Trends in
Analytical Chemistry, 2017.
2. Diem, M., et al., A decade of vibrational micro-spectroscopy of human cells and tissue
(1994-2004). Analyst, 2004. 129(10): p. 880-885.
3. Siqueira, L.F.S. and K.M.G. Lima, A decade (2004-2014) of FTIR prostate cancer
spectroscopy studies: An overview of recent advancements. TrAC Trends in Analytical
Chemistry, 2016. 82: p. 208-221.
4. Stuart, B., Infrared Spectroscopy: Fundamentals and Applications. 2004, Chichester:
Wiley Online Library. p. 46-47.
5. Xiang, X.M., et al., Diabetes-Associated Periodontitis Molecular Features in Infrared
Spectra of Gingival Crevicular Fluid. Journal of Periodontology, 2013. 84(12): p. 1792-
1800.
6. Xiang, X.M., et al., Periodontitis-specific molecular signatures in gingival crevicular
fluid. Journal of Periodontal Research, 2010. 45(3): p. 345-352.
7. Naes, T., et al., Interpreting PCR and PLS solutions. A User-Friendly Guide to
Multivariate Calibration and Classification, 2004. 1: p. 39-54.
8. Barker, M. and W. Rayens, Partial least squares for discrimination. Journal of
Chemometrics, 2003. 17(3): p. 166-173.
9. Páscoa, R., et al., Exploratory study on vineyards soil mapping by visible/near-infrared
spectroscopy of grapevine leaves. Computers and Electronics in Agriculture, 2016. 127:
p. 15-25.
Tables
ED
Table x. Confusion matrix for the discrimination of patients/subjects with and without ED
based on the GCF spectra through strategy one (76.0% of correct predictions and 10 LV).
Strategy one
% Subjects group(real)
Subjects group (predicted) With ED Without ED Total
Table x. Confusion matrix for the discrimination of patients/subjects with and without ED
based on the GCF spectra through strategy two (77.3% of correct predictions and 8 LV).
Strategy two
% Subjects group(real)
Table x. Confusion matrix for the discrimination of patients/subjects with and without ED
based on the GCF spectra through strategy three (80.1% of correct predictions and 6 LV).
Strategy three
% Subjects group(real)
Table x. Confusion matrix for the discrimination of patients/subjects with and without
medication based on the GCF spectra through strategy one (72.8% of correct predictions and
6 LV).
Strategy one
% Subjects group(real)
Table x. Confusion matrix for the discrimination of patients/subjects with and without
medication based on the GCF spectra through strategy two (73.5% of correct predictions and
6 LV).
Strategy two
% Subjects group(real)
Table x. Confusion matrix for the discrimination of patients/subjects with and without
medication based on the GCF spectra through strategy three (81.5% of correct predictions and
4 LV).
Strategy three
% Subjects group(real)
List of figures
Figure 1- Raw spectra of the gingival crevicular fluid collected on perio-paper strips.
Figure x- Mean of raw (a) and pre-processed spectra (b) using the best spectral region for the
Figure x- Mean of raw (a) and pre-processed spectra (b) using the best spectral region for the
within 920 and 620 cm-1 (region 5) and through the application of SNV followed by a Savitzky-Golay
filter (15-point filter width, second-order polynomial and second derivative).
[Figure x: labelled peaks at 862, 836, 684, 654, 642 and 635 cm-1]
Extra