Abstract
Introduction
Vibrational spectroscopic techniques can be applied in different fields due to their versatility,
simplicity and low cost per analysis. Among these techniques, mid-infrared (MIR) spectroscopy [...]
parameters in the same analysis, is environmentally friendly and rapid, avoids sample preparation,
has a low cost per analysis and can be applied in situ [1]. This technique was first applied in the
agro-food sector, but, because of the abovementioned characteristics, it is currently also being
employed in the health sector, where the number of applications is rapidly increasing [2, 3].
MIR spectroscopy measures the fundamental vibrations of several covalent bonds between 4000
and 400 cm-1. It is considered a fingerprint technique and, for this reason, it is applied in the
identification and characterization of different types of samples [1, 3, 4]. The final spectrum is
complex, reflecting several absorption bands that can be weak and [...]
information. The need to use chemometric tools is pointed out as the major drawback.
This technique has already been explored in the analysis of gingival crevicular fluid (GCF), but
only two works have been described in the literature [5, 6]. In 2010, Xiang et al. [6] collected GCF
from several patients to assess whether this technique was able to discriminate between healthy
subjects and patients with periodontitis. The GCF samples were dried and then analysed. The
results (93% correct predictions for the validation set using linear discriminant analysis)
demonstrated that this technique was very effective in discriminating patients with and without
periodontitis. Moreover, the authors suggested that the molecular components responsible for
this discrimination were lipids, proteins and DNA. However, the results could possibly have been
improved by optimizing the spectral region and the pre-processing. A few years later, in 2013,
Xiang et al. [5] investigated the capacity of infrared spectroscopy to discriminate between
patients with and without diabetes mellitus.
Several GCF samples collected from different sites in each patient were analysed. The samples
were classified through linear discriminant analysis, and an overall accuracy of 87% correct
classifications was obtained for the validation set. An algorithm was applied to select the best
spectral regions, which included the [...]
authors attributed the discrimination of patients with and without diabetes mellitus to
differences in the content of these compounds in the GCF. Again, the authors did not test
different pre-processing techniques, which could have further improved these good results. Both
works revealed the potential of infrared spectroscopy to detect, through a non-invasive [...]
Therefore, the present manuscript explores the use of mid-infrared spectroscopy to detect
eating disorders through GCF analysis. This work also explores the potential of this technique to
discriminate individuals, sampling sites, types of eating disorder and vomiting induction. To the
best of our knowledge, this is the first time mid-infrared spectroscopy has been used for this
purpose.
Material and methods
Patients/Subjects
[...] Spectrum BX FTIR System spectrophotometer (Waltham, USA) equipped with a DTGS detector
and a PIKE Technologies GladiATR accessory. The spectra were acquired in diffuse reflectance mode
from 4000 to 600 cm-1, with a resolution of 4 cm-1 and 32 co-added scans. Each perio-paper strip
was analysed on both sides at the bottom while compressed at a pressure of 150 N cm-2. Thus,
a total of 224 (28 x 8) spectra were obtained. The ATR crystal was cleaned and a background [...]
MIR spectra were modelled with chemometric tools, namely principal component analysis (PCA)
to assist outlier detection and partial least squares discriminant analysis (PLSDA) to develop
discrimination models [7, 8]. The spectra were mean centred before any data analysis. For the
PLSDA, the spectral data were randomly divided into two sets, one for calibration (70%) and the
other for validation (30%). This division ensured the same proportion of each patient class in
both sets, to avoid unbalanced classes [9]. The PLSDA models were optimized (using only the
calibration set) by selecting the optimal number of latent variables, the best spectral region and
the best pre-processing technique. The optimal number of latent variables (LV) was estimated
through a leave-one-sample-out cross-validation procedure. The assessment of the best spectral
region involved dividing the MIR spectra into 5 regions and testing these regions individually and
in combination. The regions were the following: from 3982 to 2652 cm-1 (region 1), from 2650 to
1862 cm-1 (region 2), from 1860 to 1182 cm-1 (region 3), from 1180 to 922 cm-1 (region 4) and
from 920 to 620 cm-1 (region 5). The selection of the best pre-processing technique was [...]
standard normal variate (SNV) and Savitzky-Golay filter (with different filter widths, polynomial
orders and first and second derivatives). After model optimization, the accuracy of the optimized
models was tested by projecting the validation set onto them, and the results were arranged in
the form of confusion matrices. The confusion matrices express the percentages of correct
predictions for each patient class, and the total percentage of correct predictions was obtained
by adding the diagonal elements of the confusion matrix [9]. The regression coefficient vectors of
the PLSDA models for eating disorders were analysed to understand which wavenumbers were
most important and to relate them to possible compounds present in the gingival crevicular fluid.
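As an illustration of the class-stratified 70/30 random division described above, the following is a minimal Python/NumPy sketch. The function name `stratified_split`, the seed and the class sizes in the usage lines are hypothetical, since the class sizes are not reported in this section.

```python
import numpy as np

def stratified_split(labels, cal_fraction=0.7, seed=0):
    """Randomly split sample indices into calibration and validation sets,
    keeping (approximately) the same class proportions in both sets."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    cal_idx, val_idx = [], []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)     # indices of this class
        rng.shuffle(idx)                        # random order within the class
        n_cal = int(round(cal_fraction * idx.size))
        cal_idx.extend(idx[:n_cal])
        val_idx.extend(idx[n_cal:])
    return np.sort(cal_idx), np.sort(val_idx)

# Hypothetical example: 28 individuals, 14 per class (class sizes are assumed).
labels = ["ED"] * 14 + ["no ED"] * 14
cal, val = stratified_split(labels)
print(len(cal), len(val))  # 20 8
```

Because the split is rounded within each class, the achieved proportions are only approximately 70/30 when classes are small.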
MATLAB version 8.6 (MathWorks, Natick, USA) and PLS_Toolbox version 8.2.1 (Eigenvector
Research Inc., Wenatchee, USA) were used to perform all chemometric analyses.
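For readers without MATLAB and PLS_Toolbox, the following is a minimal NumPy sketch of a two-class PLSDA with leave-one-sample-out selection of the number of LV. It is a stand-in under stated assumptions, not the PLS_Toolbox implementation: classes are coded 0/1, a NIPALS PLS1 regression is fitted on mean-centred data, and samples are assigned with a fixed 0.5 threshold (PLS_Toolbox may use a different PLS algorithm and class-threshold rule). All function names and the synthetic data are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_lv):
    """NIPALS PLS1 on mean-centred data; returns (b, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean            # mean centring
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)                 # weight vector
        t = Xr @ w                             # scores
        tt = t @ t
        p = Xr.T @ t / tt                      # X loadings
        c = (yr @ t) / tt                      # y loading
        Xr = Xr - np.outer(t, p)               # deflate X
        yr = yr - c * t                        # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)        # regression vector b = W (P'W)^-1 q
    return b, x_mean, y_mean

def plsda_predict(model, X, threshold=0.5):
    """Assign class 1 when the predicted 0/1 response exceeds the (assumed) threshold."""
    b, x_mean, y_mean = model
    return ((X - x_mean) @ b + y_mean > threshold).astype(int)

def loo_accuracy(X, y, n_lv):
    """Leave-one-sample-out cross-validated fraction of correct predictions."""
    hits = 0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        model = pls1_fit(X[keep], y[keep], n_lv)
        hits += plsda_predict(model, X[i:i + 1])[0] == y[i]
    return hits / len(y)

def select_n_lv(X, y, max_lv=10):
    """Number of LV (1..max_lv) with the highest cross-validated accuracy."""
    accs = [loo_accuracy(X, y, k) for k in range(1, max_lv + 1)]
    return int(np.argmax(accs)) + 1, accs

# Synthetic data only (not study data): class 1 is shifted on the first 10 variables.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 10)
X = rng.normal(size=(20, 50))
X[y == 1, :10] += 3.0
best_lv, accs = select_n_lv(X, y, max_lv=5)
print(best_lv, accs)
```

In a full workflow, this selection would run on the calibration set only, and the chosen model would then be applied to the validation set.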
Results and discussion
The raw spectra of the gingival crevicular fluid are shown in Figure 1. As mentioned above, a PCA
was performed to assist outlier detection, and no outliers were identified. Then, several
PLSDA models were developed to verify whether GCF spectra contain information related to each
individual, sampling site and presence of eating disorders. For all these discrimination models,
different pre-processing techniques and different spectral regions were tested individually and
in combination; for each model, more than 150 (31 spectral region combinations x 5 pre-processing
technique combinations) PLSDA models were developed. Testing different spectral regions also
made it possible to understand which regions were more appropriate for each discrimination and,
thereby, which compounds could be responsible.
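The figure of more than 150 models follows from simple counting: 5 regions give 2^5 - 1 = 31 non-empty combinations, and 31 x 5 pre-processing combinations = 155. The sketch below enumerates the combinations and builds a wavenumber mask for each one; the `REGIONS` dictionary copies the limits given in Material and methods, while the helper names are hypothetical.

```python
from itertools import combinations

# Spectral regions as defined in Material and methods (cm-1, upper and lower limits).
REGIONS = {
    1: (3982, 2652),
    2: (2650, 1862),
    3: (1860, 1182),
    4: (1180, 922),
    5: (920, 620),
}

def region_combinations(region_ids):
    """All non-empty combinations of the given regions (2**n - 1 of them)."""
    ids = sorted(region_ids)
    return [c for k in range(1, len(ids) + 1) for c in combinations(ids, k)]

def region_mask(wavenumbers, combo):
    """True for each wavenumber that falls inside any region of the combination."""
    limits = [REGIONS[r] for r in combo]
    return [any(lo <= w <= hi for hi, lo in limits) for w in wavenumbers]

combos = region_combinations(REGIONS)
n_models = len(combos) * 5   # 5 pre-processing combinations, as stated in the text
print(len(combos), n_models)  # 31 155
```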
The PLSDA models were built using the mean of all the spectra collected from each individual;
thus, a total of 28 spectra were used. This was performed for each discrimination model (e.g.
patient/subject, sampling site and presence of eating disorders).
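The averaging step can be sketched as follows with NumPy, assuming the 224 spectra are stored as rows of a matrix together with an individual identifier for each row (8 replicate spectra per individual, as implied by 224 = 28 x 8). The function name and the random data in the usage lines are hypothetical.

```python
import numpy as np

def average_per_individual(spectra, individual_ids):
    """Mean spectrum per individual; each row of `spectra` is one replicate spectrum."""
    spectra = np.asarray(spectra, dtype=float)
    ids = np.asarray(individual_ids)
    unique_ids = np.unique(ids)
    means = np.vstack([spectra[ids == i].mean(axis=0) for i in unique_ids])
    return unique_ids, means

# Hypothetical data: 224 spectra x 100 wavenumbers, 8 consecutive rows per individual.
rng = np.random.default_rng(0)
spectra = rng.normal(size=(224, 100))
ids = np.repeat(np.arange(28), 8)
unique_ids, mean_spectra = average_per_individual(spectra, ids)
print(mean_spectra.shape)  # (28, 100)
```

For independent random noise, averaging n replicates reduces the noise standard deviation by a factor of sqrt(n), which is one way to read the smoothing effect of averaging discussed in the Conclusion.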
Individual discrimination
First, GCF spectra were modelled through PLSDA to verify whether MIR spectra contained
specific information related to each individual. The results (a total of correct predictions below
20% for all the strategies) indicated that it was not possible to discriminate these patients
through GCF. Moreover, the best spectral region and pre-processing technique were not clear.
We believe that the GCF composition of each patient varies (perhaps introduce a reference), but
these variations are possibly so marginal that MIR spectroscopy is not sensitive enough to detect
them.
Sampling site
The GCF spectra were also modelled against the sampling site. It was important to understand
whether GCF is influenced by the sampling site (it would be better to elaborate a little more here
on whether or not it makes sense for the GCF composition to vary). Again, the PLSDA results
revealed that it was not possible to discriminate the sampling site (a total of correct predictions
below 30% for all the strategies, except for the mean of all spectra collected for each patient,
which was not performed), nor to identify the best spectral region(s) and pre-processing technique.
Eating disorders
The primary objective of this manuscript was to assess whether it is possible to discriminate
patients with and without eating disorders (ED) through their GCF spectra. Again, several PLSDA
models were tested to find the best spectral region and pre-processing technique. The best
results were obtained when selecting the spectral region from 3982 to 2652 cm-1 (R1) and
applying a Savitzky-Golay filter (15-point filter width, second-order polynomial and second
derivative). The mean raw and pre-processed spectra in this spectral region are depicted in
Figure 2. In Fig. 2a, the spectra of both groups of patients are very similar, but patients with ED
show more intense peaks. After pre-processing (Fig. 2b), marked differences are revealed around
3700 cm-1, from 3400 to 3200 cm-1 and from 3000 to 2800 cm-1.
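The best pre-processing can be sketched with SciPy's `savgol_filter` (window_length=15, polyorder=2, deriv=2). Two assumptions are made because the text does not specify them: the filter is applied along the whole spectrum before R1 is extracted (which keeps the filter window away from the region edges), and the wavenumber axis runs from 4000 to 600 cm-1 in 2 cm-1 steps. The function name `preprocess_r1` is hypothetical.

```python
import numpy as np
from scipy.signal import savgol_filter

def preprocess_r1(spectra, wavenumbers):
    """Savitzky-Golay 2nd derivative (15 points, 2nd-order polynomial), then keep R1."""
    # Derivative is taken per data point (delta=1), along each spectrum (axis=1).
    d2 = savgol_filter(np.asarray(spectra, dtype=float), window_length=15,
                       polyorder=2, deriv=2, axis=1)
    wavenumbers = np.asarray(wavenumbers)
    in_r1 = (wavenumbers >= 2652) & (wavenumbers <= 3982)   # region 1, cm-1
    return wavenumbers[in_r1], d2[:, in_r1]

wavenumbers = np.arange(4000, 599, -2)   # assumed 2 cm-1 spacing, 4000 to 600 cm-1
# Synthetic check: a quadratic in the point index has a constant 2nd derivative.
spectra = np.tile((np.arange(wavenumbers.size) / 100.0) ** 2, (3, 1))
wn_r1, x_r1 = preprocess_r1(spectra, wavenumbers)
print(x_r1.shape)  # (3, 666)
```

With polyorder=2, the filter reproduces the second derivative of a quadratic exactly, which is why the synthetic check returns a constant value in R1.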
Figure 2- Mean of raw (a) and pre-processed spectra (b) using the best spectral region for the [...]
Through PLSDA, a total of 84.1% correct predictions was obtained using 6 LV for the
discrimination between patients with and without ED. The confusion matrix in Table 1 shows that
the worst predictions involved the group of individuals without ED: approximately 30% of this
group was incorrectly classified as having ED. (introduce some
Table 1. Confusion matrix for the discrimination of patients/subjects with and without ED
based on the GCF spectra through strategy two (80.1% of correct predictions and 6 LV).
[Table body not recoverable; surviving labels: "Strategy three", "% Subjects group (real)"]
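To make the arithmetic behind the total percentage of correct predictions explicit, the sketch below builds a confusion matrix expressed as a percentage of all validation samples, for which the sum of the diagonal equals the overall accuracy. If the rows are instead normalised within each real class, the diagonal sum no longer equals the total, and the overall value is the class-size-weighted mean of the diagonal. The predictions and the function name `confusion_counts` are hypothetical and are not the results of Table 1.

```python
import numpy as np

def confusion_counts(y_true, y_pred, classes):
    """Count matrix: rows are real classes, columns are predicted classes."""
    index = {c: k for k, c in enumerate(classes)}
    counts = np.zeros((len(classes), len(classes)))
    for t, p in zip(y_true, y_pred):
        counts[index[t], index[p]] += 1
    return counts

y_true = ["ED"] * 5 + ["no ED"] * 5                               # hypothetical
y_pred = ["ED"] * 5 + ["ED", "ED", "no ED", "no ED", "no ED"]     # hypothetical
counts = confusion_counts(y_true, y_pred, ["ED", "no ED"])

pct_of_total = 100 * counts / counts.sum()                          # [[50, 0], [20, 30]]
pct_per_class = 100 * counts / counts.sum(axis=1, keepdims=True)    # [[100, 0], [40, 60]]
total_correct = np.trace(pct_of_total)                              # 80.0

# The same total from the per-class matrix: weight each diagonal term by class size.
weights = counts.sum(axis=1) / counts.sum()
assert np.isclose(total_correct, weights @ np.diag(pct_per_class))
print(total_correct)
```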
It is therefore important to explore the regression coefficient vectors to understand which
spectral absorptions contributed most and, with that, to attempt to relate them to the compounds
present in the GCF. Figure 3 shows the regression coefficient vectors of the PLSDA model; the
wavenumbers with the highest contribution were located within 3720 to 3620, 3320 to 3270 and
2970 to 2880 cm-1. These wavenumbers are consistent with the differences observed in the
pre-processed spectra of patients with and without ED (Figure 2). The wavenumbers within 3720 to
3620 cm-1 can be attributed to N-H stretching of amines and O-H stretching of alcohols (REF book
Clara); the wavenumbers within 3320 to 3270 cm-1 can be associated with amines and carboxylic
acids due to N-H stretching and O-H stretching, respectively; and the wavenumbers within 2970
to 2880 cm-1 can also be attributed to O-H stretching of carboxylic acids. (explore the compounds
that could be changing in
[Figure 3 annotations: 3720-3620 cm-1, 2970-2880 cm-1]
Figure 3 – Regression coefficient vectors of the PLSDA model obtained adopting strategy three
within 3982 to 2652 cm-1 (region 1) and through the application of Savitzky-Golay filter (15 [...]
Conclusion
The principal objective of this manuscript was accomplished, and the results allowed [...]
averaging the spectra is important to smooth slight random variations present in the spectra
of perio-paper strips (João, do you think I can say this?), and the results of the PLSDA models
for ED showed that averaging the spectra increases the efficacy of the PLSDA models
- the GCF appears to be the same at all sampling sites (since it was not possible to discriminate them)
Overall, this exploratory work revealed the potential of MIR spectroscopy to detect chemical
variations in the composition of GCF in a non-invasive, rapid and inexpensive way. Further studies
including a larger number of samples are needed to confirm the robustness of this technique.
Acknowledgements
The authors are grateful for financial support from the European Union (FEDER funds through [...]
(Fundação para a Ciência e Tecnologia) and POPH (Programa Operacional Potencial Humano)
References
1. dos Santos, C.A.T., R.N. Páscoa, and J.A. Lopes, A review on the application of
vibrational spectroscopy in the wine industry: From soil to bottle. TrAC Trends in
Analytical Chemistry, 2017.
2. Diem, M., et al., A decade of vibrational micro-spectroscopy of human cells and tissue
(1994-2004). Analyst, 2004. 129(10): p. 880-885.
3. Siqueira, L.F.S. and K.M.G. Lima, A decade (2004-2014) of FTIR prostate cancer
spectroscopy studies: An overview of recent advancements. TrAC Trends in Analytical
Chemistry, 2016. 82: p. 208-221.
4. Stuart, B., Infrared Spectroscopy: Fundamentals and Applications. 2004, Chichester:
Wiley. p. 46-47.
5. Xiang, X.M., et al., Diabetes-Associated Periodontitis Molecular Features in Infrared
Spectra of Gingival Crevicular Fluid. Journal of Periodontology, 2013. 84(12): p. 1792-
1800.
6. Xiang, X.M., et al., Periodontitis-specific molecular signatures in gingival crevicular
fluid. Journal of Periodontal Research, 2010. 45(3): p. 345-352.
7. Naes, T., et al., Interpreting PCR and PLS solutions. A User-Friendly Guide to
Multivariate Calibration and Classification, 2004. 1: p. 39-54.
8. Barker, M. and W. Rayens, Partial least squares for discrimination. Journal of
Chemometrics, 2003. 17(3): p. 166-173.
9. Páscoa, R., et al., Exploratory study on vineyards soil mapping by visible/near-infrared
spectroscopy of grapevine leaves. Computers and Electronics in Agriculture, 2016. 127:
p. 15-25.
Tables
Table 1. Confusion matrix for the discrimination of individuals with and without ED based on
the GCF spectra through strategy two (80.1% of correct predictions and 6 LV).
[Table body not recoverable; surviving labels: "ED", "Strategy three", "% Subjects group (real)"]
Figures
Figure 2- Mean of raw (a) and pre-processed spectra (b) using the best spectral region for the
[Figure 3 annotations: 684, 836, 654 and 635 cm-1]
Figure 3 – Regression coefficient vectors of the PLSDA model obtained adopting strategy three
between 920 and 620 cm-1 (region 5) and through the application of SNV followed by a
Savitzky-Golay filter (15-point filter width, second-order polynomial and second derivative).
Extra