
SUPPORTING INFORMATION

Rapid screening of complex matrices: Utilizing Kendrick mass defect to enhance knowledge-based
group type evaluation of GC×GC-HR(EI)ToFMS data.

Benedikt A. Weggler *,†, Beate Gruber *# and Frank L. Dorman*


*Department of Biochemistry and Molecular Biology, The Pennsylvania State University, 107 Althouse
Laboratory, PennState University, University Park, PA 16802, USA
† University of Liège, MolSys - Organic and Biological Analytical Chemistry Group, Quartier Agora, Place du Six Août 11, B6c, 4000 Liège, Belgium


# Research Institute for Chromatography, Kennedypark 26, 8500 Kortrijk, Belgium

Table of Contents
EXPERIMENTAL SECTION
114 compounds standard mixture
Initial data processing parameters
RESULTS AND DISCUSSION
Criteria for developed and evaluated classes
Suggestions for additional classes
FIGURES
Mass spectral similarity: Naphthenes and alkenes
TPR vs FPR
ROC discrimination Diesel vs. Kerosene
ROC weathered Diesel
PCA weathered Kerosene
ROC weathered Kerosene
Approach including retention time information

EXPERIMENTAL SECTION
114 compounds standard mixture:

DHA-PARAFFINS Standard
Name                                   CAS          w/w%
n-Decane                               124-18-5     9%
n-Dodecane                             112-40-3     9%
n-Heptane                              142-82-5     9%
n-Hexane                               110-54-3     9%
n-Nonane                               111-84-2     9%
n-Octane                               111-65-9     9%
n-Pentadecane                          629-62-9     9%
n-Pentane                              109-66-0     9%
n-Tetradecane                          629-59-4     9%
n-Tridecane                            629-50-5     9%
n-Undecane                             1120-21-4    9%

ALASKA MIX (2,000 µg/mL)
Name                                   CAS
Benzene                                71-43-2
Ethylbenzene                           100-41-4
1-Ethyl-2-methylbenzene                611-14-3
1-Ethyl-3-methylbenzene                620-14-4
1-Ethyl-4-methylbenzene                622-96-8
Isopropylbenzene (cumene)              98-82-8
n-Propylbenzene                        103-65-1
Toluene
1,2,3-Trimethylbenzene                 526-73-8
1,2,4-Trimethylbenzene                 95-63-6
1,3,5-Trimethylbenzene                 108-67-8
m-Xylene                               108-38-3
o-Xylene                               95-47-6
p-Xylene                               106-42-3

FAMEs
Name                                                       CAS          w/w%
C4:0 Methyl butyrate                                       623-42-7     4%
C6:0 Methyl caproate                                       106-70-7     4%
C8:0 Methyl caprylate                                      111-11-5     4%
C10:0 Methyl decanoate                                     110-42-9     4%
C11:0 Methyl undecanoate                                   1731-86-8    2%
C12:0 Methyl dodecanoate                                   111-82-0     4%
C13:0 Methyl tridecanoate                                  1731-88-0    2%
C14:0 Methyl myristate                                     124-10-7     4%
C14:1 (cis-9) Methyl myristoleate                          56219-06-8   2%
C15:0 Methyl pentadecanoate                                7132-64-1    2%
C15:1 (cis-10) Methyl pentadecenoate                       90176-52-6   2%
C16:0 Methyl palmitate                                     112-39-0     6%
C16:1 (cis-9) Methyl palmitoleate                          1120-25-8    2%
C17:0 Methyl heptadecanoate                                1731-92-6    2%
C17:1 (cis-10) Methyl heptadecenoate                       75190-82-8   2%
C18:0 Methyl stearate                                      112-61-8     4%
C18:1 (trans-9) Methyl octadecenoate                       1937-62-8    2%
C18:1 (cis-9) Methyl oleate                                112-62-9     4%
C18:2 (all-trans-9,12) Methyl linolelaidate                2566-97-4    2%
C18:2 (all-cis-9,12) Methyl linoleate                      112-63-0     2%
C18:3 (all-cis-6,9,12) Methyl linolenate                   16326-32-2   2%
C18:3 (all-cis-9,12,15) Methyl linolenate                  301-00-8     2%
C20:0 Methyl arachidate                                    1120-28-1    4%
C20:1 (cis-11) Methyl eicosenoate                          2390-09-2    2%
C20:2 (all-cis-11,14) Methyl eicosadienoate                2463-02-7    2%
C20:3 (all-cis-8,11,14) Methyl eicosatrienoate             21061-10-9   2%
C20:3 (all-cis-11,14,17) Methyl eicosatrienoate            55682-88-7   2%
C20:4 (all-cis-5,8,11,14) Methyl arachidonate              2566-89-4    2%
C20:5 (all-cis-5,8,11,14,17) Methyl eicosapentaenoate      2734-47-6    2%
C21:0 Methyl heneicosanoate                                6064-90-0    2%
C22:0 Methyl behenate                                      929-77-1     4%
C22:1 (cis-13) Methyl erucate                              1120-34-9    2%
C22:2 (all-cis-13,16) Methyl docosadienoate                61012-47-3   2%
C22:6 (all-cis-4,7,10,13,16,19) Methyl docosahexaenoate    2566-90-7    2%
C23:0 Methyl tricosanoate                                  2433-97-8    2%
C24:0 Methyl lignocerate                                   2442-49-1    4%
C24:1 (cis-15) Methyl nervonate                            2733-88-2    2%

DHA NAPHTHENES
Name                                          CAS          w/w%
Cyclohexane                                   110-82-7     4.7%
Cyclopentane, 1,3-dimethyl-, cis-             30363-79-2   10.3%
Cyclopentane, ethyl-                          1640-89-7    7.8%
Cyclohexane, 1,4-dimethyl-                    589-90-2     4.7%
Cyclopentane, 1-ethyl-1-methyl-               16747-50-5   1.5%
Cyclohexane, 1,3,5-trimethyl-                 1839-63-0    2.3%
Cyclopentane, propyl-                         2040-96-2    1.3%
1,1,4-Trimethylcyclohexane                    2234-75-5    2.9%
Cyclohexane, 1,2-dimethyl-, cis-              2207-01-4    10.3%
Cyclohexane, 1,2,3-trimethyl-, (1α,2β,3α)-    15890-40-1   4.5%
Cyclopentane, isobutyl-                       3788-32-7    1.3%
Cyclohexane, 1,1,2-trimethyl-                 7094-26-0    1.8%
Cyclohexane, isobutyl-                        1678-98-4    3.7%
Cyclohexane, 1-methyl-2-propyl-               61142-58-3   4.6%

DHA Olefins*
Name                                   CAS          w/w%
1-Heptene                              592-76-7     6.79%
(Z)-3-Heptene                          7642-10-6    1.02%
1-Octene                               111-66-0     11.34%
2-Octene, (E)-                         13389-42-9   2.32%
2-Octene, (Z)-                         7642-04-8    4.37%
1-Nonene                               124-11-8     6.91%
3-Nonene                               20063-92-7   3.03%
2-Nonene                               6434-78-2    2.28%
2-Methyl-1-nonene                      2980-71-4    3.41%
1-Decene                               872-05-9     2.27%

SV#5* (2,000 µg/mL)
Name                                   CAS
Naphthalene                            91-20-3
Acenaphthylene                         208-96-8
Acenaphthene                           83-32-9
Fluorene                               86-73-7
Phenanthrene                           85-01-8
Anthracene                             120-12-7
Fluoranthene                           206-44-0
Pyrene                                 129-00-0
Benz[a]anthracene                      56-55-3
Chrysene                               218-01-9
Benzo[b]fluoranthene                   205-99-2
Benzo[a]pyrene                         50-32-8

Alcohols (2,000 µg/mL)
Name                                   CAS
1-Butanol                              71-36-3
1-Pentanol                             71-41-0
1-Hexanol                              111-27-3
1-Heptanol                             111-70-6
1-Octanol                              111-87-5
1-Nonanol                              143-08-8
1-Decanol                              112-30-1
1-Dodecanol                            112-53-8

Aldehydes (2,000 µg/mL)
Name                                   CAS
Pentanal                               110-62-3
Hexanal                                66-25-1
Heptanal                               111-71-7
Octanal                                124-13-0
Nonanal                                124-19-6
Decanal                                112-31-2
Undecanal                              112-44-7
Dodecanal                              112-54-9
Table S1: Compounds used in the "114 compounds standard mixture". Each compound class was either purchased as a prepared mix directly from the manufacturer or prepared individually. 100 µL of each mix was used for the total mix. Compound classes marked with * contained additional compounds that could not be detected with the presented GC×GC-HRTOFMS method.

Initial data processing parameters

Parameter                              Value
S/N Ratio                              10
Peak Quality                           0.7
Peak Confidence                        3.0
GC×GC SupPeak Combining Threshold      500
Remove BKG Ions                        CO2, PFTBA ions
Remove Solvent Peak                    Yes

Table S2: Initial data processing parameters.

RESULTS AND DISCUSSION


Criteria for developed and evaluated classes

Alkanes
  Fragment to include: H+ (KMD -0.0062), occurrence 3 or more
  Fragments to exclude: O+ (alcohol, KMD 0.0168); HO+ (aldehyde, KMD 0.0301)
  KBR: R1 = 43.05 or 57.07 or 71.05

Mono-unsaturated
  Fragment to include: CH+ (KMD 0.0072), occurrence 3 or more
  Fragments to exclude: O+ (alcohol, KMD 0.0168); CHO2 (ester, KMD 0.0530)
  KBR: R1 = 43.05 or 55.05 or 56.05 or 69.06; ratio of m/z 55 to 83

Benzenes
  Fragment to include: C3+ (KMD 0.0407), occurrence 1 or more
  Fragment to exclude: KMD 0.0766
  KBR: R1 = 78.04 or 91.06 or 92.06 or 105.06 or 106.05 or 119.06 or 118.07

FAMEs
  Fragments to include: CH2O2+ (KMD 0.0459); CHO2+ (KMD 0.0530)
  KBR: R1 = 43.018 or 55.02 or 67.05 or 74.03 or 79.05 or 115.04

PAHs
  Fragments to include: C6+ (KMD 0.0809); C8+ (KMD 0.1210); C9+ (KMD 0.1076)
  KBR: R1 = 128 or 141 or 151 or 153 or 155 or 165 or 170 or 178 or 202 or 228 or 252.09

Alcohols
  Fragments: OH+ (KMD 0.0168); CH2O2+ (KMD 0.0459); CHO+ (KMD 0.0302)
  KBR: both R1 and R2 are evaluated.
    Primary alcohols: R1 = 41 or 45 or 55 or 69; R2 = 41 or 45 or 55 or 69
    Secondary alcohols: R1 = 45 or 57 or 59; R2 = 41 or 43 or 45 or 57 or 69

Aldehydes
  Fragment to include: CHO+ (KMD 0.0302), occurrence 2 or more
  Fragments to exclude: OH+ (KMD 0.0168)*; CH2O2+ (KMD 0.0459)#
  * Spectra showing an occurrence of 2 or more of this KMD are excluded.
  # Spectra showing an occurrence of 3 or more of this KMD are excluded.

Naphthenes
  Fragments to include: CH+ (KMD 0.0072), occurrence 2 or more; CH2+ (KMD 0.0005), occurrence 1 or more
  Fragments to exclude: O+ (alcohol, KMD 0.0168); CHO2 (ester, KMD 0.0530)
  KBR: R1 = 56.06 or 69.06 or 84.09; ratio of m/z 55.05 to 83.06 > 1
Table S3: Kendrick mass defects and selection criteria for the developed and evaluated classifiers. The fragments were selected based on the fragmentation patterns of the respective classes as described in McLafferty, Interpretation of Mass Spectra.
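The include/exclude logic of Table S3 can be illustrated with a minimal Python sketch. It assumes the CH2-based Kendrick scale with KMD = nominal Kendrick mass - Kendrick mass, applied to measured ion m/z values; under this convention the alkyl series CnH2n+1+ gives -0.0062 and CH3O+ (m/z 31.0178) gives 0.0168, matching the table. It further assumes that "occurrence" means the number of fragment ions in a spectrum sharing the target KMD and that R1 is the base peak. The tolerances, the exclusion threshold of one fragment and the names GroupRule, count_kmd and ALKANES are hypothetical and not taken from the software used in this work.

```python
# Minimal sketch of a KMD-based group-type rule (Table S3, alkanes).
# Assumed: tolerances, exclusion threshold, R1 = base peak; names are hypothetical.
from dataclasses import dataclass

CH2_EXACT = 14.01565  # exact mass of CH2, the Kendrick base


def kendrick_mass_defect(mz: float) -> float:
    """KMD = nominal Kendrick mass - Kendrick mass (CH2 base)."""
    km = mz * 14.0 / CH2_EXACT
    return round(km) - km


def count_kmd(spectrum, target, tol):
    """Number of fragment ions (m/z, intensity) whose KMD lies within tol of target."""
    return sum(1 for mz, _ in spectrum if abs(kendrick_mass_defect(mz) - target) <= tol)


@dataclass
class GroupRule:
    include: list            # (target KMD, minimum occurrence)
    exclude: list            # (target KMD, occurrence that rejects the spectrum)
    r1_mz: list              # allowed R1 values, read here as base-peak m/z (assumption)
    kmd_tol: float = 0.002   # assumed KMD tolerance
    mz_tol: float = 0.01     # assumed m/z tolerance for R1

    def matches(self, spectrum) -> bool:
        if any(count_kmd(spectrum, t, self.kmd_tol) < n for t, n in self.include):
            return False
        if any(count_kmd(spectrum, t, self.kmd_tol) >= n for t, n in self.exclude):
            return False
        base_mz = max(spectrum, key=lambda peak: peak[1])[0]
        return any(abs(base_mz - r) <= self.mz_tol for r in self.r1_mz)


# Alkane criteria from Table S3; the exclusion threshold of 1 fragment is assumed.
ALKANES = GroupRule(
    include=[(-0.0062, 3)],
    exclude=[(0.0168, 1), (0.0301, 1)],
    r1_mz=[43.05, 57.07, 71.05],
)

# Hypothetical alkane-like EI spectrum: (m/z, relative intensity)
alkane_spectrum = [(29.0386, 15.0), (41.0386, 30.0), (43.0542, 100.0),
                   (57.0699, 85.0), (71.0855, 40.0), (85.1012, 25.0)]

print(ALKANES.matches(alkane_spectrum))                      # True
print(ALKANES.matches(alkane_spectrum + [(31.0178, 20.0)]))  # False: OH+-series fragment present
```

The rule is deliberately data-driven (lists of KMD targets and thresholds), so the other classes in Table S3 could be expressed as further GroupRule instances; the ratio criteria for mono-unsaturated compounds and naphthenes would need an additional intensity check.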

Suggested Criteria for additional classes, not yet evaluated

Siloxanes
  Fragments to include: HOSi+ (KMD 0.0711), occurrence 3 or more; H5O3Si3+ (KMD 0.1988), occurrence 2 or more
  KBR: R1 = 73.04 or 147.06 or 207.03 or 281.05 or 355.06

Ketones
  Fragment to include: CHO+ (KMD 0.0302), occurrence 2 or more
  KBR: R1 = 43.01 or 58.04

Tetrahydronaphthalenes (HydroN4)
  Fragment to include: C4+ (KMD 0.0541)
  KBR: R1 = 104 or 117 or 118 or 131 or 132 or 145 or 146 or 159

Decahydronaphthalenes (HydroN)
  Fragment to include: C2H+ (KMD 0.0206)
  KBR: R1 = 137.13 or 138.14 or 81.07 or 95.09

Monoterpenes
  Fragments to include: C2H+ (KMD 0.0206); C+ (KMD 0.0138)
  KBR: R1 = 67.054 or 68.062 or 93.069
Table S4: Suggested Kendrick mass defects and selection criteria for the additional classifiers. The fragments were selected based on the fragmentation patterns of the respective classes as described in McLafferty, Interpretation of Mass Spectra.
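Reference KMD values such as those in Tables S3 and S4 can be derived from an elemental formula. The sketch below assumes monoisotopic masses, a singly charged cation (electron mass subtracted) and the CH2 Kendrick base with KMD = nominal Kendrick mass - Kendrick mass; under these assumptions it returns 0.0711 for HOSi+, 0.1988 for H5O3Si3+, 0.0541 for C4+ and 0.0206 for C2H+. The function names are hypothetical, and the element table only covers C, H, O and Si.

```python
# Sketch: reference KMD of a singly charged fragment ion from its elemental formula.
# Assumptions: monoisotopic masses, electron mass subtracted, CH2 Kendrick base,
# KMD = nominal Kendrick mass - Kendrick mass. Function names are hypothetical.
import re

MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.9949146, "Si": 27.9769265}
ELECTRON = 0.00054858
CH2_EXACT = 14.01565


def formula_mass(formula: str) -> float:
    """Monoisotopic mass of a simple formula such as 'H5O3Si3' (no brackets or charge)."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass


def cation_kmd(formula: str) -> float:
    """KMD of the singly charged cation [formula]+ on the CH2 Kendrick scale."""
    mz = formula_mass(formula) - ELECTRON
    km = mz * 14.0 / CH2_EXACT
    return round(km) - km


for ion in ["HOSi", "H5O3Si3", "C4", "C2H"]:
    print(f"{ion}+  KMD = {cation_kmd(ion):.4f}")
```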

MASS SPECTRA SIMILARITY NAPHTHENES AND ALKENES

Figure S1: Illustration of Spectral Similarity between Naphthenes and Alkenes. Spectra reprinted with permission from
https://sdbs.db.aist.go.jp (National Institute of Advanced Industrial Science and Technology, 06/03/2019)

TPR vs FPR

Figure S2: Performance evaluation of the developed classifiers in receiver operating characteristic (ROC) space. The dotted red line represents a random guess of the chemical compound class. The upper left corner (X/Y = 0/1) represents a perfect classification.

ROC Curves Diesel vs. Kerosene

Figure S3: ROC curves for the prediction of Diesel and Kerosene samples based on the summed area values generated by the GC×GC-HRTOFMS classification approach. AUC values of 1 indicate perfect discrimination between the two fuel types. Classification was performed by k-nearest neighbor (k-NN) with 5-fold cross-validation, preceded by a PCA explaining 85% of the variance.
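The cross-validated PCA/k-NN classification can be sketched with NumPy. This is an illustration under assumptions, not the workflow used for Figure S3: PCA is refit on each training fold and keeps the smallest number of components explaining at least 85% of the variance, k = 5 neighbors and Euclidean distance are assumed (neither is stated), and the k-NN score is the fraction of neighbors belonging to the positive class, which can then be used to build an ROC curve. The data and function names are hypothetical.

```python
# Sketch: PCA (>= 85% explained variance) followed by k-NN, evaluated with 5-fold CV.
# Assumptions: k = 5 neighbours, Euclidean distance, PCA refit on every training fold.
import numpy as np


def pca_fit(X, var_target=0.85):
    """Return the training mean and the fewest PCs explaining >= var_target of the variance."""
    mean = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    n_comp = min(int(np.searchsorted(cumulative, var_target)) + 1, len(s))
    return mean, vt[:n_comp]


def knn_score(Z_train, y_train, Z_test, k=5):
    """Fraction of the k nearest training samples that belong to class 1."""
    dist = np.linalg.norm(Z_test[:, None, :] - Z_train[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)


def cv_scores(X, y, n_folds=5, k=5, seed=0):
    """Out-of-fold class-1 scores for every sample, usable for an ROC curve."""
    order = np.random.default_rng(seed).permutation(len(y))
    scores = np.empty(len(y))
    for test_idx in np.array_split(order, n_folds):
        train_idx = np.setdiff1d(order, test_idx)
        mean, comps = pca_fit(X[train_idx])          # PCA fitted on training fold only
        Z_train = (X[train_idx] - mean) @ comps.T
        Z_test = (X[test_idx] - mean) @ comps.T
        scores[test_idx] = knn_score(Z_train, y[train_idx], Z_test, k=min(k, len(train_idx)))
    return scores


# Hypothetical data: two groups of 12 samples with 10 summed class areas each
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (12, 10)), rng.normal(5.0, 1.0, (12, 10))])
y = np.array([0] * 12 + [1] * 12)
scores = cv_scores(X, y)
print(scores)
```

Refitting PCA inside each fold keeps the held-out samples from influencing the projection, which avoids an optimistic bias in the cross-validated AUC.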

[Figure S4: four ROC panels (true positive rate vs. false positive rate) for the predictions "unweathered" (AUC = 1), "25% weathered" (AUC = 0.77778), "50% weathered" (AUC = 0.77778), and "75% weathered" (AUC = 1).]

Figure S4: ROC curves for the predictions of the different weathering grades of Diesel based on the summed area values for the chemical compound classes generated by the GC×GC-HRTOFMS classification approach. Unweathered and 75% weathered Diesel can be readily differentiated from the other samples, as indicated by AUC values of 1. Differentiating the 25% and 50% weathering grades is more difficult and less accurate (AUC values of 0.78).
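The AUC values reported in these ROC figures summarize the full threshold sweep in one number. A minimal pure-Python sketch: roc_points lowers a decision threshold over all distinct predicted scores to obtain (FPR, TPR) pairs, and auc_pairwise computes the AUC as the fraction of positive/negative pairs in which the positive sample receives the higher score (ties count one half), which equals the area under the empirical ROC curve. The function names and example scores are hypothetical.

```python
# Sketch: empirical ROC points and AUC from classifier scores (pure Python).
def roc_points(scores, labels):
    """(FPR, TPR) pairs obtained by lowering the threshold over all distinct scores."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_pairwise(scores, labels):
    """AUC = fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical scores for three positive (1) and three negative (0) samples
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
print(roc_points(scores, labels))
print(auc_pairwise(scores, labels))  # 8 of 9 pairs ranked correctly -> 0.888...
```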

[Figure S5: PCA scores plot "PCA, Kerosene weathered, log, mean centered"; PC1 (61.3%) vs. PC2 (29.5%); groups: unweathered, 25% weathered, 50% weathered, 75% weathered.]

Figure S5: Results of the PCA applied directly to the per-peak area values of the weathered Kerosene experiments.
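The preprocessing named in the Figure S5 plot title (log-transformed, mean-centered) can be sketched with NumPy, assuming base-10 logarithms and strictly positive peak areas (rows are samples, columns are peaks). The explained-variance ratios returned here are the kind of values reported in axis labels such as "PC1 (61.3%)". The function name and example areas are hypothetical, and the sketch is not the software used to create the figure.

```python
# Sketch: PCA on log-transformed, mean-centered peak areas (cf. Figure S5 title).
# Assumptions: base-10 log, strictly positive areas, rows = samples, columns = peaks.
import numpy as np


def log_pca(areas, n_components=2):
    """Return PCA scores and explained-variance ratios of log10(areas) after mean centering."""
    X = np.log10(np.asarray(areas, dtype=float))
    Xc = X - X.mean(axis=0)                      # mean centering per peak
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)              # fraction of variance per PC
    scores = Xc @ vt[:n_components].T            # sample coordinates (PC1, PC2, ...)
    return scores, explained[:n_components]


# Hypothetical areas for three samples and two peaks
scores, explained = log_pca([[1.0, 10.0], [10.0, 100.0], [100.0, 1000.0]])
print(scores.shape, explained)
```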

[Figure S6: four ROC panels (true positive rate vs. false positive rate) for the predictions "unweathered" (AUC = 0.85185), "25% weathered" (AUC = 0.96296), "50% weathered" (AUC = 0.92593), and "75% weathered" (AUC = 1).]

Figure S6: ROC curves for the predictions of the different weathering grades of Kerosene based on the summed area values for the chemical compound classes generated by the GC×GC-HRTOFMS classification approach. 75% weathered Kerosene can be readily differentiated from the other samples, as indicated by an AUC value of 1. In contrast to the Diesel weathering study, however, unweathered Kerosene is less clearly distinguished (AUC = 0.85); the AUC values for 25% and 50% weathered Kerosene are 0.96 and 0.93, respectively.

[Figure S7: "Pseudochromatographic Representation", second dimension retention time [sec] vs. first dimension retention time [sec]; legend: peaks, linear alkanes, branched alkanes, elution profile curve of the linear alkanes (fit).]

Figure S7: Utilizing retention information to further improve the discrimination achieved by the classifier. Here, the retention pattern of the linear alkanes (red) is predicted (blue). Every compound eluting with a second-dimension retention time below the blue prediction is assigned to the "branched" alkane class. Since the prediction for the linear alkanes is based on empirical measurements (e.g., standards), this classifier provides good results except for very volatile compounds (at the beginning of the run), where linear and branched alkanes show similar second-dimension retention times.
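The branched-alkane assignment can be sketched in pure Python. As a stand-in for the fitted elution profile curve, this hypothetical example linearly interpolates the second-dimension retention time between n-alkane reference points (clamping outside their range) and labels a peak as branched if its second-dimension retention time lies below the predicted value; all other peaks are kept as linear in this sketch. The reference points and function names are hypothetical.

```python
# Sketch: assign alkane-class peaks to "linear" or "branched" using a predicted
# second-dimension retention time (2tR) for the n-alkanes.
# Stand-in: piecewise-linear interpolation instead of the fitted curve;
# reference points and names are hypothetical.
from bisect import bisect_left


def predicted_2d_rt(reference, rt1):
    """Interpolate the n-alkane 2tR at first-dimension time rt1 (clamped at the ends)."""
    xs = [x for x, _ in reference]
    ys = [y for _, y in reference]
    if rt1 <= xs[0]:
        return ys[0]
    if rt1 >= xs[-1]:
        return ys[-1]
    i = bisect_left(xs, rt1)
    frac = (rt1 - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + frac * (ys[i] - ys[i - 1])


def alkane_subclass(reference, rt1, rt2):
    """'branched' if the peak elutes below the predicted n-alkane 2tR, else 'linear'."""
    return "branched" if rt2 < predicted_2d_rt(reference, rt1) else "linear"


# Hypothetical n-alkane reference points: (1tR [sec], 2tR [sec]), sorted by 1tR
n_alkanes = [(1000.0, 3.0), (2000.0, 2.5), (3000.0, 2.2)]
print(alkane_subclass(n_alkanes, 1500.0, 2.5))  # predicted 2.75 s -> "branched"
print(alkane_subclass(n_alkanes, 2500.0, 2.4))  # predicted about 2.35 s -> "linear"
```

A smooth fit (for example a low-order polynomial) would follow the curved elution profile more closely than straight segments; interpolation is used here only because it needs no assumption about the fit model.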

