For UGe

CompoundCalculator Suite
Based on Manual.pptx, 20201213

An introduction to CompoundCalculator, Match and Interpret (coming later)
ron@bonners.ca
Overview
• CompoundCalculator – a simple (but useful) tool
o Calculate and Match
• Origin
• Adducts
• How to use the Calculator

o Where to get it
o How to run it
o Files and parameters
• How to use Match

o Files and parameters
o Example output
• Case studies
o PCVG example
o 2,3-dimethyl succinic acid
o MSMS example from PCVG group
PCVG background
• PCVG uses data from multiple samples, e.g. metabolomics, and finds variables
that have similar expression profiles across the samples. These are often related
so identifying one can give clues to the others
o Ivosev et al, AnalChem 2008(80)-4933; Mohamed et al, AnalChem 2009(81)-7677
o Uses PCA loadings to find related variables
• This data is from samples collected in roadside intoxication tests, known to
contain groups with cocaine, THC, both and blanks, and analyzed by positive-mode
SWATH LC-MS [Klont et al, Talanta 2020(211)-120747]
o PCVG produced several groups of related peaks, some of which were predominantly in
single samples
o One such group contained 540 variables grouped at different retention times
o Searching the mass of one large peak suggested it might be an Ibuprofen metabolite
o Known Ibuprofen metabolites include
• Hydroxy forms (at least 2)
• COOH form corresponding to the oxidation of CH3 to CO2H
• Glucuronide conjugates
• How many features can be annotated? How many are associated with Ibuprofen?
SJ Drugs of Abuse data
• Features (m/z, rt pairs) determined by MarkerView for all samples and exported
• Colour = log(intensity), size = sqrt(non-zero count)
[Plot: m/z vs. RT]
PCVG group 1 from sample C001 (best responder)
• Colour = log(intensity), size = loadings magnitude
[Plot: m/z vs. RT]
Introduction
• The CompoundCalculator Suite helps interpret complex peak lists, from single
spectra or from multiple spectra with retention times
• Electrospray Ionization often produces multiple peaks for each analyte
o Adducts – formed with small cations (see comments)
o In-source fragments
o Multimers and heterodimers – contain multiple molecules of one or more analytes,
co-eluting analytes or background compounds
• Biological samples can contain many related analytes, i.e. metabolites
o Phase 1 – usually oxidation forms such as hydroxyls, acids and oxides
o Phase 2 – conjugates such as glucuronides and sulphates
• Thus a single compound, such as a drug (therapeutic or recreational) can produce
many related forms
• An important step in spectrum interpretation is determining the number of
underlying compounds, which requires calculating many potential masses
• The CompoundCalculator Suite performs these calculations and matches them to
a peak list
Introduction - Workflow
• Inputs to Calculate
o Compound(s): specified as lists of “compositions”, i.e. a mass and a name
o Metabolites and limits
o Adducts and limits
• Calculate produces target ion lists
o Target lists have mass, root, name
o Root is used to help organize lists and matches
• Shared storage
o Using a common directory helps transfer data between modules
• Match compares the target ion lists with a peak list
o Input peak lists can include intensity and retention time
o Output peak lists can be overlaid in spectrum display tools (e.g. PeakView)
o Produces detailed match lists
• Interpret* the matches, then update parameters for:
o new compounds
o new adducts
o new losses…
*Currently manual
Adducts
• Adducts are generally regarded as ions formed with small cations rather than the
expected protons
o I.e. M+Na+, M+K+, M+NH4+ rather than M+H+
• But singly charged forms with multiple cations are also observed
o These cannot arise from simple addition of multiple Na+, K+ etc., since each added
cation would add a charge
• We believe that these ions result from the replacement of labile protons by small
cations and the addition of a proton to provide the charge, i.e.
o M → M + (Na-H) → [M + (Na-H) + H+] ≡ [M + Na]+
o M → M + 2(Na-H) → [M + 2(Na-H) + H+] ≡ [M + 2Na - H]+
o M → M + 3(Na-H) → [M + 3(Na-H) + H+] ≡ [M + 3Na - 2H]+
o Note the first form is identical to M + Na+
• We call these “canonical forms”; although reported previously there has been no
comment about their origin
• This mechanism allows incorporation of other species, e.g. formates:
o M » M + HCOONa + H+ ≡ [M + (Na-H) + CH2O2 + H+]
• This allows simpler calculations by combining basic forms rather than specific
values, and lists can be used for positive or negative ions
o Depending on adding or subtracting a proton
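The canonical forms above reduce to simple arithmetic. The following is a minimal sketch, not the notebook's own code: the function name and the `polarity` argument are hypothetical, and the masses are standard monoisotopic values rounded to eight decimals.

```python
# Monoisotopic masses (u); standard values, rounded
H = 1.00782503
NA = 22.98976928
PROTON = 1.00727647   # hydrogen atom minus one electron

NA_MINUS_H = NA - H   # the (Na-H) replacement unit


def canonical_ion_mass(m, n_na, polarity=+1):
    """Mass of a singly charged ion after n_na proton-for-sodium replacements.

    polarity=+1 adds a proton (positive ions); polarity=-1 removes one.
    This helper is an illustration only, not part of CompoundCalculator.
    """
    return m + n_na * NA_MINUS_H + polarity * PROTON


m = 206.13068  # ibuprofen, C13H18O2
for n in range(4):
    # n = 0 is MH+, n = 1 is [M+Na]+, n = 2 is [M+2Na-H]+, ...
    print(n, round(canonical_ion_mass(m, n), 4))
```

Because the same (Na-H) unit is used for both polarities, only the sign of the final proton changes between positive and negative ion lists.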
Calculator approach
• Compounds
o Specify base compounds, modifications and limits, and processing options
• Base compounds can be real compounds, known metabolites (e.g. apovinpocetin which is
formed by loss of C2H4 from vinpocetin) or can correspond to hypothetical compounds
included to explain unknown peaks, e.g. x544 to explain peaks at 562 (M+NH4+) and 567 (M+Na+)
o Apply phase 1 and phase 2 modifications
• Each modification or multimer generates a new compound which is added to the compound list
o Calculate multimers and heterodimers
• Multimers combine the same compound. Heterodimers combine different compounds
o Apply specified losses to all compounds
• Losses also generate new compounds
• Adducts
o Specify adducts, limits and the maximum number allowed
o Generate combinations of adducts
• Ions
o Add each adduct form to each compound
o Add or subtract a proton for each ion form
o Save masses and labels
• All compounds undergo the same modifications, including dimer and heterodimer
generation, and adduct addition
• Note: the combinatorial nature of the calculations can result in many species!
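The combinatorial growth described above can be illustrated with a short sketch. This is an assumption-laden illustration, not the Calculator's code: `expand`, `multimers` and the label format are hypothetical, and compositions are plain (mass, name) tuples as described in the text.

```python
from itertools import product


def expand(compounds, limits):
    """Apply each (modification, max_count) limit to every compound.

    compounds: list of (mass, name); limits: list of ((mass, name), max_count).
    Counts run from 0 (unmodified) up to max_count for every modification.
    """
    out = []
    ranges = [range(max_count + 1) for _, max_count in limits]
    for mass, name in compounds:
        for counts in product(*ranges):
            new_mass, label = mass, name
            for ((mod_mass, mod_name), _), n in zip(limits, counts):
                if n:
                    new_mass += n * mod_mass
                    label += f".{n}{mod_name}" if n > 1 else f".{mod_name}"
            out.append((round(new_mass, 6), label))
    return out


def multimers(compounds, max_n):
    """Combine 1..max_n copies of the same compound."""
    return [(round(n * m, 6), name if n == 1 else f"{n}({name})")
            for m, name in compounds for n in range(1, max_n + 1)]


base = [(206.130680, "Ibu")]                # ibuprofen, C13H18O2
phase1 = [((15.994915, "OH"), 2)]           # up to two hydroxylations
phase2 = [((176.032088, "Gluc"), 1)]        # up to one glucuronide (C6H8O6)
compounds = multimers(expand(expand(base, phase1), phase2), 2)
print(len(compounds))  # 1 base x 3 OH states x 2 Gluc states x 2 multimer states
```

Even this small example already produces a dozen neutral species before any adducts or losses are applied.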
Implementation and availability
• Written in Python 3.7 in Jupyter Notebooks
o Especially convenient for interactive data science
o Platform independent with many packages for science and machine learning
• Available on GitHub: https://github.com/arbi56/CompoundCalculator
o Steps shown in the screenshot: 1 Click here; 2 Click to download notebooks
o Downloaded notebooks need a local Python installation (Anaconda)
o Alternative: click to run in Google CoLaboratory (CoLab)
Running
• There are two ways to access the notebooks (see Implementation and availability)
o Download and run on your own machine

• Requires installation of Python and Jupyter
– Recommend Anaconda and JupyterLab (installed with Anaconda)
• All local resources (files) are available
• Easy to have two notebooks open and run interactively
– Update Calculate parameters; Match; repeat
o Run on Google’s ‘CoLab’ platform

• Web-based Jupyter-like environment; temporary; does not require local installation
– Resources are reclaimed after the session ends
– Designed for Machine Learning; many science packages available
• Requires a Google account
• Cannot normally access local file system
– Data and results must be uploaded or downloaded
• Can access a Google Drive account (GDrive)
– Useful for sharing data between modules
CALCULATOR
DETAILS AND PARAMETERS
Setup Parameters – data file path
• Data path generation
o The cell shown below sets a default path to an output folder
• This is useful if Match uses the same folder so lists can be easily transferred
o The cell checks to see if it is running in CoLab; if so, it sets the path to a folder on GDrive
o You should change the paths and file names for your system
Notes on the cell:
• The CoLab check will fail if CoLab is not available, and the code following ‘except’ will be executed
• Mount GDrive (always the same)
• Path: extend/modify to reflect your file organization
• File name: modify to reflect your file organization
• In Windows, local paths are specified as (requires import os):
data_path = 'C:' + os.sep + os.path.join('dir_1', 'dir_2')
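A minimal sketch of the kind of cell described above is given here. It is not the notebook's actual cell: the function name, the folder names, the file name and the `MyDrive` location are all assumptions to be replaced with your own file organization.

```python
import os


def default_data_path(local_dirs=("dir_1", "dir_2"), drive_dirs=("SharedData",)):
    """Return a data folder path, preferring GDrive when running in CoLab."""
    try:
        # Only importable inside CoLab; raises ImportError elsewhere
        from google.colab import drive
        drive.mount("/content/gdrive")
        # 'MyDrive' is assumed to be the top-level GDrive folder name
        return os.path.join("/content/gdrive", "MyDrive", *drive_dirs)
    except ImportError:
        # Local fallback: a folder under the user's home directory
        return os.path.join(os.path.expanduser("~"), *local_dirs)


data_path = default_data_path()
target_file = os.path.join(data_path, "target_ions.txt")  # hypothetical file name
print(target_file)
```

Using the same function in both notebooks keeps Calculate and Match pointed at one shared folder.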
Setup Parameters - overview
• Most parameters are defined here; they are explained in the next few slides
Setup Parameters - limits
• The calculator determines compounds and adduct combinations separately then
combines them
o Both are calculated using ‘limits’ – tuples of (a composition, max count)
o Compounds are calculated by successively applying phase 1, phase 2 and losses
o The limits define which modifications are considered and the maximum count
• Minimum is always 0, i.e. unmodified
• As shown below, it is convenient to use the ‘ionization’ parameter to switch sets
of limits
Notes on the cell:
• Brackets – [], () – are required even if there is only one entry
• Same for both polarities
• Some conjugates respond better in negative mode
• A modification can be switched off by removing it from the list or setting the limit to 0
• The Match notebook automatically searches for 13C isotopes, but other isotopes can be specifically
included. 41K has a natural abundance of about 6.7% and becomes significant at higher counts;
replacement with this isotope is represented as K*H.
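The idea of limits switched by the ‘ionization’ parameter can be sketched as follows. The variable names, the chosen counts and the `max_total` cap are illustrative assumptions; only the (composition, max count) tuple layout comes from the description above.

```python
from itertools import product

# Assumed layout: each limit is ((mass, name), max_count)
NA_H = (21.981944, "Na-H")
K_H = (37.955881, "K-H")
NH3 = (17.026549, "NH3")

ionization = "positive"   # switches between the two sets of limits

if ionization == "positive":
    adduct_limits = [(NA_H, 3), (K_H, 2), (NH3, 1)]
else:
    # A single entry still needs both the [] and the () brackets
    adduct_limits = [(NA_H, 2)]


def adduct_combinations(limits, max_total):
    """All count combinations from 0 up to each limit, capped at max_total."""
    ranges = [range(max_count + 1) for _, max_count in limits]
    combos = []
    for counts in product(*ranges):
        if sum(counts) > max_total:
            continue
        mass = sum(n * comp[0] for (comp, _), n in zip(limits, counts))
        label = "".join(f".{n}({comp[1]})"
                        for (comp, _), n in zip(limits, counts) if n)
        combos.append((round(mass, 6), label))
    return combos


combos = adduct_combinations(adduct_limits, max_total=3)
print(len(combos), combos[:4])
```

The first combination is always the unmodified form (all counts 0), matching the rule that the minimum is 0.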
Changing core modifications
• In general Compositions are provided as tuples of (mass, name)
o Note: these are not elemental compositions and have no chemical knowledge
o This is part of the Composition class, but can be set elsewhere if desired
• Modifications are set as a dictionary of name:mass values
o Names are simple strings and reflect the common notation not the elemental composition,
e.g. as defined:
• ‘COOH’ is used with Ibuprofen for oxidation of -CH3 to –CO2H; the mass corresponds to this change
• ‘OH’ indicates oxidation of –H to –OH so the mass is that of an oxygen atom
• Specific isotopes can be set, e.g. (K*H) is equivalent to (K-H) but for the 41K isotope (about 6.7%)
– The Match code automatically looks for 13C isotopes
• Sodium and potassium formate can be set specifically, but using ‘CH2O2’, ‘C2H4O2’, etc. is more
flexible
– E.g. for addition of multiple formates, the exact
position (Na and K) cannot be determined from
the mass
• Losses must be negative and can be simple
(H2O, CO2, etc.) or correspond to known
(or suspected) neutral losses
– E.g. loss of ribose (‘Rib’) occurs in guanosine to
produce a strong fragment ion
– An alternative is to include the fragment as a base
compound
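As an illustration of a name:mass dictionary of the kind described above, the sketch below derives a few entries from monoisotopic element masses. The dictionary names and the selection of entries are assumptions; the real notebook may store these differently.

```python
# Monoisotopic element masses (u), standard values
C, H, O = 12.0, 1.00782503, 15.99491462
K39, K41 = 38.96370668, 40.96182526

# name -> mass change; names follow common notation, not elemental formulae
modifications = {
    "OH": O,                      # -H -> -OH adds one oxygen
    "COOH": 2 * O - 2 * H,        # -CH3 -> -CO2H adds O2 and removes H2
    "K-H": K39 - H,               # proton replaced by 39K
    "K*H": K41 - H,               # same replacement with the 41K isotope
    "CH2O2": C + 2 * H + 2 * O,   # formic acid; combines with (Na-H) or (K-H)
}

# Losses are stored as negative masses
losses = {
    "H2O": -(2 * H + O),
    "CO2": -(C + 2 * O),
}

for name, mass in {**modifications, **losses}.items():
    print(f"{name:6s} {mass:12.6f}")
```

Keeping formate as ‘CH2O2’ plus a separate (Na-H) or (K-H) means one entry covers both the sodium and potassium forms.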
MATCH
DETAILS AND PARAMETERS
Overview
• Compares a list of masses and labels generated with CompoundCalculator to an
input peak list.
o Peak list is tab-delimited and must have mass values but can also contain columns for
Retention Time (RT) and Intensity
• The function that reads the peak list tries to determine which columns are present
o Allows for complex peak and target ion lists that can result in multiple matches.
• A cell shows how to list peaks with redundant matches – can be simplified
• Unmatched peaks greater than a given intensity threshold (percent base peak intensity) can be
shown
o Searches for 13C peaks of matched peaks

• Keeps peaks and isotopes together; other matches are also shown
o Results can be saved in several ways including a simple mass/intensity list and more
detailed lists
• The former is useful with PeakView which allows text lists to be imported as spectra and overlaid on
the original data
• Detailed lists are useful for interpretation
o Convenient to open the Calculator and Match notebooks in side-by-side windows so it is
easy to update the target ion list and repeat the matching
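Column detection for a tab-delimited peak list might look like the sketch below. This is not the Match notebook's reader: the function name and the header keywords it looks for are assumptions, and the two demo rows are made-up values.

```python
import csv
import io


def read_peak_list(handle):
    """Read a tab-delimited peak list; returns a list of dicts.

    Mass is required; 'rt' and 'intensity' are filled in when a header
    names them. The header keywords are assumptions for this sketch.
    """
    rows = list(csv.reader(handle, delimiter="\t"))
    header = [h.strip().lower() for h in rows[0]]
    columns = {}
    for i, name in enumerate(header):
        if "mass" in name or "m/z" in name:
            columns.setdefault("mass", i)
        elif "rt" in name or "time" in name:
            columns.setdefault("rt", i)
        elif "inten" in name or "height" in name:
            columns.setdefault("intensity", i)
    if "mass" not in columns:
        raise ValueError("peak list needs a mass column")
    return [{key: float(row[i]) for key, i in columns.items()}
            for row in rows[1:] if row]


# Made-up demo data in place of a real peaks file
demo = io.StringIO("Mass\tRT\tIntensity\n399.1642\t8.08\t1500\n416.1915\t8.08\t9000\n")
peaks = read_peak_list(demo)
print(peaks[0])
```

With a real file, pass an open file handle instead of the `io.StringIO` object.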
Notebook organization
• See also Calculator Notebook organization
• Preamble
o Describes the program and intended use
• Setup
o Defines file paths, match parameters and output options
• Step 1 – Match
o Matches the peaks and target ions
o For matched peaks looks for 13C ions
• Step 2 – Report, save
o Displays matches
• Peak, target ion, error, monoisotopic peak
o Shows redundant matches
• I.e. peaks that have multiple target ions
o Shows unmatched peaks above an intensity threshold
• Emphasizes peaks that still need to be explained
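The 13C search in Step 1 can be sketched as below. The keyword names mirror the Match parameters (c13_half_window, max_C13_count, c13_rt_window, require_lower_c13_inten), but the function itself, the default RT window, the singly charged spacing and the choice to stop at the first missing isotope are assumptions.

```python
C13_DELTA = 1.003355  # mass difference between 13C and 12C (u)


def find_c13_peaks(peaks, mono_index, c13_half_window=0.005,
                   max_c13_count=3, c13_rt_window=0.1,
                   require_lower_c13_inten=True):
    """Return indices of peaks that look like 13C isotopes of peaks[mono_index].

    peaks: list of dicts with 'mass', 'intensity' and optionally 'rt'.
    Assumes singly charged ions, so isotopes are spaced by C13_DELTA.
    """
    mono = peaks[mono_index]
    found = []
    for n in range(1, max_c13_count + 1):
        target = mono["mass"] + n * C13_DELTA
        hit = None
        for i, pk in enumerate(peaks):
            if abs(pk["mass"] - target) > c13_half_window:
                continue
            if "rt" in mono and abs(pk["rt"] - mono["rt"]) > c13_rt_window:
                continue
            if require_lower_c13_inten and pk["intensity"] >= mono["intensity"]:
                continue
            hit = i
            break
        if hit is None:
            break          # assumed rule: stop at the first missing isotope
        found.append(hit)
    return found


# Made-up peaks: one monoisotopic peak, two isotopes, one decoy at another RT
demo_peaks = [
    {"mass": 399.1642, "rt": 8.08, "intensity": 1000.0},
    {"mass": 400.1676, "rt": 8.08, "intensity": 200.0},
    {"mass": 400.1676, "rt": 9.50, "intensity": 300.0},
    {"mass": 401.1710, "rt": 8.08, "intensity": 30.0},
]
print(find_c13_peaks(demo_peaks, 0))
```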
Setup - parameters
• Parameters
o save_matches
• If true, the matches are saved to a tab-delimited text file with a header line
– Mass, intensity, error (mmu), RT, Pk_index, Mono_peak, Target_mass, Target Root, Target Label
• If Mono_peak = Pk_index the peak is monoisotopic
• File is saved to the same directory as the peaks file and the name is:
– Peaks file name + compounds in target list + ‘matches’ + date_time (optional)
o local_files
• If true, results are saved to the same directory as the notebook, otherwise data_file_path is used
o include_large_unmatched
• If true the largest unmatched peaks are included with the label ‘None’
o Include_data_in_file_name
• True to include the date_time string
o amu_window
• Tolerance for matching in amu, typically 0.005 (i.e. ± 5 mmu)
o ppm_window
• Tolerance in ppm
• If both amu_window and ppm_window are specified the larger will be used. This allows the
match tolerance to change as the mass increases but never be smaller than the amu_window value.
• These are half-windows, so the match range is ± the value
o c13_half_window
• Tolerance for matching 13C isotopes in amu, typically 0.005 (i.e. ± 5 mmu)
o max_C13_count
• Maximum number of 13C isotope peaks to consider, typically 3 or 4
o c13_rt_window
• If the peaks file has RT values they must also match for potential isotope peaks
o require_lower_c13_inten
• If True the intensity of the isotope(s) must be less than the intensity of the initial matched peak
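The interaction of amu_window and ppm_window can be sketched as follows. The two functions are hypothetical helpers written for illustration; only the "larger of the two half-windows" rule and the sorting by absolute error are taken from the text.

```python
def half_window(mass, amu_window=0.005, ppm_window=None):
    """Match tolerance (±) in amu: the larger of the amu and ppm windows.

    At least one of the two windows must be given.
    """
    windows = []
    if amu_window is not None:
        windows.append(amu_window)
    if ppm_window is not None:
        windows.append(mass * ppm_window * 1e-6)
    return max(windows)


def match_peak(peak_mass, targets, amu_window=0.005, ppm_window=None):
    """Return (error_mmu, target) pairs within tolerance, lowest |error| first.

    targets: iterable of (mass, root, label) tuples.
    """
    tol = half_window(peak_mass, amu_window, ppm_window)
    hits = [((peak_mass - t_mass) * 1000.0, (t_mass, root, label))
            for t_mass, root, label in targets
            if abs(peak_mass - t_mass) <= tol]
    return sorted(hits, key=lambda h: abs(h[0]))


print(half_window(200.0, 0.005, 10))    # amu window dominates at low mass
print(half_window(1000.0, 0.005, 10))   # ppm window dominates at high mass
```

A peak with several hits is what the notebook reports as a redundant match; the first entry has the lowest absolute error.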
Example output – printed match list
• Peak columns: peak mass, match error, retention time, intensity
• Match columns: target mass, mono peak index, root, label
• Callouts on the printed list:
o Isotopes
o Same mass but different RT (i.e. different peaks)
o Simplify mode, so only first target (lowest absolute error) is printed
Example output – redundant and unmatched
• Redundant (partial)
o The number of redundant peaks and matches is always reported but the peaks are only
listed if print_redundant_matches is True
o Same peak, three matches; the first (lowest error) is considered most likely
o Heterodimer
o This code adds a line between redundant groups
• Unmatched >= 1% base peak intensity
CASE STUDIES
1. Guanosine FIA spectrum
2. PCVG Group from DOA samples
3. Di-methyl succinic acid
4. Unknown MSMS spectrum from 2
Note
The result summaries don’t match the current
software; they will be updated later (when I stop
making changes!)
Setup for interactive use
• Calculator and Match in JupyterLab
SharedData folder
Basic workflow
• Calculator
o Run cells above Setup
o Set base compounds
o Set modifications, multimers and adducts
o Set other parameters (e.g. Save files)
o Run Setup cell and all below
o Output: Target ions file
• Match
o Run cells above Setup
o Import Peak List
o Import Target Ions File
o Run Match Cells
o Review output cells
o Save, open in PeakView, overlay
• New compounds? New adducts? Return to Calculator and repeat
PCVG group 1
First pass - output
• Partial match list (by compound)
o Columns: Peak index, Peak mass, Delta (mmu), Peak RT, Peak Inten, Target mass, Mono peak, Target Root, Target Label
• 122 peaks, 26.9 %
• Same mass at different RTs - isomers
• Isotopes: mono peak index is not the peak index
• This is the second type of match results display. The other just lists the peaks in mass order;
this one can also skip isotopes
• Print compound summary
First pass - analysis
• Unmatched ions > 1%
o Largest peaks are not yet identified
• From previous analysis:
o 399.1642 is Ibu_OH-gluc.H+
o 421.1477 is Ibu_OH-gluc.(Na-H).H+
• Masses suggest 416.1915 is M+NH4+
• Spectrum label: MH+
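The reasoning behind these assignments is a comparison of mass differences with known adduct differences. A small sketch, with a hypothetical helper name and an assumed ± 5 mmu window:

```python
# Candidate adduct differences relative to MH+ (monoisotopic, u)
DIFFERENCES = {
    "NH3": 17.026549,     # M+NH4+ vs MH+
    "Na-H": 21.981944,    # M+Na+ vs MH+
    "K-H": 37.955881,     # M+K+ vs MH+
}


def explain_difference(mass_a, mass_b, half_window=0.005):
    """Return (label, error in mmu) for known differences matching |a - b|."""
    delta = abs(mass_a - mass_b)
    return [(label, round((delta - value) * 1000, 2))
            for label, value in DIFFERENCES.items()
            if abs(delta - value) <= half_window]


mh = 399.1642
for other in (416.1915, 421.1477):
    print(other, explain_difference(other, mh))
```

Both differences fall within the window, which supports reading 416.1915 as the NH4+ form of the same compound.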
Second pass – add NH3, increase Na and K
• 196 peaks, 89.2 % (up from 122 peaks, 26.9 %)
• 177 is loss of HCOOH from Ibuprofen-OH
o Add HCOOH as a loss
• Several peaks have isotopes at same RT -> credible peaks
• MSMS of 562 suggests presence of a hexose (added with loss of H2O – see next section)
o Add Hex and increase H2O loss count
Third pass – add HCOOH loss and Hex as adduct or compound
• As adduct
o 268 peaks, 92.6 % (up from 196 peaks, 89.2 %)
• As compound (needs hetero_dimers = True)
o 317 peaks, 94.1 % (up from 196 peaks, 89.2 %)
o Heterodimer
Ibuprofen XICs
• Target lists can also include an ‘xic width’
o Use in PeakView with ‘Extract ions using dialog’ + import (right-click)
Keep low for reasonable processing time
2,2 Dimethylsuccinic acid
Background
• Broeckling showed the spectrum below – many unexplained ions
o Anal. Chem. 2016(88)-9226, suppl fig 2
• Tobias spectrum has some common peaks…
MW = 146.0579
Recent spectrum from Ali
• Resembles Broeckling with extra peaks
What are the unknown peaks?
• Suggestion from Gerard that they may be due to metals – Ca, Al, Fe
• Based on large mass defect for adducts..
• Add to calculator and try..

o X = Al or Fe, both trivalent
o With one M: M + (X-H)++
o With two M: 2M + (X-2H)+
• Calculator adds H+ so need to use (X-3H) for replacement mass
o With three M: 3M + (X-3H) + H+
• Calculator adds H+ so need to use (X-3H) for replacement mass
o Fe-3H = 52.910913
o Al-3H = 23.957515
First pass
K-H (37.955881) and Ca-2H (37.94694)
70 peaks, 78.7% TIC
Matches
• 575.0639 – 429.0079 = 146.056
• 429.0079 – 282.9513 = 146.057
• ??? (there is also an ion at 575.0639 + 146.056 = 721.1198)
• These ions all have interesting isotope patterns… see next
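The repeating spacing can be checked against the molecular weight quoted earlier (146.0579). A minimal sketch using only the masses listed above:

```python
MW = 146.0579  # molecular weight quoted for the acid

ladder = [282.9513, 429.0079, 575.0639, 721.1198]
steps = [round(b - a, 4) for a, b in zip(ladder, ladder[1:])]
for step in steps:
    # Deviation of each spacing from one molecule of M, in mmu
    print(step, f"{(step - MW) * 1000:+.1f} mmu vs M")
```

Each step is within about 2 mmu of M, consistent with a series that gains one molecule of the acid per step.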
Unknown pattern
2+ 575.0639 + (Na-H)
What causes the strange isotope pattern??
First (obvious) place with this pattern is 282.9513…
= 146.054 + 136.8973
Barium!
Ba-H = 136.897422
New composition (Ba-2H) = 135.889597
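The unusual pattern follows from barium's natural isotope distribution. The sketch below lists the expected barium contribution only; it ignores 13C and the electron mass, the function name is hypothetical, and the isotope masses and abundances are standard reference values rather than values from the notebooks.

```python
# Natural barium isotopes: (mass in u, abundance in %), standard values
BA_ISOTOPES = [
    (129.906321, 0.106),
    (131.905061, 0.101),
    (133.904508, 2.417),
    (134.905689, 6.592),
    (135.904576, 7.854),
    (136.905827, 11.232),
    (137.905247, 71.698),
]
H = 1.00782503


def ba_pattern(base_mass, n_h_replaced=1):
    """Isotope pattern for base_mass + (Ba - n H), relative to the 138Ba peak."""
    top = max(abund for _, abund in BA_ISOTOPES)
    return [(round(base_mass + mass - n_h_replaced * H, 4),
             round(100 * abund / top, 1))
            for mass, abund in BA_ISOTOPES]


# 146.054 is the value used in the sum above; Ba-H replaces one proton
for mass, rel in ba_pattern(146.054):
    print(f"{mass:10.4f} {rel:6.1f}")
```

The cluster of peaks 1 to 4 u below the main peak is what distinguishes a barium-containing ion from the usual 13C pattern, which only appears above the monoisotopic peak.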
With Ba
98 peaks, 84.5% TIC (from 70 peaks, 78.7%)
[Plot axis labels: mmu]
With Ba - 2
[Plot axis labels: mmu]
• [M+Ba-H]+ ≡ C6H9BaO4+ ≡ M+(Ba-2H)+H+
• [3M+Ba]2+ ≡ C18H30BaO12 2+
• Arrows indicate Ba isotopes
Generated a new peak list > 0.2% to capture higher mass ions
• Increased half-window to 0.015 (should use ppm?)
• 571 peaks, 73.9% TIC (from 98 peaks, 84.5%)
Region > 575
Broeckling spectrum
• From Gerard
1.8 ppm
52 peaks, 68.2%
Unknown MSMS spectrum from ibuprofen group 1
MSMS from PCVG group
• Since the PCVG data was based on a SWATH analysis, it was possible to generate
the MSMS spectrum at 8.08 min
• Match with Ibu only
o 391.1713 = loss of 176.036 from the Na ion at 567 -> glucuronide?
o Peak at 369.1977 is lower by (Na-H) -> MH+ of underlying compound? Also has peaks
corresponding to losses of 1 and 2 H2O
o 369.1977 to 207.138 (Ibuprofen.MH+) is 162.0597 = C6H10O5 (Hexose - H2O) with an error
of about 7 mmu
o Add C6H10O5 as a separate compound
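The neutral-loss arithmetic above can be reproduced from element masses. This sketch is illustrative only: the helper names and the candidate list are assumptions, and the ± 10 mmu window is deliberately wider than the usual 0.005 because the MSMS errors here are larger.

```python
MASS = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}


def formula_mass(counts):
    """Monoisotopic mass from a dict of element counts."""
    return sum(MASS[el] * n for el, n in counts.items())


candidates = {
    "C6H8O6 (glucuronide)": formula_mass({"C": 6, "H": 8, "O": 6}),
    "C6H10O5 (hexose - H2O)": formula_mass({"C": 6, "H": 10, "O": 5}),
    "H2O": formula_mass({"H": 2, "O": 1}),
}


def annotate_loss(delta, half_window=0.01):
    """Return (name, error in mmu) for candidates within half_window of delta."""
    return [(name, round((delta - mass) * 1000, 1))
            for name, mass in candidates.items()
            if abs(delta - mass) <= half_window]


print(annotate_loss(176.036))               # loss from the Na ion at 567
print(annotate_loss(369.1977 - 207.138))    # 369.1977 down to ibuprofen MH+
```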
Second pass
• With C6H10O5 and glucuronide
o 61 peaks, 52.0 %
o Peak at 177.0397 corresponds to MH+ of glucuronic acid (177.039364) with 141.018
and 159.0291 corresponding to water losses
o Add glucuronic acid
Third pass
• With Gluc added – will calculate ions of glucuronic acid alone
o 72 peaks, 67.2 % (from 61 peaks, 52.0 %)
o 97.0281 may be a sugar fragment? (C5H5O2, -0.3 mmu)
Final assignments
