Overview
• CompoundCalculator – a simple (but useful) tool
o Calculate and Match
• Origin
• Adducts
• Case studies
o PCVG example
o 2,3-dimethylsuccinic acid
o MSMS example from PCVG group
PCVG background
• PCVG uses data from multiple samples, e.g. metabolomics data, and finds variables
that have similar expression profiles across the samples. These variables are often related,
so identifying one can give clues to the others
o Ivosev et al, AnalChem 2008(80)-4933; Mohamed et al, AnalChem 2009(81)-7677
o Uses PCA loadings to find related variables
• These data are from samples collected in roadside intoxication tests, known to
contain groups with cocaine, THC, both, and blanks, and analyzed by positive-mode
SWATH LC-MS [Klont et al, Talanta 2020(211)-120747]
o PCVG produced several groups of related peaks, some of which were predominantly in
single samples
o One such group contained 540 variables grouped at different retention times
o Searching the mass of one large peak suggested it might be an Ibuprofen metabolite
o Known Ibuprofen metabolites include
• Hydroxy forms (at least 2)
• COOH form corresponding to the oxidation of CH3 to CO2H
• Glucuronide conjugates
• How many features can be annotated? How many are associated with Ibuprofen?
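The PCVG step can be pictured with a much-simplified sketch: autoscale the samples × variables matrix, compute PCA loadings by SVD, and greedily group variables whose loading vectors point in nearly the same direction. The function name `pcvg_groups` and its `n_components` and `threshold` parameters are hypothetical, and the published method (Ivosev et al) may differ in how groups are seeded and how thresholds are chosen.

```python
import numpy as np


def pcvg_groups(data, n_components=3, threshold=0.9):
    """Group variables (columns of data) whose PCA loading vectors point the same way.

    data has one row per sample and one column per variable (e.g. an m/z, RT feature).
    Returns a list of groups, each a list of column indices.
    """
    x = np.asarray(data, dtype=float)
    # Autoscale: centre each variable and divide by its standard deviation
    std = x.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # constant variables stay at zero instead of dividing by zero
    x = (x - x.mean(axis=0)) / std

    # PCA via SVD; scaling the right singular vectors by the singular values gives
    # loadings whose cosines approximate variable correlations in the retained space
    _, s, vt = np.linalg.svd(x, full_matrices=False)
    k = min(n_components, len(s))
    loadings = vt[:k].T * s[:k]  # shape: (n_variables, k)

    norms = np.linalg.norm(loadings, axis=1)
    unit = loadings / np.where(norms == 0, 1.0, norms)[:, None]

    # Greedy grouping: seed each group with the largest remaining loading and
    # collect every unassigned variable whose loading direction is close to it
    unassigned = set(range(x.shape[1]))
    groups = []
    for seed in np.argsort(-norms):
        seed = int(seed)
        if seed not in unassigned:
            continue
        members = [j for j in sorted(unassigned)
                   if j == seed or unit[seed] @ unit[j] >= threshold]
        unassigned -= set(members)
        groups.append(members)
    return groups
```

Scaling by the singular values is a design choice: when all components are kept, the cosine between two scaled loading vectors equals the correlation between the two autoscaled variables.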
SJ Drugs of Abuse data
• Features (m/z, rt pairs) determined by MarkerView for all samples and exported
• Colour = log(intensity), size = sqrt(non-zero count)
[Figure: m/z vs RT]
PCVG group 1 from sample C001 (best responder)
• Colour = log(intensity), size = loadings magnitude
[Figure: m/z vs RT]
Introduction
• The CompoundCalculator Suite helps interpret complex peak lists – from single
spectra, or from multiple spectra with retention times
• Electrospray ionization often produces multiple peaks for each analyte
o Adducts – formed with small cations (see comments)
o In-source fragments
o Multimers and heterodimers – contain multiple molecules of one or more analytes, co-
eluting analytes or background compounds
• Biological samples can contain many related analytes, i.e. metabolites
o Phase 1 – usually oxidation forms such as hydroxyls, acids and oxides
o Phase 2 – conjugates such as glucuronides and sulphates
• Thus a single compound, such as a drug (therapeutic or recreational), can produce
many related forms
• An important step in spectrum interpretation is determining the number of
underlying compounds, which requires calculating many potential masses
• The CompoundCalculator Suite performs these calculations and matches them to
a peak list
Introduction - Workflow
• Compound(s), specified as lists of “compositions”, i.e. a mass and a name
• Metabolites and limits
• Adducts and limits
• Peak list
*Currently manual
Adducts
• Adducts are generally regarded as ions formed with small cations rather than the
expected protons
o i.e. [M+Na]+, [M+K]+, [M+NH4]+ rather than [M+H]+
• But singly charged forms with multiple cations are also observed
o These can’t be simple additions of multiple Na+, K+, etc., which would give multiply charged ions
• We believe that these ions result from the replacement of labile protons by small
cations and the addition of a proton to provide the charge, i.e.
o M → M + (Na-H) → [M + (Na-H) + H+] ≡ [M + Na]+
o M → M + 2(Na-H) → [M + 2(Na-H) + H+] ≡ [M + 2Na - H]+
o M → M + 3(Na-H) → [M + 3(Na-H) + H+] ≡ [M + 3Na - 2H]+
o Note the first form is identical to M + Na+
• We call these “canonical forms”; although such ions have been reported previously,
their origin has not been discussed
• This mechanism allows incorporation of other species, e.g. formates:
o M → M + HCOONa + H+ ≡ [M + (Na-H) + CH2O2 + H+]
• This allows simpler calculations that combine basic forms rather than specific
values, and the same lists can be used for positive or negative ions
o The charge comes from adding or subtracting a proton, respectively
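Because every canonical form is a neutral sum plus or minus one proton, the calculation reduces to adding unit masses. A minimal sketch, assuming standard monoisotopic masses; the names `UNITS` and `canonical_mz` are illustrative, not taken from the CompoundCalculator code.

```python
# Monoisotopic masses (u) of neutral atoms, plus the proton mass
H = 1.00782503223
PROTON = 1.007276466621
C = 12.0
O = 15.99491461957
NA = 22.9897692820
K = 38.9637064864

# Neutral building blocks: a labile H replaced by a cation, or a neutral addition
UNITS = {
    "Na-H": NA - H,
    "K-H": K - H,
    "CH2O2": C + 2 * H + 2 * O,  # formic acid; (Na-H) + CH2O2 is HCOONa
}


def canonical_mz(m, counts, positive=True):
    """m/z of a singly charged canonical form of a neutral molecule of mass m.

    counts maps unit names to multiplicities, e.g. {"Na-H": 2} for [M + 2Na - H]+.
    A proton is added for positive ions and removed for negative ions.
    """
    neutral = m + sum(UNITS[name] * n for name, n in counts.items())
    return neutral + PROTON if positive else neutral - PROTON


# [M + 2Na - H]+ for M = 146.0579
print(round(canonical_mz(146.0579, {"Na-H": 2}), 4))  # 191.0291
```

The same `counts` dictionary serves both polarities; only the sign of the proton changes.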
Calculator approach
• Specify base compounds, modifications and limits, and processing options
o Base compounds can be real compounds, known metabolites (e.g. apovinpocetin, which is
formed by loss of C2H4 from vinpocetin) or hypothetical compounds included to explain
unknown peaks, e.g. x544 to explain peaks at 562 (M+NH4+) and 567 (M+Na+)
• Apply phase 1 and phase 2 modifications
o Each modification or multimer generates a new compound, which is added to the compound list
• Apply specified losses to all compounds
o Losses also generate new compounds
• Generate combinations of adducts
• All compounds undergo the same modifications, including dimer and heterodimer
generation, and adduct addition
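Multimer generation can be sketched as pairing every entry of the compound list with every entry, itself included, so that dimers and heterodimers become new compounds like any other. This assumes compounds are (mass, name) tuples as described under Changing core modifications; the function `add_dimers` and the naming scheme `2(name)` / `(name1)(name2)` are hypothetical.

```python
from itertools import combinations_with_replacement


def add_dimers(compounds):
    """Return the compound list extended with every dimer and heterodimer.

    compounds is a list of (mass, name) tuples. Each unordered pair of entries,
    including an entry paired with itself, becomes a new (mass, name) compound.
    """
    dimers = []
    for i, j in combinations_with_replacement(range(len(compounds)), 2):
        (m1, n1), (m2, n2) = compounds[i], compounds[j]
        name = f"2({n1})" if i == j else f"({n1})({n2})"
        dimers.append((m1 + m2, name))
    return compounds + dimers


# Ibuprofen (C13H18O2) and a hydroxy metabolite (C13H18O3), monoisotopic masses
for mass, name in add_dimers([(206.1307, "Ibu"), (222.1256, "Ibu+OH")]):
    print(f"{mass:.4f} {name}")
```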
Implementation and availability
• Written in Python 3.7 in Jupyter Notebooks
o Especially convenient for interactive data science
o Platform independent with many packages for science and machine learning
• Available on GitHub: https://github.com/arbi56/CompoundCalculator
o [Screenshot callouts: 1 Click here; 2 Click to download notebooks]
Running
• There are two ways to access the notebooks (see Implementation and availability)
CALCULATOR
DETAILS AND PARAMETERS
Setup Parameters – data file path
• Data path generation
o The cell shown below sets a default path to an output folder
• This is useful if Match uses the same folder so lists can be easily transferred
o The cell checks whether it is running in Colab; if so, it sets the path to a folder on Google Drive
o You should change the paths and file names for your system
Extend/modify to reflect
your file organization
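The notebook cell itself is not reproduced here, but the logic described can be sketched as follows. The Colab check (`"google.colab" in sys.modules`) is a common idiom rather than necessarily what the notebook uses, and the Drive mount point and the `SharedData` folder name are assumptions to replace with your own paths.

```python
import sys
from pathlib import Path


def default_data_path(folder="SharedData"):
    """Choose an output folder, using Google Drive when running in Colab.

    Both locations are assumptions; edit them for your own file organization.
    """
    if "google.colab" in sys.modules:
        # Assumes Drive is already mounted, e.g. with
        # from google.colab import drive; drive.mount("/content/drive")
        return Path("/content/drive/MyDrive") / folder
    # Otherwise use a folder under the current working directory
    return Path.cwd() / folder


data_file_path = default_data_path()
print(data_file_path)
```

Pointing Calculator and Match at the same folder keeps the generated target lists where Match expects them.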
Setup Parameters - overview
• Most parameters are defined here; they are explained in the next few slides
Setup Parameters - limits
• The calculator determines compounds and adduct combinations separately, then
combines them
o Both are calculated using ‘limits’ – tuples of (a composition, max count)
o Compounds are calculated by successively applying phase 1, phase 2 and losses
o The limits define which modifications are considered and the maximum count
• Minimum is always 0, i.e. unmodified
• As shown below, it is convenient to use the ‘ionization’ parameter to switch sets
of limits
The Match notebook automatically searches for 13C isotopes, but other isotopes can be specifically
included. 41K has a natural abundance of 6% and becomes significant at higher counts;
replacement with this isotope is represented as K*H.
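Since every limit runs from 0 up to its maximum count, the combinations are a Cartesian product of count ranges. A minimal sketch using `itertools.product`, assuming limits are ((mass, name), max_count) tuples; the function `combine` and the example limits, including the 41K entry written as K*H, are illustrative only.

```python
from itertools import product

H = 1.00782503223
NA = 22.9897692820
K39, K41 = 38.9637064864, 40.9618252579

# Hypothetical limits: ((mass, name), max_count); the minimum count is always 0
POSITIVE_LIMITS = [
    ((NA - H, "Na-H"), 2),
    ((K39 - H, "K-H"), 1),
    ((K41 - H, "K*H"), 1),  # replacement with the 41K isotope
]


def combine(limits):
    """Enumerate every count combination allowed by the limits.

    Returns (mass_shift, label) pairs; the all-zero combination is the
    unmodified form, with an empty label.
    """
    results = []
    for counts in product(*(range(max_count + 1) for _, max_count in limits)):
        shift = 0.0
        parts = []
        for n, ((mass, name), _) in zip(counts, limits):
            if n:
                shift += n * mass
                parts.append(f"{n}({name})" if n > 1 else f"({name})")
        results.append((shift, " + ".join(parts)))
    return results


combos = combine(POSITIVE_LIMITS)
print(len(combos))  # 3 * 2 * 2 = 12
```

Switching polarity through an 'ionization' setting could then amount to passing a different limits list.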
Changing core modifications
• In general, compositions are provided as tuples of (mass, name)
o Note: these are not elemental compositions and have no chemical knowledge
o This is part of the Composition class, but can be set elsewhere if desired
• Modifications are set as a dictionary of name:mass values
o Names are simple strings that reflect the common notation, not the elemental composition,
e.g. as defined:
• ‘COOH’ is used with Ibuprofen for oxidation of -CH3 to -CO2H; the mass corresponds to this change
• ‘OH’ indicates oxidation of -H to -OH, so the mass is that of an oxygen atom
• Specific isotopes can be set, e.g. (K*H) is equivalent to (K-H) but for the 41K isotope (6%)
– The Match code automatically looks for 13C isotopes
• Sodium and potassium formate can be set specifically, but using ‘CH2O2’, ‘C2H4O2’, etc. is more
flexible
– E.g. when multiple formates are added, the exact
positions of Na and K cannot be determined from
the mass
• Losses must be negative and can be simple
(H2O, CO2, etc.) or correspond to known
(or suspected) neutral losses
– E.g. loss of ribose (‘Rib’) occurs in guanosine to
produce a strong fragment ion
– An alternative is to include the fragment as a base
compound
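A sketch of name:mass dictionaries of this kind, with each mass derived from standard monoisotopic atomic masses so the entries can be checked. The dictionary contents and the helper `modify` are hypothetical, not the notebook's definitions; losses are stored as negative values, as required above.

```python
# Monoisotopic atomic masses (u)
H, C, O = 1.00782503223, 12.0, 15.99491461957
K39, K41 = 38.9637064864, 40.9618252579

# Illustrative name:mass dictionaries; names follow common notation, not formulas
MODIFICATIONS = {
    "OH": O,                     # -H -> -OH adds one oxygen atom
    "COOH": 2 * O - 2 * H,       # -CH3 -> -CO2H gains O2 and loses H2
    "K-H": K39 - H,              # labile H replaced by 39K
    "K*H": K41 - H,              # the same exchange with the 41K isotope
    "CH2O2": C + 2 * H + 2 * O,  # formic acid, for formate adducts
}
LOSSES = {                       # losses are stored as negative masses
    "H2O": -(2 * H + O),
    "CO2": -(C + 2 * O),
    "Rib": -(5 * C + 8 * H + 4 * O),  # C5H8O4, e.g. guanosine -> guanine
}


def modify(composition, name, count=1, table=MODIFICATIONS):
    """Return a new (mass, name) composition with `count` of one modification applied."""
    mass, label = composition
    delta = count * table[name]
    sign = "+" if delta >= 0 else "-"
    suffix = name if count == 1 else f"{count}{name}"
    return (mass + delta, f"{label}{sign}{suffix}")


ibuprofen = (13 * C + 18 * H + 2 * O, "Ibu")  # C13H18O2
print(modify(ibuprofen, "COOH"))
print(modify(ibuprofen, "H2O", table=LOSSES))
```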
MATCH
DETAILS AND PARAMETERS
Overview
• Compares a list of masses and labels generated with CompoundCalculator to an
input peak list.
o Peak list is tab-delimited and must have mass values but can also contain columns for
Retention Time (RT) and Intensity
• The function that reads the peak list tries to determine which columns are present
o Allows for complex peak and target ion lists that can result in multiple matches.
• A cell shows how to list peaks with redundant matches – can be simplified
• Unmatched peaks greater than a given intensity threshold (percent base peak intensity) can be
shown
o Results can be saved in several ways including a simple mass/intensity list and more
detailed lists
• The former is useful with PeakView, which allows text lists to be imported as spectra and overlaid on
the original data
• Detailed lists are useful for interpretation
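One way to detect which columns are present is to match the header line against known names. A minimal sketch, assuming a header row and the hypothetical aliases in `ALIASES`; the notebook's reader may recognize different column names.

```python
import csv
import io

# Hypothetical header aliases; the notebook's own rules may differ
ALIASES = {
    "mass": ("mass", "m/z", "mz"),
    "rt": ("rt", "retention time", "ret. time"),
    "intensity": ("intensity", "inten", "height", "area"),
}


def read_peaks(text):
    """Parse a tab-delimited peak list into dicts with mass, rt and intensity.

    Only a mass column is required; rt and intensity are None when absent.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header = [h.strip().lower() for h in rows[0]]
    columns = {}
    for key, names in ALIASES.items():
        for i, h in enumerate(header):
            if h in names:
                columns[key] = i
                break
    if "mass" not in columns:
        raise ValueError("peak list needs a mass column")
    peaks = []
    for row in rows[1:]:
        if not row or not row[columns["mass"]].strip():
            continue  # skip blank lines
        peaks.append({key: float(row[columns[key]]) if key in columns else None
                      for key in ALIASES})
    return peaks
```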
Notebook organization
• See also Calculator Notebook organization
• Preamble
o Describes the program and intended use
• Setup
o Defines file paths, match parameters and output options
• Step 1 – Match
o Matches the peaks and target ions
o For matched peaks, looks for 13C isotope peaks
• Step 2 – Report, save
o Displays matches
• Peak, target ion, error, monoisotopic peak
o Shows redundant matches
• I.e. peaks that have multiple target ions
o Shows unmatched peaks above an intensity threshold
• Emphasizes peaks that still need to be explained
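The report step's redundant and unmatched lists can be derived from the raw matches. A sketch, assuming matches are (peak index, target label) pairs and that the intensity threshold is a percentage of the base peak, as described in the overview; the function `summarize` and its default threshold are hypothetical.

```python
from collections import defaultdict


def summarize(peak_intensities, matches, threshold_pct=5.0):
    """Split match results into redundant matches and large unmatched peaks.

    peak_intensities: intensities indexed by peak index.
    matches: (peak_index, target_label) pairs from the match step.
    threshold_pct: unmatched peaks above this percent of the base peak are reported.
    """
    targets_by_peak = defaultdict(list)
    for peak, label in matches:
        targets_by_peak[peak].append(label)
    # Redundant: peaks matched by more than one target ion
    redundant = {p: labels for p, labels in targets_by_peak.items() if len(labels) > 1}

    base = max(peak_intensities)
    unmatched = [i for i, inten in enumerate(peak_intensities)
                 if i not in targets_by_peak and 100.0 * inten / base > threshold_pct]
    return redundant, unmatched
```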
Setup - parameters
• Parameters
o save_matches
• If true, the matches are saved to a tab-delimited text file with a header line
– Mass, intensity, error (mmu), RT, Pk_index, Mono_peak, Target_mass, Target Root, Target Label
• If Mono_peak = Pk_index the peak is monoisotopic
• File is saved to the same directory as the peaks file and the name is:
– Peaks file name + compounds in target list + ‘matches’ + date_time (optional)
o local_files
• If true, results are saved to the same directory as the notebook, otherwise data_file_path is used
o include_large_unmatched
• If true, the largest unmatched peaks are included with the label ‘None’
o Include_data_in_file_name
• True to include the date_time string
o amu_window
• Tolerance for matching in amu, typically 0.005 (i.e. ± 5 mmu)
o ppm_window
• Tolerance in ppm
• amu_window and ppm_window are half-windows, so ± values. If both are specified the larger will be used.
This allows the match tolerance to change as the mass increases but never be smaller than the amu_window value.
o c13_half_window
• Tolerance for matching 13C isotopes in amu, typically 0.005 (i.e. ± 5 mmu)
o max_C13_count
• Maximum number of 13C isotope peaks to consider, typically 3 or 4
o c13_rt_window
• If the peaks file has RT values, potential isotope peaks must also match in RT within this window
o require_lower_c13_inten
• If True the intensity of the isotope(s) must be less than the intensity of the initial matched peak
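The tolerance rule and the 13C search described above can be sketched as below. Parameter names mirror the list above, but the functions `tolerance` and `find_c13`, the (mass, rt, intensity) peak tuples, and the choice to stop at the first missing isotope are assumptions of this sketch.

```python
C13_DELTA = 1.0033548378  # mass difference 13C - 12C (u)


def tolerance(mass, amu_window=0.005, ppm_window=None):
    """Match half-window in amu: the larger of the fixed and ppm-based windows."""
    if ppm_window is None:
        return amu_window
    return max(amu_window, mass * ppm_window * 1e-6)


def find_c13(peaks, mono, c13_half_window=0.005, max_C13_count=3,
             c13_rt_window=None, require_lower_c13_inten=True):
    """Return indices of 13C isotope peaks of the matched peak peaks[mono].

    peaks is a list of (mass, rt, intensity) tuples; rt may be None when the
    peak list has no RT column. The search stops at the first missing isotope.
    """
    m0, rt0, i0 = peaks[mono]
    found = []
    for n in range(1, max_C13_count + 1):
        expected = m0 + n * C13_DELTA
        hit = None
        for j, (m, rt, inten) in enumerate(peaks):
            if abs(m - expected) > c13_half_window:
                continue
            rt_ok = (c13_rt_window is None or rt is None or rt0 is None
                     or abs(rt - rt0) <= c13_rt_window)
            if not rt_ok:
                continue
            # Isotope must be weaker than the initial matched peak, if required
            if require_lower_c13_inten and inten >= i0:
                continue
            hit = j
            break
        if hit is None:
            break
        found.append(hit)
    return found
```

With amu_window = 0.005 and ppm_window = 5, the ppm term only takes over above 1000 amu, where 5 ppm exceeds 5 mmu.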
Example output – printed match list
Peak mass, match error, retention time, intensity
Target mass, mono peak index, root, label
Isotopes
Example output – redundant and unmatched
• Redundant (partial)
o The number of redundant peaks and matches is always reported but the peaks are only
listed if print_redundant_matches is True
Heterodimer
CASE STUDIES
1. Guanosine FIA spectrum
2. PCVG Group from DOA samples
3. Dimethylsuccinic acid
4. Unknown MSMS spectrum from 2
Note
The result summaries don’t match the current
software; they will be updated later (when I stop
making changes!)
Setup for interactive use
• Calculator and Match in JupyterLab
SharedData folder
Basic workflow
• Calculator: run cells above Setup; set modifications, multimers and adducts
• Match: save, open in PeakView, overlay
• New adducts?
PCVG group 1
Partial match list (by compound)
First pass - output
Columns: Peak index | Peak mass | Delta (mmu) | Peak RT | Peak Inten | Target mass | Mono peak | Target Root | Target Label
122 peaks, 26.9 %
Isotopes
Mono peak index is not the peak index: the peak index just lists the peaks in mass order, while the mono peak index can also skip isotopes
MH+
Second pass – add NH3, increase Na and K
Third pass – add HCOOH loss and Hex as adduct or compound
As adduct
Ibuprofen XICs
• Target lists can also include an ‘xic width’
o Use in PeakView with ‘Extract ions using dialog’ + import (right-click)
2,2-Dimethylsuccinic acid
Background
• Broeckling showed the spectrum below – many unexplained ions
o Anal. Chem. 2016(88)-9226, suppl fig 2
MW = 146.0579
Recent spectrum from Ali
• Resembles Broeckling with extra peaks
What are the unknown peaks?
• Suggestion from Gerard that they may be due to metals – Ca, Al, Fe
First pass
Matches
575.0639-429.0079 = 146.056
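The check on this slide, two peaks separated by the neutral mass M, generalizes to scanning every pair in a peak list, a quick way to spot multimer series. A minimal sketch using the two peaks above; `pairs_differing_by` and its 5 mmu default tolerance are assumptions.

```python
from itertools import combinations


def pairs_differing_by(masses, delta, tol=0.005):
    """Return (lower, higher) mass pairs whose difference equals delta within ±tol."""
    return [(a, b) for a, b in combinations(sorted(masses), 2)
            if abs((b - a) - delta) <= tol]


M = 146.0579  # neutral monoisotopic mass of dimethylsuccinic acid, C6H10O4
print(pairs_differing_by([429.0079, 575.0639], M))  # [(429.0079, 575.0639)]
```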
Unknown pattern
2+ 575.0639 + (Na-H)
What causes the strange isotope pattern?
= 146.054 + 136.8973
Barium!
Ba-H = 136.897422
With Ba
With Ba - 2
M + (Ba-2H) + H+ ≡ [M+Ba-H]+ ≡ C6H9BaO4+
[3M+Ba]2+ ≡ C18H30BaO12 2+
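The barium assignments can be checked from monoisotopic masses. This sketch assumes 138Ba (the most abundant isotope) and subtracts the electron mass for each positive charge; the variable names are illustrative.

```python
H, C, O = 1.00782503223, 12.0, 15.99491461957
BA138 = 137.905247          # monoisotopic mass of 138Ba, the most abundant Ba isotope
ELECTRON = 0.000548579909

M = 6 * C + 10 * H + 4 * O  # C6H10O4, neutral monoisotopic mass (146.0579)


def cation_mz(neutral_mass, charge):
    """m/z of a cation: neutral formula mass minus the lost electrons, over the charge."""
    return (neutral_mass - charge * ELECTRON) / charge


ba_minus_h = BA138 - H                    # the (Ba-H) increment
m_ba_h = cation_mz(M + BA138 - H, 1)      # [M+Ba-H]+, i.e. C6H9BaO4+
three_m_ba = cation_mz(3 * M + BA138, 2)  # [3M+Ba]2+, i.e. C18H30BaO12 2+
print(f"Ba-H {ba_minus_h:.6f}, [M+Ba-H]+ {m_ba_h:.4f}, [3M+Ba]2+ {three_m_ba:.4f}")
# Ba-H 136.897422, [M+Ba-H]+ 282.9548, [3M+Ba]2+ 288.0389
```

The (Ba-H) increment reproduces the 136.897422 value quoted for barium.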
Generated a new peak list (peaks > 0.2%) to capture higher-mass ions
• Increased the half-window to 0.015 (should ppm be used instead?)
Region > 575
Broeckling spectrum
• From Gerard
1.8 ppm
52 peaks, 68.2%
Unknown MSMS spectrum from ibuprofen group 1
MSMS from PCVG group
• Since the PCVG data was based on a SWATH analysis, it was possible to generate
the MSMS spectrum at 8.08 min
• Match with Ibu only
Peak at 369.1977 is lower by (Na-H): the MH+ of the underlying compound? The spectrum also has peaks
corresponding to losses of 1 and 2 H2O
Second pass
• With C6H10O5 and glucuronide
61 peaks, 52.0 %
Third pass
• With Gluc added – will calculate ions of glucuronic acid alone
Final assignments