
CompoundCalculator Suite

Based on Manual.pptx, 20201213


An introduction to CompoundCalculator, Match and Interpret
(coming later)
ron@bonners.ca

1
Overview
• CompoundCalculator – a simple (but useful) tool
o Calculate and Match

• Origin

• Adducts

• How to use the Calculator
o Where to get it
o How to run it
o Files and parameters

• How to use Match
o Files and parameters
o Example output

• Case studies
o PCVG example
o 2,3, dimethyl succinic acid
o MSMS example from PCVG group

2
PCVG background
• PCVG uses data from multiple samples, e.g. metabolomics, and finds variables
that have similar expression profiles across the samples. These are often related,
so identifying one can give clues to the others
o Ivosev et al, AnalChem 2008(80)-4933; Mohamed et al, AnalChem 2009(81)-7677
o Uses PCA loadings to find related variables

• These data are from samples collected in roadside intoxication tests, known to
contain groups with cocaine, THC, both, and blanks, and analyzed by positive-mode
SWATH LCMS [Klont et al, Talanta 2020(211)-120747]
o PCVG produced several groups of related peaks, some of which were predominantly in
single samples
o One such group contained 540 variables grouped at different retention times
o Searching the mass of one large peak suggested it might be an Ibuprofen metabolite
o Known Ibuprofen metabolites include
• Hydroxy forms (at least 2)
• COOH form corresponding to the oxidation of CH3 to CO2H
• Glucuronide conjugates

• How many features can be annotated? How many are associated with Ibuprofen?

3
SJ Drugs of Abuse data
• Features (m/z, rt pairs) determined by MarkerView for all samples and exported
• Colour = log(intensity), size = sqrt (non-zero count)

[Figure: feature map, m/z vs. RT]

4
PCVG group 1 from sample C001 (best responder)
• Colour = log(intensity), size = loadings magnitude

[Figure: feature map, m/z vs. RT]

5
Introduction
• The CompoundCalculator Suite helps interpret complex peak lists – from single
spectra or from multiple spectra with retention times
• Electrospray Ionization often produces multiple peaks for each analyte
o Adducts – formed with small cations (see comments)
o In-source fragments
o Multimers and heterodimers – contain multiple molecules of one or more analytes,
co-eluting analytes or background compounds
• Biological samples can contain many related analytes, i.e. metabolites
o Phase 1 – usually oxidation forms such as hydroxyls, acids and oxides
o Phase 2 – conjugates such as glucuronides and sulphates
• Thus a single compound, such as a drug (therapeutic or recreational) can produce
many related forms
• An important step in spectrum interpretation is determining the number of
underlying compounds, which requires calculating many potential masses
• The CompoundCalculator Suite performs these calculations and matches them to
a peak list

6
Introduction - Workflow
Inputs to Calculate
• Compound(s) – specified as lists of “compositions”, i.e. a mass and a name
• Metabolites and limits
• Adducts and limits

Flow
• Calculate → target ion lists → shared storage → Match → Interpret*
• Peak list → Match → detailed match lists
*Currently manual

Notes
• Using a common directory helps transfer data between modules
• Target lists have mass, root, name. Root is used to help organize lists and matches
• Update parameters for new compounds, new adducts, new losses…
• Input peak lists can include intensity and retention time
• Output peak lists can be overlaid in spectrum display tools (e.g. PeakView)

7
Adducts
• Adducts are generally regarded as ions formed with small cations rather than the
expected protons
o I.e. M+Na+, M+K+, M+NH4+ rather than M+H+
• But singly charged forms with multiple cations are also observed
o These can’t be simple addition of multiple Na+, K+ etc., since each added cation would add another charge
• We believe that these ions result from the replacement of labile protons by small
cations and the addition of a proton to provide the charge, i.e.
o M → M + (Na-H) → [M + (Na-H) + H+] ≡ [M + Na]+
o M → M + 2(Na-H) → [M + 2(Na-H) + H+] ≡ [M + 2Na - H]+
o M → M + 3(Na-H) → [M + 3(Na-H) + H+] ≡ [M + 3Na - 2H]+
o Note the first form is identical to M + Na+
• We call these “canonical forms”; although they have been reported previously, there has been no
comment about their origin
• This mechanism allows incorporation of other species, e.g. formates:
o M → M + HCOONa + H+ ≡ [M + (Na-H) + CH2O2 + H+]
• This allows simpler calculations by combining basic forms rather than specific
values, and lists can be used for positive or negative ions
o Depending on adding or subtracting a proton
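The canonical-form arithmetic can be sketched in a few lines of Python. This is only an illustration, not the Suite's code: the function name `canonical_ion_mz` and the rounded monoisotopic masses are assumptions introduced here, and 206.13068 is the monoisotopic mass of ibuprofen (C13H18O2).

```python
# Rounded monoisotopic masses (u); constants assumed for this sketch.
H = 1.007825        # hydrogen atom
PROTON = 1.007276   # H+ (hydrogen atom minus one electron)
NA = 22.989769      # 23Na

NA_MINUS_H = NA - H  # replacement of one labile proton by Na


def canonical_ion_mz(neutral_mass, n_na=0, positive=True):
    """m/z of the singly charged ion [M + n(Na-H) + H+] or [M + n(Na-H) - H+]."""
    sign = 1 if positive else -1
    return neutral_mass + n_na * NA_MINUS_H + sign * PROTON


ibuprofen = 206.13068  # C13H18O2, used here only as an example compound
for n in range(4):
    print(n, round(canonical_ion_mz(ibuprofen, n), 4))
```

With `n_na=1` the result equals the mass of [M + Na]+ computed directly (M + Na minus one electron), which is the identity noted in the first canonical form. Passing `positive=False` reuses the same adduct list for negative ions.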

8
Calculator approach
Compound branch
1. Specify base compounds, modifications and limits, and processing options
2. Apply phase 1 and phase 2 modifications
3. Calculate multimers and heterodimers
4. Apply specified losses to all compounds

Adduct branch
1. Specify adducts, limits and the maximum number allowed
2. Generate combinations of adducts

Combine
1. Add each adduct form to each compound
2. Add or subtract a proton for each ion form
3. Save masses and labels

Notes
• Base compounds can be real compounds, known metabolites (e.g. apovinpocetin, which is formed by loss of C2H4 from vinpocetin) or can correspond to a hypothetical compound included to explain unknown peaks, e.g. x544 to explain peaks at 562 (M+NH4+) and 567 (M+Na+)
• All compounds undergo the same modifications, including dimer and heterodimer generation, and adduct addition
• Each modification or multimer generates a new compound which is added to the compound list
• Multimers combine the same compound. Heterodimers combine different compounds
• Losses also generate new compounds
• Note: the combinatorial nature of the calculations can result in many species!
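A minimal sketch of how multimers, heterodimers and losses each add new compounds to the list. The function names, the label format and the example masses are hypothetical; only the `hetero_dimers` flag name and the (mass, name) convention come from the slides.

```python
from itertools import combinations


def add_multimers(compounds, max_multimer=2, hetero_dimers=False):
    """compounds: list of (mass, name). Returns the list extended with
    multimers of each compound and, optionally, heterodimers of each pair."""
    result = list(compounds)
    for mass, name in compounds:
        for n in range(2, max_multimer + 1):
            result.append((n * mass, f"{n}({name})"))
    if hetero_dimers:
        for (m1, n1), (m2, n2) in combinations(compounds, 2):
            result.append((m1 + m2, f"{n1}+{n2}"))
    return result


def apply_losses(compounds, losses):
    """losses: dict of name -> negative mass. Each loss creates a new compound."""
    result = list(compounds)
    for mass, name in compounds:
        for loss_name, loss_mass in losses.items():
            result.append((mass + loss_mass, f"{name}-{loss_name}"))
    return result


base = [(206.13068, "Ibu"), (162.05282, "C6H10O5")]
with_multimers = add_multimers(base, max_multimer=2, hetero_dimers=True)
all_compounds = apply_losses(with_multimers, {"H2O": -18.010565})
print(len(base), len(with_multimers), len(all_compounds))
```

Two base compounds already become ten after one multimer level, one heterodimer and a single loss, which is why the limits need to be kept modest.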

10
Implementation and availability
• Written in Python 3.7 in Jupyter Notebooks
o Especially convenient for interactive data science
o Platform independent with many packages for science and machine learning
• Available on GitHub: https://github.com/arbi56/CompoundCalculator

1 Click here

2 Click to download notebooks

Needs local Python installation (Anaconda)

Alternative: click to run in Google Colaboratory (CoLab)

11
Running
• There are two ways to access the notebooks (see Implementation and availability)

o Download and run on your own machine
• Requires installation of Python and Jupyter
– Recommend Anaconda and JupyterLab (installed with Anaconda)
• All local resources (files) are available
• Easy to have two notebooks open and run interactively
– Update Calculate parameters; Match; repeat

o Run on Google’s ‘CoLab’ platform
• Web-based Jupyter-like environment; temporary; does not require local installation
– Resources are reclaimed after the session ends
– Designed for Machine Learning; many science packages available
• Requires a Google account
• Cannot normally access local file system
– Data and results must be uploaded or downloaded
• Can access a Google Drive account (GDrive)
– Useful for sharing data between modules

13
CALCULATOR
DETAILS AND PARAMETERS

21
Setup Parameters – data file path
• Data path generation
o The cell shown below sets a default path to an output folder
• This is useful if Match uses the same folder so lists can be easily transferred
o The cell checks to see if it is running in CoLab; if so, it sets the path to a folder on GDrive
o You should change the paths and file names for your system

This will fail if CoLab is not available and the code following ‘except’ will be executed

Mount GDrive (always the same)

Extend/modify to reflect your file organization

File name. Modify to reflect your file organization

In Windows, local paths are specified as:

data_path = 'C:' + os.sep + os.path.join('dir_1', 'dir_2')
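The path-selection logic described on this slide can be sketched as follows. This is not the notebook's actual cell: the function names, the `SharedData` default and the GDrive sub-path are assumptions for illustration; `google.colab.drive.mount` is the standard CoLab call for mounting GDrive.

```python
import os


def in_colab():
    """Return True when running inside Google CoLab."""
    try:
        import google.colab  # noqa: F401  (only importable inside CoLab)
        return True
    except ImportError:
        return False


def default_data_path(folder="SharedData"):
    """Pick a data folder on GDrive in CoLab, otherwise under the home directory."""
    if in_colab():
        from google.colab import drive
        drive.mount("/content/gdrive")  # prompts for authorization in CoLab
        # Assumed location; adjust to your own GDrive organization.
        return os.path.join("/content/gdrive", "My Drive", folder)
    # Local fallback; adjust to your own file organization.
    return os.path.join(os.path.expanduser("~"), folder)


data_path = default_data_path()
print(data_path)
```

Using `os.path.join` for every component keeps the same cell usable on Windows, macOS and Linux.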

23
Setup Parameters - overview
• Most parameters are defined here; they are explained in the next few slides

24
Setup Parameters - limits
• The calculator determines compounds and adduct combinations separately then
combines them
o Both are calculated using ‘limits’ – tuples of (a composition, max count)
o Compounds are calculated by successively applying phase 1, phase 2 and losses
o The limits define which modifications are considered and the maximum count
• Minimum is always 0, i.e. unmodified
• As shown below, it is convenient to use the ‘ionization’ parameter to switch sets
of limits

Brackets – [], () – are required even if there is only one entry

Same for both polarities

Some conjugates respond better in negative mode

Modification can be switched off by removing it from the list or setting the limit to 0

The Match notebook automatically searches for 13C isotopes, but other isotopes can be specifically
included. 41K has a natural abundance of about 6.7% and becomes significant at higher counts;
replacement with this isotope is represented as K*H.
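A sketch of how a list of limits expands into count combinations. It is simplified for illustration: the limits here pair a name with a maximum count, whereas the notebook pairs a composition (mass and name) with the count. The variable names, the example limits and `expand_limits` are hypothetical.

```python
from itertools import product

ionization = "positive"  # switch between sets of limits

if ionization == "positive":
    adduct_limits = [("Na-H", 2), ("K-H", 1)]
else:
    # A single entry still needs both the list brackets and the tuple parentheses.
    adduct_limits = [("Na-H", 1)]


def expand_limits(limits):
    """Yield one {name: count} dict per allowed combination.
    Counts run from 0 (unmodified) up to and including each maximum."""
    names = [name for name, _ in limits]
    ranges = [range(max_count + 1) for _, max_count in limits]
    for counts in product(*ranges):
        yield dict(zip(names, counts))


combos = list(expand_limits(adduct_limits))
print(len(combos))  # (2 + 1) * (1 + 1) = 6 for the positive-mode limits above
```

Setting a limit to 0 leaves only the unmodified count for that entry, which has the same effect as removing it from the list.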

27
Changing core modifications
• In general Compositions are provided as tuples of (mass, name)
o Note: these are not elemental compositions and have no chemical knowledge
o This is part of the Composition class, but can be set elsewhere if desired
• Modifications are set as a dictionary of name:mass values
o Names are simple strings and reflect the common notation not the elemental composition,
e.g. as defined:
• ‘COOH’ is used with Ibuprofen for oxidation of -CH3 to –CO2H; the mass corresponds to this change
• ‘OH’ indicates oxidation of –H to –OH so the mass is that of an oxygen atom
• Specific isotopes can be set, e.g. (K*H) is equivalent to (K-H) but for the 41K isotope (about 6.7%)
– The Match code automatically looks for 13C isotopes
• Sodium and potassium formate can be set specifically, but using ‘CH2O2’, ‘C2H4O2’, etc. is more flexible
– E.g. for addition of multiple formates, the exact position (Na and K) cannot be determined from the mass
• Losses must be negative and can be simple (H2O, CO2, etc.) or correspond to known (or suspected) neutral losses
– E.g. loss of ribose (‘Rib’) occurs in guanosine to produce a strong fragment ion
– An alternative is to include the fragment as a base compound
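A sketch of a name:mass dictionary of the kind described above, with each mass derived from rounded monoisotopic atomic masses so the meaning of each name is explicit. The dictionary names and the selection of entries are illustrative, not the notebook's actual definitions.

```python
# Rounded monoisotopic atomic masses (u); assumed constants for this sketch.
C, H, O = 12.0, 1.007825, 15.994915
K39, K41 = 38.963706, 40.961825

modifications = {
    "OH": O,                      # -H -> -OH: net gain of one oxygen
    "COOH": 2 * O - 2 * H,        # -CH3 -> -CO2H: gain O2, lose H2
    "K-H": K39 - H,               # labile proton replaced by 39K
    "K*H": K41 - H,               # same replacement with the 41K isotope
    "CH2O2": C + 2 * H + 2 * O,   # formic acid
}

losses = {
    "H2O": -(2 * H + O),
    "CO2": -(C + 2 * O),
}

# Losses are stored as negative masses so they can simply be added.
assert all(mass < 0 for mass in losses.values())

for name, mass in {**modifications, **losses}.items():
    print(f"{name:6s} {mass: .6f}")
```

The names are plain labels with no chemical meaning to the code, so ‘COOH’ carries the mass of the change (+O2 −H2), not the mass of a carboxyl group.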

28
MATCH
DETAILS AND PARAMETERS

30
Overview
• Compares a list of masses and labels generated with CompoundCalculator to an
input peak list.
o Peak list is tab-delimited and must have mass values but can also contain columns for
Retention Time (RT) and Intensity
• The function that reads the peak list tries to determine which columns are present

o Allows for complex peak and target ion lists that can result in multiple matches.
• A cell shows how to list peaks with redundant matches – can be simplified
• Unmatched peaks greater than a given intensity threshold (percent base peak intensity) can be
shown

o Searches for 13C peaks of matched peaks
• Keeps peaks and isotopes together; other matches are also shown

o Results can be saved in several ways including a simple mass/intensity list and more
detailed lists
• The former is useful with PeakView which allows text lists to be imported as spectra and overlaid on
the original data
• Detailed lists are useful for interpretation

o It is convenient to open the Calculator and Match notebooks in side-by-side windows so it is
easy to update the target ion list and repeat the matching
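A sketch of reading a tab-delimited peak list and guessing which columns are present. This is not the notebook's reader: the function name, the assumption of a header line and the header keywords are all hypothetical.

```python
import csv
import io


def read_peak_list(handle):
    """Read a tab-delimited peak list with a header line.
    Mass is required; RT and intensity columns are used when found."""
    reader = csv.reader(handle, delimiter="\t")
    header = [h.strip().lower() for h in next(reader)]

    def find(*keys):
        # Return the index of the first header containing any keyword.
        for i, h in enumerate(header):
            if any(k in h for k in keys):
                return i
        return None

    columns = {
        "mass": find("mass", "m/z"),
        "rt": find("rt", "retention", "time"),
        "intensity": find("inten", "height", "area"),
    }
    if columns["mass"] is None:
        raise ValueError("peak list must contain a mass column")

    peaks = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        peaks.append({k: float(row[i]) for k, i in columns.items() if i is not None})
    return peaks


example = io.StringIO("Mass\tRT\tIntensity\n207.138\t8.08\t1000\n229.1199\t8.08\t250\n")
peak_list = read_peak_list(example)
print(peak_list)
```

In practice the same function would be called on an opened file, e.g. `with open(path, newline="") as f: read_peak_list(f)`.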

31
Notebook organization
• See also Calculator Notebook organization
• Preamble
o Describes the program and intended use
• Setup
o Defines file paths, match parameters and output options
• Step 1 – Match
o Matches the peaks and target ions
o For matched peaks looks for 13C ions
• Step 2 – Report, save
o Displays matches
• Peak, target ion, error, monoisotopic peak
o Shows redundant matches
• I.e. peaks that have multiple target ions
o Shows unmatched peaks above an intensity threshold
• Emphasizes peaks that still need to be explained
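The 13C search in Step 1 can be sketched as below. The keyword names mirror the Match setup parameters (`c13_half_window`, `max_C13_count`, `c13_rt_window`, `require_lower_c13_inten`), but the function itself, the peak representation and the singly-charged assumption are illustrative rather than the notebook's implementation.

```python
C13_DELTA = 1.003355  # mass difference between 13C and 12C (u)


def find_c13_peaks(peaks, mono_index, c13_half_window=0.005, max_c13_count=3,
                   c13_rt_window=None, require_lower_c13_inten=True):
    """peaks: list of dicts with 'mass', 'intensity' and optionally 'rt'.
    Returns the indices of peaks that look like 13C isotopes of
    peaks[mono_index], assuming singly charged ions."""
    mono = peaks[mono_index]
    found = []
    for n in range(1, max_c13_count + 1):
        expected = mono["mass"] + n * C13_DELTA
        hit = None
        for i, p in enumerate(peaks):
            if abs(p["mass"] - expected) > c13_half_window:
                continue
            if c13_rt_window is not None and abs(p["rt"] - mono["rt"]) > c13_rt_window:
                continue
            if require_lower_c13_inten and p["intensity"] >= mono["intensity"]:
                continue
            hit = i
            break
        if hit is None:
            break  # stop at the first missing isotope peak
        found.append(hit)
    return found


demo_peaks = [
    {"mass": 207.1380, "rt": 8.0, "intensity": 1000.0},
    {"mass": 208.1414, "rt": 8.0, "intensity": 140.0},
    {"mass": 209.1447, "rt": 8.0, "intensity": 12.0},
    {"mass": 208.1414, "rt": 5.0, "intensity": 50.0},
]
print(find_c13_peaks(demo_peaks, 0, c13_rt_window=0.1))
```

Each isotope found would be reported with the index of its monoisotopic peak, which is how the match list keeps peaks and isotopes together.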

32
Setup - parameters

• Parameters
o save_matches
• If true, the matches are saved to a tab-delimited text file with a header line
– Mass, intensity, error (mmu), RT, Pk_index, Mono_peak, Target_mass, Target Root, Target Label
• If Mono_peak = Pk_index the peak is monoisotopic
• File is saved to the same directory as the peaks file and the name is:
– Peaks file name + compounds in target list + ‘matches’ + date_time (optional)
o local_files
• If true, results are saved to the same directory as the notebook, otherwise data_file_path is used
o include_large_unmatched
• If true the largest unmatched peaks are included with the label ‘None’
o Include_data_in_file_name
• True to include the date_time string
o amu_window
• Tolerance for matching in amu, typically 0.005 (i.e. ± 5 mmu)
o ppm_window
• Tolerance in ppm
o Note: if both amu_window and ppm_window are specified, the larger will be used. This allows the match tolerance to change as the mass increases but never be smaller than the amu_window value. These are half-windows, so the tolerance is ± the given value
o c13_half_window
• Tolerance for matching 13C isotopes in amu, typically 0.005 (i.e. ± 5 mmu)
o max_C13_count
• Maximum number of 13C isotope peaks to consider, typically 3 or 4
o c13_rt_window
• If the peaks file has RT values they must also match for potential isotope peaks
o require_lower_c13_inten
• If True the intensity of the isotope(s) must be less than the intensity of the initial matched peak
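A sketch of how `amu_window` and `ppm_window` could combine into one half-window and drive the matching. The function names and the target tuple layout (mass, root, label) are assumptions, as is the choice to scale the ppm tolerance by the peak mass.

```python
def half_window(mass, amu_window=0.005, ppm_window=None):
    """Return the larger of the fixed amu tolerance and the ppm tolerance at this mass."""
    ppm_tol = mass * ppm_window * 1e-6 if ppm_window else 0.0
    return max(amu_window or 0.0, ppm_tol)


def match_peaks(peak_masses, targets, amu_window=0.005, ppm_window=None):
    """targets: list of (mass, root, label).
    Returns (peak_index, target, error_mmu) tuples; for each peak the
    matches are ordered by increasing absolute error."""
    matches = []
    for i, peak_mass in enumerate(peak_masses):
        tol = half_window(peak_mass, amu_window, ppm_window)
        hits = [(i, t, (peak_mass - t[0]) * 1000.0)
                for t in targets if abs(peak_mass - t[0]) <= tol]
        hits.sort(key=lambda hit: abs(hit[2]))
        matches.extend(hits)
    return matches


targets = [
    (207.137956, "Ibu", "Ibu.H+"),
    (229.119900, "Ibu", "Ibu.(Na-H).H+"),
    (300.000000, "x", "x.H+"),
]
result = match_peaks([207.1380, 229.1199], targets, amu_window=0.005, ppm_window=10)
for peak_index, target, error in result:
    print(peak_index, target[2], round(error, 2))
```

With `amu_window=0.005` and `ppm_window=10`, the fixed window applies below m/z 500 and the ppm window takes over above it.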

35
Example output – printed match list
Peak mass, match error, retention time, intensity
Target mass, mono peak index, root, label

Isotopes

Same mass but different RT (i.e. different peaks)

Simplify mode, so only first target (lowest absolute error) is printed

36
Example output – redundant and unmatched
• Redundant (partial)
o The number of redundant peaks and matches is always reported but the peaks are only
listed if print_redundant_matches is True

Same peak, three matches; the first (lowest error) is considered most likely

Heterodimer

This code adds a line between redundant groups

• Unmatched >= 1% base peak intensity
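A sketch of the unmatched-peak report and of a summary figure like the "peaks, %" values quoted in the case studies. Both function names are hypothetical, and treating the percentage as the matched fraction of total intensity is an assumption.

```python
def large_unmatched(peaks, matched_indices, threshold_percent=1.0):
    """Indices of unmatched peaks at or above a percentage of the base peak intensity."""
    base_peak = max(p["intensity"] for p in peaks)
    cutoff = base_peak * threshold_percent / 100.0
    return [i for i, p in enumerate(peaks)
            if i not in matched_indices and p["intensity"] >= cutoff]


def percent_intensity_matched(peaks, matched_indices):
    """Matched intensity as a percentage of the summed intensity of all peaks."""
    total = sum(p["intensity"] for p in peaks)
    matched = sum(peaks[i]["intensity"] for i in set(matched_indices))
    return 100.0 * matched / total


peaks = [{"intensity": 1000.0}, {"intensity": 500.0},
         {"intensity": 5.0}, {"intensity": 20.0}]
matched = {0}
print(large_unmatched(peaks, matched))
print(round(percent_intensity_matched(peaks, matched), 1))
```

Listing the largest unmatched peaks first is what points to the next compound or adduct to add in the Calculator.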

37
CASE STUDIES
1. Guanosine FIA spectrum
2. PCVG Group from DOA samples
3. Di-methyl succinic acid
4. Unknown MSMS spectrum from 2
Note
The result summaries don’t match the current
software; they will be updated later (when I stop
making changes!)

38
Setup for interactive use
• Calculator and Match in JupyterLab

SharedData
folder

39
Basic workflow
Calculator
1. Run cells above Setup
2. Set base compounds
3. Set modifications, multimers and adducts
4. Set other parameters (e.g. Save files)
5. Run Setup cell and all below
→ Target ions file

Match
1. Run cells above Setup
2. Import Peak List
3. Import Target Ions File
4. Run Match Cells
5. Review output cells
6. Save, open in PeakView, overlay
7. New compounds? New adducts? If so, return to the Calculator and repeat

40
PCVG group 1

50
Partial match list (by compound)
First pass - output
122 peaks, 26.9 %

Columns: Peak index | Peak mass | Delta (mmu) | Peak RT | Peak Inten | Target mass | Mono peak | Target Root | Target Label

Same mass at different RTs - isomers

Isotopes. Mono peak index is not the peak index

This is the second type of match results display. The other just lists the peaks in mass order; this one can
also skip isotopes

Print compound summary


51
First pass - analysis
• Unmatched ions > 1%
o Largest peaks are not yet identified

From previous analysis:
399.1642 is Ibu_OH-gluc.H+
421.1477 is Ibu_OH-gluc.(Na-H).H+

Masses suggest 416.1915 is M+NH4+

MH+

52
Second pass – add NH3, increase Na and K

196 peaks, 89.2 % (up from 122 peaks, 26.9 %)

177 is loss of HCOOH from Ibuprofen-OH

Add HCOOH as a loss

Several peaks have isotopes at same RT -> credible peaks

MSMS of 562 suggests the presence of a hexose (added with loss of H2O – see next section). Add Hex and increase
H2O loss count

53
Third pass – add HCOOH loss and Hex as adduct or compound
As adduct

268 peaks, 92.6 % (up from 196 peaks, 89.2 %)

As compound (needs hetero_dimers = True)

Heterodimer

317 peaks, 94.1 % (up from 196 peaks, 89.2 %)

54
Ibuprofen XICs
• Target lists can also include an ‘xic width’
o Use in PeakView with ‘Extract ions using dialog’ + import (right-click)

Keep low for reasonable processing time

55
2,2 Dimethylsuccinic acid

56
Background
• Broeckling showed the spectrum below – many unexplained ions
o Anal. Chem. 2016(88)-9226, suppl fig 2

• Tobias spectrum has some common peaks…

MW = 146.0579

57
Recent spectrum from Ali
• Resembles Broeckling with extra peaks

58
What are the unknown peaks?
• Suggestion from Gerard that they may be due to metals – Ca, Al, Fe

• Based on large mass defect for adducts

• Add to calculator and try:
o X = Al or Fe, both trivalent
o With one M: M + (X-H)++
o With two M: 2M + (X-2H)+
• Calculator adds H+ so need to use (X-3H) for replacement mass
o With three M: 3M + (X-3H) + H+
• Calculator adds H+ so need to use (X-3H) for replacement mass
o Fe-3H = 52.910913
o Al-3H = 23.957515

59
First pass

K-H (37.955881) and Ca-2H (37.94694)

70 peaks, 78.7% TIC

60
Matches

575.0639 – 429.0079 = 146.056

429.0079 – 282.9513 = 146.056

??? (there is also an ion at 575.0639 + 146.056 = 721.1198)

These ions all have interesting isotope patterns… see next

61
Unknown pattern

2+ 575.0639 + (Na-H)

62
What causes the strange isotope pattern??

First (obvious) place with this pattern is 282.9513…

= 146.054 + 136.8973

Barium!
Ba-H = 136.897422

New composition (Ba-2H) = 135.889597
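The barium arithmetic can be checked with a short sketch. The constants are rounded monoisotopic masses (138Ba is the most abundant isotope), M is the MW quoted on the Background slide, and 282.9513 is the peak mass from the Matches slide; the variable names are introduced here for illustration.

```python
# Rounded monoisotopic masses (u); assumed constants for this sketch.
H = 1.007825
PROTON = 1.007276
ELECTRON = H - PROTON
BA138 = 137.905247

M = 146.0579  # dimethylsuccinic acid, MW from the Background slide

ba_minus_h = BA138 - H        # mass added when Ba replaces one H
ba_minus_2h = BA138 - 2 * H   # composition to give the Calculator, which adds H+

# Two ways of writing the same singly charged ion:
ion_direct = M + BA138 - H - ELECTRON        # [M+Ba-H]+
ion_calculator = M + ba_minus_2h + PROTON    # M+(Ba-2H)+H+

observed = 282.9513
error_mmu = (observed - ion_calculator) * 1000.0

print(round(ba_minus_h, 6), round(ba_minus_2h, 6))
print(round(ion_calculator, 4), round(error_mmu, 1))
```

The two ion expressions give the same m/z, which is why (Ba-2H) is the composition to enter: the Calculator supplies the final H+ itself.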

64
With Ba

98 peaks, 84.5% TIC (from 70 peaks, 78.7%)

[Figure: match errors shown in mmu]

65
With Ba - 2

[Figure: match errors shown in mmu]

[M+Ba-H]+ ≡ M+(Ba-2H)+H+ ≡ C6H9BaO4 (1+)
[3M+Ba]2+ ≡ C18H30BaO12 (2+)

Arrows indicate Ba isotopes

66
Generated a new peak list > 0.2% to capture higher mass ions
• Increased half-window to 0.015 (should use ppm?)

571 peaks, 73.9% TIC (from 98 peaks, 84.5%)

67
Region > 575

68
Broeckling spectrum
• From Gerard
1.8 ppm

52 peaks, 68.2%

69
Unknown MSMS spectrum from ibuprofen group 1

70
MSMS from PCVG group
• Since the PCVG data was based on a SWATH analysis, it was possible to generate
the MSMS spectrum at 8.08 min
• Match with Ibu only

391.1713 = loss of 176.036 from the Na ion at 567 -> glucuronide?

Peak at 369.1977 is lower by (Na-H) -> MH+ of underlying compound? Also has peaks
corresponding to losses of 1 and 2 H2O

369.1977 to 207.138 (Ibuprofen.MH+) is 162.0597 = C6H10O5 (Hexose - H2O) with an error of 8 mmu

Add C6H10O5 as a separate compound

71
Second pass
• With C6H10O5 and glucuronide

61 peaks, 52.0 %

Peak at 177.0397 corresponds to MH+ of glucuronic acid (177.039364) with 141.018
and 159.0291 corresponding to water losses

Add glucuronic acid
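The masses quoted above can be reproduced with a small formula-mass helper. This sketch assumes the composition C6H8O6 for the glucuronic acid entry, since that composition plus a proton reproduces the 177.039364 value; the helper, its limited element table and that assumption are not taken from the notebooks.

```python
import re

# Rounded monoisotopic masses (u); only the elements needed here.
MASSES = {"C": 12.0, "H": 1.007825, "O": 15.994915}
PROTON = 1.007276


def formula_mass(formula):
    """Monoisotopic mass of a simple formula such as 'C6H8O6' (no brackets)."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MASSES[element] * (int(count) if count else 1)
    return total


gluc_mh = formula_mass("C6H8O6") + PROTON
water = formula_mass("H2O")

observed = 177.0397
one_water_loss = observed - water
two_water_losses = observed - 2 * water

print(round(gluc_mh, 6))
print(round(one_water_loss, 4), round(two_water_losses, 4))
```

The two computed loss masses fall within 1 mmu of the peaks at 159.0291 and 141.018, consistent with the assignment as successive water losses.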

72
Third pass
• With Gluc added – will calculate ions of glucuronic acid alone

72 peaks, 67.2 % (from 61 peaks, 52.0 %)

97.0281 may be a sugar fragment? (C5H5O2, -0.3mmu)

73
Final assignments

74
