
correspondence

A five-level classification system for proteoform identifications
To the Editor: The term proteoform, introduced in Nature Methods in 2013 (ref. 1), has rapidly gained acceptance in the proteomics community. The challenge and importance of comprehensively identifying proteoforms in complex samples have been recognized, and reports have begun to appear of new platforms towards that end (refs. 2–5). However, one interesting central ambiguity has emerged, namely determining precisely what is meant by a ‘proteoform identification’. At present, the only practical approaches for establishing the exact primary structure of a proteoform employ mass spectrometry (MS), and a wide range of MS results claim proteoform identifications (ref. 6).

This seemingly small matter has significant impact, as the ambiguity in what is meant by an ‘identification’ makes it difficult to compare results from different laboratories and approaches. This situation hinders the ability of the community to evaluate technological progress and to efficiently expand biological knowledge.

To address this issue and assist researchers in representing the ambiguity within their proteoform identifications, we propose a five-level proteoform classification system. The classification scheme covers the four types of ambiguity possible for a proteoform identification, ranging from the most subtle (that is, precise localization of a post-translational modification (PTM)), to the most dramatic (that is, ambiguity in the gene of origin).

The four classes of ambiguity are:

• PTM localization: a PTM is not localized to a specific amino acid.
• PTM identification: there is ambiguity in the identity of a PTM.
• Amino acid sequence: there is ambiguity in the amino acid sequence.
• Gene: the gene of origin is unknown or ambiguous.

These four classes determine the level of ambiguity present in the identification, ranging from no ambiguity at all (Level 1), to ambiguity of all four types (Level 5). Details of the scheme are provided in Table 1 and Supplementary Table 1, with specific-use cases and examples provided in Supplementary Fig. 1.

Table 1 | Proteoform level classification system

Level(a)  PTMs localized  PTMs identified  Sequence defined  Gene identified
1         Yes             Yes              Yes               Yes
2A        No              Yes              Yes               Yes
2B        Yes             No               Yes               Yes
2C        Yes             Yes              No                Yes
2D        Yes             Yes              Yes               No
3         Two are certain and two are ambiguous
4         One is certain and three are ambiguous
5         All are ambiguous

(a) See Supplementary Table 1 for definitions of sub-levels.

The five levels of proteoform identification are:

Level 1: the proteoform is identified with no ambiguity, with full knowledge of its gene of origin, with its complete amino acid sequence defined, and with the identities and locations of all PTMs known, if present.

Level 2: the proteoform is identified with ambiguity in just one of the classes of ambiguity described above. The sub-levels are:
Level 2A: the amino acid sequence is completely defined with knowledge of its gene of origin and all PTMs are fully identified, but their localization is incomplete.
Level 2B: the amino acid sequence is completely defined with knowledge of its gene of origin and the localization of PTMs is complete, but the PTM(s) are not fully identified or structurally characterized (for example, acetylation versus trimethylation or glycoproteoforms).
Level 2C: all PTMs are identified and localized, if present, but there exists some localized sequence ambiguity (for example, the order of amino acids is unknown in a small region), yet there is still knowledge of its gene of origin.
Level 2D: the amino acid sequence is completely defined and all PTMs are fully identified and localized, but there is ambiguity with respect to the gene of origin.

Level 3: the proteoform is identified with ambiguity in two of the classes.

Level 4: the proteoform is identified with ambiguity in three of the classes.

Level 5: insufficient information has been obtained to know from which gene the proteoform is derived, what its sequence is, the PTMs or their localization; only the observed mass of the proteoform is known.

We distinguish here between the definitional issue of what level of proteoform identification is being claimed, and the related issue of statistical confidence, which provides a measure of the quality of the claimed identification. The system presented here permits proteoform identifications to be classified into various types but intentionally does not address the related issue of the confidence with which such identifications have been asserted by each investigator. Ideally, every proteoform identification would be accompanied by both a classification level and a confidence metric. Early efforts to estimate proteoform characterization confidence include the C-score (ref. 7) and the MIScore (ref. 8), but further work is needed to develop and refine estimates so that proteoform levels can be confidently and automatically assigned.

This five-level system for classifying proteoform identifications should greatly reduce ambiguity in the reporting of proteoform identification results. We will use this system in future publications, breaking down proteoform identifications made into these five categories; and we hope others will adopt the system as well. An important practical issue in broad implementation of the scheme will be the development of informatic tools that assign and report these classifications automatically. We believe the resultant uniformity will help researchers publish findings with greater clarity, compare and evaluate results using different methodologies, and drive efficient progress in the field. Together with integration of different types of results (such as peptide-level data from bottom-up proteomics) and prior efforts to regularize proteoform sharing (ref. 9), a growing infrastructure to build up proteoform databases is developing (ref. 10).

Lloyd M. Smith¹*, Paul M. Thomas²,³, Michael R. Shortreed¹, Leah V. Schaffer¹, Ryan T. Fellers²,³, Richard D. LeDuc²,³, Trisha Tucholski¹, Ying Ge⁴, Jeffrey N. Agar⁵, Lissa C. Anderson⁶, Julia Chamot-Rooke⁷, Joseph Gault⁸, Joseph A. Loo⁹, Ljiljana Paša-Tolić¹⁰, Carol V. Robinson⁸, Hartmut Schlüter¹¹, Yury O. Tsybin¹², Marta Vilaseca¹³, Juan Antonio Vizcaíno¹⁴, Paul O. Danis¹⁵ and Neil L. Kelleher²,³*

¹Department of Chemistry, University of Wisconsin-Madison, Madison, WI, USA. ²Department of Chemistry and Molecular Biosciences, Northwestern University, Evanston, IL, USA. ³National Resource for Translational and Developmental Proteomics, Northwestern University, Evanston, IL, USA. ⁴Department of Cell and Regenerative Biology and Human Proteomics Program, University of Wisconsin-Madison, Madison, WI, USA. ⁵Department of Chemistry and Chemical Biology, Northeastern University, Boston, MA, USA. ⁶Ion Cyclotron Resonance Program, National High Magnetic Field Laboratory, Tallahassee, FL, USA. ⁷Mass Spectrometry for Biology Unit, Institut Pasteur, Paris, France. ⁸Department of Chemistry, University of Oxford, Oxford, UK. ⁹Department of Chemistry and Biochemistry, University of California, Los Angeles, CA, USA. ¹⁰Environmental Molecular Sciences Laboratory and Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA. ¹¹University Medical Center, Hamburg-Eppendorf, Hamburg, Germany. ¹²Spectroswiss CH, Lausanne, Switzerland. ¹³Institute for Research in Biomedicine, Barcelona, Spain. ¹⁴European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK. ¹⁵Consortium for Top Down Proteomics, Cambridge, MA, USA.
*e-mail: smith@chem.wisc.edu; n-kelleher@northwestern.edu

Published: xx xx xxxx
https://doi.org/10.1038/s41592-019-0573-x

References
1. Smith, L. M. & Kelleher, N. L. Nat. Methods 10, 186–187 (2013).
2. Anderson, L. C. et al. J. Prot. Res. 16, 1087–1096 (2017).
3. Lubeckyj, R. A. et al. Anal. Chem. 89, 12059–12067 (2017).
4. Schaffer, L. V. et al. J. Prot. Res. 17, 3526–3536 (2018).
5. Bush, D. R., Zang, L., Belov, A. M., Ivanov, A. R. & Karger, B. L. Anal. Chem. 88, 1138–1146 (2016).
6. Lermyte, F., Tsybin, Y. O., O’Connor, P. B. & Loo, J. A. J. Am. Soc. Mass Spectrom. 30, 1149–1157 (2019).
7. LeDuc, R. D. et al. J. Prot. Res. 13, 3231–3240 (2014).
8. Kou, Q. et al. J. Prot. Res. 15, 2422–2432 (2016).
9. LeDuc, R. D. et al. J. Prot. Res. 17, 1321–1325 (2018).
10. Natale, D. A. et al. Nucleic Acids Res. 45, D339–D346 (2017).

Acknowledgements
This work was supported by grants R35 GM126914 (to L.M.S.) and P41 GM108569 (to N.L.K.) from the NIH National Institute of General Medical Sciences, R21 LM013097 (to P.M.T.) from the National Library of Medicine, and the Sherman Fairchild Foundation (to N.L.K.).

Competing interests
Y.O.T. is an employee of Spectroswiss. R.T.F., N.L.K. and R.D.L. declare a competing interest in the development and commercialization of proteoform search and management software. All other authors declare no competing interests.

Additional information
Supplementary information is available for this paper at https://doi.org/10.1038/s41592-019-0573-x.
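
Illustrative sketch
The mapping in Table 1 from the four classes of ambiguity to a level is mechanical, so it can be sketched in a few lines of code. What follows is a minimal Python illustration only, not one of the informatic tools the letter calls for. The names ProteoformEvidence and classify_level are hypothetical, and the sketch assumes that some upstream analysis has already judged each class to be certain (True) or ambiguous (False). It deliberately says nothing about statistical confidence, which the letter treats as a separate issue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProteoformEvidence:
    """Hypothetical record; True means that class is unambiguous."""
    ptms_localized: bool
    ptms_identified: bool
    sequence_defined: bool
    gene_identified: bool


# Sub-level assigned when exactly one class is ambiguous (Table 1).
_SUBLEVEL = {
    "ptms_localized": "2A",
    "ptms_identified": "2B",
    "sequence_defined": "2C",
    "gene_identified": "2D",
}


def classify_level(ev: ProteoformEvidence) -> str:
    """Return the Table 1 level: "1", "2A" to "2D", "3", "4" or "5"."""
    ambiguous = [name for name in _SUBLEVEL if not getattr(ev, name)]
    if not ambiguous:
        return "1"
    if len(ambiguous) == 1:
        return _SUBLEVEL[ambiguous[0]]
    # Two, three or four ambiguous classes map to Levels 3, 4 and 5.
    return str(len(ambiguous) + 1)
```

For example, classify_level(ProteoformEvidence(True, False, True, True)) returns "2B", because only the PTM identity is ambiguous. Keying the sub-level on the name of the single ambiguous class keeps the code in one-to-one correspondence with the rows of Table 1.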

Nature Methods | www.nature.com/naturemethods
