Generating Proteomic Big Data for Precision Medicine

Article  in  Proteomics · July 2020


DOI: 10.1002/pmic.201900358

www.proteomics-journal.com Page 1 Proteomics

Yue Liang 1,2#, Fangfei Zhang 1,2#, Rui Sun 1,2, Yaoting Sun 1,2, Chunhui Yuan 1,2, Yi Zhu 1,2, Tiannan
Guo 1,2*

1, Zhejiang Provincial Laboratory of Life Sciences and Biomedicine, Key Laboratory of Structural Biology of Zhejiang
Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou 310024, Zhejiang Province, China

2, Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou 310024,
Zhejiang Province, China

#, co-first

*, correspondence: guotiannan@westlake.edu.cn

ABSTRACT

Here we reason that the complexity of medical problems and proteome science might be tackled
effectively with deep learning (DL) technology. However, deployment of DL for proteomics data
requires the acquisition of data sets from a large number of samples. Based on the success of DL in
medical imaging classification, proteome data from thousands of samples are arguably the minimal
input for DL. Contemporary proteomics is turning high-throughput thanks to rapid progress in
sample preparation and liquid chromatography mass spectrometry (LC-MS) methods. In particular,
data-independent acquisition (DIA) now enables generation of hundreds to thousands of quantitative
proteome maps from clinical specimens in clinical cohorts with only limited sample amounts.
Upheavals in the design of large-scale clinical proteomics studies might be required
to generate proteomic big data and deploy DL to tackle complex medical problems.

Keywords: Proteomic big data; Precision medicine; Deep learning; Clinical cohort;
Data-independent acquisition; High-throughput proteomics

Received: 02/03/2020; Revised: 13/07/2020; Accepted: 27/07/2020

This article has been accepted for publication and undergone full peer review but has not been
through the copyediting, typesetting, pagination and proofreading process, which may lead to
differences between this version and the Version of Record. Please cite this article as doi:
10.1002/pmic.201900358.

This article is protected by copyright. All rights reserved.


1. Proteomics for precision medicine

The promise of precision medicine is to make clinical decisions regarding diagnosis, prognosis and
treatment customized to the diverse needs of individuals based on their specific phenotypic, molecular
or psychosocial characteristics [1]. The dominant contemporary paradigm for medical practice still
relies heavily on patients' symptoms and morphological examinations. For example, the diagnosis and
stratification of prostate cancer depends mainly on digital rectal exam, concentration of blood
prostate specific antigen (PSA), magnetic resonance imaging (MRI), ultrasound fusion and Gleason
score, although it is widely acknowledged that these measures have led to a high degree of
over-diagnosis and over-treatment [2]. Advances in molecular biology, in particular omics science, are
uncovering specific and/or systematic molecular dysregulations of various diseases. Diagnosis and
treatment of some diseases have greatly benefited from this progress. For example, the BCR-ABL
fusion gene can help diagnose chronic myelogenous leukemia and acts as an effective drug target for
this disease [3].

However, few diseases are caused by single gene mutations and can be effectively treated with a
specific inhibitor. The diagnosis and treatment of most diseases are confounded by highly convoluted
symptoms and laboratory examinations. Diseases of different pathogenesis may have similar
symptoms, morphology and even laboratory test results. On the other hand, a disease may manifest
different clinical phenotypes during its natural progression, which is constantly influenced by
environmental factors. This complexity reflects the complex molecular mechanisms of diseases,
which can hardly be interrogated by measuring a fixed shortlist of molecules.

Omics offers an emerging approach to systematically profile thousands of molecular dysregulations
to cover the granularity of disease pathogenesis and progression. A number of large-scale genomics
experiments have been included in clinical studies to help build models that can predict
incidence [4], diagnosis [5] and prognosis [6] of diseases. Next-generation sequencing
technologies have detected numerous molecular dysregulations in cancers [7]. However, identification of
functional changes is emerging as a bottleneck in understanding the key processes of disease
pathogenesis. Most variations in nucleic acids do not translate into modulation of proteins, the major
executors of life activities and catalysts of biochemical reactions [8]. Measurement of the proteome of
clinical specimens can complement their genomic data in understanding cancer biology and
customizing cancer management [9, 10]. Recently the He group has advocated the emergence of
proteomics-driven precision medicine [11].

The measurement of a proteome, largely dependent on mass spectrometry [12], is inherently much
more sophisticated than that of nucleic acids, not only because proteins cannot be technically
amplified, but also because the proteome is differentially expressed among tissue types. In
a cell, proteins are distributed in specific subcellular locations, and many of them can translocate
among multiple cellular organelles. Proteins can also be secreted by multiple cells, tissues and organs
into various body fluids, including but not limited to blood and urine. Moreover, no protein acts in
isolation. Proteins form complexes and networks, delivering sophisticated functions as elegant
machineries. The expression and structures of proteins are also highly dynamic. The activities of
most, if not all, proteins are regulated by post-translational modifications, degradation and synthesis.
Despite its great potential in precision medicine, successful application of proteomics in medical
practice has yet to be accomplished.

2. Emerging large-scale clinical proteomic data sets using DDA and DIA

The high complexity of proteome structure, organization and dynamics poses a great technical
challenge to measuring a proteome in depth, precisely and reproducibly [12]. Clinical proteomics is
further complicated by the complexity of medical problems [1]. To obtain reliable insights from
proteomic studies addressing medical questions, acquisition of large-scale proteomic data sets from
multiple independent clinical cohorts is indispensable. In the clinical applications of genomic and
metabolomic profiling, several thousand samples from multiple cohorts have been analyzed in order
to obtain reliable medical conclusions [6].

Data-dependent acquisition (DDA) coupled with multi-dimensional fractionation is the most
widely adopted method for proteomics. Several consortium-supported studies, including those of the NCI
Clinical Proteomic Tumor Analysis Consortium and the Chinese Human Proteome Project Consortium, recently
reported in-depth proteomic analysis of 100-200 clinical tissues [10, 11, 13]. The Kuster group
also reported a microflow DDA-MS strategy that remained robust over 1550 consecutive
injections of various cell and clinical samples in about 40 days [14]. This proof-of-principle study
suggests that analysis of hundreds to thousands of clinical specimens using DDA-MS is becoming feasible in
individual laboratories.

In the meantime, a targeted data analysis strategy, initially developed for targeted proteomics, was
introduced to DIA-MS, specifically Sequential Window Acquisition of All Theoretical Mass Spectra
(SWATH-MS), which was implemented on TripleTOF mass spectrometers [15]. Most DIA-MS
experiments are performed in single shots, consuming no more than 1 microgram of peptides in total.
DIA-MS produces a permanent digital map containing all the flyable peptide precursors of a
specimen, serving as an ideal method for the proteome digitization of clinical specimens [16]. Over 500
FFPE tissue samples have been robustly analyzed using pressure cycling technology (PCT)-SWATH
in multiple laboratories [17]. Meanwhile, DIA-MS using Orbitrap technology is gaining
popularity in the field [18]. With a Q Exactive HF-X (QE-HFX), the Biognosys team identified over 10,000
proteins in a single DIA run after optimization of chromatography and spectral library using testis tissue digests
[18]. The same team also reported a 40 min microflow LC DIA method on an Orbitrap Fusion Lumos and
its application to the analysis of over 1,500 plasma samples, leading to quantification of 565 proteins with a
data completeness of 74% [19].
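
As an illustration of the data completeness metric quoted above, the sketch below treats completeness as the fraction of non-missing entries in a samples-by-proteins quantification matrix. This is an assumption made for illustration; the cited study may define completeness differently. The function name `data_completeness` and the toy matrix are hypothetical.

```python
def data_completeness(matrix):
    """Return the fraction of non-missing entries in a samples x proteins matrix.

    Missing quantifications are represented by None (an assumption of this sketch).
    """
    total = sum(len(row) for row in matrix)
    if total == 0:
        raise ValueError("matrix has no entries")
    observed = sum(1 for row in matrix for value in row if value is not None)
    return observed / total


# Toy example: 3 samples x 4 proteins, with 3 of the 12 values missing.
toy = [
    [10.2, None, 8.1, 7.7],
    [9.8, 6.3, None, 7.9],
    [10.0, 6.1, 8.4, None],
]
print(f"completeness = {data_completeness(toy):.0%}")
```

A completeness below 100% means that some proteins were not quantified in every sample, which matters later when such a matrix is fed to a learning algorithm.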

3. DDA or DIA for generating proteomic big data?

Huge volumes of proteomic data have been acquired and deposited in public repositories including
PRIDE [20], PeptideAtlas [21] and iProx [22]. These data were acquired with different sample preparation,
fractionation and LC-MS analysis strategies. The resultant data can hardly be utilized jointly to
address specific medical questions. Neither label-free nor label-based DDA-MS is likely to generate
consistent quantitative proteomic data from complex tissue samples in large clinical cohorts at high
throughput, owing to the need for multi-dimensional fractionation. Label-free quantification by DIA-MS
circumvents the reproducibility problem of data-dependent sampling in DDA-MS [23]. DIA records
complete fragment ion information and combines it with targeted analysis approaches, yielding more
comprehensive and consistent acquisition at the MS level. Several studies have shown that, with
the same LC gradient, DIA achieves greater depth in proteomic and phosphoproteomic studies [24, 25].
The Olsen lab reported identification of > 20,000 phosphopeptides using a 15 min LC gradient DIA method on
a QE-HFX with an optimized DIA configuration and a DIA-specific phosphorylation site localization
algorithm [25].

DIA/SWATH can be optimized to analyze tissues using a 15 min LC gradient
on advanced TripleTOF 6600 instruments [26]. This high-throughput method allowed us to perform
relatively large-scale quantitative proteomics studies to understand prostate cancer intra-tumor
heterogeneity [27] and to search for clinically relevant biomarkers in prostate cancer and diffuse large
B-cell lymphoma tissues [17]. Recently, the combination of timsTOF Pro and Evosep One allowed
identification of more than 2000 protein groups in a HeLa digest using a 4.8-minute ultra-high-throughput
diaPASEF method [28].
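
To see what gradient length implies for cohort-scale studies, the following back-of-the-envelope sketch converts a gradient length into injections per day and instrument days per cohort. The per-injection overhead (loading, washing, equilibration) is a hypothetical parameter that is not taken from the cited studies, and the function names are illustrative.

```python
def injections_per_day(gradient_min, overhead_min=0.0):
    """Injections that fit into 24 h for a given gradient and per-injection overhead (minutes)."""
    cycle_min = gradient_min + overhead_min
    if cycle_min <= 0:
        raise ValueError("cycle time must be positive")
    return 24 * 60 / cycle_min


def instrument_days(n_samples, gradient_min, overhead_min=0.0):
    """Uninterrupted instrument days needed to inject n_samples once each."""
    return n_samples / injections_per_day(gradient_min, overhead_min)


# 15 min gradient with an assumed 5 min overhead per injection.
print(injections_per_day(15, overhead_min=5))      # injections per day
print(instrument_days(2000, 15, overhead_min=5))   # days for a 2000-sample cohort

# For comparison, the microflow DDA study cited in the text reports
# 1550 injections in about 40 days.
print(1550 / 40)                                   # observed injections per day
```

Under these assumptions a single instrument could in principle process a cohort of a few thousand samples within weeks, although real campaigns also need time for QC injections, maintenance and repeats.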

Nevertheless, the abovementioned DIA data sets are not large enough, in terms of sample size, to be
called big data, nor are the numerous DDA data sets acquired in various studies. There remains
a shortage of big proteomic data sets generated from rigorously designed clinical studies with reproducible
sample preparation and LC-MS analysis pipelines. However, in this viewpoint, we anticipate that an increasing
number of large DIA data sets from clinical cohorts will be generated in the coming years by
individual laboratories equipped with high-throughput proteomics technologies and by industrialized
facilities [29].

4. High-throughput sample preparation for proteomics

High-throughput and reproducible extraction and digestion of proteins from clinical specimens are
prerequisites for generating high-quality proteomic big data. Multiple effective technologies have been
developed towards this goal. Since this viewpoint is not a comprehensive review, we list only a few
of them here because of limited space. Pressure cycling technology (PCT) can effectively
process FFPE samples into peptides ready for LC-MS analysis in ~3 h [16, 30]. Liquid handling robots can
process samples in 96-well plates. Mann's group developed a pipeline for plasma proteome
profiling in which the entire sample preparation procedure takes less than 2 h in 96-well plates [31].
Lately, they applied it to process microdissected FFPE samples in high throughput [32]. EasyPhos
was developed by Mann's group to enable parallel 96-well processing, simplifying and accelerating
phosphoproteomics workflows with optimized coverage and reproducibility of phosphorylation site
quantification [33].

5. Considerations of data quality

In large cohort studies, multiple steps can introduce batch effects, which may confound the
biological interpretation of the data [34]. The use of standard operating procedures (SOP) and quality
control (QC) samples for both sample preparation and MS data acquisition helps to minimize
batch effects [35, 36]. Given properly designed batches, batch effects can be evaluated and
minimized [34, 35]. Faster sample preparation and shorter LC gradients with microflow LC also reduce batch
effects [37].

6. Deep learning for proteomic big data

Many complex clinical problems have been well assisted by big data and deep learning technology,
or so-called artificial intelligence (AI). AI, literally machines possessing human
intelligence, was first described with the emergence of the first computers in the early 1950s [38]. As a basic
practice of AI, machine learning (ML) applies algorithms that train on previously acquired annotated data
sets to learn from the data and make informative predictions for a particular task. Conventional ML
techniques, such as decision tree learning, clustering, reinforcement learning, and Bayesian networks,
are oftentimes insufficient to learn compound functions from high-dimensional data space [39].
Although ML has been widely used in various biomedical scenarios, it frequently suffers from the
curse of dimensionality when the number of features exceeds the number of samples. Feature selection with
domain knowledge is required to transform the raw data before building a classifier or detecting patterns.

During the past decade, deep learning, or deep neural networks (DNN) [39], empowered by the
computing power of CPU clusters and graphics processing units (GPUs), has made breakthroughs in
dealing with data of sophisticated structure and high dimensionality. DL has demonstrated
performance superior to that of human beings in object and speech recognition. Compared with conventional
machine learning (ML), DL is autodidactic in that useful feature representations are learned automatically,
layer by layer, from the data itself. With annotated data available as input, three major types of
architectures have been implemented to connect neurons in DL: feed-forward, convolutional and recurrent
neural networks. Techniques such as regularization and dropout are adopted to avoid overfitting.
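
The two overfitting countermeasures named above can be illustrated in a few lines of plain Python. This is a sketch of the generic techniques (inverted dropout and an L2 weight penalty), not of any model used in the cited studies. The function names and the values of `drop_prob` and `lam` are illustrative assumptions.

```python
import random


def inverted_dropout(activations, drop_prob, rng, training=True):
    """Randomly zero activations during training and rescale the survivors.

    Rescaling by 1 / (1 - drop_prob) keeps the expected activation unchanged,
    so no correction is needed at inference time.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        return list(activations)
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]


def l2_penalty(weights, lam):
    """L2 regularization term: lam times the sum of squared weights."""
    return lam * sum(w * w for row in weights for w in row)


rng = random.Random(0)
hidden = [0.5, 1.2, 0.0, 2.0]
print(inverted_dropout(hidden, drop_prob=0.5, rng=rng))                  # training mode
print(inverted_dropout(hidden, drop_prob=0.5, rng=rng, training=False))  # inference mode
print(l2_penalty([[1.0, 2.0], [3.0, 4.0]], lam=0.1))                     # added to the loss
```

Both techniques limit how strongly a network can rely on any single feature, which is particularly relevant when proteins far outnumber samples.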

DL has displayed performance comparable to the gold standard, i.e. human experts, in
classifying medical images for clinical diagnosis. For instance, a DL model trained on 128,175 retinal
images was reported to diagnose diabetic retinopathy with an area under the curve
(AUC) of over 99% in the validation dataset [40]. Deep learning of 207,130 images of the retina from
4,686 patients led to the diagnosis of the most common blinding retinal diseases with an AUC of
99.9% [41].
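
For readers less familiar with the AUC metric used above, the sketch below computes it directly from its probabilistic definition: the chance that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are hypothetical and unrelated to the cited imaging studies.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: iterable of 0 (negative) or 1 (positive); scores: classifier outputs.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: three of the four positive-negative pairs are ranked correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise form is quadratic in the number of samples and is meant only to make the definition concrete. An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation.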

In theory, a sizeable quantitative matrix of injections by proteins, if available, could be used for
building a predictive model for biomarker discovery (Figure 1). To the best of our knowledge, no
published proteomic big data set allows effective DL for a specific medical question. While multiple
genome-scale proteomics studies have been published in the literature, very few studies have
consistently analyzed the thousands of clinical specimens that are arguably the minimal
requirement for DL. In a preprint manuscript, Sun et al. collected 1725 DIA maps for thyroid tissue
samples in a retrospective cohort and used them to build a classifier of 14 proteins with artificial
neural networks. This classifier achieved 91% accuracy in classifying malignant thyroid nodules in
retrospective samples of 271 patients, and 88% accuracy in prospective cohorts from four independent
centers of 62 patients [35]. The neural network is not very deep, though. An alternative methodology using
an end-to-end AI deep learning framework is being developed to construct a functional mapping from
raw MS data to phenotypic data. This not only avoids erroneous protein intensity values and missing
values during DIA data interpretation, but also substantially reduces the analysis time and cost of
proteomic big data-based diagnosis [42]. In both scenarios, the interpretability of complex DL models will
be a major challenge in translating the models into biological insights. In addition, proteomic data
acquired from independent cohorts are essential to rule out potential over-fitting. Finally, the
expression of individual proteins is not the only possible input for DL modeling. Increasing evidence shows
that protein complexes, pathways and networks also constitute promising input features [43] that can
potentially be fed into DL models.

This paper is not intended to discuss applications of DL for DIA spectra interpretation, which is
also an active research field as discussed in another viewpoint paper in this issue [44].


Figure 1. Generating proteomic big data for precision medicine. Diseases develop via
complex pathogenesis routes and manifest as medical problems. To solve medical problems,
multiple clinical cohorts are to be procured. Clinical specimens are collected for streamlined
high-throughput proteomic analysis. The resultant proteomic maps should arguably number in the
thousands before effective deep learning can be applied to provide solutions or clues for various
medical problems and insights into pathogenesis.

7. Conclusions

Precision medicine, confounded by inevitable real-world factors, is inherently more complex than
biology. The complexity of proteomes and clinical problems necessitates the generation of large-scale
proteomic data sets from multiple clinical cohorts to understand pathogenesis and to address medical
problems. Recent technological advances in sample preparation and LC-MS are enabling generation
of proteomic big data at an increasing pace. In particular, both DDA and DIA mass spectrometry exhibit
increased throughput in analyzing proteomes, while DIA seems to offer a higher degree of
reproducibility, throughput and information content than DDA. Increasing volumes of
DIA-based proteome maps are being produced, approaching the minimal requirement of deep
learning. Deep learning has been successfully applied to analyze big medical image data, facilitating
clinical diagnosis in recent years. We anticipate rapid accumulation of DIA-based proteomic big data
from clinical cohorts, which will build a solid basis for DL-based modeling of diseases. Future
efforts are required to address the technical obstacles to generating proteomic big data and to customize DL
modeling for the emerging proteomic big data. Upheavals in the design of large-scale clinical
proteomics studies might be required to take full advantage of the rapidly advancing high-throughput
proteomic methods and deep learning technology.

Acknowledgements

This work is supported by grants from the National Natural Science Foundation of China (81972492),
the National Natural Science Foundation of China for Young Scholars (21904107), the Zhejiang
Provincial Natural Science Foundation for Distinguished Young Scholars (LR19C050001), and the Hangzhou
Agriculture and Society Advancement Program (20190101A04).

Conflict of Interests

The research group of T.G. is supported by SCIEX, which provides access to prototype
instrumentation, and Pressure Biosciences Inc., which provides access to advanced sample
preparation instrumentation.

References

[1] J. L. Jameson, D. L. Longo, N Engl J Med 2015, 372, 2229.

[2] S. Loeb, M. A. Bjurlin, J. Nicholson, T. L. Tammela, D. F. Penson, H. B. Carter, P. Carroll, R. Etzioni, Eur
Urol 2014, 65, 1046.

[3] R. Ren, Nat Rev Cancer 2005, 5, 172.

[4] S. Abelson, G. Collord, S. W. K. Ng, O. Weissbrod, N. Mendelson Cohen, E. Niemeyer, N. Barda, P. C.
Zuzarte, L. Heisler, Y. Sundaravadanam, R. Luben, S. Hayat, T. T. Wang, Z. Zhao, I. Cirlan, T. J. Pugh, D.
Soave, K. Ng, C. Latimer, C. Hardy, K. Raine, D. Jones, D. Hoult, A. Britten, J. D. McPherson, M. Johansson,
F. Mbabaali, J. Eagles, J. K. Miller, D. Pasternack, L. Timms, P. Krzyzanowski, P. Awadalla, R. Costa, E.
Segal, S. V. Bratman, P. Beer, S. Behjati, I. Martincorena, J. C. Y. Wang, K. M. Bowles, J. R. Quirós, A.
Karakatsani, C. La Vecchia, A. Trichopoulou, E. Salamanca-Fernández, J. M. Huerta, A. Barricarte, R. C.
Travis, R. Tumino, G. Masala, H. Boeing, S. Panico, R. Kaaks, A. Krämer, S. Sieri, E. Riboli, P. Vineis, M.
Foll, J. McKay, S. Polidoro, N. Sala, K.-T. Khaw, R. Vermeulen, P. J. Campbell, E. Papaemmanuil, M. D.
Minden, A. Tanay, R. D. Balicer, N. J. Wareham, M. Gerstung, J. E. Dick, P. Brennan, G. S. Vassiliou, L. I.
Shlush, Nature 2018, 559, 400.

[5] D. Capper, D. T. W. Jones, M. Sill, V. Hovestadt, D. Schrimpf, D. Sturm, C. Koelsche, F. Sahm, L.
Chavez, D. E. Reuss, A. Kratz, A. K. Wefers, K. Huang, K. W. Pajtler, L. Schweizer, D. Stichel, A. Olar, N. W.
Engel, K. Lindenberg, P. N. Harter, A. K. Braczynski, K. H. Plate, H. Dohmen, B. K. Garvalov, R. Coras, A.
Holsken, E. Hewer, M. Bewerunge-Hudler, M. Schick, R. Fischer, R. Beschorner, J. Schittenhelm, O.
Staszewski, K. Wani, P. Varlet, M. Pages, P. Temming, D. Lohmann, F. Selt, H. Witt, T. Milde, O. Witt, E.
Aronica, F. Giangaspero, E. Rushing, W. Scheurlen, C. Geisenberger, F. J. Rodriguez, A. Becker, M. Preusser,
C. Haberler, R. Bjerkvig, J. Cryan, M. Farrell, M. Deckert, J. Hench, S. Frank, J. Serrano, K. Kannan, A.
Tsirigos, W. Bruck, S. Hofer, S. Brehmer, M. Seiz-Rosenhagen, D. Hanggi, V. Hans, S. Rozsnoki, J. R.
Hansford, P. Kohlhof, B. W. Kristensen, M. Lechner, B. Lopes, C. Mawrin, R. Ketter, A. Kulozik, Z. Khatib, F.
Heppner, A. Koch, A. Jouvet, C. Keohane, H. Muhleisen, W. Mueller, U. Pohl, M. Prinz, A. Benner, M.
Zapatka, N. G. Gottardo, P. H. Driever, C. M. Kramm, H. L. Muller, S. Rutkowski, K. von Hoff, M. C.
Fruhwald, A. Gnekow, G. Fleischhack, S. Tippelt, G. Calaminus, C. M. Monoranu, A. Perry, C. Jones, T. S.
Jacques, B. Radlwimmer, M. Gessi, T. Pietsch, J. Schramm, G. Schackert, M. Westphal, G. Reifenberger, P.
Wesseling, M. Weller, V. P. Collins, I. Blumcke, M. Bendszus, J. Debus, A. Huang, N. Jabado, P. A. Northcott,
W. Paulus, A. Gajjar, G. W. Robinson, M. D. Taylor, Z. Jaunmuktane, M. Ryzhova, M. Platten, A. Unterberg,
W. Wick, M. A. Karajannis, M. Mittelbronn, T. Acker, C. Hartmann, K. Aldape, U. Schuller, R. Buslei, P.
Lichter, M. Kool, C. Herold-Mende, D. W. Ellison, M. Hasselblatt, M. Snuderl, S. Brandner, A. Korshunov, A.
von Deimling, S. M. Pfister, Nature 2018, 555, 469.

[6] A. Ivey, R. K. Hills, M. A. Simpson, J. V. Jovanovic, A. Gilkes, A. Grech, Y. Patel, N. Bhudia, H. Farah,
J. Mason, K. Wall, S. Akiki, M. Griffiths, E. Solomon, F. McCaughan, D. C. Linch, R. E. Gale, P. Vyas, S. D.
Freeman, N. Russell, A. K. Burnett, D. Grimwade, U. K. N. C. R. I. A. W. Group, N Engl J Med 2016, 374,
422; J. Deelen, J. Kettunen, K. Fischer, A. van der Spek, S. Trompet, G. Kastenmuller, A. Boyd, J. Zierer, E. B.
van den Akker, M. Ala-Korpela, N. Amin, A. Demirkan, M. Ghanbari, D. van Heemst, M. A. Ikram, J. B. van
Klinken, S. P. Mooijaart, A. Peters, V. Salomaa, N. Sattar, T. D. Spector, H. Tiemeier, A. Verhoeven, M.
Waldenberger, P. Wurtz, G. Davey Smith, A. Metspalu, M. Perola, C. Menni, J. M. Geleijnse, F. Drenos, M.
Beekman, J. W. Jukema, C. M. van Duijn, P. E. Slagboom, Nat Commun 2019, 10, 3346.

[7] K. Tomczak, P. Czerwińska, M. Wiznerowicz, Contemp Oncol (Pozn) 2015, 19, A68.

[8] Y. Liu, A. Beyer, R. Aebersold, Cell 2016, 165, 535.

[9] H. Rodriguez, S. R. Pennington, Cell 2018, 173, 535.

[10] D. J. Clark, S. M. Dhanasekaran, F. Petralia, J. Pan, X. Song, Y. Hu, F. da Veiga Leprevost, B. Reva, T. M.
Lih, H. Y. Chang, W. Ma, C. Huang, C. J. Ricketts, L. Chen, A. Krek, Y. Li, D. Rykunov, Q. K. Li, L. S. Chen,
U. Ozbek, S. Vasaikar, Y. Wu, S. Yoo, S. Chowdhury, M. A. Wyczalkowski, J. Ji, M. Schnaubelt, A. Kong, S.
Sethuraman, D. M. Avtonomov, M. Ao, A. Colaprico, S. Cao, K. C. Cho, S. Kalayci, S. Ma, W. Liu, K.
Ruggles, A. Calinawan, Z. H. Gumus, D. Geiszler, E. Kawaler, G. C. Teo, B. Wen, Y. Zhang, S. Keegan, K. Li,
F. Chen, N. Edwards, P. M. Pierorazio, X. S. Chen, C. P. Pavlovich, A. A. Hakimi, G. Brominski, J. J. Hsieh, A.
Antczak, T. Omelchenko, J. Lubinski, M. Wiznerowicz, W. M. Linehan, C. R. Kinsinger, M. Thiagarajan, E. S.
Boja, M. Mesri, T. Hiltke, A. I. Robles, H. Rodriguez, J. Qian, D. Fenyo, B. Zhang, L. Ding, E. Schadt, A. M.
Chinnaiyan, Z. Zhang, G. S. Omenn, M. Cieslik, D. W. Chan, A. I. Nesvizhskii, P. Wang, H. Zhang, C. Clinical
Proteomic Tumor Analysis, Cell 2019, 179, 964.

[11] Y. Jiang, A. Sun, Y. Zhao, W. Ying, H. Sun, X. Yang, B. Xing, W. Sun, L. Ren, B. Hu, C. Li, L. Zhang, G.
Qin, M. Zhang, N. Chen, M. Zhang, Y. Huang, J. Zhou, Y. Zhao, M. Liu, X. Zhu, Y. Qiu, Y. Sun, C. Huang, M.
Yan, M. Wang, W. Liu, F. Tian, H. Xu, J. Zhou, Z. Wu, T. Shi, W. Zhu, J. Qin, L. Xie, J. Fan, X. Qian, F. He,
C. Chinese Human Proteome Project, Nature 2019, 567, 257.

[12] R. Aebersold, M. Mann, Nature 2016, 537, 347.

[13] Q. Gao, H. Zhu, L. Dong, W. Shi, R. Chen, Z. Song, C. Huang, J. Li, X. Dong, Y. Zhou, Q. Liu, L. Ma, X.
Wang, J. Zhou, Y. Liu, E. Boja, A. I. Robles, W. Ma, P. Wang, Y. Li, L. Ding, B. Wen, B. Zhang, H.
Rodriguez, D. Gao, H. Zhou, J. Fan, Cell 2019, 179, 561; S. Vasaikar, C. Huang, X. Wang, V. A. Petyuk, S. R.
Savage, B. Wen, Y. Dou, Y. Zhang, Z. Shi, O. A. Arshad, M. A. Gritsenko, L. J. Zimmerman, J. E. McDermott,
T. R. Clauss, R. J. Moore, R. Zhao, M. E. Monroe, Y. T. Wang, M. C. Chambers, R. J. C. Slebos, K. S. Lau, Q.
Mo, L. Ding, M. Ellis, M. Thiagarajan, C. R. Kinsinger, H. Rodriguez, R. D. Smith, K. D. Rodland, D. C.
Liebler, T. Liu, B. Zhang, C. Clinical Proteomic Tumor Analysis, Cell 2019, 177, 1035.

[14] Y. Bian, R. Zheng, F. P. Bayer, C. Wong, Y. C. Chang, C. Meng, D. P. Zolg, M. Reinecke, J. Zecha, S.
Wiechmann, S. Heinzlmeir, J. Scherr, B. Hemmer, M. Baynham, A. C. Gingras, O. Boychenko, B. Kuster, Nat
Commun 2020, 11, 157.

[15] L. C. Gillet, P. Navarro, S. Tate, H. Rost, N. Selevsek, L. Reiter, R. Bonner, R. Aebersold, Mol Cell
Proteomics 2012, 11, O111 016717; L. C. Gillet, A. Leitner, R. Aebersold, Annual Review of Analytical
Chemistry 2016, 9, 449.

[16] T. Guo, P. Kouvonen, C. C. Koh, L. C. Gillet, W. E. Wolski, H. L. Rost, G. Rosenberger, B. C. Collins, L.
C. Blum, S. Gillessen, M. Joerger, W. Jochum, R. Aebersold, Nat Med 2015, 21, 407.

[17] Y. Zhu, T. Weiss, Q. Zhang, R. Sun, B. Wang, Z. Wu, Q. Zhong, X. Yi, H. Gao, X. Cai, G. Ruan, T. Zhu,
C. Xu, S. Lou, X. Yu, L. Gillet, P. Blattmann, K. Saba, C. D. Fankhauser, M. B. Schmid, D. Rutishauser, J.
Ljubicic, A. Christiansen, C. Fritz, N. J. Rupp, C. Poyet, E. Rushing, M. Weller, P. Roth, E. Haralambieva, S.
Hofer, C. Chen, W. Jochum, X. Gao, X. Teng, L. Chen, P. J. Wild, R. Aebersold, T. Guo, bioRxiv 2019,
667394.

[18] J. Muntel, T. Gandhi, L. Verbeke, O. M. Bernhardt, T. Treiber, R. Bruderer, L. Reiter, Mol Omics 2019,
15, 348.

[19] R. Bruderer, J. Muntel, S. Muller, O. M. Bernhardt, T. Gandhi, O. Cominetti, C. Macron, J. Carayol, O.
Rinner, A. Astrup, W. H. M. Saris, J. Hager, A. Valsesia, L. Dayon, L. Reiter, Mol Cell Proteomics 2019, 18,
1242.

[20] Y. Perez-Riverol, A. Csordas, J. Bai, M. Bernal-Llinares, S. Hewapathirana, D. J. Kundu, A. Inuganti, J.
Griss, G. Mayer, M. Eisenacher, E. Pérez, J. Uszkoreit, J. Pfeuffer, T. Sachsenberg, S. Yilmaz, S. Tiwary, J.
Cox, E. Audain, M. Walzer, A. F. Jarnuczak, T. Ternent, A. Brazma, J. A. Vizcaíno, Nucleic Acids Res 2019,
47, D442.

[21] U. Kusebauch, E. W. Deutsch, D. S. Campbell, Z. Sun, T. Farrah, R. L. Moritz, Curr Protoc Bioinformatics
2014, 46, 13 25 1.

[22] J. Ma, T. Chen, S. Wu, C. Yang, M. Bai, K. Shu, K. Li, G. Zhang, Z. Jin, F. He, H. Hermjakob, Y. Zhu,
Nucleic Acids Res 2019, 47, D1211.

[23] K. Barkovits, S. Pacharra, K. Pfeiffer, S. Steinbach, M. Eisenacher, K. Marcus, J. Uszkoreit, Molecular &
cellular proteomics : MCP 2020, 19, 181; C. Fernández-Costa, S. Martínez-Bartolomé, D. B. McClatchy, A. J.
Saviola, N.-K. Yu, J. R. Yates, Journal of Proteome Research 2020.

[24] R. Bruderer, O. M. Bernhardt, T. Gandhi, S. M. Miladinovic, L. Y. Cheng, S. Messner, T. Ehrenberger, V.
Zanotelli, Y. Butscheid, C. Escher, O. Vitek, O. Rinner, L. Reiter, Mol Cell Proteomics 2015, 14, 1400.

[25] D. B. Bekker-Jensen, O. M. Bernhardt, A. Hogrebe, A. Martinez-Val, L. Verbeke, T. Gandhi, C. D.
Kelstrup, L. Reiter, J. V. Olsen, Nat Commun 2020, 11, 787.

[26] R. Sun, C. Hunter, C. Chen, W. Ge, N. Morrice, S. Liang, T. Zhu, C. Yuan, G. Ruan, Q. Zhang, X. Cai, X.
Yu, L. Chen, S. Dai, Z. Luan, R. Aebersold, Y. Zhu, T. Guo, J Proteome Res 2020.

[27] T. Guo, L. Li, Q. Zhong, N. J. Rupp, K. Charmpi, C. E. Wong, U. Wagner, J. H. Rueschoff, W. Jochum, C.
D. Fankhauser, K. Saba, C. Poyet, P. J. Wild, R. Aebersold, A. Beyer, Life Sci Alliance 2018, 1.

[28] T. K. Stephanie Kaspar-Schoenefeld , Markus Lubeck, Oliver Rather, Gary Kruppa, Nicolai Bache, Dorte
B. Bekker-Jensen, 2020.

[29] B. Tully, R. L. Balleine, P. G. Hains, Q. Zhong, R. R. Reddel, P. J. Robinson, Proteomics 2019, 19,
e1900109.

[30] H. Gao, F. Zhang, S. Liang, Q. Zhang, M. Lyu, L. Qian, W. Liu, W. Ge, C. Chen, X. Yi, J. Zhu, C. Lu, P.
Sun, K. Liu, Y. Zhu, T. Guo, J Proteome Res 2020, 19, 1982; N. Lucas, A. B. Robinson, M. Marcker Espersen,
S. Mahboob, D. Xavier, J. Xue, R. L. Balleine, A. deFazio, P. G. Hains, P. J. Robinson, Journal of Proteome
Research 2019, 18, 399.

[31] P. E. Geyer, N. A. Kulak, G. Pichler, L. M. Holdt, D. Teupser, M. Mann, Cell Syst 2016, 2, 185.

[32] F. Coscia, S. Doll, J. M. Bech, A. Mund, E. Lengyel, J. Lindebjerg, G. I. Madsen, J. M. A. Moreira, M.
Mann, bioRxiv 2019, 779009.

[33] S. J. Humphrey, O. Karayel, D. E. James, M. Mann, Nat Protoc 2018, 13, 1897.

[34] W. W. B. Goh, W. Wang, L. Wong, Trends in Biotechnology 2017, 35, 498.

[35] Y. Sun, S. Selvarajan, Z. Zang, W. Liu, Y. J. Zhu, H. Zhang, H. Chen, X. Cai, H. Gao, Z. Wu, L. Chen, X.
Teng, Y. Zhao, S. Mantoo, T. K.-H. Lim, B. Hariraman, S. Yeow, S. M. F. Syed Abdillah, S. S. Lee, G. Ruan,
Q. Zhang, T. Zhu, W. Wang, G. Wang, J. Xiao, Y. He, Z. Wang, W. Sun, Y. Qin, Q. Xiao, X. Zheng, L. Wang,
X. Zheng, K. Xu, Y. Shao, K. Liu, S. Zheng, R. Aebersold, S. Z. Li, O. L. Kon, N. G. Iyer, T. Guo, medRxiv
2020, 2020.04.09.20059741.

[36] B. Shen, X. Yi, Y. Sun, X. Bi, J. Du, C. Zhang, S. Quan, F. Zhang, R. Sun, L. Qian, W. Ge, W. Liu, S.
Liang, H. Chen, Y. Zhang, J. Li, J. Xu, Z. He, B. Chen, J. Wang, H. Yan, Y. Zheng, D. Wang, J. Zhu, Z. Kong,
Z. Kang, X. Liang, X. Ding, G. Ruan, N. Xiang, X. Cai, H. Gao, L. Li, S. Li, Q. Xiao, T. Lu, Y. Zhu, H. Liu, H.
Chen, T. Guo, Cell 2020, 182, 59.

[37] R. Sun, C. Hunter, C. Chen, W. Ge, N. Morrice, S. Liang, C. Yuan, Q. Zhang, X. Cai, X. Yu, L. Chen, S.
Dai, Z. Luan, R. Aebersold, Y. Zhu, T. Guo, bioRxiv 2019, 675348.

[38] A. M. Turing, Mind 1950, 59, 433.

[39] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT Press 2016.

[40] V. Gulshan, L. Peng, M. Coram, M. C. Stumpe, D. Wu, A. Narayanaswamy, S. Venugopalan, K. Widner,
T. Madams, J. Cuadros, R. Kim, R. Raman, P. C. Nelson, J. L. Mega, D. R. Webster, JAMA 2016, 316, 2402.

[41] D. S. Kermany, M. Goldbaum, W. Cai, C. C. S. Valentim, H. Liang, S. L. Baxter, A. McKeown, G. Yang,
X. Wu, F. Yan, J. Dong, M. K. Prasadha, J. Pei, M. Y. L. Ting, J. Zhu, C. Li, S. Hewett, J. Dong, I. Ziyar, A.
Shi, R. Zhang, L. Zheng, R. Hou, W. Shi, X. Fu, Y. Duan, V. A. N. Huu, C. Wen, E. D. Zhang, C. L. Zhang, O.
Li, X. Wang, M. A. Singer, X. Sun, J. Xu, A. Tafreshi, M. A. Lewis, H. Xia, K. Zhang, Cell 2018, 172, 1122.

[42] F. Zhang, S. Yu, L. Wu, Z. Zang, X. Yi, J. Zhu, C. Lu, P. Sun, Y. Sun, S. Selvarajan, L. Chen, X. Teng, Y.
Zhao, G. Wang, J. Xiao, S. Huang, O. L. Kon, N. G. Iyer, S. Z. Li, Z. Luan, T. Guo, bioRxiv 2020,
2020.03.05.978635.

[43] V. Emilsson, M. Ilkov, J. R. Lamb, N. Finkel, E. F. Gudmundsson, R. Pitts, H. Hoover, V.
Gudmundsdottir, S. R. Horman, T. Aspelund, L. Shu, V. Trifonov, S. Sigurdsson, A. Manolescu, J. Zhu, O.
Olafsson, J. Jakobsdottir, S. A. Lesley, J. To, J. Zhang, T. B. Harris, L. J. Launer, B. Zhang, G. Eiriksdottir, X.
Yang, A. P. Orth, L. L. Jennings, V. Gudnason, Science 2018, 361, 769.

[44] L. Xu, Yong, A., Zhou, A., Rost, H., Proteomics 2020, doi: 10.1002/pmic.201900352.