Plant Proteomics: Methods and Protocols

Methods in
Molecular Biology 2139
Jesus V. Jorrin-Novo
Luis Valledor
Mari Angeles Castillejo
Maria-Dolores Rey Editors
Plant
Proteomics
Methods and Protocols
Third Edition
METHODS IN MOLECULAR BIOLOGY
Series Editor
John M. Walker
School of Life and Medical Sciences
University of Hertfordshire
Hatfield, Hertfordshire, UK
For further volumes:
http://www.springer.com/series/7651
For over 35 years, biological scientists have come to rely on the research protocols and
methodologies in the critically acclaimed Methods in Molecular Biology series. The series was
the first to introduce the step-by-step protocols approach that has become the standard in all
biomedical protocol publishing. Each protocol is provided in readily-reproducible step-by-
step fashion, opening with an introductory overview, a list of the materials and reagents
needed to complete the experiment, and followed by a detailed procedure that is supported
with a helpful notes section offering tips and tricks of the trade as well as troubleshooting
advice. These hallmark features were introduced by series editor Dr. John Walker and
constitute the key ingredient in each and every volume of the Methods in Molecular Biology
series. Tested and trusted, comprehensive and reliable, all protocols from the series are
indexed in PubMed.
Plant Proteomics
Methods and Protocols
Third Edition
Edited by
Jesus V. Jorrin-Novo
Agroforestry and Plant Biochemistry, Proteomics and Systems Biology, Department of Biochemistry
and Molecular Biology, University of Cordoba, Cordoba, Spain
Luis Valledor
Department of Organisms and Systems Biology, Institute of Biotechnology of Asturias, University
of Oviedo, Oviedo, Asturias, Spain
Mari Angeles Castillejo
Agroforestry and Plant Biochemistry, Proteomics and Systems Biology, Department of Biochemistry
and Molecular Biology, University of Cordoba UCO-CeiA3, Cordoba, Cordoba, Spain
Maria-Dolores Rey
Agroforestry and Plant Biochemistry, Proteomics and Systems Biology, Department of Biochemistry
and Molecular Biology, University of Cordoba, Cordoba, Spain
ISSN 1064-3745 ISSN 1940-6029 (electronic)

Methods in Molecular Biology
ISBN 978-1-0716-0527-1 ISBN 978-1-0716-0528-8 (eBook)
https://doi.org/10.1007/978-1-0716-0528-8
© Springer Science+Business Media, LLC, part of Springer Nature 2020

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction
on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation,
computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply,
even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations
and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to
be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty,
expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been
made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This Humana imprint is published by the registered company Springer Science+Business Media, LLC, part of Springer
Nature.
The registered company address is: 1 New York Plaza, New York, NY 10004, U.S.A.
Preface
You now have in your hands the third edition of Plant Proteomics: Methods and Protocols,
preceded by the first edition in 2007 (M. Zivy, C. Damerval, and V. Mechin, eds.) and the
second one in 2014 (J. V. Jorrin Novo, S. Komatsu, W. Weckwerth, and S. Wienkoop, eds.).
The success of the previous editions and the continuous advances and improvements in
proteomic techniques, equipment, and bioinformatics tools, and their uses in basic and
translational plant biology research, that have occurred in the past 5 years encouraged Humana
Press to prepare a new updated version. Under the title Advances in Proteomics Techniques,
Data Validation, and Integration with Other Classic and -Omics Approaches in the Systems
Biology Direction, it contains 29 chapters written by internationally recognized scientists.
The monograph, which starts with an introductory chapter (Chapter 1), is a compilation
of protocols commonly employed in plant biology research. They show recent advances at all
workflow stages, starting from the laboratory (tissue and cell fractionation, protein extraction,
depletion, purification, separation, MS analysis, quantification) and ending on the
computer (algorithms for protein identification and quantification, bioinformatics tools for
data analysis, databases and repositories).
Out of the 29 chapters, 6 are devoted to descriptive proteomics, with a special emphasis
on subcellular protein profiling (Chapters 5–10), 6 to PTMs (Chapters 11 and 14–18), 3 to
protein interactions (Chapters 19–21), and 2 to specific proteins: peroxidases (Chapter 24)
and proteases and protease inhibitors (Chapter 26). The book reflects the new trajectory in
MS-based protein identification and quantification, moving from the classic gel-based
approaches to the most recent labeling (Chapters 10, 11, 29), shotgun (Chapters 5, 7,
12, 15), parallel reaction monitoring (Chapter 16), and targeted data acquisition
(Chapter 13). MS imaging (Chapter 25), the only in vivo MS-based proteomics strategy,
is far from being fully optimized and exploited in plant biology research. Confident protein
identification and quantitation, especially in orphan species and for low-abundance proteins,
is still challenging (Chapters 4, 28).
This edition also offers a novel perspective on the proteomics approach by
describing different protocols for proteomics data validation and integration with
other classic and -omics approaches in the systems biology direction. Chapter 2 reports on
the extraction of different biomolecules (nucleic acids, proteins, and metabolites) in a
single experiment. Chapter 27 describes how metabolic pathways can be reconstructed
from multiple -omics data, and Chapter 3 covers network building. Finally, Chapters
22 and 23 deal with, respectively, the search for allele-specific proteins and proteogenomics.
Keeping in mind the history and evolution of proteomics, it is quite probable that the
fourth edition will be published in a few years, as we are still at the beginning of deciphering
the plant proteome to understand the central dogma of molecular biology in terms of
proteins and to exploit the potential of the technique for translational purposes.
Cordoba, Spain Jesus V. Jorrin-Novo

Oviedo, Spain Luis Valledor
Cordoba, Spain Mari Angeles Castillejo
Cordoba, Spain Maria-Dolores Rey
Contents
Preface
Contributors
1 What Is New in (Plant) Proteomics Methods and Protocols: The 2015–2019 Quinquennium
2 Multiple Biomolecule Isolation Protocol Compatible with Mass Spectrometry and Other High-Throughput Analyses in Microalgae
Francisco Colina, María Carbó, Ana Álvarez, Mónica Meijón, María Jesús Cañal, and Luis Valledor
3 Protein Interaction Networks: Functional and Statistical Approaches
Mónica Escandón, Laura Lamelas, Víctor Roces, Víctor M. Guerrero-Sanchez, Mónica Meijón, and Luis Valledor
4 Specific Protein Database Creation from Transcriptomics Data in Nonmodel Species: Holm Oak (Quercus ilex L.)
Víctor M. Guerrero-Sanchez, Ana M. Maldonado-Alconada, Rosa Sánchez-Lucas, and Maria-Dolores Rey
5 Subcellular Proteomics in Conifers: Purification of Nuclei and Chloroplast Proteomes
Laura Lamelas, Lara García, María Jesús Cañal, and Mónica Meijón
6 Apoplastic Fluid Preparation from Arabidopsis thaliana Leaves Upon Interaction with a Nonadapted Powdery Mildew Pathogen
Ryohei Thomas Nakano, Nobuaki Ishihama, Yiming Wang, Junpei Takagi, Tomohiro Uemura, Paul Schulze-Lefert, and Hirofumi Nakagami
7 Shotgun Proteomics of Plant Plasma Membrane and Microdomain Proteins Using Nano-LC-MS/MS
Daisuke Takahashi, Bin Li, Takato Nakayama, Yukio Kawamura, and Matsuo Uemura
8 A Protocol for the Plasma Membrane Proteome Analysis of Rice Leaves
Ravi Gupta, Yu-Jin Kim, and Sun Tae Kim
9 Isolation, Purity Assessment, and Proteomic Analysis of Endoplasmic Reticulum
Xin Wang and Setsuko Komatsu
10 Dimethyl Labeling-Based Quantitative Proteomics of Recalcitrant Cocoa Pod Tissue
Yoel Esteve-Sánchez, Jaime A. Morante-Carriel, Ascensión Martínez-Márquez, Susana Sellés-Marchart, and Roque Bru-Martinez
11 Quantitative Profiling of Protein Abundance and Phosphorylation State in Plant Tissues Using Tandem Mass Tags
Gaoyuan Song, Christian Montes, and Justin W. Walley
12 Optimizing Shotgun Proteomics Analysis for a Confident Protein Identification and Quantitation in Orphan Plant Species: The Case of Holm Oak (Quercus ilex)
Isabel Gómez-Gálvez, Rosa Sánchez-Lucas, Bonoso San-Eufrasio, Luis Enrique Rodríguez de Francisco, Ana M. Maldonado-Alconada, Carlos Fuentes-Almagro, and Mari Angeles Castillejo
13 Combining Targeted and Untargeted Data Acquisition to Enhance Quantitative Plant Proteomics Experiments
Gene Hart-Smith
14 A Phosphoproteomic Analysis Pipeline for Peels of Tropical Fruits
Janet Juarez-Escobar, José M. Elizalde-Contreras, Víctor M. Loyola-Vargas, and Eliel Ruiz-May
15 Label-Free Quantitative Phosphoproteomics for Algae
Megan M. Ford, Sheldon R. Lawrence II, Emily G. Werth, Evan W. McConnell, and Leslie M. Hicks
16 Targeted Quantification of Phosphopeptides by Parallel Reaction Monitoring (PRM)
Sara Christina Stolze and Hirofumi Nakagami
17 Enrichment of N-Linked Glycopeptides and Their Identification by Complementary Fragmentation Techniques
Eduardo Antonio Ramirez-Rodriguez and Joshua L. Heazlewood
18 High-Resolution Lysine Acetylome Profiling by Offline Fractionation and Immunoprecipitation
Jonas Giese, Ines Lassowskat, and Iris Finkemeier
19 A Versatile Workflow for the Identification of Protein–Protein Interactions Using GFP-Trap Beads and Mass Spectrometry-Based Label-Free Quantification
Guillaume Née, Priyadarshini Tilak, and Iris Finkemeier
20 In Vivo Cross-Linking to Analyze Transient Protein–Protein Interactions
Heidi Pertl-Obermeyer and Gerhard Obermeyer
21 Proteome Analysis of 14-3-3 Targets in Tomato Fruit Tissues
Yongming Luo, Yu Lu, Junji Yamaguchi, and Takeo Sato
22 The Use of Proteomics in Search of Allele-Specific Proteins in (Allo)polyploid Crops
Sebastien Christian Carpentier
23 Methods for Optimization of Protein Extraction and Proteogenomic Mapping in Sweet Potato
Thualfeqar Al-Mohanna, Norbert T. Bokros, Nagib Ahsan, George V. Popescu, and Sorina C. Popescu
24 In Silico Analysis of Class III Peroxidases: Hypothetical Structure, Ligand Binding Sites, Posttranslational Modifications, and Interaction with Substrates
Sabine Lüthje and Kalaivani Ramanathan
25 MALDI Mass Spectrometry Imaging of Peptides in Medicago truncatula Root Nodules
Caitlin Keller, Erin Gemperline, and Lingjun Li
26 Cystatin Activity–Based Protease Profiling to Select Protease Inhibitors Useful in Plant Protection
Marie-Claire Goulet, Frank Sainsbury, and Dominique Michaud
27 A Pipeline for Metabolic Pathway Reconstruction in Plant Orphan Species
Cristina López-Hidalgo, Mónica Escandón, Luis Valledor, and Jesus V. Jorrin-Novo
28 Detection of Plant Low-Abundance Proteins by Means of Combinatorial Peptide Ligand Library Methods
Egisto Boschetti and Pier Giorgio Righetti
29 iTRAQ-Based Proteomic Analysis of Rice Grains
Marouane Baslam, Kentaro Kaneko, and Toshiaki Mitsui
Index
Contributors
NAGIB AHSAN • Division of Biology and Medicine, COBRE Center for Cancer Research
Development, Proteomics Core Facility, Rhode Island, USA Hospital, Providence, Brown
University, Providence, RI, USA; Division of Biology and Medicine, Brown University,
Providence, RI, USA
THUALFEQAR AL-MOHANNA • Department of Biochemistry, Molecular Biology, Entomology,
and Plant Pathology, Mississippi State University, Mississippi State, MS, USA
ANA ÁLVAREZ • Plant Physiology, Department of Organisms and Systems Biology and
University Institute of Biotechnology (IUBA), University of Oviedo, Oviedo, Spain
MAROUANE BASLAM • Department of Biochemistry, Faculty of Agriculture, Niigata
University, Niigata, Japan
NORBERT T. BOKROS • Department of Biochemistry, Molecular Biology, Entomology, and
Plant Pathology, Mississippi State University, Mississippi State, MS, USA
EGISTO BOSCHETTI • Scientific Consultant, JAM Conseil, Neuilly-sur-Seine, France
ROQUE BRU-MARTINEZ • Plant Proteomics and Functional Genomics Group, Department of
Agrochemistry and Biochemistry. Faculty of Sciences, University of Alicante, Alicante,
Spain
MARÍA JESÚS CAÑAL • Plant Physiology, Department of Organisms and Systems Biology and
MARÍA CARBÓ • Plant Physiology, Department of Organisms and Systems Biology and
SEBASTIEN CHRISTIAN CARPENTIER • SYBIOMA: Facility for Systems Biology-Based Mass
Spectrometry, KULeuven, Leuven, Belgium; Bioversity International, Genetic Resources,
Leuven, Belgium
MARI ANGELES CASTILLEJO • Agroforestry and Plant Biochemistry, Proteomics and Systems
Biology, Department of Biochemistry and Molecular Biology, University of Cordoba, UCO-
CeiA3, Cordoba, Spain
FRANCISCO COLINA • Plant Physiology, Department of Organisms and Systems Biology and
LUIS ENRIQUE RODRÍGUEZ DE FRANCISCO • Laboratorio de Biología, Instituto Tecnológico de
Santo Domingo, Santo Domingo, República Dominicana
JOSÉ M. ELIZALDE-CONTRERAS • Red de Estudios Moleculares Avanzados, Clúster Científico y
Tecnológico BioMimic®, Instituto de Ecología A.C. (INECOL), Veracruz, Mexico
MÓNICA ESCANDÓN • Agroforestry and Plant Biochemistry, Proteomics and Systems Biology,
Department of Biochemistry and Molecular Biology, University of Cordoba, UCO-CeiA3,
Cordoba, Spain
YOEL ESTEVE-SÁNCHEZ • Plant Proteomics and Functional Genomics Group, Department of
Agrochemistry and Biochemistry. Faculty of Sciences, University of Alicante, Alicante,
Spain
IRIS FINKEMEIER • Plant Physiology, Institute of Plant Biology and Biotechnology, University
of Münster, Münster, Germany
MEGAN M. FORD • Department of Chemistry, University of North Carolina at Chapel Hill,
Chapel Hill, NC, USA
CARLOS FUENTES-ALMAGRO • Proteomics Facility, SCAI, University of Cordoba, Cordoba,
Spain
LARA GARCÍA • Plant Physiology, Department of Organisms and Systems Biology and
ERIN GEMPERLINE • Department of Chemistry, University of Wisconsin-Madison, Madison,
WI, USA
JONAS GIESE • Plant Physiology, Institute of Plant Biology and Biotechnology, University of
Münster, Münster, Germany
ISABEL GÓMEZ-GÁLVEZ • Agroforestry and Plant Biochemistry, Proteomics and Systems
MARIE-CLAIRE GOULET • Centre de Recherche et d’Innovation sur les Végétaux, Université
Laval, Québec, QC, Canada
VÍCTOR M. GUERRERO-SANCHEZ • Agroforestry and Plant Biochemistry, Proteomics and
Systems Biology, Department of Biochemistry and Molecular Biology, University of Cordoba,
UCO-CeiA3, Cordoba, Spain
RAVI GUPTA • Department of Plant Biosciences, Life and Energy Convergence Research
Institute, Pusan National University, Miryang, South Korea
GENE HART-SMITH • Department of Molecular Sciences, Macquarie University, Sydney,
NSW, Australia
JOSHUA L. HEAZLEWOOD • School of BioSciences, The University of Melbourne, Parkville, VIC,
Australia
LESLIE M. HICKS • Department of Chemistry, University of North Carolina at Chapel Hill,
NOBUAKI ISHIHAMA • RIKEN Center for Sustainable Resource Science, Yokohama, Japan
JESUS V. JORRIN-NOVO • Agroforestry and Plant Biochemistry, Proteomics and Systems
JANET JUAREZ-ESCOBAR • Red de Estudios Moleculares Avanzados, Clúster Científico y
Tecnológico BioMimic®, Instituto de Ecología A.C. (INECOL), Veracruz, Mexico
KENTARO KANEKO • Graduate School of Science and Technology, Niigata University, Niigata,
Japan
YUKIO KAWAMURA • United Graduate School of Agricultural Sciences, Iwate University,
Morioka, Japan; Department of Plant-bioscience, Faculty of Agriculture, Iwate University,
Morioka, Japan
CAITLIN KELLER • Department of Chemistry, University of Wisconsin-Madison, Madison,
WI, USA
SUN TAE KIM • Department of Plant Biosciences, Life and Energy Convergence Research
Institute, Pusan National University, Miryang, South Korea
YU-JIN KIM • Graduate School of Biotechnology and Crop Biotech Institute, Kyung Hee
University, Yongin, South Korea
SETSUKO KOMATSU • Faculty of Environmental and Information Sciences, Fukui University
of Technology, Fukui, Japan
LAURA LAMELAS • Plant Physiology, Department of Organisms and Systems Biology and
INES LASSOWSKAT • Plant Physiology, Institute of Plant Biology and Biotechnology, University
SHELDON R. LAWRENCE II • Department of Chemistry, University of North Carolina at
Chapel Hill, Chapel Hill, NC, USA
BIN LI • United Graduate School of Agricultural Sciences, Iwate University, Morioka, Japan
LINGJUN LI • Department of Chemistry, University of Wisconsin-Madison, Madison, WI,
USA; School of Pharmacy, University of Wisconsin-Madison, Madison, WI, USA
CRISTINA LÓPEZ-HIDALGO • Plant Physiology, Department of Organisms and Systems Biology,
University Institute of Biotechnology of Asturias (IUBA), University of Oviedo, Oviedo,
Asturias, Spain
VÍCTOR M. LOYOLA-VARGAS • Unidad de Bioquímica y Biología Molecular de Plantas,
Centro de Investigación Científica de Yucatán (CICY), Mérida, Yucatán, Mexico
YONGMING LUO • Faculty of Science and Graduate School of Life Science, Hokkaido
University, Sapporo, Japan
SABINE LÜTHJE • Oxidative Stress and Plant Proteomics Group, Institute for Plant Science
and Microbiology, University of Hamburg, Hamburg, Germany
YU LU • Faculty of Science and Graduate School of Life Science, Hokkaido University,
Sapporo, Japan; Graduate School of Life and Environmental Sciences, University of
Tsukuba, Tsukuba, Japan
ANA M. MALDONADO-ALCONADA • Agroforestry and Plant Biochemistry, Proteomics and
Systems Biology, Department of Biochemistry and Molecular Biology, University of Cordoba,
UCO-CeiA3, Cordoba, Spain
ASCENSIÓN MARTÍNEZ-MÁRQUEZ • Plant Proteomics and Functional Genomics Group,
Department of Agrochemistry and Biochemistry. Faculty of Sciences, University of Alicante,
Alicante, Spain
EVAN W. MCCONNELL • Department of Chemistry, University of North Carolina at Chapel
Hill, Chapel Hill, NC, USA
MÓNICA MEIJÓN • Plant Physiology, Department of Organisms and Systems Biology and
DOMINIQUE MICHAUD • Centre de Recherche et d’Innovation sur les Végétaux, Université
Laval, Québec, QC, Canada
TOSHIAKI MITSUI • Department of Biochemistry, Faculty of Agriculture, Niigata University,
Niigata, Japan; Graduate School of Science and Technology, Niigata University, Niigata,
Japan
CHRISTIAN MONTES • Department of Plant Pathology and Microbiology, Iowa State
University, Ames, IA, USA
JAIME A. MORANTE-CARRIEL • Biotechnology and Molecular Biology Group, Quevedo State
Technical University, Quevedo, Ecuador
HIROFUMI NAKAGAMI • Protein Mass Spectrometry Group, Max Planck Institute for Plant
Breeding Research, Cologne, Germany
RYOHEI THOMAS NAKANO • Department of Plant Microbe Interactions, Max Planck Institute
for Plant Breeding Research, Cologne, Germany; Cluster of Excellence on Plant Sciences
(CEPLAS), Max Planck Institute for Plant Breeding Research, Cologne, Germany
TAKATO NAKAYAMA • Department of Plant-bioscience, Faculty of Agriculture, Iwate
University, Morioka, Japan
GUILLAUME NÉE • Plant Physiology, Institute of Plant Biology and Biotechnology, University
GERHARD OBERMEYER • Department of Biosciences, Membrane Biophysics, Paris-Lodron-
University of Salzburg, Salzburg, Austria
HEIDI PERTL-OBERMEYER • Department of Biosciences, Membrane Biophysics, Paris-Lodron-
University of Salzburg, Salzburg, Austria
GEORGE V. POPESCU • Institute for Genomics, Biocomputing, and Biotechnology, Mississippi
State University, Mississippi State, MS, USA; The National Institute for Laser, Plasma &
Radiation Physics, Bucharest, Romania
SORINA C. POPESCU • Department of Biochemistry, Molecular Biology, Entomology, and
Plant Pathology, Mississippi State University, Mississippi State, MS, USA
KALAIVANI RAMANATHAN • Oxidative Stress and Plant Proteomics Group, Institute for Plant
Science and Microbiology, University of Hamburg, Hamburg, Germany
EDUARDO ANTONIO RAMIREZ-RODRIGUEZ • School of BioSciences, The University of
Melbourne, Parkville, VIC, Australia
MARIA-DOLORES REY • Agroforestry and Plant Biochemistry, Proteomics and Systems Biology,
Department of Biochemistry and Molecular Biology, University of Cordoba, UCO-CeiA3,
Cordoba, Spain
PIER GIORGIO RIGHETTI • Miles Gloriosus Academy, Milan, Italy
VÍCTOR ROCES • Plant Physiology, Department of Organisms and Systems Biology and
ELIEL RUIZ-MAY • Red de Estudios Moleculares Avanzados, Clúster Científico y Tecnológico
BioMimic®, Instituto de Ecología A.C. (INECOL), Veracruz, Mexico
FRANK SAINSBURY • Centre de Recherche et d’Innovation sur les Végétaux, Université Laval,
Québec, QC, Canada; Griffith Institute for Drug Discovery, Griffith University, Brisbane,
QLD, Australia
ROSA SÁNCHEZ-LUCAS • Agroforestry and Plant Biochemistry, Proteomics and Systems
BONOSO SAN-EUFRASIO • Agroforestry and Plant Biochemistry, Proteomics and Systems
TAKEO SATO • Faculty of Science and Graduate School of Life Science, Hokkaido University,
Sapporo, Japan
PAUL SCHULZE-LEFERT • Department of Plant Microbe Interactions, Max Planck Institute
for Plant Breeding Research, Cologne, Germany; Cluster of Excellence on Plant Sciences
(CEPLAS), Max Planck Institute for Plant Breeding Research, Cologne, Germany
SUSANA SELLÉS-MARCHART • Plant Proteomics and Functional Genomics Group, Department
of Agrochemistry and Biochemistry. Faculty of Sciences, University of Alicante, Alicante,
Spain
GAOYUAN SONG • Department of Plant Pathology and Microbiology, Iowa State University,
Ames, IA, USA
SARA CHRISTINA STOLZE • Protein Mass Spectrometry Group, Max Planck Institute for Plant
Breeding Research, Cologne, Germany
JUNPEI TAKAGI • Faculty of Science and Engineering, Konan University, Kobe, Japan
DAISUKE TAKAHASHI • Central Infrastructure Group: Genomics and Transcript Profiling,
Max-Planck Institute of Molecular Plant Physiology, Potsdam, Germany; United Graduate
School of Agricultural Sciences, Iwate University, Morioka, Japan; Graduate School of
Science and Engineering, Saitama University, Saitama, Japan
PRIYADARSHINI TILAK • Plant Physiology, Institute of Plant Biology and Biotechnology,
University of Münster, Münster, Germany
MATSUO UEMURA • United Graduate School of Agricultural Sciences, Iwate University,
Morioka, Japan; Department of Plant-bioscience, Faculty of Agriculture, Iwate University,
Morioka, Japan
TOMOHIRO UEMURA • Graduate School of Humanities and Sciences, Ochanomizu
University, Tokyo, Japan
LUIS VALLEDOR • Department of Organisms and Systems Biology, Institute of Biotechnology
of Asturias, University of Oviedo, Oviedo, Asturias, Spain
JUSTIN W. WALLEY • Department of Plant Pathology and Microbiology, Iowa State University,
Ames, IA, USA
XIN WANG • College of Agronomy and Biotechnology, China Agricultural University,
Beijing, China
YIMING WANG • Department of Plant Microbe Interactions, Max Planck Institute for Plant
Breeding Research, Cologne, Germany; Department of Plant Pathology, Nanjing
Agricultural University, Nanjing, China
EMILY G. WERTH • Department of Chemistry, University of North Carolina at Chapel Hill,
JUNJI YAMAGUCHI • Faculty of Science and Graduate School of Life Science, Hokkaido
University, Sapporo, Japan
Chapter 1
What Is New in (Plant) Proteomics Methods and Protocols: The 2015–2019 Quinquennium
Abstract
The third edition of “Plant Proteomics Methods and Protocols,” with the title “Advances in Proteomics
Techniques, Data Validation, and Integration with Other Classic and -Omics Approaches in the Systems
Biology Direction,” was conceived on the basis of the success of the previous editions and of the continuous
advances and improvements in proteomic techniques, equipment, and bioinformatics tools, and their uses
in basic and translational plant biology research, that have occurred in the past 5 years (in round figures, of
around 22,000 publications referenced in WoS, 2000 were devoted to plants).
The monograph contains 29 chapters with detailed proteomics protocols commonly employed in plant
biology research. They present recent advances at all workflow stages, starting from the laboratory (tissue
and cell fractionation, protein extraction, depletion, purification, separation, MS analysis, quantification)
and ending on the computer (algorithms for protein identification and quantification, bioinformatics tools
for data analysis, databases and repositories). At the end of each chapter, explanatory notes
and comments discuss limitations, artifacts, and pitfalls, making the protocols easily applicable
to other biological systems and/or studies. For that reason, as with the previous editions, this
volume should be especially useful for beginners.
Out of the 29 chapters, six are devoted to descriptive proteomics, with a special emphasis on subcellular
protein profiling (Chapters 5–10), six to PTMs (Chapters 11 and 14–18), three to protein interactions
(Chapters 19–21), and two to specific proteins: peroxidases (Chapter 24) and proteases and protease
inhibitors (Chapter 26). The book reflects the new trajectory in MS-based protein identification and
quantification, moving from the classic gel-based approaches to the most recent labeling (Chapters 10,
11, 29), shotgun (Chapters 5, 7, 12, 15), parallel reaction monitoring (Chapter 16), and targeted data
acquisition (Chapter 13). MS imaging (Chapter 25), the only in vivo MS-based proteomics strategy, is far
from being fully optimized and exploited in plant biology research. Confident protein identification and
quantitation, especially in orphan species and for low-abundance proteins, is still a challenging task
(Chapters 4, 28).
What is really new is the use of different techniques for proteomics data validation and their integration
with other classic and -omics approaches in the systems biology direction. Chapter 2 reports on the
extraction of different biomolecules (nucleic acids, proteins, and metabolites) in a single experiment.
Chapter 27 describes how metabolic pathways can be reconstructed from multiple -omics data, and
Chapter 3 covers network building. Finally, Chapters 22 and 23 deal with, respectively, the search for
allele-specific proteins and proteogenomics.
Almost 1 year ago, around 200 groups were invited to take part in this edition. Unfortunately, only 10%
of them accepted. My gratitude goes to those who accepted our invitation but also to those who did not,
as all of them have contributed to the plant proteomics field. In this introductory chapter I will list,
following my own judgment, some of the relevant papers published in the past 5 years, those that have
shown us how to enhance and exploit the potential of proteomics in plant biology research, without aiming
to give an exhaustive list.
Key words Omics approaches, Plant proteomics, Protein interactions, PTMs, Proteogenomics,
Quantitative proteomics, Shotgun proteomics, Systems biology, Targeted proteomics
Jesus V. Jorrin-Novo et al. (eds.), Plant Proteomics: Methods and Protocols, Methods in Molecular Biology, vol. 2139,
https://doi.org/10.1007/978-1-0716-0528-8_1, © Springer Science+Business Media, LLC, part of Springer Nature 2020
1 Introduction
The success of the previous editions of “Plant Proteomics Methods
and Protocols” (Springer Nature Methods in Molecular Biology,
vols. 355, 2007, and 1072, 2014; http://www.springer.com/
series/7651) [1, 2] and the continuous advances and improvements
in proteomic techniques, equipment, and bioinformatics
tools, and their use in basic and translational plant biology research,
have encouraged Humana Press to prepare a new, updated third
edition with the title, “Advances in Proteomics Techniques, Data
Validation, and Integration with Other Classic and -Omics
Approaches in the Systems Biology Direction,” edited by J.V. Jorrín
Novo, L. Valledor, M.A. Castillejo, and M.D. Rey.
Since the second edition, and in the very short period of
5 years (2014 to May 2019), the number of proteomics papers
in general, and of those devoted to plant proteomics studies in particular,
has been continuously increasing. A Web of Science (WoS) search
with the keywords “proteomics” or “plant + proteomics” returned
22,000 and 2000 hits, respectively. These figures reflect, on the one
hand, that the field of proteomics has been greatly enriched and
updated with equipment, techniques, protocols, algorithms, data-
bases, and repositories. Thus, the possibility now exists of having a
deeper coverage of the proteome, a more confident protein identi-
fication and quantification, and a less speculative and more confi-
dent biological interpretation of the data and responses to
biological questions based on the protein language. On the other hand,
and on glancing once again at plant proteomics figures, the same
conclusion is reached: “the full potential of proteomics is still far
from being fully exploited in plant biology research” [3], and there are
not many groups carrying out plant proteomics experiments using
the latest technological advances and equipment in the field. There
are more groups entering proteomics, with new plant experimental
systems, proposing new biological studies, but they use classic
approaches, keeping proteomics mostly descriptive and speculative.
Assuming this situation, this new edition aims to show plant scien-
tists how they can go one step forward by using proteomics as an
experimental approach.
2 Novelties in the 2015–2019 Period
The main objective of a proteomics experiment is to identify, characterize,
and quantify as many proteoforms or protein species as
possible. Its success depends on the experimental system, the pro-
tocols for protein extraction and fractionation, the MS strategy, the
equipment, and the algorithms and databases employed. Each
technique and protocol has to be optimized to the experimental
system, the biological process, and the starting hypothesis. Like any
analytical technique, MS has to be validated, and its resolution,
sensitivity, detection limit, and dynamic range determined
(Chapter 12). With respect to the experimental system, a consider-
ation should be made of its biological characteristics such as the
level of ploidy, the availability of species-specific protein databases,
and its recalcitrance, the latter related to the chemical composition
(Chapters 22, 23, 29). In the plant proteomics scenario, orphan
and recalcitrant species such as forest trees still remain challenging
(Chapters 4 and 12).
Up to six consecutive generations of MS proteomics platforms
have been developed and employed since its beginning, in the early
1990s, 25 years ago [4]. Human proteome research has moved fast
in using the most recent technologies, gel-free/label-free or shot-
gun (fourth generation) [5], single/multiple reaction monitoring,
targeted or mass western (fifth generation) [6], and data-
independent acquisition, DIA, and its sequential windowed data-
independent acquisition of the total high-resolution mass spectra,
SWATH (sixth generation) [7]. However, plant investigators still
cling to the employment of gel-MS, including difference gel elec-
trophoresis, DIGE (first and second generation), isobaric or isoto-
pic labeling, mostly isobaric tags for relative and absolute
quantitation, iTRAQ (third generation), and shotgun (fourth gen-
eration) [8] (Chapters 10, 11, 13, 15).
The optimization of classic protocols for protein extraction [9]
and purification [10, 11], together with advances in mass spec-
trometry techniques [12], the evolution of mass spectrometers,
especially the Orbitrap family [13], the feasibility of sequencing
and annotating quicker and cheaper complete genomes and tran-
scriptomes for protein database constructions ([14, 15] and Chap-
ters 4 and 12), and the development of algorithms and
bioinformatics tools for protein identification, quantification,
grouping, and statistical analysis of the data ([16] and Chapter 4,
this volume), “[has] taken proteomics to an unimaginable achieve-
ment in terms of the number of protein species confidently identified,
quantified, and characterized” [4]. We have progressed from iden-
tification of hundreds to thousands of gene products in a single
experiment. As a result, protein databases and repositories are being
created or enriched [17, 18]. Even so, we are only able to visualize a
small fraction of the whole proteome (1–5%). For a higher coverage,
subcellular or protein fractionation has been chosen. In this
volume, different chapters deal with the proteome analysis of sub-
cellular fractions, including apoplast, membrane systems, nuclei,
and chloroplasts (Chapters 5–9). Chapter 28 describes detecting
low-abundance proteins by using the combinatorial peptide ligand
library (CPLL) technique.
Descriptive and comparative proteomics remain the most
represented areas in the current plant proteomics literature, with
new plant systems and biological processes continuously being
reported. The main interest lies in crops and processes related to
productivity and other phenotypes of importance from an agro-
nomic point of view [19, 20]. Stresses associated with climate
change and biodiversity are two of the leading topics [21, 22].
It has been claimed that proteomics can lead us to the identifi-
cation of protein markers [23–26] that are useful in plant breeding
programs and in the selection of elite genotypes, but that is still far
from reality. One of the difficulties in identifying protein markers is
the existence of very similar proteins as members of a multigene
family, allelic variants, or individual genes, that give rise to a variable
number of proteoforms or protein species as a result of posttranscriptional
(alternative splicing) or posttranslational (PTMs) events,
without finding out the biological role of each one of them
[27, 28]. As bottom-up, peptide-centric, platforms cannot give
clear responses to this question, top-down strategies have to be
improved [29]. Just as an example, in Chapter 22, by Prof. Car-
pentier, alternative protocols for allele-specific proteins are
proposed.
Posttranslational modifications, PTMs, and interactomics
remain a challenge, but more and more papers are appearing on
these topics [30–32]. As a novelty with respect to the previous two
editions, this third edition includes five chapters describing proto-
cols for PTM analysis: Chapters 11, 14, 15, 16 (phospho),
Chapter 17 (glyco), and Chapter 18 (acetyl). PTM analysis can be
done with gel-based, gel-free, labeling, and targeted parallel reac-
tion monitoring approaches, the topic recently reviewed by Vu et al.
[33]. The difficulty of the PTM analysis depends on the type of
modification, its stability, stoichiometric levels of protein modifica-
tion, the existence of multiple sites for specific or different PTMs,
and the efficacy of the enrichment protocols for modified proteins
and peptides, among other items. Whereas in vitro analysis is quite
feasible, changes in the in vivo PTM profiles remain somewhat
elusive.
Three of the chapters, Chapters 19–21, address the study of
protein interactions or interactome, one of the main challenges in
the postgenomic era. Interactomics shares with PTMs their meth-
odological strategy and workflow, with a previous MS step directed
at purifying or enriching the target and partners complexes. The
difficulty in characterizing interactions is even greater than for PTMs
because of the low stability of the interactions and the generation of
false positives due to unspecific binding. In order to diminish those
false positives, in vivo site-specific chemical cross-linking coupled to
MS has appeared as being a powerful technique [34], as it converts
unstable complexes into stable ones that can be purified or enriched
by using immunoaffinity techniques (Chapter 20 of this book).
Both PTMs and interactions studies are favored by computational
analysis and in silico predicted PTM motifs and functional association
networks of genes ([35–37] and Chapter 3 of this book). In
Chapter 19, Nee et al. report on a mass spectrometry-based label-
free quantification approach to identify protein interaction net-
works under native conditions. It uses a transgenic plant expressing
the protein of interest fused to a GFP-Tag; enrichment of the
GFP-tagged protein with its interaction partners is performed by
immunoaffinity purification, with the captured purified proteins
being analyzed by LC-MS/MS and label-free quantification.
A FLAG-tag fusion is an alternative, as shown in Chapter 21 by Luo
et al., who propose a protocol to study 14-3-3 interactors in tomato
fruit.
Proteomics is being increasingly employed in a directed, tar-
geted, hypothesis-based direction, thus changing the previous view
of a holistic approach that did not need a hypothesis. The latter
option was a good starting point, but it made proteomics mostly
descriptive and speculative, without the possibility of comparing
the data with those previously obtained by using other experimen-
tal approaches. In the end, experimental data has to be manually
validated if it is intended to confidently interpret it from a biological
point of view, and if we wish to escape from the tyranny of the blind
analysis based on computational tools, and to move from the forest
(whole proteomes, subproteomes, functional or structural groups)
to the tree (individual proteins). We need to understand when,
where, how, and the reasons for the orchestration of thousands of
proteins in order to construct the cellular building, to fit it into a
developmental program, and to respond to a highly changeable
environment.
Targeted (Mass Western) proteomics is a bottom-up approach
based on the MS analysis of individual proteins, or a selected group
of them, through a set of selected peptides, ideally proteotypic
ones. These are the basics of a number of recently developed
techniques such as single, multiple, or parallel reaction monitoring
(SRM, MRM, and PRM), accurate inclusion mass screening
(AIMS), and the sequential window acquisition of all theoretical
fragments (SWATH) [38]. These approaches offer new possibilities
in biomarker discoveries and multiplexing analyses [39].
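The notion of a proteotypic peptide can be illustrated with a short in silico digestion. The following Python sketch is only an illustration under simplifying assumptions: trypsin is taken to cleave after K or R except when the next residue is P, missed cleavages and modifications are ignored, and the two isoform sequences and all function names are hypothetical rather than taken from any chapter of this volume.

```python
import re
from collections import defaultdict


def tryptic_peptides(sequence, min_len=6):
    """Split a protein after K or R, except when the next residue is P.

    Simplified rule: no missed cleavages, no modifications.
    Splitting on an empty match requires Python 3.7 or later.
    """
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    return [p for p in pieces if len(p) >= min_len]


def proteotypic_peptides(proteins, min_len=6):
    """Return, per protein, the peptides found in that protein only."""
    owners = defaultdict(set)
    for name, seq in proteins.items():
        for pep in tryptic_peptides(seq, min_len):
            owners[pep].add(name)
    unique = {name: [] for name in proteins}
    for pep, names in owners.items():
        if len(names) == 1:
            unique[next(iter(names))].append(pep)
    return {name: sorted(peps) for name, peps in unique.items()}


# Hypothetical isoforms differing by a single residue (T versus S).
toy_proteins = {
    "isoform_1": "AAEELLKGSDVTIPRTTWYQEK",
    "isoform_2": "AAEELLKGSDVSIPRTTWYQEK",
}

if __name__ == "__main__":
    for name, peps in proteotypic_peptides(toy_proteins).items():
        print(name, peps)
```

In this toy case only the middle peptide distinguishes the two isoforms, which is the kind of peptide a targeted assay would ideally monitor; the flanking peptides are shared and therefore not proteotypic.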
In Chapter 13, Dr. Hart-Smith addresses the combined use of
targeted and untargeted LC-MS/MS data acquisition, a strategy
termed TDA/DDA, and its application to a model quantitative
plant proteomics experiment performed on Arabidopsis. This
approach is compatible with different methodologies, including
metabolic and chemical labeling and label-free approaches, and
can be used to create tailored assay libraries to assist in the interpretation
of quantitative proteomics data collected using
data-independent acquisition (DIA).
MS techniques, in combination with classic protein purification
approaches and in silico analyses of gene sequences at the genomic
or transcriptomic level, are perfect for the chemical, structural, and
functional characterization of proteins, as illustrated in Chapter 24
by Luthje and Ramanathan. They describe a protocol to perform in
silico analysis of plant peroxidases, concretely of the secretory path-
way family, in order to determine amino acid sequence, PTMs,
structure, and ligand sites, among others. Prediction models then
have to be validated in wet experiments. In Chapter 26, Goulet
et al. introduce an activity-based functional proteomics approach
protocol for the selection of protease inhibitors, a group of peptides
with a high biotechnologic potential. This protocol is an alternative
to the in vitro activity assay with synthetic peptides, with the
advantage of additional information on specificity. The procedure
involves the capture of target Cys proteases with biotinylated ver-
sions of the cystatins, followed by the identification and quantifica-
tion of captured proteases by mass spectrometry.
Genomics, transcriptomics, and proteomics feed back into each
other. Thus, up to now, protein identification has been based on
available protein sequences obtained from annotated genomes and
transcriptomes. However, proteomics could be of great help in
improving and correcting genome annotation. With this in mind,
the term proteogenomics was coined following a publication by
Church’s group in 2004, in which proteomics data were used to
annotate the genome of Mycoplasma pneumoniae [40]. The field of
proteogenomics has expanded and is being applied to a number of
living organisms, including plants. Thus, by 2008, Castellana et al.
[41], in an MS analysis of Arabidopsis tissues, found that 18,024
peptides did not correspond to annotated genes, discovering
778 new coding genes, and refining, in addition, 695 more gene
models. The topic of proteogenomics has recently been reviewed
[42]. In Chapter 23 of this book, Al-Mohanna et al. propose a
proteogenomic method for the peptide mapping of the haplotype-
derived sweetpotato genome assembly. Proteogenomics is a very
useful tool for genomics studies of species that, like sweet potato,
have a complex, hexaploid genome (2n = 6x = 90).
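The logic behind proteogenomic peptide mapping can be sketched in a few lines of Python. This is a minimal illustration, not the method of Chapter 23 or of refs. [40, 41]: it assumes exact string matching, the standard genetic code, and an intron-free genomic region, and the sequences, labels, and function names are hypothetical. Peptides that match an annotated protein are labeled "annotated", those found only in a six-frame translation of the genome are labeled "genome_only" (candidates for new or revised gene models), and the rest are "unmatched".

```python
# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def six_frame_translations(dna):
    """Translate both strands in all three reading frames."""
    dna = dna.upper()
    strands = (dna, dna.translate(COMPLEMENT)[::-1])
    frames = []
    for strand in strands:
        for offset in range(3):
            codons = (strand[i:i + 3] for i in range(offset, len(strand) - 2, 3))
            # Codons with ambiguous bases are translated as X.
            frames.append("".join(CODON_TABLE.get(c, "X") for c in codons))
    return frames


def classify_peptides(peptides, annotated_proteins, genome):
    """Label each peptide as annotated, genome_only, or unmatched."""
    frames = six_frame_translations(genome)
    labels = {}
    for pep in peptides:
        if any(pep in seq for seq in annotated_proteins.values()):
            labels[pep] = "annotated"
        elif any(pep in frame for frame in frames):
            labels[pep] = "genome_only"
        else:
            labels[pep] = "unmatched"
    return labels


# Hypothetical toy data: one annotated protein and a short genomic region.
toy_annotation = {"geneA": "MSEQLLKR"}
toy_genome = "CCATGAAAACTGCTTATATTGCTAAATGA"

if __name__ == "__main__":
    print(classify_peptides(["SEQLLK", "TAYIAK", "WWWWWW"],
                            toy_annotation, toy_genome))
```

Real workflows must also deal with introns, sequencing errors, isobaric residues such as I and L, and false discovery control, none of which is handled in this sketch.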
3 Proteomics Data Validation, and Integration into Other Classic and -Omics
Approaches in the Systems Biology Direction
Up to 2010, -omics approaches were developed independently,
with not much interaction between them. This made proteomics
and transcriptomics, as affirmed above, mostly descriptive and
speculative. In this decade, papers reporting the integrated employ-
ment of the two or three -omics approaches, mostly transcriptomics
and proteomics, have started to appear [4]. While defining the
contents of the present monograph, it was clear, as pointed out in
the invitation letter to contributors, that chapters on protocols for
proteomics data validation and integration with other classic and
-omics approaches in the systems biology direction would consti-
tute the main novelty in this new edition. As stated in Rey et al. [4],
“The logical transition from reductionists to a holistic strategy and
integration of multidimensional biological information is currently
accepted by the scientific community as the only way to decipher the
complexity of living organisms and predict through multiscale net-
works and models.” The integrated use of the -omics approaches will
not only allow us to connect the phenotype and the genotype but
also, more importantly, to deepen the knowledge of gene expres-
sion mechanisms, including posttranscriptional (RNA splicing,
micro-RNAs, small interfering RNA, long noncoding RNAs), and
posttranslational (phosphorylation, glycosylation, acetylation,
methylation, etc.) events [43].
The new strategy requires novel methodologies, with bioinformatics
and computer skills being the real bottleneck. The experimental
setup is highly complex considering the heterogeneity of the
molecules under study (DNA, RNA, proteins, and metabolites),
the levels of analysis (next-generation sequencing for nucleic acids,
mass spectrometry for proteins and metabolites), the huge amount
of data produced, and the biases generated by each methodology.
In the wet lab, one limitation is the independent extraction of
each type of biomolecule, making the results not fully comparable.
In order to solve this, protocols for sequential extraction of the
different types of biomolecules have been developed [44]. Valle-
dor’s group, in Chapter 2, introduces a novel protocol, optimized
for microalgae, that allows for the combined extraction of different
levels, including total metabolites or their pigment or lipid fractions,
along with nucleic acids (DNA and RNA) and/or proteins
from the same sample, reducing biological and time variations
between different levels of data.
The workflow, including wet and dry steps, has recently been
reviewed [4], including original articles and reviews related to the
topic. In order to avoid repetitions, I suggest that the reader go
through it.
Chapter 27, by Lopez-Hidalgo et al., is a good example of how
to use the different -omics for gaining biological knowledge. They
present a protocol based on a multiomics approach for the meta-
bolic pathway reconstruction in a recalcitrant and orphan plant
species, that is, the forest tree Holm oak (Quercus ilex). There are
more examples in the very recent current literature, such as the
study of substantial equivalence in transgenic crops [45], seed
germination in Arabidopsis [46], somatic embryogenesis [47],
biotic stress in fruit crops [48], and root development [49].
While I was summarizing the advances in (plant) proteomics
methods and protocols in the past 5 years since the second edition
of this monograph was published, I began to wonder what the
future holds for this discipline and I asked myself two questions:
(a) How long will it take before a fourth edition is needed? and
(b) Will this third edition become obsolete? The answers to these
questions are, in my opinion, as follows: (a) in a few years’ time
and (b) no. Proteomics, and more concretely plant proteomics, is
in its infancy, at the descriptive stage, with the proteomes observed
being just the tip of the iceberg. We are assembling the pieces of a
puzzle that will help us to understand how the cell is built and how
it works. We are striving to see light at the end of the very long
tunnel that links genotype and phenotype that, however, is still too
dark. Every proteomics experiment shows us that life is more
complex than we have ever imagined, while research continues to
be reductionist and simple.
References
1. Thiellement H, Zivy M, Damerval C et al (eds) (2007) Plant proteomics methods and protocols. Methods Mol Biol 355:1–8
2. Jorrin-Novo JV, Komatsu S, Weckwerth W et al (2014) Plant proteomics methods and protocols. In: Methods molecular biology, vol 1072, 2nd edn. Humana Press, Totowa
3. Jorrin Novo JV (2014) Plant proteomics methods and protocols. In: Novo J et al (eds) Chapter 1, plant proteomics methods and protocols, Methods molecular biology, vol 1072, 2nd edn. Humana Press, Totowa, pp 3–13
4. Rey MD, Valledor L, Castillejo MA et al (2019) Recent advances in MS-based plant proteomics: proteomics data validation through integration with other classic –omics approaches. In: Progress in botany. Springer, Berlin, Heidelberg
5. Neilson KA, Ali NA, Muralidharan S et al (2011) Less label, more free: approaches in label-free quantitative mass spectrometry. Proteomics 11:535–553
6. Picotti P, Bodenmiller B, Aebersold R (2013) Proteomics meets the scientific method. Nat Methods 10:24–27
7. Gillet LC, Navarro P, Tate S et al (2012) Targeted data extraction of the MS/MS spectra generated by data independent acquisition: a new concept for consistent and accurate proteome analysis. Mol Cell Proteomics 11:O111.016717
8. Jorrin-Novo JV, Komatsu S, Sanchez-Lucas R et al (2018) Gel electrophoresis-based plant proteomics: past, present, and future. Happy 10th anniversary journal of proteomics. J Proteome 198:1–10
9. Luthria DL, Maria John KM, Marupaka R et al (2018) Recent update on methodologies for extraction and analysis of soybean seed proteins. J Sci Food Agric 98:5572–5580
10. Fesmire JD (2019) A brief review of other notable electrophoretic methods. Methods Mol Biol 1855:495–499
11. Minic Z, Dahms TES, Babu M (2018) Chromatographic separation strategies for precision mass spectrometry to study protein-protein interactions and protein phosphorylation. J Chromatogr B Analyt Technol Biomed Life Sci 1102-1103:96–108
12. Ankney JA, Muneer A, Chen X (2018) Relative and absolute quantitation in mass spectrometry-based proteomics. Annu Rev Anal Chem 11:49–77
13. Eliuk S, Makarov A (2015) Evolution of Orbitrap mass spectrometry instrumentation. Annu Rev Anal Chem 8:61–80
14. Jung H, Winefield C, Bombarely A et al (2019) Tools and strategies for long-read sequencing and de novo assembly of plant genomes. Trends Plant Sci 24(8):P700–P724. (in press)
15. Guerrero-Sanchez VM, Maldonado-Alconada A, Amil-Ruiz et al (2019) Ion torrent and Illumina, two complementary RNA-seq platforms for constructing the holm oak (Quercus ilex) transcriptome. PLoS One 14:e0210356
16. Misra BB (2018) Updates on resources, software tools, and databases for plant proteomics in 2016–2017. Electrophoresis 39:1543–1557
17. Subba P, Narayana Kotimoole C et al (2019) Plant proteome databases and bioinformatic tools: an expert review and comparative insights. OMICS 23:190–206
18. Martens L, Vizcaíno JA (2017) A golden age for working with public proteomics data. Trends Biochem Sci 42:333–341
19. Duncan O, Trosch J, Fenske R et al (2017) Resource: mapping the Triticum aestivum proteome. Plant J 89:601–616
20. Katam K, Jones KA, Sakata K (2015) Advances in proteomics and bioinformatics in agriculture research and crop improvement. J Proteomics Bioinform 8:3
21. Hu J, Rampitsch C, Bykova NV (2015) Advances in plant proteomics toward improvement of crop productivity and stress resistance. Front Plant Sci 6:209
22. Carrera DA, Oddsson S, Grossmann J et al (2018) Comparative proteomic analysis of plant acclimation to six different long-term environmental changes. Plant Cell Physiol 59:510–526
23. Schneider S, Harant D, Bachmann G et al (2019) Subcellular phenotyping: using proteomics to quantitatively link subcellular leaf protein and organelle distribution analyses of Pisum sativum cultivars. Front Plant Sci 10:638
24. de Lamo FJ, Constantin ME, Fresno DH et al (2018) Xylem sap proteomics reveals distinct differences between R gene- and endophyte-mediated resistance against Fusarium wilt disease in tomato. Front Microbiol 9:2977
25. Lankinen A, Abreha KB, Masini L et al (2018) Plant immunity in natural populations and agricultural fields: Low presence of pathogenesis-related proteins in Solanum leaves. PLoS One 13:e0207253
26. Ghatak A, Chaturvedi P, Weckwerth W (2017) Cereal crop proteomics: systemic analysis of crop drought stress responses towards marker-assisted selection breeding. Front Plant Sci 8:757
27. Schaffer LV, Millikin RJ, Miller RM et al (2019) Identification and quantification of proteoforms by mass spectrometry. Proteomics 19:SI 1800361
28. Naryzhny S (2019) Inventory of proteoforms as a current challenge of proteomics: some technical aspects. J Proteome 191:22–28
29. Toby TK, Fornelli L, Kelleher NL (2016) Progress in top-down proteomics and the analysis of proteoforms. Annu Rev Anal Chem (Palo Alto, Calif) 9:499–519
30. Hashiguchi A, Komatsu S (2017) Posttranslational modifications and plant-environment interaction. Methods Enzymol 586:97–113
31. Wu XL, Gong FP, Cao D et al (2016) Advances in crop proteomics: PTMs of proteins under abiotic stress. Proteomics 16:847–865
32. Friso G, van Wijk KJ (2015) Posttranslational protein modification in plant metabolism. Plant Physiol 3:1469–1487
33. Vu LD, Gevaert K, De Smet I (2018) Protein language: post-translational modifications talking to each other. Trends Plant Sci 12:1068–1080
34. Zhu XL, Yu FC, Yang Z et al (2016) In planta chemical cross-linking and mass spectrometry analysis of protein structure and interaction in Arabidopsis. Proteomics 16:1915–1927
35. Li GXH, Vogel C, Choi H (2018) PTMscape: an open source tool to predict generic post-translational modifications and map modification crosstalk in protein domains and biological processes. Mol Omics 14:197–209
36. Willems P, Horne A, Van Parys T et al (2019) The Plant PTM Viewer, a central resource for exploring plant protein modifications. Plant J. https://doi.org/10.1111/tpj.14345. [Epub ahead of print]
37. Yao H, Wang X, Chen P et al (2018) Predicted Arabidopsis interactome resource and gene set linkage analysis: a transcriptomic analysis resource. Plant Physiol 177:422–433
38. Rödiger A, Baginsky S (2018) Tailored use of targeted proteomics in plant-specific applications. Front Plant Sci 9:1204
39. Chawade A, Alexandersson E, Bengtsson T et al (2016) Targeted proteomics approach for precision plant breeding. J Proteome Res 15:638–646
40. Jaffe J, Berg HC, Church GM (2004) Proteogenomic mapping as a complementary method to perform genome annotation. Proteomics 4:59–77
41. Castellana NE, Payne SH, Shen Z (2008) Discovery and revision of Arabidopsis genes by proteogenomics. Proc Natl Acad Sci U S A 105:21034–21038
42. Low TY, Mohtar MA, Ang MY et al (2019) Connecting proteomics to next-generation sequencing: Proteogenomics and its current applications in biology. Proteomics 19:e1800235
43. Hong WJ, Kim YJ, Chandran AKN et al (2019) Infrastructures of systems biology that facilitate functional genomic study in rice. Rice 12:15
44. Xiong J, Yang Q, Kang J et al (2011) Simultaneous isolation of DNA, RNA, and protein from Medicago truncatula L. Electrophoresis 32:321–330
45. Corujo M, Pla M, van Dijk J et al (2019) Use of omics analytical methods in the study of genetically modified maize varieties tested in 90 days feeding trials. Food Chem 292:359–371
46. Ponnaiah M, Gilard F, Gakiere B et al (2019) Regulatory actors and alternative routes for Arabidopsis seed germination are revealed using a pathway-based analysis of transcriptomic datasets. Plant J 99:163–175
47. Pais MS (2019) Somatic embryogenesis induction in woody species: the future after omics data assessment. Front Plant Sci 10:240
48. Li T, Wang YH, Liu JX et al (2019) Advances in genomic, transcriptomic, proteomic, and metabolomic approaches to study biotic stress in fruit crops. Crit Rev Biotechnol 39:680–692
49. Proust H, Hartmann C, Crespi M et al (2018) Root development in Medicago truncatula: lessons from genetics to functional genomics. Methods Mol Biol 1822:205–239
Chapter 2
Multiple Biomolecule Isolation Protocol Compatible
with Mass Spectrometry and Other High-Throughput
Analyses in Microalgae
Francisco Colina, María Carbó, Ana Álvarez, Mónica Meijón,
María Jesús Cañal, and Luis Valledor
Abstract
Microalgae are gaining attention in industry for their high value–added biomolecules and biomass production
and for studying fundamental processes in biology. The introduction of novel approaches for understanding
and modeling molecular networks at different omic levels is paramount for increasing the
productivity of these organisms. However, the construction of these networks requires high-quality datasets
that, if possible, overlap perfectly. The use of different materials for different biomolecule
isolation protocols, even if they come from the same homogenate, is one of the commonest issues affecting
quality. Hence, a new method has been developed, allowing for the combined extraction of different levels,
including total metabolites or their pigment or lipid fractions, along with nucleic acids (DNA and RNA) and/or
proteins from the same sample, reducing biological and time variation between data levels.
Key words Microalgae, Proteomics, Lipids, Metabolite, Pigments, DNA, RNA
1 Introduction
Microalgae have gained attention in industry during the last decades.
They constitute a sustainable production platform due to
their high biomass production together with their generation of
high value–added biomolecules such as biodiesel, β-carotene, astaxanthin,
and omega-3. However, research is still necessary to make
these microorganisms truly economically profitable producers.
Moreover, microalgae research is not only industry-focused:
their intermediate plant–animal phylogenetic position makes them
a powerful and convenient model to study fundamental processes
in biology [1].
Understanding microalgae metabolic networks is complex, but
recent advances in omics and systems biology allow for the reliable
characterization and (semi)quantitation of hundreds to thousands
of transcripts, proteins, or metabolites and their integration into
different functional networks, helping to better understand their
functions and relationships.
Fig. 1 Workflow of microalgae metabolite, lipid, or pigment fraction extraction combined with nucleic acid
and/or protein extraction from the same sample
However, this high-throughput capability comes at a cost: the
different omic levels require different isolation methods, analytical
platforms, and specific data processing pipelines. The biases related
to different sample processing could have a major impact on later
bioinformatics analyses and metabolic reconstruction. The best
strategy to avoid this potential flaw is the development of multiple
extraction protocols, allowing for the fragmentation of a single
sample into its different omic layers. These kinds of protocols
have been developed for plants, animals, and microorganisms
[2–4]. In Chlamydomonas, different strategies have been developed
focusing on the multiple extraction of metabolites, nucleic acids,
and protein fractions [2], but none of these are compatible with the
commonly used spectrophotometry- and gravimetry-based physiological
indexes such as total lipid content or pigment contents. For this
reason, we have developed a multiple extraction protocol allowing
for either metabolite or total lipid or pigment fraction extraction
along with nucleic acid (DNA and RNA) and protein extraction
from the same microalga sample (Fig. 1). Moreover, this protocol
can be easily coupled to other procedures, including
fluorescence-based lipid [5], enzyme-based starch [6], or phenolic
compound [7] quantitation, and various phenotyping workflows such as
in vivo quantification of pigments [8] and lipids/carbohydrates [9],
photosynthetic performance, and growth [10].
2 Materials
2.1 Cell Culture Materials
1. Chlamydomonas reinhardtii CC-503 cw92 mt+ agg1+ nit1
nit2 (available at the Chlamydomonas Culture Collection,
Duke University).
2. Tris-Acetate-Phosphate (TAP) medium (https://www.chlamycollection.org).
For 1 L of medium, combine the following
amounts of stock solutions and autoclave: 10 mL of TAP
salts stock, 1 mL of TAP phosphate solution, 1 mL of Hutner’s
trace elements stock, 2.42 g of Tris base, and 1 mL of glacial
acetic acid. Adjust pH to 7.0–7.5.
3. Culture physical environment. Light intensity: 100 μmol/m² s
PAR is a good level for photosynthetically competent cultures
on agar. For liquid cultures, light intensities of 200–300 μmol/m² s,
shaking at 110–150 rpm, and a temperature of 25 °C are
recommended.
4. Equipment needed for the culture: flasks and an incubator or culture
chamber with temperature, light intensity, photoperiod, and
shaking control.
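Because the TAP recipe above is given per liter, smaller or larger batches require proportional scaling. The short Python sketch below does only this arithmetic; the per-liter amounts are those listed in item 2, while the dictionary and function names are illustrative assumptions, and linear scaling is assumed to hold for any batch size.

```python
# Per-liter amounts taken from the TAP recipe in Subheading 2.1, item 2.
TAP_PER_LITER = {
    "TAP salts stock (mL)": 10.0,
    "TAP phosphate solution (mL)": 1.0,
    "Hutner's trace elements stock (mL)": 1.0,
    "Tris base (g)": 2.42,
    "Glacial acetic acid (mL)": 1.0,
}


def scale_tap_recipe(final_volume_ml):
    """Scale the 1 L recipe linearly to the requested final volume (mL)."""
    if final_volume_ml <= 0:
        raise ValueError("final volume must be positive")
    factor = final_volume_ml / 1000.0
    return {name: round(amount * factor, 3)
            for name, amount in TAP_PER_LITER.items()}


if __name__ == "__main__":
    for name, amount in scale_tap_recipe(250).items():
        print(f"{name}: {amount}")
```

The pH must still be checked and adjusted to 7.0–7.5 on the actual batch; the calculation does not replace that step.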
2.2 Sampling and Extraction Materials
1. 50 mL conical tubes, 1.5 mL tubes, 2 mL tubes, and 1.5 mL
screw-cap tubes.
2. Refrigerated centrifuge.
3. Regimill/Fastprep (bead-beating system).
4. Vortex.
5. Vacuum concentrator (speedvac).
6. Heat block.
7. Ultrasound sonicator.
8. Freezers (−20 and −80 °C).
2.3 Sampling and 1. Metabolite extraction buffer (MEB): methanol–chloroform–

Extraction Reagents ddH2O (2.5:1:0.5). Store at 4 C; must be cold when added.
and Solutions 2. Phase separation mix (PSM): chloroform–ddH2O (1:1) (see
Note 1).
3. Polar metabolites extraction buffer (PMEB): chloroform–
ddH2O (1:1).
4. Pigment extraction buffer (PEB): acetone–1 M Tris pH 8–
ddH2O (80:5:15).
5. Lipid extraction buffer 1 (LBE1): chloroform–isopropanol
(1:1).
6. Lipid extraction buffer 2 (LBE2): hexane.
7. Washing buffer 1 (WB1): 0.75% (v/v) ß-mercaptoethanol in
100% methanol.
8. Washing buffer 2 (WB2): 2 mM Tris pH 7.5, 20 mM NaCl,
0.1 mM EDTA, 90% ethanol.
9. Washing buffer 3 (WB3): 2 mM Tris pH 7.5, 20 mM NaCl,
0.1 mM EDTA, 70% ethanol.
10. RNase solution: 300 μL of WB2 and 3 μL of 20 mg/mL
PureLink RNase A (Invitrogen).
11. DNase solution: 300 μL of WB2, 3 μL of 10× DNase I Buffer
and 3 μL of 2 U/μL DNase I (Ambion).
12. Protein solubilization buffer (PSB): 7 M guanidine hydrochloride,
2% (v/v) TWEEN 20, 4% (v/v) NP-40, 50 mM Tris,
pH 7.5, 1% (v/v) β-mercaptoethanol.
13. Phenol.
14. Protein phase separation mix (PPSM): phenol–ddH2O
(0.92:1).
15. Phenol washing buffer (PWB): 0.7 M sucrose, 50 mM Tris–HCl
pH 7.5, 50 mM EDTA, 0.5% β-mercaptoethanol, 0.5% (v/v)
Plant Protease Inhibitor Cocktail (Sigma-Aldrich).
16. Protein precipitation buffer (PPB): 0.1 M ammonium acetate
and 0.5% β-mercaptoethanol in methanol.
17. Methanol.
18. Protein pellet washing buffer (PPWB): acetone–ddH2O
(85:15).
19. Protein pellet solubilization buffer (PPSB): 8 M urea with
4% SDS.
3 Methods
3.1 Sampling Method
1. Harvest 50 mL of culture and centrifuge at 1900 × g for 5 min.
Discard the supernatant (see Notes 2 and 3).
2. Resuspend the cell pellet in 700 μL of ddH2O and transfer to a
2 mL tube (tube S) (see Note 4).
3. Centrifuge at 1900 × g for 2 min and discard the supernatant.
Spin tube S briefly again to remove all the remaining
supernatant (see Note 5).
4. Weigh tube S and determine the fresh weight (see Note 6).
3.2 Metabolite Extraction Method
The following steps must be done on ice, with centrifugations at 4 °C,
unless otherwise specified. Metabolite extraction is not compatible
with lipid and pigment extractions.
1. Transfer the content of tube S to a new screw-cap tube with
glass beads (tube SB). Add 600 μL of MEB and, if needed,
resuspend the pellet by pipetting up and down (see Notes 7
and 8).
2. Homogenize pellets by bead beating until total
homogenization.
3. Centrifuge at 20,000 × g for 6 min and transfer the supernatant
to a tube containing 800 μL of PSM (tube M) (see Note 9). The
pellet contains nucleic acids and proteins (tube NAP).
4. Mix well by vortexing and centrifuge tube M for 5 min at
15,000 × g.
5. During the centrifugation, add 500 μL of WB1 to tube NAP
and vortex until the pellet is mostly disaggregated (see
Note 10). Keep at 4 °C until metabolite extraction is finished.
6. After the centrifugation in step 4 is finished, two different
phases should be clearly defined with a sharp interphase.
Transfer the upper, aqueous layer to a new 2 mL microcentrifuge
tube (tube PM, polar metabolites). Transfer the lower layer,
containing nonpolar metabolites, to a new 2 mL tube (tube NPM)
(see Notes 11 and 12).
7. Add 300 μL of PMEB to each PM tube. Mix for 1 min at room
temperature and centrifuge at 15,000 × g for 4 min.
8. Transfer the upper layer to a new microcentrifuge tube (tube
PM2) (see Note 11).
9. Dry PM2 and NPM tubes in a speedvac or under a nitrogen
stream. Keep the dried tubes at −20 °C or −80 °C until
analysis.
10. Centrifuge tube NAP at 20,000 × g for 10 min. Discard the
supernatant without disturbing the pellet.
3.3 Pigment Extraction Method
The following steps must be carried out at 4 °C unless other
conditions are specified. All materials used must be acetone
resistant. Pigment extraction is not compatible with metabolite
and lipid extractions.
1. Add 500 μL of PEB to tube S for pellet resuspension. Transfer
to a screw-cap tube with glass beads (tube SB) (see Note 8).
2. Add another 500 μL of PEB to tube S and make sure the pellet
is completely resuspended. Combine with the PEB from step 1
in tube SB.
3. Vortex vigorously for 30 s or use the Regimill/Fastprep for 30 s.
Transfer to a new 1.5 mL tube (tube NAP).
4. Centrifuge for 5 min at 21,100 × g. Transfer the supernatant to
a new tube (tube Pi). The pellet containing nucleic acids and
proteins (tube NAP) should be whitish-brownish (see
Note 13).
5. Read the absorbance of tube Pi (dilute Pi contents if necessary)
immediately, since acetone is highly volatile.
Absorbances to be read: 470 nm, 537 nm, 647 nm, and 663 nm.
Take the background-subtracted mean absorbance of the three
replicates (see Note 14).
6. The concentration of chlorophylls and carotenoids (in μmol mL⁻¹)
can be obtained with the following equations (see Note
15) according to [11]:
\[ \mathrm{Chl}_a = 0.01373\,A_{663} - 0.000897\,A_{537} - 0.003046\,A_{647} \]
\[ \mathrm{Chl}_b = 0.02405\,A_{647} - 0.004305\,A_{537} - 0.005507\,A_{663} \]
\[ \mathrm{Carotenoids} = \frac{A_{470} - 17.1\,(\mathrm{Chl}_a + \mathrm{Chl}_b)}{119.26} \]
7. Air-dry pellets for PEB evaporation at room temperature (tube
NAP) (see Note 10).
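The pigment calculation of step 6, together with the dilution factor and fresh-weight normalization of Note 15, can be sketched in R as follows. This is a minimal illustration that assumes the equations of step 6 are read with minus signs between their terms, as in [11]; the function name `pigment_conc()`, the absorbance values, the 1 mL extraction volume (2 × 500 μL of PEB), and the 40 mg fresh weight are all hypothetical.

```r
# pigment_conc() is a hypothetical helper. Inputs are background-subtracted
# mean absorbances; the result is in umol/mL of extract.
pigment_conc <- function(A470, A537, A647, A663, dilution = 1) {
  chla <- 0.01373 * A663 - 0.000897 * A537 - 0.003046 * A647
  chlb <- 0.02405 * A647 - 0.004305 * A537 - 0.005507 * A663
  car  <- (A470 - 17.1 * (chla + chlb)) / 119.26
  # Note 15: multiply by the dilution factor if the sample was diluted.
  c(Chla = chla, Chlb = chlb, Carotenoids = car) * dilution
}

# Invented example readings (all below an absorbance of 1, see Note 14).
res <- pigment_conc(A470 = 0.5, A537 = 0.02, A647 = 0.3, A663 = 0.6)

# Assumed values for normalization: 1 mL of PEB extract, 40 mg fresh weight.
extract_volume_ml <- 1
fresh_weight_mg <- 40
per_mg <- res * extract_volume_ml / fresh_weight_mg  # umol per mg fresh weight

print(res)
print(per_mg)
```

Because the three equations are linear in the absorbances, multiplying the result by the dilution factor is equivalent to correcting each absorbance before the calculation.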
3.4 Lipid Extraction Method
Lipid extraction is not compatible with pigment and metabolite
extractions.
1. Add 200 μL of LBE1 to the cell pellet (tube S) and transfer to a
screw-cap tube containing glass beads (tube SB).
2. Homogenize by bead beating until total homogenization.
Weigh a 1.5 mL tube (tube L).
3. Centrifuge at 14,000 × g for 5 min at room temperature and
transfer the supernatant to tube L.
4. Repeat steps 1 and 2, pooling both fractions in the same tube L.
5. Reextract the pellet with 400 μL of LBE2 and vortex
vigorously for 3 min.
6. Centrifuge at 14,000 × g for 5 min at room temperature and
transfer the supernatant to tube L. The pellet contains proteins
and nucleic acids (tube NAP) (see Notes 10 and 16).
7. Dry tube L in a speedvac or oven.
8. Determine lipid weight gravimetrically.
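The gravimetric determination in step 8 is the difference between the weight of the dried tube L and the empty tube weighed in step 2. A minimal R sketch follows, with the helper name `lipid_yield()` and all weights invented for illustration:

```r
# lipid_yield() is a hypothetical helper. All weights are in mg.
lipid_yield <- function(empty_tube_mg, dried_tube_mg, fresh_weight_mg) {
  lipid_mg <- dried_tube_mg - empty_tube_mg
  c(lipid_mg = lipid_mg,
    percent_of_fresh_weight = 100 * lipid_mg / fresh_weight_mg)
}

# Invented example: empty tube L 1012.4 mg, dried tube L 1015.9 mg,
# starting from 50 mg of fresh weight (the maximum given in Note 6).
yield <- lipid_yield(empty_tube_mg = 1012.4, dried_tube_mg = 1015.9,
                     fresh_weight_mg = 50)
print(yield)
```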
3.5 Nucleic Acid Purification Method
The following steps must be carried out at 4 °C, unless other
conditions are specified.
1. Add 500 μL of WB1 to tube NAP and vortex until the
pellet is mostly disaggregated (see Note 17). Centrifuge at
20,000 × g for 10 min. Discard the supernatant without
disturbing the pellet (see Note 18).
2. Resuspend the pellet in 400 μL of PSB and centrifuge at
14,000 × g for 3 min.
3. Transfer the supernatant to a new silica column (SC1) placed in
a nuclease- and protease-free 2 mL tube (see Note 18).
Centrifuge at 10,000 × g for 1 min.
4. Transfer the flow-through, containing RNA and proteins, to a
new tube (tube RP). Reserve SC1, containing DNA, for later
washing steps.
5. Add 400 μL of acetonitrile to tube RP and mix first by
pipetting and then by vortexing.
6. Transfer the tube RP sample mix to a new silica column (SC2)
placed in a nuclease- and protease-free 2 mL tube (tube P) (see
Note 18).
7. Centrifuge SC2 at 12,000 × g for 2 min and save the
flow-through, containing proteins, in tube P.
8. Wash columns SC1 and SC2 with 600 μL of WB2. Centrifuge
at 12,000 × g for 2 min and discard the flow-through.
9. Add 300 μL of RNase solution to SC1 and incubate for 30 min
at room temperature. Add 360 μL of DNase solution to SC2 and
incubate for 30 min at 37 °C.
10. Centrifuge SC1 and SC2 at 12,000 × g for 1 min. Discard the
flow-through.
11. Add 600 μL of WB3 to SC1 and SC2. Centrifuge at 12,000 × g
for 2 min. Discard the flow-through.
12. Centrifuge SC1 and SC2 for 1 min at 20,000 × g (see Note 19).
13. Place SC1 in a new tube (tube DNA) and SC2 in another
(tube RNA). Add 50 μL of ddH2O to the center of the membrane
of SC1 and SC2. Incubate for 5 min at room temperature.
14. Centrifuge SC1 and SC2 at 12,000 × g for 1 min to elute
both DNA (tube DNA) and RNA (tube RNA).
3.6 Protein Extraction and Purification Methods
The following steps must be carried out at 4 °C unless other
conditions are specified.
1. Add 100 μL of PSB and 300 μL of phenol to tube P. Mix by
vortexing and incubate for 2 min at room temperature (see
Note 20).
2. Add 1150 μL of PPSM to tube P and vortex for 1–2 min at
room temperature.
3. Centrifuge for 5 min at 10,000 × g and room temperature to
allow phase separation.
4. Transfer the upper phenolic phase containing proteins to a new
tube (tube A) and add 600 μL of phenol washing buffer (PWB).
Vortex for 1–2 min and then centrifuge for 5 min at 10,000 × g
and room temperature.
5. Transfer the upper phenolic phase to a new tube (tube B),
being careful not to disturb the interphase (see Note 21).
6. Precipitate the proteins by adding 1.5 mL of PPB to tube
B. Incubate overnight at −20 °C (see Notes 22 and 23).
7. Centrifuge tube B at 10,000 × g for 15 min and discard the
supernatant carefully with a pipette so as not to disturb the
pellet.
8. Fill tube B with methanol and disaggregate the pellet using
an ultrasound sonicator.
9. Centrifuge at 10,000 × g for 10 min and discard the
supernatant without disturbing the pellet.
10. Wash the pellet with 600 μL of PPWB. Mix until the pellet is
completely disaggregated (see Note 24).
11. Centrifuge at 10,000 × g for 10 min and discard the
supernatant without disturbing the pellet.
12. Air-dry pellets and redissolve in an adequate buffer (see Notes
25 and 26).
13. Resolubilize and quantify proteins (see Note 26).
Proceed with protein fractionation, digestion, desalting, and
concentration according to [12].
4 Notes
1. Phase separation mix should be prepared in the 1.5 mL tube.

2. Cell concentration should be 5 × 10⁵–1 × 10⁶ cells/mL.
3. All the sampling steps must be done quickly. If this is not
possible, centrifuge 15 mL of cell culture in 35 mL of cold
(−80 °C) methanol and keep it at −80 °C until the extraction
is performed.
4. Weigh the 2 mL tubes S before transferring the cells.
5. Discarding all the supernatant is crucial for step 4.
6. Maximum fresh weight for extraction should be 50 mg.
7. Tissue should remain frozen during all of the process.
8. Cut the pipette tip for an easier resuspension.
9. If the resultant pellet is green (nonwhitish), proceed to reho-
mogenize it because it indicates a poor homogenization. Add
200 μL of MEB to the tube containing the pellet. Mix well by
vortex until the pellet is completely disaggregated. Centrifuge
at 20,000 × g for 6 min and transfer the supernatant to tube M.
10. If nucleic acid extraction is required, perform the first step
of the nucleic acid extraction immediately and keep the sample
at 4 °C. To proceed directly to protein extraction, keep the
pellet at 4 °C, or air-dry the pellets (tube NAP) at room
temperature and keep them overnight at −20 °C. For long-term
storage, keep at −80 °C until nucleic acid and protein
extractions. This procedure is compatible with direct protein
extraction and purification without nucleic acid purification.
11. Low-binding tube is preferred.
12. Sometimes the polar phase can be slightly cloudy, becoming
transparent if the tube is warmed to room temperature. This
indicates chloroform contamination. In this case, a second wash
of the PM tube with WB1 is recommended. Transfer the upper
layer to a new PM tube and the lower to the NPM tube.
13. If the pellet remains green colored, repeat steps 3 and 4.
14. The spectrophotometer response is linear with pigment con-
centration up to an absorbance of 1. When the peak absorbance
of the samples exceeds 1, the solutions are diluted further and
remeasured.
15. All of these values should be multiplied by the dilution factor of
the samples (if sample is diluted). The fresh weight can be used
to obtain the moles of each pigment by milligram of fresh
weight.
16. If the pellet remains green and hexane still pigmented, repeat
steps 4 and 5 but adding 500 μL LBE2 instead of 400 μL. In
case the pellet remains green, continue with the extraction
because green pellet color may come from the chlorophyll
hemo group separation from the dead cells.
17. Pipetting through beads reduces the amount of nonsoluble
particles that are taken up.
18. Avoid transferring pellet particles to silica column.
19. This step is for eliminating residual ethanol and completely
drying the column for a better elution of nucleic acids.
20. If previous nucleic acid extraction is not performed, disaggre-
gate tube NAP pellet in 400 μL of PSB and transfer the
dissolved pellet to a new tube (tube P). Then, follow the
protein extraction and purification protocol.
21. For a maximum protein yield, remaining aqueous phase of tube
A can be reextracted with 550 μL of phenol, repeating steps 4–
5.
22. If aqueous phase was reextracted, transfer the upper phenolic
phase to a 10 mL tube, and precipitate protein with 4 mL
of PPB.
23. Pause point: precipitated proteins in acetone are stable for
more than 1 week at room temperature, but we recommend
keeping them at −20 °C until extraction is resumed.
24. Pellets that are not completely dry (but with the acetone
completely evaporated) are easier to solubilize.
25. Pellet solubilization should be done in an appropriate buffer
depending on the downstream application of the proteins. For
Chlamydomonas, the best buffer for solubilizing protein pellets
is 8 M urea, 4% SDS.
26. Choose the protein quantification method depending on its
compatibility with the protein resuspension buffer used.
Acknowledgments
Our research group is generously funded by the Spanish Ministry of
Science, Innovation and Universities (AGL2016-77633-P and
AGL2017-83988-R). M.M., L.V., and F.C. were also supported
by the Spanish Ministry of Science, Innovation and Universities
through the Ramón y Cajal program (RYC-2014-14981 and
RYC-2015-17871 to M.M. and L.V., respectively) and by the Programa
de Ayudas Predoctorales Severo Ochoa, Autonomous Community of
Asturias, Spain (BP14-138, to F.C.).
References
1. Sasso S, Stibor H, Mittag M et al (2018) From molecular manipulation of domesticated Chlamydomonas reinhardtii to survival in nature. elife 7:e39233
2. Valledor L, Escandón M, Meijón M et al (2014) A universal protocol for the combined isolation of metabolites, DNA, long RNAs, small RNAs, and proteins from plants and microorganisms. Plant J 79:173–180
3. Nakayasu ES, Nicora CD, Sims AC et al (2016) MPLEx: a robust and universal protocol for single-sample integrative proteomic, metabolomic, and lipidomic analyses. MSystems 1:e00043–e00016
4. Salem MA, Jüppner J, Bajdzienko K et al (2016) Protocol: a fast, comprehensive and reproducible one-step extraction method for the rapid preparation of polar and semi-polar metabolites, lipids, proteins, starch and cell wall polymers from a single sample. Plant Methods 12:45
5. Morschett H, Wiechert W, Oldiges M (2016) Automation of a Nile red staining assay enables high throughput quantification of microalgal lipid production. Microb Cell Factories 15:34
6. Smith AM, Zeeman SC (2006) Quantification of starch in plant tissues. Nat Protoc 1:1342
7. Singleton VL, Orthofer R, Lamuela-Raventós RM (1999) Analysis of total phenols and other oxidation substrates and antioxidants by means of Folin-Ciocalteu reagent. In: Methods in enzymology. Academic Press, Cambridge, pp 152–178
8. Gregor J, Maršálek B (2004) Freshwater phytoplankton quantification by chlorophyll a: a comparative study of in vitro, in vivo and in situ methods. Water Res 38:517–522
9. Chiu L, Ho S-H, Shimada R et al (2017) Rapid in vivo lipid/carbohydrate quantification of single microalgal cell by Raman spectral imaging to reveal salinity-induced starch-to-lipid shift. Biotechnol Biofuels 10:9
10. Strenkert D, Schmollinger S, Gallaher SD et al (2019) Multiomics resolution of molecular events during a day in the life of Chlamydomonas. PNAS 116:2374–2383
11. Sims DA, Gamon JA (2002) Relationships between leaf pigment content and spectral reflectance across a wide range of species, leaf structures and developmental stages. Remote Sens Environ 81:337–354
12. Valledor L, Weckwerth W (2014) An improved detergent-compatible gel-fractionation LC-LTQ-Orbitrap-MS workflow for plant and microbial proteomics. In: Jorrin-Novo JV, Komatsu S, Weckwerth W, Wienkoop S (eds) Plant proteomics: methods and protocols. Humana Press, Totowa, NJ, pp 347–358
Chapter 3
Protein Interaction Networks: Functional and Statistical Approaches
Mónica Escandón, Laura Lamelas, Víctor Roces,
Víctor M. Guerrero-Sanchez, Mónica Meijón, and Luis Valledor
Abstract
The evolution of next-generation sequencing and high-throughput technologies has created new
opportunities and challenges in data science. Currently, a classic proteomics analysis can be complemented
by integrative approaches that go a step beyond the individual analysis of the proteome. These
integrations can focus on inferring relationships among the proteins themselves or between proteins and
other molecular levels, phenotype, or even environmental data, giving the researcher new tools to extract
and determine the most relevant information in biological terms. Furthermore, it is also important to
employ visualization methods that allow a correct and deep interpretation of the data.
To carry out these analyses, several bioinformatics and biostatistical tools are required. In this chapter,
different workflows that enable the creation of interaction networks are proposed. The resulting networks
reduce the complexity of the original datasets, depicting complex statistical relationships (through PLS
analysis and variants), functional networks (STRING, ShinyGO), and a combination of both approaches.
Recently developed methods for integrating different omics levels, such as coinertia analyses or DIABLO,
are also described. Finally, the use of Cytoscape or Gephi is described for the representation and mining
of the different networks.
This approach constitutes a new way of acquiring a deeper knowledge of the function of proteins, such as
the search for specific connections of each group to identify differentially connected modules, which may
reflect involved protein complexes and key pathways.
Key words Protein networks, String, Omics levels, sPLS, DIABLO, Cytoscape
1 Introduction
The classic workflow for proteome analysis, mainly based on the use
of univariate statistics and PCAs, is being quietly displaced in favor
of new approaches that take advantage of protein interaction
knowledge and advanced statistical tools. These novel
methodologies allow for the study of the proteome and its
interaction with other biomolecules, the environment, and even with
itself, providing a holistic perspective. This kind of workflow
gives the researcher the possibility of a deeper understanding
of the biological responses behind the observed differences in
the experimental systems.
Electronic supplementary material: The online version of this chapter
(https://doi.org/10.1007/978-1-0716-0528-8_3) contains supplementary
material, which is available to authorized users.
Integrative studies rely heavily on computational biology and
require the use of specific algorithms, methods, and models to
extract and determine the most relevant information in biological
terms [1, 2]. Common classification methods (including
discriminant analysis; neural networks; decision trees; support vector
machine, SVM; and random forest, RF) are suited to single dataset
analyses [2], whereas the methods that build predictive models
require multiple data sources, some acting as predictors and
others as the predicted responses.
The most widely used approach for the characterization of multiple
omics datasets is the combination of unsupervised multivariate
statistics, such as principal component analysis (PCA), and supervised
methods, such as partial least squares (PLS), PLS discriminant
analysis (PLS-DA), and their variants [3–5]. PLS methods are suitable
for integrating two datasets, considering one omic level as the
predictor of a second omic level, the response. With these methods
it is possible to get an overview of the most important variables
(proteins, metabolites, or transcripts) by determining which
variables of the predictor explain the maximum variance of the
responses [6]. In addition, there are innovative multiple
integration tools (for more than two different data inputs) that
allow for the construction of these relationship-based models, such
as DIABLO (Data Integration Analysis for Biomarker discovery using
a latent component method for Omics studies) [2], multiple
coinertia analysis (MCIA) [7], and xMWAS [8].
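The predictor and response idea behind PLS can be illustrated with a few lines of base R. The sketch below computes only the first PLS weight vector for a single response (the direction in the predictor block with maximal covariance with the response) on a small invented dataset; it is not the implementation used by mixOmics or the other tools cited above, which add further components, multiple responses, and variable selection. All object names and values are hypothetical.

```r
# Invented predictor block X: 4 samples x 3 variables (e.g., proteins).
X <- cbind(prot1 = c(1, -1, 1, -1),
           prot2 = c(1, 1, -1, -1),
           prot3 = c(1, -1, -1, 1))

# Invented response y (e.g., one metabolite), driven mostly by prot1.
y <- 2 * X[, "prot1"] + 0.5 * X[, "prot2"]

# Centre both blocks (a no-op for this toy data, but required in general).
Xc <- scale(X, center = TRUE, scale = FALSE)
yc <- y - mean(y)

# First PLS weight vector: proportional to X'y, normalised to unit length.
w <- crossprod(Xc, yc)
w <- w / sqrt(sum(w^2))

# Latent component: one score per sample.
scores <- Xc %*% w

# Rank predictor variables by the absolute value of their weight.
ranking <- names(sort(abs(w[, 1]), decreasing = TRUE))
print(ranking)
```

In this toy case the variable that contributes most to the response receives the largest weight, which is the intuition behind using PLS weights to shortlist the proteins most related to a second omic level.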
Integrative analysis can be pushed beyond sample and variable
biplotting or variable filtering, since the interactions determined
can be depicted as networks, where the variables are the nodes and
the relations among them are the edges. As a result, this simple
representation captures the complexity of the original data as
retrieved by previous analyses. These networks can be topologically
evaluated to determine the most connected nodes, or hubs, within the
data as well as subnetworks or clusters with the same (or opposite)
behavior.
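As a minimal illustration of the node, edge, and hub vocabulary, the following base R sketch stores a small invented network as an edge list and counts the connections (degree) of each node. Dedicated packages such as igraph offer far richer topological measures; the protein names and edges here are hypothetical.

```r
# Invented undirected network stored as an edge list (one row per edge).
edges <- data.frame(
  from = c("Prot1", "Prot1", "Prot1", "Prot2", "Prot4"),
  to   = c("Prot2", "Prot3", "Prot4", "Prot3", "Prot5"),
  stringsAsFactors = FALSE
)

# Degree = number of edges touching each node, sorted from high to low.
degree <- sort(table(c(edges$from, edges$to)), decreasing = TRUE)

# The most connected node is the candidate hub of this toy network.
hub <- names(degree)[1]
print(degree)
print(hub)
```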
The interaction networks described above, and their inferred
relationships, are based on statistical analyses. However, it is
also possible to create or enrich those networks with functional or
biologically relevant annotations. This new information layer is
obtained from specific tools and databases (STRING, ShinyGO)
which gather known functional relations (protein–protein and
protein–metabolite associations). In addition, we can create
functional networks [9] even for species not included in these
databases through BLAST and protein domain analyses.
In this chapter, different workflows aimed at conducting all of
these functional and statistical approaches, together with data
visualization, are described.
2 Materials
Next, we describe different approaches for obtaining networks
that infer the connections of proteins among themselves and with
other omics datasets, specifically with the metabolomic and
transcriptomic levels in the example shown.
The experiment used as an example consists of a control and
two experimental treatments (T1 treatment, T2 treatment) with
three biological replicates each. The names given to the replicates
are C-1; C-2; C-3; T1-1; T1-2; T1-3; T2-1; T2-2; and T2-3. The
matrices used in the different workflows (Protein matrix,
Metabolite matrix, and Transcript matrix) follow the template
shown in Fig. 1 (where A: Protein matrix; B: Metabolite matrix;
and C: Transcript matrix). The individual matrix of each dataset has
the following arrangement: samples in columns (e.g., control,
treatment 1, treatment 2) and variables in rows (protein 1,
protein 2, . . .). Following this arrangement when entering the
matrices as datasets in the different programs is crucial for
obtaining good results in the different workflows. A simplified
dataset is provided as supplementary material to carry out the
different workflows (Supplementary dataset S1). The networks shown
in the chapter have been made with real data from experiments.
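The layout described above (samples in columns, variables in rows) can be checked in R before starting any workflow. The sketch below builds an invented protein matrix with the sample names of the example experiment, writes it as a semicolon-separated file, and reads it back; the abundance values are placeholders, and the file is written to a temporary path rather than to a real dataset.

```r
# Sample names of the example experiment.
samples <- c("C-1", "C-2", "C-3", "T1-1", "T1-2", "T1-3",
             "T2-1", "T2-2", "T2-3")

# Invented abundances: 3 proteins (rows) x 9 samples (columns).
proteins <- matrix(
  seq_len(3 * length(samples)), nrow = 3,
  dimnames = list(c("protein1", "protein2", "protein3"), samples)
)

# Write and re-read the matrix as a semicolon-separated csv file.
csv_path <- tempfile(fileext = ".csv")
write.table(proteins, csv_path, sep = ";", col.names = NA, quote = FALSE)
reloaded <- read.table(csv_path, header = TRUE, row.names = 1, sep = ";",
                       check.names = FALSE)

# Samples should be in columns and variables in rows.
print(dim(reloaded))
print(colnames(reloaded))
```

With the default `check.names = TRUE`, `read.table()` would convert names such as `C-1` into `C.1`, which is worth keeping in mind when sample names are matched against other objects later on.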
Protein identification and quantification from raw MS/MS was
performed by Thermo Proteome Discoverer™, Metabolites using
MZmine 2 [10] and Transcripts from Trinity software [11]. All
datasets are normalized following the indications of [12], [13], and
[14]. In each workflow, we will specify which matrices we need as
starting materials (e.g., Protein matrix with proteins identified,
annotated, and quantified for each experimental situation).
The different workflows require different tools, which are
listed below:
Software: R (v. 3.6.0), RStudio, Cytoscape (v. 3.7.1),
Cytoscape STRING App (v. 1.4.2), Gephi (v. 0.9.2), and
a spreadsheet program.
R libraries: Bioconductor, edgeR, ARTIVA, xMWAS, mixOmics,
igraph, and RColorBrewer.
3 Methods
Depending on the approach used, we have developed different
workflows, which are summarized in this protocol index:
Fig. 1 Example matrices for data entry in R in csv format. (a) Protein matrix, (b) Metabolite matrix, and (c)
Transcript matrix (RNA-seq data in this case). Each dataset with samples in columns and variables in rows
1. Selection of differential expression proteins (for targeted
networks) (Subheading 3.1).
2. Integration tools for statistical networks.
(a) Statistical integration networks: Dynamic protein–protein
interaction networks (Subheading 3.2.1).
(b) With other omics datasets:
• Partial Least Square Regression (PLS) and variates
(Subheading 3.2.2.1).
• Data-driven integration and differential network
analysis, xMWAS (Subheading 3.2.2.2).
• Data Integration Analysis for Biomarker discovery
using a Latent component method for Omics studies
(DIABLO) (Subheading 3.2.2.3).
3. Biological interaction network enrichment.
(a) STRING (Subheading 3.3.1).
(b) ShinyGO (Subheading 3.3.2).
4. Merged functional and statistical interaction networks
(Subheading 3.4).
5. Network visualization tools.
(a) Cytoscape (Subheading 3.5.1).
(b) Gephi (Subheading 3.5.2).
6. Future Perspectives.
3.1 Selection of Differential Expression Proteins (for Targeted Networks)
For the analysis of protein–protein interactions, besides the global
analysis of the proteome, it is possible to analyze in particular the
interactions of proteins with differential expression within our
experiment. Quantitative analysis of shotgun proteomic data can
be performed with statistical tools commonly used to measure
the differential expression of genes (proteins in our case) such as
edgeR [15]. This package implements a range of statistical
methodology based on the negative binomial distribution, including
empirical Bayes estimation, exact tests, generalized linear models,
and quasi-likelihood tests. This analysis makes it possible to better
group proteins according to their function under certain
conditions, reducing network complexity and keeping only the proteins
significantly altered by a specific treatment.
Next, we explain the workflow for obtaining a selection of
differential proteins, from which a functionally enriched network
can then be built with programs such as STRING or ShinyGO
(Subheading 3.3).
Workflow 1. Install and load the required packages. These are collections of
functions, data, and R code that are stored in a folder according
to a well-defined structure, easily accessible for R (see Note 1).
In an R console or GUI (we recommend RStudio) type:
# Install BiocManager only if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
# edgeR is distributed through Bioconductor
BiocManager::install("edgeR")
library(edgeR)
2. Load your data (Protein matrix, Fig. 1a), indicating the path of
the file that contains them and its format. The column names of
the data must identify the controls and the corresponding
treatments.
proteins <- read.table("proteins.csv", header = TRUE,
                       row.names = 1, sep = ";")
3. Now we can create a DGEList variable (a list-based system
designed to store quantification data and associated
information from sequencing technologies (see Note 2)). In this
case, protein quantification data will be used instead of
RNA-seq data.
dpList <- DGEList(counts = proteins, genes = rownames(proteins))
4. To correct for variations between samples, the Trimmed Mean
of M-values (TMM) method must be applied [16] in order to
bring the average expression values of different samples to the
same scale, based on the assumption that the majority of
proteins are not differentially expressed.
dpList <- calcNormFactors(dpList, method = "TMM")
5. Once the dataset is ready, a design matrix describing the setup
of the experiment must be defined. Each row corresponds to a
sample and each column to a treatment group; a value of 1
indicates that the sample belongs to that group. For
implementing more treatments, see Note 3.
design <- matrix(c(c(1,1,1,0,0,0,0,0,0), c(0,0,0,1,1,1,0,0,0), c(0,0,0,0,0,0,1,1,1)),
                 ncol = 3,
                 dimnames = list(c("C.1","C.2","C.3","T1.1","T1.2","T1.3","T2.1","T2.2","T2.3"),
                                 c("C.","T1.","T2.")))
The negative binomial dispersion (common, trended, and tagwise)
must then be estimated, taking into account the experimental
design previously defined, in order to model the variance of the
dataset.
dpList <- estimateGLMCommonDisp(dpList, design = design)
dpList <- estimateGLMTrendedDisp(dpList, design = design)
dpList <- estimateGLMTagwiseDisp(dpList, design = design)
6. After calculating dispersions, differential expression can
be estimated by fitting a negative binomial model, an approach
that performs well with a small number of replicates [15]. A
contrast matrix must be defined, indicating the comparisons to
be made between treatments and controls; it must be adjusted
for each experiment. Example:
contrast <- makeContrasts(T1vsC = T1. - C., T2vsC = T2. - C.,
                          levels = colnames(design))
fit <- glmFit(dpList, design)
lrt <- glmLRT(fit, contrast = contrast)
DEP <- topTags(lrt, n = Inf)
DEPdf <- as.data.frame(DEP)
7. A false discovery rate (FDR) threshold must be applied in order
to limit the number of false positives. In this example, a false
discovery rate of less than 5% (FDR < 0.05) has been applied.
test <- DEPdf[which(DEPdf$FDR < 0.05),]
8. A rule for defining differentially expressed proteins is then
applied (double or half the amount for accumulated or lost
proteins, log2FC > 1 or < −1, respectively). Proteins meeting this
criterion, and also having an FDR below the threshold, are
selected and exported into a new object. With the contrast matrix
defined in step 6, the fold-change columns of the results table
are named logFC.T1vsC and logFC.T2vsC.
(a) Upregulated proteins Treatment 1:
proteinsup1 <- test[which(test$logFC.T1vsC >= 1),]
# keep protein name, both log2 fold changes, LR, p-value, and FDR
proteinsup1 <- proteinsup1[,c(1,2,3,5,6,7)]
colnames(proteinsup1) <- c("Protein", "log2T.1", "log2T.2", "LR", "p-value", "FDR")
Export the list of upregulated proteins treatment 1
(“proteinsup1.txt”). The document will be saved in the selected
workspace.
write.table(proteinsup1$Protein, "proteinsup1.txt", quote = FALSE, row.names=FALSE)
(b) Downregulated proteins Treatment 1:
proteinsdown1 <- test[which(test$logFC.T1vsC <= -1),]
proteinsdown1 <- proteinsdown1[,c(1,2,3,5,6,7)]
colnames(proteinsdown1) <- c("Protein", "log2T.1", "log2T.2", "LR", "p-value", "FDR")
Export the list of downregulated proteins treatment 1
(“proteinsdown1.txt”) in the working directory:
write.table(proteinsdown1$Protein, "proteinsdown1.txt", quote = FALSE, row.names=FALSE)
(c) Upregulated proteins Treatment 2:
proteinsup2 <- test[which(test$logFC.T2vsC >= 1),]
proteinsup2 <- proteinsup2[,c(1,2,3,5,6,7)]
colnames(proteinsup2) <- c("Protein", "log2T.1", "log2T.2", "LR", "p-value", "FDR")
Export the list of upregulated proteins treatment 2
(“proteinsup2.txt”) in the working directory.
write.table(proteinsup2$Protein, "proteinsup2.txt", quote = FALSE, row.names=FALSE)
(d) Downregulated proteins Treatment 2:
proteinsdown2 <- test[which(test$logFC.T2vsC <= -1),]
proteinsdown2 <- proteinsdown2[,c(1,2,3,5,6,7)]
colnames(proteinsdown2) <- c("Protein", "log2T.1", "log2T.2", "LR", "p-value", "FDR")
Export the list of downregulated proteins treatment 2
(“proteinsdown2.txt”) in the working directory:
write.table(proteinsdown2$Protein, "proteinsdown2.txt", quote = FALSE, row.names=FALSE)
9. The final files of overexpressed or repressed differential
proteins will be used for the protein–protein interaction
analysis with the STRING database and for functional network
visualization with ShinyGO. Example: STRING only needs the names
or the FASTA files of the selected proteins (upregulated or
downregulated), which are entered in the STRING web interface.
10. Continue the protocol with the STRING/ShinyGO workflow
(Subheading 3.3).
(a) Subheading 3.3.1 for STRING.
(b) Subheading 3.3.2 for ShinyGO.
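The selection rule of steps 7 and 8 (FDR below 0.05 plus at least a twofold change) can be tried out on a small table without running the full edgeR analysis. The base R sketch below uses invented p-values and fold changes and the Benjamini-Hochberg adjustment from `p.adjust()`; the table `res` and its column names are hypothetical and only mimic the kind of output described above.

```r
# Invented results table: one row per protein.
res <- data.frame(
  Protein = paste0("P", 1:6),
  logFC   = c(2.3, -1.8, 0.4, 1.2, -0.2, -3.1),
  PValue  = c(0.001, 0.004, 0.030, 0.200, 0.600, 0.002),
  stringsAsFactors = FALSE
)

# Benjamini-Hochberg false discovery rate.
res$FDR <- p.adjust(res$PValue, method = "BH")

# Step 7: keep proteins with FDR < 0.05.
significant <- res[res$FDR < 0.05, ]

# Step 8: at least double (log2FC >= 1) or half (log2FC <= -1) the amount.
up   <- significant$Protein[significant$logFC >= 1]
down <- significant$Protein[significant$logFC <= -1]

print(up)
print(down)
```

Note that a protein can pass the FDR filter and still be excluded by the fold-change rule (P3 in this toy table), which is why both criteria are applied.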
3.2 Integration Tools
3.2.1 Statistical Integration Networks: Dynamic Protein–Protein Interaction Networks
While most approaches are focused on constant interactions and
assume that biological systems are static, nonhomogeneous
dynamic Bayesian statistics with the ARTIVA (autoregressive time
varying) algorithm can reveal causal interactions considering temporal
information. This method is able to determine indirect associations
and regulation loops of high biological interest, thus giving a more
complete picture of the system. The ARTIVA model divides the global
dataset dynamics (heterogeneity) into several uniform
(homogeneous) phases called changepoints (CPs). In each CP, it searches
for relations between two types of variables (regulators and targets)
while taking into account a user-defined time delay. As a result, it
reveals dynamic interactions across time [17]. This type of model
has three main limitations: (1) Interactions that occur at a time scale
shorter than the sampling points cannot be detected and may result
in wrong conclusions. If the time between two consecutive
sampling points is too large, it is recommended to try other approaches.
(2) Datasets with a low number of samples and a high number of
variables can lead to erroneous inference [18]. It is desirable
to find a balance by filtering the variables with differential
expression tests or functional categories. (3) The process is
time-consuming.
An example of the ARTIVA networks is available in Fig. 2.
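The role of the user-defined time delay can be illustrated with a simple lagged correlation in base R. This sketch is not the Bayesian inference performed by ARTIVA; it only shows, on an invented regulator and target pair, how a relationship that is weak at the same sampling point becomes evident when the target is compared with the regulator one sampling point earlier. The helper `lagged_cor()` and all values are hypothetical.

```r
# Invented time series over six sampling points.
regulator <- c(1, 3, 2, 5, 4, 6)
# The target follows the regulator with a delay of one sampling point.
target <- c(0, 2 * regulator[-length(regulator)])

# lagged_cor() is a hypothetical helper: it correlates the regulator at
# time t with the target at time t + delay.
lagged_cor <- function(reg, tar, delay) {
  n <- length(reg)
  cor(reg[seq_len(n - delay)], tar[(delay + 1):n])
}

cor_delay0 <- lagged_cor(regulator, target, 0)
cor_delay1 <- lagged_cor(regulator, target, 1)
print(c(delay0 = cor_delay0, delay1 = cor_delay1))
```

The same reasoning explains limitation (1) above: if the true delay is shorter than the interval between sampling points, no choice of delay measured in sampling points can recover the relationship.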
[Figure 2: four network panels titled “Changepoint 1; Sampling points 1-2”, “Changepoint 2; Sampling points 2-3”, “Changepoint 3; Sampling points 3-4”, and “Changepoint 4; Sampling points 4-5-6”; nodes are labeled Prot 1 to Prot 91.]
Fig. 2 ARTIVA networks representing the interactions across time between three different groups of regulators
(blue, red, and green nodes) and targets (gray nodes). The interactions of each changepoint are represented
independently. The orange circle shows nodes that have steadily gained interactions over time and therefore
greater importance. This network was made using Cytoscape workflow. Changepoint 1 ¼ interactions that
only occur on sampling point 1; Changepoint 2 ¼ interactions that only occur on sampling point 2;
Changepoint 3 ¼ interactions that only occur on sampling point 3; Changepoint 4 ¼ interactions that only
occur on sampling points 4-5-6
Workflow 1. Install all necessary packages.
install.packages("ARTIVA")
2. Load all necessary packages.
library(ARTIVA)
3. Set the working directory to the folder that contains the dataset
(protein matrix, Fig. 1a) and import it (see Note 4). Protein names
must be unique. The matrix is named ART_data:
ART_data <- read.table("proteins.csv", header = TRUE, row.names = 1, sep = ";")
4. Log-transform the data to improve the performance of the
algorithm.
ART_data <- log10(ART_data + 1)
5. Select and subset the regulators. Each group of regulators
should have a similar number of variables (see Note 5). Classical
examples of regulators are transcription factors and
epigenetic-related proteins. Regulators_1 is a numeric vector
containing the row numbers of the desired proteins. In this example
there is one group of regulators:
Regulators_1 <- c(43, 46, 47, 79, 131, 154, 160, 164, 187, 202,
218, 230, 242, 269, 289, 344, 345, 350, 428, 433, 438, 442,
468, 501, 510, 522, 561, 586, 605, 625, 669, 698)
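Hard-coded row numbers break silently if the matrix is re-sorted or filtered. As a hedged alternative, the row numbers can be derived from protein names. The sketch below uses a toy stand-in for ART_data, and the list of regulator names is hypothetical.

```r
# Toy stand-in for ART_data: proteins in rows, samples in columns
ART_toy <- data.frame(S1 = 1:6, S2 = 6:1,
                      row.names = c("TF_MYB1", "RBCL", "TF_WRKY2",
                                    "HSP70", "HDA6", "ACT2"))

# Hypothetical names of the proteins annotated as regulators
regulator_names <- c("TF_MYB1", "TF_WRKY2", "HDA6")

# Fail early if a requested regulator is missing from the matrix
stopifnot(all(regulator_names %in% rownames(ART_toy)))

# Row numbers of the regulators, in the order they appear in the matrix
Regulators_toy <- which(rownames(ART_toy) %in% regulator_names)
Regulators_toy
```

The resulting integer vector can be used in the same way as Regulators_1 to split the matrix into regulators and targets.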
6. Run ARTIVA regulators vs targets (see Note 6) and filter the
output (see Note 7) based on posterior probability (PostProb).
Posterior probability is a statement about the degree of belief in
a particular interaction.
(a) Run ARTIVA regulators vs targets. For a detailed explanation
of the arguments, see Note 8:
DBN <- ARTIVAnet(
# Targets are all rows that are not regulators
targetData = as.matrix(ART_data[-Regulators_1, ]),
parentData = as.matrix(ART_data[Regulators_1, ]),
targetNames = rownames(ART_data[-Regulators_1, ]),
parentNames = rownames(ART_data[Regulators_1, ]),
niter = 50000,
# Six sampling points with three replicates each
dataDescription = rep(1:6, each = 3),
nbCPinit = 1, maxCP = 5, segMinLength = 1,
savePictures = FALSE, saveEstimations = FALSE,
saveIterations = FALSE,
dyn = 0, edgesThreshold = 0.6)
(b) Filter the output:
DBN <- DBN[which(DBN$PostProb > 0.6), ]
7. Export the output.
write.table(DBN, file = "DBN.txt")
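By default, write.table also writes row names and quotes character fields, which can complicate importing the file as a network table. The sketch below shows one way to write a plain tab-delimited edge table. The mock data frame only imitates the filtered output: PostProb comes from the step above, while the other column names (Parent, Target, CPstart, CPend) are assumptions made for illustration.

```r
# Mock of the filtered ARTIVA output (column names other than PostProb are assumed)
DBN_toy <- data.frame(
  Parent   = c("Prot1", "Prot2", "Prot1", "Prot3"),
  Target   = c("Prot10", "Prot11", "Prot12", "Prot10"),
  CPstart  = c(2, 2, 4, 5),
  CPend    = c(3, 3, 6, 6),
  PostProb = c(0.95, 0.40, 0.72, 0.61)
)

# Keep confident interactions and sort them from most to least probable
edges <- DBN_toy[DBN_toy$PostProb > 0.6, ]
edges <- edges[order(-edges$PostProb), ]

# Tab-delimited file without row names or quotes
out_file <- file.path(tempdir(), "DBN_edges.txt")
write.table(edges, file = out_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back to confirm the table round-trips
check <- read.delim(out_file)
check
```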
8. Continue the protocol with Cytoscape/Gephi workflow
(Subheading 3.5).
3.2.2 Statistical Interaction Networks between Proteins with Other Omics Datasets
Partial Least Square Regression (PLS) and Variates
Statistical interaction networks between proteins and other omics
datasets (metabolomics, transcriptomics, ...) are obtained through
Partial Least Squares regression (PLS) or its variant, sparse
Partial Least Squares regression (sPLS). PLS is a multivariate
analysis that integrates two matrices: X, a predictor matrix (e.g.,
proteins), and Y, a response matrix (e.g., metabolites) [19, 20].
This method is used when the number of predictors exceeds the number
of observations or cases and when the variables under study are
correlated [21], a common situation in data obtained from omics
techniques. The advantage of PLS is that it can handle many noisy
and collinear variables, while sPLS [22] has the additional
advantage of selecting variables simultaneously in the two datasets,
reducing the complexity of omics data.
In this section, sPLS is used to obtain interaction networks between
our protein matrix and another molecular level (transcripts, in this
case). An example of an sPLS network is shown in Fig. 3.
[Fig. 3: network of protein and transcript nodes; edge weight legend ranging from -0.97 to 0.99]
Fig. 3 sPLS network using the transcript matrix as the predictor matrix and the protein matrix as the response
matrix. Edge legend represents the weight of the links between proteins and transcripts. Network
representation was drawn employing Cytoscape
Workflow 1. Install all necessary packages:
install.packages("mixOmics")
install.packages("igraph")
install.packages("RColorBrewer")
2. Load all necessary packages:
library(mixOmics)
library(igraph)
library(RColorBrewer)
3. Import the necessary datasets (in this case, two matrices): the
transcripts (Fig. 1c) as the predictor matrix (X) and the protein
matrix (Fig. 1a) as the response matrix (Y). First, in the Session
tab of R, set the working directory to the folder that contains all
the matrices to be used.
Xmatrix <- read.table("contigs.csv", header = TRUE, row.names = 1, sep = ";")
Ymatrix <- read.table("proteins.csv", header = TRUE, row.names = 1, sep = ";")
4. Transform the matrices into the format required by the PLS
function, which expects the transposed matrices (samples in rows).
Xmatrix <- as.data.frame(t(Xmatrix))
Ymatrix <- as.data.frame(t(Ymatrix))
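Both matrices must describe the same samples in the same row order before they are integrated. The following sketch, with toy matrices and illustrative sample names, shows one way to align them and to stop if they still differ. It is an added check, not a step of the original protocol.

```r
# Toy matrices after transposition: samples in rows, variables in columns
X_toy <- data.frame(contig1 = c(5, 3, 8), contig2 = c(1, 4, 2),
                    row.names = c("C_1", "T1_1", "T2_1"))
Y_toy <- data.frame(prot1 = c(0.2, 0.9, 0.4),
                    row.names = c("T1_1", "C_1", "T2_1"))

# Keep only the shared samples and put both matrices in the same order
shared <- intersect(rownames(X_toy), rownames(Y_toy))
X_toy <- X_toy[shared, , drop = FALSE]
Y_toy <- Y_toy[shared, , drop = FALSE]

# Stop here if the sample order still differs
stopifnot(identical(rownames(X_toy), rownames(Y_toy)))
Y_toy
```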
5. Run the sPLS analysis of the mixOmics package as follows. To
run a PLS analysis instead, simply change the spls function to pls.
spls.analysis <- spls(Xmatrix, Ymatrix, ncomp = 4, max.iter = 2500)
6. Select the two components that best explain your experimental
model (normally components 1 and 2). Use the plots and other
functions included in mixOmics (see Lê Cao et al. [22] for
more information) to decide which components to use.
(a) To visualize the plots: Create a vector of colors for your
experiment (pal) to improve the visualization and use the
plotIndiv function to make the plots (see Note 9). Change
the number of treatments and replicates to fit your
experiment, in this case three treatments with three
replicates each.
treatmentnumber <- 3
replicates <- 3
pal <- brewer.pal(treatmentnumber, "Set1")
pal <- rep(pal, each = replicates)
plotIndiv(spls.analysis, comp = 1:2, cex = 4, col = pal, rep.space = "XY-variate",
pch = 16, Y.label = "comp 2", X.label = "comp 1")
plotIndiv(spls.analysis, comp = 3:4, cex = 4, col = pal, rep.space = "XY-variate",
pch = 16, Y.label = "comp 4", X.label = "comp 3")
7. Make the sPLS network with the two selected components, in
this case components 1 and 2 (this is set in the "comp" argument,
e.g., comp = 1:2). The cutoff selected for this network is
0.7 (it will only include interactions greater than 0.7). The
cutoff will depend on each experiment. We recommend using a low
cutoff when creating the network for export to Cytoscape and
raising the cutoff later within Cytoscape. Sometimes the output may
not be displayed in RStudio because of margin issues. The plot can
be saved as an image using the arguments pdf.save and name.save.
An example using the pdf function is shown below.
net <- network(spls.analysis, comp = 1:2, color.node = c("white","pink"),
cutoff = 0.7, shape.node = c("rectangle", "rectangle"), show.color.key = TRUE)
(a) Example using the pdf function:
pdf(file = "networksPLS_0.7.pdf")
net <- network(spls.analysis, comp = 1:2, color.node = c("white","pink"),
cutoff = 0.7, shape.node = c("rectangle", "rectangle"), show.color.key = TRUE)
dev.off()
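To see how the cutoff changes the size of a network, the thresholding can be reproduced on a small association matrix. This is only a sketch of the idea in base R, not the mixOmics implementation: the matrix values are invented, the helper edge_list is hypothetical, and it assumes that the cutoff is applied to the absolute value of each association.

```r
# Toy association matrix between 3 transcripts (rows) and 3 proteins (columns)
sim <- matrix(c( 0.92, -0.15,  0.40,
                -0.81,  0.65,  0.10,
                 0.30,  0.75, -0.95),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("contig", 1:3), paste0("prot", 1:3)))

# Hypothetical helper: keep the pairs whose absolute association reaches the cutoff
edge_list <- function(m, cutoff) {
  idx <- which(abs(m) >= cutoff, arr.ind = TRUE)
  data.frame(from = rownames(m)[idx[, "row"]],
             to = colnames(m)[idx[, "col"]],
             weight = m[idx],
             stringsAsFactors = FALSE)
}

nrow(edge_list(sim, 0.5))  # permissive cutoff, more edges to explore later
nrow(edge_list(sim, 0.7))  # stricter cutoff, fewer edges
edge_list(sim, 0.7)
```

Strong negative associations survive the filter as well as strong positive ones, which is why a permissive cutoff at export time leaves more room for later filtering.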
8. Export the network in a format suitable for opening with
Cytoscape.
write.graph(net$gR, file = "network_sPLS(0.7).gml", format = "gml")
9. Continue the protocol with Cytoscape workflow (Subheading
3.5.1).
Data-Driven Integration and Differential Network Analysis (xMWAS)
The PLS algorithm and its variants have many advantages, some of
them especially useful for omics research, as stated in Subheading
3.2.2.1, but they are restricted to two omics layers. Statistically
based integrative networks aim to collect the relations among all
the studied omics layers of molecular information, "the more, the
better," to obtain the most accurate and complete network. To this
end, other tools have been developed that expand the sPLS algorithm
and allow the integration of more levels (up to four). xMWAS uses
this method of integration and performs network analysis to allow
the visualization of positive or negative associations between
different datasets [8].
In this section, the xMWAS package is used to obtain interaction
networks from three omics datasets: metabolites, proteins, and
transcripts. An example of an xMWAS network is shown in Fig. 4.
[Fig. 4: clustered network of metabolite, protein, and transcript nodes; node legend: Proteins, Transcripts, Metabolites]
Fig. 4 xMWAS, multi sPLS network using transcripts, proteins, and metabolites. The clusters are formed by the
500 most relevant variables of each omics dataset whose presence across the treatments is greater than 20%.
Negative connections are shown in red and positive connections in blue. Network representation was drawn
employing Cytoscape

Workflow 1. Install all necessary packages:
(a) Install R dependencies.
# On R >= 3.5, Bioconductor recommends BiocManager::install() instead of biocLite()
source("https://bioconductor.org/biocLite.R");
biocLite(c("GO.db","graph","RBGL","impute","preprocessCore"),dependencies=TRUE);
install.packages(c("devtools","WGCNA","mixOmics","snow","igraph","plyr","plsgenomics"),
dependencies=TRUE,type="binary", repos="http://cran.r-project.org")
(b) Install R package xMWAS.
library(devtools); install_github("kuppal2/xMWAS")
2. Load the package:
library(xMWAS)
3. Import the necessary datasets (in this case, the three
matrices described in Materials): the transcripts (Fig. 1c),
proteins (Fig. 1a), and metabolites (Fig. 1b). First, in the Session
tab of R, set the working directory to the folder that contains all
the matrices to be used.
transcripts <- read.table("contigs.csv", header = TRUE, row.names = 1, sep = ";")
proteins <- read.table("proteins.csv", header = TRUE, row.names = 1, sep = ";")
metabolites <- read.table("metabolites.csv", header = TRUE, row.names = 1, sep = ";")
4. Create a data frame describing the samples, with one column for
the sample names and another for the treatments, and combine it
in a list with the previous datasets:
SampleID <- colnames(metabolites)
Class <- rep(c("Control", "Treatment1", "Treatment2"), each = 3)
classlabels <- cbind("SampleID" = SampleID, "Class" = Class)
dataset <- list("transcripts" = transcripts, "proteins" = proteins,
"metabolites" = metabolites, "classlabels" = classlabels)
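Because SampleID is taken from the metabolite matrix only, it is worth confirming that the other datasets list the same samples in the same order. The sketch below is an added check on toy data; all object names ending in _toy and the helper mk are illustrative assumptions.

```r
set.seed(42)
# Toy datasets: variables in rows, samples in columns
samples <- paste0(rep(c("C", "T1", "T2"), each = 3), "_", 1:3)
mk <- function(n) matrix(runif(n * 9), nrow = n, dimnames = list(NULL, samples))
dataset_toy <- list(transcripts = mk(4), proteins = mk(5), metabolites = mk(3))

classlabels_toy <- cbind(SampleID = samples,
                         Class = rep(c("Control", "Treatment1", "Treatment2"), each = 3))

# TRUE for each dataset whose columns match the SampleID column exactly
same_order <- vapply(dataset_toy,
                     function(m) identical(colnames(m), classlabels_toy[, "SampleID"]),
                     logical(1))
same_order

# Number of samples per treatment
table(classlabels_toy[, "Class"])
```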
5. Create the path for the outputs, in the working directory.

output <- getwd()
6. Run the function (it may take a while). For further details on
the arguments, see Note 10. In the following example, the regression
mode of sPLS is used, selecting the 500 most relevant variables of
each omics dataset and ten components. Transcripts, proteins, and
metabolites are depicted in the RStudio Viewer window as gold
rectangles, green circles, and cyan triangles, respectively. All
variables with a presence of less than 20% across the treatments
are discarded.
xmwas_res <- run_xmwas(Xome_data = dataset$transcripts, Yome_data = dataset$proteins,
Zome_data = dataset$metabolites, outloc = output, classlabels = dataset$classlabels,
xmwasmethod = "spls", plsmode = "regression", max_xvar = 5000, max_yvar = 5000,
max_zvar = 5000, rsd.filt.thresh = 1, corthresh = 0.5, keepX = 500, keepY = 500,
keepZ = 500, pairedanalysis = FALSE, optselect = FALSE, rawPthresh = 0.1,
numcomps = 10, net_edge_colors = c("blue","red"),
net_node_colors = c("gold", "green", "cyan"),
net_node_shape = c("rectangle","circle","triangle"), all.missing.thresh = 0.2,
seednum = 100, label.cex = 0.2, vertex.size = 6, graphclustering = TRUE,
interactive = FALSE, max_connections = 10000, centrality_method = "eigenvector",
use.X.reference = FALSE, removeRda = TRUE)
7. Continue the protocol with Cytoscape workflow (Subheading
3.5.1) by opening the gml file created in the output directory
(Multidata_Network_threshold0.59cytoscapeall.gml).
DIABLO
DIABLO stands for Data Integration Analysis for Biomarker discovery
using a Latent component method for Omics studies. DIABLO models
maximize the correlation between pairs of prespecified omics
datasets to unravel similar functional relationships between those
omics data [23]. As a practical approach, DIABLO allows the
selection of relevant, correlated, and discriminatory biomarkers,
using synthetic variables as well as multi-omics datasets [2].
DIABLO builds on Projection to Latent Structure models [19] and
extends both sparse PLS-DA [5] to multi-omics analyses and sparse
generalized canonical correlation analysis [24] to a discriminant
analysis framework. An example of a DIABLO network is shown in
Fig. 5.
[Fig. 5: network of protein, transcript (contig), and metabolite nodes; edge weight legend ranging from -1.00 to 1.00; node legend: Proteins, Transcripts, Metabolites]
Fig. 5 DIABLO network showing interactions between different omic levels. Edge legend represents the weight
of the links between proteins, transcripts, and metabolites. Network representation was drawn employing
Cytoscape

Workflow 1. Install all necessary packages:
install.packages("mixOmics")
install.packages("snow")
install.packages("RColorBrewer")
2. Load all necessary packages:
library(mixOmics)
library(snow)
library(igraph)
3. Import all datasets (in this case, three matrices) individually.
The matrices are named proteins for the protein matrix (Fig. 1a),
metabolites for the metabolite matrix (Fig. 1b), and contigs for
the transcript matrix (Fig. 1c) (see Note 11). In the Session tab
of R, set the working directory to the folder that contains all the
matrices to be used. The csv files are the individual matrices of
each dataset saved from Excel as csv (see Fig. 1 and Supplementary
dataset S1). After import, transpose each matrix so that samples
are in rows.
metabolites <- read.table("metabolites.csv", header = TRUE, row.names = 1, sep = ";")
proteins <- read.table("proteins.csv", header = TRUE, row.names = 1, sep = ";")
contigs <- read.table("contigs.csv", header = TRUE, row.names = 1, sep = ";")
metabolites <- as.data.frame(t(metabolites))
proteins <- as.data.frame(t(proteins))
contigs <- as.data.frame(t(contigs))
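4. Check that all matrices have the same number of samples (rows).
The first number returned by lapply(datasetsdiablo, dim) must be
the same for the three matrices. The matrices were already
transposed in step 3, so they are only converted to matrix format
here, and the list names match those used later in test.keepX.
datasetsdiablo <- list(met = as.matrix(metabolites), prot = as.matrix(proteins),
mrna = as.matrix(contigs))
lapply(datasetsdiablo, dim)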
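Reading the dimensions by eye is easy to get wrong when there are several blocks. A minimal sketch of an automated check is given below, using toy blocks with nine samples each; the object names are illustrative and not part of the original protocol.

```r
# Toy blocks: 9 samples in rows, different numbers of variables in columns
blocks_toy <- list(met  = matrix(0, nrow = 9, ncol = 30),
                   prot = matrix(0, nrow = 9, ncol = 12),
                   mrna = matrix(0, nrow = 9, ncol = 50))

# Number of samples (rows) in each block
n_samples <- vapply(blocks_toy, nrow, integer(1))
n_samples

# TRUE only when every block has the same number of samples
all_equal <- length(unique(n_samples)) == 1
stopifnot(all_equal)
```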
5. Create diablovectorY, a vector with one element per sample
(row). For example, three treatments (C, T1, and T2) with three
replicates each (each = 3) give a diablovectorY of length 9. Then
set ncomp, which is equal to the number of levels (treatments)
minus 1 (e.g., 3 - 1 = 2).
diablovectorY <- rep(c("C","T1","T2"), each = 3)
ncomp <- 2
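The rule for ncomp can also be computed from the treatment vector, so that it stays consistent when the design changes. This is a small sketch with illustrative object names.

```r
# Illustrative design: three treatments with three replicates each
treatments <- c("C", "T1", "T2")
replicates <- 3
diablovectorY_toy <- rep(treatments, each = replicates)

# Number of components = number of treatment levels minus 1
ncomp_toy <- nlevels(factor(diablovectorY_toy)) - 1

length(diablovectorY_toy)
ncomp_toy
```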
6. Prepare the inputs with which DIABLO chooses the optimal
number of variables for each dataset:
(a) Design matrix: The design matrix determines which blocks
(datasets) should be connected to maximize the correlation
or covariance between components. The values may range
from 0 (no correlation) to 1 (correlation to maximize), in
a symmetrical matrix with a diagonal of 0. In the example,
we choose a design where all the blocks are connected with
a link of 1 (see Note 12). Check that design is a matrix of
1 (or the chosen value in each case) with a diagonal of 0.
design <- matrix(1, ncol = length(datasetsdiablo), nrow = length(datasetsdiablo),
dimnames = list(names(datasetsdiablo), names(datasetsdiablo)))
diag(design) <- 0
design
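A design with weaker links between blocks is built in the same way. The sketch below uses an off-diagonal value of 0.1 purely as an illustration; the block names are written out by hand so that the example runs on its own, and the choice of value remains experiment-specific.

```r
# Block names written out by hand for a self-contained example
block_names <- c("met", "prot", "mrna")

# Every pair of blocks linked with the same illustrative weight of 0.1
design_toy <- matrix(0.1, nrow = length(block_names), ncol = length(block_names),
                     dimnames = list(block_names, block_names))
diag(design_toy) <- 0
design_toy

# The two properties requested in the text: symmetric, with a diagonal of 0
isSymmetric(design_toy)
all(diag(design_toy) == 0)
```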
(b) List.keepX: The tuning function is used to set the keepX
parameters of the block.splsda function. We choose the
optimal number of variables to select in each dataset using
the tune function over a grid of keepX values (see Note 13).
First, set the range of values to test for each dataset
with test.keepX (e.g., we have set
5,6,7,8,9,10,12,14,16,18,20,25,30,35,40,45,50).
Metabolites are usually highly correlated compared with
the other matrices. If necessary, expand the number of
variables that can be selected from this dataset (e.g., to
100 metabolites (see Note 14)).
test.keepX <- list("met" = c(5:9, seq(10, 18, 2), seq(20, 50, 5)),
"prot" = c(5:9, seq(10, 18, 2), seq(20, 50, 5)),
"mrna" = c(5:9, seq(10, 18, 2), seq(20, 50, 5)))
optimal.variables <- tune.block.splsda(X = datasetsdiablo,
Y = diablovectorY, ncomp = 2, test.keepX = test.keepX,
design = design, nrepeat = 1, cpus = 1, folds = 2)
This process is long (more than 3000 models are fitted for
each component and each nrepeat), and a computer with more than
one cpu is recommended. If you have a computer with six cpus,
change cpus = 1 to cpus = 6.
list.keepX <- optimal.variables$choice.keepX
list.keepX
You get a list with the number of variables selected for each
dataset and component (e.g., $met [1] 50 [2] 10 $prot [1] 8 [2] 10
$mrna [1] 10 [2] 12). This means that the optimal numbers of
variables are 50 metabolites in component [1] and 10 metabolites
in component [2]; 8 proteins in component [1] and 10 proteins in
component [2]; and 10 transcripts in component [1] and 12
transcripts in component [2]. Alternatively, you can input those
parameters manually (see Note 15).
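The size of the tuning grid explains why this step is slow. The sketch below counts the keepX combinations for the grid given in the text. It only counts combinations per component; the total number of model fits also depends on the number of folds and repeats.

```r
# Grid from the text: 5 to 9, then 10 to 18 in steps of 2, then 20 to 50 in steps of 5
grid <- c(5:9, seq(10, 18, 2), seq(20, 50, 5))
test.keepX_toy <- list(met = grid, prot = grid, mrna = grid)

# Number of candidate values per dataset
length(grid)

# Number of keepX combinations across the three datasets
n_combinations <- prod(lengths(test.keepX_toy))
n_combinations
```

With 17 candidate values per dataset, there are 4913 combinations, consistent with the figure of more than 3000 models quoted above. Shortening the grid for one dataset reduces the count proportionally.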
7. Run DIABLO with the variable selection made previously
(datasetsdiablo, diablovectorY, list.keepX, and design).
DIABLOanalysis <- block.splsda(X = datasetsdiablo, Y = diablovectorY,
ncomp = 2, keepX = list.keepX, design = design)
8. Once DIABLO has finished, make the corresponding network
as follows. The cutoff is the parameter that restricts the degree
of correlation (correlations range from -1 to 1). With a cutoff of
0.8, only interactions that exceed this threshold are kept
(weaker interactions are eliminated). Adapt the cutoff
to each experiment (more than 0.7 is an acceptable
correlation).
net <- network(DIABLOanalysis, cutoff = 0.8, blocks = c(1,2,3),
row.names = FALSE, col.names = FALSE,
color.node = c("pink","gold","blue"), shape.node = c("circle","rectangle","circle"))
9. Export the pdf of the network as follows:
pdf("DIABLOanalysis_network_0.8.pdf", 10, 10)
net <- network(DIABLOanalysis, cutoff = 0.8, blocks = c(1,2,3),
row.names = FALSE, col.names = FALSE,
color.node = c("pink","gold","blue"), shape.node = c("circle","rectangle","circle"))
dev.off()
10. Export the network in a format suitable for opening with
Cytoscape.
write.graph(net$gR, file = "DiabloNet(0.8).gml", format = "gml")
11. Continue the protocol with Cytoscape workflow (Subheading
3.5.1).
3.3 Biological Interaction Network Enrichment
Functional interaction networks are very useful for the full
understanding of biological phenomena [25]. For this purpose,
different databases and programs help to compile and integrate
all protein–protein interactions, including both direct (physical)
and indirect (functional) relationships. In this section we
develop two workflows using two of the most widely used databases
(STRING and ShinyGO) to obtain networks for species that are
included in these databases and for species that are not. The
initial input data can be the complete protein dataset or the
differentially expressed proteins obtained in Subheading 3.1.
3.3.1 STRING
STRING is a Search Tool for Recurring Instances of Neighboring
Genes. The STRING [9, 25–27] database compiles all public
sources of information on protein–protein interactions. It is an
online tool available at https://string-db.org/ that allows the
creation of functional networks from protein identifiers or
sequences. The latest version of STRING at the time of writing
(11.0) covers 5090 organisms, of which 56 are plants (Embryophyta),
including some tree species such as Populus, Eucalyptus, Prunus,
Citrus, or Malus. An example of the networks produced by STRING is
shown in Fig. 6.
Fig. 6 STRING-based interaction network represented in the webtool (a) and in Cytoscape (b). This network
employed Vitis vinifera as reference organism and proteins sequences (Supplementary S2). Edge represents
confidence in STRING web network, and the score in Cytoscape-represented network
Workflow 1. Go to the STRING website: https://string-db.org/
2. Enter your proteins (upload a file or paste them in the tab).
The STRING website offers many input options (including one for
protein families only); for multiple proteins, use:
(a) Option Multiple proteins for a list of names/identifiers
(Fig. 7a).
(b) Option Multiple sequences for a fasta (or .txt) file of
protein sequences (fewer than 2000 sequences). If the file
is larger, split it into several files. For species not
included among the organisms available in STRING, we
recommend the use of sequences (Fig. 7b, Supplementary S2:
Sequences in txt).
3. Select the Organism. In the case of a plant species not included
in the STRING databases, write plants in the Organism tab and
select Embryophyta.
4. STRING lists all the plant species included in its databases,
ordered from the highest to the lowest number of proteins that
match your sequences/identifiers/protein names (nr of proteins
matching in STRING).
5. Select the species with the highest nr value (the number of
proteins in that species with BLAST bit scores higher than 60)
(see Note 16).
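Step 2(b) asks for large sequence files to be split so that each part holds fewer than 2000 sequences. A minimal sketch of such a split in base R follows. The helper split_fasta, the toy sequences, and the output file names are illustrative assumptions, and the demonstration uses two sequences per file so that the result is easy to inspect.

```r
# Hypothetical helper: split FASTA lines into chunks of at most max_seqs sequences
split_fasta <- function(lines, max_seqs = 1999) {
  seq_id <- cumsum(startsWith(lines, ">"))   # running sequence number per line
  keep <- seq_id > 0                         # drop any text before the first header
  chunk <- (seq_id[keep] - 1) %/% max_seqs + 1
  split(lines[keep], chunk)
}

# Toy FASTA content with five short, invented sequences
fasta <- c(">protA", "MAKV", "PPKH",
           ">protB", "MWFS",
           ">protC", "MFLV", "DWFY",
           ">protD", "MSNS",
           ">protE", "MAIT")

chunks <- split_fasta(fasta, max_seqs = 2)
length(chunks)

# Write each chunk to its own text file in a temporary folder
out_files <- file.path(tempdir(), sprintf("sequences_part%02d.txt", seq_along(chunks)))
for (i in seq_along(chunks)) writeLines(chunks[[i]], out_files[i])
out_files
```

For a real file, the lines would come from readLines on the sequence file, and the default max_seqs of 1999 keeps each part under the stated limit.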
[Fig. 7: panel (a) shows a list of protein identifiers (e.g., E0CQ31, D7SKD8, D7THC1); panel (b) shows an extract of a fasta file with entries such as >protein000006.2 and >protein000016.1]
Fig. 7 Dataset Input. (a) Multiple proteins in gene names. (b) Extract of a fasta file with sequences of proteins
6. STRING returns a list of the proteins matched by each query; a
query protein may have more than one match. Review all
annotations according to the identity and bitscore reported by
STRING, and choose the best match or the one that agrees with
your identification. By default, STRING selects the match with
the highest bitscore.
7. Once the protein identifications in the new organism database
have been selected, download the mapping file (option MAPPING),
which records the STRING identifier in the organism database that
corresponds to each protein (Supplementary material S3). It will
be used later in Cytoscape.
8. There are two ways to view the network: continue in the web
tool, or switch to Cytoscape and use the STRING app available
in it.
(a) STRING web tool:
• Click the Continue button; STRING builds an interaction
network for your proteins on the web.
• Improve the visualization of your network using the tabs
below the network. Some examples follow; for more, see [26]:
– Meaning of the network edges: There are three display
options (evidence, confidence, or molecular action), with
evidence as the default. Choose according to the kind of
interaction evidence you want to display. For an
explanation of these options, see Note 17.
– Minimum required interaction score: Define a threshold for
trusting biological interactions (STRING assigns 0.4 by
default). With the Custom value option, you can raise it
above 0.9 (highest confidence). Try to find a balance
between biological significance and network complexity
(see Note 18).
– Max number of interactors to show: This option adds
intermediate interactors that are not present in your
dataset but interact with your proteins and link them
together (see Note 19).
– Display simplifications: Activate “hide disconnected nodes
in the network” and “disable structure previews inside
network bubbles”. This removes disconnected nodes and the
structure images shown inside each protein node.
• Export your network from the Export tab (bitmap, vector
graphic, network coordinates, etc.), together with all
documents generated or used (protein sequences, XML summary,
simple tabular text output, etc.).
(b) STRING Cytoscape App:
• Install the STRING app using Cytoscape’s App Manager.
• File > Import > Network > from Public Databases.
– Data Source: STRING: Protein query.
– Species: Select the organism previously selected in the
STRING web tool (the species with the highest number of nr
and for which the MAPPING file was downloaded).
– Enter the protein names or identifiers (the column
“preferredName” of the STRING mapping file, Supplementary
material S3, downloaded in step 7 of this workflow).
– Leave the default values for Confidence (score) cutoff
(0.4) and Maximum number of interactors (0).
– Select Layout: Apply preferred layout to arrange the
network.
• STRING creates the default network. The edge attributes
include the overall confidence score (see Note 17). The
network obtained will depend on the species studied. For an
example of how to proceed depending on the network obtained,
see Note 20.
• See the Cytoscape workflow (Subheading 3.5.1) for ways to
improve the network visualization.
Fig. 8 ShinyGO networks using the website application. (a) Protein network with enrichment of GO Biological
Process (all-protein dataset) and (b) protein response-to-stress network with enrichment of a selected GO
category (GO category: response to stress; only the proteins in the category were entered)
3.3.2 ShinyGO
ShinyGO [28] is an intuitive graphical web platform covering more
than 200 plant and animal species annotated from GO and other
databases. It provides graphical visualization of enrichment
results and access, through an application programming interface
(API), to STRING for building protein–protein interaction networks
[28]. An example of ShinyGO networks is shown in Fig. 8.
Workflow
1. Go to the ShinyGO website: http://bioinformatics.sdstate.edu/go/.
2. Select the organism. Default = Best matching species.
Alternatively, use the Select or search for your species option.
3. Enter your proteins. In the Paste genes box, provide your
protein list as the list of genes that encode those proteins
(ACC1, LHY, HSA32, FTSH6, CHS, etc.):
(a) All-protein dataset (list of the genes that encode all
proteins).
(b) Differentially expressed proteins (e.g., lists of the genes
that encode the upregulated or downregulated proteins).
(c) Proteins selected from a GO biological process or KEGG
pathway (in the Groups tab you can see the proteins included in
these classifications once they have been uploaded to ShinyGO).
4. Select the category you wish to analyze (GO Biological Process,
KEGG pathways, GO Molecular Function, or others). You can change
this option at any time in the panel on the left, and the network
shown in the panel on the right will change accordingly.
5. Set a P-value cutoff (FDR). Default = 0.05 (see Note 21).
6. Set the number of most significant terms to show. Default = 30
(see Note 22).
7. Click Submit.
8. Click on the Network tab to visualize the enriched terms as a
network. Click on Change layout to change the network view. You
can also change the organism, since ShinyGO lists all Matched
Species (genes); for example, you may choose a species with fewer
matched genes that is phylogenetically closer. Repeat the steps
with the new species.
9. Click on the download button to save your interaction networks
to a local file (see Note 23).
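The cutoffs in steps 5 and 6 can be illustrated with a short base-R sketch. It assumes, for illustration only, that each term is tested with a one-sided hypergeometric test and that the p-values are adjusted with the Benjamini-Hochberg procedure; the gene counts and term sizes are invented and are not ShinyGO output.

```r
# All counts below are invented for illustration.
background_size <- 20000   # genes in the background (assumed)
query_size      <- 200     # genes in the submitted list (assumed)

terms <- data.frame(
  term       = c("response to stress", "photosynthesis", "translation"),
  in_term    = c(1500, 300, 800),  # background genes annotated to the term
  in_overlap = c(40, 12, 9),       # submitted genes annotated to the term
  stringsAsFactors = FALSE
)

# One-sided hypergeometric test: P(X >= in_overlap)
terms$p_value <- phyper(
  q = terms$in_overlap - 1,
  m = terms$in_term,
  n = background_size - terms$in_term,
  k = query_size,
  lower.tail = FALSE
)

# Benjamini-Hochberg adjustment of the p-values (FDR)
terms$fdr <- p.adjust(terms$p_value, method = "BH")

fdr_cutoff <- 0.05  # default cutoff from step 5
n_top      <- 30    # default number of terms from step 6

# Keep the terms below the cutoff, most significant first, at most n_top
significant <- terms[terms$fdr <= fdr_cutoff, ]
significant <- head(significant[order(significant$fdr), ], n_top)
print(significant)
```

In this toy input, the term whose overlap is close to what chance would give does not pass the cutoff, whereas the two clearly over-represented terms do.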
3.4 Merged Functional and Statistical Interaction Networks
Once the experimental (statistical) and biologically based networks
have been depicted separately, they can be merged to gain a deeper
understanding of the system. With this approach, already known
biological information can be visualized in a single graph and
expanded with our experimental results. This depiction can also be
used to check the reliability of the experimental network. The
networks can be merged either with the Cytoscape built-in function
or manually. An example of a merged network is shown in Fig. 9.
Create a STRING Network for Merged Networks
1. Go to the STRING web page (version 11.0) and select “Proteins
with Values/Ranks”.
2. Enter a list containing the experimental proteins and one
numeric value for each, such as the fold change calculated in
Subheading 3.1.1. The protein identifiers in the matrix must be
compatible with the STRING database (see Subheading 3.3.1).
3. Select the organism you are working with.
4. Click Search.
5. Download the mapping document and continue.
6. Download the depicted network.
Fig. 9 Merged Network combining two different mathematical methods (sPLS, ARTIVA) and STRING network.
sPLS edges are shown in blue, ARTIVA in red, and STRING in green. Network representation was drawn
employing Cytoscape
7. Network visualization can be improved with Cytoscape or Gephi
(see the specific workflows in Subheading 3.5).
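Before a STRING network is combined with a statistical network, its node names usually have to be translated back to the identifiers used in the experimental dataset. The base-R sketch below shows one way to do this with the mapping document from step 5. Only the “preferredName” column is named in this chapter; the column name queryItem, the protein names, and the weights are assumptions made for illustration.

```r
# Hypothetical mapping table; in practice it would be read from the
# downloaded mapping document (e.g., with read.delim()). "queryItem" is
# an assumed name for the column that holds your own identifiers.
mapping <- data.frame(
  queryItem     = c("Contig_00109_4_9", "Contig_03989_5_4", "Contig_04066_5_5"),
  preferredName = c("PIP2", "HSP70", "RBCL"),
  stringsAsFactors = FALSE
)

# Hypothetical STRING edge table that uses STRING preferred names
string_edges <- data.frame(
  Source = c("HSP70", "RBCL"),
  Target = c("PIP2", "PIP2"),
  Weight = c(0.74, 0.72),
  stringsAsFactors = FALSE
)

# Lookup vector: STRING preferred name -> own identifier
to_own_id <- setNames(mapping$queryItem, mapping$preferredName)

# Stop early if any STRING node is missing from the mapping
string_nodes <- unique(c(string_edges$Source, string_edges$Target))
unmapped <- setdiff(string_nodes, names(to_own_id))
if (length(unmapped) > 0) {
  stop("Unmapped nodes: ", paste(unmapped, collapse = ", "))
}

# Rename both ends of every edge
string_edges$Source <- unname(to_own_id[string_edges$Source])
string_edges$Target <- unname(to_own_id[string_edges$Target])
print(string_edges)
```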
3.4.1 Cytoscape Merged Functional and Statistical Interaction Network
Workflow
1. Open the already built networks in the same Cytoscape session.
Example: STRING network and sPLS network.
2. Cytoscape > Tools > Merge > Networks and select the networks to
combine.
3. Choose between union, intersection, and difference.
4. Mark Enable merge nodes/edges in the same network.
5. Select the matching columns.
6. See the Cytoscape workflow (Subheading 3.5.1) to improve the
network visualization.
3.4.2 Manually Merged Functional and Statistical Interaction Network
Another, more flexible option is to paste both network tables
together, simply concatenating the data. The only requirement for
the subsequent visualization to work is that node names are the
same in both networks. In this way, we will be able to create a
consensus network from different statistical analyses, such as
those mentioned above, or from complementary annotation tools. This
network table is compatible with both visualization tools described
below. An example of the resulting network is shown in Fig. 9.
Workflow
1. Check that all nodes have coherent identifiers across the
different networks.
2. Paste the network tables into the spreadsheet, keeping track of
the source of each row. Please note that if directed (sPLS) and
undirected (STRING) edges are mixed, the whole resulting network
should be treated as undirected.
3. Add a column “Method” to record the provenance of each relation
(sPLS, STRING, ARTIVA, ...).
4. Save the table (Table 1) in xlsx format to open it with a
network visualization tool (Subheading 3.5.1).
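The manual merge can also be scripted. The following base-R sketch follows the four workflow steps under stated assumptions: the three edge tables, their node names, and their weights are invented, and the result is written as .csv (rather than .xlsx) because base R has no xlsx writer.

```r
# Hypothetical edge tables standing in for the outputs of the sPLS, ARTIVA,
# and STRING workflows. Node names and weights are invented.
networks <- list(
  sPLS = data.frame(Source = "P1", Target = "P2", Weight = 0.71,
                    Relation = "Positive", stringsAsFactors = FALSE),
  ARTIVA = data.frame(Source = c("P3", "P2"), Target = c("P2", "P4"),
                      Weight = c(0.76, 0.71),
                      Relation = c("Positive", "Negative"),
                      stringsAsFactors = FALSE),
  STRING = data.frame(Source = "P1", Target = "P4", Weight = 0.74,
                      Relation = "Positive", stringsAsFactors = FALSE)
)

# Step 1: list nodes that occur in a single network only; these are the
# first candidates to check for inconsistent identifiers.
node_sets <- lapply(networks, function(tab) unique(c(tab$Source, tab$Target)))
occurrence <- table(unlist(node_sets))
single_network_nodes <- names(occurrence)[as.vector(occurrence) == 1]
print(single_network_nodes)

# Steps 2 and 3: tag each table with its method, then concatenate.
tagged <- Map(function(tab, method) {
  tab$Method <- method
  tab
}, networks, names(networks))
merged <- do.call(rbind, unname(tagged))
rownames(merged) <- NULL
print(merged)

# Step 4: save the merged table in a temporary directory.
out_file <- file.path(tempdir(), "merged_network.csv")
write.csv(merged, out_file, row.names = FALSE)
```

A node that appears in only one network is not necessarily an error, but it is worth confirming that it is not the same protein under a different identifier.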
3.5 Network Visualization Tools
Two workflows are presented for the two most widely used, freely
accessible platforms for network visualization: Cytoscape and
Gephi. Networks can be loaded into these programs in a variety of
formats; the two most common are a table (.csv, .xlsx) and a gml
file.
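As a rough illustration of the two input formats, the base-R sketch below writes the same small edge table as a .csv file and as a minimal gml file. The helper write_gml is hypothetical (it is not part of any package), the edge data are invented, and only the basic GML keys (node id and label, edge source and target, plus a weight attribute) are produced.

```r
# Hypothetical helper: write an edge table as a minimal undirected GML file.
write_gml <- function(edges, path) {
  nodes <- unique(c(edges$Source, edges$Target))
  ids <- setNames(seq_along(nodes) - 1L, nodes)  # integer node ids from 0
  node_lines <- sprintf('  node [ id %d label "%s" ]', ids, nodes)
  edge_lines <- sprintf('  edge [ source %d target %d weight %s ]',
                        ids[edges$Source], ids[edges$Target], edges$Weight)
  writeLines(c("graph [", "  directed 0", node_lines, edge_lines, "]"), path)
  invisible(path)
}

# Invented example network
edges <- data.frame(
  Source = c("P1", "P2"),
  Target = c("P2", "P3"),
  Weight = c(0.71, -0.55),
  stringsAsFactors = FALSE
)

csv_path <- file.path(tempdir(), "network.csv")
gml_path <- file.path(tempdir(), "network.gml")

write.csv(edges, csv_path, row.names = FALSE)  # table format
write_gml(edges, gml_path)                     # gml format

gml_lines <- readLines(gml_path)
cat(gml_lines, sep = "\n")
```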
Table 1
Example of input table combining two different mathematical methods (sPLS, ARTIVA) and STRING
network

Source            Target            Weight/CoeffMean   Relation  Method
Contig_16059_6_5  Contig_00060_5_2  0.705818839393222  Positive  sPLS
Contig_03989_5_4  Contig_00109_4_9  0.74432961834302   Positive  STRING
Contig_04066_5_5  Contig_00109_4_9  0.72448590539238   Negative  STRING
Contig_04940_5_3  Contig_00109_4_9  0.757539908309518  Positive  ARTIVA
Contig_05478_5_9  Contig_00109_4_9  0.712740724773293  Negative  ARTIVA
Contig_06439_5_1  Contig_00109_4_9  0.725660528940782  Positive  ARTIVA
3.5.1 Cytoscape
Cytoscape is an open source software project for integrating
biomolecular interaction networks with high-throughput expression
data and other molecular states into a unified conceptual framework
[29]. Some examples of resulting networks are shown in Figs. 2, 3,
4, 5, 6b, and 9.
Workflow
1. Go to the Cytoscape website (https://cytoscape.org/) and
download the program (v. 3.7.1 or later).
2. Open Cytoscape.
3. Import a network from a file, for example one of those generated
in previous sections (Subheadings 3.2.1, 3.2.2.1, 3.2.2.2,
3.2.2.3, 3.3.1 or 3.4.2):
File > Import > Network from file > Select the desired file or
open the desired network from file (see Note 24).
4. Arrange the network by applying a layout. In the Layout menu,
select Apply preferred layout to switch the network to a prefuse
force directed layout (width is used by default). This layout is a
good starting point for visualizing your data, but Cytoscape
offers other layouts that may fit your data better (grid,
hierarchical, circular, ...), as well as prefuse force directed
layouts based on other attributes such as width, weight, or
labelcex. This option produces the network display.
5. Some optional improvements can aid the comprehension and
visualization of the network:
(a) Analyze the network graphically (the network contains only
undirected edges):
Tools > NetworkAnalyzer > Network Analysis > Analyze network.
• In the Results Panel, click Visualize Parameters to open the
window for changing them.
• Map Node Size and Map Node Color: For continuous mappings
such as size, a variable such as Radiality can be selected. Click
on Map Node Size/Color and select the attribute Radiality. Node
size (or color) will then reflect the relevance of each node in
the network (the more connections, the larger the size or the
stronger the color). For Map Edge Size or Color, select the
attribute Weight. Edges will then be drawn according to their
interaction weight in the network (from −1 to 1; by default
Cytoscape uses red for negative values and blue for positive
values).
• Apply and close the tab.
(b) Apply the desired style to nodes (color, shape, width, label):
in the Control Panel, select the Style tab and then the Node tab
at the bottom.
• Change the names of your nodes: To make the network easier
to understand, it is advisable to replace the node names with the
names of the protein identifications. In the Label property,
change the selected column (by default the “name” column) to the
column in your table that holds the ID. If your table does not
yet have it, you can add the column as follows:
– Export the names of your nodes to include them in a new
document.
File > Export > Table to file > Select “default node” in .csv
format.
– Create a new document with two columns (Supplementary S4):
the first column with the node names exported previously (they
must match the Cytoscape table) and the second column with the
corresponding identification names (e.g., name: contig542;
identification: Aquaporin PIP2). Save the document in .csv or
.xlsx format.
– Import the table into your Cytoscape network:
File > Import > Table from file > Select the new two-column
document.
– In Label (Control Panel, Style, Node tab), select the column
with the identification.
(c) Apply the desired style to edges (color, arrow, width): in the
Control Panel, Style tab, select the Edge tab.
• Edge legend: “Stroke Color” shows the range of colors
assigned to the edges, together with the maximum and minimum
values (weight). Use it to create a legend for your network.
(d) Increase the cutoff to reduce the size of your network and
keep only the strongest connections:
• In the Control Panel, Select tab, click + and select “Column
filter”.
• Select “Edge: weight”. The range of your data is now shown,
and you can set a narrower one. Example:
Select Edge: weight, “is not”, and between −0.97 and 0.97, then
click Apply. This selects the edges outside that range. Then:
File > New network > From selected nodes, selected edges.
• The new network contains only the edges outside that range
and their nodes (all connections with weights between −0.97 and
0.97 have been removed), and it appears in the Control Panel
under the Network tab.
(e) All nodes can be moved and placed in the most understandable
position. Groups of interconnected nodes can also be selected to
be moved together or extracted from the network. For further
details on network depiction and customization, see Note 25.
6. Export the network as a pdf: File > Export > Network to image >
Export file format: pdf.
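The weight cutoff of step 5(d) can also be applied to the edge table before it is imported. The base-R sketch below is one possible way to do so; the edge table is invented, and the column name weight is assumed to match the edge column used in the workflow.

```r
# Invented edge table; "weight" mimics the edge column used in step 5(d).
edges <- data.frame(
  Source = c("contig1", "contig2", "contig3", "contig4"),
  Target = c("contig2", "contig3", "contig4", "contig1"),
  weight = c(0.99, -0.98, 0.50, -0.20),
  stringsAsFactors = FALSE
)

cutoff <- 0.97

# Keep only the edges outside the interval [-cutoff, cutoff]
strong <- edges[abs(edges$weight) > cutoff, ]

# Number of connections left per node after filtering
degree <- table(c(strong$Source, strong$Target))

print(strong)
print(degree)
```

Nodes that lose all of their edges (contig4 in this toy table) drop out of the filtered network, which mirrors the effect of creating a new network from the selected edges.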
3.5.2 Gephi
Gephi is open source software for visualizing, manipulating, and
exploring all kinds of graphs and networks for all types of data (a
general-purpose platform), whereas Cytoscape is commonly used in
biology and offers domain-specific apps and plugins. Gephi has some
advantages over Cytoscape, such as good presets and an integrated
statistical analysis module. Furthermore, Gephi is recommended for
visualizing large networks (up to 100,000 nodes). An example of a
network created with Gephi is shown in Fig. 10.
Workflow
1. Go to the Gephi website (https://gephi.org/) and download the
program (v. 0.9.2 or later).
2. Open Gephi.
3. Select New project.
4. Open your network:
(a) From a spreadsheet (e.g., from sPLS analysis): File > Import
spreadsheet > Select the desired file or open the desired network
from file.
(b) From a .gml file: File > Open > Select the desired file.
5. Follow the program instructions and import your table as an
Edges table. The imported data can be found in Window > Context >
Data Table, where you can edit your input data and add new node or
edge information such as annotations or labels.
6. Create the first view of your raw network with Window > Graph.
7. Arrange the network with Window > Layout > Force Atlas 2 > Run.
This is one of the most widely used layouts and a good starting
point for understanding the specific topology of your network.
8. Return to the Graph window and, once the network has spread
out, go to the Layout panel and stop the Force Atlas 2 algorithm.
To configure the spread rate of the layout, see Note 26.
9. Graphical analysis can be done with Window > Statistics.
10. Apply the desired style to the network in Window > Appearance.
Fig. 10 Protein–protein interaction network depicted using Gephi. Node colors indicate the different subnet-
work groups based on the cluster betweenness graphical analysis. Arrows between nodes and labels are
colored as the parental node
(a) Node color: Change the color of the nodes in the Appearance >
Nodes menu. Click on the palette icon and choose “Unique” to give
all nodes the same color, “Partition” to color clusters of nodes
(according to, for example, a statistical network parameter such
as degree or clustering coefficient, selected in the Choose an
attribute drop-down), or “Ranking” for a continuous color scale.
Click Apply to save the changes and see the result.
(b) Node size: Change the node size with the next icon to the
right; the options are a unique value for all nodes or a Ranking,
which gives a continuous size scale. Click Apply to save the
changes and see the result.
(c) Edge color: Proceed as in step 10(a) “Node color”, but in the
Appearance > Edges menu.
11. Customize other visual parameters in the preview menu:
Window > Preview settings.
(a) Add node names to the plot by selecting Show Labels in the
Node Labels tab, and change the label color (e.g., to the color
of the node by choosing “parent” in the Node Labels > Color
drop-down). Please note that the labels added are those in the
“Label” column of the data table. It may therefore be necessary
to add them in the Data Table tab.
(b) Change the size of the edges or edge arrows.
(c) Click Refresh to update the graph and preview it.
12. For further details, see Note 27.
13. Export the network as a pdf file: File > Export > SVG/PDF/PNG
file.
3.6 Future Perspectives
The evolution of next-generation sequencing and high-throughput
technologies has created new possibilities and opportunities in
data analysis. Although recent advances in regulatory networks have
focused on transcriptomics (RNA-seq, microarrays), most of them can
be extended to proteomics research in a straightforward way as long
as one assumption is satisfied: regulator abundance must influence
target abundance. The study of gene regulatory networks will allow
the proteomics community to anticipate future approaches in protein
network construction.
During the last few years, a large number of algorithms have been
implemented and, as this chapter has shown, each method is biased
toward certain types of biological interaction. To overcome this
problem, a “wisdom of crowds” approach is needed. This concept
refers to the phenomenon in which the collective knowledge of a
community is greater than the knowledge of any individual; in other
words, aggregating several networks into a metanetwork
significantly improves network accuracy [30]. Establishing a
metanetwork community approach and user-friendly software like
Seidr [31] should be a priority in the near future of protein
network construction.
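As a toy illustration of network aggregation, the base-R sketch below combines edge scores from three inference methods by averaging their within-method ranks. Averaging ranks is only one simple aggregation rule, chosen here as an assumption for illustration; it is not presented as the procedure implemented in Seidr, and the method names and scores are invented.

```r
# Invented edge scores from three inference methods (higher = stronger).
scores <- data.frame(
  edge    = c("P1-P2", "P1-P3", "P2-P3", "P2-P4"),
  methodA = c(0.90, 0.20, 0.60, 0.10),
  methodB = c(0.80, 0.70, 0.30, 0.20),
  methodC = c(0.95, 0.10, 0.50, 0.40),
  stringsAsFactors = FALSE
)
method_cols <- c("methodA", "methodB", "methodC")

# Rank the edges within each method (1 = strongest edge)
ranks <- sapply(scores[method_cols],
                function(s) rank(-s, ties.method = "average"))

# Consensus score: mean rank across methods (lower = more support)
scores$mean_rank <- rowMeans(ranks)
consensus <- scores[order(scores$mean_rank), c("edge", "mean_rank")]
print(consensus)
```

Working with ranks rather than raw scores avoids having to put the outputs of different methods on a common scale.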
Another interesting topic is the comparison of networks between
different groups/conditions. Differential network analysis can
elucidate the different roles proteins play and constitutes a new
way to acquire a deeper knowledge of protein function. Briefly, it
searches for group-specific connections to identify differentially
connected modules, which may reflect the key pathways or protein
complexes involved. This method is particularly suitable for cases
in which variations are caused by rewiring in the network [32].
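The idea of group-specific connections can be sketched in base R by comparing correlation matrices between two conditions. This correlation-difference approach is a simplifying assumption made for illustration and is not necessarily the method of [32]; the abundance values, protein names, and the 0.5 threshold are invented.

```r
# Invented abundance data for three proteins in two conditions (8 samples each).
x      <- 1:8
wobble <- 0.5 * c(1, -1, 1, -1, 1, -1, 1, -1)
p3     <- c(1, -1, -1, 1, 1, -1, -1, 1)

group_a <- cbind(P1 = x, P2 = x + wobble,      P3 = p3)  # P1 and P2 rise together
group_b <- cbind(P1 = x, P2 = rev(x) + wobble, P3 = p3)  # P2 falls as P1 rises

# Pairwise correlations within each condition
cor_a <- cor(group_a)
cor_b <- cor(group_b)

# Change in correlation between conditions
delta <- cor_a - cor_b

# Report each protein pair once (upper triangle) if the change is large
threshold <- 0.5
idx <- which(upper.tri(delta) & abs(delta) > threshold, arr.ind = TRUE)
rewired <- data.frame(
  Source = rownames(delta)[idx[, "row"]],
  Target = colnames(delta)[idx[, "col"]],
  delta  = delta[idx],
  stringsAsFactors = FALSE
)
print(rewired)
```

Only the P1-P2 pair is reported here, because its correlation changes sign between the two conditions while the other pairs are unchanged.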
Understanding the advantages and limitations of current and
future network construction methods is critical to address one of
the most challenging tasks in molecular and computational biology,
which is genome-scale inference of protein interaction networks
from exclusively protein abundance datasets.
4 Notes
1. If you have problems installing the packages from the console,
download the packages from BiocManager
(https://cran.r-project.org/web/packages/BiocManager/index.html)
and edgeR
(https://bioconductor.org/packages/release/bioc/html/edgeR.html).
2. For more examples see
https://www.rdocumentation.org/packages/edgeR/versions/3.10.5/topics/DGEList-class
3. Adapt to more treatments following this example. For four
treatments (C, treatment T1, treatment T2, and treatment T3):
design <- matrix(c(c(1,1,1,0,0,0,0,0,0,0,0,0),
c(0,0,0,1,1,1,0,0,0,0,0,0),
c(0,0,0,0,0,0,1,1,1,0,0,0),
c(0,0,0,0,0,0,0,0,0,1,1,1)),
ncol=4,
dimnames=list(c('C.1','C.2','C.3','T1.1','T1.2','T1.3',
'T2.1','T2.2','T2.3','T3.1','T3.2','T3.3'),
c('C.','T1.','T2.','T3.')))
4. Another R function designed for this purpose:
# The protein matrix (Fig. 1A) must have been copied to the clipboard beforehand
ART_data <- read.delim("clipboard", header=TRUE, row.names=1)
5. This is important to maintain sensitivity and the detection of
true positives. For further details see Lèbre et al. [17]. It is
advisable to select regulators by manually reviewing the protein
list before running the workflow.
6. Another possibility is to run ARTIVA with Regulators_1 vs
Regulators_1 to infer dynamic interactions between regulators:
all_DBNsub <- list()
for (i in seq_along(Regulators_1)) {
Targets <- Regulators_1[i] # one regulator
Regulators <- Regulators_1[-i] # the rest of the regulators
DBNsub <- ARTIVAsubnet(
targetData = as.vector(as.matrix(ART_data[Targets,])),
parentData = as.matrix(ART_data[Regulators,]),
targetName = rownames(ART_data[Targets,]),
parentNames = rownames(ART_data[Regulators,]),
niter = 50000,
dataDescription = rep(1:6, each=3),
nbCPinit = 1, maxCP = 5, segMinLength = 1,
savePictures = FALSE, saveIterations = FALSE, saveEstimations = FALSE,
dyn = 0, edgesThreshold = 0.6)
# Save the network information in the list
all_DBNsub[[i]] <- DBNsub$network
}
# Bind the rows of all data in the list
all_DBNsub_df <- Reduce(rbind, all_DBNsub)
# Filter the output
all_DBNsub_df <- all_DBNsub_df[which(all_DBNsub_df$PostProb > 0.6),]
7. Output format: Parent = source of the interaction; Target =
destination of the interaction; CPstart = time point at which the
interaction begins; CPend = time point at which the interaction
ends; CoeffMean = sign and strength of the interaction.
8. The parameters entered are: niter = number of iterations in the
Reversible-Jump Markov chain Monte Carlo (RJ-MCMC) sampling, which
is needed to approximate each time-varying network;
dataDescription = time point of each measurement, which encodes the
number of replicates for each sampling point; nbCPinit = initial
number of changepoints to be considered; maxCP = maximum number of
changepoints to be considered; segMinLength = minimum number of
time points in a segment between changepoints; savePictures,
saveEstimations, saveIterations = algorithm execution reports;
dyn = time delay used to search for relationships between the
abundance of targets and regulators. If dyn = 0, the algorithm
searches for relationships between the abundance of targets and
regulators at the same time point (t). If dyn = 1, the algorithm
searches for relationships between the abundance of targets at (t)
and the abundance of regulators at (t-1).
9. Use the argument ind.names = TRUE to see the name of each
replicate in the plot. Example:
variate",ind.names=TRUE,Y.label="comp 2", X.label="comp 1")
10. For further details of the arguments please see
https://github.com/kuppal2/xMWAS/blob/master/example_manual_tutorial/xMWAS-manual.pdf.
11. For further details see mixOmics [33].
12. A compromise must be reached between maximizing the correlation
between blocks and discriminating the outcome; to this end, the
between-block weights in the design matrix can be set to values
<1. We recommend decreasing the value gradually (from 0.9 to 0.1)
and choosing the highest value that yields an acceptable number of
variables selected by DIABLO (from test.keepX) for each dataset.
13. Note that the function has been set to favor a relatively small
signature while still yielding a sufficient number of variables
for downstream validation/interpretation. See Singh et al. [34]
for further information.
14. An example of manual entry of parameters:
test.keepX <- list("met" = c(5:9, seq(10, 18, 2), seq(20, 50, 5)),
"prot" = c(5:9, seq(10, 18, 2), seq(20, 50, 5)),
"mrna" = c(5:9, seq(10, 18, 2), seq(20, 50, 5)))
15. To include a list.keepX for our example (3 datasets with
2 components):
list.keepX <- list(met = c(50,10), prot = c(8,10), mrna = c(10,12))
16. If several species have a good number of nr, build the STRING
network for each species and choose the one that best suits your
experiment.
17. For each interaction, the edge attributes include the overall
confidence score and the subscores from seven individual evi-
dence channels. These channels are as follows:
(a) The experiments channel: Evidence comes from actual
experiments in the lab.
(b) The database channel: Evidence that has been asserted by
a human expert curator.
(c) The textmining channel: Pairs of proteins are given an
association score when they are frequently mentioned
together in the same paper, abstract, or even sentence.
(d) The coexpression channel: Pairs of proteins that are con-
sistently similar in their expression patterns, under a vari-
ety of conditions, will receive a high association score.
(e) The neighborhood channel: Genes are given an associa-
tion score where they are consistently observed in each
other’s genome neighborhood (such as in the case of
conserved, cotranscribed “operons”).
(f) The fusion channel: Pairs of proteins are given an associa-
tion score when there is at least one organism where their
respective orthologs have fused into a single, protein-
coding gene.
(g) The co-occurrence channel: STRING evaluates the phy-
logenetic distribution of orthologs of all proteins in a
given organism.
Accordingly, the three display options show the following:
Evidence: Known interactions (from curated databases and
experimentally determined), predicted interactions (gene
neighborhood, gene fusions, and gene co-occurrence), and others
(textmining, coexpression, and protein homology).
Confidence: The overall confidence score obtained from the seven
channels.
Molecular action: activation, binding, phenotype, reaction,
inhibition, catalysis, posttranslational modification, and
transcriptional regulation, in three classes (positive, negative,
or unspecified).
18. Higher confidence thresholds produce networks of higher
significance, but at the cost of losing real interactions, which
can reduce the biological meaning of the network. The researcher
should find a balance, taking into account the organism, the
experimental setup, and the network topology.
19. Use this option if your network has few connections, or many
unconnected nodes, or you want to discover potential candi-
date molecules not present in your samples.
20. An example of how to proceed according to the network obtained:
(a) If your network has few connections or many unconnected nodes,
include interactors. Interactors are proteins that are not in your
data but act as connecting nodes between your proteins. Select the
whole network and, in the STRING tab, click Expand network, where
you can increase the number of interactors.
(b) If your network is very dense, it can be simplified by
applying new filters:
• Increase the cutoff of your network by adding a filter
(Column filter, Edge: score) in the Control Panel. For example,
with Edge: score between 0.8 and 0.999, the edges within that
range are selected.
• In the Select menu, select the nodes connected by the
selected edges.
• Create a new network with File > New Network > From selected
nodes, selected edges.
• Build different networks until you get the one that best
explains the experiment.
21. An FDR-adjusted p-value cutoff of 0.05 implies that about 5% of
the tests called significant are expected to be false positives.
Raising the FDR cutoff increases the expected proportion of false
positives; lowering it decreases that proportion.
22. If many results are generated, a subset with the most significant
terms can be selected. If, for example, we select 30, ShinyGO
will show the 30 most significant terms (even if there are more
than 50 generated).
23. It is also possible to download the resulting table from the
network using the STRING API tab and use the Cytoscape
workflow for improvement.
24. The visualization of the network may not be clear until the
application of some visual criteria. Continue the protocol to
improve this visualization.
25. See the Cytoscape manual at manual.cytoscape.org/.
26. To configure the spread rate:
(a) If the depicted network is too spread out, increase gravity to 1.5
(default 1) in Layout sheet > Tuning > Gravity and repeat
steps 7 and 8; if your network is too contracted, decrease the
gravity value.
(b) In this menu, it is advisable to mark “prevent Overlap” in the
Behavior alternatives drop-down.
27. See Gephi manuals, available at gephi.org/users/.
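The edge-score cutoff described in Note 20(b) can also be applied to an exported edge table outside Cytoscape. The following is a minimal sketch, assuming a tab-separated file with a header line and the score in the third column. The function name, the file name, and the column layout are illustrative assumptions, not STRING or Cytoscape conventions.

```shell
#!/usr/bin/env bash
# Keep edges whose score lies within [min, max] (inclusive).
# Assumed input: TSV with a header line and the score in column 3.
filter_edges() {
  local min="$1" max="$2" file="$3"
  awk -F'\t' -v min="$min" -v max="$max" '
    NR == 1 { print; next }                 # copy the header unchanged
    ($3 + 0) >= min && ($3 + 0) <= max      # print rows inside the range
  ' "$file"
}

# Example with a small, made-up edge table.
printf 'node1\tnode2\tscore\nA\tB\t0.95\nA\tC\t0.42\nB\tC\t0.80\n' > edges_example.tsv
filter_edges 0.8 0.999 edges_example.tsv
```

In this example only the A-B and B-C edges survive the 0.8 to 0.999 window, together with the header line.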
Acknowledgments
This work was supported by projects financed by the Spanish
Ministry of Economy and Competitiveness (AGL2016-77633-P
and AGL2017-83988-R). The Spanish Ministry of Economy and
Competitiveness supported M.E., L.V., and M.M. through the Juan de la
Cierva (FJCI-2017-31613) and Ramón y Cajal Programs
(RYC-2015-17871 and RYC-2014-14981), respectively. L.L. and
V.R. were supported by fellowships from the FPI (BES-2017-082092)
and FPU (FPU18/02953) programs (Ministry of Science, Innovation
and Universities, Spain), respectively. V.S. was supported by the
Youth Employment Operational Program (EJI-17-AGR-164),
cofinanced by the Regional Government of Andalusia and the
European Social Fund (ESF).
References
1. Valledor L, Jorrı́n J (2011) Back to the basics: 4. Steuer R, Morgenthal K, Weckwerth W et al
maximizing the information obtained by (2007) A gentle guide to the analysis of meta-
quantitative two dimensional gel electrophore- bolomic data. Methods Mol Biol 358:105–126
sis analyses by an appropriate experimental 5. Lê Cao K-A, Boitard S, Besse P (2011) Sparse
design and statistical analyses. J Proteome PLS discriminant analysis: biologically relevant
74:1–18 feature selection and graphical displays for mul-
2. Singh A, Gautier B, Shannon CP et al (2016) ticlass problems. BMC Bioinformatics 12:253
DIABLO – an integrative, multi-omics, multi- 6. Groth D, Hartmann S, Klie S et al (2013)
variate method for multi-group classification. Principal components analysis. In: Reisfeld B,
bioRxiv. https://doi.org/10.1101/067611 Mayeno AN (eds) Computational toxicology,
3. Scholz M, Selbig J (2007) Visualization and Methods in molecular biology, vol II. Humana
analysis of molecular data. In: Weckwerth W Press, New York City. p chapter 22
(ed) Metabolomics methods protocol. 7. Meng C, Kuster B, Culhane AC et al (2014) A
Humana Press, Totowa, NJ, pp 87–104 multivariate approach to the integration of
multi-omics datasets. BMC Bioinformatics 21. Cramer RD (1993) Partial least squares (PLS):
15:162 its strengths and limitations. Perspect Drug
8. Uppal K, Go Y-M, Jones DP (2017) xMWAS: Discov Des 1:269–278
an R package for data-driven integration and 22. Lê Cao K-A, Rossouw D, Robert-Granie C
differential network analysis. bioRxiv:122432 et al (2008) A sparse PLS for variable selection
9. Von Mering C, Jensen LJ, Snel B et al (2005) when integrating omics data. Stat Appl Genet
STRING: known and predicted protein- Mol Biol 7:35
protein associations, integrated and transferred 23. Lee HK, Hsu AK, Sajdak J et al (2004) Coex-
across organisms. Nucleic Acids Res 33: pression analysis of human genes across many
D433–D437 microarray data sets. Genome Res
10. Pluskal T, Castillo S, Villar-briones A et al 14:1085–1094
(2010) MZmine 2 : modular framework for 24. Tenenhaus A, Philippe C, Guillemot V et al
processing, visualizing, and analyzing mass (2014) Variable selection for generalized
spectrometry-based molecular profile data. canonical correlation analysis. Biostatistics
BMC Bioinformatics 11:395 15:569–583
11. Haas BJ, Papanicolaou A, Yassour M et al 25. Szklarczyk D, Gable AL, Lyon D et al (2019)
(2013) De novo transcript sequence recon- STRING v11: protein-protein association net-
struction from RNA-seq using the Trinity plat- works with increased coverage, supporting
form for reference generation and analysis. Nat functional discovery in genome-wide experi-
Protoc 8:1494 mental datasets. Nucleic Acids Res 47:
12. Valledor L, Romero-Rodriguez MC, Jorrin- D607–D613
Novo JV (2014) Standardization of data pro- 26. Szklarczyk D, Franceschini A, Wyder S et al
cessing and statistical analysis in comparative (2015) STRING v10: protein–protein interac-
plant proteomics experiment. Methods Mol tion networks, integrated over the tree of life.
Biol 1072:51–60 Nucleic Acids Res 43:D447–D452
13. Escandon M, Valledor L, Pascual J et al (2017) 27. Szklarczyk D, Morris JH, Cook H et al (2017)
System-wide analysis of short-term response to The STRING database in 2017: quality-
high temperature in Pinus radiata. J Exp Bot controlled protein-protein association net-
68:3629–3641 works, made broadly accessible. Nucleic Acids
14. Pascual J, Cañal MJ, Escandón M et al (2017) Res 45:D362–D368
Integrated physiological, proteomic, and meta- 28. Ge SX, Jung D (2018) ShinyGO: a graphical
bolomic analysis of ultra violet (UV) stress enrichment tool for animals and plants.
responses and adaptation mechanisms in Pinus bioRxiv:315150
radiata. MCP 16:485–501 29. Shannon P, Markiel A, Ozier O et al (2003)
15. Branson OE, Freitas MA (2016) A multi- Cytoscape: a software environment for
model statistical approach for proteomic spec- integrated models of biomolecular interaction
tral count quantitation. J Proteome 144:23–32 networks. Genome Res 13:2498–2504
16. Robinson MD, Oshlack A (2010) A scaling 30. Marbach D, Costello JC, Küffner R et al
normalization method for differential expres- (2012) Wisdom of crowds for robust gene net-
sion analysis of RNA-seq data. Genome Biol. work inference. Nat Methods 9:796
https://doi.org/10.1186/gb-2010-11-3-r25 31. Schiffthaler B, Serrano A, Delhomme N et al
17. Lèbre S, Becq J, Devaux F et al (2010) Statisti- (2019) Seidr: a gene meta-network calculation
cal inference of the time-varying structure of toolkit. bioRxiv:250696
gene-regulation networks. BMC Syst Biol 32. Grimes T, Potter SS, Datta S (2019) Integrat-
4:130 ing gene regulatory pathways into differential
18. Nagarajan R, Scutari M, Lèbre S (2013) Bayes- network analysis of gene expression data. Sci
ian networks in R with applications in systems Rep 9:5479
biology. Springer-Verlag, New York. https:// 33. Rohart F, Gautier B, Singh A et al (2017)
doi.org/10.1007/978-1-4614-6446-4 mixOmics: an R package for ‘omics feature
19. Wold H (1966) Estimation of principal com- selection and multiple data integration. PLoS
ponents and related models by iterative least Comput Biol 13:e1005752
squares. Multivariate analysis, NewYork. Aca- 34. Singh A, Gautier B, Shannon CP et al (2018)
demic Press, Cambridge DIABLO: from multi-omics assays to bio-
20. Wold S, Sjöström M, Eriksson L (2001) marker discovery, an integrative approach.
PLS-regression: a basic tool of chemometrics. bioRxiv. https://doi.org/10.1101/067611
Chemom Intell Lab Syst 58:109–130
Chapter 4
Specific Protein Database Creation from Transcriptomics
Data in Nonmodel Species: Holm Oak (Quercus ilex L.)
Vı́ctor M. Guerrero-Sanchez, Ana M. Maldonado-Alconada,
Rosa Sánchez-Lucas, and Maria-Dolores Rey
Abstract
Proteomics encompasses efforts to identify all the proteins of a proteome. Most plant proteomics studies
are based on a bottom-up mass spectrometry (MS) strategy, in which the proteins are digested with trypsin
and the tryptic fragments are subjected to MS analysis. Proteins are identified from MS/MS spectra
using different algorithms (Mascot, Sequest) against plant protein sequence databases such as UniProtKB
or NCBI_Viridiplantae. These databases, however, are not the best choice for nonmodel species, which are
underrepresented in them, resulting in poor identification rates. A high identification rate requires a
sequenced and well-annotated genome of the species under investigation. For nonmodel organisms, the
identification of proteins is challenging since, in the best of cases, only hits or orthologs instead of gene
products are identified. However, in the absence of a sequenced genome, this situation can be improved by
using transcriptome data to generate a species-specific database against which the experimental data are
compared. In this chapter, we report the construction of a protein database from RNA-Seq data in a
nonmodel species, in this particular case Holm oak (Q. ilex).
Key words Quercus ilex, Nonmodel species proteomics, Protein databases, RNA-Seq analysis, Custom
protein databases
1 Introduction
The main objective of a proteomics experiment is to identify and
quantify as many protein species or proteoforms as possible. The
identification of proteins is generally based on the comparison
between the experimental m/z data and the theoretical data
deduced from the in silico translation of DNA or RNA sequences,
following a well-known bottom-up scheme. Currently, several search
algorithms are widely used in proteomics experiments, such as
SEQUEST [1], MASCOT [2], and ANDROMEDA [3], among
others. The availability of a well-annotated genome (e.g., Arabidopsis
and rice) makes possible a quick identification of gene products,
including mRNA splicing or posttranslational variants
(PTMs). However, the situation is quite different when working
with organisms scarcely represented in databases.
Considering the above concerns, the question arises, “What is
the best way to approach proteomics research with orphan species?”
Genome sequencing and annotation would be the first option, but
in most cases this step is unrealistic (too expensive and time
consuming for individual laboratories). A more feasible alternative
is to assemble a de novo transcriptome and then build a protein
database from the transcript reads. Furthermore, these sequence
reads can be complemented by adding all available sequences (DNA
or RNA) dispersed in the literature or deposited in different
databases. This approach has been used with orphan forest tree
species such as Quercus ilex [4] and Pinus radiata [5]. We observed
that the use of such a database notably increased the number of
proteins identified and the confidence of the identification [4, 5].
In this chapter, we describe a workflow for creating a species-specific
protein sequence database from RNA-Seq data that enables effective
proteomic profiling. The protocol is illustrated with the
transcriptome of Holm oak, generated from RNA sequencing
experiments of a pool of equal amounts of homogenized tissue from
acorn embryos, leaves, and roots [6, 7]. As part of the workflow, we
first explain the generation of the Holm oak transcriptome and the
bioinformatics tools used, including the following steps: (1) trimming
to select only high-quality sequences; (2) de novo assembly of all the
clean reads; (3) evaluation of the structure and completeness of the
de novo transcriptome; (4) annotation of candidate transcripts; and,
finally, (5) construction and annotation of a protein database.
2 Materials
This protocol was developed on Ubuntu (GNU/Linux distribution
Ubuntu 18.04 or higher) as the operating system. It is also applicable
to systems based on Ubuntu, such as Linux Mint.
2.1 Nucleotide Sequences
mRNA reads in FastQ format obtained from the Illumina sequencing
platform.
2.1.1 Required Software
All the software required for completing this protocol is publicly
available. Documentation and software can be freely downloaded
from the following addresses.
2.2 Raw Reads Quality Control
https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
2.3 Preprocessing Raw Data
http://hannonlab.cshl.edu/fastx_toolkit/
https://cutadapt.readthedocs.io/
2.4 Assembling Raw Data
http://denovoassembler.sourceforge.net/
https://www.mirametrics.com/
https://github.com/trinityrnaseq/trinityrnaseq/
https://www.ebi.ac.uk/~zerbino/velvet/
http://www.bcgsc.ca/platform/bioinfo/software/trans-abyss
https://sourceforge.net/projects/soapdenovotrans/
2.5 Removing Redundant Transcripts
http://bioinformatics.org/cd-hit/
2.6 Evaluating the Assembly Structure and Completeness of a Transcriptome
http://cab.spbu.ru/software/rnaquast/
http://deweylab.biostat.wisc.edu/detonate/
https://busco.ezlab.org/
2.7 Annotation of a Transcriptome
http://www.bioinfocabd.upo.es/node/11
2.8 Construction of a Custom Protein Database
https://github.com/TransDecoder/TransDecoder/
3 Methods
3.1 Nucleotide Sequences
1. Download the raw transcriptome data of Holm oak from the
European Nucleotide Archive (ENA) at the European Bioinformatics
Institute (EBI) in FastQ format (Illumina_R1 and
Illumina_R2). The repository can be accessed via
http://www.ebi.ac.uk/ena. The sequencing reads are available under
accession number SRR5815058 (see Note 1).
$ wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR581/008/SRR5815058/SRR5815058_1.fastq.gz
$ wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR581/008/SRR5815058/SRR5815058_2.fastq.gz
Both FASTQ files (SRR5815058_1 and SRR5815058_2)
should be downloaded, because the run was obtained by paired-end
sequencing on an Illumina platform and each file holds one mate of
every read pair.
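The two download addresses follow a regular pattern, so they can be generated from the run accession. The sketch below assumes that the directory layout seen in the URLs above (first six characters of the accession, then "00" plus its last digit) holds for other 10-character run accessions. The function name ena_fastq_url is hypothetical, and other accession lengths are deliberately not handled.

```shell
#!/usr/bin/env bash
# Build the ENA FTP address of one FASTQ mate from a run accession.
# Assumption: layout inferred from the SRR5815058 URLs in this protocol;
# only 10-character run accessions are handled in this sketch.
ena_fastq_url() {
  local acc="$1" mate="$2"
  if [ "${#acc}" -ne 10 ]; then
    echo "only 10-character run accessions are handled in this sketch" >&2
    return 1
  fi
  printf 'ftp://ftp.sra.ebi.ac.uk/vol1/fastq/%s/00%s/%s/%s_%s.fastq.gz\n' \
    "${acc:0:6}" "${acc: -1}" "$acc" "$acc" "$mate"
}

# Print the addresses of both mates; pipe them to wget to download.
for mate in 1 2; do
  ena_fastq_url SRR5815058 "$mate"
done
```

The printed addresses can be passed to wget one by one, which avoids retyping the long paths.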
3.2 Raw Reads Quality Control (See Note 2)
Quality control (QC) is critical to ensure that RNA-seq data are of
high quality and suitable for subsequent analyses. Because of
intrinsic biases and limitations, such as nucleotide
composition bias, GC bias, and RCA bias, quality control should
be carried out in every RNA-seq experiment. These biases directly
affect the accuracy of many RNA-seq applications [8, 9], and they can
be checked from raw sequences using tools like FastQC. The FastQC
software provides a report for a set of high-throughput sequencing
reads, which helps to identify sequencing errors or biases before
trimming. The report includes basic statistics (total sequences,
filtered sequences, sequence length, GC %).
1. Download and install the FastQC software v0.11.8.
$ wget https://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v0.11.8.zip
$ unzip fastqc_v0.11.8.zip
2. Unzip both FASTQ files.
$ gunzip SRR5815058_1.fastq.gz
$ gunzip SRR5815058_2.fastq.gz
3. Run FastQC (see https://dnacore.missouri.edu/PDF/FastQC_Manual.pdf).
$ ./fastqc SRR5815058_1.fastq
$ ./fastqc SRR5815058_2.fastq
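Some of the basic statistics in the FastQC report (number of reads, read length, GC %) can be cross-checked directly on the command line. The following is a minimal sketch, not a replacement for FastQC, and it assumes an uncompressed FASTQ file with exactly four lines per record. The function name and the example file are hypothetical.

```shell
#!/usr/bin/env bash
# Print read count, mean read length, and GC % of an uncompressed FASTQ file.
# Assumes the common four-lines-per-record layout (no wrapped sequences).
fastq_stats() {
  awk '
    NR % 4 == 2 {
      reads++
      bases += length($0)
      gc += gsub(/[GCgc]/, "")      # gsub returns the number of G/C bases
    }
    END {
      if (reads == 0) { print "reads=0"; exit }
      printf "reads=%d mean_length=%.1f gc_percent=%.1f\n",
             reads, bases / reads, 100 * gc / bases
    }
  ' "$1"
}

# Example with two made-up reads.
printf '@r1\nACGT\n+\nIIII\n@r2\nGGCCAT\n+\nIIIIII\n' > example.fastq
fastq_stats example.fastq
```

Running the same function on both mate files is also a quick way to confirm that they contain the same number of reads.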
3.3 Preprocessing Raw Data
Preprocess all raw data using Fastx_Toolkit version 0.0.13 and
Cutadapt version 1.9 [10]. The aim of this step is to retain only
high-quality sequences by removing low-quality reads with
ambiguous bases and adapter sequences.
1. Download and install Fastx_Toolkit.
$ wget http://hannonlab.cshl.edu/fastx_toolkit/fastx_toolkit_0.0.13_binaries_Linux_2.6_amd64.tar.bz2
$ tar -xjvf fastx_toolkit_0.0.13_binaries_Linux_2.6_amd64.tar.bz2
Filter the FASTQ sequences using a minimum Phred score (the Phred
score indicates the quality of the identification of the
nucleobases generated by DNA sequencing):
$ ./fastq_quality_filter -q 20 -i SRR5815058_1.fastq -o SRR5815058_1_f.fastq
$ ./fastq_quality_filter -q 20 -i SRR5815058_2.fastq -o SRR5815058_2_f.fastq
-q indicates the minimum quality score (commonly between
20 and 40).
2. Download and install cutadapt.
$ sudo apt install python3-pip
$ pip3 install --user --upgrade cutadapt
3. Remove all the overrepresented sequences.
$ cutadapt -m 100 -a ATCGGAAGAGCACACGTCTGAACTCCAGTCACCGGCTATGATCTCGTATG SRR5815058_1_f.fastq -o SRR5815058_1_fc.fastq
$ cutadapt -m 100 -a ATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTGCCTCTATGTGTAGATCTC SRR5815058_2_f.fastq -o SRR5815058_2_fc.fastq
-a gives the adapter sequence to be removed.
-m discards processed reads shorter than the given length (here, 100 nt).
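The -q threshold used in the quality filter is a Phred score Q, which is related to the probability P that a base call is wrong by Q = -10 log10(P). A minimal sketch of this conversion follows; the function name phred_to_error is ours and is used for illustration only.

```shell
#!/usr/bin/env bash
# Convert a Phred quality score Q into the base-calling error probability
# P = 10^(-Q/10).
phred_to_error() {
  awk -v q="$1" 'BEGIN { printf "%g\n", 10 ^ (-q / 10) }'
}

# Error probabilities for the range of thresholds mentioned above.
for q in 10 20 30 40; do
  printf 'Q%s -> %s\n' "$q" "$(phred_to_error "$q")"
done
```

A threshold of 20 therefore corresponds to an error probability of 1% per base, and a threshold of 30 to 0.1%.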
3.4 Assembling Raw Data
The de novo transcriptome can be assembled using
different assemblers such as Velvet [11], Trans-ABySS [12],
SOAPdenovo [13], TRINITY [14], MIRA [15], or RAY [16],
among others. In Holm oak, we used TRINITY version 2.5.1,
RAY version 2.3.1, and MIRA V5rc1 (see Note 3). More than one
assembler should be used to assess the quality of an assembly [17],
because read lengths, read counts, and error profiles differ among
next-generation sequencing technologies [17].
1. Download and install RAY.
$ sudo apt install ray
2. Run RAY with the following launch parameters (see
Note 4 and http://denovoassembler.sourceforge.net/).
$ mpiexec -n 8 Ray -k 31 -p SRR5815058_1_fc.fastq SRR5815058_2_fc.fastq -o ray_assembly_folder
3. Download and install MIRA.
$ sudo apt install mira-assembler
4. Run MIRA with the following launch parameters:
job = denovo,est,accurate > parameters = NW:maxreadnamelength=100
COMMON SETTINGS
GENERAL: number of threads = 12
MERSTATISTICS: lossless digital normalisation = yes
ALIGN:min relative score = 70
ASSEMBLY:minimum read length = 100
CLIPPING:quality clip = no
CLIPPING:qc minimum quality = 15
CLIPPING:qc window length = 20
CLIPPING:clip polyat = yes
CLIPPING:cp min sequence len = 12
technology = solexa
5. Download and install Trinity.
$ wget https://github.com/trinityrnaseq/trinityrnaseq/archive/Trinity-v2.8.4.zip
$ unzip Trinity-v2.8.4.zip
$ cd trinityrnaseq-Trinity-v2.8.4
$ sudo apt-get install cmake
$ make
$ make plugins
$ make install
6. Run TRINITY with the following launch parameters.
$ ./Trinity --seqType fq --left SRR5815058_1_fc.fastq --right SRR5815058_2_fc.fastq --CPU 20 --max_memory 1000G
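Once an assembly is finished, a first sanity check is to count how many transcripts and "genes" it contains. The sketch below assumes Trinity-style FASTA headers such as ">TRINITY_DN1000_c115_g5_i1 len=247", in which the gene identifier is the transcript name without its trailing "_i<number>" suffix. The function name and the example file are hypothetical, and the header format should be checked against the actual output.

```shell
#!/usr/bin/env bash
# Count Trinity "genes" and isoforms from FASTA headers.
# Assumption: headers look like ">TRINITY_DN1000_c115_g5_i1 len=247 ...",
# where the gene identifier is the name without the trailing "_i<number>".
count_trinity_genes() {
  awk '
    /^>/ {
      isoforms++
      id = substr($1, 2)            # drop the leading ">"
      sub(/_i[0-9]+$/, "", id)      # strip the isoform suffix
      if (!(id in seen)) { seen[id] = 1; genes++ }
    }
    END { printf "genes=%d isoforms=%d\n", genes, isoforms }
  ' "$1"
}

# Example with made-up headers and sequences.
printf '>TRINITY_DN1_c0_g1_i1 len=5\nACGTA\n>TRINITY_DN1_c0_g1_i2 len=4\nACGT\n>TRINITY_DN2_c0_g1_i1 len=3\nACG\n' > trinity_example.fasta
count_trinity_genes trinity_example.fasta
```

A large excess of isoforms over genes is one indication of the transcript redundancy addressed in the next section.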
3.5 Removing Redundant Transcripts
A high redundancy of transcripts increases the amount
of data to process and the computing requirements. To cluster and
compare nucleotide or protein sequences, the CD-HIT algorithm
[18, 19] is highly recommended.
1. Download and install CD-HIT.
$ sudo apt install cd-hit
2. Run CD-HIT (see http://www.bioinformatics.org/cd-hit/cd-hit-user-guide.pdf).
$ cd-hit-est -i Contigs.fasta -o clusteredassembly.fasta -c 0.95 -T 4 (see Note 5).
cd-hit-est is the CD-HIT program for nucleotide sequences (cd-hit itself clusters protein sequences).
-i input file with the assembled contigs; -o output file with the clustered assembly.
-c 0.95 is the clustering threshold (95% sequence identity).
-T number of threads: 4.
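The effect of the clustering step can be quantified by comparing the number of sequences before and after CD-HIT. The following sketch counts FASTA headers in both files; the function names and the example files are hypothetical and stand in for the real assembly and its clustered version.

```shell
#!/usr/bin/env bash
# Count FASTA records (header lines starting with ">").
count_fasta() {
  grep -c '^>' "$1"
}

# Report how many sequences were removed by clustering.
# Arguments: FASTA before clustering, FASTA after clustering.
redundancy_report() {
  local before after
  before=$(count_fasta "$1")
  after=$(count_fasta "$2")
  awk -v b="$before" -v a="$after" 'BEGIN {
    printf "before=%d after=%d removed=%.1f%%\n", b, a, (b > 0 ? 100 * (b - a) / b : 0)
  }'
}

# Example with made-up files standing in for the real assemblies.
printf '>t1\nACGT\n>t2\nACGT\n>t3\nGGCC\n>t4\nTTAA\n' > contigs_example.fasta
printf '>t1\nACGT\n>t3\nGGCC\n>t4\nTTAA\n' > clustered_example.fasta
redundancy_report contigs_example.fasta clustered_example.fasta
```

Repeating the comparison at different -c values shows how strongly the identity threshold affects the size of the final transcript set.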
3.6 Evaluating the Assembly Structure and Completeness of a Transcriptome
Evaluate the structure of the generated assemblies using the rnaQUAST
(Quality Assessment Tool for Transcriptome Assemblies)
software version 1.5.1 [20] (see Note 6).
1. Download and install rnaQUAST.
$ wget http://cab.spbu.ru/files/rnaquast/release1.5.2/rnaQUAST-1.5.2.tar.gz
$ tar -xzvf rnaQUAST-1.5.2.tar.gz
$ sudo apt-get install python3-matplotlib
$ pip install joblib
$ pip install gffutils
$ sudo apt-get install ncbi-blast+
$ wget http://research-pub.gene.com/gmap/src/gmap-gsnap-2019-03-15.tar.gz
$ tar -xzvf gmap-gsnap-2019-03-15.tar.gz
$ ./configure
$ make
$ make install
2. Run rnaQUAST (see http://cab.spbu.ru/software/rnaquast/).
$ python rnaQUAST.py --transcripts clusteredassembly.fasta --reference OCV4_assembly_final.fsa
This command uses the transcriptome of Q. robur and Q. petraea as the reference
(https://urgi.versailles.inra.fr/download/oak/OCV4_assembly_final.fsa).
If a reference genome is available, the following command line
should be used:
$ python rnaQUAST.py --transcripts clusteredassembly.fasta --reference Qrob_PM1N.fa --gtf Qrob_PM1N_genes_20161004.gff
This command uses the genome of Q. robur
(https://urgi.versailles.inra.fr/download/oak/Qrob_PM1N.fa.gz and
https://urgi.versailles.inra.fr/download/oak/Qrob_PM1N_genes_20161004.gff.gz).
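Among the contig statistics used to compare assemblies (see Note 6), N50 is the contig length such that contigs of that length or longer contain at least half of all assembled bases. The sketch below shows how this value can be computed from a FASTA file with standard tools. It is meant to illustrate the definition, not to replace the rnaQUAST report, and the function name and example file are hypothetical.

```shell
#!/usr/bin/env bash
# Compute the N50 of a FASTA file: the length L such that contigs of
# length >= L contain at least half of all assembled bases.
fasta_n50() {
  awk '
    /^>/ { if (len > 0) print len; len = 0; next }   # emit previous contig length
    { len += length($0) }                            # sequences may span several lines
    END { if (len > 0) print len }
  ' "$1" | sort -rn | awk '
    { lens[NR] = $1; total += $1 }
    END {
      for (i = 1; i <= NR; i++) {
        cum += lens[i]
        if (cum * 2 >= total) { print lens[i]; exit }
      }
    }
  '
}

# Example with made-up contigs of length 10, 8, 6, 4, and 2 (30 bases in total).
printf '>a\nAAAAAAAAAA\n>b\nCCCCCCCC\n>c\nGGGGGG\n>d\nTTTT\n>e\nAA\n' > n50_example.fasta
fasta_n50 n50_example.fasta
```

In the example, the two longest contigs (10 + 8 = 18 bases) already cover more than half of the 30 assembled bases, so the N50 is 8.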
The analysis of the structure of a de novo transcriptome can be
complemented with other transcriptome-specific metrics obtained
by using the DETONATE (DE novo TranscriptOme rNa-seq
Assembly with or without the Truth Evaluation) package version
1.11 [21]. Its RSEM-EVAL [22] assembly evaluator realigns the reads
used to generate the assembly with the assembled contigs and
provides the value of E90N50, the overall score values, the length of
alignable reads, and the total number of alignments, among others.
1. Download and install DETONATE.
$ wget http://deweylab.biostat.wisc.edu/detonate/detonate-1.11-precompiled.tar.gz
$ tar -xzvf detonate-1.11-precompiled.tar.gz
2. Run DETONATE.
$ ./rsem-eval-estimate-transcript-length-distribution clusteredassembly.fasta parameter_file
$ ./rsem-eval-calculate-score -p 8 --transcript-length-parameters parameter_file SRR5815058_1_fc.fastq SRR5815058_2_fc.fastq clusteredassembly.fasta output_folder
Evaluate the completeness of the transcriptome with BUSCO
(Benchmarking Universal Single-Copy Orthologs) version 3.0.2
[23, 24].
3. Download and install BUSCO.
$ wget https://gitlab.com/ezlab/busco/-/archive/master/busco-master.zip
$ unzip busco-master.zip
$ sudo apt-get install ncbi-blast+
$ sudo apt-get install hmmer
$ wget http://bioinf.uni-greifswald.de/augustus/binaries/augustus.current.tar.gz
$ tar -xzvf augustus.current.tar.gz
$ make
$ make install
According to the BUSCO manual (see
https://busco.ezlab.org/v1/files/BUSCO_userguide.pdf): Do not forget to
create a config.ini file in the config/ subfolder. You can set the
BUSCO_CONFIG_FILE environment variable to define a custom path
(including the filename) to the config.ini file, useful for switching
between configurations or in a multiuser environment. Augustus uses
several executables and PERL scripts. Please refer to Augustus
documentation for PERL requirements. In addition to the entries in the
config.ini file, Augustus requires environment variables to be declared
as follows:
$ export PATH="/path/to/AUGUSTUS/augustus-3.2.3/bin:$PATH"
$ export PATH="/path/to/AUGUSTUS/augustus-3.2.3/scripts:$PATH"
$ export AUGUSTUS_CONFIG_PATH="/path/to/AUGUSTUS/augustus-3.2.3/config/"
$ sudo python setup.py install
$ wget https://busco.ezlab.org/datasets/embryophyta_odb9.tar.gz
$ tar -xf embryophyta_odb9.tar.gz
4. Run BUSCO (see https://busco.ezlab.org/).
$ python run_BUSCO.py -i clusteredassembly.fasta -o output_name -l embryophyta_odb9 -m tran
-m or --mode sets the assessment MODE: genome (-m geno),
proteins (-m prot), transcriptome (-m tran).
-l indicates the location of the BUSCO lineage data to use.
$ python generate_plot.py -wd "working directory"
-wd name or full path of the folder containing the BUSCO
short_summary files.
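When several assemblies are compared, it can be convenient to pull the completeness values out of each BUSCO short summary. The sketch below assumes that the summary file contains a one-line result of the form "C:89.0%[S:85.8%,D:3.2%],F:6.9%,M:4.1%,n:1440" (complete, single-copy, duplicated, fragmented, missing, total). This format, the function name, and the example numbers are assumptions and should be checked against the files produced by the installed BUSCO version.

```shell
#!/usr/bin/env bash
# Extract the percentages from a BUSCO one-line summary.
# Assumption: the short summary contains a line of the form
#   C:89.0%[S:85.8%,D:3.2%],F:6.9%,M:4.1%,n:1440
busco_percentages() {
  sed -n -E 's/.*C:([0-9.]+)%\[S:([0-9.]+)%,D:([0-9.]+)%],F:([0-9.]+)%,M:([0-9.]+)%,n:([0-9]+).*/complete=\1 single=\2 duplicated=\3 fragmented=\4 missing=\5 total=\6/p' "$1" |
    head -n 1
}

# Example with made-up numbers (not results from the Holm oak transcriptome).
printf '# BUSCO summary (example)\n\tC:89.0%%[S:85.8%%,D:3.2%%],F:6.9%%,M:4.1%%,n:1440\n' > short_summary_example.txt
busco_percentages short_summary_example.txt
```

The resulting key=value line is easy to collect into a table, one row per assembly.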
3.7 Annotation of a Transcriptome
Annotate the transcriptome using the Sma3s annotator version
2 [25, 26] (see Note 7). Sma3s (Sequence massive annotation by
three modules) is an easy-to-use tool for high-throughput annotation
that provides both accuracy and broad applicability. Annotation
associates biological information with the sequences of interest
(see http://www.bioinfocabd.upo.es/node/11).
1. Download Sma3s (see Note 8).
$ wget http://www.bioinfocabd.upo.es/node/11#sma3s.pl
$ wget http://www.bioinfocabd.upo.es/sma3s/db/uniref90.fasta.gz
$ gunzip uniref90.fasta.gz
$ wget http://www.bioinfocabd.upo.es/sma3s/db/uniref90.annot.gz
$ gunzip uniref90.annot.gz
2. Run Sma3s.
$ perl sma3s_v2.pl -num_threads 10 -i clusteredassembly.fasta -d uniref90.fasta -go -goslim -nucl
3.8 Construction of a Custom Protein Database
Generate a six-frame translation for each sequence of the
transcriptome using the TransDecoder software version 5.5.0 [27].
1. Download TransDecoder.
$ wget https://github.com/TransDecoder/TransDecoder/archive/TransDecoder-v5.5.0.zip
$ unzip TransDecoder-v5.5.0.zip
2. Run TransDecoder (see https://github.com/TransDecoder/TransDecoder/wiki).
$ ./TransDecoder.LongOrfs -t clusteredassembly.fasta -m 75 -o outputfolder
-m minimum protein length, in amino acids.
3. Process the spectra using the SEQUEST algorithm available in
Proteome Discoverer version 2.1 (Thermo-Scientific,
Massachusetts, USA).
4. Use the FASTA sequences generated previously as the search
database (see Note 9).
5. Select peptides with the following settings: (1) precursor mass
tolerance set to 10 ppm and fragment ion mass tolerance set to
0.8 Da, (2) only charge states +2 or greater, (3) identification
confidence set to 5% FDR (false discovery rate), (4) oxidation of
methionine as variable modification and carbamidomethylation of
cysteine as fixed modification, and (5) a
maximum of two missed cleavages for all searches.
6. Save the archive with all the identified proteins to generate a
custom species database.
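The precursor mass tolerance in step 5 is given in parts per million (ppm), so the absolute search window depends on the mass of each precursor: tolerance (Da) = mass × ppm / 10^6. A minimal sketch of this arithmetic follows; the function name ppm_to_da and the example mass are ours.

```shell
#!/usr/bin/env bash
# Convert a relative mass tolerance (ppm) into an absolute window (Da)
# for a given precursor mass: tolerance_Da = mass * ppm / 1e6.
ppm_to_da() {
  awk -v mass="$1" -v ppm="$2" 'BEGIN { printf "%.4f\n", mass * ppm / 1000000 }'
}

# A precursor of 1500 Da searched with a tolerance of 10 ppm.
ppm_to_da 1500 10
```

For a 1500 Da precursor, 10 ppm corresponds to a window of ±0.015 Da, which is much narrower than the 0.8 Da tolerance applied to the fragment ions.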
4 Notes
1. This protocol has been employed in the construction of a
Holm oak specific transcriptome with the Illumina HiSeq 2500
sequencing platform using 100 bp paired-end sequencing.
Additionally, this protocol is compatible with raw data from
the Ion Torrent sequencing platform, as previously described in
[7]. At ENA, raw data can be downloaded by either experiment
accession (SRX2993508) or run accession (SRR5815058).
2. Raw data provided by the platform contained hundreds of
reads in a single run; hence, before drawing biological conclu-
sions, an analysis of sequences should always be performed to
monitor the quality of raw data.
3. Trinity version 2.5.1 uses an algorithm based on de Bruijn graphs
[14]. MIRA version 4.9.6 is based on the strategy known as
Overlap/Layout/Consensus [15]. Ray version 2.3.1 also
uses de Bruijn graphs, but its framework is not based on
Eulerian walks [16].
4. Other launch parameters can be used in the assembling of a
de novo transcriptome using RAY:
-minimum-seed-length minimumSeedLength: changes the minimum
seed length; the default is 100 nucleotides.
-minimum-contig-length minimumContigLength: changes the minimum
contig length; the default is 100 nucleotides.
-use-maximum-seed-coverage maximumSeedCoverageDepth: ignores any
seed with a coverage depth above this threshold; the default is
4294967295.
-use-minimum-seed-coverage minimumSeedCoverageDepth: sets the
minimum seed coverage depth; any path with a coverage depth lower
than this will be discarded. The default is 0.
5. The cutoff could be increased to 99% or 100%, depending on
the stringency required in the experiment.
6. To carry out an evaluation with rnaQUAST, it is
recommended to provide either FASTA files with all transcripts
or transcripts aligned to the reference genome [20]. rnaQUAST then
reports the values of N50, L50, N75, L75, and GC % of the contigs
generated in each assembly, so that the assemblies with higher N50
and the best contig size distribution can be chosen. Assemblies can
also be aligned with complete and annotated transcriptome sequences
of phylogenetically related species. In Q. ilex, the transcriptomes of
Q. suber were used in the alignment of the de novo Q. ilex
transcriptome.
7. The annotation should be done using a complete database
(e.g., UniRef, Swiss-Prot, or NR-NCBI) with an e-value cutoff
lower than 10^-6 (the default is lower than 10^-6). Moreover,
other software such as Trinotate or Blast2GO can be used in the
annotation of a transcriptome [28, 29].
8. According to Sma3s (see http://www.bioinfocabd.upo.es/node/11):
Sma3s has low computing requirements and can
be used on virtually any computer. It is written in Perl language,
and you need its interpreter (http://www.perl.com),
which is preinstalled in Linux and Mac OS X (in Windows
it will not be necessary). Additionally, you need to install the
Blast+ package for your operating system.
9. A de novo peptide sequencing and annotation can be per-
formed by the NOVOR software [30].
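The e-value cutoff mentioned in Note 7 can be illustrated on BLAST tabular output. The sketch below assumes the standard 12-column tabular format (-outfmt 6), in which the e-value is the 11th column. The function name, the example file, and the hits in it are made up for illustration and do not come from the Holm oak annotation.

```shell
#!/usr/bin/env bash
# Keep BLAST hits with an e-value below a cutoff.
# Assumption: tabular output (-outfmt 6) with the e-value in column 11.
filter_by_evalue() {
  local cutoff="$1" file="$2"
  awk -F'\t' -v cutoff="$cutoff" '($11 + 0) < (cutoff + 0)' "$file"
}

# Example with two made-up hits: only the first one passes the 1e-6 cutoff.
printf 'q1\ts1\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-50\t380\nq2\ts2\t35.0\t60\t39\t0\t1\t60\t5\t64\t0.002\t40.1\n' > hits_example.tsv
filter_by_evalue 1e-6 hits_example.tsv
```

Lowering the cutoff further makes the annotation more conservative at the cost of leaving more transcripts without a functional assignment.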
Acknowledgments
The authors thank the University of Cordoba (UCO-CeiA3) and the
staff of the Central Service for Research Support (SCAI) at the
University of Cordoba (Spain) for their technical support in the
bioinformatics data analysis. This research was supported by the
grant ENCINOMICA BIO2015-64737-R from the Spanish Ministry
of Economy and Competitiveness. MD-R and LV thank, respectively,
the contracts “Ayudas Juan de la Cierva-Formación (FJCI-2016-28296)”
and “Programa Ramón y Cajal (RYC-2015-17871)” of the Spanish
Ministry of Science, Innovation, and Universities.
References
1. Eng JK, McCormack AL, Yates JR (1994) An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J Am Soc Mass Spectrom 5:976–989
2. Perkins DN, Pappin DJ, Creasy DM et al (1999) Probability-based protein identification by searching sequence databases using mass spectra. Electrophoresis 20:3551–3567
3. Cox J, Neuhauser N, Michalski A et al (2011) Andromeda: a peptide search engine integrated into the MaxQuant environment. J Proteome Res 10:1794–1805
4. Romero-Rodríguez MC, Pascual J, Valledor L et al (2014) Improving the quality of protein identification in non-model species. Characterization of Quercus ilex seed and Pinus radiata needle proteomes by using SEQUEST and custom databases. J Proteome 105:85–91
5. Valledor L, Jorrín-Novo JV, Rodríguez JL et al (2010) Combined proteomic and transcriptomic analysis identifies differentially expressed pathways associated to Pinus radiata needle maturation. J Proteome Res 9:3954–3979
6. Guerrero-Sanchez VM, Maldonado-Alconada AM, Amil-Ruiz F et al (2017) Holm oak (Quercus ilex) transcriptome. De novo sequencing and assembly analysis. Front Mol Biosci 4:70
7. Guerrero-Sanchez VM, Maldonado-Alconada AM, Amil-Ruiz F et al (2019) Ion torrent and Illumina, two complementary RNA-seq platforms for constructing the holm oak (Quercus ilex) transcriptome. PLoS One 14:e0210356
8. Benjamini Y, Speed TP (2012) Summarizing and correcting the GC content bias in high-throughput sequencing. Nucleic Acids Res 40:e72
9. Hansen K, Brenner S (2010) Biases in Illumina transcriptome sequencing caused by random hexamer priming. Nucleic Acids Res 38:e131
10. Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17:10–12
11. Zerbino DR, Birney E (2008) Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 18:821–829
12. Simpson JT, Wong K, Jackman SD et al (2009) ABySS: a parallel assembler for short read sequence data. Genome Res 19:1117–1123
13. Li R, Zhu H, Ruan J et al (2010) De novo assembly of human genomes with massively parallel short read sequencing. Genome Res 20:265–272
14. Grabherr MG, Haas BJ, Yassour M et al (2011) Trinity: reconstructing a full-length transcriptome without a genome from RNA-Seq data. Nat Biotechnol 29:644–652
15. Chevreux B, Wetter T, Suhai S (1999) Genome sequence assembly using trace signals and additional sequence information. German Conf Bioinformatics 99:45–56
16. Boisvert S, Laviolette F, Corbeil J (2010) Ray: simultaneous assembly of reads from a mix of high-throughput sequencing technologies. J Comput Biol 17:1519–1533
17. Bradnam KR, Fass JN, Alexandrov A et al (2013) Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species. Gigascience 2:10
18. Weizhong L, Godzik A (2006) Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22:1658–1659
19. Fu L, Niu B, Zhu Z et al (2012) CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28:3150–3152
20. Bushmanova E, Antipov D, Lapidus A et al (2016) RnaQUAST: a quality assessment tool for de novo transcriptome assemblies. Bioinformatics 32:2210–2212
21. Li B, Fillmore N, Bai Y et al (2014) Evaluation of de novo transcriptome assemblies from RNA-Seq data. Genome Biol 15:553
22. Li B, Dewey CN (2011) RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12:323
23. Simão FA, Waterhouse RM, Ioannidis P (2015) BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31:3210–3212
24. Waterhouse RM, Seppey M, Simão FA et al (2017) BUSCO applications from quality assessments to gene prediction and phylogenomics. Mol Biol Evol 35:543–548
25. Muñoz-Merida A, Viguera E, Claros MG et al (2014) Sma3s: a three-step modular annotator for large sequence datasets. DNA Res 21:341–353
26. Casimiro-Soriguer CS, Muñoz-Mérida A, Pérez-Pulido AJ (2017) Sma3s: a universal tool for easy functional annotation of proteomes and transcriptomes. Proteomics 17:1700071
27. Haas B, Papanicolaou A (2017) TransDecoder. https://transdecoder.github.io
28. Bryant DM, Johnson K, DiTommaso T et al (2017) A tissue-mapped axolotl de novo transcriptome enables identification of limb regeneration factors. Cell Rep 18:762–776
29. Conesa A, Götz S (2008) Blast2GO: a comprehensive suite for functional analysis in plant genomics. Int J Plant Genomics 2008:619832
30. Ma B (2015) Novor: real-time peptide de novo sequencing software. J Am Soc Mass Spectrom 26:1885–1894
Chapter 5
Subcellular Proteomics in Conifers: Purification of Nuclei and Chloroplast Proteomes
Laura Lamelas, Lara García, María Jesús Cañal, and Mónica Meijón
Abstract
The plant cell proteome comprises thousands of proteins whose abundance spans several orders of
magnitude, which makes it impossible to cover most plant proteins using standard shotgun-based
approaches. Complexity itself is not the main problem, since current protocols and instrumentation
allow for the identification of several thousand proteins per injection; rather, low- and
medium-abundance proteins go undetected most of the time, so fractionation or targeted analyses are
needed to detect and quantify them. Among the fractionation choices, separating the cell into its
different organelles is a good strategy for gaining not only deeper coverage of the proteome but
also a basis for understanding organelle function, protein dynamics, and trafficking within the
cell, such as communication between the nucleus and the chloroplast. This approach is used routinely
in many labs working with model species; however, protocols focusing on tree species are scarce. In
this chapter, we provide a simple but robust protocol for isolating nuclei and chloroplasts from
pine needles that is fully compatible with downstream mass spectrometry–based proteome analysis.
Key words Nucleus, Chloroplast, Subcellular proteomics, Forest species, Pinus
1 Introduction
Current LC-MS–based methodologies allow for the identification
of thousands of proteins per injection, coming close to complete
proteome coverage for simple organisms [1]. Although analyzing
more than three thousand proteins in a single injection is an amazing
milestone, not even dreamt of some years ago, this analytical
capability is not enough to cover the cells of more complex organisms
such as plants. Plant cells, with tens of thousands of proteins, are
fascinating living systems for proteome analysis, but obtaining a
complete proteome in a single injection is currently beyond our reach.
In practice, single-injection untargeted shotgun proteomics quantifies
about two thousand proteins, three thousand in the best of cases.
If less abundant proteins are to be targeted, fractionation is
required. It can be performed after protein isolation (MudPIT) [2],
or at the cellular level, purifying the different organelles
separately [3].
Subcellular proteomics, besides decreasing proteome complexity,
allows for the targeted study of the compartments of eukaryotic
cells, leading to better knowledge of organelle function, protein
dynamics, and trafficking, and to an understanding of the
proliferation of multigene families and the specialization of
cellular functions [4]. Exploring the proteome at the subcellular
level is therefore both a practical approach and a functional
necessity: proper interpretation of proteome dynamics requires
detailed information about the compartmentation of the protein
machinery.
In plants and other organisms, the functions of the nucleus are
crucial for cell proliferation and the regulation of gene expression
during development and/or in response to biotic/abiotic stresses
[5]. Knowing nuclear proteome dynamics is essential to increase
our understanding of how both environmental and cytoplasmic
signals are sensed and translated into molecular responses, mainly
through the proteins that guide and control gene expression.
Nuclear proteomics is an advantageous approach for investigating
the mechanisms underlying plant responses to abiotic stresses,
including protein–protein interactions, spliceosome complex, his-
tones, enzyme activities, posttranslational modifications, and
intrinsically disordered proteins [5, 6].
On the other hand, the chloroplast is a major plant cell organelle
that fulfills basic metabolic and biosynthetic functions [7]. It is
indispensable for plant growth and development and for the response
to environmental stresses, and its function is regulated by different
plant hormones. The chloroplast proteome is encoded by both the
chloroplast and the nuclear genomes and plays essential roles in
photosynthesis, metabolism, and other biological processes [8].
Chlorophyll precursors, photosynthetic electron transport, and sugars
have all been shown to be involved in signaling from the chloroplast
to the nucleus, suggesting the presence of multiple signaling
pathways coordinating the two cellular compartments.
Chloroplast function requires the import of both nucleus-
encoded photosynthetic proteins and cytoplasmic factors that reg-
ulate the expression of chloroplast genes. The plastid also plays a
role in nuclear gene expression, with signals that originate in the
chloroplast acting to regulate transcription of nucleus-encoded
photosynthetic genes, a process called retrograde signaling. In the
last 10 years, many studies have revealed the nature of nucleus-
derived molecules that affect chloroplast gene expression at all
levels [9]. Although it has been known for many years that the
expression of a subset of nuclear genes, whose products are
involved in photosynthesis, depends on the presence in the cell of

functional plastids, little progress has been made in elucidating the
signaling molecules or mechanisms involved in retrograde signal-
ing. However, several recent discoveries have made inroads into this
complex mechanism and have begun to shed light on the black box
of signaling from the chloroplast to the nucleus [9–11]. For example,
it has recently been discovered that plants experiencing stress or
damage from various sources use chloroplast-to-nucleus communication
to regulate gene expression and help them cope.
Subcellular proteomics stands on the shoulders of decades of
biochemical research that has developed methods for the separation of
subcellular compartments. Numerous laboratories have worked over the
years to improve separation techniques, progressively reducing
contamination in isolation methods [4]. Such subcellular fractionation
protocols typically rely on density-gradient centrifugation and have
enabled the enrichment of crude microsomes, the cytosol, the
plasmalemma, nuclei, and chloroplasts.
In this context, this chapter describes the experimental steps
involved in the enrichment of nuclei and chloroplasts from conifer
needles, a tissue that is especially complex in biochemical terms, in
order to analyze their proteomes and the cross talk between both
organelles. An overview of the workflow for the purification of the
nuclei and chloroplast subproteomes is presented in Fig. 1.
2 Materials
All solutions must be prepared using ultrapure water (prepared by
purifying deionized water to attain a resistivity of 18 MΩ·cm at
25 °C) and analytical grade reagents.
2.1 Nuclei Isolation
1. Plant material
Approximately 1 g of needles (see Note 1).
2. Reagents and Solutions
All buffers must be made fresh and kept on ice (4 °C).
– Organelle Extraction Buffer (OEB): 0.44 M sucrose,
10 mM Tris–HCl pH 8.0, 5 mM β-mercaptoethanol,
0.015 mM PMSF.
– Plastid Disruption Buffer (PDB): 0.25 M sucrose, 10 mM
Tris–HCl pH 8.0, 10 mM MgCl2, 1% (v/v) Triton X-100,
5 mM β-mercaptoethanol, 0.015 mM PMSF.
– Washing Buffer (WB): 0.25 M sucrose, 10 mM Tris–HCl
pH 8.0, 10 mM MgCl2, 5 mM β-mercaptoethanol,
0.015 mM PMSF.
– Pellet Suspension Buffer (PSB): WB:ddH2O (2:1, v/v).
– Discontinuous Sucrose Gradient (DSG) (see Note 2):
Fig. 1 Overview of the workflow of nuclei and chloroplast subproteome purification. [Flowchart.
Nucleus branch: homogenization of 1 g of tissue (mortar and liquid N2); plastid isolation (Organelle
Extraction Buffer, filtration, centrifugation); organelle cleaning (Plastid Disruption Buffer,
centrifugation, Washing Buffer, centrifugation); purification by discontinuous gradient (Pellet
Suspension Buffer, sucrose gradient). Chloroplast branch: homogenization of 1 g of tissue
(rotor–stator); plastid isolation (Chloroplast Isolation Buffer, filtration, differential
centrifugation); organelle cleaning (Chloroplast Isolation Buffer, differential centrifugation);
purification by discontinuous gradient (Percoll-sucrose gradient). Common steps: protein extraction
(Protein Extraction Buffer and SDS; incubate by shaking, 15 min, RT; Buffer Z and phenol;
centrifugation; Protein Precipitation Buffer; incubate overnight, −20 °C); protein purification
(centrifugation, washes with acetone, air dry); protein resuspension (Protein Solubilization
Solution); protein assessment; LC-MS and bioinformatic analysis.]
Solution a. 0.32 M sucrose, 3 mM CaCl2, 2 mM Mg(C2H3O2)2,
0.1 mM EDTA, 10 mM Tris–HCl pH 8.0, 1 mM DTT,
0.5% (v/v) NP-40.
Solution b. 2 M sucrose, 5 mM Mg(C2H3O2)2, 0.1 mM
EDTA, 10 mM Tris–HCl pH 8.0, 1 mM DTT.
Solution c. 3 M sucrose, 5 mM Mg(C2H3O2)2, 0.1 mM
EDTA, 10 mM Tris–HCl pH 8.0, 1 mM DTT.
– Liquid Nitrogen.
– Ultrapure Water.
3. Equipment.
– Mortar and pestle.
– Cheesecloth or miracloth (filters).
– Centrifuge and disposable centrifuge tubes (50 and 15 mL).
– Vortex/microtube mixer.
– Micropipettes.
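The buffer recipes above are given as final concentrations, so the amounts to weigh depend on the batch volume. The following is a minimal sketch of that arithmetic, not part of the original protocol. The molar masses are standard reference values supplied here as assumptions, Tris–HCl is assumed to be prepared from Tris base, and the 100 mL batch size is hypothetical.

```python
# Molar masses in g/mol. These are standard reference values supplied for this
# sketch; the protocol does not list them, so check the reagent label.
MOLAR_MASS = {"sucrose": 342.30, "Tris base": 121.14}


def grams_needed(molarity, volume_ml, molar_mass):
    """Mass (g) of solute for `volume_ml` of solution at `molarity` (mol/L)."""
    return molarity * (volume_ml / 1000.0) * molar_mass


def diluted(concentration, parts_stock, parts_diluent):
    """Concentration after mixing stock and diluent in the given volume ratio."""
    return concentration * parts_stock / (parts_stock + parts_diluent)


if __name__ == "__main__":
    volume_ml = 100  # hypothetical batch size
    sucrose_g = grams_needed(0.44, volume_ml, MOLAR_MASS["sucrose"])
    tris_g = grams_needed(0.010, volume_ml, MOLAR_MASS["Tris base"])
    print(f"OEB sucrose:   {sucrose_g:.2f} g per {volume_ml} mL")
    print(f"OEB Tris base: {tris_g:.3f} g per {volume_ml} mL")
    # PSB is WB:ddH2O 2:1 (v/v), so every WB component ends up at 2/3 strength.
    print(f"PSB sucrose: {diluted(0.25, 2, 1):.3f} M")
    print(f"PSB MgCl2:   {diluted(10, 2, 1):.2f} mM")
```

The same two helpers apply to any other buffer in this section once the relevant molar mass is added to the table.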
2.2 Chloroplast Isolation
1. Plant material
Approximately 1 g of needles (see Note 1).
2. Reagents and Solutions
All buffers must be made fresh just before performing
chloroplast isolation and kept on ice (4 °C).
– Chloroplast Isolation Buffer (CIB): 0.35 mM sorbitol,
50 mM HEPES-KOH pH 7.4, 5 mM EDTA, 5 mM MgCl2,
15 mM β-mercaptoethanol, 0.5 mM PMSF, 1% (w/v) BSA
(see Note 3).
– Discontinuous Percoll-Sucrose Gradient (DPSG) (see Note
4):
Solution a. 9 vol 3 M sucrose, 5 mM Mg(C2H3O2)2,
0.1 mM EDTA, 10 mM Tris–HCl pH 8.0, 1 mM
DTT, 1 vol CIB.
Solution b. Percoll 70% (v/v) diluted in CIB.
Solution c. CIB.
– Liquid nitrogen.
– Ultrapure water.
3. Equipment.
– Cheesecloth or miracloth (filters).
– Centrifuge and disposable centrifuge tubes (50 and 15 mL).
– Rotor–stator homogenizer.
– Pipettes.
2.3 Protein Extraction
1. Plant material
Nuclei or chloroplast purified pellets.
2. Reagents and Solutions
All buffers must be made fresh just before performing
protein extraction and kept on ice (4 °C).
– Protein Extraction Buffer (PEB): 100 mM Tris–HCl
pH 8.0, 5% SDS (w/v), 10% glycerol (v/v), (2 mM
PMSF), 10 mM DTT, 1.2% (v/v) plant protease inhibitor
cocktail (Sigma P9599) (see Note 5).
– Buffer Z (BZ): 1.5 M sucrose, 10 mM DTT, 1% (v/v) plant
protease inhibitor cocktail (Sigma, P9599).
– Protein Precipitation Solution (PPS): 0.1 M ammonium
acetate in methanol.
– Protein Solubilization Solution (PSS): 1.5% (w/v) SDS,
8 M urea/6 M urea, 2 M thiourea.
– SDS: 20% (w/v).
– Acetone.
– Methanol.
– Phenol.
3. Equipment.
– Microcentrifuge and disposable microcentrifuge tubes
(1.5 mL).
– Vortex/microtube mixer.
– Ultrasound bath.
– Pipettes.
3 Methods
3.1 Nuclei Isolation
Unless otherwise specified, all steps of nuclei isolation must be
performed at 4 °C.
1. Cell lysis.
– Collect pine needles and keep them at −80 °C or in liquid
nitrogen until use.
– Grind 1 g of plant material to a fine powder using a mortar
and pestle in liquid nitrogen. Immediately transfer the powder
to a 50 mL tube with 10 mL of Organelle Extraction
Buffer (OEB) and mix gently by inversion.
– Incubate on ice for 30 min and mix carefully by inversion
every 5 min to avoid the formation of clumps.
2. Nuclei isolation.
– Filter the solution through three layers of cheesecloth or
miracloth previously soaked with OEB.
– Centrifuge the filtered samples for 15 min at 3000 × g in a
swinging rotor at 4 °C and discard the supernatant.
– Resuspend the pellet in 5 mL of Plastid Disruption
Buffer (PDB) by gentle pipetting and incubate on ice for
10 min (during this period mix by inversion every 2 min).
– Centrifuge for 10 min at 3000 × g and 4 °C and discard the
supernatant.
– Repeat the PDB wash until whitish pellets are obtained (see Note
6).
– Wash the pellets with 8 mL of Washing Buffer (WB).
– Centrifuge at 3000 × g for 10 min and remove the supernatant.
3. Nuclei fraction purification.
– Add 3 mL of previously cooled Discontinuous Sucrose Gradient
(DSG) solution c to a 12 mL tube, then overlay 3 mL each of
solutions b and a, in that order, and keep the gradient on
ice. At this point two sharp interfaces should be observed in
the tube.
– Resuspend the pellets carefully in 400 μL of Pellet
Suspension Buffer (PSB), load them onto the gradient, and
centrifuge for 12 min at 3000 × g. Intact nuclei are found at the
interface between DSG solutions b and c.
– Harvest the nuclei fraction and clean it with Pellet Suspension
Buffer (PSB) in a 1.5 mL disposable tube, centrifuge at
3000 × g, and discard the supernatant.
∗Pause point: At this point nuclei can be stored at −20 °C.
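All centrifugation steps above are specified as relative centrifugal force (× g), whereas many centrifuges are set in rpm. The sketch below uses the standard relation RCF = 1.118 × 10⁻⁵ · r · rpm² (r in cm) to convert between the two. It is an illustration rather than part of the protocol, and the 15 cm rotor radius is a hypothetical value that must be replaced by the radius of the rotor actually used.

```python
import math

# Standard relation between relative centrifugal force and rotor speed:
#   RCF = 1.118e-5 * radius_cm * rpm**2
RCF_CONSTANT = 1.118e-5


def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g) at a given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf, radius_cm):
    """Rotor speed (rpm) needed to reach a given RCF at a given rotor radius."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


if __name__ == "__main__":
    radius_cm = 15.0  # hypothetical rotor radius; read it from the rotor manual
    for rcf in (3000,):  # force used throughout Subheading 3.1
        print(f"{rcf} x g at r = {radius_cm} cm: {rpm_from_rcf(rcf, radius_cm):.0f} rpm")
```

Because the force scales with the radius, the same rpm setting gives a different force in a different rotor, which is why the protocol reports × g rather than rpm.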
3.2 Chloroplast Isolation
1. Cell lysis.
– Collect pine needles just before starting chloroplast isolation
and keep them on ice.
– Cut the needles into 2–3 mm pieces and immediately homogenize
them in 12 mL of precooled Chloroplast Isolation
Buffer (CIB) using a rotor–stator homogenizer at
6500 min⁻¹, three times for 20 s each.
– Rinse the rotor–stator system in a fresh tube with 8 mL of CIB.
Combine this CIB with the homogenized needles, mix, and
filter through four layers of cheesecloth/Miracloth.
2. Chloroplast isolation.
– Centrifuge the filtered solution for 3 min at 200 × g and 4 °C in a
swinging rotor.
– Transfer the supernatant to a new tube and centrifuge for 20 min at
3000 × g and 4 °C.
– Discard the supernatant and wash the crude chloroplast pellet
with 10 mL of CIB.
– Repeat the centrifugation step and suspend the cleaned
pellet in 3 mL of CIB.
3. Chloroplast fraction purification.
– In a 15 mL tube, add 3 mL of Discontinuous Percoll-Sucrose
Gradient (DPSG) solution a and then overlay 3 mL of DPSG
solution b. A sharp interface between the two layers should be
observed.
– Resuspend the chloroplast pellet (Subheading 3.2, step 2) in
3 mL of CIB and carefully layer it onto the discontinuous
gradient. Centrifuge for 30 min at 3300 × g and 4 °C in a swinging
rotor with smooth acceleration–deceleration (see Note 7).
Intact chloroplasts are located in the lower phase (see Note
8). It is recommended to check the composition of each
layer by microscopy.
– Transfer the lower dark green phase of the gradient to a
12 mL tube, fill the tube with CIB, mix gently by
inversion until a homogeneous color is obtained, and
centrifuge at 3000 × g for 10 min. Discard the supernatant.
∗Pause point: The pellet obtained, consisting of intact chloroplasts,
can be stored at −80 °C.
3.3 Protein Extraction
– Resuspend nuclei or chloroplast pellets in 300 μL of Protein
Extraction Buffer (PEB) and sonicate them for 15 s at 60%
amplitude (Hielscher UP200S). Then vortex at
maximum speed for 15 min at room temperature. Add 100 μL
of 20% SDS to the sample tube. Incubate for 2–5 min at 95 °C,
vortexing in between.
– Add 300 μL of Buffer Z (BZ) and 300 μL of phenol. Mix
vigorously, and centrifuge for 5 min at 17,000 × g and room
temperature.
– After centrifugation, save the phenolic (upper) phase and reextract
the lower phase by adding 300 μL of phenol.
– Centrifuge again for 5 min at 17,000 × g and merge both
phenolic phases.
– Clean the pooled phenolic phases with BZ in the same way and
keep only the upper phase.
– Add two volumes of Protein Precipitation Solution (PPS) and incubate
overnight at −20 °C. White flakes of protein may be seen
immediately.
– Centrifuge the tubes and wash the protein pellet with acetone
twice.
– Dry the pellets at room temperature and dissolve them in the
minimum amount of Protein Solubilization Solution (PSS)
(30–40 μL or more) (see Note 9).
– Protein content can be quantified by BCA assay [12], and the
enrichment in nuclear or chloroplast proteins can be assessed by
1-DE SDS-PAGE, comparing the total protein fraction with the
nuclei/chloroplast fraction.
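The first extraction step adds 100 μL of 20% SDS to a sample already suspended in 300 μL of PEB, which itself contains 5% SDS. A short volume-weighted calculation gives the resulting SDS concentration. This is only an illustrative sketch: it assumes the pellet contributes negligible volume, and the function name is hypothetical.

```python
def mixed_concentration(components):
    """Volume-weighted concentration of a mixture.

    `components` is a list of (volume, concentration) pairs; volumes must share
    one unit and concentrations another (here microliters and % w/v).
    """
    total_volume = sum(volume for volume, _ in components)
    total_solute = sum(volume * conc for volume, conc in components)
    return total_solute / total_volume


if __name__ == "__main__":
    # 300 uL of PEB (5% SDS) plus 100 uL of 20% SDS; pellet volume is ignored.
    final_sds = mixed_concentration([(300, 5.0), (100, 20.0)])
    print(f"Final SDS concentration: {final_sds:.2f}% (w/v)")
```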
4 Expected Results
The suitability of an untargeted proteomics protocol depends
heavily on two main factors: the protein yield obtained, understood
as the amount of purified protein per weight of starting material,
and the diversity of the proteins obtained.
The protocols described in this chapter proved to be material-efficient
and allowed for the identification of a wide variety of subcellular
proteins, as summarized in Table 1.
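Protein yield as defined above (purified protein per weight of starting material) can be computed from the measured concentration, the resuspension volume, and the tissue mass. The sketch below shows that calculation and compares the result with the yields reported in Table 1. The function names and the example measurement (2.0 μg/μL in 40 μL from 1 g of needles) are hypothetical.

```python
# Reference yields from Table 1 of this chapter (ug protein per g of needles).
EXPECTED_YIELD_UG_PER_G = {"nucleus": 100.0, "chloroplast": 700.0}


def protein_yield(conc_ug_per_ul, volume_ul, tissue_g):
    """Protein yield in ug per g of starting tissue."""
    return conc_ug_per_ul * volume_ul / tissue_g


def fraction_of_expected(yield_ug_per_g, organelle):
    """Observed yield as a fraction of the Table 1 value for that organelle."""
    return yield_ug_per_g / EXPECTED_YIELD_UG_PER_G[organelle]


if __name__ == "__main__":
    # Hypothetical measurement: 2.0 ug/uL in 40 uL of PSS from 1 g of needles.
    observed = protein_yield(2.0, 40, 1.0)
    share = fraction_of_expected(observed, "nucleus")
    print(f"Yield: {observed:.0f} ug/g ({share:.0%} of the Table 1 value)")
```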
5 Notes
1. Starting plant material can be fresh or frozen. For the tissue
homogenization step, frozen material should be disrupted in
liquid nitrogen with a mortar and pestle, and fresh material
with a rotor–stator homogenizer. Both approaches are valid, but
fresh material is preferred when possible, as it leads to
better purification and hence a higher protein yield.
2. NP-40 and DTT are added just prior to use.
3. β-mercaptoethanol, PMSF, and BSA must be freshly added.
4. DTT is added just prior to use.
5. Add DTT, PMSF, and protease inhibitor cocktail just
before use.
6. If pellets are still greenish after three washes, it is recommended
to increase the Triton X-100 concentration of PDB up to 3%.
Table 1
Expected results of protein extraction yield and identifications

                        Nucleus proteome         Chloroplast proteome    Total proteome
Protein yield           100 μg/g frozen needles  700 μg/g fresh needles  –
Protein identification  1057                     1342                    3652
7. It is essential to use an automatic rate controller, if available, to
avoid mixing of the gradient layers during deceleration; if this
feature is not available, it is recommended to disconnect the
brake or to decrease the rotor frequency manually.
8. The chloroplast density depends heavily on the amount of
starch they contain; if starch content varies within the
experimental system, this has to be taken into consideration
for intact chloroplast recovery.
9. Freeze and thaw the samples to help the protein denature
and dissolve.
Acknowledgments
Our research group is generously funded by the Spanish Ministry of
Science, Innovation, and Universities (AGL2016-77633-P and
AGL2017-83988-R). M.M. and L.L. were also supported by the
Spanish Ministry of Science, Innovation, and Universities through the
Ramón y Cajal (RYC-2014-14981) and Spanish PhD training
(BES-2017-082092 to L.L.) programs. L.G. was supported by the
Government of Principado de Asturias (Spain) through the Severo
Ochoa program (BP19-146).
References
1. Hebert AS, Richards AL, Bailey DJ et al (2014) The one hour yeast proteome. Mol Cell
Proteomics 13:339–347
2. Kislinger T, Gramolini AO, MacLennan et al (2005) Multidimensional protein identification
technology (MudPIT): technical overview of a profiling method optimized for the comprehensive
proteomic investigation of normal and diseased heart tissue. J Am Soc Mass Spectrom
16:1207–1220
3. Pascual J, Alegre S, Nagler et al (2016) The variations in the nuclear proteome reveal new
transcription factors and mechanisms involved in UV stress response in Pinus radiata. J Proteome
143:390–400
4. Millar AH, Taylor NL (2014) Subcellular proteomics-where cell biology meets protein
chemistry. Front Plant Sci 5:55
5. Goto C, Hashizume S, Fukao Y et al (2019) Comprehensive nuclear proteome of Arabidopsis
obtained by sequential extraction. Nucleus 10:81–92
6. Yin X, Komatsu S (2016) Plant nuclear proteomics for unraveling physiological function.
New Biotechnol 33:644–654
7. Bouchnak I, Brugiere S, Moyet L et al (2019) Unraveling hidden components of the chloroplast
envelope proteome: opportunities and limits of better MS sensitivity. Mol Cell Proteomics
18:1285–1306
8. Wu W, Yan Y (2018) Chloroplast proteome analysis of Nicotiana tabacum overexpressing
TERF1 under drought stress condition. Bot Stud 59:26
9. Brown EC, Somanchi A, Mayfield SP (2001) Interorganellar crosstalk: new perspectives on
signaling from the chloroplast to the nucleus. Genome Biol 2:REVIEWS1021
10. Colombo M, Tadini L, Peracchio C et al (2016) GUN1, a jack-of-all-trades in chloroplast
protein homeostasis and signaling. Front Plant Sci 7:1427
11. Zhao X, Huang J, Chory J (2019) GUN1 interacts with MORF2 to regulate plastid RNA editing
during retrograde signaling. Proc Natl Acad Sci U S A 116:10162–10167
12. Smith PK, Krohn RI, Hermanson GT et al (1985) Measurement of protein using bicinchoninic
acid. Anal Biochem 150:76–85
Chapter 6
Apoplastic Fluid Preparation from Arabidopsis thaliana Leaves Upon Interaction with a
Nonadapted Powdery Mildew Pathogen
Ryohei Thomas Nakano, Nobuaki Ishihama, Yiming Wang, Junpei Takagi,
Tomohiro Uemura, Paul Schulze-Lefert, and Hirofumi Nakagami
Abstract
Proteins in the extracellular space (apoplast) play a crucial role at the interface between plant cells and their
proximal environment. Consequently, it is not surprising that plants actively control the apoplastic pro-
teomic profile in response to biotic and abiotic cues. Comparative quantitative proteomics of plant
apoplastic fluids is therefore of general interest in plant physiology. We here describe an efficient method
to isolate apoplastic fluids from Arabidopsis thaliana leaves inoculated with a nonadapted powdery mildew
pathogen.
Key words Apoplast, Protein secretion, Membrane trafficking, Arabidopsis thaliana, Cell wall
1 Introduction
The “extracellular space” of plant cells constitutes a compartment

external to the plasma membrane, including cell walls, the intercel-
lular space, and the apoplastic fluid. Given its interface location
between plant cells and their proximate environment, molecules
in this compartment are expected to play fundamental roles in
intrinsic developmental processes and adaptation to fluctuating
environmental conditions. It is therefore not surprising that the
extracellular compartment of plants contains a multitude of partly
diffusible small molecules and proteins as well as carbohydrate
polymers, the latter of which provide structure to plant cells. All
of these molecules must be transported from the cell interior to the
extracellular space. The transport mechanism itself is likely subject
to controlled activity to fulfill changes in supply and demand of the
extracellular compartment to developmental needs and for envi-
ronmental adaptation. The primary pathway to deliver proteins to
the apoplast is mediated by a vesicular protein transport system, a

process referred to as membrane trafficking. Earlier work has shown
that in leaves the secreted protein profile (“secretome”) is dependent
on environmental conditions. For instance, biotic stress stimuli
such as pathogen inoculation are known to trigger the secretion
of a specific set of inducible defense-related proteins, including
PATHOGENESIS RELATED-1 (PR-1) and PR-2 [1–3]. How-
ever, the mechanism(s) enabling plants to shift their secretome
profiles both locally and systemically are still incompletely
understood.
In order to comprehensively analyze the proteomic profile of
the secretome, several protocols have been described [1]. One
method takes advantage of suspension-cultured plant cells to col-
lect the culture supernatant as secretome fraction [4–7]. This
approach is technically straightforward, minimizes contamination
from damaged cells, and permits the collection of large volumes of
apoplastic fluid under defined nutritional stress conditions and/or
upon phytohormone treatment or application of ligands (elicitors)
that trigger innate immune responses. Another nondestructive
approach is to collect proteins in hydroponic culture media in
which plant seedlings are growing. This permits the identification
of proteins that are actively secreted in vivo from the whole organ-
ism consisting of diverse tissue types and organs [8]. These proteins
will comprise, besides apoplastically secreted polypeptides normally
found in the intercellular space in the interior of a plant organ, also
molecules that are exported to the plant surface, for example,
polypeptides present in root exudates. However, these approaches
are often not applicable for in-depth studies of interactions between
plants and microorganisms that are adapted to colonize above-
ground plant organs. On the other hand, vacuum infiltration cen-
trifugation (VIC) methods have been widely used for many plant
species to isolate apoplastic fluid from plants growing in natural soil
substrate [2, 7, 9–18]. The VIC method is essentially based on
collecting vacuum-infiltrated buffer solutions from plant tissues by
centrifugation, which is assumed to consist mainly of plant apoplas-
tic fluid. Experimental details can be modified according to a given
particular research question, for example by altering the infiltration
buffer’s contents and/or using different sample preparation proce-
dures before buffer infiltration. We here describe a method to
isolate leaf apoplastic proteins from Arabidopsis thaliana leaves
inoculated with a non-adapted powdery mildew pathogen for
subsequent quantitative proteomic analysis. The method described
here can be applied to investigate interactions of plant leaves with
microbes that have different lifestyles, for example, pathogens,
mutualists, or commensals.
2 Materials
Prepare all solutions using ultrapure water and analytical grade
reagents. Prepare and store all reagents at room temperature, unless
indicated otherwise.
2.1 Plant Growth and Pathogen Inoculation
1. 70% ethanol in Milli-Q water.
2. 99.9% ethanol (VWR Chemicals).
3. 1.5- or 2-mL microtubes.
4. Greenhouse soil (see Note 1).
5. Pots (9 × 9 × 9.5 cm³).
6. A. thaliana seeds.
7. Hordeum vulgare (cultivar Golden Promise) seedlings.
8. Blumeria graminis f. sp. hordei (Bgh) isolate K1 [19].
9. Controlled environment growth cabinet for A. thaliana
(Day: 21 °C, 10 h, 60% humidity/Night: 21 °C, 14 h,
60% humidity).
10. Controlled environment growth cabinet for H. vulgare
(Day: 21 °C, 16 h, 55% humidity/Night: 21 °C, 8 h,
55% humidity).
2.2 Apoplastic Fluid Preparation
1. Infiltration Buffer: 5 mM sodium acetate, 0.2 M calcium chloride,
pH 4.3 (see Note 1). Store at 4 °C.
2. Protease inhibitor cocktail, EDTA-free (Roche).
3. Flat-bottom glass beakers (300 mL and 200 mL).
4. A vacuum pump and a vacuum desiccator (see Note 2).
5. Paper towel.
6. Blunt-end needleless syringe (20 mL).
7. 50-mL low-polymer conical tubes (see Note 3).
8. 2-mL low-polymer microtubes (see Note 3).
2.3 Protein Purification by Chloroform–Methanol Precipitation
1. Ultrafree-CL Centrifugal Filter 0.22 μm pore size (Merck
Millipore).
2. 50-mL conical centrifuge tubes, compatible with chloroform
(see Note 4).
3. 100% methanol (VWR Chemicals).
4. 99% chloroform (Merck Millipore).
5. Sterile Milli-Q water.
6. Vortex.
7. SpeedVac vacuum concentrator.
3 Methods
3.1 Plant Growth and Pathogen Inoculation
1. To sterilize the seed surface, mix a scoop of A. thaliana seeds
(200–500 seeds) with 1 mL of 70% ethanol in a 1.5- or 2-mL
microtube and shake or rotate for 10 min. Discard the ethanol
and briefly rinse the seeds with 99.9% ethanol. Leave the tube on a
bench with its lid open for 30–60 min to dry the seeds (see
Note 5).
2. Sow a generous number of seeds (see Note 5) in 9 × 9 × 9.5 cm³
pots filled with greenhouse soil. Start plant cultivation in the
growth chamber for A. thaliana (see Note 6).
3. Water the plants every 2–3 days. Avoid excessive watering
to minimize algae growth on the soil surface.
4. After 2 weeks, remove the majority of seedlings so that each pot
harbors 4–5 plants of similar size. In case of plant dwarfism
(e.g., due to particular gene defects), this number can be
increased up to 10–25 plants/pot (see Note 7).
5. Continue plant cultivation under the same conditions for
another 2–3 weeks.
6. One week before pathogen inoculation, bulk up fresh conidiospores
of Bgh isolate K1 on 7-day-old susceptible H. vulgare
seedlings and incubate in a growth chamber for barley
cultivation.
7. Inoculate Bgh conidiospores on A. thaliana plants by tapping
the leaf blades of infected barley plants (see Note 8). A subset of
plants is used as non-inoculated samples (0 hpi; hours post
inoculation) and proceeds directly to preparation of apoplastic fluid
(see Note 8).
8. Incubate inoculated A. thaliana plants under the same conditions
for a desired time period.
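Step 2 refers to Note 6, which asks for pots of different genotypes to be arranged in a randomized design (Fig. 2). One way to generate such a layout is sketched below. It is an assumption-laden illustration rather than the authors' procedure: each row of the tray is treated as a block that holds every genotype once in shuffled order, whereas Fig. 2 shows a cyclic arrangement. The function name, genotype labels, and seed are hypothetical.

```python
import random


def randomized_layout(genotypes, rows, seed=None):
    """Return `rows` rows of pots, each containing every genotype once.

    The order within each row is shuffled independently (a randomized
    complete block layout). Passing `seed` makes the layout reproducible
    so it can be recorded with the experiment.
    """
    rng = random.Random(seed)
    layout = []
    for _ in range(rows):
        row = list(genotypes)
        rng.shuffle(row)
        layout.append(row)
    return layout


if __name__ == "__main__":
    genotypes = ["A", "B", "C", "D", "E", "F"]  # six genotypes, as in Fig. 2
    for row in randomized_layout(genotypes, rows=4, seed=2020):
        print("  ".join(f"genotype {g}" for g in row))
```

Recording the seed alongside the tray position makes it possible to regenerate the exact layout when the samples are later matched to proteomic data.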
3.2 Apoplastic Fluid Preparation
1. Collect entire shoots by cutting hypocotyls in a 300-mL glass
beaker containing 100 mL of the Infiltration Buffer, freshly
supplemented with two tablets of the Protease Inhibitor Cocktail
(see Notes 9 and 10).
2. Submerge shoots in the infiltration buffer by placing on top an
empty 200-mL glass beaker (Fig. 1a).
3. Vacuum for 10 min and release gently. Vacuum release should
take longer than 10 min (see Note 11).
4. Remove excessive buffer on the leaf surface by blot drying on a
paper towel (Fig. 1b).
5. Introduce plant shoots in a 20-mL blunt-end needleless syringe
after removal of its plunger.
Fig. 1 Apoplast fluid preparation. (a) Buffer infiltration. A. thaliana shoots are submerged into the Infiltration
Buffer in a 300-mL flat-bottom beaker, and a 200-mL flat-bottom beaker is used as a weight during vacuum
infiltration. (b) Shoots are collected from the beaker, carefully flattened, and blot-dried on a paper towel. (c)
Shoots are introduced in a 20-mL blunt-end needleless syringe within a 50-mL conical tube. After centrifugation
(1000 × g, 20 min), the apoplastic fluid will be extracted to the space between the syringe and the
surrounding conical tube
6. Introduce the syringe into a 50-mL conical centrifuge tube
(Fig. 1c).
7. Centrifuge at 1000 × g for 20 min at 4 °C. The apoplastic fluid
will be collected at the bottom of the conical tube as a flow-through
(Fig. 1c). Residual plant material can be stored in a
separate microtube at −80 °C as “total fraction.”
3.3 Protein Purification by Chloroform–Methanol Precipitation
1. Apply obtained apoplastic fluid to a centrifuge tube with a
0.22-μm filter column on ice and centrifuge at 12,000 × g
for 2 min at 4 °C.
2. Transfer the flow-through to a 50-mL conical tube that is
resistant to chloroform (see Note 12) on ice. Estimate the
amount of flow-through using a pipette (see Note 13).
3. Add 4 volumes of 100% methanol on ice and mix well by
vortexing.
4. Add 1 volume of 99% chloroform on ice and mix well by
vortexing. The mixed solution can be stored at −20 °C for a
few hours.
5. Add 3 volumes of sterile Milli-Q water on ice and mix well by
vortexing.
6. Centrifuge at 15,000 × g for 2 min at 4 °C. Proteins will
accumulate at the liquid interface.
7. Remove the top aqueous layer and add 4 volumes of 100%
methanol on ice. Mix well by vortexing.
8. Centrifuge at 15,000 × g for 2 min at 4 °C.
9. Remove the supernatant such that the total amount is less than
2 mL.
10. Mix residual supernatant and precipitates by pipetting and
transfer to a new 2-mL microtube on ice.
12. Carefully reduce the amount of supernatant as much as possible
(see Note 14).
13. Dry the pellet by a vacuum concentrator (e.g., SpeedVac) (see
Note 15).
14. Dissolve the pellet in an optimal buffer for protein digestion or
protein gel electrophoresis for subsequent proteomic analysis
(see Note 16). The pellet can be stored at −80 °C until
subsequent analyses.
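The reagent volumes in steps 3–5 and 7 are given relative to the estimated flow-through volume, so they have to be recalculated for every sample. The sketch below does that and checks whether the mixture fits in the tube. It is not part of the original protocol: it assumes that the "volumes" in step 7 also refer to the original flow-through volume, and it treats 50 mL as the usable tube capacity.

```python
def precipitation_volumes(sample_ml):
    """Reagent volumes (mL) for a given flow-through volume.

    Ratios follow steps 3-5 and 7: 4 vol methanol, 1 vol chloroform,
    3 vol water, then 4 vol methanol after removing the aqueous layer.
    """
    return {
        "methanol": 4 * sample_ml,
        "chloroform": 1 * sample_ml,
        "water": 3 * sample_ml,
        "methanol_wash": 4 * sample_ml,
    }


def peak_volume(sample_ml):
    """Total liquid volume (mL) in the tube just before the first spin."""
    v = precipitation_volumes(sample_ml)
    return sample_ml + v["methanol"] + v["chloroform"] + v["water"]


def max_sample_volume(tube_capacity_ml):
    """Largest flow-through volume whose 9-fold mixture still fits the tube."""
    return tube_capacity_ml / 9


if __name__ == "__main__":
    sample_ml = 5.0  # hypothetical flow-through volume
    for reagent, volume in precipitation_volumes(sample_ml).items():
        print(f"{reagent:14s} {volume:5.1f} mL")
    print(f"peak volume    {peak_volume(sample_ml):5.1f} mL")
    print(f"max sample for a 50-mL tube: {max_sample_volume(50):.2f} mL")
```

Under these assumptions, flow-through volumes above roughly 5.5 mL would need to be split across more than one tube.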
4 Notes
1. Any soil that is suitable for growing A. thaliana and
H. vulgare can be used. We normally use “Mini Tray” soils
(EINHEITS ERDE) without additional fertilization.
2. Any conventional vacuum diaphragm pump and compatible
desiccator that allow the vacuum to be released gently
can be used. A glass vacuum desiccator of ~25-cm diameter is
able to accommodate 5–6 samples at once (see Fig. 1a).
3. To obtain a high signal-to-noise ratio in the mass spectrometer,
it is crucial to avoid polymer contamination derived from
plastics throughout the procedures. Toward this end, we recommend
the use of Eppendorf 50-mL Conical Tubes and
SARSTEDT SafeSeal 2-mL microtubes.
4. To avoid physical damage caused by chloroform on conical
tubes made of polypropylene (PP), an alternative chloroform-tolerant
material such as fluorocarbon polymers (FEP) should
be used. We use Nalgene® Oak Ridge Centrifuge Tube FEP
(Sigma-Aldrich).
5. Although surface-sterilized seeds can be stored for weeks up to
months, we recommend freshly sterilized seeds for every use as
seed vigor and germination rate rapidly decrease during
storage. In most cases stratification is not essential for
germinating A. thaliana seeds but can be applied (e.g., for old
seeds). The number of seeds needed for an experiment depends
on the growth and germination rate of each plant line to be
tested. We typically sow at least five times more seeds than the
necessary number of plants (e.g., 25–30 seeds to have five
plants of similar size).
genotype A genotype B genotype C genotype D genotype E genotype F
genotype F genotype A genotype B genotype C genotype D genotype E
genotype E genotype F genotype A genotype B genotype C genotype D
genotype D genotype E genotype F genotype A genotype B genotype C
Fig. 2 Randomized design of plant cultivation. Pots need to be randomized when multiple genotypes are to be
compared. A schematic example for six genotypes in a single technical replicate (four pots per replicate) is
shown
6. At least four to five technical replicates (i.e., independent pro-
tein samples) must be prepared. When multiple plant geno-
types are included in an experiment, grow all genotypes in the
same tray in a randomized design (Fig. 2). When multiple
treatments are to be compared, the plants of all treatments
within a technical replicate must be grown in the same plant
growth chamber, ideally side-by-side on the same shelf.
7. For one protein sample, we normally use 16 5-week-old plants
growing in four separate pots. This may be scaled up if plants
are expected to be smaller, for instance due to genetic dwarfism
or upon abiotic/biotic stress treatment.
8. Bgh conidiospores can be easily detached from infected barley
plants by tapping the leaf blades directly above A. thaliana
rosettes. Conidiospores will settle immediately on the surface
of target leaves by gravity. A settling tower can be used for a
more efficient and homogeneous conidiospore
inoculation [20].
9. We avoid washing plant leaves before buffer infiltration so that
proteins secreted to the leaf surface are also collected. This
is relevant because Bgh is a leaf epiphyte and attacks exclusively
leaf epidermal cells. When apoplastic fluid within the leaf organ
is the primary target, a thorough washing step can be added
before submerging the leaves in the Infiltration Buffer.
10. Samples to be directly compared within a technical replicate
should be processed jointly at this step and throughout the
following steps.
11. Ensure that all plants are submerged in the buffer. Vacuum
must be released very slowly and gently, otherwise plant cells
will collapse and this will increase contamination with cytoplas-
mic proteins.
12. Buffer composition can be optimized depending on the target
proteins. For instance, calcium chloride is required for extract-
ing cell wall-associated proteins but can be omitted if these
proteins are not of interest. Alternative pH values can also be
used, but it should be noted that the plant extracellular space is
usually around pH 5.0 or below [21]. Infiltration with sodium
acetate buffer (pH 4.3) was reported to be most effective at
avoiding contamination with cytoplasmic proteins [18, 22].
13. The amount of flow-through depends on the number/size of
the plants. From 16 5-week-old A. thaliana shoots, we typi-
cally obtain 0.5–2.0 mL of apoplastic fluids.
14. The protein precipitate after methanol–chloroform extraction is
usually very fragile, and disruption of the precipitate decreases
the final protein yield. To minimize the yield loss, we do not
remove the whole amount of supernatant but rather reduce
its amount by pipetting and use a SpeedVac concentrator to
remove the residual liquid completely.
15. The time period required for complete drying of protein pellets
depends mainly on the amount of residual liquid.
16. After untargeted proteomic analysis, we typically quantify
thousands of protein groups, and 25–30% of them contain
predicted N-terminal signal peptides [3]. The rest of the
proteins may be delivered into the apoplast via noncanonical
secretory pathway(s) or represent contaminations, for example, due
to physical cell disruption. Whether proteins lacking an
N-terminal signal peptide should be considered as true signals
or removed in silico prior to a deeper computational analysis
depends on the specific hypotheses to be tested.
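The randomized pot arrangement recommended in Note 6 can be generated programmatically. The sketch below is a hedged illustration and not the authors' procedure: Fig. 2 shows a cyclically shifted arrangement, whereas this code shuffles each row independently so that every genotype still appears once per row. The function name `randomized_layout` and the `seed` parameter are hypothetical.

```python
import random


def randomized_layout(genotypes, n_rows, seed=None):
    """Return n_rows rows, each a random ordering of all genotypes.

    Every genotype occurs exactly once per row, so each row can be
    treated as one block of pots on the tray.
    """
    rng = random.Random(seed)  # a fixed seed makes the layout reproducible
    layout = []
    for _ in range(n_rows):
        row = list(genotypes)
        rng.shuffle(row)
        layout.append(row)
    return layout


if __name__ == "__main__":
    # Six genotypes and four pots per genotype, as in the Fig. 2 example.
    for row in randomized_layout(["A", "B", "C", "D", "E", "F"], 4, seed=1):
        print(" ".join("genotype " + g for g in row))
```

Recording the seed together with the experiment metadata allows the layout to be regenerated later.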
Acknowledgments
This work was supported by the Ministry of Education, Culture,
Sports, Science, and Technology of Japan Grants-in-Aid for
Scientific Research to T. Uemura (No. 15H04627), by the Asahi Glass
Foundation to T. Uemura, by the Max Planck Society to P.S.-L.
and H. N., and by the “Cluster of Excellence on Plant Sciences
(CEPLAS)” program funded by the Deutsche Forschungsge-
meinschaft (DFG) to P.S.-L.
References
1. Delaunois B, Jeandet P, Clément C et al (2014) Uncovering plant-pathogen crosstalk through apoplastic proteomic studies. Front Plant Sci 5:249
2. Uemura T, Nakano RT, Takagi J et al (2019) A Golgi-released subpopulation of the trans-Golgi network mediates protein secretion in Arabidopsis. Plant Physiol 179:519–532
3. Ruhe J, Agler MT, Placzek A et al (2016) Obligate biotroph pathogens of the genus Albugo are better adapted to active host defense compared to niche competitors. Front Plant Sci 7:820
4. Kusumawati L, Imin N, Djordjevic MA (2008) Characterization of the secretome of suspension cultures of Medicago species reveals proteins important for defense and development. J Proteome Res 7:4508–4520
5. Oh IS, Park AR, Bae MS et al (2005) Secretome analysis reveals an Arabidopsis lipase involved in defense against Alternaria brassicicola. Plant Cell 17:2832
6. Okushima Y, Koizumi N, Kusano T, Sano H (2000) Secreted proteins of tobacco cultured BY2 cells: identification of a new member of pathogenesis-related proteins. Plant Mol Biol 42:479–488
7. Cho WK, Chen XY, Chu H et al (2009) Proteomic analysis of the secretome of rice calli. Physiol Plant 135:331–341
8. Waghmare S, Lileikyte E, Karnik R et al (2018) SNAREs SYP121 and SYP122 mediate the secretion of distinct cargo subsets. Plant Physiol 178:1679–1688
9. Lohaus G, Pennewiss K, Sattelmacher B et al (2001) Is the infiltration-centrifugation technique appropriate for the isolation of apoplastic fluid? A critical evaluation with different plant species. Physiol Plant 111:457–465
10. Delaunois B, Colby T, Belloy N et al (2013) Large-scale proteomic analysis of the grapevine leaf apoplastic fluid reveals mainly stress-related proteins and cell wall modifying enzymes. BMC Plant Biol 1:24
11. Konozy EHE, Rogniaux H, Causse M, Faurobert M (2013) Proteomic analysis of tomato (Solanum lycopersicum) secretome. J Plant Res 126:251–266
12. Wen F, VanEtten HD, Tsaprailis G, Hawes MC (2007) Extracellular proteins in pea root tip and border cell exudates. Plant Physiol 143:773
13. Delannoy M, Alves G, Vertommen D et al (2008) Identification of peptidases in Nicotiana tabacum leaf intercellular fluid. Proteomics 8:2285–2298
14. Soares NC, Francisco R, Ricardo CP, Jackson PA (2007) Proteomics of ionically bound and soluble extracellular proteins in Medicago truncatula leaves. Proteomics 7:2070–2082
15. Witzel K, Shahzad M, Matros A et al (2011) Comparative evaluation of extraction methods for apoplastic proteins from maize leaves. Plant Methods 7(1):48
16. Agrawal GK, Jwa N-S, Lebrun M-H et al (2010) Plant secretome: unlocking secrets of the secreted proteins. Proteomics 10:799–827
17. Casasoli M, Spadoni S, Lilley KS et al (2008) Identification by 2-D DIGE of apoplastic proteins regulated by oligogalacturonides in Arabidopsis thaliana. Proteomics 8:1042–1054
18. Wang Y, Kim SG, Wu J et al (2014) Differential proteome and secretome analysis during rice-pathogen interaction. Methods Mol Biol 1072:563–572
19. Zhou F, Kurth J, Wei F et al (2001) Cell-autonomous expression of barley Mla1 confers race-specific resistance to the powdery mildew fungus via a Rar1-independent signaling pathway. Plant Cell 13(2):337–350
20. Adam L, Somerville SC (1996) Genetic characterization of five powdery mildew disease resistance loci in Arabidopsis thaliana. Plant J 9(3):341–356. https://doi.org/10.1046/j.1365-313X.1996.09030341.x
21. Barbez E, Dünser K, Gaidora A et al (2017) Auxin steers root cell expansion via apoplastic pH regulation in Arabidopsis thaliana. Proc Natl Acad Sci U S A 114(24):E4884–E4893. https://doi.org/10.1073/pnas.1613499114
22. Sehrawat A, Deswal R (2014) S-nitrosylation analysis in Brassica juncea apoplast highlights the importance of nitric oxide in cold-stress signaling. J Proteome Res 13(5):2599–2619. https://doi.org/10.1021/pr500082u
Chapter 7
Shotgun Proteomics of Plant Plasma Membrane and Microdomain Proteins Using Nano-LC-MS/MS
Daisuke Takahashi, Bin Li, Takato Nakayama, Yukio Kawamura,
and Matsuo Uemura
Abstract
Shotgun proteomics allows for the comprehensive analysis of proteins extracted from plant cells, subcellular
organelles, and membranes. Previously, two-dimensional gel electrophoresis-based proteomics was used for
mass spectrometric analysis of plasma membrane proteins. However, this method is not fully applicable for
highly hydrophobic proteins with multiple transmembrane domains. In order to solve this problem, we
here describe a shotgun proteomics method using nano-LC-MS/MS for proteins in the plasma membrane
and plasma membrane microdomain fractions. The results obtained are easily applicable to label-free
protein semiquantification.
Key words Plasma membrane, Detergent-resistant membrane, Microdomain, Nano-LC-MS/MS,
Shotgun proteomics, Label-free semiquantification, In-gel digestion, In-solution digestion
1 Introduction
Comprehensive protein identification involves solubilization and
preseparation of proteins, digestion of the proteins into peptide
fragments using trypsin, and separation and detection of each peptide
by liquid chromatography–tandem mass spectrometry (LC-MS/MS)
[1–4]. Compared with soluble proteins, sample preparation for
proteomics of cellular membranes, including the plasma membrane
(PM), is difficult because many of the proteins are highly
hydrophobic and reside in a highly hydrophobic lipid
environment [5–7]. Although membrane proteomics has been
performed by two-dimensional gel electrophoresis (2-DE)–based
proteomics [5, 7, 8], PM proteins are particularly difficult to
solubilize and 2-DE–based proteomics requires a large amount of
valuable PM proteins [9]. In addition, microdomains, which are
considered to exist as extremely hydrophobic compartments in the
PM because of the enrichment of specific lipids and proteins, have
attracted the attention of many researchers because of their
involvement in important cellular processes such as signal
transduction, membrane trafficking, and pathogen infection. However,
microdomains, which are often isolated as detergent-resistant
membrane (DRM) fractions, are even more intractable to comprehensive
proteomic study due to their highly hydrophobic characteristics
[10–15]. The hydrophobicity of the membranes has made it challenging
for researchers to obtain a comprehensive view of their proteomic
profiles.
Liquid chromatography and mass spectrometry technologies
have advanced rapidly, providing higher resolution and reliable
results for a huge number of peptides [16, 17]. In particular,
nano-flow reverse-phase liquid chromatography allows for the
separation of proteins without preseparation using 1-DE or 2-DE
[17]. Here, we describe PM and DRM protein preparation methods
that are adapted to nano-LC-MS/MS-based shotgun proteomics
using two sample preparation methods, “in-gel” and
“in-solution” peptide digestion. We applied these methods to the
aerial parts of oat plants as an example, and the methods described
are applicable to other plants such as rye [18], Arabidopsis [19], and
Brachypodium distachyon (data not shown). In the “In-gel
digestion protocol,” solubilized PM and DRM proteins are applied to
a sodium dodecyl sulfate (SDS)–polyacrylamide gel to remove
nonproteinaceous materials and subsequently subjected to tryptic
digestion. In the “In-solution digestion protocol,” we use an
MPEX PTS reagent kit (GL Sciences, Tokyo, Japan), which has
been widely used for solubilization of proteins in mammalian and
bacterial PMs, such as those of HeLa cells and Escherichia coli
cells, respectively [20]. Using these methods, thousands of PM and
DRM proteins have been consistently identified, including highly
hydrophobic proteins with multiple transmembrane domains.
Furthermore, these data can be used for protein quantification.
2 Materials
All solutions are prepared using ultrapure water (prepared by
purifying deionized water to attain a resistivity of 18.2 MΩ cm at
24 °C) and analytical grade reagents. All reagents included in this
chapter are prepared and stored at room temperature (unless
indicated otherwise). Diligently follow all waste disposal regulations
when disposing of waste materials.
2.1 Plasma Membrane Purification Components
Ultrapure water, the Polytron homogenizer, centrifuge, and
ultracentrifuge rotors should be kept and used at 4 °C.
1. Homogenizing medium: 0.5 M sorbitol, 50 mM Mops-KOH,
pH 7.6, 5 mM EGTA (pH 8.0), 5 mM EDTA (pH 8.0), 5%
(w/v) polyvinylpyrrolidone-40 (molecular weight 40,000),
0.5% (w/v) bovine serum albumin (BSA), 2.5 mM
phenylmethanesulfonyl fluoride (PMSF), 4 mM
salicylhydroxamic acid (SHAM), 2.5 mM 1,4-dithiothreitol (DTT).
Store at 4 °C (see Note 1).
2. Microsome (MS)-suspension medium: 10 mM KH2PO4/
K2HPO4 (K-P) buffer (pH 7.8), 0.25 M sucrose. Store at
4 °C (see Note 2).
3. NaCl medium: Add 1.17 g of NaCl to 180 mL MS suspension
medium and stir moderately using a stirring bar. Add MS
suspension medium up to 200 mL with a graduated cylinder.
Store at 4 °C (see Note 3).
4. Plasma membrane (PM)-suspension medium: 10 mM Mops-
KOH (pH 7.3), 2 mM EGTA (pH 8.0), 0.25 M sucrose. Store
at 4 °C (see Note 4).
5. Two-phase partition medium: Weigh 1.45 g of polyethylene
glycol 3350 and 1.45 g dextran in a 40 mL centrifuge tube.
Add 9.3 mL MS suspension medium and 7.3 mL NaCl
medium to the centrifuge tube and mix well by shaking.
Incubate at 4 °C overnight to completely dissolve the polymers (see
Note 5). Prepare three tubes per sample for repeating the
two-phase partition in order to increase the purity of the PM
fraction.
6. Bio-Rad Protein Assay Kit (Bio-Rad Laboratories, CA, USA):
Store at 4 °C.
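The NaCl medium recipe in item 3 is given as a mass per 200 mL. As a hedged illustration, the sketch below scales that recipe to another batch volume and reports the NaCl concentration it implies. The function name `nacl_medium` is hypothetical, the molar mass of NaCl (58.44 g/mol) is a standard value rather than a figure from this chapter, and the resulting concentration of about 100 mM is derived here, not stated by the authors.

```python
NACL_MW = 58.44  # g/mol, molar mass of NaCl (standard value)


def nacl_medium(final_volume_ml, grams_per_200_ml=1.17):
    """Scale the NaCl medium recipe and report the implied molarity.

    Returns (grams of NaCl to weigh, NaCl concentration in mM).
    """
    if final_volume_ml <= 0:
        raise ValueError("volume must be positive")
    nacl_g = grams_per_200_ml * final_volume_ml / 200.0
    # g / (g/mol) = mol; mol / L = M; M * 1000 = mM
    molarity_mM = nacl_g / NACL_MW / (final_volume_ml / 1000.0) * 1000.0
    return nacl_g, molarity_mM


if __name__ == "__main__":
    grams, mM = nacl_medium(200.0)
    print(f"{grams:.2f} g NaCl in 200 mL is about {mM:.0f} mM")
```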
2.2 Detergent-Resistant Membrane Extraction Components
Ultracentrifuge rotors should be prechilled at 4 °C.
1. TED buffer: 50 mM Tris–HCl (pH 7.4), 3 mM EGTA
(pH 8.0), 1 mM DTT. This buffer should be freshly prepared.
2. 10% (w/v) Triton X-100 buffer: Add 1 g of Triton X-100 to
TED buffer and then adjust to 10 mL volume. Shake the
Triton X-100 buffer with a shaker for 3 h to completely
dissolve Triton X-100. This buffer should be freshly prepared.
3. 65%, 48%, 35%, 30% and 5% (w/w) sucrose solutions (in TED
buffer): Weigh 65 g, 24 g, 17.5 g, 15 g, and 2.5 g of sucrose
and dissolve in 35 g, 26 g, 32.5 g, 35 g, and 47.5 g of TED
buffer, respectively. These solutions should be freshly
prepared.
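The sucrose masses in item 3 follow directly from the definition of a weight-per-weight percentage. The short sketch below, with the hypothetical function name `sucrose_recipe`, reproduces the listed amounts: 100 g total for the 65% solution and 50 g total for the others.

```python
def sucrose_recipe(percent_ww, total_g):
    """Return (grams of sucrose, grams of TED buffer) for a w/w solution."""
    if not 0 <= percent_ww <= 100:
        raise ValueError("percentage must be between 0 and 100")
    sucrose_g = total_g * percent_ww / 100.0
    return sucrose_g, total_g - sucrose_g


if __name__ == "__main__":
    # (percentage, total mass in g) as used in item 3
    for percent, total in [(65, 100), (48, 50), (35, 50), (30, 50), (5, 50)]:
        sucrose, buffer = sucrose_recipe(percent, total)
        print(f"{percent}% (w/w): {sucrose} g sucrose + {buffer} g TED buffer")
```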
2.3 In-Gel Tryptic Digestion
2.3.1 SDS–Polyacrylamide Gel Components
1. Running gel solution: 1.5 M Tris–HCl, pH 8.8. Add
approximately 900 mL water to a 1 L glass beaker. Add 181.7 g Tris
and stir moderately using a stirring bar. After Tris is completely
dissolved, adjust pH with HCl using a pH meter. Add water up
to 1 L with a graduated cylinder (see Note 6).
2. SDS sample buffer (2×): 2% (w/v) SDS, 50 mM Tris–HCl
(pH 6.8), 6% (v/v) β-mercaptoethanol, 10% (w/v) glycerol,
0.001% (w/v) bromophenol blue (BPB). Store at 4 °C for
current use or at −30 °C for long-term storage (see Note 7).
3. TGS running buffer: 0.025 M Tris, 0.188 M glycine, 0.1%
(w/v) SDS (see Note 8).
4. 30% (w/v) acrylamide solution: Weigh 29 g of acrylamide
monomer and 1 g methylenebisacrylamide. Add to 50 mL
water in 100 mL glass beaker and stir moderately using a
stirring bar. After the solids have completely dissolved, add
water up to 100 mL with a graduated cylinder and store at
4 °C, with protection from light using a light-shielding bottle
or wrapping with aluminum foil (see Note 9).
5. 10% (w/v) SDS: Add 10 g of SDS to 50 mL water and stir
moderately using a stirring bar. Add water up to 100 mL with a
graduated cylinder.
6. 10% (w/v) ammonium persulfate: Add 1 g of ammonium
persulfate to 8 mL water and stir moderately using a stirring
bar. Add water up to 10 mL with a graduated cylinder. Store at
4 °C for current use or at −30 °C for long-term storage.
7. N,N,N′,N′-tetramethylethylenediamine (TEMED) (Wako
Pure Chemical Industries, Tokyo, Japan). Store at 4 °C.
2.3.2 Tryptic Digestion Components
All of these processes should be carefully performed at a clean
bench with gloves and clean lab coat throughout to avoid
contamination by keratin, dust, and other exogenous proteinaceous
materials.
1. Fixation solution: Mix 50 mL of water, 40 mL of ethanol
(99.5%, w/w) and 10 mL of acetic acid (100%, w/w).
2. 0.1 M ammonium bicarbonate: Weigh 3.95 g of ammonium
bicarbonate and add to 400 mL of water in a glass beaker. Stir
moderately using a stirring bar and add water up to 500 mL
with a graduated cylinder. Store at 4 °C (see Note 10).
3. Acetonitrile (LC-MS grade) (Wako Pure Chemical Industries,
Tokyo, Japan). Store at 4 °C (see Note 10).
4. 25 mM ammonium bicarbonate/50% (v/v) acetonitrile: Mix
50 mL of acetonitrile (LC-MS grade), 25 mL of 0.1 M
ammonium bicarbonate, and 25 mL of water. Store at 4 °C (see Note
10).
5. Reduction buffer: Weigh 7.7 mg DTT and add to 5 mL of
0.1 M ammonium bicarbonate in a conical tube (see Note 11).
6. 55 mM iodoacetamide (IAA)/0.1 M ammonium bicarbonate:
Weigh 51 mg IAA and add to 5 mL of 0.1 M ammonium
bicarbonate in a conical tube (see Note 11).
7. Protease solution: Add 2 mL of 0.1 M ammonium bicarbonate
into a vial containing 20 μg of trypsin (Sequence grade
modified, Promega KK, Tokyo, Japan) and mix well. Store at
−20 °C.
8. 5% (v/v) trifluoroacetic acid (TFA)/50% (v/v) acetonitrile:
Mix 450 μL of water and 500 μL of acetonitrile in a 1.5 mL
microtube. Quickly add 50 μL of TFA into the solution and
mix well (see Note 11).
9. 0.1% (v/v) TFA: Quickly add 1 μL of TFA into 999 μL of water
and mix well (see Note 11).
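The weighed amounts in items 2, 5, and 6 can be cross-checked against the stated or implied concentrations. The sketch below is illustrative only: the function name `millimolar` is hypothetical, and the molar masses (DTT 154.25, iodoacetamide 184.96, ammonium bicarbonate 79.06 g/mol) are standard values rather than figures taken from this chapter. The DTT concentration of roughly 10 mM is derived from the recipe; the chapter itself does not state it.

```python
def millimolar(mass_mg, mw_g_per_mol, volume_ml):
    """Concentration in mM from a mass (mg), molar mass and volume (mL)."""
    # mg / (g/mol) = mmol; mmol / mL = mol/L; mol/L * 1000 = mM
    return mass_mg / mw_g_per_mol / volume_ml * 1000.0


# Molar masses are standard values and not part of the original protocol.
RECIPES = {
    "ammonium bicarbonate": (3950.0, 79.06, 500.0),  # 3.95 g in 500 mL
    "DTT (reduction buffer)": (7.7, 154.25, 5.0),    # 7.7 mg in 5 mL
    "iodoacetamide": (51.0, 184.96, 5.0),            # 51 mg in 5 mL
}

if __name__ == "__main__":
    for name, (mass_mg, mw, vol_ml) in RECIPES.items():
        print(f"{name}: {millimolar(mass_mg, mw, vol_ml):.1f} mM")
    # Trypsin: 20 ug in 2 mL corresponds to 10 ng/uL.
    print("trypsin:", 20.0 * 1000.0 / 2000.0, "ng/uL")
```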
2.4 In-Solution Tryptic Digestion
In order to avoid keratin, dust, and other exogenous proteinaceous
materials, all of these processes must be carefully performed at a
clean bench with gloves and a clean lab coat throughout the
in-solution tryptic digestion.
1. MPEX PTS reagents kit (GL Sciences, Tokyo, Japan): Make
solution B, DTT solution, IAA solution, and trypsin solution
according to the manufacturer’s instruction manual. Only
solution B should be stored at 4 °C. Prepare fresh DTT
solution, IAA solution, and trypsin solution immediately
before use.
2. 5% (v/v) acetonitrile/0.1% (v/v) TFA: Add 50 μL of
acetonitrile and 1 μL of TFA into 949 μL of water and mix well (see
Note 11).
3. Pierce BCA protein assay kit (Thermo Fisher Scientific, MA,
USA): Store at room temperature.
2.5 Peptide Purification Components
1. SPE C-TIP T-300 (Nikkyo Technos Co., Ltd., Tokyo, Japan).
2. 1.5 mL microtubes: Make a hole of 3 mm in diameter in the
cap with a soldering iron. Prepare two tubes per sample (the two
holed microtubes are used in two steps of the peptide
purification, Subheading 3.5, steps 1 and 6, below).
3. Solution A: Add 800 μL of acetonitrile and 5 μL of TFA into
195 μL of water and mix well (see Note 11).
4. Solution B: Add 40 μL of acetonitrile and 5 μL of TFA into
955 μL of water and mix well (see Note 11).
5. 0.1% (v/v) TFA: Quickly add 1 μL of TFA into 999 μL of water
and mix well (see Note 11).
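Solutions A and B are specified only by pipetting volumes. As a rough check, the sketch below converts those volumes into volume percentages, assuming that volumes are additive on mixing (an approximation for acetonitrile and water). The function name `vv_percent` is hypothetical.

```python
def vv_percent(components_ul):
    """Return the % (v/v) of each component, assuming additive volumes."""
    total = sum(components_ul.values())
    if total <= 0:
        raise ValueError("total volume must be positive")
    return {name: 100.0 * vol / total for name, vol in components_ul.items()}


SOLUTION_A = {"acetonitrile": 800.0, "TFA": 5.0, "water": 195.0}
SOLUTION_B = {"acetonitrile": 40.0, "TFA": 5.0, "water": 955.0}

if __name__ == "__main__":
    for label, recipe in (("A", SOLUTION_A), ("B", SOLUTION_B)):
        pct = vv_percent(recipe)
        print(f"Solution {label}: {pct['acetonitrile']:.1f}% acetonitrile, "
              f"{pct['TFA']:.1f}% TFA")
```

Under this assumption, solution A is about 80% acetonitrile and solution B about 4% acetonitrile, each with about 0.5% TFA, which fits their use for elution and for conditioning and washing, respectively, in Subheading 3.5.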
3 Methods
Wear gloves and a clean lab coat throughout the processes to avoid
contamination by keratin, dust, and other exogenous proteinaceous
materials. It is preferable to use low-protein-adsorption microtubes
at all stages.
Fig. 1 Schematic outline of the plasma membrane preparation. All steps from harvesting leaves to suspending
the purified plasma membrane fractions are described
3.1 Plasma Membrane Purification
Perform all steps on crushed ice (unless indicated otherwise). A
schematic outline of the procedure is shown in Fig. 1.
1. Cut out the aerial parts of oat seedlings and weigh the samples
(10–70 g in fresh weight is suitable for the plasma membrane
purification). Put the harvested plants in a plastic container
and wash with 500 mL of chilled water. Wash twice. Drain on a
paper towel and put on crushed ice.
2. Cut into small pieces with razor blades. Immediately put into
four volumes of chilled homogenizing medium and mix well
with a spatula. The homogenizing medium containing the
samples should be cooled again on crushed ice (see Note 12).
3. Homogenize with a chilled Polytron generator (PT10SK,
Kinematica Inc., Lucerne, Switzerland) until the samples are
broken down into tiny pieces (speed 6 for 60–90 s). Filter the
homogenates through four layers of gauze and squeeze tightly.
Put the filtrates into 40 mL centrifuge tubes and balance
them in pairs. Centrifuge at 10,000 × g for 15 min with a
chilled rotor to remove debris and heavy membrane fractions
(see Note 13).
4. Transfer the supernatants into ultracentrifuge tubes by
decantation. Centrifuge at 231,000 × g for 50 min with a chilled
ultracentrifuge rotor to precipitate the microsome fractions.
Discard supernatants by decantation.
5. Add an aliquot of MS suspension medium (approximately
0.5–1.0 mL) to each tube and homogenize the pellets with a
Teflon–glass homogenizer. Collect the microsomal suspensions
with a large Pasteur pipette into ultracentrifuge tubes. Balance
ultracentrifuge tubes in pairs with MS suspension medium.
6. Ultracentrifuge at 231,000 × g for 50 min as described in step
4. Put 5 mL of MS suspension medium in a Teflon–glass
homogenizer and mark the liquid surface on the glass
homogenizer as an indication of 5 mL volume. After centrifugation,
discard the supernatant with an aspirator.
7. Add 2 mL of MS suspension medium to the tube and break up the
precipitated pellet with a glass rod. Transfer into a Teflon–glass
homogenizer using a large Pasteur pipette. Put 2 mL of MS
suspension medium into the tube and pipet up and down to
break up the remaining pellet. Transfer into the Teflon–glass
homogenizer and add MS suspension medium to 5 mL.
Homogenize well with an electric Teflon–glass homogenizer
(moving up and down five times) with cooling on ice (see Note
14).
8. Put all of the homogenized sample in a centrifuge tube
containing two-phase partition medium (tube A). Put 5 mL of MS
suspension medium into each of the two other two-phase
partition systems (tubes B and C). Chill on crushed ice for 10 min.
During this time, mix well every 2 min.
9. Centrifuge tubes A and B at 650 × g for 5 min in a chilled rotor.
Two phases should be observed to have settled in tubes A and
B. Discard the upper phase of tube B with a Pasteur pipette
and transfer the upper phase of tube A into tube B. Chill on
crushed ice for 10 min. During this time, mix well every 2 min
(see Note 15).
10. Centrifuge tubes B and C at 650 × g for 5 min in a chilled
rotor. Discard the upper phase of tube C with a Pasteur pipette
and transfer the upper phase of tube B into tube C. Balance
tube C with another centrifuge tube filled with water. Chill on
crushed ice for 10 min. During this time, mix well every 2 min
(see Note 15).
11. Centrifuge at 650 × g for 5 min and split the resultant upper
phase of tube C into two ultracentrifuge tubes. Fill up the
tubes with PM suspension medium and balance them.
Ultracentrifuge at 231,000 × g for 50 min, as described in step 4 (see
Note 15).
12. Discard the supernatant with an aspirator. Add an appropriate
quantity of PM suspension medium to each tube and
homogenize the pellets with a Teflon–glass homogenizer. Collect the
plasma membrane suspensions with a large Pasteur pipette into
ultracentrifuge tubes. Balance ultracentrifuge tubes in pairs
with PM suspension medium. Ultracentrifuge again at
231,000 × g for 35 min.
13. Discard the supernatant with an aspirator. Add a minimal
quantity of PM-suspension medium to the plasma membrane
pellets. Homogenize the pellets with a glass rod. Transfer into a
Teflon–glass homogenizer and homogenize well using an
electric Teflon–glass homogenizer (moving up and down five
times) with cooling on ice. Transfer into a 1.5 mL microtube.
14. Measure the protein content using the Bradford assay (Bio-Rad
Protein Assay Kit). Use 10 μg of protein for tryptic digestion
and LC-MS/MS analysis. The remaining PM fractions should
be frozen in liquid nitrogen immediately and stored at −80 °C.
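The centrifugation steps above are given as relative centrifugal force (× g), whereas many instruments are set in rpm. The sketch below uses the standard relation RCF = 1.118 × 10⁻⁵ × r × rpm², with r in cm, to convert between the two. The 8-cm radius is purely a placeholder assumption; the chapter does not specify the rotor, so the actual radius must be taken from the rotor manual. The function names are hypothetical.

```python
import math

RCF_CONSTANT = 1.118e-5  # standard factor for radius in cm and speed in rpm


def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g) at a given speed and radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf, radius_cm):
    """Rotor speed (rpm) needed to reach a given RCF at a given radius."""
    if rcf < 0 or radius_cm <= 0:
        raise ValueError("rcf must be >= 0 and radius must be > 0")
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


if __name__ == "__main__":
    radius_cm = 8.0  # placeholder assumption, not from the protocol
    for rcf in (650, 10_000, 231_000):
        print(f"{rcf} x g at r = {radius_cm} cm: "
              f"{rpm_from_rcf(rcf, radius_cm):.0f} rpm")
```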
3.2 Detergent-Resistant Membrane Extraction
Perform all steps on crushed ice (unless indicated otherwise).
1. Prepare PM with approximately 2.5 mg protein and dilute with
PM-suspension medium in an ultracentrifuge tube. After
balancing the tubes in pairs, ultracentrifuge at 231,000 × g for
35 min (see Note 16).
2. Add 2000 μL of PM-suspension medium to the ultracentrifuge
tube and grind pellets with a glass rod. Transfer into a Teflon–
glass homogenizer and homogenize well using an electric
homogenizer (moving up and down five times). Measure the
protein content by Bradford assay and place PM samples with
2 mg of protein into a 35 mL swing rotor tube. Adjust the
volume to 2.7 mL by adding PM suspension medium.
3. Add 300 μL of 10% (w/v) Triton X-100 buffer and mix well
(at this point, protein–detergent ratio is 1:15). Incubate for
30 min.
4. Add 12 mL of 65% (w/w) sucrose solution and mix well (at this
point, the final concentration of sucrose is 52%). Overlay 5 mL
each of 48%, 35%, 30%, and 5% (w/w) sucrose solution slowly in
sequence (see Note 17).
5. Balance the swing rotor tubes in pairs by adding 5% (w/w)
sucrose solution and ultracentrifuge in a swing rotor at
141,000 × g for 20 h.
6. DRMs will be visible as a white layer at the interface of the
35%/48% (w/w) sucrose solution. Recover the white layer and
place it in an ultracentrifuge tube. Dilute with TED buffer and
balance the tubes in pairs. Ultracentrifuge at 231,000 × g
for 35 min (see Note 18).
7. Discard the supernatant. Add an appropriate quantity of PM
suspension medium to each tube and homogenize the pellets
with a Teflon–glass homogenizer. After balancing in pairs,
ultracentrifuge at 231,000 × g for 35 min.
8. Discard the supernatant with an aspirator. Add a minimal
quantity of PM-suspension medium to the sample tube.
Break up the pellets with a glass rod, transfer into a Teflon–
glass homogenizer and homogenize well using an electric
Teflon–glass homogenizer (moving up and down five times).
Transfer into a 1.5 mL microtube. The DRM fraction should
be frozen in liquid nitrogen immediately and stored at −80 °C.
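The ratios quoted in steps 3 and 4 (protein–detergent 1:15 and 52% sucrose) can be reproduced with simple arithmetic. The sketch below is illustrative and uses hypothetical function names. It follows the protocol's own approximation of mixing by volume and ignores the sucrose already present in the PM-suspension medium as well as density differences.

```python
def detergent_per_protein(protein_mg, detergent_percent_wv, detergent_ul):
    """Mass of detergent (mg) added per mg of protein."""
    # 1% (w/v) = 1 g per 100 mL = 0.01 mg per uL
    detergent_mg = detergent_percent_wv * 0.01 * detergent_ul
    return detergent_mg / protein_mg


def diluted_percent(stock_percent, stock_ml, sample_ml):
    """Concentration after adding a stock to a sample, by volume."""
    return stock_percent * stock_ml / (stock_ml + sample_ml)


if __name__ == "__main__":
    # Step 3: 300 uL of 10% Triton X-100 added to 2 mg of protein in 2.7 mL
    print("detergent per protein:", detergent_per_protein(2.0, 10.0, 300.0))
    print("final Triton X-100 %:", diluted_percent(10.0, 0.3, 2.7))
    # Step 4: 12 mL of 65% sucrose added to the 3 mL sample
    print("final sucrose %:", diluted_percent(65.0, 12.0, 3.0))
```

The same functions can be used to rescale the detergent volume if a different amount of PM protein is loaded while keeping the 1:15 ratio.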
3.3 In-Gel Tryptic Digestion
3.3.1 SDS–Polyacrylamide Gel Electrophoresis
1. Mix 2.5 mL running gel solution, 3.35 mL 30% (w/v)
acrylamide solution, and 3.95 mL water in a conical flask. Degas
with a vacuum pump for 5 min. Add 100 μL of 10% (w/v)
ammonium persulfate, 100 μL of 10% (w/v) SDS and 5 μL of
TEMED. Cast gel into a 90 mm (W) × 83 mm (H) × 1 mm
(T) gel cassette immediately. Insert a 14-well comb without
introducing air bubbles. Incubate at room temperature for 1 h
(see Note 19).
2. Mix 5 μg of PM or DRM protein samples (within 10 μL) and
an equal volume of SDS sample buffer. Vortex and centrifuge
tubes briefly. Heat at 95 °C for 5 min. Centrifuge and cool to
room temperature.
3. Wash out the wells by pipetting up and down. Slowly load the
samples onto the gel. Electrophorese at 100 V until the upper
edge of the sample dye band has migrated 2 mm into the gel
from the well (see Note 20).
4. Pry the gel plates open with a knife. Cut out the gel slice from
the well to 2 mm in front of the BPB dye with a scalpel on a
glass plate. Cut the gel slice into four equal pieces (Fig. 2) and
put into 1.5 mL microtubes (see Note 21).
5. Add 200 μL of fixation solution and agitate for 10 min.
Centrifuge briefly and discard the supernatant. Repeat these steps
twice.
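The gel recipe in step 1 does not state the final acrylamide percentage. The sketch below derives it from the listed volumes and stock concentrations, assuming additive volumes. The function name `gel_composition` is hypothetical, and the result of roughly 10% acrylamide is a calculation made here rather than a value given by the authors.

```python
def gel_composition(tris_ml=2.5, acrylamide_ml=3.35, water_ml=3.95,
                    aps_ml=0.1, sds_ml=0.1, temed_ml=0.005):
    """Final concentrations in the running gel, assuming additive volumes.

    Stock concentrations follow Subheading 2.3.1: 1.5 M Tris-HCl,
    30% (w/v) acrylamide solution and 10% (w/v) SDS.
    """
    total_ml = (tris_ml + acrylamide_ml + water_ml
                + aps_ml + sds_ml + temed_ml)
    return {
        "total_ml": total_ml,
        "acrylamide_percent": 30.0 * acrylamide_ml / total_ml,
        "tris_M": 1.5 * tris_ml / total_ml,
        "sds_percent": 10.0 * sds_ml / total_ml,
    }


if __name__ == "__main__":
    for key, value in gel_composition().items():
        print(f"{key}: {value:.3f}")
```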
3.3.2 In-Gel Tryptic Digestion for Nano-LC-MS/MS
All of these procedures should be performed at room temperature
(unless otherwise specified).
1. Add 200 μL of water and agitate for 10 min. Centrifuge briefly
and discard the supernatant.
Fig. 2 Excision of a protein band from gel. Wells are separated to avoid
cross-contamination of different samples during loading and electrophoresis. After
the BPB dye migrates into the gel (about 2 mm), a 4-mm gel piece centered on
the BPB dye band is cut out. Subsequently, the gel piece is cut into four equal
pieces. Each of the gel pieces is put into a separate 1.5 mL microtube
2. Add 400 μL of 25 mM ammonium bicarbonate/50% (v/v)
acetonitrile and agitate for 10 min. Centrifuge briefly and
discard the supernatant.
3. Add 200 μL of acetonitrile and incubate at room temperature
for 5 min. Centrifuge briefly and discard the supernatant (see
Note 22).
4. Add 100 μL of 0.1 M ammonium bicarbonate and centrifuge
briefly. Incubate at room temperature for 5 min (see Note 23).
5. Add 100 μL of acetonitrile and centrifuge briefly. Incubate at
room temperature for 15 min. Centrifuge briefly and discard
the supernatant (see Note 24).
6. Dry out the gel samples using a centrifugal concentrator for
7. Add 100 μL of reduction buffer and centrifuge briefly.
Incubate at 56 °C for 45 min. Discard the supernatant.
8. Add 100 μL of 55 mM IAA/0.1 M ammonium bicarbonate
and centrifuge briefly. Incubate in the dark at room tempera-
ture for 30 min. Discard the supernatant.
9. Add 200 μL of water and agitate for 10 min. Centrifuge briefly
and discard the supernatant.
10. Add 400 μL of 25 mM ammonium bicarbonate/50% (v/v)
acetonitrile and agitate for 10 min. Centrifuge briefly and
discard the supernatant.
11. Add 200 μL of acetonitrile and incubate at room temperature
for 5 min. Centrifuge briefly and discard the supernatant (see
Note 22).
12. Add 100 μL of 0.1 M ammonium bicarbonate and centrifuge
briefly. Incubate at room temperature for 5 min (see Note 23).
13. Add 100 μL of acetonitrile and centrifuge briefly. Incubate at
room temperature for 15 min. Centrifuge briefly and discard
the supernatant (see Note 24).
14. Dry out the gel samples using centrifugal concentrator for
15. Cool to room temperature and put on ice. Add 25 μL of
protease solution to each tube and centrifuge the tubes briefly.
Incubate on ice for 45 min (see Note 23).
16. Discard the supernatant and add 100 μL of 0.1 M ammonium
bicarbonate. Centrifuge the tubes briefly and incubate at 37 °C
for 20 h.
17. Agitate for 15 min and add 100 μL of acetonitrile. Agitate for
15 min and collect the supernatant (see Note 24).
18. Add 5% (v/v) TFA/50% (v/v) acetonitrile and agitate for
15 min. Centrifuge the tubes briefly. Collect the supernatant
in tubes, as described in step 17 (see Note 24).
19. Dry out the collected supernatant using a centrifugal
concentrator for 1 h. Add 30 μL of 0.1% TFA. Store at −30 °C (see
Note 26).
3.4 In-Solution Tryptic Digestion
All of these procedures should be performed at room temperature
(unless otherwise specified).
1. Precipitate 100 μg of PM or DRM protein using an
ultracentrifuge (231,000 × g, 4 °C, 50 min).
2. Add solution B to the centrifuge tubes and homogenize with a
Teflon–glass homogenizer. Transfer to 1.5 mL microtubes.
3. Solubilize samples and measure the protein concentration with
a Pierce BCA protein assay kit according to the instruction
manual from the manufacturer.
4. Transfer 5 μg of PM protein to another 1.5 mL microtube.
Make up to 20 μL with solution A.
5. Perform reductive alkylation and tryptic digestion according to
the instruction manual, and store at −30 °C (see Note 26).
Fig. 3 Peptide purification assembly. A C-TIP is inserted in a hole drilled with a
soldering iron in a 1.5 mL microtube cap. Solution A, solution B and the sample
solution are put into the C-TIP, in that order, and then the assembly is
centrifuged. The mixed solution passes through a C18 column and peptides
are adsorbed on, or eluted from, the C-TIP
3.5 Peptide Purification
All of these procedures should be performed at a clean bench
whenever possible and at room temperature (unless otherwise
specified).
1. Insert a SPE C-TIP into the 3 mm hole in the microtube top
(Fig. 3).
2. Add 30 μL of solution A to the upper side of the SPE C-TIP for
preconditioning. Centrifuge at 1000 × g for 30 s to pass
solution A through the tip column.
3. Add 30 μL of solution B to the upper side of the SPE C-TIP for
preconditioning. Centrifuge at 1000 × g for 30 s to pass
solution B through the tip column.
4. After confirming that the column is moist, add the entire
trypsin-digested peptide sample to the upper side of the SPE
C-TIP for adsorption onto the column. Centrifuge at 1000 × g
for 30 s to pass the sample solution through the tip column.
5. Add 30 μL of solution B to the upper side of the SPE C-TIP for
washing. Centrifuge at 1000 × g for 30 s to pass solution B
through the tip column.
6. Put a vial insert for each LC-MS/MS sampler into another
holed microtube. Transfer the SPE C-TIP into the microtube.
7. Add 30 μL of solution A to the upper side of the SPE C-TIP for
elution. Centrifuge at 1000 × g for 30 s to pass solution A
through the tip column. Discard the SPE C-TIP.
8. Dry out the eluted samples using a centrifugal concentrator for
15 min. Add 15 μL of 0.1% (v/v) TFA. Put the vial insert into
the vial and close the lid. Store at −30 °C (see Note 23).
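The spin steps in Subheading 3.5 are specified as relative centrifugal force (1000 × g), whereas many benchtop centrifuges are set in rpm. The following is a minimal sketch of the conversion using the standard relation RCF = 1.118 × 10⁻⁵ × r × rpm², with r in cm. The rotor radius used here is a hypothetical value chosen for illustration only; the real value must be taken from the rotor documentation.

```python
import math

RCF_CONSTANT = 1.118e-5  # standard factor for radius in cm and speed in rpm


def rpm_for_rcf(rcf: float, radius_cm: float) -> float:
    """Return the rotor speed (rpm) that produces the requested RCF (x g)."""
    if rcf <= 0 or radius_cm <= 0:
        raise ValueError("rcf and radius_cm must be positive")
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


def rcf_for_rpm(rpm: float, radius_cm: float) -> float:
    """Return the RCF (x g) produced at a given rotor speed and radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


# Hypothetical rotor radius (assumption, not from the protocol).
EXAMPLE_RADIUS_CM = 7.0
speed = rpm_for_rcf(1000, EXAMPLE_RADIUS_CM)
print(f"1000 x g at r = {EXAMPLE_RADIUS_CM} cm corresponds to about {speed:.0f} rpm")
```

The same function can be reused for the other centrifugation steps by changing the RCF argument and the rotor radius.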
3.6 Nano-LC-MS/MS Analysis
Separate digested and purified peptide solutions with a C18 column
by nano-flow LC. Apply a linear gradient of acetonitrile (from 5%
[v/v] to 45% [v/v]) at a flow rate of 500 nL/min for 100 min.
Detect and analyze the separated and ionized peptides in a mass
spectrometer. Examples of results obtained using 5 μg of oat PM
and DRM proteins are shown in Figs. 4, 5, and 6; details are given
in the figure legends.
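The gradient described above (5% to 45% acetonitrile, linear, 100 min, 500 nL/min) can be sketched as a simple interpolation, which is useful for checking the solvent composition at a given retention time. The function names are illustrative and are not taken from any instrument software.

```python
def acetonitrile_percent(t_min: float, start: float = 5.0, end: float = 45.0,
                         duration_min: float = 100.0) -> float:
    """Acetonitrile content (% v/v) at time t_min of a linear gradient."""
    if not 0 <= t_min <= duration_min:
        raise ValueError("t_min must lie within the gradient")
    return start + (end - start) * t_min / duration_min


def delivered_volume_ul(flow_nl_per_min: float = 500.0,
                        duration_min: float = 100.0) -> float:
    """Total volume (uL) delivered over the gradient at a constant flow rate."""
    return flow_nl_per_min * duration_min / 1000.0


for t in (0, 25, 50, 75, 100):
    print(f"{t:>3} min: {acetonitrile_percent(t):.1f}% acetonitrile")
print(f"Volume delivered: {delivered_volume_ul():.1f} uL")
```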
4 Notes
1. Mops-KOH (pH 7.6), EGTA (pH 8.0), and EDTA (pH 8.0)
should be prepared as 0.5 M stock solutions and stored at 4 °C.
The pH of EGTA and EDTA should be adjusted using NaOH.
BSA powder should be brought to room temperature before it is
dissolved. PMSF and SHAM should be prepared as 1 M and
1.6 M stock solutions in DMSO, respectively, and stored at
4 °C. DTT should be stored at −30 °C as a 1 M stock solution.
PMSF, SHAM, and DTT should be diluted only as needed just
before use. If you prepare Arabidopsis PM fraction, the
homogenizing medium should consist of 0.5 M sorbitol, 50 mM
Mops-KOH (pH 7.6), 5 mM EGTA (pH 8.0), 5 mM EDTA
(pH 8.0), 1.5% (w/v) polyvinylpyrrolidone-40 (molecular
weight 40,000), 0.5% (w/v) BSA, 2 mM PMSF, 4 mM
SHAM, and 2.5 mM DTT.
2. KH2PO4/K2HPO4 (K-P) buffer (pH 7.8) should be prepared
as a 0.5 M stock solution and diluted to make the MS suspen-
sion medium. First, 200 mL of 0.5 M K2HPO4 and 30 mL of
0.5 M KH2PO4 are prepared. The pH of the 0.5 M K2HPO4 is
adjusted to 7.8 by adding 0.5 M KH2PO4, monitored by a pH
meter. If you prepare Arabidopsis PM fraction, the MS sus-
pending medium should contain 10 mM KH2PO4/K2HPO4
(K-P) buffer (pH 7.8), 0.3 M sucrose.
3. If you prepare Arabidopsis PM fraction, the final concentration
of NaCl should be adjusted to 100 mM in the MS suspension
medium.
4. Mops-KOH (pH 7.3) and EGTA (pH 8.0) should be prepared as
0.5 M stock solutions and stored at 4 °C. If you prepare Arabidopsis
PM fraction, the PM suspending medium consists of
10 mM Mops-KOH (pH 7.3), 1 mM EGTA (pH 8.0), and
0.3 M sucrose.
Fig. 4 The number of transmembrane domains in oat PM proteins separated and identified following in-gel or
in-solution tryptic digestion. Peptide sequences were searched against the NCBI database (version
20120216, comprising 17,282,984 sequences), taxonomy Viridiplantae. Transmembrane domains were
estimated by SOSUI engine ver. 1.10 (http://bp.nuap.nagoya-u.ac.jp/sosui/). (a) Proteins with up to 24
transmembrane domains were identified in oat PM by in-gel digestion in four biological replicates. On average,
700 proteins with transmembrane domains were identified in the four replicates. (b) Proteins with up to
24 transmembrane domains were identified in oat PM by in-solution digestion in four biological replicates. On
average, 397 proteins with transmembrane domains were identified in the four replicates
5. For Arabidopsis PM preparation, you should weigh 1.4 g of
polyethylene glycol 3350 and 1.4 g dextran, and add to 9.4 mL
MS suspension medium and 7.3 mL NaCl medium in a 40 mL
centrifuge tube.
6. When the pH of the Tris buffer is adjusted, the buffer solution
should be at room temperature. The pH of Tris can be affected
by the temperature of the solution. Addition of HCl results in
an increase of temperature by heat of neutralization and dilu-
tion. To avoid a temperature increase of the solution, add the
HCl slowly and intermittently.
Fig. 5 Distribution of the number of transmembrane domains in oat DRM proteins identified using the “In-
Solution Digestion” protocol. Peptides were analyzed as described in Fig. 4. Compared to PM (Fig. 4b), in DRM,
more transmembrane proteins, especially those containing more than five transmembrane domains, could be
identified
Fig. 6 Estimated number of proteins with secretory signal peptides in oat PM. 173 and 81 proteins with signal
peptides were identified from four replicates of in-gel or in-solution tryptic digests of 5 μg oat PM proteins,
respectively
7. β-mercaptoethanol is a reducing agent and should be added to
the sample buffer just before use.
8. TGS buffer is normally made as a 10× stock solution. First,
make 1 L of 10× TGS buffer consisting of 30.3 g Tris, 141.4 g
glycine, and 10 g SDS. Just before use, dilute 100 mL of 10×
TGS buffer with 900 mL of water.
9. Unpolymerized acrylamide is neurotoxic. Acrylamide powder
requires careful handling. Wear gloves, a clean lab coat, and a
mask, and pay attention to people around you when weighing
acrylamide. Store at 4 °C. Add polymerization agent before
discarding any spare acrylamide solution.
10. These solutions should be dispensed in small volumes and
sealed with Parafilm to prevent contamination and evaporation.
Store at 4 °C and use within 1 month of preparation.
11. DTT and IAA are readily modified in solution within a short
period, and TFA evaporates quickly. Solutions containing DTT,
IAA, or TFA should be prepared fresh immediately
before use.
12. For Arabidopsis PM preparation, plants must be put in homo-
genizing medium directly and immediately after harvest and
washing. Subsequently, plants should be cut with clean scissors
in the medium.
13. For Arabidopsis PM preparation, the homogenates should be
centrifuged at 5000 × g for 10 min.
14. In this step, homogenization should not be too long or too
vigorous because harsh homogenization can severely disrupt
membrane integrity.
15. Two-phase partitioning is the most important step for preparing
highly purified PM. When the upper phase of the
two-phase partition medium is removed, the Pasteur pipette
should be moved from left to right near the boundary of the
two phases to avoid taking up the lower phase. For Arabidopsis PM
preparation, the two phases should be centrifuged at 440 × g
for 5 min.
16. The yield of the PM preparation is expected to be 2.5 mg
protein from 70 to 100 g (FW) of oat leaves.
17. One of the keys to making a good step gradient with sucrose
solutions is pouring the solution slowly along the inner wall of
the tube.
18. In this step, the upper portions of the white band should be
discarded first and then the DRM layer should be collected
carefully.
19. All parts of the gel cassette should be wiped with ethanol or
acetone on cleaning tissue to prevent contamination with other
proteins including keratin.
20. Leave empty wells between samples to prevent the samples from
mixing during electrophoresis. Run the power supply in
constant-voltage mode.
21. Be careful not to include bands from the adjacent sample. Illuminate
the glass plate from below with a fluorescent lamp to see the
gels easily.
22. At this stage, dehydrated, compressed, and completely
bleached gels should be observed. If the gels do not change,
repeat this step twice.
23. At this stage, rehydrated and swollen gels should be observed.
If the gels do not change, repeat this step twice.
24. Gels are sometimes partly bleached, but this is acceptable.
25. At this stage, dehydrated and compressed gels are easily lost by
electrostatic force. Take extra care.
26. Digested and purified peptides should be analyzed by nano-
LC-MS/MS within 1 week.
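Several of the notes above describe diluting concentrated stocks just before use (for example, the 10× TGS buffer in Note 8 and the PMSF, SHAM, and DTT stocks in Note 1). A minimal sketch of the underlying C1V1 = C2V2 calculation is given below. The 100 mL batch volume for the homogenizing medium is a hypothetical value chosen for illustration and is not specified in the notes.

```python
def stock_volume(stock_conc: float, final_conc: float, final_volume: float) -> float:
    """Volume of stock needed (same unit as final_volume), from C1*V1 = C2*V2.

    stock_conc and final_conc must share one unit (e.g., mM, or 'x' for fold).
    """
    if stock_conc <= 0 or final_conc < 0 or final_volume < 0:
        raise ValueError("concentrations and volume must be non-negative")
    if final_conc > stock_conc:
        raise ValueError("final concentration cannot exceed stock concentration")
    return final_conc * final_volume / stock_conc


# Note 8: 1 L of 1x TGS from a 10x stock.
tgs_ml = stock_volume(10, 1, 1000)
print(f"10x TGS: {tgs_ml:.0f} mL stock + {1000 - tgs_ml:.0f} mL water")

# Note 1: additives for a hypothetical 100 mL batch of homogenizing medium.
BATCH_ML = 100  # assumption for illustration only
additives = {  # name: (stock in mM, final in mM)
    "PMSF": (1000, 2),
    "SHAM": (1600, 4),
    "DTT": (1000, 2.5),
}
for name, (stock_mM, final_mM) in additives.items():
    ul = stock_volume(stock_mM, final_mM, BATCH_ML) * 1000  # mL -> uL
    print(f"{name}: {ul:.0f} uL of stock per {BATCH_ML} mL")
```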
Acknowledgments
This work was supported in part by Grants-in-Aid for Scientific
Research (#22120003 and #24370018) from MEXT, Japan to
Y.K. and M.U.
References
1. Aebersold R, Mann M (2003) Mass 7. Gorg A, Weiss W, Dunn M-J (2004) Current
spectrometry-based proteomics. Nature two-dimensional electrophoresis technology
422:198–207 for proteomics. Proteomics 4:3665–3685
2. Domon B, Aebersold R (2006) Mass spec- 8. Rabilloud T (2009) Membrane proteins and
trometry and protein analysis. Science proteomics: love is possible, but so difficult.
312:212–217 Electrophoresis 30:174–180
3. Kersten B, Burkle L, Kuhn E-J et al (2002) 9. Rabilloud T (2002) Two-dimensional gel elec-
Large-scale plant proteomics. Plant Mol Biol trophoresis in proteomics: old, old fashioned,
48:133–141 but it still climbs up the mountains. Proteomics
4. van Wijk KJ (2001) Challenges and prospects 2:3–10
of plant proteomics. Plant Physiol 10. Simons K, Ikenen E (1997) Functional rafts in
126:501–508 cell membranes. Nature 387:569–572
5. Santoni V, Kieffer S, Desclaux D et al (2000) 11. Peskan T, Westermann M, Oelmuller R (2000)
Membrane proteomics: use of additive main Identification of low-density triton
effects with multiplicative interaction model X-100-insoluble plasma membrane microdo-
to classify plasma membrane proteins accord- mains in higher plants. Eur J Biochem
ing to their solubility and electrophoretic prop- 267:6989–6995
erties. Electrophoresis 21:3329–3344 12. Mongrand S, Morel J, Laroche J et al (2004)
6. Luche S, Santoni V, Rabilloud T (2003) Eval- Lipid rafts in higher plant cells. J Biol Chem
uation of nonionic and zwitterionic detergents 279:36277–36286
as membrane protein solubilizers in 13. Bhat RA, Panstruga R (2005) Lipid rafts in
two-dimensional electrophoresis. Proteomics plants. Planta 223:5–19
3:249–253
14. Martin SW, Glover BJ, Davies JM (2005) Lipid proteome in oat and rye: similarities and dis-
microdomains: plant membranes get similarities between two monocotyledonous
organized. Trends Plant Sci 10:263–265 plants. J Proteome Res 11:1654–1665
15. Grennan AK (2007) Lipid rafts in plants. Plant 19. Li B, Takahashi D, Kawamura Y et al (2012)
Physiol 143:1083–1085 Comparison of plasma membrane proteomic
16. Thakur S-S, Geiger T, Chatterjee B et al (2011) changes of Arabidopsis suspension cells (T87
Deep and highly sensitive proteome coverage line) after cold and abscisic acid treatment in
by LC-MS/MS without prefractionation. Mol association with freezing tolerance develop-
Cell Proteomics 10:M110.003699 ment. Plant Cell Physiol 53:543–554
17. Matros A, Kasper S, Witzel K et al (2011) 20. Masuda T, Tomita M, Ishihama Y (2008)
Recent progress in liquid chromatography- Phase transfer surfactant-aided trypsin diges-
based separation and label-free quantitative tion for membrane proteome analysis. J Prote-
plant proteomics. Phytochemistry 72:963–974 ome Res 7:731–740
18. Takahashi D, Kawamura Y, Yamashita T et al
(2011) Detergent-resistant plasma membrane
Chapter 8
A Protocol for the Plasma Membrane Proteome Analysis of Rice Leaves
Ravi Gupta, Yu-Jin Kim, and Sun Tae Kim
Abstract
Subcellular proteome analysis is one of the most effective ways to reduce the complexity of the total proteome.
With advances in protein extraction methodologies, it is now possible to fractionate and isolate
proteins from subcellular compartments without significant contamination from the cytoplasm and other
organelles. Of the different subcellular proteomes, that of the plasma membrane has remained largely
uncharacterized because of the difficulty of isolating contamination-free plasma membrane proteins. Moreover,
proteome analysis over the past two decades relied mainly on two-dimensional gel electrophoresis, which
has limited protein loading capacity and separates highly hydrophobic plasma membrane
proteins poorly. The development of shotgun proteomics methods has facilitated the identification and quantification
of hydrophobic proteins isolated from the plasma membrane or other cellular membranes. Here, we present a
simplified procedure for the isolation of plasma membrane proteins by a two-phase partitioning method
and their identification by a shotgun proteomics approach, using rice as a model plant.
Key words Plasma membrane, Shotgun proteomics, Rice, Two-phase partitioning, Signaling
1 Introduction
The plasma membrane is the outermost membrane that physically
separates a cell from its external environment and plays a central role in
intracellular signaling [1]. Transmission of signals from the external
environment into the cell is mediated by lipids, receptors,
and other proteins that are integral components of the plasma
membrane [2]. The majority of plasma membrane-localized proteins
contain a transmembrane domain, with hydrophobic regions
spanning the membrane and hydrophilic domains facing the
apoplast and symplast [3]. Plasma membrane proteins are not only
involved in cell signaling but also play pivotal roles in transport
by functioning as carrier and channel proteins [4]. Thus, analysis of
the plasma membrane proteome can provide important clues regarding
cell-to-cell communication, signaling, and transport [4]. Furthermore,
some biological processes, such as plant–pathogen
interactions, cannot be fully understood without the analysis of
plasma membrane proteins. In particular, understanding
pattern-triggered immunity (PTI) responses requires the isolation and
characterization of plasma membrane proteins, because pathogen-secreted
molecules bind to plasma membrane-localized receptors of the
plant [5]. The binding of these molecules, known as
pathogen-associated molecular patterns (PAMPs), to the plasma
membrane-localized pattern-recognition receptors (PRRs) of the
plant activates a myriad of signaling events that culminate in
defense responses [6, 7].
In this chapter, we describe a pipeline for plasma membrane
proteome analysis using a shotgun proteomics approach.
The plasma membrane protein isolation method described here is
modified from the method described by Santoni [8].
2 Materials
Prepare all solutions for protein extraction and trypsin digestion in
deionized water; solutions for peptide desalting and mass
spectrometry should be prepared in HPLC- or LC-MS-grade
water. After trypsin digestion, the use of low-protein-binding tubes is
recommended to minimize peptide loss. Wear a lab coat and
nitrile gloves throughout the experiment to avoid keratin
contamination.
2.1 Plant Material
1. Rice plants (see Note 1).
2.2 Reagents, Equipment, and Software
1. 2D-Quant kit (GE Healthcare) (see Note 2).
2. 30 kDa spin filter (Amicon Ultra Centrifugal filters, Catalog
number UFC503096) or Microcon-30 kDa Centrifugal Filter
Unit with Ultracel-30 membrane (Millipore, Catalog number
MRCF0R030).
3. Trypsin Gold, Mass Spectrometry grade (Promega, Catalog
number V5280) (see Note 3).
4. 3M Empore HP Extraction disk cartridge (C18-SD), 7 mm/
3 mL (Catalog No. 4215SD).
5. Eppendorf LoBind microcentrifuge tubes (Catalog number
Z666505) or low protein binding collection tubes (Thermo
Fisher Scientific, Catalog number 90410).
6. Acclaim PepMap 100 trap column (100 μm × 2 cm, nanoViper
C18, 5 μm, 100 Å) (Thermo Fisher Scientific).
7. Acclaim PepMap 100 capillary column (75 μm × 15 cm,
nanoViper C18, 3 μm, 100 Å).
8. UHPLC Dionex UltiMate® 3000 (Thermo Fisher Scientific).
9. QExactive™ Orbitrap High-Resolution Mass Spectrometer
(Thermo Fisher Scientific) for high mass accuracy, high resolution,
and high scan speed. Other nano-LC-MS/MS systems can also be
used for protein identification; however, the total number
of identified proteins may vary.
10. MaxQuant and Perseus software.
11. Ultracentrifuge (Beckman Coulter).
12. pH strips.
13. Sonicator (both probe type and water bath type).
14. Dry bath (preferred; see Note 4).
2.3 Buffers
1. Homogenization buffer: 50 mM Tris, pH 8.0, 500 mM
sucrose, 10% glycerol (v/v), 20 mM EDTA, 20 mM EGTA,
0.6% PVP, and 10 mM ascorbic acid. Adjust the pH of the
buffer to 8.0 using 2-morpholinoethanesulfonic acid (MES)
(see Note 5).
2. Upper phase solution: 5 mM phosphate buffer pH 7.8,
330 mM sucrose and 2 mM DTT (see Note 5).
3. Lower phase solution: 5 mM phosphate buffer, pH 7.8, 5 mM
KCl, 300 mM sucrose, 6.4% Dextran T-500 (w/w), 6.4% PEG-
3350 (w/w) (see Note 5).
4. TEAB buffer: 100 mM triethyl ammonium bicarbonate,
pH 8.5 (see Note 6).
5. SDT-lysis buffer: 4% SDS, 100 mM DTT in 100 mM TEAB
6. UA buffer: 8 M urea in 100 mM TEAB pH 8.5 (see Note 7).
7. Alkylation buffer: 50 mM iodoacetamide in 100 mM TEAB
8. Trypsin solution: Dissolve 100 μg of Trypsin lyophilized pow-
der in 95 μL 100 mM TEAB and add 5 μL ACN (acetonitrile)
(see Note 8).
9. Solvent A: water–ACN, 98:2 v/v; 0.1% formic acid (FA).
10. Solvent B: 100% ACN, 0.1% FA (v/v).
11. Solvent C: 0.1% FA in water (v/v).
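The buffer recipes above are given as final concentrations, so the amount to weigh out depends on the batch volume. The sketch below converts molar and percentage (v/v) specifications into masses and volumes for two components of the homogenization buffer. The 100 mL batch volume is an assumption for illustration. The molar masses are the standard values for Tris base and sucrose; other components should be added with the molar mass of the exact salt or hydrate in use.

```python
# Standard molar masses (g/mol); extend with the exact reagent forms in use.
MOLAR_MASS = {
    "Tris": 121.14,
    "sucrose": 342.30,
}


def grams_needed(conc_mM: float, volume_mL: float, molar_mass: float) -> float:
    """Mass (g) of solute for a given final concentration and batch volume."""
    return (conc_mM / 1000.0) * (volume_mL / 1000.0) * molar_mass


def ml_needed(percent_vv: float, volume_mL: float) -> float:
    """Volume (mL) of a liquid component specified as % (v/v)."""
    return volume_mL * percent_vv / 100.0


BATCH_ML = 100  # hypothetical batch volume, not specified in the protocol
print(f"Tris:     {grams_needed(50, BATCH_ML, MOLAR_MASS['Tris']):.3f} g")
print(f"Sucrose:  {grams_needed(500, BATCH_ML, MOLAR_MASS['sucrose']):.3f} g")
print(f"Glycerol: {ml_needed(10, BATCH_ML):.1f} mL")
```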
3 Methods
3.1 Extraction of Plasma Membrane Proteins
1. Take 20 g of fresh, healthy green rice leaves and grind them in a
prechilled mortar and pestle using liquid nitrogen.
2. Add 40 mL of homogenization buffer and vortex the homogenate
for 5 min.
3. Filter the homogenate through nylon cloth and centrifuge at
26,000 × g for 25 min. Save 100 μL of this fraction as total
cellular proteins (T) for SDS-PAGE analysis.
4. Centrifuge the supernatant again at 84,000 × g for 25 min to
pellet the microsomal fraction.
5. Discard the supernatant and add 9 mL of upper phase solution
and sonicate for 3 min using an ultrasonic probe sonicator.
6. Add 18 mL of lower phase solution, vortex and incubate on ice
for 5 min (label it as tube A).
7. Centrifuge at 2000 × g for 10 min for phase separation.
8. Carefully transfer the upper phase to a new tube (label it as tube
B).
9. To maximize the yield, add 9 mL of upper phase solution again
to the lower phase of tube A and vortex.
10. Incubate on ice for 5 min and centrifuge at 2000 × g for
10 min.
11. Carefully collect the upper phase and combine it with the upper
phase of tube B.
12. Add 18 mL of lower phase solution to the combined upper
phase (tube B), vortex and centrifuge at 2000 × g for 10 min.
13. Collect the upper phase and dilute it with 5 volumes of water.
Mix well and incubate on ice for 5 min.
14. Centrifuge at 84,000 × g for 10 min at 4 °C. Remove the
supernatant; the pellet contains the plasma membrane proteins.
15. Dissolve a part of the pellet in SDS-loading buffer and resolve
the isolated proteins on SDS-PAGE. A typical SDS-PAGE gel
profile of total and plasma membrane proteins from young rice
leaves should look like the one shown in Fig. 1a.
16. Check the quality of the isolated plasma membrane proteins by
Western blotting using organelle-specific markers. The plasma
membrane marker protein should be enriched, while
cytoplasmic and nuclear marker proteins should not be
detected in the isolated plasma membrane protein fraction
(Fig. 1b).
3.2 In-Solution Trypsin Digestion by Filter-Aided Sample Preparation (FASP)
1. Dissolve the remaining plasma membrane protein pellet in
300 μL of SDT-lysis buffer and sonicate for 5 min in a water
bath-type sonicator.
2. Incubate the sample at 99 °C for 30 min, then sonicate
again for 5 min.
3. Centrifuge at 12,000 × g for 5 min to pellet the insoluble
debris.
Fig. 1 (a) SDS-PAGE showing the protein profile of total cellular (T) as well as plasma membrane (PM) proteins
isolated from young rice leaves. (b) Immunoblots showing total (T) and plasma membrane proteins probed
with anti-glutamine synthase (GS), anti-histone 1 (H1), and anti-plasma membrane intrinsic protein 2 (PIP2)
antibodies used as cytosolic, nuclear, and plasma membrane markers, respectively. (c) Functional annotation
of the MS identified proteins using MapMan program showing signaling overview
4. Quantify proteins using the 2D-Quant kit following the
manufacturer’s protocol.
5. Take 100 μg of protein, dilute it to 300 μL with UA buffer, and
load on a 30 kDa spin filter.
6. Centrifuge at 14,000 × g for 10 min and wash three times with UA
buffer for complete removal of SDS.
7. Add 200 μL of alkylation buffer and incubate for 1 h in the dark.
8. Add 300 μL of UA buffer and centrifuge at 14,000 × g for
10 min (two times).
9. Add 300 μL of 50 mM TEAB and centrifuge at 14,000 × g for
10 min (three times).
10. Add 290 μL of 50 mM TEAB and 10 μL of Trypsin solution
and incubate at 37 °C overnight in a dry bath (see Note 9).
11. Transfer the spin filter to a new collection tube and collect the
digested peptides by centrifugation at 14,000 × g for 10 min.
12. Add 100 μL of 50 mM TEAB and 50 μL of 0.5 M NaCl and
collect the filtrate (digested peptides) (see Note 10).
13. Add 0.5 μL of FA to stop the reaction (see Note 11).
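The quantities given in Subheadings 2.3 and 3.2 imply a particular trypsin stock concentration and enzyme-to-substrate ratio. The short sketch below makes that arithmetic explicit so that the volumes can be rescaled if a different protein load is used. It only restates the numbers in the protocol; the function names are illustrative.

```python
def trypsin_stock_ug_per_ul(trypsin_ug: float, teab_ul: float, acn_ul: float) -> float:
    """Concentration (ug/uL) of the trypsin solution after reconstitution."""
    return trypsin_ug / (teab_ul + acn_ul)


def substrate_per_enzyme(protein_ug: float, trypsin_ug: float) -> float:
    """Mass of protein digested per unit mass of trypsin (w/w)."""
    return protein_ug / trypsin_ug


# Values from the protocol: 100 ug trypsin in 95 uL TEAB + 5 uL ACN;
# 10 uL of this solution is added to 100 ug of protein on the filter.
stock = trypsin_stock_ug_per_ul(100, 95, 5)
trypsin_added_ug = stock * 10
ratio = substrate_per_enzyme(100, trypsin_added_ug)
digest_volume_ul = 290 + 10

print(f"Trypsin stock: {stock:.1f} ug/uL")
print(f"Trypsin added: {trypsin_added_ug:.0f} ug in {digest_volume_ul} uL")
print(f"Enzyme:substrate = 1:{ratio:.0f} (w/w)")
```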
3.3 Desalting of Peptides
1. Centrifuge the acidified digest at 2000 × g for 15 min. Transfer
the supernatant to a new low-protein-binding tube and discard the
precipitate.
2. Take a new C-18-SD extraction disc cartridge and wash
sequentially with 3 mL each of Solvent-B and Solvent-C (see
Note 12).
3. Carefully load the acidified digest and collect the flow-through.
4. Reload the flow-through onto the column and allow it to pass
through once more for efficient binding of the peptides to the column
matrix.
5. Wash column with 1 mL of Solvent-C. Repeat this step thrice.
6. Elute peptides with 300 μL each of 40%, 60%, and 80% ACN
containing 0.1% FA.
7. Lyophilize the peptides in a speedvac (see Note 13).
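The stepwise elution above uses 40%, 60%, and 80% ACN, each containing 0.1% FA. As a rough sketch, the component volumes for each elution solvent can be calculated as shown below, assuming all percentages are v/v. The 1 mL batch volume is a hypothetical convenience; the protocol itself only requires 300 μL of each solvent per sample.

```python
def elution_mix(acn_percent: float, total_ul: float, fa_percent: float = 0.1) -> dict:
    """Volumes (uL) of ACN, formic acid, and water for one elution solvent (v/v)."""
    acn = total_ul * acn_percent / 100.0
    fa = total_ul * fa_percent / 100.0
    water = total_ul - acn - fa
    if water < 0:
        raise ValueError("ACN and FA percentages exceed 100%")
    return {"ACN": acn, "FA": fa, "water": water}


BATCH_UL = 1000  # hypothetical batch volume (assumption)
for pct in (40, 60, 80):
    mix = elution_mix(pct, BATCH_UL)
    print(f"{pct}% ACN: {mix['ACN']:.0f} uL ACN, "
          f"{mix['FA']:.1f} uL FA, {mix['water']:.0f} uL water")
```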
3.4 Mass Spectrometry
1. Reconstitute lyophilized peptides in 40 μL of Solvent-A.
2. Inject the desalted peptides into a UHPLC Dionex UltiMate®
3000 instrument equipped with an Acclaim PepMap 100 trap
column and wash with 98% solvent A for 6 min at
a flow rate of 6 μL/min.
3. Separate peptides continuously by reversed-phase chromatog-
raphy using an Acclaim PepMap 100 capillary column at a flow
rate of 400 nL/min.
4. Couple the LC through an electrospray ionization source to the
quadrupole-based QExactive™ Orbitrap High-Resolution Mass
Spectrometer for liquid chromatography–tandem mass spectrometry
(LC-MS/MS), and run the
LC analytical gradient at 2–35% solvent B over 90 min, then
35–95% over 10 min, followed by 90% solvent B for 5 min, and
finally 5% solvent B for 15 min.
5. Electrospray the eluting peptides through the coated
silica emitter tip at an ion spray voltage of 2000 V.
6. Acquire the MS spectra at a resolution of 70,000 (at 200 m/z) in
a mass range of 350–1800 m/z. Set the maximum injection time
to 100 ms for ion accumulation.
7. Subject the eluted peptides to MS/MS events (resolution of
17,500), measured in data-dependent mode for the
10 most abundant peaks (Top10 method), in the high-mass-accuracy
Orbitrap after ion activation/dissociation with Higher
Energy C-trap Dissociation (HCD) at a collision energy of 27 in a
100–1650 m/z mass range.
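The multi-step LC gradient in step 4 can be written as a small table of segments, which makes it easy to check the total run time and the solvent B percentage at any time point. The sketch below encodes the segments exactly as listed in the text (including the hold at 90% B after the ramp to 95% B); it is an illustration only and does not reproduce any instrument method file.

```python
# (duration in min, %B at segment start, %B at segment end), as listed in step 4.
GRADIENT = [
    (90.0, 2.0, 35.0),
    (10.0, 35.0, 95.0),
    (5.0, 90.0, 90.0),
    (15.0, 5.0, 5.0),
]


def total_runtime(segments=GRADIENT) -> float:
    """Total gradient length in minutes."""
    return sum(duration for duration, _, _ in segments)


def percent_b(t_min: float, segments=GRADIENT) -> float:
    """Solvent B (%) at time t_min; a boundary time belongs to the earlier segment."""
    if t_min < 0 or t_min > total_runtime(segments):
        raise ValueError("t_min is outside the gradient")
    elapsed = 0.0
    for duration, start, end in segments:
        if t_min <= elapsed + duration:
            return start + (end - start) * (t_min - elapsed) / duration
        elapsed += duration
    return segments[-1][2]


print(f"Total run time: {total_runtime():.0f} min")
for t in (0, 45, 90, 100, 103, 120):
    print(f"{t:>3} min: {percent_b(t):.1f}% B")
```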
3.5 Data Processing Using MaxQuant Software
1. Export all the raw files and load them into the MaxQuant software (see
Note 14).
2. Download the rice protein database (Osativa_373, 52424
sequences) file from Phytozome and upload it to MaxQuant
as the database file (see Note 15).
3. Search the acquired MS/MS spectra against this database using
the integrated Andromeda search engine.
4. Select and apply the FDR <0.01 for proteins, peptides, and
modifications and select trypsin as a cleavage enzyme, cysteine
carbamidomethylation as a fixed modification, and oxidation of
methionine and acetylation (protein N-term) as variable
modifications.
5. Specify a minimum peptide length of six amino acids and
enable “match between runs” (MBR) with a matching
time window of 0.7 min. Click start and allow MaxQuant
to run.
6. After the MaxQuant run has finished, load the proteinGroups
file (saved automatically in the combined/txt folder)
into Perseus and remove all contaminants,
reverse hits, and proteins identified only by site.
7. Select the protein IDs and load them into MapMan for
functional annotation of the identified proteins.
8. Download the Osa_MSU_v7 mapping file from the MapMan store
and select it for mapping of the rice Phytozome IDs.
9. A typical pie chart of the functional annotation of the identified
plasma membrane proteins from young rice leaves should be
similar to the one presented in Fig. 1c.
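The filtering performed in Perseus in step 6 (removal of contaminants, reverse hits, and proteins identified only by site) can also be sketched in a few lines of Python for a quick check outside Perseus. This is not the authors' workflow. It assumes a tab-delimited table with the flag columns "Potential contaminant", "Reverse", and "Only identified by site" marked with "+", as written by recent MaxQuant versions; older versions may name these columns differently. The toy table and protein IDs are invented for illustration.

```python
import csv
import io

# Assumed MaxQuant flag columns; adjust if your version names them differently.
FLAG_COLUMNS = ("Potential contaminant", "Reverse", "Only identified by site")


def read_protein_groups(handle):
    """Read a tab-delimited proteinGroups-style table into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def filter_protein_groups(rows):
    """Keep rows in which none of the flag columns is marked with '+'."""
    return [
        row for row in rows
        if not any((row.get(col) or "").strip() == "+" for col in FLAG_COLUMNS)
    ]


# Invented toy data standing in for combined/txt/proteinGroups.txt.
toy_table = (
    "Protein IDs\tPotential contaminant\tReverse\tOnly identified by site\n"
    "protein_A\t\t\t\n"
    "CON__example\t+\t\t\n"
    "REV__example\t\t+\t\n"
    "protein_B\t\t\t+\n"
)

rows = read_protein_groups(io.StringIO(toy_table))
kept = filter_protein_groups(rows)
print([row["Protein IDs"] for row in kept])
```

For a real run, the `io.StringIO` object would be replaced by an open file handle on the proteinGroups file.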
4 Notes
1. Sterilize rice seeds with 0.05% Spotak solution (Bayer
CropScience, South Korea) overnight at 4 °C, followed by
five washes with deionized water. Place sterilized seeds on
moist tissue paper and incubate at 28 °C in the dark for germination.
Transfer germinated seeds to sterilized soil in a growth
chamber at 24 ± 1 °C, 70% relative humidity,
and a 16/8 h day/night cycle. Harvest only leaf sheaths of
primary and secondary leaves of 4-week-old rice plants and use
them for protein extraction [9].
2. Use of this kit is recommended as proteins would be dissolved
in a high concentration of urea and other detergents which
interfere with protein quantification by commonly used meth-
ods such as Bradford’s method.
3. Trypsin/Lys-C Mix, Mass Spectrometry grade (Promega, Cat-
alog number V5071) can also be used to increase the efficiency
of protein digestion.
4. Avoid using a water bath during in-solution trypsin digestion,
because digestion is carried out in filter units and water can
leak in and contaminate the sample.
5. These buffers can be stored at −20 °C for months.
6. A stock solution of 1 M TEAB can be purchased from Sigma-
Aldrich (Catalog number T7408).
7. Always prepare these solutions fresh, as some of these solutions
include volatile solvents which may evaporate during storage.
8. Store remaining trypsin solution at −70 °C. Alternatively, trypsin
can be dissolved in 50 mM acetic acid (pH < 3.0)
for long-term storage. At higher pH, trypsin activity is not
inhibited, which results in trypsin autolysis.
9. Lys-C can also be used along with trypsin to increase the
efficiency of protein digestion.
10. The concentration of the peptides can be checked after this
step using Pierce quantitative fluorometric peptide assay kit.
Alternatively, the concentration of the peptides can be roughly
estimated by measuring the absorbance of the digest at
280 nm, assuming that the 1 mg/mL solution would have
1.1 absorbance units. Use only quartz cuvettes and UV–visible
spectrometer for the latter approach.
11. Check the pH of the eluate using a pH strip. If the pH is >3.0, add
more FA until the pH drops below 3.0.
12. Avoid trapping air bubbles when changing solvents,
and do not allow the column to dry between steps.
13. Lyophilized peptides can be stored at −70 °C for months.
14. Proteome Discoverer software can also be used for the database
search.
15. Rice protein database file can also be downloaded from Uni-
Prot. However, in that case, UniProt IDs should be converted
to Phytozome IDs by BLAST analysis. Alternatively, functional
annotation of the identified proteins can be done by KEGG
pathway or Gene Ontology directly using UniProt IDs.
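The rough A280-based estimate described in Note 10 (a 1 mg/mL peptide solution giving about 1.1 absorbance units) amounts to a one-line calculation. The sketch below adds an optional dilution factor and path length, which are assumptions introduced for convenience; the note itself specifies only the 1.1 AU per mg/mL conversion.

```python
def peptide_conc_mg_per_ml(a280: float, dilution_factor: float = 1.0,
                           au_per_mg_ml: float = 1.1, path_cm: float = 1.0) -> float:
    """Rough peptide concentration (mg/mL) from absorbance at 280 nm."""
    if a280 < 0 or dilution_factor <= 0 or path_cm <= 0:
        raise ValueError("invalid absorbance, dilution factor, or path length")
    return a280 / (au_per_mg_ml * path_cm) * dilution_factor


def peptide_amount_ug(conc_mg_per_ml: float, volume_ul: float) -> float:
    """Total peptide (ug) in a given volume; 1 mg/mL equals 1 ug/uL."""
    return conc_mg_per_ml * volume_ul


# Hypothetical reading for illustration only.
conc = peptide_conc_mg_per_ml(0.22)
print(f"Estimated concentration: {conc:.2f} mg/mL")
print(f"Peptide in 40 uL: {peptide_amount_ug(conc, 40):.1f} ug")
```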
Acknowledgments
This work was supported by the National Research Foundation of
Korea through the Basic Research Lab (BRL) program
(2018R1A4A1025158) and (2019R1A2C2085868) provided to
S.T.K.
References
1. Li B, Takahashi D, Kawamura Y et al (2018) Plasma membrane proteomics of Arabidopsis suspension-cultured cells associated with growth phase using nano-LC-MS/MS. In: Plant membrane proteomics. Springer, Cham, pp 185–194
2. Santoni V, Molloy M, Rabilloud T (2000) Membrane proteins and proteomics: un amour impossible? Electrophoresis 21:1054–1070
3. Cordwell SJ, Thingholm TE (2010) Technologies for plasma membrane proteomics. Proteomics 10:611–627
4. Voothuluru P, Anderson JC, Sharp RE et al (2016) Plasma membrane proteomics in the maize primary root growth zone: novel insights into root growth adaptation to water stress. Plant Cell Environ 39:2043–2054
5. Gupta R, Lee SE, Agrawal GK et al (2015) Understanding the plant-pathogen interactions in the context of proteomics-generated apoplastic proteins inventory. Front Plant Sci 6:352
6. Wang Y, Gupta R, Song W et al (2017) Label-free quantitative secretome analysis of Xanthomonas oryzae pv. oryzae highlights the involvement of a novel cysteine protease in its pathogenicity. J Proteome 169:202–214
7. Wang Y, Wu J, Kim SG et al (2016) Magnaporthe oryzae-secreted protein MSP1 induces cell death and elicits defense responses in rice. Mol Plant-Microbe Interact 29:299–312
8. Santoni V (2007) Plant plasma membrane protein extraction and solubilization for proteomic analysis. In: Plant proteomics. Springer, Cham, pp 93–109
9. Meng Q, Gupta R, Min CW et al (2019) A proteomic insight into the MSP1 and flg22 induced signaling in Oryza sativa leaves. J Proteome 196:120–130
Chapter 9
Isolation, Purity Assessment, and Proteomic Analysis of Endoplasmic Reticulum
Xin Wang and Setsuko Komatsu
Abstract
Subcellular proteomics includes, in its experimental workflow, steps aimed at purifying organelles. The purity
of the subcellular fraction should be assessed before mass spectrometry analysis, in order to confidently
conclude the presence of associated specific proteoforms, deepening the knowledge of its biological
function. In this chapter, a protocol for isolating endoplasmic reticulum (ER) and purity assessment is
reported, and it precedes the proteomic analysis through a gel-free/label-free proteomic approach. Dys-
function of quality-control mechanisms of protein metabolism in ER leads to ER stress. Additionally, ER,
which is a calcium-storage organelle, is responsible for signaling and homeostatic function, and calcium
homeostasis is required for plant tolerance. With such predominant cell functions, effective protocols to
fractionate highly purified ER are needed. Here, isolation methods and purity assessments of ER are
described. In addition, a gel-free/label-free proteomic approach of ER is presented.
Key words Endoplasmic reticulum isolation, Endoplasmic reticulum purity, Subcellular proteomics,
Shotgun proteomics
1 Introduction
The endoplasmic reticulum (ER) is a specialized organelle formed
by a continuous membrane system either without (smooth ER) or with
(rough ER) attached ribosomes [1]. The ER lumen is the site
of protein trafficking, modification, and metabolism,
including folding, refolding, degradation, and secretion [2]. The
ER-associated degradation and unfolded protein response pathways
are activated to degrade misfolded or unfolded proteins, which
accumulate in response to ER stress caused by environmental cues
[3]. In addition, the ER is a calcium-storage organelle whose homeostasis
is controlled through the distribution of calcium-handling proteins,
namely calcium-binding proteins, calcium pumps, and
calcium-release channels [4]. Endoplasmic reticulum-associated
degradation was shown to be necessary for plants to overcome salt stress, because
a defective component in the endoplasmic reticulum-associated
118 Xin Wang and Setsuko Komatsu
degradation complex led to alteration of the unfolded protein response and increased plant sensitivity [5]. In addition, calcium released from the ER was involved in elevation of the unfolded protein response [5], and increased cytosolic calcium disturbed the environment required for protein folding in the ER [6]. These reports highlight the vital roles of the ER in protein quality-control mechanisms, calcium homeostasis, and stress tolerance in plants under salt, drought, and flooding conditions.
Because of the important cell functions of the ER, studies of increasing technical complexity and sophistication have been carried out to obtain protein profiles of this organelle [7]. In castor bean, a gel-based proteomic analysis of ER isolated from developing and germinating seeds indicated that proteins involved in protein folding were dominant components of the ER [8]. In rice endosperm, an iTRAQ-based proteomic approach was used to explore metabolic changes induced by ER stress, which results from the accumulation of unfolded or misfolded proteins in the ER; the results suggested that protein processing in the ER and the degradation-related proteasome were the pathways predominantly affected [9]. In barley, glycosylation sites of secreted proteins in gibberellic acid-induced aleurone layers were investigated under ER-stressed conditions through a gel-based proteomic technique [10]. Additionally, both gel-based [11] and gel-free/label-free [6] proteomic approaches were applied to ER isolated from soybean, suggesting that flooding suppressed protein glycosylation and disturbed calcium homeostasis. These studies emphasize the pivotal role of proteomic techniques in investigating ER functions in plants during seed development and under stress conditions.
For subcellular proteomics, it is critical to isolate highly purified ER prior to conducting mass spectrometry (MS) analysis. A number of methods have been developed to enrich the ER fraction, such as continuous iodixanol gradients [12], discontinuous sucrose gradients [13], continuous sucrose gradients [14], and ultracentrifugation combined with [15–17] or without [18] sucrose gradients. Although these methods are available, the whole process is rather complex and time consuming [19]. Simple methods using the Endoplasmic Reticulum Enrichment kit (Novus, Littleton, CO, USA; Catalog Number NBP2-29482) were applied for ER enrichment in soybean with high efficiency [6, 11], even though the kit was developed for animal materials. These findings present various approaches for ER isolation; however, compared with animal materials, ER isolation protocols for plants are rare, and alternative methods need to be developed.
Purity validation of ER is conducted through microscopic, immunoblot, and enzymatic analyses. Electron microscopy has been applied to estimate the integrity and purity of the targeted fraction; however, it depends largely on commercial dyes developed for specific subcellular organelles. To overcome this limitation,
Proteomic Analysis of Endoplasmic Reticulum 119
immunoblot and enzymatic analyses are used as alternative ways to estimate ER purity. Antibodies for immunoblot analysis and marker proteins for enzyme assays have been reviewed as indicators of contamination from cytosol, plasma membrane, nuclei, mitochondria, and chloroplasts [19, 20]. Furthermore, for immunoblot analysis, antibodies against Tlg2p, ALP, and Pep12p were used to assess contamination from Golgi, vacuole, and endosome, respectively, while antibodies against ER-resident proteins such as Sec63p, Dpm1p, Kar2p, Sey1p, and Yop1p were applied as ER-specific markers [18]. These studies provide multiple methods to isolate an intact ER fraction with high purity, and the choice among them can be based on the advantages of the available techniques.
Here, protocols applied for ER enrichment from soybean are
presented [6, 11]. In addition, purity of isolated ER fraction is
estimated through immunoblot and enzymatic analyses. Further-
more, the gel-free/label-free proteomic approach applied to ER is
described.
2 Materials
2.1 Isolation of ER Fraction
1. Isosmotic homogenization buffer: the isosmotic homogenization buffer and protease inhibitor cocktail are provided by the Endoplasmic Reticulum Enrichment kit (Novus, Littleton, CO, USA); the buffer consists of HEPES, sucrose, and KCl. Prepare fresh and keep on ice until use. Mix 1× isosmotic homogenization buffer with 100× protease inhibitor cocktail (see Note 1).
2. ER precipitation solution: 8 mM CaCl2.
2.2 Immunoblot Analysis for ER Purity Assessment
1. Sodium dodecyl sulfate (SDS)-sample buffer: 60 mM Tris–HCl, pH 6.8, 2% SDS, 10% glycerol, and 5% 2-mercaptoethanol.
2. Blocking buffer: 20 mM Tris–HCl, pH 7.5, 500 mM NaCl, and 5% nonfat milk.
3. Washing buffer: 20 mM Tris–HCl, pH 7.5, 137 mM NaCl, and 0.1% Tween-20.
4. Filter paper: four pieces of filter paper (7.5 cm × 10 cm; Thermo Fisher Scientific, San Jose, CA, USA) are used to blot one membrane.
5. Polyvinylidene difluoride membrane (7.0 cm × 8.4 cm; Thermo Fisher Scientific).
6. Semidry transfer blotter: a semidry transfer blotter (Bio-Rad, Hercules, CA, USA) run at a current of 1 mA/cm² for 90 min is used for membrane blotting.
7. Primary antibodies: anti-ascorbate peroxidase [21], anti-calnexin [22], and anti-histone H3 (Abcam, Cambridge, UK).
8. Secondary antibody: goat anti-rabbit IgG conjugated with horseradish peroxidase (Bio-Rad).
9. Chem-Lumi One Super kit (Nacalai Tesque, Kyoto, Japan).
10. LAS-3000 (Fujifilm, Tokyo, Japan).
2.3 Enzymatic Analysis for ER Purity Assessment
1. Extraction buffer: 50 mM HEPES-NaOH, pH 7.5, 1 mM EDTA, 5 mM MgCl2, 2% polyvinylpyrrolidone-40, 1 mM phenylmethylsulfonyl fluoride, 1 mM dithiothreitol, and 0.1% Triton X-100.
2. Reaction buffer for alcohol dehydrogenase assay: 50 mM MES-NaOH, pH 7.5, 5 mM MgCl2, 0.1 mM NADH, 1 mM dithiothreitol, and 4% acetaldehyde.
3. Reaction buffer for glucose-6-phosphate dehydrogenase assay: 55 mM Tris–HCl, pH 7.8, 3.3 mM MgCl2, 0.2 mM NADP, and 3.3 mM glucose 6-phosphate.
4. Reaction buffer for fumarase assay: 70 mM KH2PO4-NaOH, pH 7.7, 50 mM L-malic acid, and 0.05% Triton X-100.
5. Reaction buffer for catalase assay: 50 mM Na2HPO4-KH2PO4, pH 7.0, and 15 mM H2O2.
6. Reaction buffer for NADH cytochrome c reductase assay: 20 mM Na2HPO4-KH2PO4, pH 7.2, 0.2 mM NADH, 0.02 mM cytochrome c, and 6 mM NaN3.
7. Beckman Coulter DU-370 UV/Vis spectrophotometer (Beckman Coulter, CA, USA).
2.4 Concentration Measurement of Proteins
1. Bovine serum albumin: 20 mg/mL bovine serum albumin (Sigma-Aldrich, St. Louis, MO, USA).
2. Dye reagent: Bio-Rad Protein Assay Dye Reagent Concentrate (Bio-Rad) is diluted fivefold before use.
3. Pierce 660 nm Protein Assay Kit (Thermo Fisher Scientific) is used to determine the protein concentration of samples dissolved in SDS-sample buffer. One pack of Ionic Detergent Compatibility Reagent (Thermo Fisher Scientific) is dissolved in 20 mL of Pierce 660 nm Protein Assay Reagent.
2.5 Proteomic Analysis of ER Proteins
1. Lysis solution: 7 M urea, 2 M thiourea, 5% CHAPS, and 2 mM tributylphosphine.
2. Protein enrichment: methanol and chloroform.
3. Protein reduction: 250 mM dithiothreitol in 50 mM ammonium bicarbonate.
4. Protein alkylation: 300 mM iodoacetamide in 50 mM ammonium bicarbonate.
5. Suspension solution for alkylated proteins: 100 mM ammonium bicarbonate.
6. Protein digestion: trypsin (Wako, Tokyo, Japan) is dissolved in the medium provided by the kit to make a 0.1 μg/μL working solution. Lysyl endopeptidase (Wako) is dissolved in 50 mM ammonium bicarbonate to make a 0.1 μg/μL working solution.
7. Peptide acidification: 20% (v/v) formic acid.
8. Peptide desalting: C18-pipette tips (SPE C-TIP, Nikkyo Technos, Tokyo, Japan).
2.6 Search Engine, Software, and Database for Proteomic Analysis
1. Search engine: Mascot search engine (version 2.5.1; Matrix Science, London, UK).
2. Software: Xcalibur software (version 2.0.7; Thermo Fisher Scientific), Proteome Discoverer software (version 1.4.0.288; Thermo Fisher Scientific), SIEVE software (version 2.1.377; Thermo Fisher Scientific), MapMan software (version 3.6.0RC1), SUBA3 (http://suba3.plantenergy.uwa.edu.au/), MultiLoc2 (http://abi.inf.uni-tuebingen.de/Services/MultiLoc2), and WoLF PSORT (http://www.genscript.com/wolf-psort.html).
3. Database: soybean peptide database constructed from soybean genome database (Phytozome version 12; http://www.phytozome.net/soybean) and Gmax_109_peptide (http://mapman.gabipd.org/web/guest/mapman).
3 Methods
3.1 Isolation of Total ER Fraction
The procedure is conducted on ice or at 4 °C in the cold room.
1. Fresh sample is ground in a glass mortar and pestle with isosmotic homogenization buffer (see Note 1).
2. The homogenate is transferred to a Falcon tube and centrifuged at 3000 × g for 10 min at 4 °C.
3. The pellet is collected as Fraction 1. The supernatant is collected and centrifuged at 12,000 × g for 15 min at 4 °C.
4. The pellet is collected as Fraction 2. The supernatant is collected and centrifuged at 90,000 × g for 60 min at 4 °C.
5. The supernatant is discarded and the pellet collected as total ER fraction (Fig. 1).
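The centrifugation steps are specified as relative centrifugal force (× g), whereas many centrifuges are set in rpm. The following Python sketch converts between the two using the standard relation RCF = 1.118 × 10⁻⁵ × r × rpm², with r in cm. The rotor radius used here (8.0 cm) is a hypothetical value for illustration only and should be replaced by the radius given in the rotor manual.

```python
import math

def rcf_to_rpm(rcf_x_g: float, radius_cm: float) -> float:
    """Convert relative centrifugal force (x g) to rotor speed (rpm).

    Uses the standard relation RCF = 1.118e-5 * radius_cm * rpm**2.
    """
    if rcf_x_g <= 0 or radius_cm <= 0:
        raise ValueError("RCF and radius must be positive")
    return math.sqrt(rcf_x_g / (1.118e-5 * radius_cm))

# Spins of the total ER isolation as (RCF in x g, minutes), taken from Fig. 1
TOTAL_ER_SPINS = [(3000, 10), (12000, 15), (90000, 60)]

radius_cm = 8.0  # hypothetical rotor radius; not given in the protocol
for rcf, minutes in TOTAL_ER_SPINS:
    rpm = rcf_to_rpm(rcf, radius_cm)
    print(f"{rcf} x g for {minutes} min -> about {rpm:.0f} rpm")
```

Because the required speed depends on the square root of the radius, the same × g value corresponds to different rpm settings on different rotors.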
3.2 Isolation of Rough ER Fraction
The procedure is conducted on ice or at 4 °C in the cold room.
1. Fresh sample is ground in a glass mortar and pestle with homogenization buffer (see Note 1).
Fig. 1 Procedure of total ER isolation. Fresh sample is ground with homogenization buffer. The homogenate is centrifuged at 3000 × g for 10 min at 4 °C and the pellet is collected as Fraction 1. The supernatant is centrifuged at 12,000 × g for 15 min at 4 °C. The pellet is collected as Fraction 2 and the supernatant is centrifuged at 90,000 × g for 60 min at 4 °C. This final pellet is collected as total ER fraction
2. The homogenate is transferred to a Falcon tube and centrifuged at 1000 × g for 10 min at 4 °C.
3. The pellet is collected as Fraction 1. The supernatant is collected and centrifuged at 10,000 × g for 15 min at 4 °C.
4. The pellet is collected as Fraction 2. The supernatant is collected and centrifuged at 12,000 × g for 15 min at 4 °C.
5. The supernatant is transferred to a glass beaker and kept on ice (see Note 2).
6. Precipitation solution of CaCl2 is added drop by drop to the supernatant while the mixture is stirred for 15 min. Keep the CaCl2 on ice and slow down the stirring speed during the precipitation (see Note 3).
7. The mixture is transferred to a Falcon tube and centrifuged at 8000 × g for 10 min at 4 °C.
8. The supernatant is discarded and the pellet collected as rough ER fraction (Fig. 2).
3.3 Immunoblot Analysis for ER Purity Assessment of ER Fraction
1. Protein preparation: total or rough ER fractions are suspended in SDS-sample buffer for protein solubilization; vigorous vortexing favors dissolution. The mixture is centrifuged at 20,000 × g for 20 min at room temperature. Supernatant is collected and used for immunoblot analysis.
2. Protein concentration is determined by Pierce 660 nm Protein Assay Kit with Ionic Detergent Compatibility Reagent (see Note 4).
Fig. 2 Procedure of rough ER isolation. Fresh sample is ground with homogenizing buffer. The homogenate is centrifuged at 1000 × g for 10 min at 4 °C and the pellet is collected as Fraction 1. The supernatant is collected and centrifuged at 10,000 × g for 15 min at 4 °C. The pellet is collected as Fraction 2 and the supernatant centrifuged at 12,000 × g for 15 min at 4 °C. The supernatant is collected for precipitation using 8 mM CaCl2 and centrifuged at 8000 × g for 10 min at 4 °C. The pellet is collected as rough ER fraction
To determine the concentration of protein samples for immunoblot analysis, a dilution series of bovine serum albumin (0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, and 5.0 mg/mL) is freshly prepared to construct a standard curve. The concentration of the protein samples is calculated from the standard curve.
3. Electrophoresis: protein samples (10 μg) are separated by electrophoresis on a 17% SDS–polyacrylamide gel (SDS-PAGE). Coomassie brilliant blue staining of the proteins applied for electrophoresis serves as a loading control.
4. Blotting: a “sandwich” made of two pieces of filter paper (Bio-Rad), one piece of polyvinylidene difluoride membrane (Thermo Fisher Scientific), the 17% SDS–polyacrylamide gel, and two pieces of filter paper is blotted using a semidry transfer blotter (Bio-Rad) at a current of 1 mA/cm² for 90 min (see Note 5).
5. Membrane blocking: blotted membrane is blocked in blocking buffer overnight at 4 °C.
6. Membrane washing: blocked membrane is washed three times
using washing buffer at room temperature (see Note 6).
7. Reaction with the primary antibody: anti-ascorbate peroxidase, anti-calnexin, and anti-histone H3 antibodies are used as the primary antibodies. Anti-ascorbate peroxidase antibody is diluted 10,000-fold and used to detect the marker protein for cytosol. Anti-calnexin antibody is diluted 5000-fold and used to detect the marker protein for ER. Anti-histone H3 antibody is diluted 9000-fold and used to detect the marker protein for nucleus (see Note 7). The membrane is reacted with the primary antibody for 60 min at room temperature.
8. Membrane washing: reacted membrane is washed three times
as described above.
9. Reaction with the secondary antibody: goat anti-rabbit IgG conjugated with horseradish peroxidase (Bio-Rad) is used as the secondary antibody. The secondary antibody is diluted 3000-fold before use (see Note 7). The membrane is reacted with the secondary antibody for 60 min at room temperature.
10. Membrane washing: reacted membrane is washed three times
using washing buffer at room temperature (see Note 6).
11. Signal detection: signals are detected using a Chem-Lumi One Super kit (Nacalai Tesque, Kyoto, Japan) and visualized with a luminescent image analyzer LAS-3000 (Fujifilm, Tokyo, Japan). The reaction using the Chem-Lumi One Super kit is performed according to the user’s manual. The kit provides Solution A and Solution B. Mix Solution A and Solution B in a one-to-one ratio to prepare the working solution, keep the mixed solution in a 1.5 mL Falcon tube, cover the tube with aluminum foil, and keep it at room temperature until use. Wipe the washing buffer off the membrane with paper towels, and put the membrane on plastic wrap spread on the desk. Cover the membrane completely with working solution, and incubate in the dark for 1 min. Wipe the working solution from the membrane carefully with paper towels and cover the membrane with new plastic wrap for signal detection. Detection using the luminescent image analyzer LAS-3000 is conducted according to the user’s manual. Start the computer connected to the LAS-3000, select the exposure type, set the interval time, put the membrane on the tray in a suitable position, and focus to get the image (see Note 8).
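The standard-curve calculation in step 2 and the 10 μg loading in step 3 can be sketched in Python as follows. This is a minimal illustration that assumes a linear relation between absorbance and bovine serum albumin concentration; the absorbance readings are invented, perfectly linear values used only to show the arithmetic and are not measured data.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def concentration_from_absorbance(absorbance, slope, intercept):
    """Read a sample concentration (mg/mL) off the fitted standard curve."""
    return (absorbance - intercept) / slope

# BSA standards of step 2: 0.5 to 5.0 mg/mL in 0.5 mg/mL increments
bsa_mg_per_ml = [0.5 * i for i in range(1, 11)]
# Hypothetical absorbance readings (illustration only, not measured values)
absorbance = [0.05 + 0.12 * c for c in bsa_mg_per_ml]

slope, intercept = fit_line(bsa_mg_per_ml, absorbance)
sample_conc = concentration_from_absorbance(0.35, slope, intercept)

# 1 mg/mL equals 1 ug/uL, so the volume holding 10 ug is 10 / concentration
load_volume_ul = 10.0 / sample_conc
print(f"sample: {sample_conc:.2f} mg/mL, load {load_volume_ul:.1f} uL for 10 ug")
```

Samples whose absorbance falls outside the range of the standards are better diluted and measured again than extrapolated from the fitted line.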
3.4 Enzymatic Analysis for ER Purity Assessment of ER Fraction
1. Total or rough ER preparations are suspended, for protein solubilization, in the buffer corresponding to each enzyme (Subheading 3.4, steps 4–7); vigorous vortexing favors dissolution.
2. The mixture is sonicated in cold water for 40 min followed by centrifugation at 20,000 × g for 20 min at 4 °C. Supernatant is collected and used for enzymatic analysis.
3. Protein concentration is determined by the Bradford method [23]. Bovine serum albumin (Sigma-Aldrich) is used as the standard protein. A dilution series of bovine serum albumin (0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, and 2.0 mg/mL) is freshly prepared to construct a standard curve. The Bio-Rad Protein Assay Dye Reagent Concentrate (Bio-Rad) is diluted fivefold to prepare the working solution. The concentration of the protein samples is calculated from the standard curve.
4. Activity of alcohol dehydrogenase or glucose-6-phosphate dehydrogenase: protein samples used for enzyme assay are diluted to 1.0 mg/mL using extraction buffer. A volume of 100 μL of protein sample is added into 900 μL of reaction buffer for alcohol dehydrogenase assay or for glucose-6-phosphate dehydrogenase assay. The mixture is immediately mixed by inversion and the reaction is measured for 5 min at 25 °C at 340 nm (extinction coefficient EC340 = 6.23 mM⁻¹ cm⁻¹). The activity of alcohol dehydrogenase [24] or glucose-6-phosphate dehydrogenase [25] is calculated using the following formula: (ΔA340 × total volume × sample dilution factor)/(6.23 × sample volume).
5. Activity of fumarase: protein samples used for fumarase assay are diluted to 1.0 mg/mL using extraction buffer. A volume of 100 μL of protein sample is added into 900 μL of reaction buffer for enzyme assay. The mixture is immediately mixed by inversion and the reaction is measured for 5 min at 25 °C at 340 nm (EC340 = 2.55 mM⁻¹ cm⁻¹). The activity of fumarase is calculated using the formula: (ΔA340 × total volume × sample dilution factor)/(2.55 × sample volume) [26].
6. Activity of catalase: protein sample used for catalase assay is diluted to 1.0 mg/mL using extraction buffer. A volume of 100 μL of protein sample is added into 900 μL of reaction buffer for enzyme assay. The mixture is immediately mixed by inversion and the reaction is measured for 5 min at 25 °C at 240 nm (EC240 = 40 mM⁻¹ cm⁻¹). The activity of catalase is calculated with the formula: (ΔA240 × total volume × sample dilution factor)/(40 × sample volume) [27].
7. Activity of NADH cytochrome c reductase: protein sample used for NADH cytochrome c reductase assay is diluted to 1.0 mg/mL using extraction buffer. A volume of 100 μL of protein sample is added into 900 μL of reaction buffer for enzyme assay. The mixture is immediately mixed by inversion and the reaction is measured for 5 min at 25 °C at 550 nm (EC550 = 21.1 mM⁻¹ cm⁻¹). The activity of NADH cytochrome c reductase is calculated with the formula: (ΔA550 × total volume × sample dilution factor)/(21.1 × sample volume) [28, 29].
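The activity formulas in steps 4–7 share one form and differ only in the extinction coefficient. The Python sketch below applies that form with the coefficients listed above. It assumes that the absorbance change is expressed per minute and that the path length is 1 cm, neither of which is stated explicitly in the protocol; under those assumptions the result is in μmol/min per mL of sample. The example reading is invented for illustration.

```python
# Extinction coefficients (mM^-1 cm^-1) as listed in steps 4-7
EXTINCTION = {
    "alcohol dehydrogenase": 6.23,               # read at 340 nm
    "glucose-6-phosphate dehydrogenase": 6.23,   # read at 340 nm
    "fumarase": 2.55,                            # read at 340 nm
    "catalase": 40.0,                            # read at 240 nm
    "NADH cytochrome c reductase": 21.1,         # read at 550 nm
}

def enzyme_activity(delta_a_per_min, total_volume_ml, dilution_factor,
                    sample_volume_ml, enzyme):
    """(delta A x total volume x dilution factor) / (coefficient x sample volume)."""
    coefficient = EXTINCTION[enzyme]
    return (delta_a_per_min * total_volume_ml * dilution_factor) / (
        coefficient * sample_volume_ml
    )

# 100 uL of sample in 900 uL of reaction buffer gives a 1.0 mL assay volume.
# The absorbance change of 0.0623 per min is a hypothetical reading.
activity = enzyme_activity(
    delta_a_per_min=0.0623,
    total_volume_ml=1.0,
    dilution_factor=1.0,
    sample_volume_ml=0.1,
    enzyme="alcohol dehydrogenase",
)
print(f"{activity:.3f} umol/min per mL of sample")
```

Dividing the result by the protein concentration of the sample (1.0 mg/mL after dilution in these steps) would give a specific activity per mg of protein.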
3.5 Proteomic Analysis of ER Proteins
3.5.1 Preparation of Peptides for Gel-Free/Label-Free Proteomic Analysis
1. ER fraction (Subheading 3.1, step 5 and Subheading 3.2, step 8) is dissolved in lysis solution followed by sonication in cold water for 20 min. The suspension is centrifuged at 20,000 × g for 20 min at 4 °C. The solubilized proteins are retained in the supernatant.
2. Protein concentration is determined as described in Subheading 3.4, step 3.
3. Proteins (100 μg) are added to 400 μL of methanol and mixed thoroughly before adding 100 μL of chloroform and 300 μL of water.
4. Mixed sample is centrifuged at 20,000 × g for 10 min at room temperature to achieve phase separation.
5. Upper aqueous phase is discarded and 300 μL of methanol is slowly added to lower phase.
6. Mixture is centrifuged at 20,000 × g for 10 min at room temperature. Supernatant is discarded and the pellet is allowed to dry at room temperature.
7. Dried pellet is resuspended in 20 μL of 50 mM ammonium bicarbonate.
8. Proteins are reduced with 5 μL of 250 mM dithiothreitol in 50 mM ammonium bicarbonate for 30 min at 56 °C.
9. Proteins are alkylated with 5 μL of 300 mM iodoacetamide in 50 mM ammonium bicarbonate for 30 min at 37 °C in darkness.
10. Alkylated proteins are resuspended in 40 μL of 100 mM ammonium bicarbonate.
11. Proteins are digested with 10 μL of 0.1 μg/μL trypsin (Wako) and 10 μL of 0.1 μg/μL lysyl endopeptidase (Wako) for 16 h at 37 °C.
12. Peptides are acidified with 20 μL of 20% (v/v) formic acid (pH < 3) and centrifuged at 20,000 × g for 10 min at room temperature.
13. Supernatant is collected and acidified peptides are desalted with C18-pipette tips (SPE C-TIP, Nikkyo Technos, Tokyo, Japan).
14. Desalted peptides are subjected to nano-liquid chromatography (LC)-MS/MS analysis.
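The reagent volumes in steps 7–11 imply working concentrations that are easy to check. The Python sketch below computes the dithiothreitol and iodoacetamide concentrations at the moment each reagent is added, and the enzyme-to-protein ratio of the digestion. It assumes that volumes are simply additive and that all 100 μg of protein is recovered after precipitation; both are simplifications made for illustration.

```python
def final_concentration(stock_conc, stock_volume_ul, total_volume_ul):
    """Concentration after diluting a stock into a known total volume."""
    return stock_conc * stock_volume_ul / total_volume_ul

resuspension_ul = 20.0  # step 7: 50 mM ammonium bicarbonate
dtt_ul = 5.0            # step 8: 250 mM dithiothreitol
iaa_ul = 5.0            # step 9: 300 mM iodoacetamide

# Concentrations at the time each reagent is added (volumes assumed additive)
dtt_mM = final_concentration(250.0, dtt_ul, resuspension_ul + dtt_ul)
iaa_mM = final_concentration(300.0, iaa_ul, resuspension_ul + dtt_ul + iaa_ul)

protein_ug = 100.0        # step 3; full recovery is assumed here
trypsin_ug = 10.0 * 0.1   # step 11: 10 uL of 0.1 ug/uL
lysc_ug = 10.0 * 0.1      # step 11: 10 uL of 0.1 ug/uL

print(f"DTT during reduction: {dtt_mM:.0f} mM")
print(f"Iodoacetamide during alkylation: {iaa_mM:.0f} mM")
print(f"Trypsin to protein ratio: 1:{protein_ug / trypsin_ug:.0f} (w/w)")
print(f"Lysyl endopeptidase to protein ratio: 1:{protein_ug / lysc_ug:.0f} (w/w)")
```

If a different amount of protein is processed, scaling the enzyme volumes in proportion keeps the same enzyme-to-protein ratio.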
3.5.2 Mass Spectrometry Analysis
Peptides are separated using an Ultimate 3000 nanoLC system (Dionex, Germering, Germany) equipped with a C18 PepMap trap column (300 μm ID × 5 mm; Dionex) equilibrated with 0.1% formic acid and eluted with a linear acetonitrile gradient (8–30% over 150 min) in 0.1% formic acid at a flow rate of 200 nL/min on a C18 Tip column (75 μm ID × 120 mm; Nikkyo Technos) with a spray voltage of 1.8 kV. Peptide ions are detected using a nanospray LTQ Orbitrap Discovery MS (Thermo Fisher Scientific) in data-dependent acquisition mode with Xcalibur software (version 2.0.7; Thermo Fisher Scientific). Full-scan mass spectra are acquired over 400–1500 m/z with a resolution of 30,000. A lock mass function is used to obtain high mass accuracy. Ions of C24H39O4+ (m/z 391.28429), C14H46NO7Si7+ (m/z 536.16536), and C16H52NO8Si8+ (m/z 610.18416) are used as lock mass standards [30]. The ten most intense precursor ions are selected for collision-induced fragmentation in the linear ion trap at a normalized collision energy of 35%. Dynamic exclusion is employed within 90 s to prevent repetitive selection of peptides [31].
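The idea behind a lock mass can be illustrated with a short Python sketch: the deviation of an observed lock mass ion from its theoretical m/z is expressed in ppm, and other m/z values are rescaled accordingly. The instrument software performs this correction internally; the single-point multiplicative rescaling shown here is a simplification assumed for illustration, and the observed value is invented.

```python
# Lock mass standards listed in the text (theoretical m/z)
LOCK_MASSES = {
    "C24H39O4+": 391.28429,
    "C14H46NO7Si7+": 536.16536,
    "C16H52NO8Si8+": 610.18416,
}

def ppm_error(observed_mz, theoretical_mz):
    """Mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6

def recalibrate(mz, observed_lock_mz, theoretical_lock_mz):
    """Rescale an m/z value by the ratio theoretical/observed of the lock mass.

    A one-point multiplicative correction; assumed here for illustration.
    """
    return mz * theoretical_lock_mz / observed_lock_mz

theoretical = LOCK_MASSES["C24H39O4+"]
observed = theoretical * (1 + 5e-6)  # hypothetical reading, 5 ppm too high

print(f"lock mass error: {ppm_error(observed, theoretical):.2f} ppm")
print(f"corrected m/z: {recalibrate(800.0040, observed, theoretical):.4f}")
```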
3.5.3 Protein Identification from Acquired Mass Spectrometry Data
Protein identification is conducted using the Mascot search engine (version 2.5.1; Matrix Science, London, UK) with a soybean peptide database constructed from the soybean genome database (Phytozome version 12; http://www.phytozome.net/soybean) [32]. Acquired raw files are processed using Proteome Discoverer software (version 1.4.0.288; Thermo Fisher Scientific). Parameters set in the Mascot search engine are as follows: carbamidomethylation of cysteine is the fixed modification; oxidation of methionine is the variable modification; trypsin is the specific proteolytic enzyme; one missed cleavage is allowed; peptide mass tolerance is 10 ppm; fragment mass tolerance is 0.8 Da; and peptide charge is set at +2, +3, and +4. Peptide cutoff score is 10, and S/N threshold (FT-only) is set at 1.5 for peak filtration. An automatic decoy database search is performed as part of the search. Mascot percolator is used to improve the accuracy and sensitivity of peptide identification [33]. False discovery rates for peptide identification of all searches are less than 0.01. Peptides with a percolator ion score of more than 13 (p < 0.05) are used for protein identification.
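The acceptance criteria above can be written down as a small filter. In the Python sketch below, the settings are transcribed from the text, but the dictionary keys, the `PeptideMatch` class, and the example peptide are hypothetical names chosen for illustration; they are not Mascot's own parameter names or output format.

```python
from dataclasses import dataclass

# Settings transcribed from the text; key names are illustrative only
SEARCH_SETTINGS = {
    "fixed_modification": "carbamidomethylation of cysteine",
    "variable_modification": "oxidation of methionine",
    "enzyme": "trypsin",
    "missed_cleavages": 1,
    "peptide_mass_tolerance_ppm": 10.0,
    "fragment_mass_tolerance_da": 0.8,
    "peptide_charges": (2, 3, 4),
    "min_percolator_ion_score": 13.0,
}

@dataclass
class PeptideMatch:
    sequence: str
    observed_mass: float     # Da
    theoretical_mass: float  # Da
    charge: int
    ion_score: float

def mass_error_ppm(match):
    """Precursor mass error of a match in parts per million."""
    return (match.observed_mass - match.theoretical_mass) / match.theoretical_mass * 1e6

def accept(match, settings=SEARCH_SETTINGS):
    """Keep matches within mass tolerance, with an allowed charge and score above 13."""
    return (
        abs(mass_error_ppm(match)) <= settings["peptide_mass_tolerance_ppm"]
        and match.charge in settings["peptide_charges"]
        and match.ion_score > settings["min_percolator_ion_score"]
    )

good = PeptideMatch("PEPTIDEK", 1000.005, 1000.000, 2, 25.0)  # hypothetical match
print(accept(good))
```

A filter of this kind does not replace the decoy-based false discovery rate estimate; it only mirrors the thresholds stated in the text.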
3.5.4 Analysis of Relative Protein Abundance Using Acquired Mass Spectrometry Data
Acquired Mascot results are exported into SIEVE software (version 2.1.377; Thermo Fisher Scientific) for quantitative analysis between the control and experimental groups. Chromatographic peaks detected by MS are aligned, and peptide peaks are detected as frames on all parent ions scanned by MS/MS using a frame time width of 5 min and a frame m/z width of 10 ppm. Areas of the chromatographic peaks within a frame are compared for each sample, and ratios between samples are determined for each frame. Frames with an MS/MS scan are matched to Mascot results. Peptide ratios between samples are determined from the variance-weighted average of the ratios in frames whose MS/MS spectra match the peptides. Ratios of peptides are further integrated to determine ratios of the corresponding proteins. Total ion current is used for normalization in the differential analysis of protein abundance. Outlier ratios are removed in the frame table filter based on frame area. The minimum requirement for protein identification is two matched peptides. Isoforms are removed manually according to protein ID. Significant changes in relative protein abundance between the control and experimental groups are analyzed (p < 0.05).
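The way frame ratios roll up into peptide and protein ratios can be sketched in Python. The sketch assumes that "variance-weighted" means inverse-variance weighting and that peptide ratios are combined into a protein ratio by a plain mean; the internal calculations of SIEVE are not described in the text, so both choices are assumptions, and the numbers are invented.

```python
def inverse_variance_weighted_mean(values, variances):
    """Weighted mean in which each value is weighted by 1 / variance."""
    if not values or len(values) != len(variances):
        raise ValueError("values and variances must be non-empty and of equal length")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    weights = [1.0 / v for v in variances]
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

def protein_ratio(peptide_ratios, min_peptides=2):
    """Mean of peptide ratios; None when fewer than two peptides are matched."""
    if len(peptide_ratios) < min_peptides:
        return None
    return sum(peptide_ratios) / len(peptide_ratios)

# Hypothetical frames matched to one peptide: ratios and their variances
frame_ratios = [2.0, 4.0]
frame_variances = [1.0, 3.0]
peptide_a = inverse_variance_weighted_mean(frame_ratios, frame_variances)

print(f"peptide ratio: {peptide_a:.2f}")
print(f"protein ratio: {protein_ratio([peptide_a, 3.5])}")
print(f"single-peptide protein: {protein_ratio([peptide_a])}")
```

The frame with the smaller variance pulls the peptide ratio toward its own value, which is the purpose of weighting by variance.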
3.5.5 Analysis of Absolute Protein Amount Using Acquired Mass Spectrometry Data
XML files exported from Mascot are used to analyze absolute protein abundance. The exponentially modified protein abundance index (emPAI) is used to indicate absolute protein amount. The emPAI value of each identified protein is divided by the sum of the emPAI values of all identified proteins and multiplied by 100. The absolute protein amount is thus expressed as a molar percentage (mol %) [34].
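The mol % calculation can be sketched in Python as follows. The emPAI definition used here, 10 raised to the ratio of observed to observable peptides minus 1, follows Ishihama et al. [34]; the protein identifiers and peptide counts are hypothetical and serve only to show the arithmetic.

```python
def empai(n_observed, n_observable):
    """emPAI = 10 ** (observed peptides / observable peptides) - 1."""
    if n_observable <= 0:
        raise ValueError("number of observable peptides must be positive")
    return 10 ** (n_observed / n_observable) - 1

def mol_percent(empai_by_protein):
    """Divide each emPAI by the sum over all proteins and multiply by 100."""
    total = sum(empai_by_protein.values())
    if total <= 0:
        raise ValueError("sum of emPAI values must be positive")
    return {protein: value / total * 100.0 for protein, value in empai_by_protein.items()}

# Hypothetical emPAI values for two identified proteins
example = {"protein_A": 1.0, "protein_B": 3.0}
for protein, percent in mol_percent(example).items():
    print(f"{protein}: {percent:.1f} mol %")
```

Because mol % is normalized to the proteins identified in a given run, values are comparable only between samples analyzed with the same identification criteria.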
3.5.6 Visualization of Protein Abundance
Visualization of protein abundance is performed using MapMan software (version 3.6.0RC1). Software and mapping files of Gmax_109_peptide are downloaded from the MapMan website (http://mapman.gabipd.org/web/guest/mapman) [35].
3.5.7 Protein Localization Prediction
Protein localization is predicted using the intracellular targeting prediction programs SUBA3 (http://suba3.plantenergy.uwa.edu.au/) [36], MultiLoc2 (http://abi.inf.uni-tuebingen.de/Services/MultiLoc2) [37], and WoLF PSORT (http://www.genscript.com/wolf-psort.html) [38].
4 Notes
1. A 1.0 g portion of fresh sample is used for ER enrichment from plants; this amount works well for soybean root tips. Grinding buffer consisting of 4 mL of 1× isosmotic homogenization buffer combined with 40 μL of 100× protease inhibitor cocktail is used to grind the fresh sample. Grinding buffer is freshly prepared each time for ER enrichment.
2. The volume of supernatant used for precipitation is recorded. Based on this record, the volume of CaCl2 added is 15 times that of the supernatant used for precipitation.
3. A Pasteur pipette is used to add CaCl2 drop by drop. A small stirring bar is recommended. Precipitation time can be optimized according to the volume of CaCl2.
4. Vortex vigorously to help the Ionic Detergent Compatibility Reagent dissolve in the Pierce 660 nm Protein Assay Reagent. Protect the mixture from light using aluminum foil.
5. Air bubbles between the membrane and filter paper are removed. The order of the “sandwich” is “anode–filter paper–membrane–gel–filter paper–cathode,” from bottom to top.
6. Three minutes is recommended for each membrane wash. Washing time can be optimized based on the intensity of the detected signals.
7. The dilution ratio is optimized based on the purity of the antibody. A high dilution ratio is recommended for highly purified antibodies.
8. It is better to carry out detection as soon as possible after incubation with working solution. To obtain a suitable signal intensity, incubation time, exposure time, and the concentration of proteins or antibodies can be optimized.
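The volumes in Notes 1 and 2 scale with the amount of buffer and supernatant handled. The Python sketch below reproduces the two calculations; the function names are hypothetical, and the 2 mL supernatant volume in the example is an invented value.

```python
def inhibitor_volume_ul(buffer_ml, stock_fold=100.0):
    """Volume of protease inhibitor stock (uL) for a given volume of 1x buffer.

    Follows the ratio in Note 1: 40 uL of 100x stock per 4 mL of buffer.
    """
    if buffer_ml <= 0 or stock_fold <= 0:
        raise ValueError("buffer volume and stock concentration must be positive")
    return buffer_ml * 1000.0 / stock_fold

def cacl2_volume_ml(supernatant_ml, fold=15.0):
    """Volume of 8 mM CaCl2 (mL), taken as 15 times the supernatant volume (Note 2)."""
    if supernatant_ml <= 0:
        raise ValueError("supernatant volume must be positive")
    return supernatant_ml * fold

print(f"{inhibitor_volume_ul(4.0):.0f} uL of 100x inhibitor for 4 mL of buffer")
print(f"{cacl2_volume_ml(2.0):.0f} mL of CaCl2 for 2 mL of supernatant")
```

The CaCl2 volume is the larger of the two and determines the size of the glass beaker needed for the precipitation step.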
Acknowledgments
This work was supported by JSPS KAKENHI Grant Number 15H04445.
References
1. Healy SJ, Verfaillie T, Jäger R et al (2012) Biology of the endoplasmic reticulum. In: Agostinis P, Samali A (eds) Endoplasmic reticulum stress in health and disease. Springer, Dordrecht, pp 3–22
2. Kleizen B, Braakman I (2004) Protein folding and quality control in the endoplasmic reticulum. Curr Opin Cell Biol 16:343–349
3. Howell SH (2013) Endoplasmic reticulum stress responses in plants. Annu Rev Plant Biol 64:477–499
4. Papp S, Dziak E, Michalak M, Opas M (2003) Is all of the endoplasmic reticulum created equal? The effects of the heterogeneous distribution of endoplasmic reticulum Ca2+-handling proteins. J Cell Biol 160:475–479
5. Liu L, Cui F, Li Q et al (2011) The endoplasmic reticulum-associated degradation is necessary for plant salt tolerance. Cell Res 21:957–969
6. Wang X, Komatsu S (2016) Gel-free/label-free proteomic analysis of endoplasmic reticulum proteins in soybean root tips under flooding and drought stresses. J Proteome Res 15:2211–2227
7. Chen X, Karnovsky A, Sans MD et al (2010) Molecular characterization of the endoplasmic reticulum: insights from proteomic studies. Proteomics 10:4040–4052
8. Maltman DJ, Gadd SM, Simon WJ et al (2007) Differential proteomic analysis of the endoplasmic reticulum from developing and germinating seeds of castor (Ricinus communis) identifies seed protein precursors as significant components of the endoplasmic reticulum. Proteomics 7:1513–1528
9. Qian D, Tian L, Qu L (2015) Proteomic analysis of endoplasmic reticulum stress responses in rice seeds. Sci Rep 5:14255
10. Barba-Espín G, Dedvisitsakul P, Hägglund P et al (2014) Gibberellic acid-induced aleurone layers responding to heat shock or tunicamycin provide insight into the N-glycoproteome, protein secretion, and endoplasmic reticulum stress. Plant Physiol 164:951–965
11. Komatsu S, Kuji R, Nanjo Y et al (2012) Comprehensive analysis of endoplasmic reticulum-enriched fraction in root tips of soybean under flooding stress using proteomics techniques. J Proteome 77:531–560
12. Graham JM (2002) Fractionation of Golgi, endoplasmic reticulum, and plasma membrane from cultured cells in a preformed continuous iodixanol gradient. Sci World J 2:1435–1439
13. Williamson CD, Wong DS, Bozidis P et al (2015) Isolation of endoplasmic reticulum, mitochondria, and mitochondria-associated membrane and detergent resistant membrane fractions from transfected cells and from human cytomegalovirus-infected primary fibroblasts. Curr Protoc Cell Biol 68:3.27.1–3.27.33
14. Shore GC, Tata JR (1977) Two fractions of rough endoplasmic reticulum from rat liver. I. Recovery of rapidly sedimenting endoplasmic reticulum in association with mitochondria. J Cell Biol 2:714–725
15. Coughlan SJ, Hastings C, Winfrey RJ Jr (1996) Molecular characterisation of plant endoplasmic reticulum. Identification of protein disulfide-isomerase as the major reticuloplasmin. Eur J Biochem 235:215–224
16. Maltman DJ, Simon WJ, Wheeler CH et al (2002) Proteomic analysis of the endoplasmic reticulum from developing and germinating seed of castor (Ricinus communis). Electrophoresis 23:626–639
17. Chanat E, Le Parc A, Lahouassa H et al (2016) Isolation of endoplasmic reticulum fractions from mammary epithelial tissue. J Mammary Gland Biol Neoplasia 21:1–8
18. Wang X, Li S, Wang H et al (2017) Quantitative proteomics reveal proteins enriched in tubular endoplasmic reticulum of Saccharomyces cerevisiae. eLife 6:e23816
19. Komatsu S, Hashiguchi A (2018) Subcellular proteomics: application to elucidation of flooding-response mechanisms in soybean. Proteomes 6:E13
20. Wang X, Komatsu S (2016) Plant subcellular proteomics: application for exploring optimal cell function in soybean. J Proteome 143:45–56
21. Komatsu S, Yamamoto A, Nakamura T et al (2011) Comprehensive analysis of mitochondria in roots and hypocotyls of soybean under flooding stress using proteomics and metabolomics techniques. J Proteome Res 10:3993–4004
22. Nouri MZ, Komatsu S (2010) Comparative analysis of soybean plasma membrane proteins under osmotic stress using gel-based and LC MS/MS-based proteomics approaches. Proteomics 10:1930–1945
23. Bradford MM (1976) A rapid and sensitive method for the quantitation of microgram quantities of protein utilizing the principle of protein-dye binding. Anal Biochem 72:248–254
24. Komatsu S, Nanjo Y, Nishimura M (2013) Proteomic analysis of the flooding tolerance mechanism in mutant soybean. J Proteome 79:231–250
25. Honjoh K, Mimura A, Kuroiwa E et al (2003) Purification and characterization of two isoforms of glucose 6-phosphate dehydrogenase (G6PDH) from Chlorella vulgaris C-27. Biosci Biotechnol Biochem 67:1888–1896
26. Huang S, Jacoby RP, Millar AH et al (2014) Plant mitochondrial proteomics. In: Jorrin Novo JV, Komatsu S, Weckwerth W, Wienkoop S (eds) Plant proteomics: methods and protocols. Springer, New York, pp 499–526
27. Kato M, Shimizu S (1987) Chlorophyll metabolism in higher plants. VII. Chlorophyll degradation in senescing tobacco leaves; phenolic-dependent peroxidative degradation. Botany 65:729–735
28. Hasinoff BB (1990) Inhibition and inactivation of NADH-cytochrome c reductase activity of bovine heart submitochondrial particles by the iron(III)-adriamycin complex. Biochem J 265:865–870
29. Gomez L, Chrispeels MJ (1994) Complementation of an Arabidopsis thaliana mutant that lacks complex asparagine-linked glycans with the human cDNA encoding N-acetylglucosaminyltransferase I. Proc Natl Acad Sci U S A 91:1829–1833
30. Olsen JV, de Godoy LM, Li G et al (2005) Parts per million mass accuracy on an Orbitrap mass spectrometer via lock mass injection into a C-trap. Mol Cell Proteomics 4:2010–2021
31. Zhang Y, Wen Z, Washburn MP et al (2009) Effect of dynamic exclusion duration on spectral count based quantitative proteomics. Anal Chem 81:6317–6326
32. Schmutz J, Cannon SB, Schlueter J et al (2010) Genome sequence of the palaeopolyploid soybean. Nature 463:178–183
33. Brosch M, Yu L, Hubbard T et al (2008) Accurate and sensitive peptide identification with Mascot Percolator. J Proteome Res 8:3176–3181
34. Ishihama Y, Oda Y, Tabata T et al (2005) Exponentially modified protein abundance index (emPAI) for estimation of absolute protein amount in proteomics by the number of sequenced peptides per protein. Mol Cell Proteomics 4:1265–1272
35. Usadel B, Poree F, Nagel A et al (2009) A guide to using MapMan to visualize and compare Omics data in plants: a case study in the crop species, maize. Plant Cell Environ 32:1211–1229
36. Tanz SK, Castleden I, Hooper CM et al (2013) SUBA3: a database for integrating experimentation and prediction to define the SUBcellular location of proteins in Arabidopsis. Nucleic Acids Res 41:D1185–D1191
37. Blum T, Briesemeister S, Kohlbacher O (2009) MultiLoc2: integrating phylogeny and gene ontology terms improves subcellular protein localization prediction. BMC Bioinformatics 10:274
38. Horton P, Park KJ, Obayashi T et al (2007) WoLF PSORT: protein localization predictor. Nucleic Acids Res 35:W585–W587
Chapter 10
Dimethyl Labeling-Based Quantitative Proteomics of Recalcitrant Cocoa Pod Tissue
Yoel Esteve-Sánchez, Jaime A. Morante-Carriel,
Ascensión Martínez-Márquez, Susana Sellés-Marchart,
and Roque Bru-Martinez
Abstract
Dimethyl labeling is a type of stable-isotope labeling that creates isotopic variants of peptides
and can thus be used for quantitative proteomics experiments. Labeling is achieved through a
reductive amination/alkylation reaction using the low-cost reagents formaldehyde and
cyanoborohydride, resulting in dimethylation of the free amine groups of Lys and N-termini. The
availability of isotopomeric forms of these reagents allows for the generation of up to six
different isotopic variants. Here we describe the application of dimethylation to create two
isotopic variants, light and heavy, differing by 4 Da, to label the total tryptic digest peptides
of cocoa pod tissue extracted from healthy pods of cultivars susceptible and resistant to the
fungal disease called “frosty pod,” caused by Moniliophthora roreri.
Key words Dimethyl labeling, Stable-isotope labeling, Quantitative proteomics, Plant proteomics,
Cocoa pod, Moniliophthora roreri, Fungal disease
1 Introduction
Stable-isotope labeling of proteins and peptides has become a
popular approach in quantitative proteomics, as mass spectrometry
(MS) can distinguish between isotopic variants. Each isotopic
variant can be associated with a type of sample, and the differently
labeled samples can be pooled and subjected to a single run of liquid
chromatography (LC) separation coupled to tandem mass
spectrometry (MS/MS). The mass intensity ratio of isotopic variant
pairs (and thus between samples) is used as a relative abundance
measurement, while the MS/MS spectra of the fragmented peptide
ions are used for protein identification, as in a standard shotgun
proteomics identification experiment.
Currently, there is a plethora of labeling strategies and methods
for quantitative proteomics. Broadly, the methods fall into two
groups according to the signal used for quantification.
The first group uses the intensity of the parent peptide ions, so the
isotopic variants are resolved at the MS scan level. In the second
group, parent peptide ions are isobaric and thus indistinguishable at
the MS level, but as they bear different combinations of isotopes,
each variant releases a characteristic reporter ion upon
fragmentation of the isobaric parent ion, and quantification is
performed at the MS/MS level. A major concern with using the mass
intensity signal for quantification has been the retention time shift
between isotopic variants that may occur during chromatographic
separation, particularly when variants are created by substituting
deuterium for hydrogen [1]. The issue is resolved by measuring the
ratio of the peak areas of the extracted ion chromatograms of the
isotopic variant pairs, rather than the mass intensity ratio in a
single MS scan [2].
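The effect of a retention time shift on the two kinds of ratio can be illustrated with a small simulation. This is only a sketch with made-up numbers: the Gaussian peak shape, the 0.05 min shift, the peak width, and the function names are all assumptions chosen for illustration, not values from this chapter.

```python
import math

def gaussian_xic(times, apex, height, sigma=0.1):
    """Simulated extracted ion chromatogram: one Gaussian peak sampled at `times`."""
    return [height * math.exp(-((t - apex) ** 2) / (2 * sigma ** 2)) for t in times]

def peak_area(times, intensities):
    """Trapezoidal integration of an extracted ion chromatogram."""
    return sum(
        (t1 - t0) * (i0 + i1) / 2
        for t0, t1, i0, i1 in zip(times, times[1:], intensities, intensities[1:])
    )

# Hypothetical pair: the heavy variant is twice as abundant as the light one
# but elutes 0.05 min earlier.
times = [20 + 0.01 * k for k in range(201)]  # 20.00 to 22.00 min
light = gaussian_xic(times, apex=21.00, height=1000.0)
heavy = gaussian_xic(times, apex=20.95, height=2000.0)

def scan_ratio(t):
    """Heavy/light intensity ratio in the single scan closest to time t."""
    k = min(range(len(times)), key=lambda i: abs(times[i] - t))
    return heavy[k] / light[k]

area_ratio = peak_area(times, heavy) / peak_area(times, light)
print(f"peak-area ratio:         {area_ratio:.3f}")
print(f"scan ratio at 20.85 min: {scan_ratio(20.85):.3f}")
print(f"scan ratio at 21.15 min: {scan_ratio(21.15):.3f}")
```

In this toy case the peak-area ratio recovers the true twofold difference, whereas the ratio read from a single scan depends strongly on where in the elution profile the scan falls.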
The most popular labeling strategies are metabolic and chemical.
Metabolic labeling [3, 4] relies on the incorporation of isotopic
variants of amino acids (e.g., 13C6-Arg vs 12C6-Arg, or uniform
incorporation of 15N vs 14N) during protein synthesis; thus, the
strategy is mainly applicable to cell cultures. The metabolic variants
so generated can be resolved at the MS level. Chemical labeling
consists of the covalent binding, to reactive groups of proteins or
peptides (typically -SH in Cys; -COOH in Asp, Glu, and C-termini;
-NH2 in Lys and N-termini), of chemical groups of which a number of
isotopic variants are available. There are tens of stable-isotope
chemical reagents, most of which are designed to attach to free amine
or carboxyl groups depending on their chemistry (for a recent review
see [5]), thus ensuring that each peptide generated by enzymatic
digestion of proteins can be labeled. For chemical labeling there are
both isobaric reagents [6–8] that release reporter ions at the MS/MS
level (e.g., iTRAQ, available as fourplex and eightplex; TMT,
available as duplex, sixplex, and tenplex) and isotopomeric reagents
distinguishable at the MS level (e.g., ICPL, available as fourplex
[9]), which are incorporated into peptides through a direct
nucleophilic substitution reaction.
Dimethyl labeling [15] is a chemical labeling method that produces
isotopic variants; it adds two methyl groups to the free amino groups
of Lys and N-termini by a reductive amination/alkylation reaction.
Although the labeling method is applicable to peptide digests obtained
with virtually any protease used in proteomics, trypsin digestion is a
good choice: it is more robust and cheaper, and a high proportion of
peptides will have Lys at the C-terminus and thus carry two dimethyl
labels. The exception occurs when Pro is the N-terminal residue of the
peptide to be labeled. In that case only one methyl group is
incorporated, as the N-terminus is a secondary amine. The use of
trypsin as the digesting enzyme precludes the generation of such
peptides. The labeling reaction uses formaldehyde as the carbon donor
for alkylation and cyanoborohydride as the reducing agent of the imine
intermediate. For each methyl group, the carbon and two hydrogens come
from formaldehyde and the third hydrogen from the reducing agent, as
depicted in Fig. 1; thus, by combining isotopomers of both reagents, a
number of isotopic dimethyl variants can be generated. There are three
commercial isotopomers of formaldehyde, CH2O, CD2O, and 13CD2O, and
two of cyanoborohydride, NaBH3CN and NaBD3CN, useful for dimethyl
labeling. Combinations of these reagents lead to six isotopic variants
that bring about an increase in nominal mass per –NH2 site of 28, 30,
32, 34, 34, and 36 Da. The two +34 variants are in fact
pseudo-isobaric, differing by 0.00584 Da [10]. This allows different
multiplex methods for quantitative proteomics to be designed depending
on the reagent combinations used, including the most commonly used
duplex (+28, +32), triplex (+28, +32, +36) [11], fourplex (+28, +30,
+32, +34) [12], and fiveplex (+28, +30, +32, +34, +36) [13]. This
multiplexing ability of dimethyl labeling is comparable to that of the
aforementioned chemical methods. Furthermore, its much lower cost per
sample (as low as €0.10 for 25 × 3 μg protein triplex labeling [14],
and much less for 25 × 2 μg duplex labeling) makes this strategy
highly cost-effective compared with other labeling options.
Fig. 1 Dimethyl labeling reaction in a primary amine. Reductive amination/
alkylation with formaldehyde as donor of the carbon and two hydrogens
(in red) and sodium cyanoborohydride as donor of the third hydrogen (in green)
of the linked methyl groups. There are three commercial isotopomers of
formaldehyde, CH2O, CD2O, and 13CD2O, and two of cyanoborohydride,
NaBH3CN and NaBD3CN, useful for dimethyl labeling. Combinations of these
reagents yield up to six isotopic variants that bring about an increase in
nominal mass per –NH2 site of 28, 30, 32, 34, 34, and 36 Da
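The mass increments quoted for the six reagent combinations follow directly from the atom bookkeeping of the reaction, and can be checked with a few lines of Python. The sketch below is illustrative only: the function and dictionary names are invented here, and the monoisotopic masses are standard isotope values rather than figures taken from this chapter.

```python
from itertools import product

# Monoisotopic and nominal masses (Da) of the isotopes involved.
MASS = {"H": 1.00782503, "D": 2.01410178, "C": 12.0, "13C": 13.00335484}
NOMINAL = {"H": 1, "D": 2, "C": 12, "13C": 13}

# (carbon isotope, hydrogen isotope) donated by each formaldehyde isotopomer
FORMALDEHYDE = {"CH2O": ("C", "H"), "CD2O": ("C", "D"), "13CD2O": ("13C", "D")}
# hydrogen isotope donated by each cyanoborohydride isotopomer
REDUCTANT = {"NaBH3CN": "H", "NaBD3CN": "D"}

def dimethyl_shift(formaldehyde, reductant, table):
    """Mass added to one primary amine by two methyl groups.

    Each methyl group brings one carbon and two hydrogens from formaldehyde
    and one hydrogen from the reductant, and replaces one amine hydrogen.
    """
    carbon, h_form = FORMALDEHYDE[formaldehyde]
    h_red = REDUCTANT[reductant]
    per_methyl = table[carbon] + 2 * table[h_form] + table[h_red] - table["H"]
    return 2 * per_methyl

for f, r in product(FORMALDEHYDE, REDUCTANT):
    print(f"{f:>7} + {r}: +{dimethyl_shift(f, r, NOMINAL)} Da nominal, "
          f"+{dimethyl_shift(f, r, MASS):.5f} Da monoisotopic")
```

The nominal values reproduce the 28, 30, 32, 34, 34, and 36 Da series, the two +34 variants differ by about 0.0058 Da, and the light/heavy duplex pair (CH2O vs CD2O, both with NaBH3CN) differs by about 4.0251 Da per labeled amine.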
Dimethyl labeling has been shown to be compatible with the major
chromatographic methods used in proteomics [15–18] and, except
for extremely large peptides bearing a single dimethyl label, there is
no significant overlap of isotopic envelopes between isotopic
variants that might affect quantitation accuracy [19].
Dimethyl labeling has been widely applied in global protein
abundance profiling (the search for human disease biomarkers, basic
research on cellular pathways in disease models) and in quantitative
PTM analysis, including protein phosphorylation and glycosylation
(reviewed in [20]). In microorganisms, dimethyl labeling has been
applied rather scarcely: to basic research with the models E. coli
[21] and yeast [22], and to study the performance of microbial
polysaccharide substrate degradation for biofuel production [23]. In
plants, to the best of our knowledge, dimethyl labeling has not yet
been applied to global protein abundance profiling.
Cocoa pod is a highly recalcitrant tissue for protein extraction,
and exhaustive tissue washing is necessary to extract protein of good
quality and in sufficient quantity [24]. The pod is affected by a
major fungal disease known as “frosty pod” [25], caused by
Moniliophthora roreri, which leads to large economic losses for
producers in tropical regions of the Americas and affects the economy
of a large number of small farms [26]. Here we describe a method that
applies dimethyl labeling to carry out global quantitative proteomic
profiling of cocoa pods with different susceptibility phenotypes to
this fungal disease.
2 Materials
2.1 Sample Preparation and Labeling
1. Liquid nitrogen.
2. 20% (w/v) trichloroacetic acid (TCA) in acetone (stored at −20 °C).
3. 20% (w/v) TCA in water (stored at 4 °C).
4. Tris-saturated phenol, pH 8.0.
5. SDS buffer: 30% (w/v) sucrose, 2% (w/v) SDS, 0.1 M Tris–HCl, pH 8.0, and 5% (w/v) 2-mercaptoethanol.
6. 0.1 M ammonium acetate in methanol.
7. 2% (w/v) sodium deoxycholate (DOC) (stored at 4 °C).
8. 24% (w/v) TCA in water (stored at 4 °C).
9. 80% (v/v) acetone (stored at −20 °C).
10. 6 M urea.
11. 1 M stock solution of triethylammonium bicarbonate (TEAB).
12. 0.2 M dithiothreitol (DTT) in 25 mM TEAB. Prepare fresh before use.
13. 0.2 M iodoacetamide (IAM) in 25 mM TEAB.
14. Trypsin, mass spectrometry grade.
15. SpeedVac.
16. 4% (v/v) formaldehyde (CH2O).
17. 4% (v/v) formaldehyde (CD2O).
18. 0.6 M sodium cyanoborohydride (NaBH3CN).
19. 1% (v/v) ammonium (NH4+).
20. 5% (v/v) formic acid (FA).
21. PepClean™ C18 spin columns (Thermo Scientific).
22. Activation Solution: 50% (v/v) acetonitrile (ACN). 400 μl per sample.
23. Equilibration Solution: 0.5% (w/v) trifluoroacetic acid (TFA) in 5% (v/v) ACN. 400 μl per sample.
24. Sample Buffer: 2% (w/v) TFA in 20% (v/v) ACN; 1 μl for every 3 μl of sample.
25. Wash Solution: 0.5% (w/v) TFA in 5% (v/v) ACN. 400–800 μl per sample.
26. Elution Buffer: 70% (v/v) ACN. 40 μl per sample.
2.2 Sample Analysis by LC-MS/MS
1. Reverse-phase (RP) analytical column: AdvanceBio UHPLC column, 2.1 mm × 250 mm, 2.7 μm particle size (Agilent Technologies).
2. Agilent 6550 hybrid Q-TOF spectrometer equipped with a Jet Stream® source (see Note 1).
3. RP chromatography buffer A (RPB-A): 0.1% (v/v) FA and 5% (v/v) ACN in water.
4. RP chromatography buffer B (RPB-B): 0.1% (v/v) FA and 90% (v/v) ACN in water.
3 Methods
The procedure detailed here for high-quality protein extraction
from a recalcitrant plant tissue has been used in our group with
cocoa (Theobroma cacao L.) pod [24] and is adapted from Wang
et al. [27] with some modifications. Digestion is performed
according to Klammer and MacCoss [28] with modifications. Labeling
steps are adapted from Boersema et al. [14]. The following protocol
is used to assess proteomic differences between two samples in
the same LC-MS/MS assay.
3.1 Sample Preparation and Labeling
3.1.1 Protein Extraction
1. Grind plant tissue (0.1–0.3 g) with mortar and pestle in liquid nitrogen to a fine powder and place it in 2 ml microtubes.
2. Resuspend samples in 1 ml of cold 20% TCA in acetone.
3. Vortex thoroughly for 30 s and centrifuge at 10,000 × g at 4 °C for 5 min.
4. Discard the supernatant and repeat the washing (steps 2 and 3) until the supernatant is colorless.
5. Wash the pellet twice with 1 ml of cold 20% TCA in water, as in step 3.
6. Wash the pellet twice with 1 ml of cold 80% acetone, as in step 3.
7. Dry the pellet at room temperature.
8. Resuspend the pellet in 800 μl of Tris-saturated phenol, pH 8.0, and 800 μl of SDS buffer.
9. Vortex thoroughly for 30 s and incubate with orbital shaking on ice for 1 h.
10. Centrifuge at 10,000 × g at 4 °C for 20 min and recover the upper phenol phase by pipetting it into a fresh microtube (see Note 2).
11. Reextract the remaining aqueous phase as in steps 9 and 10.
12. Precipitate proteins from the pooled phenol phases by adding 5 volumes of cold 0.1 M ammonium acetate in methanol and incubating at −20 °C overnight.
13. Collect precipitated proteins by centrifugation at 10,000 × g at 4 °C for 10 min.
14. Wash the pellet twice with 0.1 M ammonium acetate, centrifuging at 10,000 × g at 4 °C for 5 min each time.
15. Wash the pellet twice with 80% acetone as in step 14.
16. Dry the pellet at room temperature.
17. Dissolve the pellet in 25 μl of 6 M urea (see Note 3).
18. Quantify the protein using a suitable protein assay kit (e.g., BCA assay or Bradford).
3.1.2 Sample Precipitation
Follow this stage if your proteins have already been extracted and
quantified but are resuspended in a solution other than 6 M urea.
Otherwise, skip to Sample Trypsin Digestion (Subheading 3.1.3).
1. Aliquot 25 μg of protein from each sample into a new tube.
2. Add 0.5 volumes of 2% (w/v) DOC and incubate on ice for 15 min.
3. Add 0.5 volumes of 24% (w/v) TCA and incubate on ice for 20–30 min.
4. Centrifuge at 10,000 × g at 4 °C for 10 min. Remove the supernatant.
5. Wash the pellet twice with 80% acetone (−20 °C) as in step 4. Dry the pellet at room temperature.
6. Resuspend the pellet in 25 μl of 6 M urea (see Note 3).
3.1.3 Sample Trypsin Digestion
1. Reduce disulfide bridges by adding 0.2 volumes of 0.2 M DTT in 25 mM TEAB. Vortex and incubate at 37 °C for 1 h.
2. Alkylate cysteine thiol groups by adding 0.7 volumes of 0.2 M IAM in 25 mM TEAB. Vortex and incubate in the dark at room temperature for 1 h.
3. Add 2 volumes of 0.1 M TEAB and 0.4 volumes of 0.2 M DTT. Adjust the pH to 7–9 with 1 M TEAB if necessary, to optimize the subsequent trypsin digestion.
4. Digest samples into peptides by adding trypsin at a 30:1 protein–trypsin ratio (i.e., 1 μg trypsin per 30 μg protein). Vortex and incubate at 37 °C overnight.
5. Complete the digestion by adding trypsin at a 60:1 protein–trypsin ratio (i.e., 1 μg trypsin per 60 μg protein). Vortex and incubate at 37 °C for 3–5 h.
6. Dry samples in a SpeedVac centrifuge.
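As a worked example of the digestion steps, the following sketch converts the relative volumes and protein–trypsin ratios into absolute amounts for one 25 μg sample dissolved in 25 μl of 6 M urea. It assumes that every "volume" refers to the initial sample volume, which the protocol does not state explicitly, and it ignores the small volume of the trypsin additions; the function name and dictionary keys are hypothetical.

```python
def digestion_plan(sample_ul=25.0, protein_ug=25.0, urea_molar=6.0):
    """Reagent volumes (ul) and trypsin amounts (ug) for one sample.

    Assumption: each relative volume in the protocol is a multiple of the
    initial sample volume in 6 M urea.
    """
    additions_ul = {
        "0.2 M DTT (reduction)": 0.2 * sample_ul,
        "0.2 M IAM (alkylation)": 0.7 * sample_ul,
        "0.1 M TEAB (dilution)": 2.0 * sample_ul,
        "0.2 M DTT (second addition)": 0.4 * sample_ul,
    }
    final_ul = sample_ul + sum(additions_ul.values())
    return {
        "additions_ul": additions_ul,
        "final_volume_ul": final_ul,
        # Urea is only diluted, so its amount (mol) is conserved.
        "final_urea_molar": urea_molar * sample_ul / final_ul,
        "trypsin_overnight_ug": protein_ug / 30,
        "trypsin_second_ug": protein_ug / 60,
    }

plan = digestion_plan()
for reagent, volume in plan["additions_ul"].items():
    print(f"{reagent}: {volume:.1f} ul")
print(f"final volume: {plan['final_volume_ul']:.1f} ul, "
      f"urea: {plan['final_urea_molar']:.2f} M")
print(f"trypsin: {plan['trypsin_overnight_ug']:.2f} ug overnight, "
      f"then {plan['trypsin_second_ug']:.2f} ug")
```

Under these assumptions the sample reaches 107.5 μl before trypsin is added, which brings the urea down to about 1.4 M.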
3.1.4 Sample Dimethyl Labeling
1. Dissolve samples in 100 μl of 0.1 M TEAB by vortexing.
2. Label one of the two samples with 4 μl of 4% (v/v) light formaldehyde (CH2O). Label the second sample with 4 μl of 4% (v/v) heavy formaldehyde (CD2O).
3. Complete labeling by adding 4 μl of 0.6 M sodium cyanoborohydride. Vortex and incubate at room temperature with gentle shaking for 1 h.
4. Quench labeling by adding 16 μl of 1% (v/v) ammonium to each tube. Add this reagent in a fume hood. Vortex.
5. Add 8 μl of 5% (v/v) FA to both samples.
6. Finally, combine both samples in the same tube so that they are analyzed in a single LC-MS/MS assay.
3.1.5 Sample Cleaning
1. Desalt samples with PepClean™ C18 spin columns following the manufacturer’s instructions (Pierce®) (see Note 4). Briefly, prepare the resin with the Activation and Equilibration Solutions. Load the sample with Sample Buffer. Remove impurities with Wash Solution. Finally, elute the peptides with Elution Buffer.
3.2 Sample Analysis by LC-MS/MS
The protocol described here is used with an Agilent 1200 UHPLC
equipped with an AdvanceBio column (2.1 mm × 250 mm,
2.7 μm particle size) coupled to an Agilent 6550 hybrid Q-TOF
spectrometer equipped with a Jet Stream® source.
1. Program 8 μl injections in the autosampler to ensure reproducible sample injection.
2. Separate peptides on the aforementioned analytical column using a 140-min linear gradient from 3 to 40% RPB-B at a flow rate of 0.4 ml/min.
3. Introduce peptides from the LC into the mass spectrometer through the Jet Stream source (Agilent Technologies) operating in positive-ion mode (3500 V) and in high sensitivity mode. Source parameters: gas temperature (250 °C), drying gas (14 L/min), nebulizer (35 psig), sheath gas temperature (250 °C), sheath gas flow (11 L/min), capillary voltage (3500 V), fragmentor (360 V), and OCT 1 RF Vpp (750 V).
4. Operate the Q-TOF in high sensitivity mode and Auto MS/MS, selecting the 20 most intense precursors with charge 2–5 and above a threshold of 1000 counts in a 300–1700 m/z scan. MS/MS spectra (50–1700 m/z scans) are acquired until either 25,000 counts in total have been collected or a maximum accumulation time of 333 ms is reached.
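Because both chromatography buffers contain acetonitrile, the %B values of the gradient do not equal the acetonitrile content of the mobile phase. The sketch below translates the 3 to 40% RPB-B linear gradient into approximate ACN percentages using the buffer compositions given in Subheading 2.2. The function names are invented for illustration, and volume contraction on mixing is ignored.

```python
def percent_b(t_min, start_b=3.0, end_b=40.0, duration_min=140.0):
    """%B at time t for a linear gradient, clamped to the gradient window."""
    t = min(max(t_min, 0.0), duration_min)
    return start_b + (end_b - start_b) * t / duration_min

def percent_acn(b, acn_in_a=5.0, acn_in_b=90.0):
    """Approximate ACN content (% v/v) of the mixed mobile phase at a given %B."""
    return acn_in_a * (1 - b / 100) + acn_in_b * b / 100

for t in (0, 70, 140):
    b = percent_b(t)
    print(f"t = {t:3d} min: {b:.1f}% B, about {percent_acn(b):.2f}% ACN")
```

With RPB-A at 5% ACN and RPB-B at 90% ACN, the gradient therefore spans roughly 7.6 to 39% acetonitrile.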
3.3 Protein Identification and Quantitation
The protocol described here is based on the functionality of two
software packages: Progenesis QI for proteomics (PQIp) (Nonlinear
Dynamics, Waters) and Proteolabels (Omics Analytics). For
detailed handling of the software, refer to the user guides and online
help (http://www.nonlinear.com/progenesis/qi-for-proteomics/v3.0/user-guide/,
http://www.omicanalytics.com/products/proteolabels/doc).
PQIp provides a platform for MS and MS/MS data extraction, LC-MS
alignment across runs, MS feature detection, MS signal intensity
normalization, and management of MS/MS spectra-derived peptide and
protein identifications. The PQIp output is imported into Proteolabels
to carry out the quantitative analysis based on the heavy–light
intensity ratio of MS feature pairs. Because runs are aligned in PQIp
before fragment spectra are identified, identifications can be
propagated across runs, thus filling in identification gaps between
runs; likewise, MS feature pairing in Proteolabels allows identities
to be propagated within runs. As a result, this workflow (Fig. 2)
increases the number of identified spectra and of quantified peptides
and proteins.
3.3.1 LC-MS and MS/MS Processing with Progenesis QI for Proteomics
1. Import the .d files generated by Agilent MassHunter Workstation (i.e., the software used for data acquisition from the UHPLC-Q-TOF instrument) into PQIp (see Note 5). PQIp has its own tool for extracting MS and MS/MS raw data from .d files, thus generating peak list files.
2. Select one of the runs as the reference and align all LC runs to it, making MS features overlap each other to a minimum alignment score of 80% (see Notes 6 and 7) (Fig. 3).
3. Review the peak picking automatically assigned by the software. Add, edit, or delete peak detections manually to define pairs of precursor MS spectra (light and heavy) (see Note 8) (Fig. 4).
4. Export MS/MS spectra as .mgf files (see Note 9).
[Figure 2: workflow diagram. Laboratory workflow: cocoa pod tissue sampling → protein extraction → trypsin digestion → stable-isotope dimethyl labeling → mixing of differentially labeled phenotype pairs → UHPLC-MS analysis (e.g., QTOF). Bioinformatic processing: chromatographic run alignment and feature detection (e.g., Progenesis QI) → MS/MS spectra search with dimethyl as quantitation parameter (e.g., MASCOT) → MS/MS-peptide sequence association (e.g., Progenesis QI) → heavy/light ratio quantitation at peptide and protein levels (e.g., Proteolabels) → ontology annotation (e.g., Blast2GO).]
Fig. 2 Laboratory workflow and subsequent bioinformatic data treatment. Once proteins have been extracted
from plant tissue and digested, dimethyl labeling is carried out independently for each sample. Labeled samples
are mixed in pairs so that each pair contains both the heavy and light dimethyl versions and the two phenotypes,
and the mix is then analyzed in an LC-MS instrument in data-dependent acquisition mode. Bioinformatic processing
comprises LC peak alignment across different runs and MS feature detection. Features belonging to light-labeled
peptides keep the expected m/z difference from their heavy-labeled counterparts, and their MS/MS
spectra are used for database search, selecting dimethyl labeling as the quantitation parameter. After the search
results are imported, MS/MS spectra are associated with peptide sequences. Quantitation by H/L ratio calculation at
peptide and protein levels is the last quantitative workflow step. Identified proteins with differential abundance
between experimental groups can be annotated with Gene Ontology terms. The software packages used for each task in
this work are indicated in parentheses
Fig. 3 LC-MS map showing peak signals at the PQIp alignment stage. Refinement must focus on the central area,
where most points are concentrated. Signals in the bottom area of the map (i.e., highest retention time and lowest
m/z ratio) may be due to nonpeptidic contaminant compounds. Hairpin-like alignment vectors are first seeded manually
and then added automatically throughout the map to make the LC signals of the samples overlap as much as possible.
Alignment quality is ranked in a color code, where green stands for good-quality alignment of the depicted area.
High-quality alignment must be achieved at least in the central zone, where most of the peaks are present
Fig. 4 Example of a pair of differentially dimethylated peptides shown in the PQIp interface. Top-left images
depict intensity and m/z values for the isotopic envelope of the light precursor (m/z = 638.3313, z = 2) and its
chromatographic peak. Top-right images depict intensity and m/z values for the isotopic envelope of the heavy
precursor (m/z = 640.3313, z = 2) and its chromatographic peak. Bottom-left image depicts the complete signal
map of the LC run to locate the feature shown. Bottom-right image shows a zoomed area of the LC-MS map centered on
the selected precursor. Light and heavy precursor doublets can be seen throughout. Each precursor in the
doublet must have the same number of nonoverlapping isotopic signals detected (i.e., signals framed in the
same strip of linked squares), ideally four including the monoisotopic one. Charge state is depicted in a color
code (e.g., red peaks stand for doubly charged precursors). Reviewing must focus above all on peaks with high
MS/MS counts and a good-quality chromatogram that are located in the central area of the map
3.3.2 Protein Identification
The exported peak list file is used for a fragment ion database
search with an appropriate search engine. Here we have used
MASCOT. If a different search engine is used, peak lists should
be exported in the appropriate format. Provided that the LC runs
have been refined beforehand, this search will correctly detect
dimethyl-labeled peptides. Label assignment is essential for protein
quantitation.
1. Search the exported .mgf dataset against a cocoa genome-encoded peptide database (https://www.cacaogenomedb.org/databases.cacao11peptides_pub3i.aa.fasta) supplemented with contaminant proteins, selecting the following settings: trypsin as enzyme with up to three missed cleavages, quantitation with dimethylation, carbamidomethylation of cysteines as fixed modification, oxidation of methionine as variable modification, peptide tolerance of 20 ppm, MS/MS tolerance of 50 ppm, monoisotopic mass, and peptide charges of 2+, 3+, and 4+. The data format must be set to Mascot generic.
2. Export the results as .xml files.
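To give a feel for what the ppm tolerances in the search settings mean in absolute terms, the helper below converts a relative tolerance into an m/z window. It is a generic illustration rather than part of MASCOT; the function name is made up, and the example m/z is the light precursor shown in Fig. 4.

```python
def ppm_window(mz, tol_ppm):
    """Return the (low, high) m/z limits of a symmetric ppm tolerance window."""
    delta = mz * tol_ppm / 1e6
    return mz - delta, mz + delta

low, high = ppm_window(638.3313, 20)
print(f"20 ppm at m/z 638.3313: {low:.4f} to {high:.4f}")
```

At this m/z, 20 ppm corresponds to about ±0.013 m/z units.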
3.3.3 Protein Identification Refinement with Progenesis QI for Proteomics
1. Import the search results generated as .xml files by MASCOT into PQIp to review peptide identities. Filter out identifications with a score <20 as well as those from non-T. cacao species.
2. Resolve identification conflicts. As each MS/MS spectrum may have several candidate amino acid sequences, in case of a protein assignment conflict the candidate with the highest MASCOT score is retained (see Note 10). Discard the rest of the identification assignments.
3. Export the data to Proteolabels directly from the PQIp interface.
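The refinement rules can be expressed as a simple filter. The sketch below is not how PQIp implements them; it only mirrors the logic described in this subheading on a hypothetical list of peptide-spectrum matches whose field names ('spectrum', 'peptide', 'protein', 'species', 'score') are invented for the example.

```python
from collections import defaultdict

MIN_SCORE = 20  # identifications scoring below this are discarded

def refine_identifications(psms, species="Theobroma cacao"):
    """Keep, for each spectrum, the best-scoring match(es) that pass the filters."""
    by_spectrum = defaultdict(list)
    for psm in psms:
        if psm["score"] >= MIN_SCORE and psm["species"] == species:
            by_spectrum[psm["spectrum"]].append(psm)

    refined = []
    for candidates in by_spectrum.values():
        best = max(c["score"] for c in candidates)
        # Equal-scoring candidates are all kept, in the spirit of Note 10.
        refined.extend(c for c in candidates if c["score"] == best)
    return refined

# Made-up matches for illustration only.
example_psms = [
    {"spectrum": 1, "peptide": "AAK", "protein": "P1", "species": "Theobroma cacao", "score": 45},
    {"spectrum": 1, "peptide": "AGK", "protein": "P2", "species": "Theobroma cacao", "score": 30},
    {"spectrum": 2, "peptide": "LLR", "protein": "P3", "species": "Theobroma cacao", "score": 15},
    {"spectrum": 3, "peptide": "VVK", "protein": "K1", "species": "Homo sapiens", "score": 60},
]
kept = refine_identifications(example_psms)
print([(p["spectrum"], p["protein"]) for p in kept])
```

In this example only the highest-scoring T. cacao match for spectrum 1 is retained; the low-scoring match and the contaminant hit are removed.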
3.3.4 Protein Quantitation in Proteolabels
Quantification in Proteolabels requires that the precursor LC-MS
features have been correctly refined in PQIp and that the dimethyl
labels have been properly detected by MASCOT.
1. Import the desired experiment from PQIp.
2. Set the experiment detection settings to include dimethyl labeling at N-termini and K, a mass shift of 4.025 Da with a mass tolerance of 10 ppm, and a tolerance of 0.15 for retention time.
3. As mentioned in Note 7, group together the replicates (i.e., pairs of differentially labeled samples) belonging to the same experiment to obtain the heavy/light (H/L) ratio for relative quantitation (Fig. 5).
4. Select only peptides identified in a protein group.
5. Export the results to generate a table with H/L quantitation ratios for each protein.
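The pairing and ratio logic behind these settings can be sketched as follows. This is a simplified illustration and not the Proteolabels algorithm: the feature fields are hypothetical, the retention time tolerance is assumed to be in minutes, the number of labeled amines is supplied by the caller, and summarizing peptide ratios by their median is an assumption made for the example.

```python
from statistics import median

LABEL_SHIFT = 4.0251  # Da per dimethylated amine, heavy (CD2O) minus light (CH2O)

def pair_features(features, n_labels, ppm_tol=10.0, rt_tol=0.15):
    """Pair light and heavy MS features of equal charge.

    features: list of dicts with hypothetical keys 'mz', 'z', 'rt', 'intensity'.
    n_labels: number of labeled amines assumed for the peptide.
    Returns a list of (light, heavy) tuples.
    """
    pairs = []
    for light in features:
        expected_mz = light["mz"] + n_labels * LABEL_SHIFT / light["z"]
        for heavy in features:
            if heavy is light or heavy["z"] != light["z"]:
                continue
            ppm_error = abs(heavy["mz"] - expected_mz) / expected_mz * 1e6
            if ppm_error <= ppm_tol and abs(heavy["rt"] - light["rt"]) <= rt_tol:
                pairs.append((light, heavy))
    return pairs

def heavy_light_ratio(pairs):
    """Median heavy/light intensity ratio over a set of feature pairs."""
    return median(h["intensity"] / l["intensity"] for l, h in pairs)

# Made-up features: one true doublet plus an unrelated signal.
features = [
    {"mz": 638.3313, "z": 2, "rt": 45.00, "intensity": 1000.0},
    {"mz": 640.3440, "z": 2, "rt": 44.95, "intensity": 2500.0},
    {"mz": 640.4000, "z": 2, "rt": 45.00, "intensity": 800.0},
]
pairs = pair_features(features, n_labels=1)
print(len(pairs), heavy_light_ratio(pairs))
```

Only the feature lying within 10 ppm of the expected heavy m/z and within the retention time window is paired with the light precursor, giving an H/L ratio of 2.5 for this doublet.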
Fig. 5 Experimental design setup stage in Proteolabels. Differentially labeled samples run in the same LC
analysis must be placed in the same position, each in its own Sample column, so that they are correctly assigned
to one replicate. In this case, the only experiment shown (i.e., Condition 1) is made up of four replicates,
each one consisting of differentially labeled peptides with light and heavy dimethyl labels. As an example,
Sample A may be the control group and Sample B the treated group. Four H/L ratios would be the
output
4 Notes
1. Other high-resolution spectrometers (e.g., Orbitraps) are suitable too.
2. An SDS complex often appears at the interface. Care should be taken not to disturb this interface when pipetting.
3. Do not overheat the sample; otherwise, urea may react with residues such as lysine and arginine. Dissolve the proteins completely, with vortexing and sonication if necessary. If the pellet does not dissolve well, overnight incubation at 4 °C with agitation in 6–8 M urea may work.
4. Prior to LC-MS/MS analysis, sample cleaning is important to eliminate interfering salts. These columns become saturated with 30 μg of protein. If a larger amount of sample must be analyzed, it may be useful to pass the Wash product of one column through additional columns so that no sample is lost through overloading. The elution products of all columns can be pooled in the same tube so that the entire sample is analyzed in the same LC-MS/MS assay.
5. All datasets in a study can be imported at once, regardless of the experiment each belongs to, since the refined data are assigned to defined groups of replicates later in Proteolabels. However, the subsequent alignment and LC peak review will then be more time-consuming because of the large number of spectra handled.
6. Automatic alignment can be performed, but the results may not be suitable, in which case manual alignment will be necessary afterwards.
7. If complete alignment throughout the map is not feasible, focus mainly on the feather-like area in the middle of the map.
8. This step is critical for proper quantitation. Every precursor isotopic envelope must have the same number of isotopic peaks detected as its labeled counterpart, which must also have the same charge. It may be useful to take into account the theoretical Δm/z between monoisotopic peaks to make pair assignment easier (e.g., the two precursors of a doubly charged pair differ by 2 m/z units at their monoisotopic peaks, owing to the 4 Da mass difference introduced by the differential dimethyl label).
9. MS/MS spectra can be exported in a wide variety of search formats. In this case, we use MASCOT as the search engine for protein identification with dimethyl-labeled peptides.
10. There may be peptides with the same identification score for a single protein. In this case, all such equally scored peptides are considered.
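The rule of thumb in Note 8 generalizes to any charge state and number of labeled sites: the spacing between the monoisotopic peaks of a light/heavy doublet is the number of labeled amines times the label mass difference, divided by the charge. The sketch below assumes the CH2O/CD2O duplex (about 4.0251 Da per labeled amine), counts the N-terminus plus every Lys as labeled, and does not handle the N-terminal Pro exception; the function names and example sequences are hypothetical.

```python
LABEL_SHIFT = 4.0251  # Da per dimethylated amine, heavy (CD2O) minus light (CH2O)

def labeled_sites(sequence):
    """Number of dimethylated amines: the N-terminus plus each Lys side chain."""
    return 1 + sequence.upper().count("K")

def doublet_spacing(sequence, charge):
    """Expected m/z difference between heavy and light monoisotopic peaks."""
    return labeled_sites(sequence) * LABEL_SHIFT / charge

# Hypothetical tryptic peptides, one ending in Arg and one ending in Lys.
for seq, z in (("AGFAGDDAPR", 2), ("SAMPLEK", 2), ("SAMPLEK", 3)):
    print(f"{seq} ({z}+): {doublet_spacing(seq, z):.4f} m/z units")
```

A doubly charged peptide ending in Arg thus shows a spacing of about 2.01 m/z units, whereas a doubly charged peptide ending in Lys, which carries two labels, shows about 4.03.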
Acknowledgments
This work was supported by grants from Senescyt-Government of Ecuador
(UTEQ-Ambiental-9-FCAmb-IFOR-2014-FOCICYT002),
MAEC-AECID (2014-2015), Spanish Ministry of Science and
Innovation (BIO2017-82374-R), Spanish Ministry of Economy
and Competitiveness (PEJ-2014-A-90762/PEJ-2014-P-00289),
and European Funds for Regional Development (FEDER).
References
1. Hansen KC, Schmitt-Ulms G, Chalkley RJ et al (2003) Mass spectrometric analysis of protein mixtures at low levels using cleavable 13C-isotope-coded affinity tag and multidimensional chromatography. Mol Cell Proteomics 2:299–314
2. Boutilier JM, Warden H, Doucette AA et al (2012) Chromatographic behaviour of peptides following dimethylation with H2/D2-formaldehyde: implications for comparative proteomics. J Chromatogr B 908:59–66
3. Ong SE, Blagoev B, Kratchmarova I, Kristensen DB et al (2002) Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics. Mol Cell Proteomics 1:376–386
4. Conrads TP, Alving K, Veenstra TD et al (2001) Quantitative analysis of bacterial and mammalian proteomes using a combination of cysteine affinity tags and 15N-metabolic labeling. Anal Chem 73:2132–2139
5. Chahrour O, Cobice D, Malone J (2015) Stable isotope labeling methods in mass spectrometry-based quantitative proteomics. J Pharm Biomed Anal 113:2–20
6. Ross PL, Huang YN, Marchese JN (2004) Multiplexed protein quantitation in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents. Mol Cell Proteomics 3:1154–1169
7. Choe L, D’Ascenzo M, Relkin NM (2007) 8-plex quantitation of changes in cerebrospinal fluid protein expression in subjects undergoing intravenous immunoglobulin treatment for Alzheimer’s disease. Proteomics 7:3651–3660
8. Dayon L, Hainard A, Licker V (2008) Relative quantification of proteins in human cerebrospinal fluids by MS/MS using 6-plex isobaric tags. Anal Chem 80:2921–2931
9. Schmidt A, Kellermann J, Lottspeich F (2005) A novel strategy for quantitative proteomics using isotope-coded protein labels. Proteomics 5:4–15
10. Zhou Y, Shan Y, Wu Q et al (2013) Mass defect-based pseudoisobaric dimethyl labeling for proteome quantification. Anal Chem 85:10658–10663
11. Boersema PJ, Aye TT, van Veen TA et al (2008) Triplex protein quantification based on stable isotope labeling by peptide dimethylation applied to cell and tissue lysates. Proteomics 8:4624–4632
12. Hsu JL, Huang SY, Chen SH (2006) Dimethyl multiplexed labeling combined with microcolumn separation and MS analysis for time course study in proteomics. Electrophoresis 27:3652–3660
13. Wu Y, Wang F, Liu Z et al (2014) Five-plex isotope dimethyl labeling for quantitative proteomics. Chem Commun (Camb) 50:1708–1710
14. Boersema PJ, Raijmakers R, Lemeer S et al (2009) Multiplex peptide stable isotope dimethyl labeling for quantitative proteomics. Nat Protoc 4:484–494
15. Hsu JL, Huang SY, Chow NH et al (2003) Stable-isotope dimethyl labeling for quantitative proteomics. Anal Chem 75:6843–6852
16. Di Palma S, Raijmakers R, Heck AJ et al (2011) Evaluation of the deuterium isotope effect in zwitterionic hydrophilic interaction liquid chromatography separations for implementation in a quantitative proteomic approach. Anal Chem 83:8352–8356
17. Wu CJ, Chen YW, Tai JH et al (2011) Quantitative phosphoproteomics studies using stable isotope dimethyl labeling coupled with IMAC-HILIC-nanoLC-MS/MS for estrogen induced transcriptional regulation. J Proteome Res 10:1088–1097
18. Xu B, Wang F, Song C et al (2014) Large-scale proteome quantification of hepatocellular carcinoma tissues by a three-dimensional liquid chromatography strategy integrated with sample preparation. J Proteome Res 13:3645–3654
19. Cappadona S, Muñoz J, Spee WPE (2011) Deconvolution of overlapping isotopic clusters improves quantification of stable isotope labeled peptides. J Proteome 74:2204–2209
20. Hsu J-L, Chen S-H (2016) Stable isotope dimethyl labeling for quantitative proteomics and beyond. Philos Trans R Soc A 374:20150364
21. Ji C, Li L (2005) Quantitative proteome analysis using differential stable isotopic labeling and microbore LC-MALDI MS and MS/MS. J Proteome Res 4:734–742
22. Synowsky SA, van Wijk M, Raijmakers R et al (2009) Comparative multiplexed mass spectrometric analyses of endogenously expressed yeast nuclear and cytoplasmic exosomes. J Mol Biol 385:1300–1313
23. Tolonen AC, Haas W, Chilaka AC et al (2011) Proteome wide systems analysis of a cellulosic biofuel-producing microbe. Mol Syst Biol 7:461
24. Martínez-Márquez A, Morante-Carriel JA, Bru-Martinez R (2017) A comparison of tissue preparation methods for protein extraction of cocoa (Theobroma cacao L.) pod. Acta Agron 66:248–253
25. Phillips-Mora W, Wilkinson MJ (2007) Frosty pod of cacao: a disease with a limited geographic range but unlimited potential for damage. Phytopathology 97:1644–1647
26. Phillips-Mora W, Aime M, Wilkinson M (2007) Biodiversity and biogeography of the cacao (Theobroma cacao) pathogen Moniliophthora roreri in tropical America. Plant Pathol 56:911–922
27. Wang W, Scali M, Vignani R et al (2003) Protein extraction for two-dimensional electrophoresis from olive leaf, a plant tissue containing high levels of interfering compounds. Electrophoresis 24:2369–2375
28. Klammer AA, MacCoss MJ (2006) Effects of modified digestion schemes on the identification of proteins from complex mixtures. J Proteome Res 5:695–700
Chapter 11
Quantitative Profiling of Protein Abundance and Phosphorylation State in Plant Tissues Using Tandem Mass Tags
Gaoyuan Song, Christian Montes, and Justin W. Walley
Abstract
Proteins produce or regulate nearly every component of cells. Thus, the ability to quantitatively determine
the protein abundance and posttranslational modification (PTM) state is a critical aspect toward our
understanding of biological processes. In this chapter, we describe methods to globally quantify protein
abundance and phosphorylation state using isobaric labeling with tandem mass tags followed by phospho-
peptide enrichment.
Key words Plant proteomics, Tandem mass tags (TMT), Phosphoproteomics, Protein extraction,
Mass spectrometry
1 Introduction
Quantitative proteomics utilizing liquid chromatography (LC) coupled to tandem mass spectrometry (MS/MS) represents the state-of-the-art approach for deep and quantitative analyses of protein abundance and posttranslational modification (PTM) levels. Sample preparation is a critical step in quantitative proteomic workflows and is particularly challenging for plant tissues. Plants produce large amounts of interfering compounds such as phenolics, terpenes, pigments, organic acids, lipids, and polysaccharides, which make generation of high-quality samples difficult [1, 2]. We recently evaluated a number of sample preparation methods and found that either a phenol- or urea-based extraction prior to protein digestion on molecular weight cutoff filters (filter-aided sample preparation, FASP) enables generation of high-quality peptides suitable for deep proteome profiling of plant samples [3].
PTMs vastly expand proteome complexity and are critical modifications that affect the regulatory activity, localization, and interaction of a protein with other molecules. Protein phosphorylation has been the most intensively studied PTM in plants, though proteomic studies reporting modifications such as lysine acetylation and ubiquitination are increasing [4–14]. Typically, deep coverage of PTMs requires an enrichment step prior to MS because of their low abundance. Metal-based affinity enrichment is a common approach for phosphoproteomics that uses metal oxides (TiO2, CeO2) or immobilized metal ions (Fe3+) to bind the negatively charged phosphate group for enrichment [15–19].
Finally, there are a number of approaches available to quantify protein and PTM abundance. These methods include in vitro labeling (e.g., isobaric labeling), in vivo labeling (e.g., stable isotope labeling by amino acids in cell culture, SILAC), and label-free (ion-intensity or spectral counting) strategies [20]. Because of the challenges associated with in vivo metabolic labeling in plants, the majority of plant proteomic studies utilize either isobaric labeling or label-free quantification (LFQ). LFQ does not require costly labeling reagents, and because each sample is run independently, the experimental design for analyzing a large number of samples (>11) is straightforward. However, for the same reason, LFQ typically suffers from missing values (i.e., a protein or PTM is not identified in every MS run). Isobaric tagging using either isobaric tags for relative and absolute quantitation (iTRAQ [21]) or tandem mass tags (TMT [22]) enables multiplexing of up to 11 samples in a single run, and various strategies have been developed to enable experimental designs with more than 11 samples [23]. When quantified at the MS2 level, isobaric approaches have lower accuracy than LFQ because of ratio compression [24]. However, the higher precision provided by isobaric tagging has been demonstrated to enable identification of a larger number of significant differential regulation events than other quantification approaches [25]. When isobaric quantification is incorporated into PTM studies, the isobaric tagging of peptides is often done after PTM enrichment because of the cost of the labeling reagent and the amount of peptides needed for enrichment. While this reduces experimental cost, technical variation in PTM enrichment is not controlled. However, modifications to the isobaric labeling reaction enable cost-effective labeling of large amounts of peptides, which makes labeling prior to phosphopeptide enrichment practical.
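The ratio compression mentioned in this paragraph can be illustrated with a deliberately simplified model. The model is an assumption made here for illustration and is not taken from the cited studies: a fraction of the reporter-ion signal in the reference channel is taken to come from co-isolated background peptides whose abundance does not change between samples, which pulls the observed ratio toward 1:1. The function name and parameters are hypothetical.

```python
def observed_ratio(true_ratio: float, interference: float) -> float:
    """Reporter-ion ratio expected at MS2 under a simple co-isolation model.

    true_ratio   -- real fold change of the target peptide (sample / reference)
    interference -- fraction (0 to <1) of the reference-channel reporter signal
                    that comes from unchanging, co-isolated background peptides
    """
    if not 0.0 <= interference < 1.0:
        raise ValueError("interference must be in the range [0, 1)")
    # Reference channel is normalized to 1: (1 - f) from the target, f from background.
    # The sample channel carries the same background but a scaled target signal.
    target_fraction = 1.0 - interference
    return target_fraction * true_ratio + interference


if __name__ == "__main__":
    for f in (0.0, 0.1, 0.3, 0.5):
        print(f"interference={f:.1f}  observed ratio for a true 4-fold change: "
              f"{observed_ratio(4.0, f):.2f}")
```

Under this model the observed fold change shrinks as the interfering fraction grows, while the relative spread of replicate measurements is unaffected, which is consistent with the trade-off between accuracy and precision described above.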
Below we provide a current workflow for quantitative profiling of protein abundance and phosphorylation state that details the protein extraction, digestion, TMT labeling, and phosphopeptide enrichment steps (Fig. 1), as described in our recent publications [3, 26]. While these methods are optimized for plant samples, we have also used them to analyze a range of nonplant samples such as yeast and mouse [27].
[Figure 1 flowchart: Protein extraction; Filter-assisted sample purification (FASP) / reduce / alkylate; Lys-C and trypsin digestion; TMT isobaric labelling (11-plex); combine all labeled peptides; then either LC-MS/MS acquisition of total peptides, or phosphopeptide enrichment followed by LC-MS/MS acquisition of phosphorylated peptides; identification / quantification.]
Fig. 1 Quantitative proteomics workflow using tandem mass tags. Proteins from up to 11 samples are extracted and digested into peptides using phenol-FASP. Peptides from each sample are then independently labeled with a TMT reporter. Following labeling, the samples are pooled into a single tube. The pooled TMT-labeled samples can directly be used to quantify protein abundance. The pooled TMT-labeled samples can also be subjected to phosphopeptide (or other PTM) enrichment prior to LC-MS/MS
2 Materials
2.1 Protein Extraction
1. Tris-buffered phenol, pH 8.
2. Protein extraction buffer: 50 mM Tris pH 7.5, 1 mM ethylenediaminetetraacetic acid (EDTA) pH 8, 0.9 M sucrose.
3. 50× phosphatase inhibitor mix: 125 mM sodium fluoride (NaF), 12.5 mM sodium vanadate (Na3VO4), 12.5 mM sodium pyrophosphate decahydrate (NaPyroPO4), and 12.5 mM glycerophosphate (glycerol-P) in H2O.
4. 0.1 M ammonium acetate in methanol.
5. 70% methanol.
6. Protein resuspension buffer: 8 M urea, 50 mM Tris pH 7, 5 mM Tris(2-carboxyethyl)phosphine hydrochloride (TCEP).
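As a quick check on the phosphatase inhibitor mix, the sketch below converts the stock concentrations listed above into working concentrations and gives the stock volume for a chosen buffer volume. It assumes the mix is a 50-fold concentrate that is used at 1-fold strength in the extraction buffer; the function names are hypothetical.

```python
# Stock concentrations (mM) of the phosphatase inhibitor mix as listed in the Materials.
STOCK_MM = {
    "sodium fluoride": 125.0,
    "sodium vanadate": 12.5,
    "sodium pyrophosphate": 12.5,
    "glycerophosphate": 12.5,
}


def working_concentrations(stock_mm: dict, fold: float = 50.0) -> dict:
    """Concentration (mM) of each inhibitor after diluting the stock `fold` times."""
    return {name: conc / fold for name, conc in stock_mm.items()}


def stock_volume_ul(final_volume_ml: float, fold: float = 50.0) -> float:
    """Microliters of concentrated stock needed for `final_volume_ml` of 1x buffer."""
    return final_volume_ml * 1000.0 / fold


if __name__ == "__main__":
    for name, conc in working_concentrations(STOCK_MM).items():
        print(f"{name}: {conc} mM at working strength")
    print(f"Stock for 5 ml of extraction buffer: {stock_volume_ul(5.0)} ul")
```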
2.2 Filter-Aided Sample Preparation (FASP) and On-Filter Digestion
1. UA solution: 8 M urea in 0.1 M Tris–HCl pH 8.0. Prepare 16 ml per sample (see Note 1).
2. IAM solution: 0.05 M iodoacetamide (IAM) in UA solution. Prepare 2 ml per sample.
3. Trypsin, 1 μg/μl in water.
4. Lys-C, 0.1 μg/μl in water.
5. Ammonium bicarbonate (ABC) solution: 0.05 M NH4HCO3 in water. Prepare 7 ml per sample.
6. TCEP stock: 100 mM TCEP in 1 M Tris–HCl pH 7.3.
7. Amicon Ultracel-30K Centrifugal Filters, 4 ml (Millipore, UFC803008) (see Note 2).
8. Pierce™ Detergent Compatible Bradford Assay Kit (ThermoFisher Scientific).
9. pH-indicator strips (Millipore, 1095840001 and 1095410001).
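Because the UA, IAM, and ABC solutions are prepared per sample, the amounts to weigh out scale with the number of samples. The following sketch uses the per-sample volumes and molarities listed above; the molar masses are standard values, and the function names are hypothetical. It treats the three solutions independently and does not account for the UA solution consumed when preparing the IAM solution.

```python
# Standard molar masses in g/mol.
MOLAR_MASS = {
    "urea": 60.06,
    "iodoacetamide": 184.96,
    "ammonium bicarbonate": 79.06,
}


def grams_needed(compound: str, molarity: float, volume_ml: float) -> float:
    """Mass (g) of solute to dissolve and bring to `volume_ml` at `molarity` (mol/L)."""
    return MOLAR_MASS[compound] * molarity * volume_ml / 1000.0


def fasp_reagents(n_samples: int) -> dict:
    """Solute masses for the per-sample volumes given in the Materials list."""
    return {
        "urea_g_for_UA": grams_needed("urea", 8.0, 16.0 * n_samples),
        "iodoacetamide_g_for_IAM": grams_needed("iodoacetamide", 0.05, 2.0 * n_samples),
        "NH4HCO3_g_for_ABC": grams_needed("ammonium bicarbonate", 0.05, 7.0 * n_samples),
    }


if __name__ == "__main__":
    for reagent, grams in fasp_reagents(11).items():
        print(f"{reagent}: {grams:.3f} g")
```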
2.3 C18 Desalting
1. C18 column, 100 mg (Waters C18 Cartridge).
2. 100% methanol.
3. H2O (LC-MS grade).
4. 20%, 40%, and 80% acetonitrile (ACN).
5. Vacuum manifold (Visiprep™ SPE Vacuum Manifold).
6. Speedvac system (ThermoFisher Scientific).
7. Pierce™ BCA protein assay kit (ThermoFisher Scientific).
2.4 TMT Labeling
1. TMT label reagents (ThermoFisher).
2. 0.5 M HEPES, pH 8.5 (Alfa Aesar).
3. DriSolv® ACN (Millipore, AX0143).
4. 50% hydroxylamine (ThermoFisher).
2.5 Phosphopeptide Enrichment
1. Titansphere Phos-TiO2 beads (GL Sciences).
2. Wash and binding buffer: 2 M lactic acid in 50% acetonitrile.
3. 50% ACN in 0.1% trifluoroacetic acid (TFA).
4. Elution buffers: 3% and 5% ammonium hydroxide.
5. 0.1% formic acid (FA).
3 Methods
3.1 Protein Extraction
1. Grind 0.1 g of plant tissue in liquid nitrogen into a fine powder using a ceramic mortar and pestle for 15 min (see Note 3).
2. Add 5 volumes (tissue–buffer; w:v) of Tris-buffered phenol pH 8 to the ground tissue and vortex for 1 min. For example, for 0.1 g of tissue add 0.5 ml of buffered phenol.
3. Add 5 volumes (tissue–buffer; w:v) of extraction buffer containing 1× phosphatase inhibitor mix to the phenol–tissue solution and vortex for 1 min.
4. Centrifuge at 13,000 × g for 10 min at 4 °C (see Note 4).
5. Transfer the phenol phase (top layer) to a new tube, add the same volume of buffered phenol pH 8 as in step 2 to the aqueous phase, and vortex for 1 min.
6. Centrifuge at 13,000 × g for 10 min at 4 °C (see Note 4).
7. Transfer the phenol phase and combine it with the phenol phase from step 5.
8. Add 5 volumes of prechilled methanol containing 0.1 M ammonium acetate to the phenol solution and vortex to mix well.
9. Incubate at −80 °C for 1 h.
10. Centrifuge at 4500 × g for 10 min at 4 °C.
11. Discard the supernatant and add the same volume of prechilled methanol containing 0.1 M ammonium acetate as in step 8 to the tube.
12. Resuspend the pellet by probe sonication, then keep at −20 °C for 30 min.
13. Centrifuge at 4500 × g for 10 min at 4 °C.
14. Repeat steps 11–13 once.
15. Add 5 ml of prechilled 70% methanol to the tube, resuspend the pellet by probe sonication, then keep at −20 °C for 30 min to overnight (see Note 5).
16. Centrifuge at 4500 × g for 10 min at 4 °C.
17. Discard the supernatant, remove the residual methanol by speedvac at room temperature, resuspend in resuspension buffer, and measure the protein concentration using the Bradford assay.
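The centrifugation steps above are specified as relative centrifugal force, whereas many centrifuges are set in revolutions per minute. A minimal sketch of the standard conversion, RCF = 1.118 × 10^-5 × r × rpm^2 with r in centimeters, is given here. The rotor radius is an assumption and must be read from the rotor documentation; the 8 cm value used in the example is purely illustrative.

```python
import math

# Standard conversion constant for radius in cm and speed in rpm.
RCF_CONSTANT = 1.118e-5


def rcf_for_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) for a rotor speed and radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_for_rcf(rcf: float, radius_cm: float) -> float:
    """Rotor speed (rpm) needed to reach a given relative centrifugal force."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


if __name__ == "__main__":
    radius_cm = 8.0  # hypothetical rotor radius; check the rotor manual
    for rcf in (13000, 4500):
        print(f"{rcf} x g at r = {radius_cm} cm: about {rpm_for_rcf(rcf, radius_cm):.0f} rpm")
```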
3.2 FASP
1. Add the protein (max 4 mg protein) resuspended in 4 ml UA solution to the filter unit and centrifuge at 4000 × g for 20–40 min (see Note 6).
2. Add 4 ml UA and centrifuge at 4000 × g for 20–40 min.
3. Add 4 ml of UA with 2 mM TCEP to the filter unit and centrifuge at 4000 × g for 20–40 min.
4. Discard the flow-through from the collection tube.
5. Add 2 ml IAM solution, mix, and then incubate without further mixing for 30 min in the dark.
6. Centrifuge the filter units at 4000 × g for 20–40 min.
7. Add 2 ml of UA to the filter unit and centrifuge at 4000 × g for 20–40 min. Repeat this step once.
8. Add 2 ml of ABC to the filter unit and centrifuge at 4000 × g for 20–40 min. Repeat this step once.
9. Transfer the filter unit to a new collection tube.
10. Add 2 ml ABC with trypsin (enzyme to protein ratio 1:50) and mix well.
11. Incubate at 37 °C overnight.
12. Estimate the amount of undigested protein using a Bradford assay.
13. Add trypsin (enzyme to protein ratio 1:100) and an equal volume of Lys-C (0.1 μg/μl). Incubate for 2–4 h at 37 °C.
14. Centrifuge the filter units at 4000 × g for 20–40 min.
15. Add 1 ml ABC and centrifuge the filter unit at 4000 × g for 20–40 min. Repeat this step once.
16. Acidify samples to pH 3 with 100% formic acid (measure pH using indicator paper) and store at −80 °C until further processing.
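The enzyme additions in steps 10 and 13 follow from the enzyme-to-protein ratios and the stock concentrations given in the Materials. The sketch below shows the arithmetic; it assumes the ratios are weight to weight and leaves open which protein amount (the amount loaded or the undigested amount estimated in step 12) is used for the second addition. The function name is hypothetical.

```python
def enzyme_volume_ul(protein_ug: float, ratio_denominator: float,
                     enzyme_ug_per_ul: float) -> float:
    """Volume of enzyme stock for a 1:ratio_denominator (enzyme:protein, w/w) digest."""
    enzyme_ug = protein_ug / ratio_denominator
    return enzyme_ug / enzyme_ug_per_ul


if __name__ == "__main__":
    protein_ug = 4000.0          # maximum filter load of 4 mg
    trypsin_stock = 1.0          # ug/ul, as listed in the Materials
    lysc_stock = 0.1             # ug/ul, as listed in the Materials

    first = enzyme_volume_ul(protein_ug, 50, trypsin_stock)
    second = enzyme_volume_ul(protein_ug, 100, trypsin_stock)
    # Step 13 adds Lys-C at the same volume as the second trypsin addition.
    lysc_ug = second * lysc_stock

    print(f"Overnight digest: {first:.0f} ul trypsin")
    print(f"Second digest: {second:.0f} ul trypsin plus {second:.0f} ul Lys-C "
          f"({lysc_ug:.1f} ug Lys-C)")
```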
3.3 C18 Desalting
1. Set up a C18 column on the vacuum manifold and place a 15 ml conical tube in the vacuum chamber to collect the flow-through.
2. Rinse the column (for a 100 mg C18 column) with 1 ml methanol followed by 1 ml water; repeat with another 1 ml water.
3. Load the digested sample at a flow rate of less than 1 ml per min (see Note 7).
4. Wash the column with 1 ml water; repeat once with another 1 ml water.
5. Elute peptides stepwise with 250 μl 20% acetonitrile, 250 μl 40% acetonitrile, and 500 μl 80% acetonitrile.
6. Speedvac the peptide solution until almost dry.
7. Resuspend the pellet in water to a peptide concentration of ~1 mg/ml; vortex and spin to dissolve well.
8. Measure the peptide amount with the BCA assay and store at −80 °C.
3.4 TMT Labeling
1. Resuspend 100 μg of vacuum-dried peptides per sample in 100 μl of 0.2 M HEPES, pH 8.5, to a final concentration of 1 μg/μl. Vortex for 10 min to dissolve well (see Note 8).
2. Remove TMT labels from the freezer and warm to room temperature.
3. Resuspend TMT labels in 60 μl of dry acetonitrile, vortex, and leave at room temperature for 5 min. Then spin to collect the reagent at the bottom of the vial.
4. When labeling 100 μg of peptides, aliquot 10 μl of the resuspended TMT labeling reagent into a tube and add 30 μl of ACN. The remaining 50 μl of TMT labeling reagent can be used to label additional samples in parallel or vacuum dried for later use (see Note 9).
5. Add the TMT labels (i.e., 40 μl) to the resuspended peptides (100 μl) at a ratio of 0.4:1 (ACN–HEPES; v:v), vortex to mix well, and leave at room temperature for 1–2 h (see Note 10).
6. Add 8 μl of 5% hydroxylamine to each 140 μl TMT labeling reaction to quench the reaction, vortex to mix well, and leave at room temperature for 15 min.
7. Pool the TMT-labeled peptides into a single tube, vortex well, and speedvac to almost dry (see Note 11).
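The volumes in the labeling steps above are given for 100 μg of peptides. The sketch below scales them to other peptide amounts and reports the resulting acetonitrile fraction and final hydroxylamine concentration. Linear scaling of every volume is an assumption made here for illustration, not a statement from the protocol, and the function names are hypothetical.

```python
def tmt_reaction(peptide_ug: float) -> dict:
    """Volumes (ul) for one TMT labeling reaction, scaled from the 100 ug recipe."""
    scale = peptide_ug / 100.0
    hepes_ul = 100.0 * scale    # peptides at 1 ug/ul in HEPES
    tmt_ul = 10.0 * scale       # resuspended TMT reagent
    acn_ul = 30.0 * scale       # additional dry acetonitrile
    quench_ul = 8.0 * scale     # 5% hydroxylamine
    labeling_ul = hepes_ul + tmt_ul + acn_ul
    return {
        "hepes_ul": hepes_ul,
        "tmt_ul": tmt_ul,
        "acn_ul": acn_ul,
        "quench_ul": quench_ul,
        "labeling_ul": labeling_ul,
        # The TMT reagent is dissolved in acetonitrile, so it counts toward the ACN fraction.
        "acn_fraction": (tmt_ul + acn_ul) / labeling_ul,
        "hydroxylamine_final_percent": 5.0 * quench_ul / (labeling_ul + quench_ul),
    }


def reactions_per_vial(vial_ul: float = 60.0, per_reaction_ul: float = 10.0) -> int:
    """Number of 100 ug reactions one resuspended TMT vial can supply."""
    return int(vial_ul // per_reaction_ul)


if __name__ == "__main__":
    plan = tmt_reaction(200.0)
    print(f"For 200 ug of peptides: {plan['tmt_ul']:.0f} ul TMT, {plan['acn_ul']:.0f} ul ACN, "
          f"{plan['quench_ul']:.0f} ul hydroxylamine")
    print(f"Reactions per vial at 100 ug each: {reactions_per_vial()}")
```

Six reactions of 100 μg per vial is consistent with Note 10, which reports that one vial can label up to 600 μg of peptides.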
3.5 Phosphopeptide Enrichment
1. Resuspend Titansphere Phos-TiO2 beads in wash and binding buffer (prepare 6 mg beads for each 1 mg peptides) by vortexing, centrifuge at 3000 × g for 1 min, and then remove the supernatant. Repeat this step for a total of three washes (see Note 12).
2. Resuspend the pooled TMT-labeled peptides in wash and binding buffer to a final concentration of 1 μg/μl and vortex to dissolve well.
3. Transfer the resuspended peptides to the tube with TiO2 beads (4 mg beads for each 1 mg peptides) and rotate at room temperature for 1 h.
4. Centrifuge at 3000 × g for 1 min, save the beads, and transfer the supernatant to a new tube containing the second aliquot of TiO2 beads (2 mg beads for each 1 mg peptides); rotate at room temperature for 1 h.
5. Centrifuge at 3000 × g for 1 min; save the beads (see Note 13).
6. Resuspend the TiO2 beads from steps 4 and 5 in wash and binding buffer, centrifuge at 3000 × g for 1 min, and discard the supernatant. Repeat this step once.
7. Resuspend the TiO2 beads in 50% acetonitrile in 0.1% TFA, centrifuge at 3000 × g for 1 min, and discard the supernatant. Repeat this step once.
8. Add 500 μl of 3% ammonium hydroxide to each tube of beads, vortex, and centrifuge at 3000 × g for 1 min.
9. Transfer the supernatant to a new tube.
10. Add 500 μl of 5% ammonium hydroxide to each tube of beads, vortex, and centrifuge at 3000 × g for 1 min. Remove the supernatant and combine it with the supernatant from step 9.
11. Speedvac the supernatant to almost dry, resuspend in 0.1% FA, measure the peptide amount with the BCA assay (optional), and store at −80 °C until LC-MS/MS analysis.
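The bead and buffer amounts in the enrichment steps above are stated per milligram of peptides. A minimal sketch of the resulting quantities for a given amount of pooled peptides follows; the function name is hypothetical, and the bead ratios may need adjusting for the tissue being studied (see Note 12).

```python
def tio2_plan(peptide_mg: float) -> dict:
    """Bead masses and binding-buffer volume for two sequential TiO2 incubations."""
    return {
        "first_aliquot_mg": 4.0 * peptide_mg,      # first incubation
        "second_aliquot_mg": 2.0 * peptide_mg,     # second incubation of the supernatant
        "total_beads_mg": 6.0 * peptide_mg,
        # Peptides are resuspended at 1 ug/ul, i.e. 1000 ul per mg.
        "binding_buffer_ul": 1000.0 * peptide_mg,
    }


if __name__ == "__main__":
    for key, value in tio2_plan(2.0).items():
        print(f"{key}: {value}")
```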
4 Notes
1. UA solution should be prepared fresh.
2. The maximum protein capacity of the filter is 4 mg.
3. The protein yield from different tissues or species varies. Thus, the amount of tissue necessary should be optimized.
4. For large-scale protein extractions using 15 ml or 50 ml tubes, centrifuge at 4500 × g for 15–20 min at 4 °C.
5. The volume used in this step influences the efficiency of protein precipitation, so for protein amounts lower than 500 μg, 1–2 ml of 70% methanol can be used.
6. The centrifugation time in each step may need to be adjusted for different samples. It is not necessary to completely remove the liquid above the filter; up to 10% of the volume can remain above the filter.
7. The pH should be around 2–3, which is critical for peptide binding to the C18 column. The initial flow-through can be reloaded on the column to increase peptide recovery.
8. We usually use at least 100–200 μg of peptides per sample for TMT labeling. When carrying out TMT 10-plex experiments this enables phosphoenrichment from 1 to 2 mg of pooled TMT-labeled peptides. Larger amounts of peptides enable identification of a greater number of phosphorylation sites. For reference, from 750 μg of TMT-labeled peptides we can identify ~10,000 phosphosites [26], whereas enrichment from 2 mg of peptides enables identification of ~17,000 phosphosites.
9. Unused TMT labeling reagent can be dried by speedvac and stored at −80 °C. Thaw and refreeze only once. Divide each label into multiple aliquots prior to vacuum drying if future small-scale labeling reactions are anticipated.
10. In our hands, each tube of TMT labels (0.8 mg) can label up to 600 μg of peptides with a labeling efficiency >99%. A small-scale test LC-MS/MS run can be performed to confirm TMT labeling efficiency.
11. The peptides from this step are ready for the LC-MS/MS run of the TMT-labeled protein abundance assay. For phosphopeptide enrichment, desalt using C18 columns.
12. The amount of TiO2 beads used for each step of enrichment can be optimized for the tissue being studied.
13. The flow-through from this step can be saved and used to quantify protein abundance or for enrichment of peptides carrying other PTMs [28].
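Note 10 mentions confirming TMT labeling efficiency with a small-scale test run but does not define how the value is computed. One common way, shown here as an assumption and not as the authors' method, is to search the test run with TMT set as a variable modification and divide the number of TMT-modified amine sites by the number of labelable sites (the peptide N-terminus plus each lysine). The input format and function names are hypothetical, and special cases such as naturally blocked N-termini are ignored.

```python
def labelable_sites(sequence: str) -> int:
    """Amine sites available for TMT: the peptide N-terminus plus every lysine."""
    return 1 + sequence.count("K")


def labeling_efficiency(psms: list) -> float:
    """Fraction of labelable sites carrying TMT.

    psms -- list of (peptide_sequence, number_of_TMT_modifications) tuples,
            e.g. exported from a search with TMT as a variable modification.
    """
    labeled = sum(n_tmt for _, n_tmt in psms)
    total = sum(labelable_sites(seq) for seq, _ in psms)
    if total == 0:
        raise ValueError("no peptides supplied")
    return labeled / total


if __name__ == "__main__":
    # Hypothetical identifications for illustration only.
    example = [("AAK", 2), ("LLR", 1), ("KK", 2)]
    print(f"Labeling efficiency: {labeling_efficiency(example):.1%}")
```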
References
1. Wu X, Gong F, Wang W (2014) Protein extraction from plant tissues for 2DE and its application in proteomic analysis. Proteomics 14:645–658
2. Wu X, Xiong E, Wang W et al (2014) Universal sample preparation method integrating trichloroacetic acid/acetone precipitation with phenol extraction for crop proteomic analysis. Nat Protoc 9:362–374
3. Song G, Hsu PY, Walley JW (2018) Assessment and refinement of sample preparation methods for deep and quantitative plant proteome profiling. Proteomics 18:1800220
4. Finkemeier I, Laxa M, Miguet L et al (2011) Proteins of diverse function and subcellular location are lysine acetylated in Arabidopsis. Plant Physiol 155:1779–1790
5. Silva-Sanchez C, Li H, Chen S (2015) Recent advances and challenges in plant phosphoproteomics. Proteomics 15:1127–1141
6. Rao RSP, Thelen JJ, Miernyk JA (2014) Is Lys-N(ε)-acetylation the next big thing in post-translational modifications? Trends Plant Sci 19:550–553
7. Hartl M, Füßl M, Boersema PJ et al (2017) Lysine acetylome profiling uncovers novel histone deacetylase substrate proteins in Arabidopsis. Mol Syst Biol 13:949
8. Fang X, Chen W, Zhao Y et al (2015) Global analysis of lysine acetylation in strawberry leaves. Front Plant Sci 6:739
9. Xie X, Kang H, Liu W et al (2015) Comprehensive profiling of the rice ubiquitome reveals the significance of lysine ubiquitination in young leaves. J Proteome Res 14:2017–2025
10. Song G, Walley JW (2016) Dynamic protein acetylation in plant–pathogen interactions. Front Plant Sci 7:421
11. Aguilar-Hernández V, Kim D-Y, Stankey RJ et al (2017) Mass spectrometric analyses reveal a central role for ubiquitylation in remodeling the Arabidopsis proteome during photomorphogenesis. Mol Plant 10:846–865
12. Liu S, Yu F, Yang Z et al (2018) Establishment of dimethyl labeling-based quantitative acetylproteomics in Arabidopsis. Mol Cell Proteomics 17:1010–1027
13. Walley JW, Shen Z, McReynolds MR et al (2018) Fungal-induced protein hyperacetylation in maize identified by acetylome profiling. Proc Natl Acad Sci 115:210–215
14. Kelley DR (2018) E3 ubiquitin ligases: key regulators of hormone signaling in plants. Mol Cell Proteomics 17:1047–1054
15. Pinkse MWH, Uitto PM, Hilhorst MJ et al (2004) Selective isolation at the femtomole level of phosphopeptides from proteolytic digests using 2D-NanoLC-ESI-MS/MS and titanium oxide precolumns. Anal Chem 76:3935–3943
16. Nakagami H, Sugiyama N, Mochida K et al (2010) Large-scale comparative phosphoproteomics identifies conserved phosphorylation sites in plants. Plant Physiol 153:1161–1174
17. Kettenbach AN, Gerber SA (2011) Rapid and reproducible single-stage phosphopeptide enrichment of complex peptide mixtures: application to general and phosphotyrosine-specific phosphoproteomics experiments. Anal Chem 83:7635–7644
18. Marcon C, Malik WA, Walley JW et al (2015) A high-resolution tissue-specific proteome and phosphoproteome atlas of maize primary roots reveals functional gradients along the root axes. Plant Physiol 168:233–246
19. Walley JW, Sartor RC, Shen Z et al (2016) Integration of omic networks in a developmental atlas of maize. Science 353:814–818
20. Bantscheff M, Schirle M, Sweetman G et al (2007) Quantitative mass spectrometry in proteomics: a critical review. Anal Bioanal Chem 389:1017–1031
21. Wiese S, Reidegeld KA, Meyer HE et al (2007) Protein labeling by iTRAQ: a new tool for quantitative mass spectrometry in proteome research. Proteomics 7:340–350
22. Thompson A, Schäfer J, Kuhn K et al (2003) Tandem mass tags: a novel quantification strategy for comparative analysis of complex protein mixtures by MS/MS. Anal Chem 75:1895–1904
23. Plubell DL, Wilmarth PA, Zhao Y et al (2017) Extended multiplexing of tandem mass tags (TMT) labeling reveals age and high fat diet specific proteome changes in mouse epididymal adipose tissue. Mol Cell Proteomics 16:873–890
24. Karp NA, Huber W, Sadowski PG et al (2010) Addressing accuracy and precision issues in iTRAQ quantitation. Mol Cell Proteomics 9:1885–1897
25. Hogrebe A, von Stechow L, Bekker-Jensen DB et al (2018) Benchmarking common quantification strategies for large-scale phosphoproteomics. Nat Commun 9:1045
26. Song G, Brachova L, Nikolau BJ et al (2018) Heterotrimeric G-protein-dependent proteome and phosphoproteome in unstimulated Arabidopsis roots. Proteomics 18:1800323
27. Abdulghani M, Song G, Kaur H et al (2019) Comparative analysis of the transcriptome and proteome during mouse placental development. J Proteome Res 18:2088–2099
28. Mertins P, Qiao JW, Patel J et al (2013) Integrated proteomic analysis of post-translational modifications by serial enrichment. Nat Methods 10:634–637
Chapter 12
Optimizing Shotgun Proteomics Analysis for a Confident Protein Identification and Quantitation in Orphan Plant Species: The Case of Holm Oak (Quercus ilex)
Isabel Gómez-Gálvez, Rosa Sánchez-Lucas, Bonoso San-Eufrasio,
Luis Enrique Rodrı́guez de Francisco, Ana M. Maldonado-Alconada,
Carlos Fuentes-Almagro, and Mari Angeles Castillejo
Abstract
The proteomics of orphan, unsequenced, and recalcitrant organisms is highly challenging. This is the case of
the typical Mediterranean forest tree Holm oak (Quercus ilex). Proteomics has moved on quite fast from the
classical 2DE-MS to shotgun or gel-free/label-free approaches, with the latter possessing a series of
advantages over the gel-based ones. Before translating proteomics data into biological knowledge, a few
questions as to the analysis technique itself have to be answered including its confidence in protein
identification and quantification. It is important to clearly differentiate a hit from an ortholog and gene
product identification, with the difference depending on the database and the confidence parameters (score,
number of peptides, and coverage). With respect to quantification and for comparative purposes it is
important to make sure that we are within the linear dynamic range. For that, a calibration curve based
on mass spectrometry analysis of a serial dilution of the extracts should be performed. Validating the data in this way improves the quality of the analysis and enables a correct interpretation of the results. We show a method that aims to improve the confidence in protein identification and quantification in the orphan species Q. ilex using a shotgun proteomics approach.
Key words Holm oak, Orphan species, Plant proteomics, Shotgun, Confidence parameters
1 Introduction
Proteomics is changing in scale and focus, from its initial objective of identifying as many individual proteins as possible to analyzing the dynamics of the proteome [1]. Liquid chromatography coupled with mass spectrometry, also called “shotgun” or “gel-free,” is an alternative to 2-DE, which has been extensively used for protein separation, although both can be combined in a single experiment [2, 3]. The gel-free methods, essential for bottom-up MS analysis, have a series of advantages over the gel-based ones (top-down), such as greater coverage of the proteome when working with total protein extracts, faster processing, and less sample handling. The peptides resulting from protein digestion are analyzed by liquid chromatography coupled to mass spectrometry. In addition, if prefractionation steps such as SDS-PAGE are introduced, a higher resolution can be achieved by eliminating impurities and concentrating the sample [4].
For protein identification, software that requires the use of a
database is used. The database chosen is of extreme importance as it
will, to a great extent, determine the number and quality of the
identifications. In nonmodel species, such as Quercus ilex, the best
choice is an initial transcriptome analysis using deep sequencing to
generate the species-specific database with which to compare the
proteins [5] so that those identified will correspond to gene products of that species. Otherwise, when a non-species-specific database is employed, identifications will correspond to the closest orthologous gene products, from which we can only hypothesize or speculate on the protein function [5]. Besides identification, shotgun techniques permit peptide and protein quantification, for which two main approaches are used: peak area and spectral count [6, 7].
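To make the spectral count approach concrete, the sketch below applies one widely used normalization, the normalized spectral abundance factor (NSAF), in which each protein's spectral count is divided by its length and then by the sum of these values over all proteins. NSAF is named here only as an example of a spectral counting measure; the chapter does not state which normalization is used, and the protein names, counts, and lengths are hypothetical.

```python
def nsaf(spectral_counts: dict, lengths: dict) -> dict:
    """Normalized spectral abundance factor for each protein.

    spectral_counts -- protein id -> number of identified spectra
    lengths         -- protein id -> sequence length in amino acids
    """
    # Spectral abundance factor: counts corrected for protein length.
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    if total == 0:
        raise ValueError("all spectral counts are zero")
    return {p: value / total for p, value in saf.items()}


if __name__ == "__main__":
    counts = {"protein_A": 10, "protein_B": 10}      # hypothetical values
    lengths = {"protein_A": 100, "protein_B": 200}   # hypothetical values
    for protein, value in nsaf(counts, lengths).items():
        print(f"{protein}: NSAF = {value:.3f}")
```

With equal spectral counts, the shorter protein receives the higher NSAF value, reflecting that longer proteins yield more peptides at the same molar abundance.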
These new proteomic analysis tools have rarely been applied to forest species such as Quercus ilex. Therefore, workflow optimization is necessary in order to improve the quality of the analysis and to interpret the results correctly.
For this purpose a few questions on the analysis technique itself
have to be answered. For instance, what are we identifying and how
confident is the identification? Does the identified protein corre-
spond to an allelic, variant, or isogenic protein species? How confi-
dent is the quantification? Here we present a method that is aimed
at improving confidence in protein identification in the orphan
species Q. ilex. For this purpose we have carried out a shotgun
proteomic study based on a serial dilution with protein extracts
obtained from a mixture of different holm oak plant organs
(embryo, seed, root, leaf). Our study permits the determination
of the confidence range when interpreting the results from a shot-
gun proteomics experiment in an orphan species such as Q. ilex.
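The idea of using a serial dilution to establish the range in which quantification is reliable can be sketched as follows. The code fits a straight line to the signal against the amount of protein loaded and reports the widest range, starting from the lowest amount, over which the fit still meets a chosen R² threshold. The threshold of 0.99, the requirement of at least three points, and the example data are assumptions made for illustration and are not values from this chapter.

```python
def fit_line(x: list, y: list) -> tuple:
    """Ordinary least-squares fit; returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each contain at least two distinct values")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, (sxy ** 2) / (sxx * syy)


def linear_range(amounts: list, signals: list, min_r2: float = 0.99):
    """Widest range starting at the lowest amount whose linear fit has R^2 >= min_r2.

    Returns (lowest_amount, highest_amount) or None if no range of at least
    three points passes the threshold. `amounts` must be sorted ascending.
    """
    best = None
    for end in range(3, len(amounts) + 1):
        _, _, r2 = fit_line(amounts[:end], signals[:end])
        if r2 >= min_r2:
            best = (amounts[0], amounts[end - 1])
    return best


if __name__ == "__main__":
    # Hypothetical dilution series (ug protein) and peak areas that saturate at high load.
    amounts = [1, 5, 10, 50, 100, 200]
    peak_areas = [2, 10, 20, 100, 150, 170]
    print("Linear range (ug):", linear_range(amounts, peak_areas))
```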
2 Materials
2.1 Plant Material
In the present experiment, a mixture of different organs (embryo, cotyledons, leaves, and roots) from Q. ilex is used (see Note 1).
2.2 Protein Extraction
If not specified, reagents are of analytical grade. Solutions are prepared in distilled water. Stock solutions are stored at −20 °C, unless otherwise stated.
1. Solution 1: 10% (w/v) trichloroacetic acid (TCA) in acetone.
2. Solution 2: 0.1 M ammonium acetate in 80% methanol.
3. Solution 3: 80% (v/v) acetone.
4. SDS buffer: 0.1 M Tris–HCl pH 8, 30% (w/v) sucrose, 2% (w/v) SDS, 5% (v/v) β-mercaptoethanol. Store at 4 °C.
5. Solution 4: Tris–HCl-saturated phenol pH 8 (Sigma)/SDS buffer (1:1).
6. Precipitation solution: 0.1 M ammonium acetate in methanol.
7. Solubilization solution: 7 M urea, 2 M thiourea, 4% (w/v) CHAPS, 0.5% (v/v) Triton X-100, 20 mM dithiothreitol (DTT).
2.3 SDS Polyacrylamide Gel
1. Running buffer: for a 10× stock, mix 30.2 g of Tris base, 144 g of glycine, and 1 g of SDS; add H2O up to 1 L.
2. Resolving gel: amount required for one mini-Protean (Bio-Rad) 12% acrylamide gel: mix 3 mL of 40% (w/v) acrylamide–bisacrylamide solution (19:1), 2.5 mL of 1.5 M Tris–HCl, pH 8.8, 100 μL of 10% (w/v) SDS, and 4.35 mL of H2O. Add 50 μL of 10% (w/v) APS and 5 μL of TEMED to start the polymerization. Pour the mix into the 7 cm gel cassette (see Note 2). Let the gel polymerize for 1 h.
3. Stacking gel: amount required for one mini-Protean (Bio-Rad) 4% acrylamide gel: mix 0.25 mL of 40% (w/v) acrylamide–bisacrylamide solution (19:1), 0.63 mL of 0.5 M Tris–HCl, pH 6.8, 25 μL of 10% (w/v) SDS, and 1.59 mL of H2O. Add 50 μL of 10% (w/v) APS and 5 μL of TEMED to start the polymerization (see Note 3).
4. Laemmli buffer (5×): 0.3 M Tris–HCl (pH 6.8), 10% SDS, 25% β-mercaptoethanol, 0.005% bromophenol blue, 50% glycerol.
5. Coomassie stain solution: 40% (v/v) methanol, 10% (v/v) acetic acid, 0.1% (w/v) Coomassie R-250, in distilled water (see Note 4).
6. Destaining solution: 40% (v/v) methanol, 10% (v/v) acetic acid in distilled water.
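The acrylamide volumes in the gel recipes above follow from a simple dilution relationship (stock concentration × stock volume = final concentration × final volume). The sketch below reproduces that arithmetic so the recipes can be adapted to other gel percentages or volumes. The total volumes of roughly 10 mL (resolving) and 2.5 mL (stacking) are inferred from the listed components, and the function names are hypothetical.

```python
def acrylamide_volume_ml(final_percent: float, total_ml: float,
                         stock_percent: float = 40.0) -> float:
    """Volume of acrylamide stock needed for a gel of the given percentage and volume."""
    return final_percent * total_ml / stock_percent


def final_percent(stock_ml: float, total_ml: float, stock_percent: float = 40.0) -> float:
    """Acrylamide percentage obtained from a given volume of stock."""
    return stock_ml * stock_percent / total_ml


if __name__ == "__main__":
    print(f"12% resolving gel, 10 mL: {acrylamide_volume_ml(12.0, 10.0)} mL of 40% stock")
    print(f"4% stacking gel, 2.5 mL: {acrylamide_volume_ml(4.0, 2.5)} mL of 40% stock")
```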
2.4 Protein Digestion
1. Washing solution 1: 0.1 M ammonium bicarbonate.
2. Washing solution 2: 0.1 M ammonium bicarbonate–acetonitrile (1:1) (v/v).
3. Acetonitrile 100%.
4. Reduction and alkylation solutions: reduction with 20 mM DTT in 0.1 M ammonium bicarbonate; alkylation with 55 mM iodoacetamide in 0.1 M ammonium bicarbonate. Keep the iodoacetamide solution in a dark place.
5. Trypsin solution: dilute the stock solution of trypsin with trypsin buffer (25 mM NH4HCO3, 10% acetonitrile, 5 mM CaCl2) to a final trypsin concentration of 12.5 ng/μL. Keep it at 4 °C until digestion.
2.5 Peptide Desalting
1. Column cartridges, 60 Å C18 (Scharlau).
2. Activation solution: 70% acetonitrile–0.1% trifluoroacetic acid.
3. Washing solution: 0.1% trifluoroacetic acid.
4. Elution solution: 70% acetonitrile.
5. 0.1% trifluoroacetic acid.
2.6 Solutions for LC-MS/MS
1. Mobile phase A: 0.1% (v/v) formic acid in water.
2. Mobile phase B: 0.1% (v/v) formic acid in 80% acetonitrile.
2.7 Equipment and Software
1. Proteome Discoverer (version 2.1, Thermo Scientific), using the SEQUEST algorithm to perform the search.
2. Viridiplantae database obtained from UniProtKB and a species-specific database of Quercus ilex developed from the transcriptome [8].
3. MERCATOR software for protein classification into MapMan functional plant categories (http://www.plabipd.de/portal/mercator-sequence-annotation), complemented with the use of the KEGG database (https://www.genome.jp/kegg/pathway.html).
3 Methods
3.1 Protein Extraction by TCA/Acetone/Phenol
The proteins are extracted using the method of Wang et al. [9] with some modifications.
1. Add 1 mL of solution 1 (see Subheading 2.2), precooled at −20 °C, to 200 mg of plant material previously pulverized in liquid nitrogen.
2. Sonicate for 10 min at maximum speed (P Selecta Ultrasons) (see Note 5).
3. Centrifuge at 14,000 × g for 10 min (4 °C) and discard the supernatant.
4. Add 1 mL of solution 2 precooled at −20 °C and solubilize the pellet.
5. Centrifuge at 14,000 × g for 10 min (4 °C) and discard the supernatant.
6. Add 1 mL of solution 3 precooled at −20 °C and solubilize the pellet.
7. Centrifuge at 14,000 × g for 10 min (4 °C) and discard the supernatant.
8. Air-dry the pellet at room temperature to remove residual acetone.
9. Add 1 mL of solution 4 precooled at 4 °C (under the hood). Mix thoroughly and incubate for 5 min at 4 °C (see Note 6).
10. Centrifuge at 14,000 × g for 10 min (4 °C). Transfer the upper phenol phase into a new tube.
11. Add 1 mL of precipitation solution precooled at −20 °C, mix well, and incubate for precipitation overnight at −20 °C.
12. Centrifuge at 14,000 × g for 10 min (4 °C) and discard the supernatant.
13. Wash the pellet once with 100% methanol (precooled at −20 °C) and disperse.
14. Centrifuge at 14,000 × g for 10 min (4 °C) and discard the supernatant.
15. Wash the pellet once with 80% (v/v) acetone (precooled at −20 °C) and disperse.
16. Centrifuge at 14,000 × g for 10 min (4 °C) and discard the supernatant.
17. Air-dry the pellet at room temperature.
18. Solubilize the pellet with solubilization solution at room temperature (see Note 7).
19. Quantify proteins using the Bradford method [10].
3.2 SDS-PAGE Electrophoresis
1. Prepare a mixture of extracts from different holm oak organs
(embryo, seed, root, leaf), with 300 μg of protein each.
2. Prepare a serial dilution of proteins in the range of 1–200 μg
BSA equivalents, mix with Laemmli buffer, and heat to
95 °C for 5 min.
3. Perform SDS-PAGE in a 12% acrylamide gel. Before loading
the protein, mark a line 1 cm below the stacking-resolving
gel interface. Load the samples and run the electrophoresis at
a constant 80 V until the bromophenol blue reaches the
marked line.
4. Stop the electrophoresis and immediately transfer the gel to a
plate with Coomassie staining solution [11]. Incubate on an
orbital shaker for 1 h.
5. Destain the gels in destaining solution for 80 min. Finish
destaining with distilled water.
6. Cut the unique bands from the gel, treating all of them in the
same way. Transfer the gel pieces to individual 1.5 mL tubes and
cover them with distilled water. At this point the gel pieces can
be stored at 4 °C.
3.3 Sample Preparation for MS Analysis
3.3.1 Protein Digestion
1. Cut the gel bands with a scalpel into small fragments (around
1 mm³). Transfer the gel pieces to a 1.5 mL low binding tube.
2. Add 1 mM ammonium bicarbonate/acetonitrile (1:1) (v/v).
Mix equal volumes of washing solution and acetonitrile. Stir for
30 min at 37 °C. Repeat this step.
3. Remove the supernatant and add acetonitrile. Incubate for
5 min at room temperature and remove the supernatant.
4. For the reduction and alkylation of the proteins, add 20 mM
DTT/100 mM ammonium bicarbonate. Subsequently add
55 mM iodoacetamide/100 mM ammonium bicarbonate.
Incubate for 30 min at room temperature in each solution.
5. Wash twice with 25 mM ammonium bicarbonate and with
25 mM ammonium bicarbonate/acetonitrile 50%.
6. Digest with the trypsin solution (12.5 ng/μL trypsin) and
incubate overnight with shaking at 37 °C.
3.3.2 Peptides Extraction
1. Spin the tubes and transfer the supernatant to a new tube “A”.
2. Add 150 μL (enough volume to cover the gel) of 20% acetonitrile/1%
formic acid and incubate for 5 min at room temperature.
Sonicate for 3 min. Transfer the peptide solution to tube A.
3. Repeat step 2 two more times, with 150 μL of 50% and then
90% acetonitrile/1% formic acid. Transfer the final
solutions to tube A.
4. Dry in a SpeedVac. Keep at −20 °C or −80 °C for long-term
storage.
5. Resuspend the samples in 100 μL of 0.1% formic acid with
sonication.
3.3.3 Peptides Desalting
1. Activate the C18 column with 0.4 mL 70% acetonitrile/0.1%
trifluoroacetic acid. Then wash it with 0.5 mL 0.1%
trifluoroacetic acid.
2. Add the sample to the column and keep it for 5 min at room
temperature. Collect the flow-through and repeat this step twice
(see Note 8).
3. Wash the columns with 0.5 mL 0.1% trifluoroacetic acid (see
Note 9).
4. Elute the peptides with 100 μL of 70% acetonitrile three times
to a final volume of 300 μL and dry in a SpeedVac.
3.4 nLC-MS/MS
1. Prepare a serial dilution of the recovered peptides based on the
amounts of protein loaded in the gel (see Note 10).
2. Load the samples onto a nano-LC-MS-UHPLC Ultimate
3000 using a flow of 300 nL/min and a gradient of B in A
from 4 to 35% (120 min), 35–55% (6 min), and 55–90%
(3 min). Finally, elute the column with 90% B for 8 min
before washing and re-equilibration, for a total chromatography
time of 150 min.
3. The eluent from the column is introduced into the electrospray
ionization source of an MS/MS instrument (Orbitrap Fusion,
Thermo Fisher Scientific) operating in positive ion mode.
4. Perform survey scans of peptide precursors from 400 to
1500 m/z at 120 K resolution (at 200 m/z) with a 4 × 10⁵
ion count target. Perform tandem MS by isolation at 1.2 Da with
the quadrupole, CID fragmentation with a normalized collision
energy of 35, and rapid scan MS analysis in the ion trap.
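The gradient in step 2 can be written out as a time table to check the segment timings. The following is a minimal Python sketch, not part of the published protocol: the names `SEGMENTS` and `gradient_table` are hypothetical, and it assumes that the segments run back to back from time zero, which the protocol does not state explicitly.

```python
# Gradient segments from step 2: (start %B, end %B, duration in min).
SEGMENTS = [
    (4, 35, 120),
    (35, 55, 6),
    (55, 90, 3),
    (90, 90, 8),  # hold at 90% B
]
TOTAL_RUN_MIN = 150  # total chromatography time stated in step 2


def gradient_table(segments):
    """Return (time_min, percent_B) breakpoints, assuming contiguous segments."""
    elapsed = 0
    table = [(elapsed, segments[0][0])]
    for _start, end, duration in segments:
        elapsed += duration
        table.append((elapsed, end))
    return table


table = gradient_table(SEGMENTS)
gradient_end = table[-1][0]
# Time left for washing and re-equilibration under these assumptions.
remaining = TOTAL_RUN_MIN - gradient_end

for time_min, percent_b in table:
    print(f"{time_min:>4} min  {percent_b:>3}% B")
print(f"wash and re-equilibration: {remaining} min")
```

Under these assumptions the programmed gradient ends at 137 min, leaving 13 min of the 150 min run for washing and re-equilibration.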
3.5 Protein Identification
1. Use the MS2 spectra for identification, using the SEQUEST
algorithm with the Proteome Discoverer software (version 2.1,
Thermo Scientific). Set the following parameters: theoretical
tryptic digestion allowing up to one missed cleavage,
carbamidomethylation of cysteines as a fixed modification, and
oxidation of methionine as a variable modification. Use a precursor
mass tolerance of 10 ppm and a product ion tolerance of 0.1 Da.
Validate peptide spectral matches (PSM) using Percolator
based on q-values at a 1% FDR. To group peptide
identifications into proteins, use the law of parsimony and filter
to 1% FDR and a minimum XCorr of 2.
2. Use two databases for protein identification: one generic, the
Viridiplantae database from UniprotKB, and a custom
species-specific Q. ilex database developed from the transcriptome [8].
3. Assign functional categories to the proteins with the MERCATOR
software and the KEGG database.
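The thresholds in step 1 (1% FDR, minimum XCorr of 2, 10 ppm precursor tolerance) can be illustrated outside the vendor software. The sketch below is only an illustration in Python: the record fields, peptide names, and scores are invented placeholders, not a Proteome Discoverer export format or real results, and applying both thresholds to a single record is a simplification of the peptide- and protein-level filtering described above.

```python
# Hypothetical identification records with made-up values (illustration only).
records = [
    {"peptide": "PEPTIDE_A", "q_value": 0.001, "xcorr": 3.4},
    {"peptide": "PEPTIDE_B", "q_value": 0.020, "xcorr": 2.5},  # fails the 1% FDR cut
    {"peptide": "PEPTIDE_C", "q_value": 0.005, "xcorr": 1.6},  # fails the XCorr cut
]

MAX_Q_VALUE = 0.01  # 1% FDR
MIN_XCORR = 2.0


def passes_filters(record):
    """True if the record meets both thresholds quoted in step 1."""
    return record["q_value"] <= MAX_Q_VALUE and record["xcorr"] >= MIN_XCORR


def ppm_window(mz, ppm=10.0):
    """Lower and upper m/z bounds for a symmetric ppm tolerance."""
    delta = mz * ppm / 1e6
    return mz - delta, mz + delta


accepted = [r["peptide"] for r in records if passes_filters(r)]
low, high = ppm_window(1000.0)
print(accepted)
print(f"10 ppm window at m/z 1000: {low:.4f} to {high:.4f}")
```

At m/z 1000, a 10 ppm tolerance corresponds to ±0.01 m/z units, which shows how narrow the precursor window is compared with the 0.1 Da product ion tolerance.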
3.6 Protein Quantification
1. Protein quantification by peak area: use the peak area values
given for each protein, after normalizing the data to the
total sum of the peak area values in each sample.
2. Protein quantification by proteotypic peptides: use
proteotypic peptides (specific to a single protein) from different
proteins and plot the peak area intensity at each of the serial
dilutions studied (see Note 11).
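The normalization in step 1 divides each protein's peak area by the summed peak area of its sample. A minimal Python sketch follows; the function name and the sample values are hypothetical and serve only to show the arithmetic.

```python
def normalize_by_total(peak_areas):
    """Divide each protein's peak area by the total peak area of the sample."""
    total = sum(peak_areas.values())
    if total == 0:
        raise ValueError("total peak area is zero; nothing to normalize")
    return {protein: area / total for protein, area in peak_areas.items()}


# Made-up peak areas for one sample (illustration only).
sample = {"protein_A": 2.0e6, "protein_B": 6.0e6, "protein_C": 2.0e6}
normalized = normalize_by_total(sample)

for protein, fraction in normalized.items():
    print(f"{protein}: {fraction:.3f}")
```

After normalization the values of each sample sum to 1, so samples with different total signal can be compared on the same scale.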
[Figure 1: panels a and b plot the number of peptides and proteins, respectively, against the total amount of protein (ng, 0–1000) for the Viridiplantae and Quercus databases. Panels c and d are Venn diagrams. Panel c (peptides): 5539 Quercus ilex only, 1206 shared, 723 UniprotKB-Viridiplantae only. Panel d (proteins): 1256 Quercus ilex only, 380 shared, 521 UniprotKB-Viridiplantae only.]
Fig. 1 Number of peptides (a) and proteins (b) identified, using the UniprotKB-Viridiplantae and Quercus ilex
databases. The values correspond to the different amounts of protein used, in the range of 10–1000 ng. Venn
diagrams showing the number of peptides (c) and proteins (d) identified from the two databases used
(UniprotKB-Viridiplantae and Quercus ilex) for the 600 ng protein sample dilution
4 Anticipated Results
1. The number of identifications (proteins/peptides) depends
on the amount of protein loaded, with values from 3206
peptides/1141 proteins to 6745 peptides/1636 proteins in the
linear range of 100–600 ng (see Fig. 1a and b).
2. The number of identifications depends on the database. For
orphan species such as Q. ilex, it is highly recommended
to use a species-specific database derived from the
transcriptome. Such a database contains the gene products of the
species itself, rather than the orthologs from other species that
are matched when generic databases are used. In our
system the number of peptides and proteins found with the
specific database was substantially greater than that found with
the generic one (see Table 1, Fig. 1c and d).
3. The identification confidence (% coverage, score, and number
of peptides) depends mostly on the database (species-specific vs.
general). Using the Q. ilex database, most of the proteins
identified (up to 70–80%) had 1–20% coverage, score values
of 2–20, and 1–5 peptides per protein. Compared to the
Viridiplantae database, a larger proportion of proteins showed
higher confidence values using the Q. ilex database (see Fig. 2).
Table 1
Number of peptides and proteins identified in a serial dilution covering the range of 10–1000 ng of protein. The
UniprotKB (Viridiplantae) and Quercus ilex databases were used; orthologous products are shown for the former
and gene products for the latter

                 UniprotKB-Viridiplantae database          Quercus ilex database
ng of protein    Peptides  Proteins  Orthologous products  Peptides  Proteins  Gene products
10               996       556       85                    3206      1141      1141
50               1110      563       95                    3643      1058      1244
100              987       495       73                    3234      993       1168
200              1363      659       100                   2455      1180      1425
400              1678      812       116                   6001      1511      1851
600              1929      901       135                   6745      1636      2028
800              1678      793       122                   6589      1587      1955
1000             1821      869       124                   7034      1604      1992
4. Relative protein quantification was reliable in the
100–1000 ng range of protein, as shown by the peak
areas of several proteotypic peptides (see Fig. 3). To assess this,
six proteotypic peptides belonging to different proteins were
randomly selected, and their peak area values were plotted against
the serial dilution of protein. There is a limit of detection
around 100 ng of protein, below which quantification is not
reliable. The linear dynamic range lay between 100 and 800 ng
for the six peptides analyzed.
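One way to check the linear dynamic range of a proteotypic peptide is to fit a straight line to peak area against the amount of protein and inspect the coefficient of determination. The sketch below is a generic least-squares fit in Python and is not the analysis used by the authors; the function name is hypothetical and the peak areas are invented, perfectly linear values chosen only to show the calculation.

```python
def linear_fit(x, y):
    """Ordinary least-squares line y = intercept + slope * x, with R squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


# Made-up peak areas over the 100-800 ng range (illustration only).
protein_ng = [100, 200, 400, 600, 800]
peak_area = [1.0e6, 2.0e6, 4.0e6, 6.0e6, 8.0e6]

slope, intercept, r_squared = linear_fit(protein_ng, peak_area)
print(f"slope = {slope:.1f} area units per ng, R^2 = {r_squared:.4f}")
```

With real data, repeating the fit over progressively wider ranges of the dilution series shows where R squared starts to drop, which marks the edges of the linear range.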
5 Notes
1. Q. ilex acorns from Cordoba (Spain) are sterilized [12]. For
germination, the pericarps are removed from the acorns, which are
then sown in 0.5 L pots with perlite and grown in a greenhouse
(35/19 °C day/night and relative humidity below 43%). Plants are
irrigated weekly to field capacity with Hoagland solution
[13]. The embryo and cotyledons are obtained from germinated
seeds, and the roots and leaves from 4-month-old seedlings
(10-leaf developmental stage). Each organ (embryo, cotyledon,
root, and leaves) is individually frozen in liquid nitrogen
and stored at −80 °C until analysis.
2. Avoid bubbles during casting and quickly cover the acrylamide
with 2-propanol.
Fig. 2 Relative number of proteins identified in the sample dilution of 600 ng,
grouped according to the confidence parameter ranges (% coverage, score
value, and number of peptides)
3. Before adding the APS and TEMED, discard the 2-propanol
layer covering the resolving gels and briefly rinse with water.
Then pour the stacking solution containing APS and TEMED,
and carefully place the comb, preventing bubbles.
[Figure 3: line plot of peak area (intensity, 0 to 1 × 10⁷) against ng of protein (0–1000) for six proteotypic peptides: [K].AEYDESGPSIVHR.[K], [K].AGEDADTLGLTGHER.[Y], [K].AGIVASLDELVK.[E], [K].GAPVVAAPAK.[E], [K].ILDGPPGTAER.[A], and [K].VGNFLNR.[F].]
Fig. 3 Peak area for several proteotypic peptides, determined at the different protein dilutions. The selected
peptides correspond to the following proteins, from top to bottom: actin-97, aconitate hydratase, disulfide-
isomerase A6, elongation factor 1-beta 1, flowering locus K homology domain, and UTP-glucose-1-phosphate
uridylyltransferase
4. Dissolve the Coomassie in methanol and then add the other
components.
5. Keep the samples on ice, since sonication generates heat.
6. Shake the mixture frequently as both phases (phenol and SDS
buffer) tend to separate.
7. It is recommended to use the minimum amount of buffer in
which the pellet is completely dissolved to obtain a high protein
concentration.
8. Pass the sample very slowly through the column to facilitate the
maximum binding of peptides.
9. After this step change the tubes to low-bind to recover the
eluted peptides.
10. For MS analysis, a serial dilution ranging from 10 to 1000 ng of
protein (BSA equivalents, as determined with the Bradford assay)
was prepared.
11. In shotgun experiments, complex mixtures of peptides are
usually used, and some of the peptides may be present in more than
one protein. For this reason, using only proteotypic peptides
gives a better estimate of the amount of a given
protein in the sample.
Acknowledgments
The authors thank the University of Cordoba, Spain (UCO-CeiA3)
and the staff of the Central Service for Research Support (SCAI) for
their technical support in the bioinformatics data analysis. This
research was funded by the grant ENCINOMICA BIO2015-64737-R
from the Spanish Ministry of Economy and Competitiveness.
M.A.C. is grateful for the contract “Ramón y Cajal (RYC-2017-
23706) program” of the Spanish Ministry of Science, Innovation,
and Universities.
References
1. Barbier-Brygoo H, Jouard J (2004) Focus on plant proteomics. Plant Physiol Biochem 42:913–917
2. Cánovas FM, Dumas-Gaudot E, Recorbet G et al (2004) Plant proteome analysis. Proteomics 4:285–298
3. Jorrín-Novo JV (2014) Plant proteomics: methods and protocols. In: Jorrin-Novo JV, Komatsu S, Weckwerth W, Wienkoop S (eds) Methods in molecular biology, vol 1072. Humana Press, Totowa, pp 3–13
4. Valledor L, Wolfram W (2014) Standardization of data processing and statistical analysis in comparative plant proteomics experiment. In: Jorrin-Novo JV, Komatsu S, Weckwerth W, Wienkoop S (eds) Plants proteomics: methods and protocols. Methods in molecular biology, vol 1072. Humana Press, Totowa, pp 347–358
5. Romero-Rodriguez MC, Pascual J, Valledor L, Jorrin-Novo J (2014) Improving the quality of protein identification in non-model species. Characterization of Quercus ilex seed and Pinus radiata needle proteomes by using SEQUEST and custom databases. J Proteome 105:85–91
6. Zhu W, Smith JW, Huang CM (2010) Mass spectrometry-based label-free quantitative proteomics. J Biomed Biotechnol:1–6. https://doi.org/10.1155/2010/840518
7. Xie F, Liu T, Qian WJ et al (2011) Liquid chromatography-mass spectrometry-based quantitative proteomics. J Biol Chem 286(29):25443–25449
8. Guerrero-Sanchez VM, Maldonado-Alconada AM, Amil-Ruiz F, Jorrin-Novo J (2017) Holm oak (Quercus ilex) transcriptome. De novo sequencing and assembly analysis. Front Mol Biosci 4:70
9. Wang W, Vignani R, Scali M, Mauro C (2006) A universal and rapid protocol for protein extraction from recalcitrant plant tissues for proteomic analysis. Electrophoresis 27:2782–2786
10. Bradford MM (1975) A rapid and sensitive method for the quantitation of microgram quantities of protein utilizing the principle of protein-dye binding. Anal Biochem 72:248–254
11. Neuhoff V, Arold N, Taube D, Ehrhardt W (1988) Improved staining of proteins in polyacrylamide gels including isoelectric focusing gels with clear background at nanogram sensitivity using Coomassie Brilliant Blue G-250 and R-250. Electrophoresis 9:255–262
12. Bonner FT, Vozzo JA (1987) Seed biology and technology of Quercus. General technical report, SO-66. U.S. Dept. of Agriculture, Forest Service, Southern Forest Experiment Station, New Orleans, LA, p 21
13. Hoagland DR, Arnon DI (1950) The water-culture method for growing plants without soil. California Agricultural Experiment Station, Circular-347
Chapter 13
Combining Targeted and Untargeted Data Acquisition
to Enhance Quantitative Plant Proteomics Experiments
Gene Hart-Smith
Abstract
Most quantitative proteomics experiments either target a limited number of selected proteins for quantifi-
cation or quantify proteins on a broad scale in an untargeted manner. However, we recently demonstrated
that experiments that have both targeted and untargeted components can be particularly advantageous.
Using a combined targeted and untargeted liquid chromatography–tandem mass spectrometry data acqui-
sition strategy termed TDA/DDA (shorthand for targeted data acquisition/data-dependent acquisition),
which we applied to a model quantitative plant proteomics experiment performed on Arabidopsis, we
demonstrated improved quantification of both targeted and untargeted proteins relative to purely untar-
geted experiments performed using conventional data-dependent acquisition (Hart-Smith et al. Front
Plant Sci 8:1669, 2017). This suggests that many quantitative proteomics datasets earmarked for collection
using data-dependent acquisition are likely to benefit from the use of TDA/DDA instead.
This chapter describes how TDA/DDA liquid chromatography–tandem mass spectrometry methods can
be created on commonly used mass spectrometric instrument platforms. It describes how, using freely
available software, tandem mass spectrometry inclusion lists designed to target proteins of hypothesized
interest can be generated. Best practice implementation of these inclusion lists in TDA/DDA strategies is
then described. Relative to conventional data-dependent acquisition, the liquid chromatography–tandem
mass spectrometry methods created using these guidelines increase the chances of quantifying targeted
proteins and can produce widespread improvements in the reproducibility of untargeted protein quantifi-
cation, without compromising the total numbers of proteins quantified. They are compatible with different
quantitative proteomics methodologies, including metabolic labeling, chemical labeling and label-free
approaches, and can be used to create tailored assay libraries to aid the interpretation of quantitative
proteomics data collected using data-independent acquisition.
Key words Quantitative proteomics, Shotgun proteomics, Inclusion lists, Targeted data acquisition
(TDA), Data-dependent acquisition (DDA)
1 Introduction
Quantitative proteomic studies are expected to play a critical role in
the burgeoning field of plant molecular systems biology. These
studies, which quantify proteins using peptide ions identified in
liquid chromatography (LC)–tandem mass spectrometry
(MS/MS) experiments, have traditionally been categorized as
either hypothesis driven or non–hypothesis driven [2]. Hypothesis
driven studies can be highly selective and sensitive toward individ-
ual proteins [3–5]. They quantify specific proteins using targeted
LC-MS/MS data acquisition strategies such as selected reaction
monitoring (SRM) [6], parallel reaction monitoring (PRM) [7],
or targeted data acquisition (TDA) [8–11], or extract quantitative
data for proteins of interest from LC-MS/MS datasets collected
using data-independent acquisition (DIA) [12]. In contrast, non–
hypothesis driven studies facilitate exploratory analyses by quantify-
ing proteins on a broad scale in an untargeted manner. This is
generally achieved using data-dependent acquisition (DDA), a
LC-MS/MS data acquisition strategy that selects a number of
peptide ions for MS/MS per MS/MS scan cycle, with relative ion
abundances used as a means to prioritize selections [13].
Despite this traditional segregation of quantitative proteomics
into hypothesis driven or non–hypothesis driven research, these
two categories are not mutually exclusive. For example, we recently
employed a combined targeted and untargeted LC-MS/MS data
acquisition strategy, termed TDA/DDA, in quantitative proteo-
mics analyses of wild-type Arabidopsis thaliana plants relative to
plants mutant for the proteins DOUBLE STRANDED RNA
BINDING PROTEIN1 (DRB1) and DRB2 [14, 15] using the
metabolic 15N-labeling approach [16]. In this context, TDA/DDA
enabled both hypothesis driven [14] and non–hypothesis driven
[15] insights into miRNA-guided translation inhibition.
TDA/DDA operates as follows (Fig. 1): within each MS/MS
scan cycle performed over the course of an LC-MS/MS experi-
ment, (1) targeted peptide ions are firstly selected for MS/MS via
TDA using an m/z inclusion list employed with an open retention
time window; and (2) after a set number of inclusion list-triggered
MS/MS scans, or if all peptide ions matching those in the inclusion
list are selected for MS/MS, additional peptide ions are then
selected for MS/MS in an untargeted manner using DDA.
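The selection logic within a single MS/MS scan cycle can be sketched in a few lines of Python. This is a simplified illustration, not instrument firmware: the function names are hypothetical, the 10 ppm tolerance is taken from Subheading 3.2, and the sketch assumes that matching ions are ranked by intensity among themselves and ignores dynamic exclusion, neither of which is specified here.

```python
def matches_inclusion(mz, inclusion_mz, ppm=10.0):
    """True if mz lies within the ppm tolerance of any inclusion list value."""
    return any(abs(mz - target) / target * 1e6 <= ppm for target in inclusion_mz)


def select_precursors(survey, inclusion_mz, n_tda=5, n_dda=5, ppm=10.0):
    """Pick precursors for one scan cycle: TDA events first, then DDA events.

    survey is a list of (mz, intensity) tuples from the MS survey scan.
    """
    by_intensity = sorted(survey, key=lambda peak: peak[1], reverse=True)
    # (1) Targeted events: ions matching the inclusion list, up to n_tda.
    tda = [p for p in by_intensity if matches_inclusion(p[0], inclusion_mz, ppm)][:n_tda]
    # (2) Untargeted events: most intense remaining ions, up to n_dda.
    chosen = set(tda)
    dda = [p for p in by_intensity if p not in chosen][:n_dda]
    return tda, dda


# Made-up survey scan and inclusion list (illustration only).
survey_scan = [(500.2500, 9e5), (612.3300, 5e5), (700.4000, 1e4), (450.1000, 3e5)]
inclusion_list = [700.4005, 450.1000]

tda_events, dda_events = select_precursors(survey_scan, inclusion_list, n_tda=2, n_dda=2)
print("TDA:", tda_events)
print("DDA:", dda_events)
```

In this toy scan the ion at m/z 700.4000 is the least intense signal and would be missed by a top-2 DDA selection, yet it is fragmented because it matches the inclusion list within 10 ppm.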
We recently conducted an in-depth evaluation of the utility of
TDA/DDA for combined hypothesis driven and non–hypothesis
driven quantitative proteomics. This study demonstrated that, rel-
ative to conventional DDA, TDA/DDA is capable of not only
enhancing the quantification of targeted proteins; surprisingly it
can also enhance the broad scale quantification of untargeted pro-
teins [1]. These unexpected improvements in non–hypothesis
driven protein quantification stem from untargeted peptide ions
with m/z values serendipitously matching those in inclusion lists,
which are repeatedly identified across replicate experiments. This
enhanced experimental reproducibility can lead to widespread
improvements in the identification of statistically significant
changes in protein abundance.
Fig. 1 TDA/DDA LC-MS/MS methods employ TDA inclusion lists for the hypothesis driven selection of peptides
for MS/MS (green arrows), followed by DDA for the non–hypothesis driven selection of peptides for MS/MS
(blue arrows). An illustrative MS survey scan is shown with green signals representing peptides derived from
targeted proteins, and black signals representing other peptides. In this example, up to 5 TDA events and
5 DDA events are employed per MS/MS scan cycle
The benefits of TDA/DDA relative to DDA will be sample and
instrument specific. However, using the guidelines described in this
chapter, TDA/DDA can be expected to match or outperform
DDA in most non–hypothesis driven quantitative proteomics
experiments performed on complex peptide samples (>25 K pep-
tide features), while concomitantly allowing specific proteins to be
quantified in a targeted manner. We therefore suggest that TDA/
DDA should be considered for use in all broad scale quantitative
plant proteomics experiments traditionally assigned for collection
using DDA; particularly those that could also benefit from the
targeted quantification of proteins of known or hypothesized
biological interest.
2 Materials
2.1 Creation of Inclusion Lists and TDA/DDA LC-MS/MS Methods
1. Benchtop computer with Skyline [17] installed (see Note 1).
2. Benchtop computer with mass spectrometry instrument software
(e.g., Xcalibur if using Thermo Scientific equipment)
installed (see Note 2).
3 Methods
3.1 Creation of Inclusion Lists
1. Create lists of proteins of known or hypothesized biological
interest to target for quantification. Create one list per peptide
mixture to be subjected to LC-MS/MS. These lists can range
from several proteins to over one hundred proteins per peptide
mixture (see Note 3).
2. For each list of targeted proteins, import the amino acid
sequences of these proteins into Skyline. If using Skyline v4,
this can be done, for example, by importing FASTA entries (see
Note 4) for individual proteins as follows:
Edit → Insert → FASTA...
3. Skyline will automatically perform an in silico digestion for
each imported protein, creating a list of theoretical peptides.
Ensure that these digestions are performed using parameters
appropriate for the peptide mixture to be analyzed.
If using Skyline v4, click: Settings → Peptide Settings...
Ensure that the enzyme used to create the peptide mixture
to be analyzed is correctly specified. This can be done under the
“Digestion” tab (e.g., select “Trypsin [KR | P]”).
If a metabolic labeling experiment has been performed,
ensure that the correct isotope modifications (e.g., “Label:
15N”) have been listed and are checked. This can be done
under the “Modifications” tab.
It is also recommended that in silico digestions should
produce peptides 7–15 amino acids in length. This can be
specified under the “Filter” tab.
Generally, in silico digestions that produce up to two
missed cleavages (specified under the “Digestion” tab) and
consider methionine oxidation as a structural modification
(specified under the “Modifications” tab) are also recom-
mended, as elaborated on below. If alkylation of cysteine resi-
dues was performed during sample preparation, this should
also be specified when selecting structural modifications.
4. Appropriate m/z values need to be calculated for the theoretical
peptides generated by Skyline. It is recommended that these
should fall in the range m/z 350–1500, and be associated with
peptide ions of charge state +2, +3 or +4.
If using Skyline v4, click: Settings → Transition Settings...
Precursor charges (e.g., “2, 3, 4”) can be specified under
the “Filter” tab, and m/z ranges under the “Instrument” tab.
5. The inclusion list incorporating these m/z values needs to be of
an appropriate size (see Note 5). In Skyline v4, the inclusion list
size is shown on the bottom right corner of the main GUI
window (i.e., the denominator next to “prec”).
If the inclusion list size needs to be reduced, the in silico
digestion parameters should be refined. It is recommended that
the following parameters, accessed as above, should be consid-
ered for alteration in the following order: (1) remove methio-
nine oxidation as a structural modification; (2) reduce the
number of missed proteolytic cleavages from 2 to 1; (3) specify
precursor ion charges of +2 and +3 only; (4) reduce the number
of missed proteolytic cleavages from 1 to 0.
If the inclusion list size still needs to be reduced after
making the above alterations, consider placing limits on the
number of m/z values associated with individual large proteins, or
refining the list of targeted proteins.
6. Export the inclusion list. Ensure that the list is in an appropriate
file type and format for the mass spectrometric instrument
platform to be used during LC-MS/MS (see Note 6), and
that retention time windows are left open (see Note 7).
If using Skyline v4, click: File → Export → Isolation List...
If relevant, select the instrument platform to be used for
LC-MS/MS data collection before clicking “OK.” This will
ensure that the list is saved in a correct file type and format for
the selected instrument platform. It is nonetheless recom-
mended that exported files are manually checked to ensure
correct formatting.
If the instrument platform to be used for LC-MS/MS data
collection is not available, export the inclusion list using any
instrument type and manually format the list.
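The in silico digestion and m/z calculation of steps 3 and 4 can be reproduced in a few lines, which is useful for spot-checking values exported from Skyline. The following Python sketch is not Skyline code: the function names are hypothetical, it uses standard monoisotopic residue masses, and it deliberately ignores modifications (methionine oxidation, cysteine alkylation, 15N labeling), which would shift the masses.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def cleavage_fragments(sequence):
    """Trypsin [KR | P]: cleave after K or R unless the next residue is P."""
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])
    return fragments


def digest(sequence, missed_cleavages=2, min_len=7, max_len=15):
    """Theoretical peptides with up to the given number of missed cleavages."""
    fragments = cleavage_fragments(sequence)
    peptides = set()
    for i in range(len(fragments)):
        last = min(i + missed_cleavages, len(fragments) - 1)
        for j in range(i, last + 1):
            peptide = "".join(fragments[i:j + 1])
            if min_len <= len(peptide) <= max_len:
                peptides.add(peptide)
    return sorted(peptides)


def precursor_mz(peptide, charge):
    """m/z of the unmodified peptide ion carrying the given number of protons."""
    mass = sum(MONO[residue] for residue in peptide) + WATER
    return (mass + charge * PROTON) / charge


def inclusion_entries(sequence, charges=(2, 3, 4), mz_range=(350.0, 1500.0)):
    """(m/z, charge, peptide) tuples that fall inside the recommended m/z range."""
    entries = []
    for peptide in digest(sequence):
        for charge in charges:
            mz = precursor_mz(peptide, charge)
            if mz_range[0] <= mz <= mz_range[1]:
                entries.append((round(mz, 4), charge, peptide))
    return entries


# Hypothetical sequence used only to exercise the functions.
for entry in inclusion_entries("AAKPGGRLLLLLLLLLLK"):
    print(entry)
```

Reducing `missed_cleavages` or the tuple of `charges` in this sketch mirrors the order of refinements recommended in step 5 for shrinking an oversized inclusion list.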
3.2 Creation of TDA/DDA LC-MS/MS Methods
1. Open the mass spectrometry instrument software and navigate
to the method editor. For example, if using Thermo Scientific
equipment, open Xcalibur and click: Roadmap View → Instrument
Setup.
2. Open a preoptimized DDA LC-MS/MS method to use as the
starting point for the new TDA/DDA LC-MS/MS method.
This preoptimized DDA LC-MS/MS method should contain
appropriate parameters for the following: survey scans, precur-
sor ion selection, and MS/MS (elaborated on in step 3,
below).
3. Create a TDA component to the LC-MS/MS method, which
should be prioritized over the DDA component. Specify how
many TDA and DDA events to perform per MS/MS scan cycle
(see Note 8). The steps required to perform these actions will
be dependent on the mass spectrometry instrument software.
For many instruments it will be possible to perform these
actions by modifying the DDA LC-MS/MS method opened in
step 2. This is, for example, possible on Thermo Scientific
Fusion (see Note 9) or Q Exactive (see Note 10) series
instruments.
For other instruments, a new method will need to be
created. This may, for example, be necessary on LTQ Orbitrap
series instruments (see Note 11). Ensure that any such new
method is populated with parameters found in the preopti-
mized DDA LC-MS/MS method (see Note 12).
4. Import the inclusion list created in Subheading 3.1 and specify
that the list should be used with a 10 ppm mass tolerance. The
steps required to take these actions will be dependent on the
mass spectrometry instrument software (see Note 13).
5. Save the new method and use it for LC-MS/MS data collection
in place of the DDA LC-MS/MS method opened in step 2.
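When an exported inclusion list has to be formatted by hand, the layouts described in Notes 6 and 7 can be generated with a short script. The Python sketch below is an illustration under stated assumptions, not a vendor-validated exporter: the function names are hypothetical, the Fusion layout assumes a comma separated file with the headers "m/z", "z", and "Name", and the window splitting assumes the 6-min windows separated by 0.01 min that are described for LTQ Orbitrap instruments.

```python
import csv
import io


def fusion_inclusion_csv(entries):
    """CSV text with the headers m/z, z, Name; entries are (mz, charge, name)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["m/z", "z", "Name"])
    for mz, charge, name in entries:
        writer.writerow([f"{mz:.4f}", charge, name])  # m/z to four decimal places
    return buffer.getvalue()


def split_windows(start_min, end_min, width_min=6.0, gap_min=0.01):
    """Split an elution period into consecutive retention time windows.

    Times are handled as integer hundredths of a minute to avoid
    floating-point drift when adding the gap between windows.
    """
    start, end = round(start_min * 100), round(end_min * 100)
    width, gap = round(width_min * 100), round(gap_min * 100)
    if end <= start:
        raise ValueError("end_min must be greater than start_min")
    windows = []
    low, high = start, min(start + width, end)
    while True:
        windows.append((low / 100, high / 100))
        if high >= end:
            break
        low, high = high + gap, min(high + width, end)
    return windows


# Hypothetical entry and elution period (illustration only).
csv_text = fusion_inclusion_csv([(639.48034, 2, "LLLLLLLLLLK")])
windows = split_windows(14.0, 32.0)
print(csv_text)
print(windows)
```

Any file produced this way should still be checked manually against the format expected by the instrument software before it is imported.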
4 Notes
1. Although this chapter describes the use of Skyline v4 for in
silico protein digestions, different versions of Skyline and
numerous other utilities (for example, the online utility
MS-Digest, UC San Francisco) are also capable of
performing this task.
2. The software required to create TDA/DDA methods will
depend on the mass spectrometer used for LC-MS/MS data
collection. This chapter will provide detailed descriptions of the
creation of TDA/DDA methods for Thermo Scientific LTQ
Orbitrap series, Q Exactive series or Fusion series mass spectro-
meters using Xcalibur versions 2, 3 and 4, respectively. How-
ever, steps similar to those described here can be applied to the
creation of TDA/DDA methods on other instrument
platforms.
3. Extremely long lists of targeted proteins may limit the number
of theoretical peptide ions that can be targeted per protein,
which may diminish the efficacy of hypothesis driven protein
quantification. Whether or not this may be an issue will be
apparent following step 5 of Subheading 3.1. (See also Note
5, below).
4. There are numerous ways to access FASTA entries of individual
proteins. For example, for proteins listed in UniProt [18],
FASTA entries can be accessed online from each protein’s
web page by clicking → FASTA (Sequence data in FASTA
format).
5. The advantages of TDA/DDA relative to DDA can be
expected to hold true across a range of inclusion list sizes. It
can be expected that large inclusion lists comprising thousands
of values should offer particular advantages. This is because,
even when using small inclusion lists (e.g., ~100 values) and
high resolution mass analyzers, substantial redundancy
between inclusion list and untargeted peptide ion m/z values
can be expected when analysing complex peptide samples
[1]. It is this redundancy in m/z values that improves the
reproducibility of untargeted protein quantification when
using TDA/DDA. This redundancy in m/z values can be
expected to increase with inclusion list size.
Despite these expected advantages of large inclusion lists,
inclusion list sizes are nonetheless capped by mass spectrometry
instrument software. These maximum inclusion list sizes
will be software specific, and for most software platforms, will
be indicated in error messages if exceeded. For example, if
creating TDA/DDA methods for an LTQ Orbitrap instrument
using Xcalibur v2.2, inclusion lists will be capped at 2000
values when using open retention time windows.
6. If creating inclusion lists for Thermo Scientific LTQ Orbitrap,
Q Exactive or Fusion series instruments using Xcalibur version
2, 3, or 4, respectively, inclusion lists should be created as either
tab delimited text (.txt) or comma separated values (.csv) files.
For LTQ Orbitrap series instruments, 3 columns are
required per m/z value specifying the following: m/z value
(to 4 decimal places), retention start time (in min), retention
end time (in min). These lists must be formatted such that
there is no redundancy in m/z values within each specified
retention time window. It is therefore recommended that any
duplicate m/z values are removed prior to entering retention
time values (see Note 7).
For Q Exactive series instruments, 5 columns are required
per m/z value specifying the following: m/z value (to 4 decimal
places), molecular formula (entries can be left blank), targeted
species (entries can be left blank), peptide ion charge state (“2,”
“3,” etc.), polarity of the peptide ion (“Positive” if using
positive ion mode electrospray ionization).
For Fusion series instruments, three columns are required
per m/z value specifying the following: m/z value (to four
decimal places), peptide ion charge state (“2,” “3,” etc.),
name of targeted species (these names will not affect how
inclusion lists are implemented). In addition, the following
case-sensitive column headers are required: “m/z,” “z,”
“Name.”
7. If creating inclusion lists for Q Exactive or Fusion series instru-
ments using Xcalibur version 3 or 4 respectively, the creation of
open retention time windows simply involves leaving the col-
umns for retention start and end times out, as per Note 6.
If creating inclusion lists for LTQ Orbitrap series instru-
ments using Xcalibur v2, this will involve inputting retention
start and end times covering the entire period of peptide elu-
tion. From our experience with Xcalibur v2.1 and v2.2, speci-
fying a single broad retention time window per m/z value (e.g.,
14.00–50.00 min) leads to faulty implementations of inclusion
lists by the instrument software. To remedy this, we input each
m/z value multiple times using a series of 6-min retention time
windows, with each window separated by a minimal retention
time difference (14.00–20.00 min, 20.01–26.00 min,
26.01–32.00 min, etc.).
8. When creating a TDA/DDA method, it is recommended that
the total number of dependent scan events per MS/MS scan
cycle is the same as in the preoptimized DDA LC-MS/MS
method opened in Subheading 3.2, step 2. It is recommended
that the first half of these dependent scan events should be
allocated to TDA, with the latter half allocated to DDA.
9. If using a Fusion series instrument, adding a TDA component
to the preoptimized DDA LC-MS/MS method opened in
Subheading 3.2, step 2 involves the following.
Click on the “Scan Parameters” tab to open the data acqui-
sition workflow associated with the DDA LC-MS/MS
method. Drag and drop a new “ddMS2” scan node into the
workflow and ensure that its MS/MS parameters match those
of the existing ddMS2 scan node. Ensure that the new scan
node has Scan Priority = 1 (listed under “Data-Dependent
MSn Scan Properties”) and change the Scan Priority of the
existing ddMS2 scan node to 2. Drag and drop a “Targeted
Mass” filter node above the new ddMS2 scan node.
Following this, navigate to “Data Dependent Properties”
(e.g., by clicking on the node specifying “# sec” or “# scans”)
and ensure that the data-dependent mode is set to “Scans Per
Outcome.” This will allow the number of TDA and DDA
events per MS/MS scan cycle to be specified.
10. If using a Q Exactive series instrument, adding a TDA compo-
nent to the preoptimized DDA LC-MS/MS method opened
in Subheading 3.2, step 2 involves the following.
Navigate to “Properties of the method” in the DDA
LC-MS/MS method and ensure that User Role = Advanced.
Navigate to “Properties of Full MS/dd-MS2 (TopN)” and
ensure that Inclusion = on, and that If idle = pick others.
11. If using an LTQ Orbitrap series instrument, it is possible to
create an LC-MS/MS method with both TDA and DDA
components as follows.
After opening the method editor in Xcalibur (Subheading
3.1, step 1), click “Data dependent MS/MS” to create a new
method. Specify the number of Scan Events following Note
8 and ensure that the “Dependent scan” checkbox is marked
for all Scan Events other than Scan Event 1.
Following this, edit the parameters for the first half of the
dependent scans by clicking on “Settings...”. For each of these
Scan Events, navigate to “Parent Mass List” and ensure that
the “Use global mass lists” box is checked. After this, navigate
to the parameters listed under “Current Scan Event.” Ensure
that each mass is determined from Scan Event 1 and that “Nth
most intense from list” is specified, starting from 1 and increas-
ing by 1 for each subsequent Scan Event.
For the latter half of the dependent scans, click on
“Settings...” and navigate to the parameters listed under
“Current Scan Event.” Ensure that each mass is determined
from Scan Event 1 and that “Nth most intense ion” is specified,
starting from 1 and increasing by 1 for each subsequent Scan
Event.
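The allocation of Scan Events described in Notes 8 and 11 can be laid out as a simple table before it is entered in the method editor. The following Python sketch is only an illustration: the function name is hypothetical, and giving the extra scan to the DDA half when the number of dependent scans is odd is an assumption that the notes do not address.

```python
def plan_scan_events(dependent_scans):
    """Return (scan_event_number, setting) pairs for one MS/MS scan cycle.

    Scan Event 1 is the survey scan. The first half of the dependent scans
    use "Nth most intense from list" (TDA), and the latter half use
    "Nth most intense ion" (DDA), each counting N upward from 1.
    """
    tda_count = dependent_scans // 2
    dda_count = dependent_scans - tda_count  # DDA takes the extra scan if odd (assumption)
    events = [(1, "survey scan")]
    for n in range(1, tda_count + 1):
        events.append((len(events) + 1, f"Nth most intense from list, N={n}"))
    for n in range(1, dda_count + 1):
        events.append((len(events) + 1, f"Nth most intense ion, N={n}"))
    return events


plan = plan_scan_events(10)
for number, setting in plan:
    print(f"Scan Event {number}: {setting}")
```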
12. For survey scans, it is particularly important to define the
following parameters appropriately: AGC target values, maxi-
mum injection times, mass analyzer used, and mass analyzer
resolution.
For precursor ion selection, it is particularly important to
define the following parameters appropriately: minimum ion
counts required to trigger MS/MS events, charge states capa-
ble of triggering MS/MS events, whether or not monoisotopic
precursor ion selection is enabled, and parameters associated
with dynamic exclusion.
For MS/MS, different dissociation methods will require
different parameters to be specified. For example, if using
HCD or CID, it is particularly important to define the follow-
ing parameters appropriately: collision energies, AGC target
values, activation times, mass analyzer used, and mass analyzer
resolution.
13. If using an LTQ Orbitrap series instrument, import the inclu-
sion list by clicking on the “Mass Lists” tab. Ensure that
“Parent Masses” is selected in the “Mass List” pull-down
menu before importing the inclusion list. Following this,
click on the “MS Detector Setup” tab. Click on “Settings...”
for any dependent scan and navigate to “Mass Widths.” Under
“Parent mass width,” specify low and high mass tolerances of
10 ppm.
If using a Q Exactive series instrument, click on “Inclu-
sion” (under “Global Lists”) to import the inclusion list. Fol-
lowing this, under “Properties of the method” navigate to
“Customized Tolerances ()” and specify mass tolerances of
10 ppm.
If using a Fusion series instrument, click on the “Targeted
Mass” node and specify the mass list type as “m/z & z” before
importing the inclusion list. After this, specify low and high
mass tolerances of 10 ppm.
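To make the 10 ppm tolerance concrete, the sketch below converts it into an absolute m/z window. It is a minimal illustration: the function name and the example m/z of 500.0 are assumptions, not values from the protocol.

```python
def ppm_window(mz, tolerance_ppm=10.0):
    """Return the (low, high) m/z limits for a symmetric ppm tolerance."""
    delta = mz * tolerance_ppm / 1e6
    return mz - delta, mz + delta


low, high = ppm_window(500.0)
print(f"{low:.4f} to {high:.4f}")  # a 10 ppm window at m/z 500 is +/- 0.005
```

Because the window scales with m/z, the same 10 ppm setting is twice as wide in absolute terms at m/z 1000 as at m/z 500.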
References
1. Hart-Smith G, Reis RS, Waterhouse PM et al (2017) Improved quantitative plant proteomics via the combination of targeted and untargeted data acquisition. Front Plant Sci 8:1669
2. Domon B, Aebersold R (2010) Options and considerations when selecting a quantitative proteomics strategy. Nat Biotechnol 28:710–721
3. Gillet LC, Leitner A, Aebersold R (2016) Mass spectrometry applied to bottom-up proteomics: entering the high-throughput era for hypothesis testing. Annu Rev Anal Chem 9:449–472
4. Picotti P, Bodenmiller B, Mueller LN et al (2009) Full dynamic range proteome analysis of S. cerevisiae by targeted proteomics. Cell 138:795–806
5. Schmidt A, Claassen M, Aebersold R (2009) Directed mass spectrometry: towards hypothesis-driven proteomics. Curr Opin Chem Biol 13:510–517
6. Picotti P, Aebersold R (2012) Selected reaction monitoring–based proteomics: workflows, potential, pitfalls and future directions. Nat Methods 9:555–566
7. Peterson AC, Russell JD, Bailey DJ et al (2012) Parallel reaction monitoring for high resolution and high mass accuracy quantitative, targeted proteomics. Mol Cell Proteomics 11:1475–1488
8. Domon B, Bodenmiller B, Carapito C et al (2009) Electron transfer dissociation in conjunction with collision activation to investigate the Drosophila melanogaster phosphoproteome. J Proteome Res 8:2633–2639
9. Hart-Smith G, Low JK, Erce MA et al (2012) Enhanced methylarginine characterization by post-translational modification-specific targeted data acquisition and electron-transfer dissociation mass spectrometry. J Am Soc Mass Spectrom 23:1376–1389
10. Savitski MM, Fischer F, Mathieson T et al (2010) Targeted data acquisition for improved reproducibility and robustness of proteomic mass spectrometry assays. J Am Soc Mass Spectrom 21:1668–1679
11. Schmidt A, Gehlenborg N, Bodenmiller B et al (2008) An integrated, directed mass spectrometric approach for in-depth characterization of complex peptide mixtures. Mol Cell Proteomics 7:2138–2150
12. Gillet LC, Navarro P, Tate S et al (2012) Targeted data extraction of the MS/MS spectra generated by data-independent acquisition: a new concept for consistent and accurate proteome analysis. Mol Cell Proteomics 11:O111.016717
13. Kalli A, Smith GT, Sweredoski MJ et al (2013) Evaluation and optimization of mass spectrometric settings during data-dependent acquisition mode: focus on LTQ-Orbitrap mass analyzers. J Proteome Res 12:3071–3086
14. Reis RS, Hart-Smith G, Eamens AL, Wilkins MR, Waterhouse PM (2015) Gene regulation by translational inhibition is determined by Dicer partnering proteins. Nat Plants 1:1–6
15. Reis RS, Hart-Smith G, Eamens AL et al (2015) MicroRNA regulatory mechanisms play different roles in Arabidopsis. J Proteome Res 14:4743–4751
16. Arsova B, Kierszniowska S, Schulze WX (2012) The use of heavy nitrogen in quantitative proteomics experiments in plants. Trends Plant Sci 17:102–112
17. MacLean B, Tomazela DM, Shulman N et al (2010) Skyline: an open source document editor for creating and analyzing targeted proteomics experiments. Bioinformatics 26:966–968
18. Consortium U (2014) UniProt: a hub for protein information. Nucleic Acids Res 43:D204–D212
Chapter 14
A Phosphoproteomic Analysis Pipeline for Peels of Tropical Fruits
Janet Juarez-Escobar, José M. Elizalde-Contreras,
Víctor M. Loyola-Vargas, and Eliel Ruiz-May
Abstract
Phosphorylation is a posttranslational reversible modification related to signaling and regulatory mechan-
isms. Protein phosphorylation is linked to structural changes that modulate protein activity, interaction, or
localization and therefore the cell signaling pathways. The use of techniques for phosphoprotein enrich-
ment along with mass spectrometry has become a powerful tool for the characterization of signal transduc-
tion in model organisms. However, limited efforts have focused on the establishment of protocols for the
analysis of the phosphoproteome in nonmodel organisms such as tropical fruits. This chapter describes a
potential pipeline for sample preparation and enrichment of phosphorylated proteins/peptides before MS
analysis of peels of some species of tropical fruits.
Key words Peptide enrichment, Phosphorylation, Phosphoproteome, Tropical fruit
1 Introduction
Arabidopsis thaliana has been the plant model for studying several
biological processes including the application of omics tools and
the massive profiling of posttranslational modifications (PTM)
[1]. The establishment of proteomics pipelines in A. thaliana
represents an invaluable tool in the study of molecular mechanisms
of plants. In most cases, however, these proteomics protocols cannot be
extrapolated directly to other plant species. In nonmodel plant species
such as tropical fruits, recalcitrant tissue and limited genomic
information greatly increase
the complexity of proteomics protocols. Working with fruit tissues
such as peels has some pitfalls due to the presence of a thick cuticle,
cell wall, lining, proteases, storage polysaccharides, phenolic com-
pounds, lipids, and secondary metabolites, and the high dynamic
range of the proteome. Therefore, a protocol should be optimized
for each plant species and tissue.
Plant proteome complexity increases due to posttranslational
modifications (PTM). Phosphorylation is considered one of the
most important posttranslational modifications pertaining to a
plant’s response to external stimuli and other cellular processes,
such as signal transduction, cell proliferation, differentiation, apo-
ptosis, and metabolism [2, 3]. At least one-third of all proteins are
phosphorylated at any given time, as the result of a kinase reaction,
so phosphorylation data can be a measure of signaling activity [3–
6]. Phosphorylation occurs on tyrosine, serine, or threonine resi-
dues, although six other amino acids can be phosphorylated: cyste-
ine, arginine, lysine, aspartate, glutamate, and histidine [7]. Protein
phosphorylation is highly dynamic, spatially regulated in the cell,
and inherently of low stoichiometry [5], therefore of relatively low
abundance (see Note 1). Hence, we need to implement effective
protocols for enrichment; moreover, phosphoproteins represent a
small proportion of total proteins present in the initial cell lysate.
Over the past several decades, protein phosphorylation has been
visualized on 1D and 2D gels, by 32P labeling or by Western blotting
with phosphosite-specific antibodies or using phosphoprotein stains
that specifically bind to phosphate moieties of phosphoproteins,
such as Pro-Q Diamond Phosphoprotein stain (Pro-Q DPS). How-
ever, these techniques are not entirely reliable due to the generation
of false-positive results. It is worth noting that there is no standard
method for sample processing, but a typical phosphoproteomics
workflow includes cell lysis, protein extraction, reduction/alkylation, trypsin digestion, prefractionation, enrichment, and MS analysis (Fig. 1). Although there are many strategies to follow, a highly efficient protein extraction process with protease and phosphatase inhibitor cocktails is strongly suggested as the first step (see Note 2), followed by fast and reliable digestion required to inhibit any protease and phosphatase activity (see Note 3). In addition, fractionation prior to the enrichment step is suggested for several reasons: (a) to reduce sample complexity; (b) to improve the sensitivity to moderate- and/or low-abundance proteins, especially where high-abundance proteins might mask the less abundant ones or where protein concentrations vary over a wide range [8]; (c) to remove nonphosphorylated moieties, since phosphorylated ones are often present in low abundance; (d) because the low stoichiometry of phosphorylation yields a small number of phosphopeptides; and (e) because many regulatory phosphoproteins generally have low expression levels.
1.1 Fractionation Methods
Fractionation is oriented to reduce complexity in each sample and enhance the subsequent enrichment stage [3]. It is usually carried out after protein digestion. Fractionation methods include hydrophilic or ion exchange resins, electrostatic repulsion hydrophilic interaction chromatography (ERLIC), polymer-based metal ion affinity capture (PolyMAC), and hydroxyapatite chromatography.
Fig. 1 Schematic representation of a phosphoproteomics workflow. Strategies often combine two orthogonal
separation modes or multiple enrichment techniques to enhance phosphoproteome coverage. Colored
diagram is the representation of the methodology presented in this chapter
Some examples of commercially available methods of fractionation are as follows:
• HILIC (hydrophilic interaction liquid chromatography):
  – SeQuant® ZIC®-HILIC column (Merck).
  – TSKgel Amide-80 column (Tosoh).
• SAX (strong anion exchange):
  – POROS™ XQ strong anion exchange resin (Thermo).
  – Pierce spin columns (Thermo).
  – III pro analysis (Merck).
  – Amberlite IRA-402 (Merck).
  – Amberlite IRA-410 (Merck).
  – Amberjet 4200 CL (Merck).
  – Dowex 1×8 (Merck).
• SCX (strong cation exchange; please note that this may require a further desalting step):
  – POROS™ XS strong cation exchange resin (Thermo).
  – POROS™ XS resin (Thermo).
  – Pierce SCX columns (Thermo).
  – Amberlite IR-120 (Merck).
  – Dowex 50 WX-8 (Merck).
  – Amberlyst 15 (Merck).
  – Dowex 50 WX-4 (Merck).
  – Polysulfoethyl aspartamide.
  – RESOURCE S column (GE Healthcare, Sweden).
HILIC offers a highly efficient separation of polar molecules, so
phosphorylated peptides are retained more strongly in the column.
HILIC has the highest degree of orthogonality to RP of all com-
mon separation methods. When using HILIC fractionation, a fur-
ther enrichment with Fe3+-IMAC is suggested since it has been
observed that HILIC fractionation improves the selectivity of
IMAC to greater than 90% [9]. ERLIC includes the selectivity of
HILIC with an additional electrostatic repulsion given by the func-
tional groups attached to the stationary phase. It was observed that
using ERLIC, the number of phosphopeptides was tripled com-
pared to SCX-IMAC [10].
1.2 Enrichment Methods
The most efficient enrichment strategy is carried out after peptide digestion. It should be considered that not all proteins can later be identified using the fragmented peptides, since the nonphosphorylated “part” is lost during the enrichment step [13]. Phosphoprotein enrichment alone (without prefractionation) is less used owing to the complexity of plant samples; it is, however, preferred when working with proteins associated with subcellular fractions or isolated organelles [14]. Phosphopeptide enrichment strategies are based on chemical modifications, affinity chromatography, and immunological techniques (Table 1). Alternative affinity methods are Phos-Tag chromatography, polymer-based metal ion affinity capture (PolyMAC), and hydroxyapatite chromatography. It is usual to find coupled orthogonal chromatographic strategies that produce nonoverlapping separation but an increased identification of low-abundance peptides [14]. It is also possible to perform two sequential enrichment steps to reduce sample complexity and increase the phosphoproteome coverage, for example:

First step | Second step | Outcome
SCX or SAX | IMAC | Phosphopeptides elute earlier than their nonphosphorylated counterparts, then Fe3+-IMAC enriches the collected fraction [15]
SCX or HILIC | TiO2 | The phosphoryl group allows the enrichment of negatively charged peptides [16]
TiO2 | SCX | Improves efficiency and reproducibility in large-scale quantitative profiling [17, 18]
Immunoprecipitation of pTyr peptides | Fe(III)-NTA column | Estimates the level of tyrosine phosphorylation and allows the recovery of a large number of peptides [19]

For a more detailed description of enrichment techniques, review the following references [20–32].

Table 1
Affinity chromatography-based techniques commercially available
Stationary phase | Method | Component | Examples
Metals | IMAC | Fe, Ga, Al, Zr, Ni | HisPur™ Cobalt Superflow Agarose, by Thermo Fisher; Ni-NTA spin column (P/N 31014), Qiagen
Metals | MOAC | TiO2 | Titansphere Phos-TiO kits are available from GL Sciences Inc. (Torrance, CA, USA)
Antibodies | Immunoprecipitation | Phosphotyrosine peptides | pTyr-100 (Cell Signaling Technology)
1.3 Quantification Strategies
The most used techniques comprise label-free quantification (LFQ), stable isotope labeling by amino acids in cell culture (SILAC), and isobaric tandem mass tags (iTRAQ or TMT), with LFQ and SILAC being the most accurate techniques [33].
A cost-effective alternative to the commercial iTRAQ and TMT
[34], that has been proved successful in maize leaves [35] and
Arabidopsis [36], is the stable isotope dimethyl labeling of peptides
at their α- and ε-amino groups before enrichments using SCX
and IMAC.
Chemical derivatization strategies can be used to incorporate
sample labeling for quantification (e.g., after β-elimination of the
phosphate group). Weckwerth et al. [37] added ethanethiol and
ethane-d5-thiol to create two different isotopic labels to make a
quantitative comparison of the samples. Goshe et al. [38]
incorporated a phosphoprotein isotope-coded affinity tag
(PhIAT), a biotinylated tag that allowed for high-specificity affinity
purification and isotopic labeling to perform a relative
quantification.
1.4 Plant Phosphoproteome Tools and Databases
There are many tools developed as predictors for phosphorylation sites classified under kinase-specific or non–kinase-specific queries. For example, in the case of kinase-specific tools, users should provide protein sequence and the type of kinase under consideration. Databases of phosphorylation sites and prediction tools are summarized in Table 2.
Table 2
Prediction tools and databases of phosphorylation sites in plant proteomics
Name | Description | Link | Ref.

Databases
PhosPhAt 4.0 | Arabidopsis phosphorylation sites identified by mass spectrometry in large-scale experiments by different research groups. | http://phosphat.uni-hohenheim.de | [39]
dbPPT | A curated database from literature in consistency with other databases. | http://dbppt.biocuckoo.org | [40]
P3DB 3.0 | Provides information and annotation regarding gene ontology homologs, three-dimensional structure, kinase/phosphatase families, protein domains. | http://www.p3db.org/ | [41]
RIPP-DB | RIKEN Plant Phosphoproteome Database: information obtained by LC-MS/MS shotgun phosphoproteomics of Arabidopsis and rice. | http://metadb.riken.jp/metadb/db/SciNetS_ria102i) | [42]
Medicago PhosphoProtein database | Phosphoproteomic data from Medicago roots and phosphorylation sites on proteins involved in symbiotic signaling. | http://www.phospho.medicago.wisc.edu/db/index.php | [43]
ProMEX | Mass spectral reference database of tryptic peptide fragmentation derived from plants. | http://promex.pph.univie.ac.at/promex/ | [44]
AtProteome | Data of the high-density, organ-specific proteome map for Arabidopsis. All the information about the protein identifications is shown together with a proteogenomic mapping of the peptides onto the genome, together with links to other databases. | http://fgcz-atproteome.uzh.ch | [45]
Pep2pro | Proteome information on Arabidopsis based on 2.6 million peptide spectra; provides the information about the protein identifications with a proteogenomic mapping of the peptides onto the genome and annotation in organ-specific processes; allows the user a specific peptide search. | http://fgcz-pep2pro.uzh.ch | [46]
Phospho.ELM | Curated database of experimentally verified phosphorylation sites in eukaryotic proteins, linked to literature references; also incorporates sites contained in universal databases such as UniProt (www.uniprot.org). | http://phospho.elm.eu.org/index.html | [47]

Predictors
NetPhos 3.1 server | Kinase-specific prediction of phosphorylation sites for the following 17 kinases: ATM, CKI, CKII, CaM-II, DNAPK, EGFR, GSK3, INSR, PKA, PKB, PKC, PKG, RSK, SRC, cdc2, cdk5, and p38MAPK. | http://www.cbs.dtu.dk/services/NetPhos-3.1/ | [48]
PHOSFER (PHOsphorylation Site FindER) | Predicts phosphorylation sites in soybean proteins. | http://saphire.usask.ca/saphire/phosfer/ | [49]
KinasePhos 2.0 | Integrates SVM (support vector machines). Approximately 91% accuracy for prediction of phosphorylated Ser, Thr, Tyr, and histidine residues is exhibited by this tool. | http://kinasephos2.mbc.nctu.edu.tw/ | [18, 50]
Scansite 4.0 | Prediction of phosphorylation sites is based on a matrix of selectivity values for amino acids for every probable site of phosphorylation. | https://scansite4.mit.edu/4.0/#home | [51]
PhosphoRice | Meta-predictor of rice-specific phosphorylation sites; constructed by integrating the predictors NetPhos2.0, NetPhosK, Kinasephos, Scansite, Disphos, and Predphosphos with parameters selected by restricted grid search and random search. | https://github.com/PEHGP/PhosphoRice | [52]
MUsite 1.0 | Pretrained model for prediction of kinase-specific protein phosphorylation sites for A. thaliana and five other eukaryotic models. Provides a unique functionality for training customized prediction models (including condition-specific models) from users’ own data. | http://musite.sourceforge.net | [53]
PlantPhos | Prediction of potential phosphorylation sites from catalytic kinase motifs generated using maximal dependence decomposition (MDD). | http://csb.cse.yzu.edu.tw/PlantPhos/index.html | [54]
PhosK3D | Web server for identifying kinase-specific phosphorylation sites on protein sequences and three-dimensional structures. | http://csb.cse.yzu.edu.tw/PhosK3D/ | [55]
2 Materials
Prepare all solutions using double-deionized water (Milli-Q) and analytical HPLC-grade reagents. Prepare and store all reagents at room temperature (unless indicated otherwise).
Note: mention of specific companies does not represent an endorsement by the authors. Reagents are purchased from Sigma, unless otherwise noted.
2.1 Total Protein Extraction
1. Mortar and pestle.
2. Liquid nitrogen.
3. Polyvinylpolypyrrolidone (PVPP).
4. Extraction buffer: 150 mM Trizma base pH 8, 100 mM KCl, 1.4 M sucrose, 1% Triton X-100, 5% (v/v) β-mercaptoethanol. Freshly add protease inhibitor cocktail (e.g., Sigma Plant Protease Inhibitor Cocktail, 100 μL for every 5 g of tissue) and 1 mM phenylmethylsulfonyl fluoride (PMSF, previously prepared with absolute ethanol; see Note 2). Then, add the following phosphatase inhibitors: 10 mM sodium pyrophosphate dibasic (Na2H2P2O7), 1 mM sodium orthovanadate (Na3VO4), 10 mM β-glycerolphosphate, 50 mM sodium fluoride (NaF) (see Notes 3 and 4).
5. Phenol, Tris-saturated pH 8.0.
6. Protein quantification reagents (e.g., Thermo Scientific Pierce BCA Protein Assay kit).
7. 1.5 mL tubes.
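The molar concentrations listed for the extraction buffer (item 4) can be converted into weighed amounts with mass = concentration × volume × molecular weight. The sketch below illustrates this for four of the components. The molecular weights are standard anhydrous values supplied here, not taken from the chapter, and the 100 mL batch size is an arbitrary example; hydrated salts and the remaining inhibitors should be calculated from the values on the reagent labels.

```python
# Molecular weights in g/mol (standard values, not from the chapter).
MOLECULAR_WEIGHT = {
    "Trizma base": 121.14,
    "KCl": 74.55,
    "sucrose": 342.30,
    "NaF": 41.99,
}

# Final concentrations in mol/L, as listed for the extraction buffer.
TARGET_MOLARITY = {
    "Trizma base": 0.150,
    "KCl": 0.100,
    "sucrose": 1.4,
    "NaF": 0.050,
}


def grams_needed(volume_ml):
    """Return the mass (g) of each component for volume_ml of buffer."""
    volume_l = volume_ml / 1000.0
    return {
        name: TARGET_MOLARITY[name] * volume_l * MOLECULAR_WEIGHT[name]
        for name in TARGET_MOLARITY
    }


masses = grams_needed(100.0)  # hypothetical 100 mL batch
for name, grams in masses.items():
    print(f"{name}: {grams:.3f} g per 100 mL")
```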
2.2 SDS–Polyacrylamide Gel Electrophoresis
1. SDS protein sample buffer: To make 1 mL of a 4× stock, mix the following: 0.1 g sodium dodecyl sulfate (SDS), 0.4 g sucrose, 50 μL 1 M Tris–HCl pH 6.8, 10 μL 100 mM EDTA, 400 μL water, 200 μL 14.7 M β-mercaptoethanol, and bromophenol blue.
2. 10× Laemmli electrophoresis running buffer: Dissolve the following in 1000 mL water: 30.0 g Tris, 144.0 g glycine, and 10.0 g SDS. Store the running buffer at room temperature.
3. Separating buffer: 1.5 M Tris–HCl, pH 8.8, 0.4% SDS.
4. Stacking buffer: 0.5 M Tris–HCl, pH 6.8, 0.4% SDS.
5. Fresh 10% ammonium persulfate (APS) water solution.
6. Casting of two separation 14% acrylamide mini-gels with 6 M urea: 3.6 g urea, 1.18 mL water, 2.5 mL separation buffer, 3.5 mL of 40% acrylamide, 6 μL N,N,N′,N′-tetramethylethylenediamine (TEMED), and 60 μL 10% APS.
7. Casting of 6% acrylamide stacking gel for two mini-gels: 3.6 g urea, 3.6 mL water, 2.5 mL stacking buffer, 1.5 mL acrylamide, 8 μL TEMED, and 80 μL 10% APS.
8. Prestained protein molecular weight markers (e.g., Thermo Scientific).
9. Gel casting and electrophoresis system (e.g., Mini-PROTEAN® Tetra Handcast System with Mini-PROTEAN® Tetra Cell, Bio-Rad).
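As a consistency check on the gel recipes in items 6 and 7, the final acrylamide percentage and urea molarity can be computed from the listed volumes. The sketch below assumes a final volume of approximately 10 mL per recipe and a 40% acrylamide stock for the stacking gel as well; neither is stated explicitly in the text. The urea molecular weight (60.06 g/mol) is a standard value.

```python
UREA_MW = 60.06  # g/mol, standard value (not from the chapter)


def final_percent(stock_percent, stock_volume_ml, final_volume_ml):
    """Final concentration (%) after diluting a stock into final_volume_ml."""
    return stock_percent * stock_volume_ml / final_volume_ml


def urea_molarity(urea_g, final_volume_ml):
    """Urea concentration (mol/L) for urea_g dissolved to final_volume_ml."""
    return urea_g / UREA_MW / (final_volume_ml / 1000.0)


FINAL_VOLUME_ML = 10.0  # assumed final volume for two mini-gels

separating = final_percent(40.0, 3.5, FINAL_VOLUME_ML)
stacking = final_percent(40.0, 1.5, FINAL_VOLUME_ML)  # assumes a 40% stock
urea = urea_molarity(3.6, FINAL_VOLUME_ML)

print(f"Separating gel: {separating:.1f}% acrylamide")
print(f"Stacking gel: {stacking:.1f}% acrylamide")
print(f"Urea: {urea:.2f} M")
```

Under these assumptions the values agree with the stated 14% separating gel, 6% stacking gel, and 6 M urea.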
2.3 Reduction, Alkylation, and Digestion
1. 50 mM ammonium bicarbonate (NH4HCO3) in 100 mL water (prepare just before use).
2. Reduction buffer (prepare just before use): 10 mM tris(2-carboxyethyl)phosphine hydrochloride (TCEP) in 50 mM NH4HCO3.
3. 30 mM iodoacetamide in 50 mM NH4HCO3.
4. 30 mM 1,4-dithiothreitol (DTT) in 50 mM NH4HCO3.
5. Acetone, MS grade.
6. Triethylammonium bicarbonate (TEAB).
7. Trypsin, MS grade.
8. Trypsin enhancer (e.g., ProteaseMAX™ Surfactant, Promega).
2.4 Direct Phosphopeptide Enrichment
1. Iron-chelate resin in spin columns (e.g., Pierce™ HiSelect™ Fe-NTA phosphopeptide Enrichment, Thermo Scientific™).
2.5 High pH Reversed-Phase (RP) Fractionation
1. Reversed-phase fractionation resin, trimethylamine (0.1%) (e.g., Pierce™ high pH reversed-phase peptide fractionation kit, Thermo Scientific™).
2.6 SCX-RP Fractionation and Enrichment
1. SCX diluent: 25% ACN/water, pH 3.0.
2. SCX “A” buffer: 10 mM KH2PO4/25% acetonitrile, pH 3.0.
3. SCX “B” buffer: 10 mM KH2PO4/1 M KCl/2% acetonitrile (1:1:1) pH 3.0.
4. Equilibration/loading/rinse buffer: SCX diluent and SCX “A” buffer (1:1).
5. Elution buffer: SCX “A” buffer: SCX “B” buffer (1:1).
6. Strong cation exchanger cartridges (e.g., HyperSep Strong Cation Exchanger (SCX) SPE Cartridges [Thermo Scientific]).
7. RP equilibrium/load buffer: 0.1% TFA.
8. RP desalting buffer: 5% MeOH/0.1% TFA (1:1).
9. RP elution buffer: 50% ACN/0.1% TFA (1:1).
10. Formic acid, MS grade.
2.7 Other Materials
1. Gel documentation system (e.g., Gel Doc™ XR System, Bio-Rad®, and software Image Lab™).
2. Centrifugal vacuum concentrator (e.g., CentriVap, LABCONCO®).
3. Liquid chromatography–mass spectrometry (LC-MS) system.
3 Methods
Carry out all procedures at ice-cold temperature; except for SDS
buffer, add phosphatase inhibitors to all other buffer solutions.
3.1 Tissue Protein Extraction
1. Grind approximately 3 g of fruit tissue to a fine powder in liquid nitrogen, adding PVPP (1:10, w/w) while grinding. Transfer the powder to a 50-mL tube. If not processed immediately after sampling, store at −80 °C until analysis.
2. Add 6 mL of extraction buffer to every 3 g of fruit tissue.
3. Add 6 mL Tris-saturated phenol pH 8.0 and incubate the samples with agitation in crushed ice for 30 min.
4. Centrifuge at 10,000 × g for 30 min at 4 °C and transfer each upper phenolic phase to a new centrifuge tube.
5. Add 4 volumes of ice-cold acetone with 0.07% (v/v) β-mercaptoethanol for soluble protein precipitation at −20 °C overnight.
6. Centrifuge at 3000 × g for 30 min at 4 °C.
7. Remove the supernatant after centrifugation, and let the pellet dry under a laboratory fume hood.
8. Resuspend the dried pellet with 350 μL of 1× phosphate-buffered saline (PBS; Sigma) with 1% SDS. Vortex and sonicate for 20 min.
9. Centrifuge at 15,000 × g for 10 min at RT. Transfer the supernatant to a new tube.
10. Measure protein concentration of the extract.
11. Store at −80 °C until use.
3.2 Subject the Extract to SDS–Polyacrylamide Gel Electrophoresis (SDS-PAGE) According to Laemmli [56]
1. Run the gels at 10 mA/gel through the stacking gel and increase to 25 mA/gel when the samples have entered the separation gel.
3.3 Reduction, Alkylation, and Digestion
1. To 100 μg of protein extract add water to a final volume of 100 μL.
2. Add 10 mM TCEP and incubate for 45 min at 60 °C.
3. To alkylate the proteins, add 30 mM iodoacetamide (IAM) and incubate in the dark for 60 min at room temperature (21 °C). Add 30 mM DTT and incubate for 10 min at room temperature.
4. Add 1 mL ice-cold acetone and incubate overnight at −20 °C for protein precipitation.
5. Centrifuge at 10,000 × g for 15 min at 4 °C. Discard the supernatant and let the pellet dry in a vacuum concentrator.
6. Resuspend the dried pellet with 150 μL 50 mM TEAB + 0.1% SDS. Sonicate for 15 min.
7. Measure protein concentration of the reduced/alkylated extract.
8. Add ProteaseMAX™ Surfactant (Promega) and trypsin at a (0.01:1:30) ratio (trypsin–protein). Incubate for 3 h at 37 °C (see Note 5).
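The amount of trypsin required in step 8 follows directly from the measured protein amount. The sketch below reads the last two terms of the stated ratio as 1:30 trypsin to protein (w/w), which is an interpretation rather than something the text spells out. The function names and the 0.5 μg/μL trypsin stock concentration are hypothetical.

```python
def trypsin_needed_ug(protein_ug, protein_parts_per_trypsin=30.0):
    """Mass of trypsin (ug) for a 1:N trypsin-to-protein ratio (w/w)."""
    return protein_ug / protein_parts_per_trypsin


def stock_volume_ul(mass_ug, stock_ug_per_ul):
    """Volume (uL) of stock solution that contains mass_ug."""
    return mass_ug / stock_ug_per_ul


trypsin_ug = trypsin_needed_ug(100.0)          # 100 ug of protein, as in step 1
volume_ul = stock_volume_ul(trypsin_ug, 0.5)   # hypothetical 0.5 ug/uL stock

print(f"Trypsin: {trypsin_ug:.2f} ug ({volume_ul:.2f} uL of stock)")
```

Because protein is lost during acetone precipitation, the protein amount measured in step 7 should be used in place of the nominal 100 μg.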
3.4 Direct Fe-NTA Enrichment
1. Lyophilize digested sample in a centrifugal vacuum concentrator.
2. Follow the manufacturer’s instructions (Thermo Scientific) for Fe-NTA-based phosphopeptide enrichment.
(a) Resuspend digested peptides in binding/wash buffer (see Note 6).
(b) Equilibrate the spin column in binding/wash buffer and centrifuge at 1000 × g for 30 s.
(c) Bind phosphopeptides to spin column and gently rock for 30 min at room temperature, then centrifuge as indicated in the previous step.
(d) Wash column thrice with binding/wash buffer, discarding each flow-through after centrifugation at 1000 × g. Repeat washing once with water.
(e) Elute the phosphopeptides with elution buffer and centrifuge at 1000 × g; repeat one more time (see Note 7).
3.5 Fractionation Prior to Fe-NTA Enrichment
3.5.1 High pH RP Fractionation and Enrichment
1. Dry digested sample in a centrifugal vacuum concentrator.
2. Follow the manufacturer’s instructions (Thermo Scientific) for high-pH reversed-phase fractionation (see Note 8).
3.5.2 SCX-RP Fractionation and Enrichment
1. Reconstitute lyophilized digested sample in 1 mL equilibration buffer. Adjust the pH to 2.5–3.0 with formic acid if necessary.
2. Wet HyperSep SPE cartridge with 2 mL of Milli-Q water.
3. Pass 1 mL of elution buffer at 1–2 drops/s.
4. Wash with 2 mL of Milli-Q water.
5. Equilibrate with 5 mL of equilibration buffer at 1–2 drops/s.
6. Load the sample slowly, no faster than 1 drop/s. Collect the effluent.
7. Rinse the cartridge with 3 mL of equilibration buffer and collect the effluent in the same tube.
8. Elute the sample with 1.5 mL of elution buffer with increasing concentrations of KCl: 75, 250, 500 mM, no faster than 1 drop/s. Collect the effluents in clean tubes.
9. Dry the effluents in a centrifugal vacuum concentrator.
10. Reconstitute dried effluents with 1 mL 0.1–0.5% TFA. Adjust the pH to 2.5–3.0 with formic acid if necessary.
11. Wet new HyperSep SPE cartridge with 2 mL of RP equilibrium/load buffer.
12. Load the sample slowly, followed by 1 mL of RP equilibrium/load buffer.
13. Pass 1 mL of RP desalting buffer slowly.
14. Elute the sample with 1 mL of RP elution buffer. Collect the effluents and let them dry in a centrifugal vacuum concentrator.
15. Fe-NTA enrichment is carried out as described in Subheading 3.4.
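One way to prepare the 75, 250, and 500 mM KCl elution steps in step 8 is to blend SCX “A” buffer (no KCl) with SCX “B” buffer (1 M KCl). The text does not state how the steps are prepared, so the blending approach and the function name below are assumptions; the calculation itself is the standard dilution relation C1V1 = C2V2.

```python
def blend_for_kcl(target_mm, total_ml=1.5, stock_b_mm=1000.0):
    """Return (volume_a_ml, volume_b_ml) giving target_mm KCl in total_ml.

    Buffer A is taken to contain no KCl and buffer B to contain
    stock_b_mm KCl (1 M for SCX "B" buffer).
    """
    volume_b = target_mm / stock_b_mm * total_ml
    return total_ml - volume_b, volume_b


steps = {target: blend_for_kcl(target) for target in (75, 250, 500)}
for target, (vol_a, vol_b) in steps.items():
    print(f"{target} mM KCl: {vol_a:.4f} mL A + {vol_b:.4f} mL B")
```

The 500 mM step corresponds to the 1:1 mixture of “A” and “B” listed as elution buffer in Subheading 2.6. Note that blending also changes the acetonitrile content, since the two buffers are listed with different acetonitrile percentages.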
3.6 LC/MS-MS Analysis
Mass spectrometric analysis is performed according to the methods available to the user (see Note 9): neutral-loss scanning, multistage activation, or MS2 fragmentation.
We use a nanosystem UltiMate RSLC (Dionex, Sunnyvale, CA) and an Orbitrap Fusion™ Tribrid™ (Thermo Fisher Scientific, San José, CA) mass spectrometer with electrospray ionization in positive mode at 3.5 kV. Each sample is reconstituted with 50 μL of 0.1% formic acid. Twenty microliters are injected onto a C18 precolumn (2 cm × 3 μm ID, 75 μm OD, Dionex, Sunnyvale, CA) at a flow rate of 3 μL min−1. Peptides are separated on an EASY-Spray column (25 cm × 75 μm ID), PepMap RSLC C18 2 μm, at a flow rate of 300 μL min−1 for 100 min. Solvent A (0.1% formic acid) and solvent B (0.1% formic acid in 90% ACN) are used to establish an elution gradient: solvent A for 10 min, solvent B from 7% to 20% for 35 min, solvent B (20–25%) for 15 min, solvent B (25–95%) for 20 min, and solvent A for 8 min. Calibration is performed with caffeine, Met-Arg-Phe-Ala (MRFA), and Ultramark 1621.
A typical MS2 analysis shows a, b, and y ions with multiple
charges and the neutral loss of phosphate (Fig. 2).
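As a rough illustration of how a phosphate neutral loss appears in an MS2 spectrum, the following sketch computes the m/z expected after a precursor loses HPO3 or H3PO4. The masses are standard monoisotopic values (the HPO3 value matches the 79.96633 Da modification mass listed in Fig. 2). The function name and the example precursor m/z are illustrative assumptions, not values from this protocol.

```python
# Monoisotopic masses (Da) of the groups commonly lost from phosphopeptides.
HPO3 = 79.96633    # phosphate modification mass, as listed in Fig. 2
H3PO4 = 97.97690   # approximately HPO3 + H2O

def neutral_loss_mz(precursor_mz: float, charge: int, loss_mass: float) -> float:
    """Return the m/z of an ion after a neutral loss (charge state unchanged)."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    # A neutral fragment carries no charge, so the shift scales as mass / z.
    return precursor_mz - loss_mass / charge

# Hypothetical triply charged precursor at m/z 800.0 (assumed for illustration)
for name, mass in (("HPO3", HPO3), ("H3PO4", H3PO4)):
    print(f"loss of {name}: m/z {neutral_loss_mz(800.0, 3, mass):.4f}")
```

Because the lost fragment is neutral, the observed shift shrinks with charge state, which is why the loss from a +3 ion appears roughly 32.66 m/z units (for H3PO4) below the precursor.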
4 Notes
1. In phosphoproteomics, it is better to use data-independent

acquisition (DIA) over data-dependent acquisition (DDA). In
this way, even low-abundance peptide ions will not be lost
because of their low intensities. DIA helps to overcome the
poor ionization and low stoichiometry of phosphopeptides in
the samples.
2. APMSF is unstable in aqueous solutions, and the buffer should
be used as soon as possible after the addition of PMSF.
Fig. 2 Phosphorylated amino acids: T1, T16, S22 (79.96633 Da), charge: +3. Identified with: Mascot (v1.36);
fragments used for search: a, a-H2O, a-NH3; b, b-H2O, b-NH3; y-H2O, y-NH3
3. Any given extraction protocol has its physicochemical limita-

tions. In our experience with several tropical fruits, we obtained
optimal results with a protocol based on phenol and acetone
precipitation.
4. To preserve phosphorylation, it is essential that extraction buffers
and SDS-PAGE sample buffers contain high concentrations of
phosphatase inhibitors, for example, sodium fluoride and
β-glycerophosphate together at 50 and 100 mM,
respectively [57].
5. In order to keep ProteaseMAX™ Surfactant efficiency, pH
should be maintained at 7.8.
6. We suggest carrying out a previous C18 or SCX prefractiona-
tion as a way to enhance the phosphopeptide recovery. Be sure
to keep pH < 3 during this procedure and to gently mix the
sample with the resin.
7. Dry immediately to avoid modification of the phosphopeptides
due to the extreme acidity of the elution buffer.
8. When using this kit, special care should be taken when mixing
the sample and buffer to avoid resin slurrying.
9. We used an Orbitrap Fusion™ Tribrid™ with MS1 detection in the
Orbitrap and ddMS2 in the ion trap (IT) when using electron transfer
dissociation (ETD), higher-energy C-trap dissociation (HCD),
and collision-induced dissociation (CID). The Orbitrap Fusion™
Tribrid™ method settings were as follows.
Experiment 1. In a cycle time of 3 s, master scan in the Orbitrap
detector at resolution 120,000, using quadrupole isolation in scan
range 350–1500 m/z, maximum injection time: 50 ms. Include
charge states: 2–8. Scan event type 1: condition charge states
3–4, range 300–1600 m/z. Scan event type 2: condition charge
states 2–5, range 400–1600 m/z. Scan event type 3: condition
charge states 2–5.
ddMS2 ETD. MS2, isolation mode in quadrupole. ETD
activation type, collision energy 10%. Ion trap scan rate: rapid.
First mass 120 m/z. Maximum injection time: 100 ms. Orbi-
traps and Q-TOF instruments have a mechanism of fragmen-
tation that preserves the PTMs.
ddMS2 HCD. MS2, isolation mode: quadrupole, HCD
activation type with collision energy 30%. Detector type: ion
trap, scan range mode: auto: m/z normal. Ion trap scan rate:
rapid, first mass (m/z): 100, maximum Injection time (ms): 50.
ddMS2 CID. MS2, isolation mode: quadrupole, CID acti-
vation type, collision energy 30%. Activation Q: 0.25. Ion trap
scan rate: rapid. AGC target: 1.0e4. Maximum injection time
(ms): 50.
Acknowledgments
This work was supported by the National Council of Science and

Technology (FS-1515, Fordecyt 292399, INFR-2015-01-
255045, and INFR-2017-01-280898 to V.M.L. and U0004-
PROCEDYT_2015-1_259915 to E.R.M.).
References
1. Friso G, van Wijk KJ (2015) Posttranslational protein modifications in plant metabolism. Plant Physiol 169:1469–1487
2. Mann M, Ong SE, Gronborg M et al (2002) Analysis of protein phosphorylation using mass spectrometry: deciphering the phosphoproteome. Trends Biotechnol 20:261–268
3. Kumar V, Khare T, Sharma M et al (2018) Engineering crops for the future: a phosphoproteomics approach. Curr Protein Pept Sci 19:413–426
4. de la Fuente van Bentem S, Roitinger E, Anrather D et al (2006) Phosphoproteomics as a tool to unravel plant regulatory mechanisms. Physiol Plant 126:110–119
5. Macek B, Mann M, Olsen JV (2009) Global and site-specific quantitative phosphoproteomics: principles and applications. Annu Rev Pharmacol Toxicol 49:199–221
6. Cutillas PR, Timms JF (2010) Approaches and applications of quantitative LC-MS for proteomics and activitomics. In: Cutillas PR, Timms JF (eds) LC-MS/MS in proteomics. Springer, New York, pp 3–17
7. Schweighofer A, Meskiene I (2015) Phosphatases in plants. In: Schulze WX (ed) Plant phosphoproteomics: methods and protocols. Springer, New York, pp 25–46
8. Yang Z, Li N (2015) Absolute quantitation of protein posttranslational modification isoform. In: Schulze WX (ed) Plant phosphoproteomics: methods and protocols. Springer, New York, pp 105–119
9. McNulty DE, Annan RS (2008) Hydrophilic interaction chromatography reduces the complexity of the phosphoproteome and improves global phosphopeptide isolation and detection. Mol Cell Proteomics 7:971–980
10. Gan CS, Guo T, Zhang H et al (2008) A comparative study of electrostatic repulsion-hydrophilic interaction chromatography (ERLIC) versus SCX-IMAC-based methods for phosphopeptide isolation/enrichment. J Proteome Res 7:4869–4877
11. Beltran L, Cutillas PR (2012) Advances in phosphopeptide enrichment techniques for phosphoproteomics. Amino Acids 43:1009–1024
12. Silva-Sanchez C, Li H, Chen S (2015) Recent advances and challenges in plant phosphoproteomics. Proteomics 15:1127–1141
13. Fíla J, Honys D (2012) Enrichment techniques employed in phosphoproteomics. Amino Acids 43:1025–1047
14. Ito J, Taylor NL, Castleden I et al (2009) A survey of the Arabidopsis thaliana mitochondrial phosphoproteome. Proteomics 9:4229–4240
15. Villén J, Gygi SP (2008) The SCX/IMAC enrichment approach for global phosphorylation analysis by mass spectrometry. Nat Protoc 3:1630
16. Batth TS, Francavilla C, Olsen JV (2014) Off-line high-pH reversed-phase fractionation for in-depth phosphoproteomics. J Proteome Res 13:6176–6186
17. Wu J, Warren P, Shakey Q et al (2010) Integrating titania enrichment, iTRAQ labeling, and Orbitrap CID-HCD for global identification and quantitative analysis of phosphopeptides. Proteomics 10:2224–2234
18. Ren L, Li C, Shao W et al (2017) TiO2 with tandem fractionation (TAFT): an approach for rapid, deep, reproducible, and high-throughput phosphoproteome analysis. J Proteome Res 17:710–721
19. Adelmant GO, Cardoza JD, Ficarro SB et al (2011) Affinity and chemical enrichment for mass spectrometry-based proteomics analyses. In: Ivanov A, Lazarev A (eds) Sample preparation in biological mass spectrometry. Springer, Dordrecht, pp 437–486
20. Reinders J, Sickmann A (2005) State-of-the-art in phosphoproteomics. Proteomics 5:4052–4061
21. Li W, Backlund PS, Boykins RA et al (2003) Susceptibility of the hydroxyl groups in serine and threonine to β-elimination/Michael addition under commonly used moderately high-temperature conditions. Anal Biochem 323:94–102
22. Warthaka M, Karwowska-Desaulniers P, Pflum MK (2006) Phosphopeptide modification and enrichment by oxidation-reduction condensation. ACS Chem Biol 1:697–701
23. Pinkse MW, Uitto PM, Hilhorst MJ et al (2004) Selective isolation at the femtomole level of phosphopeptides from proteolytic digests using 2D-NanoLC-ESI-MS/MS and titanium oxide precolumns. Anal Chem 76:3935–3943
24. Kweon HK, Håkansson K (2006) Selective zirconium dioxide-based enrichment of phosphorylated peptides for mass spectrometric analysis. Anal Chem 78:1743–1749
25. Rivera JG, Choi YS, Vujcic S et al (2009) Enrichment/isolation of phosphorylated peptides on hafnium oxide prior to mass spectrometric analysis. Analyst 134:31–33
26. Ye J, Zhang X, Young C et al (2010) Optimized IMAC-IMAC protocol for phosphopeptide recovery from complex biological samples. J Proteome Res 9:3561–3573
27. Larsen MR, Thingholm TE, Jensen ON et al (2005) Highly selective enrichment of phosphorylated peptides from peptide mixtures using titanium dioxide microcolumns. Mol Cell Proteomics 4:873–886
28. Aryal UK, Ross AR (2010) Enrichment and analysis of phosphopeptides under different experimental conditions using titanium dioxide affinity chromatography and mass spectrometry. Rapid Commun Mass Spectrom 24:219–231
29. Salomon AR, Ficarro SB, Brill LM et al (2003) Profiling of tyrosine phosphorylation pathways in human cells using mass spectrometry. Proc Natl Acad Sci U S A 100:443–448
30. Rush J, Moritz A, Lee KA et al (2005) Immunoaffinity profiling of tyrosine phosphorylation in cancer cells. Nat Biotechnol 23:94
31. Bergström Lind S, Molin M, Savitski MM et al (2008) Immunoaffinity enrichments followed by mass spectrometric detection for studying global protein tyrosine phosphorylation. J Proteome Res 7:2897–2910
32. Zhang G, Neubert TA (2006) Use of detergents to increase selectivity of immunoprecipitation of tyrosine phosphorylated peptides prior to identification by MALDI quadrupole-TOF MS. Proteomics 6:571–578
33. Hogrebe A, von Stechow L, Bekker-Jensen DB et al (2018) Benchmarking common quantification strategies for large-scale phosphoproteomics. Nat Commun 9:1045
34. Boersema PJ, Aye TT, van Veen TA et al (2008) Triplex protein quantification based on stable isotope labeling by peptide dimethylation applied to cell and tissue lysates. Proteomics 8:4624–4632
35. Bonhomme L, Valot B, Tardieu F et al (2012) Phosphoproteome dynamics upon changes in plant water status reveal early events associated with rapid growth adjustment in maize leaves. Mol Cell Proteomics 11:957–972
36. Boex-Fontvieille E, Daventure M, Jossier M et al (2013) Photosynthetic control of Arabidopsis leaf cytoplasmic translation initiation by protein phosphorylation. PLoS One 8:e70692
37. Weckwerth W, Willmitzer L, Fiehn O (2000) Comparative quantification and identification of phosphoproteins using stable isotope labeling and liquid chromatography/mass spectrometry. Rapid Commun Mass Spectrom 14:1677–1681
38. Goshe MB, Veenstra TD, Panisko EA et al (2002) Phosphoprotein isotope-coded affinity tags: application to the enrichment and identification of low-abundance phosphoproteins. Anal Chem 74:607–616
39. Heazlewood JL, Durek P, Hummel J et al (2007) PhosPhAt: a database of phosphorylation sites in Arabidopsis thaliana and a plant-specific phosphorylation site predictor. Nucleic Acids Res 36(Database):D1015–D1021
40. Cheng H, Deng W, Wang Y et al (2014) dbPPT: a comprehensive database of protein phosphorylation in plants. Database (Oxford) 2014:bau121
41. Yao Q, Ge H, Wu S et al (2013) P3DB 3.0: from plant phosphorylation sites to protein networks. Nucleic Acids Res 42(Database issue):D1206–D1213
42. Nakagami H, Sugiyama N, Mochida K et al (2010) Large-scale comparative phosphoproteomics identifies conserved phosphorylation sites in plants. Plant Physiol 153:1161–1174
43. Rose CM, Venkateshwaran M, Grimsrud PA et al (2012) Medicago phosphoprotein database: a repository for Medicago truncatula phosphoprotein data. Front Plant Sci 3:122
44. Hummel J, Niemann M, Wienkoop S et al (2007) ProMEX: a mass spectral reference database for proteins and protein phosphorylation sites. BMC Bioinformatics 8:216
45. Baerenfaller K, Grossmann J, Grobei MA et al (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics. Science 320:938–941
46. Hirsch-Hoffmann M, Gruissem W, Baerenfaller K et al (2012) pep2pro: the high-throughput proteomics data processing, analysis, and visualization tool. Front Plant Sci 3:123
47. Dinkel H, Chica C, Via A et al (2011) Phospho.ELM: a database of phosphorylation sites. Nucleic Acids Res 39:D261–D267
48. Blom N, Sicheritz-Pontén T, Gupta R et al (2004) Prediction of post-translational glycosylation and phosphorylation of proteins from the amino acid sequence. Proteomics 4:1633–1649
49. Trost B, Kusalik A (2013) Computational phosphorylation site prediction in plants using random forests and organism-specific instance weights. Bioinformatics 29:686–694
50. Wong YH, Lee TY, Liang HK et al (2007) KinasePhos 2.0: a web server for identifying protein kinase-specific phosphorylation sites based on sequences and coupling patterns. Nucleic Acids Res 35:W588–W594
51. Obenauer JC, Cantley LC, Yaffe MB (2003) Scansite 2.0: proteome-wide prediction of cell signaling interactions using short sequence motifs. Nucleic Acids Res 31:3635–3641
52. Que S, Li K, Chen M et al (2012) PhosphoRice: a meta-predictor of rice-specific phosphorylation sites. Plant Methods 8:5
53. Gao J, Thelen JJ, Dunker AK et al (2010) Musite: a tool for global prediction of general and kinase-specific phosphorylation sites. Mol Cell Proteomics 9:2586–2600
54. Lee TY, Bretaña NA, Lu CT (2011) PlantPhos: using maximal dependence decomposition to identify plant phosphorylation sites with substrate site specificity. BMC Bioinformatics 12:261
55. Su MG, Lee TY (2013) Incorporating substrate sequence motifs and spatial amino acid composition to identify kinase-specific phosphorylation sites on protein three-dimensional structures. BMC Bioinformatics 14:S2
56. Laemmli UK (1970) Cleavage of structural proteins during the assembly of the head of bacteriophage T4. Nature 227:680–685
57. Dephoure N, Gould KL, Gygi SP et al (2013) Mapping and analysis of phosphorylation sites: a quick guide for cell biologists. Mol Biol Cell 24:535–542
Chapter 15
Label-Free Quantitative Phosphoproteomics for Algae

Megan M. Ford, Sheldon R. Lawrence II, Emily G. Werth,
Evan W. McConnell, and Leslie M. Hicks
Abstract
The unicellular alga Chlamydomonas reinhardtii is a model photosynthetic organism for the study of
microalgal processes. Along with genomic and transcriptomic studies, proteomic analysis of Chlamydomo-
nas has led to an increased understanding of its metabolic signaling as well as a growing interest in the
elucidation of its phosphorylation networks. To this end, mass spectrometry-based proteomics has made
great strides in large-scale protein quantitation as well as analysis of posttranslational modifications (PTMs)
in a high-throughput manner. An accurate quantification of dynamic PTMs, such as phosphorylation,
requires high reproducibility and sensitivity due to the substoichiometric levels of modified peptides, which
can make depth of coverage challenging. Here we present a method using TiO2-based phosphopeptide
enrichment paired with label-free LC-MS/MS for phosphoproteome quantification. Three technical
replicate samples in Chlamydomonas were processed and analyzed using this approach, quantifying a
total of 1775 phosphoproteins with a total of 3595 phosphosites. With a median CV of 21% across
quantified phosphopeptides, implementation of this method for differential studies provides highly repro-
ducible analysis of phosphorylation events. While the culturing and extraction methods used are specific to
facilitate coverage in algal species, this approach is widely applicable and can easily extend beyond algae to
other photosynthetic organisms with minor modifications.
Key words Phosphorylation, Quantitative proteomics, Mass spectrometry, Chlamydomonas reinhardtii,
Label-free, Algae
1 Introduction
The unicellular alga Chlamydomonas reinhardtii is a model organism
for the study of microalgal processes, particularly photosynthesis
due to its photoheterotrophic growth [1]. More recently,
Chlamydomonas research has expanded to include the utilization
of microalgae for biofuel production due to their ability to produce
large amounts of triacylglycerol while having rapid growth potential
and tolerance to environmental conditions [2]. Along with
genomic and transcriptomic studies [3–5], proteomic analysis of
Chlamydomonas has led to an increased understanding of its metabolic
signaling as well as a growing interest in the elucidation of its
phosphorylation networks, particularly those related to biofuel
production [6, 7].
Electronic supplementary material: The online version of this chapter (https://doi.org/10.1007/978-1-0716-0528-8_15)
contains supplementary material, which is available to authorized users.
Protein phosphorylation is a posttranslational modification
(PTM) that serves as a rapid and reversible means of modulating
protein activity and signal transduction in the cell. This modifica-
tion involves the addition of a phosphate group to an amino acid by
a protein kinase, which together with phosphatases, can act as a
molecular switch to regulate complex signaling networks. Protein
phosphorylation has been extensively studied for more than
60 years due to its widespread prevalence and its critical involve-
ment in the regulation of nearly all basic cellular processes
[8, 9]. Dynamic protein phosphorylation plays a central role in
cell proliferation, metabolism, signaling, and survival, emphasizing
the need for an efficient and selective method of analysis. However,
studying these events remains an analytical challenge.
One important challenge stems from the labile nature of phos-
phorylation. As a PTM that is tightly linked to protein function, the
phosphorylation status of proteins continually changes in response
to specific conditions and stimuli. Thus, understanding phosphor-
ylation requires detection and quantification of the same phospho-
protein(s) in multiple states, or proteoforms, across different
conditions while using sample preparation techniques, such as
flash freezing and the use of phosphatase inhibitors, to ensure the
signal being analyzed is answering the biological question of interest.
An additional challenge arises from the large dynamic range of
phosphorylation events in the cell, which depends on both the
abundance of the protein, which can span many orders of
magnitude [10], and the occupancy of the phosphorylation site,
which is generally low at any given time [10]. Also, while
phosphorylation occurs on thousands of proteins, many of them share
little sequence homology, increasing the difficulty in identifying
dynamic changes in phosphorylation across an entire
phosphoproteome [11].
To date several enrichment approaches have been employed to
address the challenges in assessing protein phosphorylation
[12]. Among these, titanium dioxide metal oxide affinity chroma-
tography (TiO2-MOAC) is one of the most common shotgun
enrichment methods for phosphopeptides from complex biological
samples [13–15]. TiO2-based enrichments have been shown to be
more selective [15], and are less sensitive to interferents such as
salts and detergents than immobilized metal affinity chromatography
[16, 17]. However, they show a preference for singly phosphorylated
peptides over those with multiple phosphosites, potentially
due to stronger interactions between TiO2 and multiphosphory-
lated peptides making elution of these peptides challenging
[18]. At acidic pH, TiO2 has a high affinity for phosphorylated
species, forming a bidentate bond between the titanium surface and
two of the phosphate oxygen atoms [18]. Adding an organic acid,
such as phthalic, 2,5-dihydroxybenzoic, or lactic acid, during
binding minimizes copurification of acidic peptides and
enhances the overall selectivity of this enrichment method [12].
LC-MS/MS offers highly reproducible and accurate systems-
level analysis that can be paired with enrichment for the study of
large-scale protein phosphorylation [19]. For quantification, a
label-free approach can provide advantages over label-based tech-
niques, primarily in experimental design flexibility [20]. Label-free
quantitation (LFQ), with a number of software programs available
to aid in data analysis [20], allows for rapid, straightforward, and
cost-effective measurements of a wide range of protein abundances.
Typically, LFQ is employed via one of two approaches: changes in
ion intensity from LC-peak areas [i.e., area under the curve (AUC)]
of the peptides, or based on spectral counting of peptides from MS2
analysis. The latter approach is limited in its ability to quantify
proteins of low abundance [6] due in part to the variability in
spectral count response for each peptide making it necessary to
observe many spectra for a given protein to assume a linear response
between counts and abundance. Additionally, many experiments
employ a dynamic exclusion of ions already selected for fragmenta-
tion, making accurate quantitation with this method challenging
[21]. In phosphoproteomics, quantitation is performed on a single
peptide for each phosphorylation site, making AUC quantitation
generally preferable for these studies. However, AUC requires
highly reproducible chromatography and high mass accuracy
because it relies on accurate peak alignment and mass measurement
for quantification.
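To make the distinction between the two LFQ approaches concrete, the sketch below contrasts an AUC measurement (trapezoidal integration of an extracted ion chromatogram) with a spectral count. It is a minimal illustration with made-up data points, not the algorithm used by any particular software package; the function names and example values are assumptions.

```python
def peak_area(times_min, intensities):
    """Trapezoidal area under an extracted ion chromatogram (XIC)."""
    if len(times_min) != len(intensities) or len(times_min) < 2:
        raise ValueError("need two or more paired (time, intensity) points")
    area = 0.0
    for i in range(1, len(times_min)):
        dt = times_min[i] - times_min[i - 1]
        area += 0.5 * (intensities[i] + intensities[i - 1]) * dt
    return area

def spectral_count(identified_peptides, peptide):
    """Number of MS2 identifications assigned to one peptide."""
    return sum(1 for p in identified_peptides if p == peptide)

# Hypothetical XIC for one phosphopeptide: retention time (min) and intensity
rt = [30.0, 30.1, 30.2, 30.3, 30.4]
signal = [0.0, 2.0e6, 5.0e6, 2.0e6, 0.0]
auc = peak_area(rt, signal)

# Hypothetical list of MS2 identifications from the same run
count = spectral_count(["pepA", "pepB", "pepA"], "pepA")
print(f"AUC = {auc:.3g}, spectral count = {count}")
```

The AUC is a continuous value obtained from a single peptide's MS1 signal, whereas the spectral count is a small integer, which is one way to see why counting is poorly suited to quantifying a phosphosite represented by a single peptide.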
Here we present a method to quantify the phosphoproteome of
Chlamydomonas that uses a combination of efficient extraction,
TiO2-based phosphopeptide enrichment, and LFQ to provide
in-depth coverage of the phosphoproteome (Fig. 1). Using this
method, analysis of replicate samples resulted in the quantification
of 3595 phosphosites on 1775 phosphoproteins. Assessment of the
reproducibility of this method shows the technical replicates are
highly similar, with a 21% median CV. These results are similar to
previous studies performed using a similar approach that uses iden-
tical sample preparation and LC separation with a different make/
model of mass spectrometer [19, 22]. While our quantitative
breadth of coverage is extensive, qualitative studies have shown
that the global phosphoproteome is still drastically larger than can
be obtained in a shotgun LFQ approach. A previous study [23],
which used two enrichment methods and additional fractionation
to create a total of 60 samples subjected to LC-MS/MS, identified
over 4500 phosphoproteins from nearly 16,000 phosphosites,
showing that there is room for improvement in the depth of
coverage obtained in these phosphoproteomic studies. Implementation
of an orthogonal fractionation prior to analysis would help
improve this depth of coverage, but at the cost of increased instrument
time and variability from the added sample preparation.
Although providing moderate depth of coverage, the method outlined
here provides an accurate and high-throughput approach for
analyzing algal phosphoproteomic samples.
Fig. 1 Phosphoproteomic workflow for Chlamydomonas reinhardtii cells. Briefly, Chlamydomonas cultures are
harvested, resuspended in lysis buffer and sonicated. The lysate is collected and soluble proteins are reduced,
alkylated and digested with trypsin. Phosphopeptides are enriched for using a titanium dioxide-based (TiO2)
enrichment before being subjected to LC-MS/MS analysis. For the data reported here, samples were pooled
after resuspension and aliquoted into three technical replicates to remove any biological variation
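The reproducibility figure quoted above (a median CV across quantified phosphopeptides) can be computed as in the following sketch. It assumes abundances are available as a mapping from phosphopeptide to its values in each technical replicate; the data, peptide names, and function names are hypothetical and are not taken from the QuantifyR scripts.

```python
from statistics import mean, median, stdev

def percent_cv(values):
    """Coefficient of variation (%) = sample standard deviation / mean * 100."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean abundance is zero; CV is undefined")
    return stdev(values) / m * 100.0

def median_cv(abundances):
    """Median %CV over peptides; abundances maps peptide -> replicate values."""
    return median(percent_cv(v) for v in abundances.values())

# Hypothetical abundances for three phosphopeptides in three technical replicates
example = {
    "pepA": [100.0, 110.0, 90.0],
    "pepB": [50.0, 50.0, 50.0],
    "pepC": [10.0, 20.0, 30.0],
}
print(f"median CV = {median_cv(example):.1f}%")
```

The median is used rather than the mean so that a few poorly reproduced peptides do not dominate the summary.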
2 Materials
2.1 Cell Culture
1. Hutner’s Trace Elements stock [24]. This can be purchased as a
stock solution or prepared in lab (see Note 1).
2. TRIS–Acetate–Phosphate (TAP) Media: 20 mM TRIS base,
17.5 mM acetic acid, 1.65 mM K2HPO4, 945 μM KH2PO4,
287 μM CaCl2, 405 μM MgSO4, 7.01 mM NH4Cl, and Hut-
ner’s Trace Elements. Stock solutions can be made for easy
preparation of TAP media (see Note 2).
3. TAP agar media plates, 1.5% agar: To TAP media (see Subheading
2.1, item 2), add Bacto Agar and autoclave. Cool media to
52 °C and pour plates into petri dishes, 100 × 15 mm, in a
biosafety cabinet, about 10 mL per plate. Let plates solidify
overnight, seal each plate with Parafilm, and store at 4 °C.
4. Chlamydomonas reinhardtii, strain CC-2895 (6145c mt-).
5. 100 μE m⁻² s⁻¹ white light source.
6. Platform shaker.
7. Liquid nitrogen, 0.5 L.
2.2 Protein Extraction
1. Lysis buffer: 100 mM TRIS, pH 8.0, 1% sodium dodecyl
sulfate (SDS), 1× cOmplete protease inhibitor cocktail
(Roche, Risch-Rotkreuz, Switzerland) and 1× phosSTOP
phosphatase inhibitor cocktail (Roche). Stock solutions can
be made for easy preparation of lysis buffer (see Note 3).
When preparing the lysis buffer, stir slowly until the contents
are fully dissolved to minimize agitation and bubble formation
from the SDS.
2. Covaris 2 mL milliTUBE tubes and 24 Place milliTUBE rack.
3. 100 mM ammonium acetate in methanol (MeOH).
4. 70% ethanol (EtOH).
5. 100 mM TRIS, pH 8.0. Using a 1 M TRIS stock (see Note 3) is
recommended for ease of buffer preparation.
6. Resuspension buffer: 8 M urea, 100 mM TRIS, pH 8.0.
7. CB-X Protein Assay Kit (G-Biosciences, St. Louis, MO, USA)
or equivalent protein quantification assay.
2.3 Reduction, Alkylation, and Digestion
1. Reduction buffer: 500 mM dithiothreitol (DTT) in 100 mM TRIS,
pH 8.0.
2. Alkylation buffer: 500 mM iodoacetamide (IAM) in 100 mM
TRIS, pH 8.0. Make fresh for each experiment and cover tube
with aluminum foil or keep buffer in the dark to prevent
degradation of light-sensitive IAM solution.
3. Trypsin resuspension buffer: 50 mM acetic acid.
4. Promega (Madison, WI, USA) Trypsin Gold, Mass Spectrom-
etry grade.
5. 20% trifluoroacetic acid (TFA).
2.4 Desalting
1. Waters (Milford, MA, USA) Sep-Pak C18 1 cc Vac Cartridge,
50 mg, 55–105 μm particle size.
2. 0.1% TFA (LC-MS grade).
3. 80% acetonitrile (ACN, LC-MS grade), 0.1% TFA (LC-MS
grade).
4. Vacuum manifold with 24-port cover (Phenomenex, Torrance,
CA, USA) or equivalent setup.
2.5 Phosphopeptide Enrichment
1. Wash Buffer: 80% ACN (LC-MS grade), 1% TFA (LC-MS
grade).
2. Resuspension Buffer: 80% ACN (LC-MS grade), 1% TFA
(LC-MS grade), 25 mg/mL phthalic acid. This can be made
by adding phthalic acid to the Wash Buffer.
3. Elution Buffer: 20% ACN (LC-MS grade), 5% aqueous
ammonia.
4. TiO2 phosphopeptide enrichment tips, 3 mg. Titansphere™
Phos-TiO Spin Columns (GL Sciences, Torrance, CA, USA)
recommended.
5. Spin column centrifuge adaptors.
2.6 Sample Purification
1. 1% formic acid (FA, LC-MS grade), 2% ACN (LC-MS grade).
2. 0.1% FA (LC-MS grade).
3. 60% ACN (LC-MS grade), 0.1% FA (LC-MS grade).
4. Millipore (Burlington, MA, USA) C18 ZipTips.
2.7 LC-MS/MS
1. 5% ACN (LC-MS grade), 0.1% TFA (LC-MS grade).
2. LC-MS Total Recovery Vials.
3. Symmetry C18 trap column (100 Å, 5 μm, 180 μm × 20 mm;
Waters).
4. HSS T3 C18 column (100 Å, 1.8 μm, 75 μm × 250 mm;
Waters). Mobile Phase A: 0.1% FA. Add 1 mL of Optima
LC-MS grade FA to 1 L of Optima LC-MS grade water.
5. Mobile Phase B: 0.1% FA in ACN (LC-MS grade).
6. NanoAcquity UPLC system (Waters).
7. Q Exactive HF-X Hybrid Quadrupole Orbitrap mass spec-
trometer (ThermoFisher, Waltham, MA, USA).
2.8 Data Analysis
1. Progenesis QI for Proteomics v2.0 (Nonlinear Dynamics, Durham,
NC, USA).
2. Mascot Daemon v3.5.1 (Matrix Science, Boston, MA, USA).
3. R script for processing phosphoproteome data. The code used
for processing these data is available on GitHub
(https://github.com/hickslab/QuantifyR).
3 Methods
3.1 Culturing
1. Maintain Chlamydomonas strain on TAP agar plates under
continuous light, streaking a fresh plate from a single colony
on a previous plate every 1–2 weeks.
2. Grow a 100 mL starter culture of Chlamydomonas using TAP
media in a 250 mL flask. In a biosafety cabinet, select a single
colony from a TAP agar plate and suspend it in the TAP media.
Grow the culture 4–5 days shaking at 120 rpm and under
continuous light until a growth density of OD750 0.4–0.5 is
reached.
3. Prepare 6 × 350 mL liquid cultures of Chlamydomonas in TAP
media. Transfer 3.5 mL of a starter culture to fresh TAP media.
Use a 1 L flask for 350 mL of culture to provide sufficient room
for consistent mixing. Shake at 120 rpm with 100 μmol m⁻² s⁻¹
white light at room temperature. Grow for 3–4 days until an
OD750 of 0.4–0.5 is reached (see Note 4).
4. Centrifuge each culture for 5 min at 6000 × g at 4 °C in a 1 L
centrifuge bottle to harvest the Chlamydomonas.
5. Decant the supernatant from each culture while not disturbing
the cell pellet in the centrifuge bottle.
6. Resuspend the Chlamydomonas pellets in 10 mL of fresh TAP
media and transfer each solution to a 15 mL conical
centrifuge tube.
7. Centrifuge each culture for 2 min at 3200 × g at 4 °C.
8. Decant the supernatant from each culture while not disturbing
the cell pellet in the conical tube.
9. Place the conical centrifuge tubes containing cell pellets in
liquid nitrogen until fully frozen. Store at −80 °C until
performing protein extraction.
3.2 Protein Extraction
1. Resuspend cell pellets in 4 mL lysis buffer (see Note 5) and
transfer to Covaris 2 mL tubes. Keep samples on ice during
resuspension.
2. Sonicate samples in a 4 °C water bath for 3 min at 200 cycles/
burst, 100 W power, and 13% duty cycle using an E220 focused
ultrasonicator (Covaris, Woburn, MA, USA).
3. Transfer samples from Covaris tubes to 2 mL centrifuge tubes,
keeping the samples on ice.
4. Centrifuge cell lysates at 16,000 × g for 10 min at 4 °C and
collect the supernatant into a 50 mL conical tube.
5. Add 1 mL of fresh lysis buffer to the pelleted cell debris and
vortex.
6. Centrifuge this sample again at 16,000 × g for 10 min at 4 °C.
Collect the supernatant and combine with the first extraction in
the 50 mL conical tube.
7. Precipitate proteins by adding 5 volumes (about 30 mL) of
cold 100 mM ammonium acetate in MeOH. Incubate samples
overnight at −80 °C.
8. Collect protein pellet by centrifuging for 5 min at 2000 × g.
Decant the supernatant without disturbing the pellet.
9. Perform two additional washes with 30 mL fresh 100 mM
ammonium acetate in MeOH followed by a wash with 30 mL
70% EtOH. For each wash, resuspend the pellet by vortexing
before centrifuging for 5 min at 2000 × g.
10. Allow protein pellets to dry for 5 min in a fume hood at room
temperature.
11. Resolubilize the pellets in a minimal volume (1–2 mL) of
resuspension buffer. Incubate for 1 h to ensure protein is fully dissolved.
12. Use a 10 μL aliquot of each replicate to perform protein
quantification using the CB-X Protein Assay. Complete assay
using manufacturer’s protocol (see Note 6).
13. Normalize each replicate to 4 mg/mL and use a 0.5 mL aliquot
(2 mg) of each sample to continue through the remaining steps
in the protocol.
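The normalization in the final step is a simple dilution calculation, sketched below. The measured concentration and volume in the example are hypothetical, and the function names are illustrative assumptions; only the 4 mg/mL target and the 0.5 mL aliquot come from the protocol.

```python
def buffer_to_add_ml(measured_mg_per_ml, volume_ml, target_mg_per_ml=4.0):
    """Volume of resuspension buffer (mL) needed to dilute to the target."""
    if measured_mg_per_ml < target_mg_per_ml:
        raise ValueError("sample is already below the target concentration")
    # Total protein is conserved: C1 * V1 = C2 * V2
    final_volume_ml = measured_mg_per_ml * volume_ml / target_mg_per_ml
    return final_volume_ml - volume_ml

def protein_in_aliquot_mg(conc_mg_per_ml, aliquot_ml):
    """Protein mass (mg) contained in an aliquot."""
    return conc_mg_per_ml * aliquot_ml

# Hypothetical replicate measured at 6 mg/mL in 1.5 mL of resuspension buffer
extra = buffer_to_add_ml(6.0, 1.5)
print(f"add {extra:.2f} mL buffer; 0.5 mL aliquot holds "
      f"{protein_in_aliquot_mg(4.0, 0.5):.1f} mg protein")
```

A sample that assays below 4 mg/mL cannot be normalized by dilution, which is one reason to resolubilize the pellet in a minimal volume.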
3.3 Reduction, Alkylation, and Digestion
1. Reduce samples using 10 mM DTT. Add 10 μL reduction
buffer to each sample. Incubate for 30 min at room temperature
while shaking (500–850 rpm).
2. Alkylate samples using 40 mM IAM (see Note 7). Add 40 μL of
alkylation buffer to each sample. Incubate for 45 min in the
dark at room temperature while shaking.
3. Following alkylation, dilute the samples fivefold using
100 mM TRIS, pH 8.0 so the concentration of urea is <2 M,
which is a requirement for effective tryptic protein digestion.
For 0.5 mL samples, add 2 mL of 100 mM TRIS, pH 8.0.
4. Perform overnight digestion using mass spectrometry-grade
trypsin (Trypsin Gold from Promega is recommended) at a
protease to protein ratio of 1:50 at 25 °C. For 2 mg lysate,
40 μg trypsin is needed. Gently invert or shake the samples
during digestion.
5. Following digestion, quench the reaction by adding 20% TFA
to the samples until their pH is less than 3 when measured with
a pH test strip. Usually 0.2–0.4% final volume TFA, or 5–10 μL
for 2.5 mL samples, is sufficient.
6. Freeze samples at −80 °C following digestion until desalting
using 50 mg SepPak (Waters) cartridges is performed.
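The reagent amounts in the steps above can be checked with a short calculation. The sketch assumes a 0.5 mL sample in 8 M urea and accounts for the small volume added by each reagent, so the computed DTT and IAM concentrations fall slightly below the nominal 10 and 40 mM. The helper name is an illustrative assumption.

```python
def final_conc(stock_conc, stock_vol, start_vol):
    """C1*V1 = C2*V2, with V2 = start_vol + stock_vol (consistent units)."""
    return stock_conc * stock_vol / (start_vol + stock_vol)

sample_ul = 500.0                                    # 0.5 mL aliquot in 8 M urea
dtt_mm = final_conc(500.0, 10.0, sample_ul)          # 10 uL of 500 mM DTT
iam_mm = final_conc(500.0, 40.0, sample_ul + 10.0)   # 40 uL of 500 mM IAM

# Urea after adding 2 mL of 100 mM TRIS (reagent volumes included)
total_ul = sample_ul + 10.0 + 40.0 + 2000.0
urea_m = 8.0 * sample_ul / total_ul

# Trypsin at a 1:50 protease-to-protein ratio for 2 mg (2000 ug) of protein
trypsin_ug = 2000.0 / 50

print(f"DTT {dtt_mm:.1f} mM, IAM {iam_mm:.1f} mM, "
      f"urea {urea_m:.2f} M, trypsin {trypsin_ug:.0f} ug")
```

Under these assumptions the urea concentration comes out near 1.6 M, comfortably under the 2 M limit stated for tryptic digestion.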
3.4 Desalting
1. Thaw samples on ice and centrifuge them for 5 min at
10,000 × g to pellet undigested protein. Separate the soluble
peptide mixture from this pellet to avoid clogging the cartridges.
2. Set up one cartridge for each sample on a vacuum manifold
using test tubes to collect the flow through from the cartridges.
3. Wet cartridges by adding 1 mL of 80% ACN, 0.1% TFA (see
Note 8).
4. Equilibrate cartridges using 2 mL of 0.1% TFA.
5. Load peptide samples onto the cartridge and recover the flow
through in a new test tube.
6. Reapply this flow through to the cartridge.
7. After the flow through passes through, switch to a new test
tube and flow 2 mL of 0.1% TFA through the cartridges to
remove salts.
8. Elute desalted peptides into a new 2 mL tube by adding 1.5 mL
of 80% ACN, 0.1% TFA to the cartridge. Once the elution
flows all the way through the cartridge, apply vacuum for about
5 s to collect the remaining solvent from the packed bed.
9. Following peptide elution, freeze the samples and vacuum
centrifuge to dryness.
3.5 Phosphopeptide Enrichment
1. Each sample uses one TiO2 tip placed in a microcentrifuge tube
using an adaptor. Preelute the tips using 100 μL of elution
buffer (see Note 9).
2. Condition each tip with 100 μL of wash buffer twice, for a total
of 200 μL, followed by 3 washes using 100 μL of resuspension
buffer.
3. Resuspend the dried peptides in 150 μL of resuspension buffer.
Centrifuge the samples at 10,000 × g for 5 min to prevent
clogging and load onto the tips. Use a new centrifuge tube to
recover the sample flow through.
4. Reapply the flow through five times.
5. Following binding using a new centrifuge tube, wash the tips
using 100 μL of resuspension buffer twice and then wash three
times with 100 μL of wash buffer.
6. Using a new centrifuge tube to collect the buffer, elute the
phosphopeptide-enriched samples using two aliquots of
100 μL of elution buffer, combining them for a total of
200 μL of elution.
7. Flash-freeze the elution with liquid nitrogen and vacuum cen-
trifuge to dryness with the concentrator set to room
temperature.
3.6 Sample Purification
1. Resuspend phosphopeptide-enriched samples in 15 μL 1% FA,
2% ACN.
2. Centrifuge the samples at 15,000 × g for 5 min and transfer to
a new tube, taking care not to disturb the pellet if present, to
remove any insoluble portion of the sample.
3. Aliquot 15 μL 60% ACN, 0.1% FA for each sample into its own
tube to elute samples from the ZipTip.
4. Perform a C18 ZipTip purification on each sample, using a new
tip each time (see Note 10).
5. Attach a ZipTip to a 10 μL pipette. With pipette set to 10 μL,
draw up LC-MS grade ACN to wet the tip. Discard the ACN
while keeping the resin wet. Repeat twice for a total of three
preelution steps.
6. Equilibrate the ZipTip by pipetting 0.1% FA three times, dis-
carding the solvent each time while keeping the resin wet.
7. Pipet the sample 10 times to load the peptides onto the ZipTip.
8. Wash six times with 0.1% FA.
9. Elute the peptides by pipetting 10 times using aliquoted
elution solvent from step 3, expelling all of the solvent from the
pipette tip.
10. Dry down all of the eluted peptide samples.
3.7 LC-MS/MS
1. Resuspend phosphopeptide samples in 20 μL and whole cell
samples in 40 μL of 5% acetonitrile, 0.1% TFA and transfer to a
Total Recovery Vial (Waters).
2. Inject 5 μL of each sample and perform LC-MS/MS analysis
on each sample using a NanoAcquity UPLC system (Waters)
coupled to a Q Exactive HF-X Hybrid Quadrupole Orbitrap
mass spectrometer (ThermoFisher) via a Nanospray Flex Ion
Source (ThermoFisher). Inject the peptide mixture onto a
Symmetry C18 trap column (100 Å, 5 μm, 180 μm × 20 mm;
Waters) with a flow rate of 5 μL/min for 3 min using 99% A and
1% B, then separate on a HSS T3 C18 column (100 Å, 1.8 μm,
75 μm × 250 mm; Waters) using a gradient of increasing
mobile phase B at a flow rate of 300 nL/min for 120 min
total. Increase mobile phase B from 5% to 35% in 90 min, ramp
to 85% in 5 min, hold for 5 min, return to 5% mobile phase B in
2 min, and reequilibrate for 13 min.
3. Use the following MS parameters: Use a tune file set with
positive polarity, 2.2 kV spray voltage, 325 °C capillary
temperature, and an S-lens RF level of 40. In the instrument
method, set the lock mass option to “best” with the background
polysiloxane ions at 371.10124 and 445.12003. Select full
MS/DD-MS2 scan type and set method duration to 120 min and
default charge state to 2. Perform MS survey scan in profile
mode across 350–1600 m/z at 120,000 resolution until 50 ms
maximum IT or 3 × 10^6 AGC target is reached. Select the top
20 features above 5000 counts excluding ions with unassigned,
+1, or >+8 charge state. Collect MS2 scans at 45,000 resolution
with NCE at 32 until 100 ms maximum IT or 1 × 10^5 AGC target
is reached. Set the dynamic exclusion window for precursor m/z
to 10 s and the isolation window to 0.7 m/z. Check the system’s performance
every 8 h using an injection of BSA tryptic digest run with the
same instrument method.
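The gradient in step 2 can be written down as a small table of breakpoints, which makes it easy to check the composition at any time point. The sketch below is illustrative only: taking time zero as the start of the analytical gradient is an assumption, and the function name is hypothetical. The listed segments add up to 115 min, so the sketch stops there; how the remainder of the 120 min method is spent is not specified in the text.

```python
# (time in min, %B) breakpoints; time zero is assumed to be the start
# of the analytical gradient (an assumption, not stated in the text).
GRADIENT = [(0, 5.0), (90, 35.0), (95, 85.0), (100, 85.0), (102, 5.0), (115, 5.0)]


def percent_b(t_min: float) -> float:
    """Linearly interpolate %B at time t_min between breakpoints."""
    if t_min < GRADIENT[0][0] or t_min > GRADIENT[-1][0]:
        raise ValueError("time outside the programmed gradient")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise AssertionError("unreachable")


print(percent_b(45))  # midway through the 5% to 35% ramp
print(percent_b(97))  # during the 85% hold
```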
3.8 Data Analysis
1. Upload acquired spectral files (*.raw) into Progenesis QI for
Proteomics (Nonlinear Dynamics). Use the automatically assigned
reference spectrum to align the total ion chromatograms to
minimize run-to-run differences in retention time and
normalize peak abundances. Design the experiment so that
replicates are grouped together as one subject. Export a
combined peak list (*.mgf).
2. Upload and determine peptide sequence and protein inference
using Mascot (Matrix Science). Use the following search para-
meters: Search against the database containing the proteome
for the organism of interest, in this case the Phytozome Chla-
mydomonas proteome appended with the NCBI mitochon-
drial and chloroplast databases, along with the sequence for
common laboratory contaminants (www.thegpm.org/cRAP;
116 entries). Use a target decoy MS/MS search with trypsin
protease specificity with up to two missed cleavages, a peptide
mass tolerance of 15 ppm, and a fragment mass tolerance of
0.1 Da. Set a fixed modification of carbamidomethylation at
cysteine and include the following variable modifications: acet-
ylation at the protein N-terminus, oxidation at methionine,
and phosphorylation at serine, threonine, and tyrosine. After
the search is complete, adjust the false discovery rate of the
significant peptide identifications to be less than 1% using the
embedded Percolator algorithm. Export matches (*.xml) and
reupload data to Progenesis.
3. From Progenesis, export the “Peptide Measurements” from
the “Review Proteins” tab (Table S1). These data can be used
to determine the number of phosphosites, and phosphopro-
teins identified in each replicate (Fig. 2a) and the reproducibil-
ity can be assessed (Fig. 3).
4. The proteomics data have been deposited to the
ProteomeXchange Consortium (www.proteomexchange.org) via the
PRIDE partner repository [25] with the dataset identifier
PXD012261.
5. Parse data using custom R script found at GitHub (https://
github.com/hickslab/QuantifyR) or using similar parsing
technique. This script groups together features matched with
identical sequence, modifications, and score with differing pro-
tein accessions, representing them by the protein accession
with the highest number of unique peptides and largest confi-
dence score assigned by Progenesis. Features duplicated by
multiple peptide identifications are reduced to a single peptide
with the highest Mascot ion score. The results are then limited
to only peptides with one or more phosphosites. Identifiers are
made by joining the protein accession of each feature with the
single-letter amino acid code of the modified residue and
location of the modification. The data are then reduced to
unique identifiers by summing the abundance of all contributing
features (charge states, missed cleavages, etc.). Each
identifier group is represented in the final dataset by the
peptide with the highest Mascot score (Table S2). Using these
parsed results, the total number of phosphosites,
phosphoproteins, and %CV can be calculated for the three
replicates (Fig. 2a and b).
Fig. 2 Summary of quantitation results between three replicate samples. A. Number of phosphopeptides,
phosphoproteins, and statistics for each individual replicate and combined data with filtered and imputed
data. B. Histogram of the % CV for quantitated phosphosites
Fig. 3 Plots comparing the log2 transformed abundances between replicate samples
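The site-level summarization described in step 5 can be sketched in Python as follows. This is not the QuantifyR script itself, only a minimal illustration of the idea: the accessions, abundances, and the underscore used to join identifier parts are hypothetical, and %CV is computed here with the sample standard deviation, which is an assumption.

```python
from statistics import mean, stdev

# Hypothetical features: (accession, residue, position, abundance per replicate).
features = [
    ("PROT_A", "S", 15, [100.0, 110.0, 90.0]),  # e.g., one charge state
    ("PROT_A", "S", 15, [50.0, 40.0, 60.0]),    # same site, another feature
    ("PROT_B", "T", 88, [90.0, 100.0, 110.0]),
]


def site_id(accession, residue, position):
    """Join accession, modified residue, and location into one identifier."""
    return f"{accession}_{residue}{position}"


def sum_by_site(rows):
    """Sum replicate abundances of all features sharing an identifier."""
    totals = {}
    for accession, residue, position, abundances in rows:
        key = site_id(accession, residue, position)
        previous = totals.get(key, [0.0] * len(abundances))
        totals[key] = [p + a for p, a in zip(previous, abundances)]
    return totals


def percent_cv(values):
    """Coefficient of variation in percent (sample standard deviation)."""
    return 100.0 * stdev(values) / mean(values)


totals = sum_by_site(features)
for key, values in totals.items():
    print(key, values, round(percent_cv(values), 1))
```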
4 Notes
1. Hutner’s Trace Elements stock preparation taken from
Chlamydomonas Resource Center (www.chlamycollection.org).
Stock preparation is extensively described on the
Chlamydomonas Resource Center website and by Hutner et al. [24].
2. TAP Salts Stock (40×): Add 15.00 g of NH4Cl, 4.00 g
MgSO4·7H2O, and 2.00 g CaCl2·2H2O to 1 L water. Stir
until dissolved and autoclave. TAP Phosphate Stock (1000×):
Add 288.00 g K2HPO4 and 144.00 g KH2PO4 to 1 L of water.
Stir until dissolved and autoclave. TAP Acetate Stock, pH 7.0
(50×): Add 121.00 g TRIS base and 50 mL of glacial acetic
acid to 950 mL water. Stir to dissolve and filter sterilize. For 1 L
of media combine the following amounts of stock solutions
and autoclave: 25 mL TAP Salts Stock, 1 mL TAP Phosphate
Stock, 20 mL of TAP Acetate Stock, and 1 mL of Hutner’s
Trace Elements.
3. 1 M TRIS Stock (10×), pH 8.0: Dissolve 121.10 g of TRIS
base in 800 mL of water, adjust the pH to 8.0 by adding
concentrated HCl, and add water to a final volume of 1 L.
20% SDS Stock: Add 20.00 g SDS to 80 mL water and mix slowly
to dissolve, keeping the stirring speed low to prevent frothing
and heating, if needed, to no higher than 68 °C; then adjust to
a final volume of 100 mL with water. For 10 mL of buffer, add 1 mL
1 M TRIS Stock solution, 1 protease inhibitor tablet, 1 phos-
phatase inhibitor tablet, and 0.5 mL 20% SDS stock solution to
8.5 mL of distilled water.
4. An OD750 of 0.4–0.5 was identified as mid-log phase growth
for this strain of Chlamydomonas based on the known growth
patterns [22]. Growth curves should be measured and used to
identify the optical density where mid-log growth occurs when
using this method to study other strains or algal species. This
ensures that the cells are actively growing, there is no shortage
of any nutrients, and enough material is harvested for each
sample to perform phosphoproteomic analysis.
5. Three Chlamydomonas cultures were harvested, resuspended
in lysis buffer, combined, and realiquoted into three technical
replicates to assess the reproducibility of this method and nor-
malize any biological variability in the samples. When using this
method for differential studies, each culture should be a
biological replicate, with no recombination step.
6. For CB-X protein assay, take a 10 μL aliquot of the protein
sample and perform the assay according to the manufacturer’s
instructions. Briefly, add 1 mL of CB-X reagent and vortex.
Centrifuge the sample at 15,000 × g for 5 min. Remove the
supernatant without disturbing the pelleted protein. Add
50 μL of Solubilization Buffer 1 and 50 μL Solubilization
Buffer 2, and pipet to resuspend the pellet. Incubate for
1 min before adding 1 mL CB-X Assay Dye. Incubate for
5 min before measuring the absorbance of the sample at
595 nm.
7. IAM in solution is unstable and light sensitive. Keep IAM
solution in the dark before and during alkylation to prevent
degradation. Covering the tubes or mixer with aluminum foil
works well for this.
8. When using C18 SepPak cartridges, a manifold can be used to
apply vacuum to the samples to increase the flow rate through
the cartridges. Vacuum can be used for all of the steps in the
procedure except for the initial loading of the peptides onto the
cartridge and the elution of the peptides. Flow rate should not
exceed 1 mL/min when vacuum is used. The bed of the
cartridges should stay wet throughout the procedure by
keeping a small amount of solvent above the packed bed at all
times.
9. For each step in the enrichment, centrifuge the tips at 1000 × g
at room temperature to pass buffer through the column. For
steps using 100 μL and 150 μL buffer, centrifuge the tips for
3 min and 5 min, respectively.
10. ZipTips work by drawing solvent through the resin using a
micropipette to aspirate up and down. It is important that the
resin remains wet throughout the purification by leaving a
small amount of solvent visible above the resin bed at all
times until the sample is ready for elution.
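The stock volumes given in Note 2 for 1 L of TAP medium (25 mL, 1 mL, 20 mL, and 1 mL) correspond to 40-, 1000-, 50-, and 1000-fold dilutions. A minimal sketch for scaling the recipe to other volumes is shown below; the dictionary and function names are hypothetical, and the factors are derived from the 1 L recipe rather than stated separately by the authors.

```python
# Dilution factors implied by the 1 L recipe in Note 2.
STOCK_FACTORS = {
    "TAP Salts Stock": 40,
    "TAP Phosphate Stock": 1000,
    "TAP Acetate Stock": 50,
    "Hutner's Trace Elements": 1000,
}


def stock_volumes_ml(final_volume_ml: float) -> dict:
    """Volume of each stock (mL) needed for a given final media volume."""
    return {name: final_volume_ml / factor
            for name, factor in STOCK_FACTORS.items()}


for name, volume in stock_volumes_ml(1000).items():
    print(f"{name}: {volume} mL")
```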
Acknowledgments
This research was supported by a National Science Foundation
CAREER award (MCB-1552522) awarded to L.M.H. NSF MRI
(CHE-1726291) supported the purchase of the Q-Exactive HF-X
mass spectrometer, and we thank Dr. Brandie Ehrmann for training
on the HF-X instrument.
References
1. Harris EH (2001) Chlamydomonas as a model organism. Annu Rev Plant Physiol Plant Mol Biol 52:363–406
2. Hu Q, Sommerfeld M, Jarvis E et al (2008) Microalgal triacylglycerols as feedstocks for biofuel production: perspectives and advances. Plant J 54(4):621–639
3. Merchant SS, Prochnik SE, Vallon O et al (2007) The Chlamydomonas genome reveals the evolution of key animal and plant functions. Science 318:245–250
4. Zones JM, Blaby IK, Merchant SS et al (2015) High-resolution profiling of a synchronized diurnal transcriptome from Chlamydomonas reinhardtii reveals continuous cell and metabolic differentiation. Plant Cell 27:2743–2769
5. Miller R, Wu G, Deshpande RR et al (2010) Changes in transcript abundance in Chlamydomonas reinhardtii following nitrogen deprivation predict diversion of metabolism. Plant Physiol 154:1737–1752
6. Wang H, Alvarez S, Hicks LM (2012) Comprehensive comparison of iTRAQ and label-free LC-based quantitative proteomics approaches using two Chlamydomonas reinhardtii strains of interest for biofuels engineering. J Proteome Res 11:487–501
7. Roustan V, Bakhtiari S, Roustan P-J et al (2017) Quantitative in vivo phosphoproteomics reveals reversible signaling processes during nitrogen starvation and recovery in the biofuel model organism Chlamydomonas reinhardtii. Biotechnol Biofuels 10:280. https://doi.org/10.1186/s13068-017-0949-z
8. Krebs EG, Fischer EH (1955) Phosphorylase activity of skeletal muscle extracts. J Biol Chem 216:113–120
9. Fischer EH, Krebs EG (1955) Conversion of phosphorylase b to phosphorylase a in muscle extracts. J Biol Chem 216:121–132
10. Eriksson J, Fenyö D (2010) Modeling experimental design for proteomics. Methods Mol Biol 673:223–230
11. Blackburn K, Goshe MB (2009) Challenges and strategies for targeted phosphorylation site identification and quantification using mass spectrometry analysis. Brief Funct Genomic Proteomic 8:90–103
12. Dunn JD, Reid GE, Bruening ML (2010) Techniques for phosphopeptide enrichment prior to analysis by mass spectrometry. Mass Spectrom Rev 29:29–54
13. Kokubu M, Ishihama Y, Sato T et al (2005) Specificity of immobilized metal affinity-based IMAC/C18 tip enrichment of phosphopeptides for protein phosphorylation analysis. Anal Chem 77:5144–5154
14. Ruprecht B, Koch H, Medard G et al (2015) Comprehensive and reproducible phosphopeptide enrichment using iron immobilized metal ion affinity chromatography (Fe-IMAC) columns. Mol Cell Proteomics 14:205–215
15. Larsen MR, Thingholm TE, Jensen ON et al (2005) Highly selective enrichment of phosphorylated peptides from peptide mixtures using titanium dioxide microcolumns. Mol Cell Proteomics 4:873–886
16. Tsai C-F, Wang Y-T, Chen Y-R et al (2008) Immobilized metal affinity chromatography revisited: pH/acid control toward high selectivity in phosphoproteomics. J Proteome Res 7:4058–4069
17. Ye J, Zhang X, Young C et al (2010) Optimized IMAC protocol for phosphopeptide recovery from complex biological samples. J Proteome Res 9:3561–3573
18. Aryal UK, Ross ARS (2010) Enrichment and analysis of phosphopeptides under different experimental conditions using titanium dioxide affinity chromatography and mass spectrometry. Rapid Commun Mass Spectrom 24:219–231
19. Werth EG, McConnell EW, Lianez IC et al (2019) Investigating the effect of target of rapamycin kinase inhibition on the Chlamydomonas reinhardtii phosphoproteome: from known homologs to new targets. New Phytol 221:247–260
20. Neilson KA, Ali NA (2011) Less label, more free: approaches in label-free quantitative mass spectrometry. Proteomics 11:535–553
21. Bantscheff M, Schirle M, Sweetman G et al (2007) Quantitative mass spectrometry in proteomics: a critical review. Anal Bioanal Chem 389:1017–1031
22. Werth EG, McConnell EW, Gilbert TSK et al (2017) Probing the global kinome and phosphoproteome in Chlamydomonas reinhardtii via sequential enrichment and quantitative proteomics. Plant J 89:416–426
23. Wang H, Gau B, Slade WO et al (2014) The global phosphoproteome of Chlamydomonas reinhardtii reveals complex organellar phosphorylation in the flagella and thylakoid membrane. Mol Cell Proteomics 13:2337–2353
24. Hutner SH, Provasoli L, Schatz A et al (1950) Some approaches to the study of the role of metals in the metabolism of microorganisms. Proc Am Philos Soc 94:152–170
25. Vizcaíno JA, Côté RG, Csordas A et al (2013) The PRoteomics IDEntifications (PRIDE) database and associated tools: status in 2013. Nucleic Acids Res 41:D1063–D1069
Chapter 16
Targeted Quantification of Phosphopeptides by Parallel Reaction Monitoring (PRM)
Sara Christina Stolze and Hirofumi Nakagami
Abstract
Parallel reaction monitoring (PRM) is a liquid chromatography–mass spectrometry (LC-MS)-based tar-
geted peptide/protein quantification method that was initially implemented for Orbitrap mass spectro-
meters. Here, we describe detailed workflows that utilize the freely available MaxQuant and Skyline
software packages to target peptides of interest, primarily focusing on phosphopeptides.
Key words Parallel reaction monitoring (PRM), Targeted quantification, Orbitrap mass spectrome-
ter, Phosphopeptide, Phosphorylation, Posttranslational modification (PTM)
1 Introduction
Deducing the functions of gene products from genomic and
transcriptomic information alone is difficult; thus,
determination of protein abundance and posttranslational
modification (PTM) status is crucial [1–3]. Recent
developments in MS-based targeted
proteomics methods, including parallel reaction monitoring
(PRM), have paved the way for the sensitive detection and accurate
quantification of peptides/proteins of interest in complex samples
[4–6]. These MS-based methods are complementary to classical
Western blotting but have not yet been widely utilized and/or
accepted in plant research fields. However, MS-based techniques
should in fact be the methods of choice for peptide/protein quan-
tification due to their higher sensitivity and the limited availability
of specific antibodies for Western blotting [7].
PRM is an alternative to selected reaction monitoring (SRM)
and was developed to take advantage of high-resolution and accu-
rate mass analyzers incorporated in Orbitrap mass spectrometers
[6, 8]. SRM is performed on triple quadrupole mass spectrometers
and utilizes the third quadrupole to detect a single isolated frag-
ment ion derived from a precursor ion. PRM, on the other hand,
utilizes a high-resolution and accurate mass analyzer, Orbitrap or
time-of-flight (TOF), to detect all fragment ions derived from a
precursor ion in parallel. PRM has several advantages over SRM.
First, the identity of quantified ions can easily be confirmed because
full MS/MS spectra are available for database search. Second, the
quality of quantification can be reliably assessed from relative
abundance distributions for expected and measured fragment ions.
Third, PRM has superior selectivity and sensitivity because of the
use of a high-resolution and accurate mass analyzer. Fourth, and
most importantly, PRM does not require the selection of fragment
ions for the construction of a data acquisition method; hence, it is
possible to target peptides of interest without acquiring MS/MS
spectra for the peptides in advance. For successful PRM analysis, a
list of target precursor ions is needed, which can be derived from a
data-dependent acquisition (DDA) or alternatively be generated
from an in silico digest of target proteins.
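To illustrate what an in silico digest produces, the following Python sketch generates candidate tryptic peptides from a protein sequence. It is a simplified stand-in for what Skyline does internally, not its implementation: the function name and example sequence are hypothetical, the cleavage rule (C-terminal to K or R, but not before P) is an assumed convention, and the defaults of two missed cleavages and a length of 7 to 25 residues mirror the settings used later in this chapter.

```python
import re


def tryptic_peptides(protein: str, max_missed: int = 2,
                     min_len: int = 7, max_len: int = 25) -> list:
    """Cleave C-terminal to K/R (not before P) and apply a length filter."""
    # Zero-width split after every K or R that is not followed by P
    # (splitting on empty matches requires Python 3.7 or later).
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]
    peptides = []
    for i in range(len(pieces)):
        last = min(i + max_missed, len(pieces) - 1)
        for j in range(i, last + 1):
            peptide = "".join(pieces[i:j + 1])
            if min_len <= len(peptide) <= max_len:
                peptides.append(peptide)
    return peptides


# Hypothetical sequence; the K before P is not cleaved.
example = "MKAAAAAAAKPGGGGGGRTTTTTTTK"
print(tryptic_peptides(example, max_missed=0))
```

Each returned peptide would still need to be combined with the expected modifications and charge states to obtain the target precursor ions.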
We here describe workflows for the setup of PRM methods with
two distinct approaches. The described methods focus especially on
constructing a target list for PRM data acquisition on Orbitrap mass
spectrometers with the use of the freely available MaxQuant and
Skyline software packages [9–11]. The first approach (Method A)
utilizes MS data acquired in DDA mode to construct a target list.
The second approach (Method B) explains how to construct a target
list with protein sequence information alone and without MS data
for the targeted peptides. The described protocols were developed
for targeting phosphopeptides, but they can be easily adapted for
nonmodified peptides or peptides with other modifications simply
by adjusting the parameters of the protocol.
2 Materials
MaxQuant software (https://maxquant.org) [9, 10]. Version
1.5.7.4 was used for this protocol.
Skyline (64-bit) 4.2.019009 software (https://skyline.ms)
[11] was used for this protocol.
3 Methods
Method A: Construction of a PRM-based targeted method using
data-dependent acquisition (DDA) data.
3.1 Sample Preparation for DDA and PRM Measurements
Prepare samples containing proteins of interest in the
phosphorylated form for DDA measurements in order to construct
a PRM method for targeted quantification. To identify and then
to target phosphopeptides/phosphosites from plant material,
phosphopeptide enrichment protocols need to be
optimized/established beforehand to ensure good
reproducibility. Details of the titanium
dioxide (TiO2)-based phosphopeptide enrichment protocol that
we use have been described previously [12]. For analysis of phos-
phopeptides/phosphosites of in vitro–treated recombinant pro-
teins, we recommend analyzing digested peptides without further
phosphopeptide-enrichment. Parallel analysis of negative controls
in which the phosphorylated forms of the proteins of interest are
low-abundant or absent can help to define good targets for PRM.
3.2 DDA Measurement Using an Orbitrap Mass Spectrometer
Measure phosphopeptide-enriched samples in DDA mode (see
Notes 1 and 2).
3.3 DDA Data Processing with MaxQuant Software
After DDA measurement, analyze derived RAW data using
MaxQuant software.
1. Download, install, and open the MaxQuant software
(https://maxquant.org).
2. Go to the “Raw files” tab, and click “Load” to import Thermo
RAW files. If more than two files are going to be analyzed,
specify the experimental design in the same tab.
3. Go to the “Group-specific parameters” tab, and choose the
“Modifications” option. Select phosphorylation of serine, thre-
onine, and tyrosine as variable modifications besides the default
settings of alkylation (e.g., carbamidomethylation) of cysteine
residues as fixed and oxidation of methionine residues and
protein N-terminal acetylation as variable modifications.
4. Choose the “Digestion” option and select an enzyme. We
usually keep the default setting for “Max. missed cleavages,”
which is “2.”
5. (Optional for quantification). Choose the “Label-free quantifi-
cation” option and select the “LFQ.” Set the “LFQ min. ratio
count” parameter to “1,” and deselect the “Fast LFQ.” Quan-
tification is optional and not needed for constructing a PRM
method. We often select this option to evaluate the reproduc-
ibility of the replicates.
6. Go to the “Global parameters” tab, and choose the
“Sequences” option. Click “Add file” to import a FASTA-
formatted protein database that is suitable for the analyzed
samples.
7. (Optional for quantification). Choose the “Adv. identification”
option, and enable the “match between runs.”
8. (Optional for quantification). Choose the “Label-free quantifi-
cation” option, and enable the “iBAQ.”
9. Click “Start” to run the analysis.
3.4 Target List Construction with Skyline Software Using the MaxQuant Output File
1. Download, install, and open the Skyline software
(https://skyline.ms).
2. Open a “Blank Document.”
3. If you have used Skyline before, go to the “Settings” menu, and
click “Default” to restore the default settings. Skyline
automatically restores settings that were used on the previous
occasion, and, therefore, it is necessary/desirable to clear
previous settings.
4. Go to the “Settings” menu, and open the “Peptide
Settings” form.
5. Go to the “Digestion” tab, and select an enzyme. For “Max
missed cleavages,” set the value that was defined for the
MaxQuant search.
6. Go to the “Filter” tab, and set the “Min length” option to “7”
and the “Max length” option to “25,” which are the default
settings for the MaxQuant search. If you have changed these
parameters for the MaxQuant search, match the parameters
accordingly. Set the “Exclude N-terminal AAs” option to “0.”
7. Go to the “Settings” menu, and open the “Transition
settings” form.
8. Go to the “Filter” tab, and set the “Precursor charges” option
to “2, 3, 4”, the “Ion charges” option to “1,” and the “Ion
types” option to “p”(precursor). Set the “To: (of the “Product
ion selection”)” option to the “last ion.”
9. Go to the “Instrument” tab, and set the “Min m/z” and “Max
m/z” options to the values that were used for the DDA
measurement.
10. Go to the “Full-Scan” tab, and set parameters for the “MS1
filtering.” Set the “Isotope peaks included” option to “Count”
and the “Precursor mass analyzer” option to “Orbitrap.” Set
the “Resolving power: At:” option to the values that were used
for the DDA measurement. Then, close the “Transition
settings” form.
11. Save the document with the adjusted settings.
12. Import the MaxQuant output file. Go to the “File” menu, and
click “Peptide Search” under the “Import” option to open the
“Import Peptide Search” form.
13. Select “DDA with MS1 filtering” for the “Workflow” option.
Click “Add Files” to select the MaxQuant output “msms.txt”
file, then click “Next.” The “mqpar” file for the MaxQuant
search is also needed for building a spectral library and has to
be stored in the same location as the “msms.txt” file.
14. Skyline will build a spectral library from the “msms.txt” file and
automatically search for the Thermo RAW files that were used
for the MaxQuant search. If the Thermo RAW files are stored
in a different location, manually select the files, then click
“Next.”
15. Select the option of adding all modifications and click “Next”
(see Note 3).
16. Click “Browse” to import the FASTA-formatted protein data-
base that was used for the MaxQuant search. If you intend to
make a target list for proteins of interest, import a database
with the selected proteins only. Alternatively, the FASTA-
formatted database can be copy-pasted into the box from a
text document. Click “Finish” to start importing the data.
17. If you want to extract peptides that are unique to the proteins
of interest, select the “Remove duplicate peptides” and click
“OK.” The uniqueness will only be assessed within the
imported database, and, therefore, extra evaluation is needed
to ensure uniqueness within the samples you plan to measure.
18. Arrange the Skyline window for selecting targets. We usually
arrange the window with the following panels, “Peak Areas—
Replicate Comparison”, “Retention Times—Replicate Com-
parison”, and “Retention Times—Scheduling”, as shown in
Fig. 1. These panels can be opened from the “View” menu.
19. Inspect the data for precursors of interest. Good results can be
achieved with sharp, defined peaks without coelution. An
“idotp” value of >0.95 is recommended. High reproducibility
is also a good indicator. In some cases, the software does not
pick the correct peak for all replicates. If necessary, reintegrate
peaks manually. You can add notes to a target upon
right-clicking and choosing the “Edit note” option.
Fig. 1 Example of precursor selection from MS1 scanning in Skyline
Fig. 2 Setup of a PRM method on a Q-Exactive Plus system (Thermo)
20. Keep precursors that you would like to target by PRM, and
delete all others from the “Targets.”
21. Export an isolation list. Go to the “File” menu, and click
“Isolation List” under the “Export” option to open the
“Export Isolation List” form.
22. Select “Thermo Q Exactive” for “Instrument type.” See
Note 4 for additional recommendations. Click “OK” to create
an isolation list and store it.
23. Save the document to be used as a template for the subsequent
PRM data analysis. See Note 5 for an alternative option.
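The selection logic in steps 19 and 20 can be mimicked outside Skyline to sanity-check a candidate list before export. The sketch below is only an illustration: the candidate records, the 2.5 min half-width of the retention time window, and the function name are hypothetical, while the idotp cutoff of 0.95 is the value recommended in step 19. It filters precursors by idotp and then reports how many retention time windows overlap at most, which is what the “Retention Times—Scheduling” panel helps to judge.

```python
def max_concurrent(windows):
    """Largest number of retention-time windows open at the same time."""
    events = []
    for start, end in windows:
        events.append((start, 1))   # window opens
        events.append((end, -1))    # window closes
    # At identical times, closes sort before opens, so windows that
    # merely touch are not counted as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))
    open_now = peak = 0
    for _, delta in events:
        open_now += delta
        peak = max(peak, open_now)
    return peak


# Hypothetical candidates with isotope dot product and retention time.
candidates = [
    {"peptide": "PEPTIDE_A", "idotp": 0.98, "rt_min": 30.0},
    {"peptide": "PEPTIDE_B", "idotp": 0.91, "rt_min": 31.0},
    {"peptide": "PEPTIDE_C", "idotp": 0.99, "rt_min": 32.0},
    {"peptide": "PEPTIDE_D", "idotp": 0.97, "rt_min": 60.0},
]

selected = [c for c in candidates if c["idotp"] > 0.95]
half_width = 2.5  # assumed half-width of each scheduling window, in min
windows = [(c["rt_min"] - half_width, c["rt_min"] + half_width)
           for c in selected]
print(len(selected), "targets kept;", max_concurrent(windows), "concurrent at most")
```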
3.5 PRM Data Acquisition
1. Create a PRM method using the Thermo Xcalibur software. In
our case, with a Thermo Q-Exactive Plus, a method combining
one full scan followed by n PRM (n = number of targeted
precursors) events was used to acquire PRM data (see Note 6).
Figure 2 shows the setup of the instrument method on a
Q-Exactive Plus.
2. Measure samples with the PRM method.
3.6 PRM Data Analysis
1. Process acquired RAW data using the MaxQuant software, as
stated above (A). Alternatively, import the Thermo RAW file
into the Skyline software for analysis (B). See Note 7 for
limitations of option B. Data inspection with the Skyline
software is basically the same for both approaches (C).
A1. MaxQuant analysis: Analyze the PRM data using the Max-
Quant software, applying the same settings as stated under
Subheading 3.3.
A2. Skyline analysis: Prepare a document for data analysis.
Open the document that was generated as stated in Sub-
heading 3.4, step 23.
A3. Adjust the “Transition settings.” Go to the “Filter” tab,
and set the “Ion charges” option to “1, 2” and the “Ion
types” option to “p, b, y”. Set the “From: (of the “Product
ion selection”)” option to “ion 1” and the “To: (of the
“Product ion selection”)” option to “last ion.”
A4. Set parameters for the “MS/MS filtering” in the “Full-
Scan” tab. Set the “Acquisition method” option to
“Targeted” and the “Product mass analyzer” option to
“Orbitrap.” Set the “Resolving power: At:” option
to values that were used for the PRM measurement.
A5. Save the document with the adjusted settings.
A6. Import the MaxQuant output file. Go to the “File” menu,
and click “Peptide Search” under the “Import” option to
open the “Import Peptide Search” form.
A7. Select the “Filter for document peptides” option, and
select “PRM” for the “Workflow” option. Click “Add
Files” to select the MaxQuant output “msms.txt” file.
The “mqpar” file for the MaxQuant search is also needed
for building a spectral library, and has to be stored in the
same location as the “msms.txt” file. Click “Next” to
proceed.
A8. Select the Thermo RAW files to be analyzed, and click
“Next.”
A9. Select the option of adding all modifications, and click
“Next.”
A10. Define the number of product ions to be quantified using
the “Pick” option, if needed. We usually keep “5”, which is
the default setting. Click “Next” to proceed (see Note 8).
A11. Click “Finish” to start importing the data.
B1. Skyline analysis: Prepare the document for data analysis.
Follow the procedures A2–A5.
B2. Import the Thermo RAW files. Go to the “File” menu, and
click “Results” under the “Import” option to open the
“Import Results” form.
B3. Select “Collision Energy” from the “Optimizing” options.
B4. Click “OK” and choose the Thermo RAW files to be
imported.
C1. Analyze the results in Skyline. Arrange the Skyline window
for data analysis. To simultaneously display MS1 and MS2
data, go to the “View” menu, and select “Split Graph”
under the “Transitions” option. Alternatively, you can
right-click a panel, and select the “Split Graph” under the
“Transitions” option as shown in Fig. 3.
Fig. 3 Example of a split-graph analysis of PRM data in Skyline
C2. The results of the Skyline analysis can be exported using
the “Report” option under the “Export” option in the
“File” menu. The different report options can be further
customized to user requirements.
Method B: Construction of the PRM-based targeted method based
on in silico digest.
For possibilities and limitations of this approach refer to Note 9.
3.7 Target List Construction with the Skyline Software by In Silico Digest
1. Open a “Blank Document,” and save the document.
2. Adjust the “Peptide settings.” Go to the “Digestion” tab, and
set appropriate “Enzyme” and “Max missed cleavages.”
3. Go to the “Filter” tab, and set the “Min length” option to “7”
and the “Max length” option to “25,” which are the default
settings for the MaxQuant search. You can also attempt to
target shorter or longer peptides, and later adjust the
parameters for the MaxQuant search. If you want to target the
site in the N-terminal region, set the “Exclude N-terminal
AAs” option to “0.”
4. Go to the “Modifications” tab, and click “Edit list” for the
“Structural modifications.” Add “Phospho (ST)” and “Phos-
pho (Y)” as variable modifications. Other possible modifica-
tions, namely alkylation (e.g., carbamidomethylation) of
cysteine residues as fixed, oxidation of methionine residues
and protein N-terminal acetylation as variable modifications,
also need to be added.
Fig. 4 Transition list created by in silico digest from a FASTA file containing a
protein of interest
5. Adjust the “Transition settings.” Go to the “Filter” tab, and set
the “Precursor charges” option to “2, 3, 4”, the “Ion charges”
option to “1,” and the “Ion types” option to “p.” Under
“Product ion selection,” set the “From:” option to
“m/z > precursor” and the “To:” option to “last ion.”
6. Import a FASTA file containing the proteins of interest. Go to
the “File” menu, and click “FASTA” under the “Import”
option to select the FASTA file. This will create a list of all
(theoretically) possible precursors with the given parameters
(Fig. 4).
7. Choose precursors of interest and delete all other transitions
from the “Targets” list (see Note 10).
8. Export an isolation list. Go to the “File” menu, and click the
“Isolation List” under the “Export” option to open the
“Export Isolation List” form.
9. Select the “Thermo Q Exactive” for the “Instrument type.”
See Note 4 for additional recommendations. Then click “OK.”
10. Save the document to be used as a template for the PRM data
analysis.
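The in silico digest that Skyline performs in steps 2, 3, and 6 can be approximated outside the software to preview which precursors to expect. The following Python sketch assumes simple trypsin rules (cleavage after K or R, but not before P) and the 7–25 residue length window from step 3. The function name `tryptic_digest` and the example sequence are hypothetical, and Skyline's own implementation may differ in details such as the handling of N-terminal residues.

```python
import re


def tryptic_digest(sequence, max_missed=2, min_len=7, max_len=25):
    """Return tryptic peptides with up to `max_missed` missed cleavages.

    Assumed rule: cleave C-terminal to K or R unless the next residue is P.
    """
    # Zero-width split after K/R (not before P); drop the empty trailing piece.
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]
    peptides = []
    for i in range(len(pieces)):
        # Join the piece at i with up to max_missed following pieces.
        last = min(i + max_missed, len(pieces) - 1)
        for j in range(i, last + 1):
            peptide = "".join(pieces[i:j + 1])
            if min_len <= len(peptide) <= max_len:
                peptides.append(peptide)
    return peptides


# Hypothetical example sequence (not a real protein).
example = "MASSTPEKGGGGSLLRPAAAK"
fully_cleaved = tryptic_digest(example, max_missed=0)
with_one_missed = tryptic_digest(example, max_missed=1)
```

In this example the R followed by P is not cleaved, so the fully cleaved digest contains two peptides, and allowing one missed cleavage adds the peptide that spans both.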
4 Notes
1. We will refer to phosphopeptide-enriched samples throughout
this method, but the method is limited neither to the analysis of
phosphopeptides nor to enriched samples. With appropriate
adaptations, the protocol may also be used to target peptides with
other modifications or without any modifications in different
types of samples.
2. In addition to phosphopeptide enrichment protocol optimiza-
tion, we also recommend optimizing the data acquisition
method on the mass spectrometer for phosphopeptide identi-
fication. An increased MS/MS injection time often provides
better sensitivity and identification numbers. We recommend
measurement of at least three replicates for each condition to
account for the technical variation that can be introduced
during sample preparation and measurement.
3. The modifications box will contain all modifications that were
detected in the MaxQuant search. Alternatively, desired mod-
ifications can be selected during the adjustment of the docu-
ment before import under the “Peptide Settings” form in the
“Modifications” tab. If this option was used, but all modifica-
tions found in the MaxQuant search were not added at that
point, the modifications box during the import will notify the
user about additional modifications that can be added by
selection.
4. The number of precursors to be targeted in a PRM experiment
depends on the mass spectrometer’s specifications as well as on
the chromatographic width of the peak. The larger the number
of precursors that are targeted, the longer the cycle time that is
needed for one full scan + MS/MS. Prolonged cycle times
result in inferior coverage of chromatographic peaks; 10 scans
across a peak should serve as an orientation mark. The number
of precursors for targeting can be increased by using a sched-
uled method; this, however, requires very stable and reproduc-
ible chromatography. A scheduled PRM isolation list can only
be created from a document containing spectral information
and not from a document created by importing a FASTA file as
stated below in Note 5.
5. Alternatively, a template for analysis can be created by
importing a FASTA file containing the proteins of interest into a blank
Skyline document applying the settings specified in Subheading
3.4, steps 4–10. From this list, only the precursors that are to
be targeted are retained and all others are deleted. Save the
document to be used as an analysis template and/or to create
an isolation list.
6. Using the combination of one full scan and PRM will enable
quantification on both MS1 and MS2 levels. For a Q-Exactive
system, make sure to adjust the “loop count” to the number of
targeted precursors when setting up the method.
7. (a) This option, which uses the analysis template created by
uploading a FASTA file as described in Note 5, will not gener-
ate a spectral library. Therefore, the MS2 peak area will not
display a dot product (dotp) score. This option will also not
provide information about the numbers of scans obtained
per peak.
(b) This option will generate a list of all detected MS2 ions that
were derived from a precursor. It is, however, recommended to
keep only ions that were robustly detected (indicated by a green
traffic light symbol). Undesired transitions can be removed by
opening the transition list of a precursor and unchecking the
respective ions. The transition list can be opened by hovering
over the m/z value for the precursor in the target list and
clicking the downward arrow that appears.
8. You can subsequently change the setting under the “Transition
Settings” form in the “Library” tab. However, you can only
reduce the number for filtering. It is not possible to increase
the number to have data for additional product ions because
the data for the additional ions will not be imported for
analysis.
9. If difficulties are encountered in obtaining spectra for phos-
phopeptides of interest, a target list can also be generated
without the DDA data. In contrast to the abovementioned
method that can be regarded as an ab initio approach, for
optimal results, the alternative method should be based on a
priori data: an example of a priori information would be a
phosphorylation substrate in which one or multiple sites of
phosphorylation are known but for which no MS data are
available. In this case, an in silico digest of the target protein
can be used to generate a list of theoretical precursors that can
be targeted in a PRM method without the need for prior
acquisition of DDA spectra. For best results, in the first
instance, all theoretical precursors generated in the in silico
digest should be targeted (see Note 10). If several rounds of
analysis are planned, the target list can be adjusted to the
precursors with good identifications in the previous runs.
10. Miscleavages should also be considered when choosing a precursor.
Any information from prior MS experiments with the
proteins of interest may facilitate identification of good targets.
If a precursor contains multiple phosphorylatable sites, for
example, a serine and a threonine, it is sufficient to target
only one precursor to cover all (putative) phosphorylation
site isomers because they all have the same m/z value. Analysis
of MS/MS fragmentation data created in the PRM measure-
ment can enable a precise localization of the phospho group if
fragments containing the phosphorylation have been observed.
However, to enable this distinction between possible isomers,
it is necessary to include all of them in the template for the
PRM analysis.
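The statement in Note 10 that all phosphorylation site isomers of a precursor share one m/z value can be checked with a short calculation. The Python sketch below uses standard monoisotopic residue masses (not taken from this chapter). The peptide `ASLTPEYK`, the function names, and the convention of marking a phosphorylated residue with a lower-case letter are hypothetical choices made for illustration.

```python
# Standard monoisotopic residue masses (Da); values are not from this chapter.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565    # H2O added for the peptide termini
PROTON = 1.007276    # mass of a proton
PHOSPHO = 79.966331  # HPO3 added per phosphorylated residue


def precursor_mz(mod_seq, charge):
    """m/z of a peptide; a lower-case s, t, or y marks a phosphosite."""
    mass = WATER
    for aa in mod_seq:
        mass += MONO[aa.upper()]
        if aa.islower():
            mass += PHOSPHO
    return (mass + charge * PROTON) / charge


def single_phospho_isomers(peptide):
    """All singly phosphorylated site isomers of an unmodified peptide."""
    return [peptide[:i] + aa.lower() + peptide[i + 1:]
            for i, aa in enumerate(peptide) if aa in "STY"]


# Hypothetical peptide with three phosphorylatable residues (S, T, Y).
isomers = single_phospho_isomers("ASLTPEYK")
isomer_mzs = [precursor_mz(p, 2) for p in isomers]
```

All entries of `isomer_mzs` are identical within floating-point precision, so a single isolation list entry covers every isomer, whereas localizing the site still depends on site-determining fragment ions.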
Acknowledgments
This work was supported by the Max-Planck-Gesellschaft. We
thank Neysan Donnelly for editing the manuscript.
References
1. Walley JW, Sartor RC, Shen Z et al (2016) Integration of omic networks in a developmental atlas of maize. Science 353:814–818
2. Marx H, Minogue CE, Jayaraman D et al (2016) A proteomic atlas of the legume Medicago truncatula and its nitrogen-fixing endosymbiont Sinorhizobium meliloti. Nat Biotechnol 34:1198–1205
3. Seaton DD, Graf A, Baerenfaller K et al (2018) Photoperiodic control of the Arabidopsis proteome reveals a translational coincidence mechanism. Mol Syst Biol 14:e7962
4. Vidova V, Spacil Z (2017) A review on mass spectrometry-based quantitative proteomics: targeted and data independent acquisition. Anal Chim Acta 964:7–23
5. Arsova B, Watt M, Usadel B (2018) Monitoring of plant protein post-translational modifications using targeted proteomics. Front Plant Sci 9:1168
6. Bourmaud A, Gallien S, Domon B (2016) Parallel reaction monitoring using quadrupole-Orbitrap mass spectrometer: principle and applications. Proteomics 16:2146–2159
7. Lehmann U, Wienkoop S, Tschoep H et al (2008) If the antibody fails--a mass western approach. Plant J 55:1039–1046
8. Peterson AC, Russell JD, Bailey DJ et al (2012) Parallel reaction monitoring for high resolution and high mass accuracy quantitative, targeted proteomics. Mol Cell Proteomics 11:1475–1488
9. Cox J, Mann M (2008) MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat Biotechnol 26:1367–1372
10. Tyanova S, Temu T, Cox J (2016) The MaxQuant computational platform for mass spectrometry-based shotgun proteomics. Nat Protoc 11:2301–2319
11. MacLean B, Tomazela DM, Shulman N et al (2010) Skyline: an open source document editor for creating and analyzing targeted proteomics experiments. Bioinformatics 26:966–968
12. Nakagami H (2014) StageTip-based HAMMOC, an efficient and inexpensive phosphopeptide enrichment method for plant shotgun phosphoproteomics. Methods Mol Biol 1072:595–607
Chapter 17
Enrichment of N-Linked Glycopeptides and Their Identification by Complementary Fragmentation Techniques
Eduardo Antonio Ramirez-Rodriguez and Joshua L. Heazlewood
Abstract
N-linked glycans are a ubiquitous posttranslational modification and are essential for correct protein folding
in the endoplasmic reticulum of plants. However, this likely represents a narrow functional role for the
diverse array of glycan structures currently associated with N-glycoproteins in plants. The identification of
N-linked glycosylation sites and their structural characterization by mass spectrometry remains challenging
due to their size, relative abundance, structural heterogeneity, and polarity. Current proteomic workflows
are not optimized for the enrichment, identification and characterization of N-glycopeptides. Here we
describe a detailed analytical procedure employing hydrophilic interaction chromatography enrichment,
high-resolution tandem mass spectrometry employing complementary fragmentation techniques (higher-
energy collisional dissociation and electron-transfer dissociation) and a data analytics workflow to produce
an unbiased high confidence N-glycopeptide profile from plant samples.
Key words N-linked glycans, Glycoproteomics, HILIC, Higher-energy collisional dissociation, Elec-
tron-transfer dissociation
1 Introduction
Asparagine (N)-linked glycosylation is a covalent posttranslational
modification that is found across all eukaryotes. The modification
has been linked to numerous important functions such as enzyme
activity, protein–protein interactions, protein folding, and sorting
[1]. In mammals, N-linked glycans have been connected with a
variety of cellular functions and are implicated in various diseases
[2]. In plants, N-linked glycans are also essential for protein folding
as part of the endoplasmic reticulum (ER) quality control (ERQC)
system [3]. However, roles for N-glycans in processes such as
influencing enzyme activity in plants are less clear [4]. Until
recently it was unclear whether N-linked glycans formed in the
Golgi apparatus of plants were associated with any obvious function
[5]. However, most studies were performed in Arabidopsis thaliana,
while recent efforts in Oryza sativa (rice) have shown that
N-linked glycan biosynthetic mutants are severely affected in
growth, are sterile, and are impaired in their adaptation to
low-temperature environments [6].
N-linked glycans comprise a highly conserved core of
N-acetylglucosamine (GlcNAc) and mannose (Man) that is shared
among eukaryotes. Differences in glycan extensions, decorations
and processing of this core define the N-glycan diversity observed
across Eukaryota. The structure of plant N-linked glycans ranges
from high-mannose structures (HexNAc2Hex9) to complex
biantennary oligosaccharides comprising a variety of glycans
(HexNAc4Hex5Fuc3Pent1). The sequential N-glycan biosynthetic process
results in a diversity of structures at a given N-glycan site. The
resultant microheterogeneity complicates data analysis and protein
quantitation, and the macroheterogeneity at the protein level makes
it challenging to identify and directly attribute function
[7]. As a result, few studies have examined the function of specific
N-glycoproteoforms. Thus, the characterization of N-linked
glycoproteins in plants has generally occurred through the application
of proteomic methods, which have been challenging due to the size
and physicochemical properties of N-glycopeptides, poor fragmen-
tation during collision-induced dissociation and the variations due
to microheterogeneity of structures found at a given site. N-glycan
profiles from plant proteins were initially obtained through the
enzymatic removal of structures and profiling by mass spectrome-
try. Initial efforts to identify N-glycoproteins employed affinity
methods and profiling by mass spectrometry with N-glycan sites
inferred by informatic techniques [8–11], but findings were not
associated with a polypeptide sequence. To reduce identification
issues caused by the presence of the glycan, studies employed
endoglycosidases (PNGase A/F) to remove N-glycans from
enriched preparations prior to identification by tandem mass spec-
trometry. Such approaches have identified nearly 3000 sites from
over 1600 proteins from the reference plant Arabidopsis
[12, 13]. However, the use of N-glycosidases to improve
identification by mass spectrometry results in data that lack any N-glycan
structural information. A handful of studies in plants have now
applied high-resolution tandem mass spectrometry on enriched
N-glycopeptide fractions revealing the extent of microheterogene-
ity in plant N-glycoproteins [7, 14, 15]. These studies have
provided high-resolution information on the variation of N-glycan
structures at a given site from around 500 N-glycoproteins from
Arabidopsis.
As with most posttranslational modifications, the identification
of N-linked glycopeptides from a complex protein lysate by tandem
mass spectrometry is infrequent. Consequently, some enrichment
strategy is necessary prior to mass spectrometry. Initial enrichment
methods in plants employed complex enrichment and fractionation
techniques utilizing cation exchange chromatography and gel
filtration [16]. However, these methods have generally been

applied for profiling N-glycan structures by mass spectrometry
[17]. The use of lectin affinity for the enrichment of N-glycopep-
tides has been commonly applied in plant samples [9]. Indeed, the
first plant glycoproteomic study employed a lectin mixture com-
prising concanavalin A (ConA), wheat germ agglutinin (WGA),
and Ricinus communis (castor bean) agglutinin (RCA120) to
enrich N-glycopeptides [13]. A solid phase enrichment approach
using hydrazide beads has also been employed to capture N-glyco-
peptides from plant samples [12]. Recently the WGA enrichment
method was applied to Arabidopsis samples and intact N-glycopep-
tides analyzed by high resolution tandem mass spectrometry
[15]. However, the authors reported an abundance of N-glycopep-
tides harboring single N-linked GlcNAc residues, an N-glycan not
commonly reported in Arabidopsis samples [10, 17]. These find-
ings indicate a potential structural bias when enriching with
WGA [7].
An alternative enrichment strategy employs hydrophilic inter-
action liquid chromatography (HILIC) for the enrichment of
N-glycopeptides and was first applied to human plasma
[18]. Although N-glycans were also removed with PNGase A
prior to mass spectrometry, the enrichment method appeared unbi-
ased, as it was not selecting a glycan type. Recently, the enrichment
of N-glycopeptides from plant samples using HILIC has been
conducted and intact N-glycopeptides characterized by tandem
mass spectrometry [7, 14]. The proportions of N-glycans identified
in these studies reflected those previously found in a variety of plant
species, including tobacco, Arabidopsis, and Lotus japonicus [7],
reflecting the unbiased nature of the HILIC enrichment method.
Here, we outline a strategy to analyze N-glycopeptides from plant
microsomal preparations using HILIC enrichment and
high-resolution tandem mass spectrometry employing
complementary fragmentation techniques (HCD and ETD) to produce
an unbiased N-glycopeptide profile. A recent study utilizing this
approach in Arabidopsis identified over 1000 distinct
N-glycopeptides from over 300 glycoproteins with an FDR <1% [7].
2 Materials
Prepare all solutions using ultrapure water (18 MΩ-cm at 25 °C)
and analytical grade reagents. Prepare and store all reagents at room
temperature (unless indicated otherwise). Follow institutional
regulations when disposing of waste materials.
2.1 Microsomal Preparation
1. Approximately 1 g plant tissue (fresh weight) (see Note 1).
2. Ceramic mortar and pestle (medium size).
3. Microsome Extraction Buffer: 50 mM HEPES-KOH
(pH 6.8), 0.4 M sucrose, 1 mM dithiothreitol (DTT), 5 mM
MnCl2 and 5 mM MgCl2.
4. Proteinase inhibitor cocktail, such as cOmplete EDTA-free
proteinase inhibitor cocktail tablet (Roche).
5. Miracloth (EMD Millipore).
6. Funnel (glass), 80 mm.
7. Preparative centrifuge tubes, 10 mL.
8. Preparative centrifuge, with fixed angle rotor for 10 mL tubes
and capable of 10,000 × g.
9. Ultracentrifuge tubes, 12 mL.
10. Ultracentrifuge with fixed angle rotor for 12 mL tubes capable
of 100,000 × g for pelleting microsomes.
11. Protein Quantification Assay, such as Pierce™ BCA Protein
Assay Kit.
2.2 Digestion and Hydrophilic Interaction Liquid Chromatography (HILIC) Enrichment of N-Glycopeptides
1. Denaturing Buffer: 7 M urea in 100 mM ammonium bicarbonate
(see Note 2).
2. 100 mM ammonium bicarbonate.
3. 1 M dithiothreitol (DTT) (see Note 3).
4. 37 °C incubator.
5. 1 M iodoacetamide (IAA) (see Note 4).
6. Trypsin, sequencing grade (see Note 5).
7. Acetic acid.
8. C18 Solid phase extraction (C18 SPE), such as Sep-Pak plus
C18 cartridges (Waters Corporation).
9. 1 mL syringe.
10. 10 mL syringe.
11. Centrifuge with rotor to handle 2 mL microfuge tubes capable
of 13,000 × g.
12. C18 SPE Buffer 1: 0.1% formic acid.
13. C18 SPE Buffer 2: 80% acetonitrile and 0.1% formic acid.
14. SpeedVac concentrator.
15. Hydrophilic interaction liquid chromatography SPE (HILIC
SPE) spin columns, such as MacroSpin Columns HILIC
(#SMM HIL, The Nest Group) (see Note 6).
16. HILIC Loading Buffer: 80% acetonitrile, 1% trifluoroacetic acid.
17. HILIC Elution Buffer 1: 70% acetonitrile, 1% trifluoroacetic acid.
18. HILIC Elution Buffer 2: 60% acetonitrile, 1% trifluoroacetic acid.
19. HILIC Elution Buffer 3: 50% acetonitrile, 1% trifluoroacetic acid.
2.3 Identification of N-Glycopeptides by Tandem Mass Spectrometry
1. ZipTipC18 Pipette Tips (Millipore).
2. ZipTip Buffer 1: 80% acetonitrile, 0.1% formic acid.
3. ZipTip Buffer 2: 0.1% formic acid.
4. SpeedVac concentrator.
5. Nano-flow liquid chromatograph with tandem mass spectrometer
capable of triggered electron-transfer dissociation (ETD),
such as an Orbitrap Fusion™ Lumos™ Tribrid™ Mass
Spectrometer (Thermo Fisher Scientific) with an Ultimate 3000
RSLC nano-flow HPLC (Thermo Fisher Scientific) (see Note
7).
6. C18 nano-trap column (100 Å, 75 μm × 2 cm).
7. C18 analytical column (100 Å, 75 μm × 50 cm).
8. MS Loading Buffer: 3% (v/v) acetonitrile and 0.1% (v/v)
formic acid.
9. MS Buffer B: 100% acetonitrile and 1% formic acid (v/v).
2.4 Spectral Data Interrogation and Matching
1. Byonic™ (Protein Metrics) (see Note 8).
2. Plant species specific database in FASTA format.
3. Microsoft Excel.
3 Methods
Carry out all procedures at room temperature, unless otherwise
indicated. A workflow of the methods is outlined in Fig. 1.
3.1 Preparation of Microsomal Fraction and Peptide Digestion
1. Harvest 1 g fresh weight of plant material (see Note 1).
2. Place the material in 8 mL of Microsome Extraction Buffer in a
prechilled mortar and grind with a pestle on ice until the tissue is
homogenized.
3. Place two layers of Miracloth into a funnel and filter the
homogenate into a 10 mL preparative centrifuge tube on ice. Gently
squeeze the Miracloth to extract as much homogenate as
possible (see Note 9).
4. Centrifuge the homogenate at 3000 × g for 10 min at 4 °C.
5. Carefully transfer the supernatant into prechilled 12 mL
ultracentrifuge tubes.
6. Centrifuge the supernatant at 100,000 × g for 30 min.
7. Discard supernatant being careful not to disturb the pellet.
Fig. 1 Graphical summary of the HILIC-based N-glycan peptide analysis workflow from plant material
8. Resuspend the pellet (microsomal fraction) in residual
Extraction Buffer.
9. Quantify the amount of protein using a protein quantification
assay (see Note 10).
3.2 Digestion and N-Glycopeptide Enrichment
1. Take around 500 μg of microsomal protein and make up to
100 μL in Denaturing Buffer.
2. Add DTT to a final concentration of 10 mM and incubate the
samples at 60 °C for 60 min (see Note 3).
3. Allow the sample to cool to room temperature, add IAA to a
final concentration of 100 mM and incubate at room temperature
for 45 min (see Note 4).
4. Dilute the sample to 1 M urea with 100 mM ammonium
bicarbonate (see Note 11).
5. Add trypsin at a 1:25 trypsin–protein ratio (20 μg trypsin) and
incubate overnight at 37 °C (see Note 12).
6. Add acetic acid to a concentration of 1% (v/v).
7. Desalt and purify tryptic peptides with a C18 SPE cartridge.
8. Wash the C18 SPE cartridge with 10 mL of C18 SPE Buffer
2 using a 10 mL syringe.
9. Precondition the C18 SPE cartridge with 10 mL of C18 SPE
Buffer 1 using a 10 mL syringe; repeat this step.
10. Load the peptide sample (approximately 1 mL) onto the
preconditioned C18 SPE cartridge using a 1 mL syringe.
11. Wash the peptides with 10 mL of C18 SPE Buffer 1 using a
10 mL syringe; repeat this wash step.
12. Elute the peptides with 2 mL of C18 SPE Buffer 2.
13. Concentrate peptides using a SpeedVac concentrator (see Note
13).
14. Resuspend the peptides with 500 μL HILIC Loading Buffer.
15. For HILIC enrichment of N-glycopeptides, first condition the
HILIC SPE column (placed in a 2 mL microfuge tube) using
500 μL of ultrapure water by centrifugation at 200 × g for
3 min.
16. Precondition the HILIC SPE column with 500 μL HILIC
Loading Buffer by centrifugation at 200 × g for 3 min; repeat
this step.
17. Load the peptides (500 μL) onto the HILIC SPE column and
centrifuge at 200 × g for 3 min.
18. Wash the sample by adding 500 μL HILIC Loading Buffer and
centrifuge at 200 × g for 3 min; repeat this step.
19. Elute N-glycopeptides with 200 μL of HILIC Elution Buffer
1 (centrifuge at 200 × g for 3 min); save the eluate.
20. Elute N-glycopeptides with 200 μL of HILIC Elution Buffer
2 (centrifuge at 200 × g for 3 min); save the eluate.
21. Elute N-glycopeptides with 200 μL of HILIC Elution Buffer
3 (centrifuge at 200 × g for 3 min); save the eluate and combine
the three eluates (approximately 600 μL) (see Note 14).
22. Remove acetonitrile and concentrate peptides from the com-
bined eluate using a SpeedVac concentrator (see Note 13).
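The volumes and amounts needed for steps 2–5 follow from simple dilution arithmetic. The Python sketch below works through them for the starting conditions given above (500 μg protein in 100 μL of 7 M urea, 1 M DTT and IAA stocks). The function names are hypothetical, and the calculation assumes ideal mixing with additive volumes.

```python
def stock_volume(final_conc, stock_conc, start_volume):
    """Volume of stock to add so the mixture reaches `final_conc`.

    Solves stock_conc * v = final_conc * (start_volume + v) for v;
    both concentrations must use the same unit.
    """
    return final_conc * start_volume / (stock_conc - final_conc)


def total_volume_after_dilution(start_conc, target_conc, start_volume):
    """Total volume at which a solute initially at `start_conc` reaches `target_conc`."""
    return start_conc * start_volume / target_conc


def trypsin_amount(protein_ug, ratio=25):
    """Trypsin mass (ug) for a 1:`ratio` trypsin-to-protein ratio."""
    return protein_ug / ratio


sample_ul = 100.0                                     # step 1: sample in 7 M urea
dtt_ul = stock_volume(10, 1000, sample_ul)            # step 2: 10 mM from 1 M stock
iaa_ul = stock_volume(100, 1000, sample_ul + dtt_ul)  # step 3: 100 mM from 1 M stock
final_ul = total_volume_after_dilution(7, 1, sample_ul)   # step 4: 1 M urea
bicarbonate_ul = final_ul - (sample_ul + dtt_ul + iaa_ul)
trypsin_ug = trypsin_amount(500)                      # step 5: 1:25 ratio
```

Under these assumptions roughly 1 μL of DTT stock and 11 μL of IAA stock are added, the sample is brought to 700 μL in total with 100 mM ammonium bicarbonate, and 20 μg of trypsin is required, which matches the amount stated in step 5.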
3.3 Identification of N-Glycopeptides by Tandem Mass Spectrometry
1. Prior to analysis by tandem mass spectrometry,
N-glycopeptides are harvested using ZipTipC18 Pipette Tips to ensure
consistent loading onto the C18 nano-trap column (see
Note 15).
2. Wash the ZipTipC18 Pipette Tip by aspirating 10 μL of ZipTip
Buffer 1 three times, then expel and discard liquid.
3. Condition the ZipTipC18 Pipette Tip by aspirating 10 μL of
ZipTip Buffer 2 three times, then expel and discard liquid.
4. Resuspend the concentrated N-glycopeptides in 10 μL of Zip-
Tip Buffer 2.
5. Load the resuspended N-glycopeptides onto the conditioned
ZipTipC18 by performing 10–15 aspiration–dispensing cycles.
6. Wash the ZipTipC18 by aspirating 10 μL of ZipTip Buffer
2, dispense to waste and repeat four more times.
7. Elute the N-glycopeptides using 5 μL of ZipTip Buffer 1:
aspirate a few times and dispense into a clean tube. Repeat for a
total of five elutions, resulting in a final volume of 25 μL.
8. Remove acetonitrile using a SpeedVac concentrator and
resuspend nearly dried peptides with 8–10 μL of MS Loading Buffer
(see Note 13).
9. Load about 6 μL of the purified glycopeptide mix onto the C18
nano-trap column of the nanoflow liquid chromatography tandem
mass spectrometry (LC-MS/MS) system, using the MS Loading
Buffer at an isocratic flow of 5 μL min−1.
10. Elute the N-glycopeptides into the tandem mass spectrometer
using a gradient of 3% MS Buffer B to 20% over 95 min,
followed by 20% MS Buffer B to 40% in 10 min, then 40%
MS Buffer B to 80% over 5 min. Maintain at 80% MS Buffer B
for 5 min before equilibration to 3% MS Buffer B over 10 min
(see Note 16).
11. Operate the MS in a positive ion mode, at a resolution of
120,000 in full scan mode using data-dependent acquisition
in HCD triggered ETD MS/MS analysis mode (see Note 17).
12. For HCD triggered ETD, the MS2 is operated in HCD mode
with a resolution of 30,000, AGC target of 50,000, Activation
Q of 0.25, EThcD (False) and Collision Energy of 30% for ions
above 50,000 with a charge state between 3 and 8 (see Note
18).
13. ETD was conducted at a resolution of 30,000 using charge
dependent reaction times of 11.59 ms (+6), 16.69 ms (+5),
26.08 ms (+4), and 46.37 ms (+3) (see Note 19).
14. ETD of the precursor ion (AGC target of 300,000) was triggered
when one of the following ions was detected in the top 20 ions
in the HCD fragment spectra: 138.0545 (GlcNAc, fragment
1), 163.06 (Hex), 186.076 (GlcNAc, fragment 2), 204.0867
(GlcNAc), or 366.1396 (ManGlcNAc) (see Note 17).
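The product-ion trigger in step 14 can be expressed as a simple check: take the 20 most intense HCD fragment ions and test whether any of them matches a diagnostic oxonium ion. The Python sketch below illustrates that logic only. The function name, the peak-list layout of `(m/z, intensity)` tuples, and the 20 ppm matching tolerance are assumptions made for illustration and do not describe the instrument's own implementation; the reference m/z values are theoretical oxonium ion masses.

```python
# Diagnostic oxonium ions (theoretical m/z) used as ETD triggers.
OXONIUM_IONS = {
    138.0545: "GlcNAc fragment 1",
    163.0600: "Hex",
    186.0760: "GlcNAc fragment 2",
    204.0867: "GlcNAc",       # HexNAc oxonium ion, theoretical [M+H]+
    366.1396: "ManGlcNAc",
}


def matched_oxonium_ions(peaks, top_n=20, ppm_tol=20.0):
    """Return names of oxonium ions found among the `top_n` most intense peaks.

    `peaks` is a list of (mz, intensity) tuples; `ppm_tol` is an assumed
    matching tolerance, not a value given in the protocol.
    """
    most_intense = sorted(peaks, key=lambda p: p[1], reverse=True)[:top_n]
    matched = set()
    for mz, _intensity in most_intense:
        for ref_mz, name in OXONIUM_IONS.items():
            if abs(mz - ref_mz) / ref_mz * 1e6 <= ppm_tol:
                matched.add(name)
    return matched


def triggers_etd(peaks, top_n=20, ppm_tol=20.0):
    """True if at least one diagnostic ion is among the most intense peaks."""
    return bool(matched_oxonium_ions(peaks, top_n, ppm_tol))


# Hypothetical HCD peak list from a glycopeptide precursor.
glyco_peaks = [(204.0869, 5e4), (366.1399, 2e4), (512.3000, 9e5)]
# Hypothetical spectrum in which the oxonium ion is too weak to rank in the top 20.
weak_peaks = [(400.0 + i, 1e6) for i in range(20)] + [(204.0867, 10.0)]
```

With these inputs `triggers_etd(glyco_peaks)` is true, whereas `triggers_etd(weak_peaks)` is false because the 204.0867 peak falls outside the 20 most intense ions.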
3.4 Spectral Data Interrogation
1. Spectral data were interrogated using Byonic™ (Protein
Metrics) against a plant specific protein database in FASTA
format (see Note 20).
2. Default search parameters were employed with the following
changes: Precursor mass tolerance: 5 ppm; Fragmentation
type: Both (HCD and ETD); Fragment mass tolerance
(HCD): 10 ppm; Fragment mass tolerance (ETD):
20 ppm; Fixed modifications: Carbamidomethyl (Cys);
Variable modification: Oxidation (M); Charge states applied
to unassigned spectra: 3, 4, 5; Precursor isotope off by x:
Too high (wide).
3. A collection of plant specific modifications is added to the
Glycan option, using the custom glycan text field and
employing the fine control format. These inputs are outlined in
Table 1 under the columns “N-Glycan composition” and “Byonic
Format” (see Note 21).
4. Use the following format for the list of N-glycan structures in
Table 1 to generate a custom plant glycan database in
Byonic™:
HexNAc(4)Hex(3)Fuc(1)Pent(1) @ NGlycan | common1
5. After spectral assignment and matching, high confidence
PSMs were obtained by filtering data in Microsoft Excel to
only include peptides with a glycan modification and a log
probability (|Log Prob|) of >4 for HCD spectra (p < 0.0001) or
>2 for ETD spectra (p < 0.01) (see Note 22).
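The filtering in step 5 is described for Microsoft Excel, but the same rule is easy to script. The Python sketch below applies the two thresholds to a list of PSM records. The field names `glycan`, `log_prob`, and `fragmentation` are hypothetical placeholders and would need to be mapped to the column headers of the actual Byonic export.

```python
# |Log Prob| thresholds from step 5: >4 for HCD (p < 0.0001), >2 for ETD (p < 0.01).
THRESHOLDS = {"HCD": 4.0, "ETD": 2.0}


def is_high_confidence(psm):
    """True for a glycan-modified PSM that passes its fragmentation-specific cutoff."""
    if not psm["glycan"].strip():
        return False  # unmodified peptides are excluded
    return abs(float(psm["log_prob"])) > THRESHOLDS[psm["fragmentation"]]


def filter_psms(psms):
    """Keep only high confidence N-glycopeptide PSMs."""
    return [psm for psm in psms if is_high_confidence(psm)]


# Hypothetical records illustrating each outcome.
example_psms = [
    {"peptide": "PEP1", "glycan": "HexNAc(2)Hex(9)", "log_prob": -6.2, "fragmentation": "HCD"},
    {"peptide": "PEP2", "glycan": "HexNAc(2)Hex(3)", "log_prob": -3.1, "fragmentation": "HCD"},
    {"peptide": "PEP3", "glycan": "HexNAc(2)Hex(3)", "log_prob": -3.1, "fragmentation": "ETD"},
    {"peptide": "PEP4", "glycan": "", "log_prob": -9.0, "fragmentation": "HCD"},
]
kept = filter_psms(example_psms)
```

Only the first and third records pass: the second fails the stricter HCD cutoff and the fourth carries no glycan. Taking the absolute value means the sign convention of the exported score does not matter.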
4 Notes
1. This method has been successfully used to enrich
N-glycopeptides from leaf, stem and floral material from Arabidopsis
thaliana. The approach should readily work on most plant species
assuming protein extraction can be undertaken on the tissue
being studied [19].
2. Urea can readily degrade to ammonium and cyanate in solution
and this decomposition will accelerate if the solution is heated
or old. The solution should be made fresh and maintained at
room temperature.
3. A stock solution of 1 M DTT can be stored in aliquots at
−20 °C.
4. A stock solution of 1 M IAA can be stored in aliquots at
−20 °C. IAA alkylates the thiol group on cysteine residues after
reduction with DTT. This step and the DTT step can be
omitted; however, it is virtually impossible to detect
cysteine-containing peptides unless controlled alkylation is undertaken.
IAA is light and heat sensitive and should be stored in the dark.
5. Most high-grade sources of trypsin can be employed; however,
some suppliers of trypsin for proteomics have produced a sta-
bilized enzyme where lysine residues have been modified by
reductive methylation making the enzyme resistant to autolytic
digestion [20].
6. MacroSpin columns have a capacity of around 300 μg of
protein/peptide.
7. N-glycopeptides fragmented only using HCD or CID will
often result in complex MS/MS spectra that are difficult to
match. This is due to the retention and/or partial
fragmentation of the N-glycan during HCD/CID. The
benefit of ETD as a complement to HCD/CID is that the
N-glycan remains intact on the peptide backbone and the
subsequent ETD fragmentation spectra contain c- and
z-series fragment ions for optimal spectra matching (Fig. 2).
Table 1
N-glycan structures used to generate plant glycan database for MS/MS data interrogation
N-glycan composition          Type          Mass (amu)  Relative abundance (%)  Byonic format
HexNAc(2)Hex(11)              Immature      2188.74     0.0                     @ NGlycan | rare1
HexNAc(2)Hex(10)              Immature      2026.69     0.7                     @ NGlycan | common1
HexNAc(2)Hex(9)               High mannose  1864.63     3.1                     @ NGlycan | common1
HexNAc(2)Hex(4)Pent(1)        Hybrid        1186.41     1.1                     @ NGlycan | common1
HexNAc(2)Hex(4)Fuc(1)         Hybrid        1200.43     0.2                     @ NGlycan | rare1
HexNAc(2)Hex(4)Fuc(1)Pent(1)  Hybrid        1332.49     1.0                     @ NGlycan | common1
HexNAc(2)Hex(5)Pent(1)        Hybrid        1348.47     0.2                     @ NGlycan | rare1
HexNAc(3)Hex(4)               Hybrid        1257.45     0.6                     @ NGlycan | common1
HexNAc(3)Hex(4)Pent(1)        Hybrid        1389.49     0.9                     @ NGlycan | common1
HexNAc(3)Hex(4)Fuc(1)Pent(1)  Hybrid        1535.57     1.8                     @ NGlycan | common1
HexNAc(3)Hex(5)               Hybrid        1419.50     0.3                     @ NGlycan | rare1
HexNAc(3)Hex(3)               Complex       1095.40     0.3                     @ NGlycan | rare1
HexNAc(3)Hex(3)Pent(1)        Complex       1227.44     5.5                     @ NGlycan | common1
HexNAc(3)Hex(3)Fuc(1)         Complex       1241.45     0.4                     @ NGlycan | rare1
HexNAc(3)Hex(3)Fuc(1)Pent(1)  Complex       1373.52     11.1                    @ NGlycan | common1
HexNAc(4)Hex(3)Fuc(1)Pent(1)  Complex       1576.60     20.4                    @ NGlycan | common1
HexNAc(4)Hex(4)Pent(1)        Complex       1592.57     0.5                     @ NGlycan | rare1
HexNAc(2)Hex(3)               Paucimannose  892.32      0.8                     @ NGlycan | common1
HexNAc(2)Hex(3)Pent(1)        Paucimannose  1024.36     2.8                     @ NGlycan | common1
HexNAc(2)Hex(3)Fuc(1)         Paucimannose  1038.38     0.6                     @ NGlycan | common1
HexNAc(2)Hex(3)Fuc(1)Pent(1)  Paucimannose  1170.44     16.4                    @ NGlycan | common1
HexNAc(2)Hex(3)Fuc(2)Pent(2)  Paucimannose  1448.52     0.0                     @ NGlycan | rare1
HexNAc(2)Hex(4)               Truncated     1054.37     1.1                     @ NGlycan | common1
HexNAc(2)Hex(2)               Truncated     730.27      0.2                     @ NGlycan | rare1
HexNAc(2)Hex(2)Fuc(1)Pent(1)  Truncated     1008.38     1.1                     @ NGlycan | common1
HexNAc(2)Hex(2)Pent(1)        Truncated     862.31      0.2                     @ NGlycan | rare1
HexNAc(2)Hex(2)Fuc(1)         Truncated     876.34      0.2                     @ NGlycan | rare1
HexNAc(2)Hex(1)Fuc(1)         Truncated     714.29      0.0                     @ NGlycan | rare1
HexNAc(2)Hex(1)               Truncated     568.22      0.2                     @ NGlycan | rare1
HexNAc(2)                     Truncated     406.17      0.1                     @ NGlycan | rare1
HexNAc(1)                     Truncated     203.09      0.6                     @ NGlycan | common1
The relative abundance was calculated from high confidence N-glycopeptide matches from Arabidopsis thaliana [7]. The
Byonic Format can be used to create a plant N-glycan database. The most common plant N-glycan structures are shown
as bold
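Entries for a custom glycan list in the format shown in Table 1 can be generated from monosaccharide compositions rather than typed by hand. The Python sketch below builds one line per glycan in the `composition @ NGlycan | category` layout and computes a mass from standard monoisotopic residue masses. The function names are hypothetical, and the residue masses are standard values rather than values taken from this chapter, so the computed masses may differ from Table 1 in the second decimal place for some entries.

```python
# Standard monoisotopic residue masses (Da); not taken from this chapter.
RESIDUE_MASS = {
    "HexNAc": 203.07937,
    "Hex": 162.05282,
    "Fuc": 146.05791,
    "Pent": 132.04226,
}
# Order in which monosaccharides are written in Table 1.
ORDER = ("HexNAc", "Hex", "Fuc", "Pent")


def composition_string(composition):
    """Format a composition dict, e.g. {'HexNAc': 2, 'Hex': 9} -> 'HexNAc(2)Hex(9)'."""
    return "".join(
        f"{name}({composition[name]})"
        for name in ORDER
        if composition.get(name, 0) > 0
    )


def byonic_line(composition, category):
    """One glycan entry in the layout used in Table 1."""
    return f"{composition_string(composition)} @ NGlycan | {category}"


def glycan_mass(composition):
    """Sum of monoisotopic residue masses for the composition."""
    return sum(RESIDUE_MASS[name] * count for name, count in composition.items())


# Two entries from Table 1, given as (composition, category) pairs.
glycans = [
    ({"HexNAc": 4, "Hex": 3, "Fuc": 1, "Pent": 1}, "common1"),
    ({"HexNAc": 2, "Hex": 9}, "common1"),
]
database_lines = [byonic_line(comp, cat) for comp, cat in glycans]
```

Each element of `database_lines` is one entry for the custom glycan text field; `glycan_mass` offers a quick plausibility check of a composition against the tabulated mass before a search is run.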
8. It is possible to employ other search engines, but we have
found that Byonic™ is simple to use and is well suited for
matching N-glycopeptide spectra from either HCD or ETD.
The software employs standard glycan strings to enable cus-
tomization of many glycan structures as variable modifications
in the search parameters (Table 1). Note: these programs can-
not distinguish isomers, nor can they identify the branching
structure of the glycopeptide.
9. Be cautious when squeezing Miracloth as it can easily split. If
dealing with small volumes and intense squeezing is necessary,
a vinyl mesh support can be employed.
Fig. 2 An example of HCD triggered ETD fragmentation spectra. (a) A peak of 204.1 m/z (GlcNAc) and
366.1396 m/z (ManGlcNAc) in the HCD fragmentation spectra triggered ETD fragmentation of this precursor
ion 815.05 [M+3H]3+. (b) Resultant ETD spectra (see Notes 18 and 19)
10. A smaller volume than recommended by these assay kits can be
employed to reduce waste (e.g., 5 μL instead of 10 μL as
suggested in the Pierce™ BCA Protein Assay Kit).
11. A solution of 7 M urea will readily denature proteins; however,
the activity of trypsin is dramatically reduced in 7 M urea.
Dilution to 1 M urea before adding trypsin will enable diges-
tion of proteins by trypsin.
12. Adding trypsin to protein at a ratio of 1:25 or 1:50 is generally
recommended. When using stabilized trypsin, a ratio of 1:25
should be suitable in most cases. However, if using a
nonstabilized form of trypsin, increase the trypsin-to-protein
ratio to 1:10.
13. Dry the sample down until a few microliters of liquid remain.
This reduces the chance of peptides “sticking” to the plastic
microfuge tube. If these peptide concentration steps are
affecting sample yield, it is possible to employ surface-siliconized
microfuge tubes; alternatively, the addition of 1% bovine serum
albumin (BSA) to the sample has been shown to significantly
improve recovery [21].
14. The sequential concentrations of acetonitrile employed for
elution of plant N-glycopeptides from the HILIC SPE columns
were empirically tested, and this range (70–50% acetonitrile)
was found to be optimal for the selective elution of
peptides harboring a range of expected N-glycan structures.
Lower concentrations of acetonitrile for elution from HILIC
SPE will result in a plethora of unmodified hydrophilic
peptides.
15. After elution and concentration of N-glycopeptides from the
HILIC SPE column, the sample is theoretically ready for analysis
by LC-MS/MS. However, the total amount of peptides in
this fraction can vary considerably. The use of ZipTipC18
Pipette Tips at this step enables a fixed peptide amount to be
loaded onto the C18 nano-trap column. A typical ZipTipC18
Pipette Tip has a peptide binding capacity of around 1 μg,
although the supplier indicates it can bind up to 5 μg.
16. The elution profile has been optimized for the separation of
hydrophilic N-glycopeptides with an elongated ramp to 20%
acetonitrile.
17. During HCD, the N-glycan structures on N-glycopeptides will
also be fragmented. This will result in the generation of N-
glycan signature ions that can be used to trigger ETD of the
same precursor ion. The following fragment ions are common
during HCD fragmentation of plant N-glycopeptides:
138.0545 m/z (GlcNAc, fragment 1), 163.06 m/z (Hex),
186.076 m/z (GlcNAc, fragment 2), 204.0967 m/z
(GlcNAc), and 366.1396 m/z (ManGlcNAc).
18. HCD triggered ETD will generate two classes of fragmenta-
tion spectra. HCD fragmentation spectra will contain some y-
and b-series ions, N-glycan fragments, y- and b-series ions
harboring glycans (usually a HexNAc or two) as well as the
charged precursor ion (peptide) without the N-glycan. Thus,
the HCD fragmentation spectra enable confirmation of an
N-glycan (N-glycan fragments) and an accurate estimation of
the mass of the N-glycan structure and the peptide. ETD
fragmentation spectra usually contain z- and c-series ions
enabling confident assignment when using a search engine.
Note: only about 10% of MS/MS spectra from an HCD trig-
gered ETD analysis will comprise ETD spectra (Fig. 2).
19. While it is possible to undertake ETD-only analysis of samples
in conjunction with HCD-only or HCD-triggered ETD to
obtain important ETD fragmentation spectra for unique
N-glycopeptides, the current generation of instruments generates
few MS/MS spectra.
20. The Protein Metrics Byonic Viewer is freely available to view
result files (.byrslt) generated after data interrogation.
21. Based only on high-confidence N-glycopeptides matched in
our analysis of eight Arabidopsis thaliana samples [7], we have
generated a plant N-glycan database suitable for Byonic™ that
includes a Fine Control parameter (common or rare) based on
the frequency with which the particular structure was detected in our
samples. N-glycan structures identified in ≤0.5% of all
N-glycopeptides were classified as “rare.” This will
greatly accelerate data processing, resulting in a 10 min analysis
time for a 1 gigabyte raw data file on a system with 24 cores at
3000 MHz.
22. By employing stringent log probability cutoffs, we estimated
an FDR < 1% for all PSMs (FDR 2D). While some of the
reported N-glycan structures did not conform to expected
compositions, for example, HexNAc(5)Hex(3)Pent(1), using
multiple replicates eliminated many of these unexpected
structures.
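The bookkeeping behind Notes 17 and 21 can be sketched in a few lines of Python. This is only an illustrative sketch: the residue masses are standard monoisotopic values rather than numbers from this chapter, the `composition @ NGlycan | label` line layout is inferred from Table 1, the inclusive 0.5% cutoff and the 20 ppm tolerance are assumptions, and all function names are hypothetical.

```python
import re

# Standard monoisotopic residue masses (Da); not taken from the chapter.
RESIDUE_MASS = {"HexNAc": 203.0794, "Hex": 162.0528, "Fuc": 146.0579, "Pent": 132.0423}

# N-glycan signature ions listed in Note 17 (m/z).
SIGNATURE_IONS = {
    "GlcNAc fragment 1": 138.0545,
    "Hex": 163.06,
    "GlcNAc fragment 2": 186.076,
    "GlcNAc": 204.0967,
    "ManGlcNAc": 366.1396,
}

def glycan_mass(composition):
    """Sum residue masses for a string such as 'HexNAc(2)Hex(3)Fuc(1)Pent(1)'."""
    total = 0.0
    for name, count in re.findall(r"([A-Za-z]+)\((\d+)\)", composition):
        total += RESIDUE_MASS[name] * int(count)
    return total

def byonic_line(composition, rel_abundance_pct, rare_cutoff_pct=0.5):
    """Format one database line; the cutoff handling is an assumption."""
    label = "rare1" if rel_abundance_pct <= rare_cutoff_pct else "common1"
    return f"{composition} @ NGlycan | {label}"

def matched_signature_ions(peaks_mz, tol_ppm=20.0):
    """Names of signature ions present among HCD fragment m/z values."""
    found = []
    for name, target in SIGNATURE_IONS.items():
        tol = target * tol_ppm / 1e6
        if any(abs(mz - target) <= tol for mz in peaks_mz):
            found.append(name)
    return found

def should_trigger_etd(peaks_mz, min_hits=1, tol_ppm=20.0):
    """True if enough signature ions are seen to justify an ETD scan."""
    return len(matched_signature_ions(peaks_mz, tol_ppm)) >= min_hits
```

For example, `glycan_mass("HexNAc(2)Hex(4)")` reproduces the 1054.37 listed in Table 1, and `byonic_line("HexNAc(2)Hex(4)", 1.1)` yields the corresponding "common1" entry. On a real instrument the trigger logic is configured in the acquisition method, not in a script.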
References
1. Hebert DN, Lamriben L, Powers ET et al (2014) The intrinsic and extrinsic effects of N-linked glycans on glycoproteostasis. Nat Chem Biol 10:902–910
2. Stanley P, Taniguchi N, Aebi M (2015) N-Glycans. In: Varki A, Cummings RD et al (eds) Essentials of glycobiology. Cold Spring Harbor Laboratory Press, Cold Spring Harbor (NY), pp 99–111
3. Liu Y, Li J (2014) Endoplasmic reticulum-mediated protein quality control in Arabidopsis. Front Plant Sci 5:162
4. Rips S, Bentley N, Jeong IS et al (2014) Multiple N-glycans cooperate in the subcellular targeting and functioning of Arabidopsis KORRIGAN1. Plant Cell 26:3792–3808
5. Strasser R (2016) Plant protein glycosylation. Glycobiology 26:926–939
6. Fanata WI, Lee KH, Son BH et al (2013) N-glycan maturation is crucial for cytokinin-mediated development and cellulose synthesis in Oryza sativa. Plant J 73:966–979
7. Zeng W, Ford KL, Bacic A et al (2018) N-linked glycan micro-heterogeneity in glycoproteins of Arabidopsis. Mol Cell Proteomics 17:413–421
8. Henquet M, Lehle L, Schreuder M et al (2008) Identification of the gene encoding the alpha 1,3-mannosyltransferase (ALG3) in Arabidopsis and characterization of downstream N-glycan processing. Plant Cell 20:1652–1664
9. Elbers IJW, Stoopen GM, Bakker H et al (2001) Influence of growth conditions and developmental stage on N-glycan heterogeneity of transgenic immunoglobulin G and endogenous proteins in tobacco leaves. Plant Physiol 126:1314–1322
10. Strasser R, Stadlmann J, Svoboda B et al (2005) Molecular basis of N-acetylglucosaminyltransferase I deficiency in Arabidopsis thaliana plants lacking complex N-glycans. Biochem J 387:385–391
11. Pedersen CT, Loke I, Lorentzen A et al (2017) N-glycan maturation mutants in Lotus japonicus for basic and applied glycoprotein research. Plant J 91:394–407
12. Song W, Mentink RA, Henquet MG et al (2013) N-glycan occupancy of Arabidopsis N-glycoproteins. J Proteome 93:343–355
13. Zielinska DF, Gnad F, Schropp K et al (2012) Mapping N-glycosylation sites across seven evolutionarily distant species reveals a divergent substrate proteome despite a common core machinery. Mol Cell 46:542–548
14. Ma J, Wang D, She J et al (2016) Endoplasmic reticulum-associated N-glycan degradation of cold-upregulated glycoproteins in response to chilling stress in Arabidopsis. New Phytol 212:282–296
15. Xu SL, Medzihradszky KF, Wang ZY et al (2016) N-glycopeptide profiling in Arabidopsis inflorescence. Mol Cell Proteomics 15:2048–2054
16. Wilson IB, Zeleny R, Kolarich D et al (2001) Analysis of Asn-linked glycans from vegetable foodstuffs: widespread occurrence of Lewis a, core alpha1,3-linked fucose and xylose substitutions. Glycobiology 11:261–274
17. Strasser R, Schoberer J, Jin C et al (2006) Molecular cloning and characterization of Arabidopsis thaliana Golgi alpha-mannosidase II, a key enzyme in the formation of complex N-glycans in plants. Plant J 45:789–803
18. Hagglund P, Bunkenborg J, Elortza F et al (2004) A new strategy for identification of N-glycosylated proteins and unambiguous assignment of their glycosylation sites using HILIC enrichment and partial deglycosylation. J Proteome Res 3:556–566
19. Ford KL, Zeng W, Heazlewood JL et al (2015) Characterization of protein N-glycosylation by tandem mass spectrometry using complementary fragmentation techniques. Front Plant Sci 6:674
20. Rice RH, Means GE, Brown WD (1977) Stabilization of bovine trypsin by reductive methylation. Biochim Biophys Acta 492:316–321
21. Goebel-Stengel M, Stengel A, Tache Y (2011) The importance of using the optimal plasticware and glassware in studies involving peptides. Anal Biochem 414:38–46
Chapter 18
High-Resolution Lysine Acetylome Profiling by Offline Fractionation and Immunoprecipitation
Jonas Giese, Ines Lassowskat, and Iris Finkemeier
Abstract
Acetylation of lysine side chains at their ε-amino group is a reversible posttranslational modification (PTM),
which can affect diverse protein functions. Lysine acetylation was first described on histones, and nowadays
gains more and more attention due to its more general occurrence in proteomes, and its possible crosstalk
with other protein modifications. Here we describe a workflow to investigate the acetylation of lysine-
containing peptides on a large scale. For this high-resolution lysine acetylome analysis, dimethyl-labeled
peptide samples are pooled and offline-fractionated using hydrophilic interaction liquid chromatography
(HILIC). The offline fractionation is followed by immunoprecipitation and liquid chromatography–tandem
mass spectrometry (LC-MS/MS) for data acquisition and subsequent data analysis.
Key words Lysine acetylation, HILIC, Offline fractionation, Dimethyl labeling, MaxQuant
1 Introduction
Plants are exposed to ever-changing environmental conditions
[1]. Proper development and growth of plants rely on
metabolic acclimation to such conditions [2]. Fast acclimation is
achieved by signaling networks that restore metabolic homeostasis
after disturbance [1, 3]. Cell signaling networks are often mediated
by posttranslational protein modifications (PTMs) such as
phosphorylation, redox regulation, acetylation, and other
modifications, which then regulate gene expression and protein turnover
[4, 5]. These chemical modifications can have several consequences
for the proteins, such as altered stability, degradation,
localization, and interactions with other proteins and metabolites, as well as
the regulation of enzyme activities [6].
The acetylation of lysine ε-amino groups was initially
described on histones, where it regulates chromatin and gene
expression [7]. The addition of an acetyl group to lysine side chains
neutralizes the positive charge of the amino group [8]. Usually the
modification is catalyzed by lysine acetyltransferases using acetyl-
CoA as substrate [9–11]. However, under specific conditions, such
as a high cellular pH and large amounts of substrate, lysine
acetylation can also occur nonenzymatically [12, 13]. These conditions
can be met during respiration in the mitochondrial matrix or in the
chloroplast stroma while photosynthesis takes place [14]. In
contrast to N-terminal acetylation, lysine acetylation is reversible and
can be removed by various types of deacetylases [15–18]. Detailed
investigations of lysine acetylomes were delayed for a long time
compared to the analyses of phosphoproteomes due to several
technical limitations [14]. Technical advances in high-resolution
mass spectrometry and the development of more efficient
antibodies for the enrichment of modified peptides via
immunoprecipitation (IP) led to huge leaps in acetylation research [19]. The
numbers of identified acetylation sites in numerous organisms are
continuously increasing [11, 13, 20–26]. While the first acetylomes
of Arabidopsis only comprised about 100 acetylation sites [27, 28],
over 2100 acetylation sites on more than 1000 proteins could be
identified with the enhanced workflow presented here, which also
led to the discovery of new target proteins of histone deacetylases in
the nucleus as well as in chloroplasts [29, 30]. Lysine-acetylated
proteins were found to be especially abundant in plant mitochondria,
chloroplasts, and nuclei [13, 27–29, 31–34]. Furthermore, it was
discovered that lysine acetylation negatively regulates the activity of
the RubisCO enzyme as well as the ADP-sensitivity of RubisCO
activase [27, 29].
Since the occupancy of acetylation on lysine side chains is
generally rather low, an enrichment of the acetylated peptides by
IP is necessary for their detection by mass spectrometry [35]. For
the quantification of lysine acetylation changes between samples, a
labeling and pooling step of the different samples is required prior
to the IP to reduce technical error caused by the enrichment.
Dependent on the organism and type of tissue, different isotopic
labeling techniques can be used. If the label cannot be incorporated
into the proteome of the living organism, a peptide-based labeling
approach is preferred. Here we utilize the dimethyl-labeling of the
peptide amino groups, as it allows for triplexing, and is affordable in
comparison to commercial labeling reagents. The isotopic
dimethyl-labeling results in labeling efficiencies of more than 99%
[36]. Using a bottom-up proteomics approach, trypsin-digested,
fractionated, and immunoprecipitated peptide samples are analyzed on
a nano-LC MS/MS setup coupled via an ESI source.
The workflow consists of six methods that are utilized to obtain
a quantitative high resolution lysine acetylome profile of Arabidop-
sis thaliana leaves (Fig. 1).
1. Denaturing extraction of proteins using detergents to solubilize
membrane proteins to achieve a higher proteome coverage.
Sodium dodecyl sulfate (SDS) is used as the detergent.
Fig. 1 Workflow for quantitative lysine acetylome profiling. The workflow comprises six crucial steps:
(1) Experimental setup to investigate up to three different conditions/genotypes or treatments. (2) Protein
extraction from harvested leaf tissue under denaturing and reducing conditions with subsequent alkylation and
trypsin digestion using a modified FASP protocol. (3) Desalting and isotopic dimethyl labeling, followed by
pooling of the labeled samples in equal amounts of peptide. (4) Offline fractionation of peptides utilizing a ZIC-
HILIC column on an HPLC system collecting seven peptide fractions. (5) Enrichment of lysine-acetylated
peptides by immunoprecipitation (IP) using an anti-acetyl lysine antibody. (6) Measuring desalted samples
(enriched and IP input for whole proteome analysis) on a nano-LC MS/MS system. Repeat analysis for at
least four biological replicates
2. Alkylation and trypsin digestion are carried out using a
modified filter-aided sample preparation (FASP) method
[37]. In this step, SDS is removed from the sample by washing
with a urea buffer, followed by alkylation with
chloroacetamide (CAA) and subsequent trypsin digestion.
3. Isotopic dimethyl labeling of peptides on C18 columns is used
for an accurate quantification and reduction of technical error
[38]. Up to three different samples can be combined (triplex-
ing). Within replicates the different labels should be swapped
to avoid a labeling bias due to some rare retention time shift
events of deuterated peptides.
4. Samples from leaf extracts usually have a high complexity. To
enable better coverage of the lysine acetylome, an additional
fractionation step prior to MS/MS analysis is performed. This
offline fractionation is executed using a hydrophilic interaction
liquid chromatography (HILIC) column (e.g., Sequant
ZIC-HILIC column, Merck) using a reversed-phase buffer
system (Fig. 2) on an HPLC or FPLC system.
[Fig. 2 plot: UV absorbance (mAU) and % buffer B plotted over the gradient, with the pooled fractions 1–7 marked]
Fig. 2 Stepped-linear gradient and peptide distribution of ZIC-HILIC offline fractionation. The gradient was set
up with a flow rate of 0.5 mL per min for 22 column volumes (CV, 1 CV = 2.493 mL), which corresponds to
115 min. The gradient started with 10 % buffer BZH in buffer AZH for the duration of three CV. The concentration
of buffer BZH was then increased to 58 % with a linear gradient over 12 CV. Another three CV ran at 79 %
buffer BZH. The gradient was then reset to 10 % for another three CV. Twenty-two fractions of peptides were
collected during the first 19 CV; twenty-one of those fractions were pooled into seven fractions with about
equal peptide amounts. The last fraction was discarded, as it contained no peptides
5. Enrichment of lysine-acetylated peptides. As lysine-acetylated
peptides are underrepresented in the whole peptide population,
enrichment by IP is crucial for identification and a good
coverage of acetylation sites. Bead-bound anti-acetyl lysine
antibodies are available from various suppliers.
6. Samples are desalted and subsequently measured on a nano-LC
MS/MS setup.
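The column-volume arithmetic behind the offline HILIC gradient in Fig. 2 can be checked with a short Python sketch. It assumes the 150 × 4.6 mm column listed in the Materials, uses the simple geometric (empty-cylinder) volume, and encodes the gradient segments as read from the Fig. 2 caption; the function names are hypothetical.

```python
import math

def column_volume_ml(length_mm, inner_diameter_mm):
    """Geometric (empty-tube) volume of a cylindrical column in mL."""
    radius_cm = inner_diameter_mm / 20.0
    return math.pi * radius_cm ** 2 * (length_mm / 10.0)

# (segment length in CV, %B at segment start, %B at segment end)
SEGMENTS = [(3, 10, 10), (12, 10, 58), (3, 79, 79), (3, 10, 10)]

def percent_b(cv):
    """%B at a given elapsed column volume, following SEGMENTS."""
    start = 0.0
    for length, b0, b1 in SEGMENTS:
        if cv <= start + length:
            return b0 + (b1 - b0) * (cv - start) / length
        start += length
    raise ValueError("beyond the programmed gradient")

def minutes_for(cv, cv_ml, flow_ml_min=0.5):
    """Run time needed to pump a number of column volumes."""
    return cv * cv_ml / flow_ml_min
```

With these assumptions, `column_volume_ml(150, 4.6)` gives about 2.493 mL, and `minutes_for(22, 2.493)` gives roughly 110 min, in the same range as the 115 min quoted in the figure caption.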
2 Materials
2.1 Protein Extraction
1. Liquid nitrogen.
2. SDT-extraction buffer: 4% (w/v) sodium dodecyl sulfate (SDS)
in 100 mM Tris–HCl pH 7.6 containing 100 mM dithiothreitol
(DTT). Prepare 2.5 volumes (v/w) of sample material.
3. Pierce 660 nm Protein Assay with ionic detergent compatibility
reagent (Thermo Scientific).
4. Mortar and pestle.
5. Heat block.
6. Ultrasonic bath.
7. Benchtop centrifuge for microcentrifuge tubes. Centrifugation
at speeds of at least 15,000 × g should be possible.
2.2 Filter-Aided Sample Preparation (FASP)
1. Urea buffer: 8 M urea in 100 mM Tris–HCl pH 8.5. Prepare
25 mL per sample. The buffer should always be prepared fresh
(see Note 1).
2. Chloroacetamide (CAA) solution: 55 mM CAA in urea buffer.
Prepare 1 mL per sample. The solution should always be
prepared fresh (see Note 1).
3. ABC buffer: 50 mM ammonium bicarbonate in dH2O. Prepare
20 mL per sample.
4. ABC buffer containing 10 % (v/v) acetonitrile (ACN).
5. Proteomics-grade trypsin dissolved in 50 mM acetic acid to a
final concentration of 1 μg/μL (e.g., Trypsin MS approved by
SERVA).
6. Centrifugal filter device (CFD): Amicon Ultra-4 30k MWCO
(Merck Millipore) or similar (see Note 2).
7. Laboratory film (e.g., Parafilm M, Bemis Company).
8. Benchtop centrifuge, accommodating 15 mL conical-
bottomed tubes.
9. Microvolume UV-VIS spectrophotometer (e.g., NanoDrop
2000, Thermo Scientific).
2.3 Desalting and Dimethyl Labeling of Peptides
1. Methanol.
2. Buffer AC18: 0.5 % (v/v) formic acid (FA) in ultrapure water.
3. Buffer BC18: 80 % (v/v) ACN (LC-MS-grade), 0.5 % (v/v) FA
in ultrapure water.
4. 50 mM NaH2PO4 (monohydrate).
5. 50 mM Na2HPO4 (dihydrate).
6. 4 % (v/v) CH2O (light label), CD2O (intermediate label),
or 13CD2O (heavy label) in water (LC-MS-grade).
7. 0.6 M NaBH3CN (light and intermediate label) or NaBD3CN
(heavy label) in water.
8. 10 % (v/v) trifluoroacetic acid (TFA) (LC-MS-grade) in ultra-
pure water.
9. Sep-Pak C18 classic cartridge, 360 mg sorbent per cartridge,
55–105 μm particle size (Waters).
10. 5 mL plastic syringe.
11. Vacuum concentrator (e.g., Concentrator 5301, Eppendorf).
2.4 ZIC-HILIC Offline Fractionation
1. Buffer AZH: 95 % (v/v) ACN (LC-MS-grade) with 2 % (v/v)
FA and 5 mM ammonium acetate.
2. Buffer BZH: 0.07 % (v/v) FA and 5 mM ammonium acetate.
3. ZIC-HILIC column (e.g., Sequant ZIC-HILIC column,
150 × 4.6 mm, 3.5 μm, 200 Å, Merck).
4. High pressure liquid chromatography (HPLC) unit with a
UV/VIS A280 nm detector and fraction collector (e.g.,
LC-20A Prominence HPLC with SPD-20A UV/VIS detector
and FRC-10A fraction collector).
5. Ultrasonic bath.
6. Benchtop centrifuge for microcentrifuge tubes, reaching
speeds up to 12,000 × g.
2.5 Enrichment of Lysine-Acetylated Peptides
1. TBS buffer: 50 mM Tris–HCl with 150 mM NaCl, pH 7.6,
sterile-filtered.
2. 5 M NaOH.
3. dH2O.
4. 10 % (v/v) TFA.
5. 1 % (v/v) TFA.
6. 5 % (v/v) ACN, 1 % (v/v) TFA.
7. Anti-acetyl-lysine antibody bound to agarose beads (e.g.,
Anti-Acetyl Lysine, Agarose, 10 mg/mL in 1 mL glycerol slurry,
Immunechem).
8. pH test paper.
9. 1.5 mL low-binding reaction tubes.
10. Rolling wheel in fridge or cold room.
11. Refrigerated benchtop centrifuge for microcentrifuge tubes.
2.6 Desalting of Peptides on SDB-RPS Stop-and-Go-Extraction Tips (Stage Tips)
1. 30 % (v/v) MeOH, 1 % (v/v) TFA.
2. 0.2 % (v/v) TFA.
3. 80 % (v/v) ACN, 5 % (v/v) ammonia.
4. Buffer A∗: 2 % ACN, 0.1 % TFA.
5. Styrene divinylbenzene reversed-phase sulfonate (SDB-RPS)
solid-phase extraction disks (Empore).
6. Stage Tip Adapter (Sonation) or Stage tipping centrifuge (e.g.,
STC-V2, Sonation).
7. 2 mL low-binding reaction tubes.
8. 0.5 or 1.5 mL low-binding reaction tubes.
9. Benchtop centrifuge for microcentrifuge tubes, reaching
speeds of at least 15,000 × g.
3 Methods
3.1 Protein Extraction
The following protocol is optimized for Arabidopsis leaves (see
Note 3). Unless indicated otherwise, all steps are carried out at
room temperature (see Note 4).
1. Leaves are harvested and immediately frozen in liquid nitrogen.
Material is then ground under liquid nitrogen to a fine powder.
Required amount of material for protein extraction is weighed
into a precooled reaction tube (2 mL for up to 300 mg or a
15 mL tube) (see Note 5).
2. Heat a water bath to 95 °C and preheat SDT extraction buffer
to 95 °C (see Note 6).
3. Mix the samples with 2.5 volumes (v/w) of hot SDT extraction
buffer. Vortex immediately to resuspend the powder in the
buffer. Incubate samples for 5 min at 95 °C in the water bath
and vortex twice in between for 15–20 s.
4. Place samples in an ultrasonic bath and sonicate for 15 min.
5. Centrifuge extract for 30 min in a benchtop centrifuge at maximum
speed (15,000–21,000 × g) (see Note 7).
6. Transfer the supernatant to a new tube without disturbing the
pelleted material.
7. Repeat steps 5 and 6.
8. Determine protein concentration using the Pierce 660 nm
protein assay with ionic detergent compatibility reagent
according to the manufacturer’s instructions.
3.2 Filter-Aided Sample Preparation (FASP)
Centrifugation steps are done at 4000 × g, unless indicated
otherwise. CFDs should be centrifuged until the supernatant is at least
tenfold concentrated. If sufficient concentration is not achieved
during the indicated centrifugation times, longer centrifugation
may be necessary. A peptide yield after elution of 50–70 % compared
to the initial protein amount can be expected.
1. Add 2 mL urea buffer on the CFD and centrifuge for 5 min for
conditioning of the membrane (see Note 8).
2. Dilute the sample eightfold with urea buffer to decrease SDS
concentration below 0.5 %. Load the diluted sample onto the
CFD (see Note 9).
3. Centrifuge samples for 15 min.
4. Discard flow-through and add 4 mL urea buffer to CFD.
Repeat step 3.
5. Discard flow-through. Add 1 mL CAA solution and softly mix
the solution for 1 min by up and down pipetting. Be careful to
not damage the membrane. For alkylation, incubate the sam-
ples for 30 min at room temperature in the dark.
6. Centrifuge the CFD for 15 min and discard the flow-through.

7. Add 4 mL of urea buffer to the CFD and centrifuge for 15 min.
Discard the flow-through. Repeat this step twice.
8. Add 4 mL ABC buffer to the CFD and centrifuge for 15 min.
Discard the flow-through and repeat this step twice.
9. Place CFD into new collection tubes.
10. Fill the CFD with ABC Buffer until the membrane is covered.
Add trypsin at a 1:100 enzyme to protein ratio (e.g., 10 μg
trypsin to 1 mg protein). Softly mix the sample using a pipette.
Be careful to not damage the membrane (see Note 10).
11. Close the cap of the tube and secure it with laboratory film.
Then incubate the CFD at 37 °C overnight.
12. To elute the peptides from the CFD, centrifuge for 15 min.
13. Rinse the filters by adding 500 μL ABC buffer to the CFD and
centrifuge.
14. Repeat step 13.
15. Repeat step 13 with 1 mL ABC buffer containing 10 % ACN.
16. Determine the peptide concentration with a microvolume
UV-VIS photometer at a wavelength of 280 nm (see Note 11).
17. Add TFA to a final concentration of 1 % to acidify the sample
for storage or further processing.
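The quantities used in the FASP steps above (eightfold dilution, 1:100 trypsin, 50–70 % expected yield) can be collected in a small helper. This is a minimal sketch with hypothetical names; it assumes the nominal 4 % SDS of the SDT-extraction buffer and the 1 μg/μL trypsin stock from the Materials, and it ignores any dilution of the buffer by the tissue itself.

```python
def fasp_plan(protein_mg, extract_volume_ml, sds_percent=4.0,
              dilution_factor=8, trypsin_ratio=100, trypsin_ug_per_ul=1.0):
    """Return dilution and trypsin quantities for one FASP sample."""
    diluted_volume_ml = extract_volume_ml * dilution_factor
    trypsin_ug = protein_mg * 1000.0 / trypsin_ratio
    return {
        # urea buffer needed to reach the eightfold dilution
        "urea_buffer_to_add_ml": diluted_volume_ml - extract_volume_ml,
        # nominal SDS concentration after dilution
        "sds_percent_after_dilution": sds_percent / dilution_factor,
        # trypsin mass at a 1:100 enzyme-to-protein ratio
        "trypsin_ug": trypsin_ug,
        "trypsin_stock_ul": trypsin_ug / trypsin_ug_per_ul,
        # expected peptide recovery (50-70 % of the protein input)
        "expected_peptide_mg": (0.5 * protein_mg, 0.7 * protein_mg),
    }
```

For 1 mg of protein in 0.5 mL of extract, the sketch returns 3.5 mL of urea buffer to add and 10 μg of trypsin, matching the example given in step 10.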
3.3 Desalting and Dimethyl Labeling of Peptides on C18 Columns
Labeling reagent should be prepared shortly before use as shown in
Table 1. Solutions must be kept cold until use to prevent unwanted
side-reactions.
1. Sep-Pak C18 (360 mg) cartridges are assembled with a 5 mL
syringe (without piston) in a rack above a tray to collect
flow-through.
2. Flush Sep-Paks with 3 mL methanol to condition the C18
matrix (see Note 12).
3. Repeat step 2 once with 3 mL buffer BC18 and afterward with
3 mL buffer AC18.
4. Load peptide sample on Sep-Paks (see Note 13).
5. Add 1 mL buffer AC18.
7. Add dimethyl-labeling reagent on Sep-Paks (see Notes 13 and
14).
10. Transfer Sep-Paks to 2 mL reaction tubes to collect eluate. Elute
samples twice by loading 700 μL buffer BC18 (see Note 13).
Table 1
Scheme to prepare dimethyl labeling reagent. 5 mL reagent per sample is used
Reagent              Light label   Intermediate label   Heavy label
50 mM NaH2PO4        1 mL          1 mL                 1 mL
50 mM Na2HPO4        3.5 mL        3.5 mL               3.5 mL
4% (v/v) CH2O        0.25 mL       –                    –
4% (v/v) CD2O        –             0.25 mL              –
4% (v/v) 13CD2O      –             –                    0.25 mL
0.6 M NaBH3CN        0.25 mL       0.25 mL              –
0.6 M NaBD3CN        –             –                    0.25 mL
Volume in total      5 mL          5 mL                 5 mL
The dimethyl-labeling of peptides at unmodified lysine residues and N-termini introduces a specific mass shift which is
28, 32, or 36 Da for the light, intermediate, and heavy labels, respectively
11. Determine yield of desalted peptides using a microvolume
UV-VIS photometer at 280 nm (see Note 15).
12. Corresponding differentially labeled samples are combined in a
1:1:1 ratio in relation to their total peptide amount.
13. Dry peptides using a vacuum concentrator.
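The mass shifts given in the footnote of Table 1 apply per labeled amine, so the total shift of a peptide depends on how many free amines it carries. The following Python sketch illustrates this with the nominal shifts of 28, 32, and 36 Da; it assumes a free (unmodified) peptide N-terminus, and the function names are hypothetical.

```python
# Nominal mass shift per labeled amine (Da), from the Table 1 footnote.
LABEL_SHIFT_DA = {"light": 28, "intermediate": 32, "heavy": 36}

def labeled_sites(peptide, acetylated_positions=()):
    """Count labelable amines: the N-terminus plus every unmodified lysine.

    acetylated_positions holds 0-based indices of acetylated lysines,
    which cannot be dimethylated.
    """
    free_lysines = sum(
        1 for i, aa in enumerate(peptide)
        if aa == "K" and i not in acetylated_positions
    )
    return 1 + free_lysines

def label_mass_shift(peptide, label, acetylated_positions=()):
    """Total nominal mass shift of a peptide for the given label."""
    return LABEL_SHIFT_DA[label] * labeled_sites(peptide, acetylated_positions)
```

For the made-up peptide `AKLSK`, the light label adds 3 × 28 = 84 Da; if the lysine at index 1 is acetylated, only two sites remain and the heavy label adds 72 Da.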
3.4 ZIC-HILIC Offline Fractionation
1. Dissolve dried peptides in 34 μL buffer AZH and 900 μL buffer
BZH. Sonicate for 4 min in ultrasonic bath and vortex briefly.
Centrifuge for 1 min at 12,000 × g.
2. Transfer supernatant into a new 2 mL reaction tube. Dissolve
pellet in 34 μL buffer AZH and 500 μL buffer BZH. Sonicate for
4 min in ultrasonic bath and vortex briefly. Centrifuge for
1 min at maximum speed.
3. Combine supernatants and repeat step 2 if a pellet is persistent.
4. Collect supernatant and combine with previous supernatants
(see Note 16).
5. Samples are fractionated on an HPLC unit using a 3.5 μm
ZIC-HILIC column. Up to 10 mg can be loaded on the
column. A segmented linear gradient from 10 to 58% buffer
BZH and a flow rate of 500 μL per minute is used (Fig. 2).
6. Twenty-two fractions are collected and combined into seven
fractions in 1.5 mL reaction tubes aiming at equally distributed
peptide amounts.
7. Fractions are then dried in a vacuum centrifuge.
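Pooling the collected fractions into seven pools of roughly equal peptide amount (step 6) can be planned from the per-fraction signal. The sketch below is one possible approach and not the authors' procedure: it assumes that a per-fraction amount (for example, the integrated A280 signal) is available, that only consecutive fractions are pooled, and that there are at least as many fractions as pools. The function name is hypothetical.

```python
def pool_fractions(amounts, n_pools=7):
    """Group consecutive fractions so each pool holds roughly total/n_pools.

    Returns a list of lists of 0-based fraction indices.
    """
    total = sum(amounts)
    pools, current, cumulative = [], [], 0.0
    for i, amount in enumerate(amounts):
        current.append(i)
        cumulative += amount
        remaining_fractions = len(amounts) - i - 1
        remaining_pools = n_pools - len(pools) - 1
        target = total * (len(pools) + 1) / n_pools
        # Close the pool once its cumulative target is reached, or when the
        # remaining fractions are just enough to fill the remaining pools.
        if remaining_pools > 0 and (
            cumulative >= target or remaining_fractions == remaining_pools
        ):
            pools.append(current)
            current = []
    pools.append(current)
    return pools
```

With 21 fractions of identical amount, the function returns seven pools of three consecutive fractions each; with an uneven chromatogram the peptide-rich fractions end up in smaller pools.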
[Fig. 3 bar charts: number of KAc sites and number of protein groups for 10, 25, and 50 μL AB per mg peptide]
Fig. 3 Optimization of antibody bead amount for lysine acetylome analysis. (a) The number of identified
lysine-acetylated peptides (KAc) differs between different antibody bead (AB) amounts used for the IP (10, 25, 50 μL
antibody solution, respectively, with 1 mg peptide). (b) The nonenriched total proteome samples showed
identical numbers of protein groups (mean ± SD, n = 4). Peptides and protein groups were identified with
MaxQuant with settings reported in Hartl et al. [29]
3.5 Enrichment of Lysine-Acetylated Peptides
All steps involving the anti-acetyl-lysine antibody bound to agarose
beads should be performed on ice, as the antibody is sensitive to
temperature changes. All buffers should be precooled.
1. Resuspend samples in a maximum volume of 1 mL TBS buffer
(see Note 17).
2. The samples should be adjusted to pH 7–8.
3. Prepare an aliquot of 10 μg of peptide sample for the total
proteome analysis and acidify with 10 % (v/v) TFA to a final
concentration of 1 % (v/v).
4. Use 25–50 μL of antibody–bead slurry per milligram of peptide
(Fig. 3). Pipetting of the antibody should be done using cut
tips. Up to 500 μL of antibody can be pooled into one 1.5 mL
low-binding reaction tube for equilibration (see Note 18).
5. Add 1 mL precooled TBS and incubate the antibody for 5 min
on a rolling wheel in the fridge or a cold room. Centrifuge
afterward for 1 min at 1000 × g, 4 °C.
6. Carefully remove the supernatant (see Note 19).
7. Repeat steps 5 and 6 twice.
8. Distribute corresponding amounts of the washed antibody
beads into fresh tubes and add the peptides.
9. Incubate overnight on a rolling wheel at 4 °C.
10. Centrifuge samples for 2 min at 1000 × g, 4 °C.
11. Transfer supernatant to a new reaction tube and store it as
flow-through.
12. Add 1 mL precooled TBS and incubate the antibody beads for
5 min on a rolling wheel in the fridge or a cold room.
Centrifuge afterward for 1 min at 1000 × g, 4 °C.
13. Carefully remove the supernatant.
14. Repeat steps 12 and 13 three times.
15. Add 1 mL dH2O and incubate the antibody for 5 min on a
rolling wheel in the fridge or a cold room. Centrifuge afterward
for 1 min at 1000 × g, 4 °C.
16. Carefully remove the supernatant.
18. Elute with one bead volume of 1 % (v/v) TFA (e.g., 25 μL of antibody
beads corresponds to 25 μL of 1 % TFA). Incubate the antibody
beads for 5 min on a rolling wheel in the fridge or a cold room.
Centrifuge afterward for 1 min at 1000 × g, 4 °C.
19. Transfer supernatant into a new reaction tube and keep it as
enriched sample.
20. Repeat step 18. Transfer supernatant to the enriched sample.
21. Elute with one bead volume of 5 % (v/v) ACN, 1 % (v/v) TFA.
Incubate the antibody beads for 5 min on a rolling wheel in
the fridge or a cold room. Centrifuge afterward for 1 min
at 1000 × g, 4 °C.
22. Transfer supernatant to the enriched sample.
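The volumes used in the immunoprecipitation scale with the peptide input. A minimal Python sketch of this arithmetic follows; it assumes that the elution "bead volume" equals the volume of antibody slurry used (as in the example in step 18), that three elutions are combined (steps 18, 20, and 21), and that at most 500 μL of slurry is equilibrated per tube (step 4). The function name is hypothetical.

```python
import math

def ip_volumes(peptide_mg, slurry_ul_per_mg=25.0, tube_capacity_ul=500.0):
    """Antibody slurry and elution volumes for one immunoprecipitation."""
    if not 25.0 <= slurry_ul_per_mg <= 50.0:
        raise ValueError("the protocol uses 25-50 uL slurry per mg peptide")
    slurry_ul = peptide_mg * slurry_ul_per_mg
    return {
        "antibody_slurry_ul": slurry_ul,
        # number of 1.5 mL tubes needed to equilibrate the slurry
        "equilibration_tubes": math.ceil(slurry_ul / tube_capacity_ul),
        # each elution uses one bead volume
        "elution_volume_each_ul": slurry_ul,
        # two TFA elutions plus one ACN/TFA elution are combined
        "combined_eluate_ul": 3 * slurry_ul,
    }
```

For 2 mg of peptide at 50 μL slurry per mg, the sketch gives 100 μL of slurry, one equilibration tube, and a combined eluate of 300 μL.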
3.6 Desalting of Peptides on SDB-RPS Stop-and-Go-Extraction Tips (Stage Tips)
1. Stack three layers of SDB-RPS matrix on top of each other.
Punch out one disk consisting of the three stacked layers and push
it into a 200 μL pipette tip (see Note 20).
2. Assemble the Stage Tip with a 2 mL reaction tube joined by an
adaptor.
3. Add 100 μL ACN onto the Stage Tip and centrifuge tips at
1500 × g (approximately 1–2 min).
4. Discard flow-through.
5. Add 100 μL 30 % (v/v) MeOH, 1 % (v/v) TFA onto the Stage
Tip and centrifuge the tips at 1500 × g (approximately
1–2 min).
7. Add 100 μL 0.2 % (v/v) TFA onto the Stage Tip and centrifuge
the tips at 1500 × g (approximately 1–2 min).
9. Load each sample (enriched for acetylome and input for total
proteome analyses) onto a Stage Tip and centrifuge tips at
650 × g until the sample is loaded (approximately 5–10 min).
Multiple loading steps may be necessary. Transfer Stage Tip
into used 2 mL reaction tubes of steps 3–8 (see Note 21).
10. Add 100 μL 0.2 % (v/v) TFA onto the Stage Tip and centrifuge
the tips at 1500 × g (approximately 1–2 min).
12. Transfer Stage Tips to fresh 2 mL reaction tubes to elute
desalted peptides.
13. For elution add 60 μL 80 % (v/v) ACN, 5 % (v/v) ammonia to
the tips and centrifuge at 650 × g (see Note 21).
14. Transfer the eluate to a smaller reaction tube (0.5 or 1.5 mL)
and dry the peptides in a vacuum centrifuge.
15. Samples are resuspended in 10 μL buffer A∗ and peptide
concentration is determined using a microvolume UV-VIS
photometer at 280 nm. Dilute peptides to a final concentration
of 0.1–0.2 μg/μL.
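The final dilution in step 15 is a simple concentration calculation. The sketch below assumes the 10 μL resuspension volume from step 15 and a target of 0.2 μg/μL; the function names are hypothetical.

```python
def dilution_volume_ul(measured_ug_per_ul, sample_ul=10.0, target_ug_per_ul=0.2):
    """Volume of buffer A* to add so the sample reaches the target concentration."""
    if measured_ug_per_ul <= target_ug_per_ul:
        return 0.0  # already at or below the target, no dilution needed
    return sample_ul * (measured_ug_per_ul / target_ug_per_ul - 1.0)

def injection_volume_ul(load_ug, conc_ug_per_ul):
    """Volume to inject for a desired peptide load."""
    return load_ug / conc_ug_per_ul
```

A sample measured at 0.6 μg/μL in 10 μL would need about 20 μL of additional buffer A∗ to reach 0.2 μg/μL, and a 1 μg load then corresponds to a 5 μL injection.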
3.7 Guidelines for LC-MS/MS Analysis
1. A high-resolution nano-UHPLC-MS/MS setup with 15–20 cm
(75 μm diameter) C18 reversed-phase capillary columns (e.g.,
1.9 μm ReproSil-Pur C18-AQ, Dr. Maisch GmbH) should
be used.
2. Column oven should be set to 50 °C.
3. At maximum, 0.5–1 μg of peptides is injected.
4. Separation is done at a flow rate of 300 nL/min.
5. Nano-LC Gradient:
(a) Buffer A: 0.1 % (v/v) formic acid.
(b) Buffer B: 80 % (v/v) ACN, 0.1 % (v/v) formic acid.
(c) 5 min linear gradient, at 5 % buffer B in buffer A.
(d) 60 min linear gradient, to 20 % buffer B.
(e) 25 min linear gradient, to 35 % buffer B.
(f) 10 min linear gradient, to 55 % buffer B.
(g) 5 min linear gradient, to 98 % buffer B.
(h) 10 min linear gradient at 98 % buffer B.
6. Detection of peptides on Q-Exactive HF MS (Thermo
Scientific).
(a) Positive mode.
(b) Mass range 300–1750 m/z at resolution 60,000.
(c) AGC target value 3e6.
(d) Lock mass enabled (445.12003).
(e) MS2: Top 15 selected, resolution 15,000, isolation window
1.3 m/z, AGC target 1e5, dynamic exclusion of
fragmented peptides.
(f) Charge states of +1, >8, and unassigned charges are
excluded from fragmentation.
7. Data analysis.
(a) Evaluation of raw data using MaxQuant [39].
(b) Search against Araport 11 database (www.araport.org).
(c) Trypsin is set as protease with two (total proteome samples)
or four (enriched samples) missed cleavages, since trypsin
does not cleave after acetylated lysine residues.
(d) PSM and protein FDR are set to 1 %.
(e) Multiplicity is set to 2 or 3, depending on the number of
labels used: light (Dimethyl 0), intermediate (Dimethyl 4),
and heavy (Dimethyl 8).
(f) Fixed modification: Carbamidomethylation.
(g) Variable modifications: methionine oxidation, N-terminal
acetylation, lysine acetylation (only for enriched samples).
(h) Match between runs and requantify are enabled.
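The nano-LC gradient in guideline 5 can be checked with a small lookup. The following is a minimal sketch that stores each segment as a duration and an end point and interpolates the percentage of buffer B linearly. It assumes that the run starts at 5 % buffer B and that the first and last segments are holds; the names are illustrative.

```python
# (duration in min, % buffer B at the end of the segment), from guideline 5(c)-(h)
SEGMENTS = [(5, 5), (60, 20), (25, 35), (10, 55), (5, 98), (10, 98)]
START_PERCENT_B = 5.0  # assumption: the run starts at 5 % buffer B


def percent_b(t_min):
    """Linearly interpolate % buffer B at t_min minutes after the start."""
    if t_min < 0:
        raise ValueError("time must not be negative")
    start_t, start_b = 0.0, START_PERCENT_B
    for duration, end_b in SEGMENTS:
        end_t = start_t + duration
        if t_min <= end_t:
            fraction = (t_min - start_t) / duration
            return start_b + fraction * (end_b - start_b)
        start_t, start_b = end_t, float(end_b)
    raise ValueError("time is beyond the end of the gradient")


total_min = sum(duration for duration, _ in SEGMENTS)
print(f"total run time: {total_min} min")
print(f"% B at 35 min: {percent_b(35):.1f}")
```

Because buffer B contains 80 % ACN, the actual ACN content at any time is 0.8 times the returned value.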
4 Notes
1. Urea and CAA solutions must be freshly prepared and cannot
be stored for a prolonged period. Preparation of the urea buffer
should be done in advance as it only dissolves slowly. Urea-containing
solution should not be heated above room temperature,
otherwise formation of isocyanate can happen, which
leads to carbamylation of proteins. The CAA solution has to
be kept in darkness until it is used.
2. Up to 2 mg protein can be loaded on Amicon Ultra-4 CFDs.
3. This protocol for protein extraction is optimized for Arabidopsis
leaf material, but it should also work on any
other kind of plant tissue, as long as it can be ground to a fine
powder.
4. Placing SDS- or urea-containing buffers on ice or at 4 °C leads
to precipitation.
5. Sample should not thaw after harvest to avoid protein degrada-
tion and unwanted modifications.
6. A glass beaker placed on a magnetic stirrer with heating is
sufficient. Use magnetic stirring to distribute heat evenly.
7. If you use 15 mL reaction tubes, briefly spin down the
samples and transfer the supernatant with a cut tip to 2 mL
reaction tubes.
8. After approximately one minute, stop the centrifuge
and check whether the majority of the buffer is still above the filter.
In rare cases, an abnormally fast flow-through can occur, which
indicates a broken filter unit. Such CFDs must be replaced to avoid
losing the sample during the procedure. Broken columns are also
indicated in later centrifugation steps by a green flow-through
when the flow-through should already be colorless.
9. As 4 mL is the maximum capacity of the CFD, multiple load-
ings might be required.
10. An additional digest with LysC can be done to achieve a more
complete digestion. We usually add LysC at a 1:100 enzyme-to-protein
ratio and incubate samples on CFDs for 2 h at room
temperature. Afterward, trypsin is added as described in
Subheading 3.2, step 10.
11. For the determination of peptide concentration, it is assumed
that an absorbance of 1 at 280 nm equals a peptide concentration of
1 mg/mL. Expected yields of the FASP are approximately
50–70 % of the initial amount of protein.
12. A 1 mL pipette with cut tip can be used at all conditioning and
washing steps to moderately increase flow. One drop every few
seconds would be convenient.
13. Load and elute sample by gravity flow if possible. Use pipette
only if flow stops.
14. Exchange tray below rack to collect toxic flow-through sepa-
rately and discard accordingly.
15. As high amounts of ACN affect the concentration measurement
through fast evaporation, take 10 μL of the eluted sample
and dry it in a vacuum centrifuge. Afterward, dissolve the pellet
in 10 μL 2 % ACN, 0.1 % TFA to determine the peptide
concentration. It is assumed that an absorbance of 1 equals a
peptide concentration of 1 mg/mL.
16. Sample is now dissolved in 95 % Buffer AZH and 5 % Buffer
BZH.
17. Sometimes it can be difficult to resuspend the samples. Adjust-
ing the pH to a range between 7 and 8 can help. The pH can be
set by adding a few μL 5 M NaOH. Also, short sonication
using an ultrasonic water bath can be helpful to dissolve pellets.
18. Be sure to mix the antibody slurry thoroughly by gentle
mixing.
19. Be careful not to disturb the antibody pellet. Using Gel-loader
tips can be advantageous.
20. Three layers of SDB-RPS can be used to desalt up to 25 μg
peptide.
21. If the sample is not flowing through the matrix, centrifugation
speed may be increased in small steps of 100 × g for 2 min
until the sample is completely loaded/eluted.
Acknowledgments
We gratefully acknowledge the Deutsche Forschungsgemeinschaft
(DFG, German Research Foundation) for financial support
through the project grants FI1655/3-1; FI1655/4-1; FI1655/6-1;
and the infrastructure grant INST211/744-1. This work was
carried out within the ERA-CAPS program “KatNat.”
References
1. Calfapietra C, Peñuelas J, Niinemets Ü (2015) Urban plant physiology: adaptation-mitigation strategies under permanent stress. Trends Plant Sci 20:72–75
2. Nunes-Nesi A, Fernie AR, Stitt M (2010) Metabolic and signaling aspects underpinning the regulation of plant carbon nitrogen interactions. Mol Plant 3:973–996
3. Dietz K-J (2015) Efficient high light acclimation involves rapid processes at multiple mechanistic levels. J Exp Bot 66:2401–2414
4. Hartl M, Finkemeier I (2012) Plant mitochondrial retrograde signaling: post-translational modifications enter the stage. Front Plant Sci 3:1–7
5. Johnová P, Skalák J, Saiz-Fernández I et al (2016) Plant responses to ambient temperature fluctuations and water-limiting conditions: a proteome-wide perspective. Biochim Biophys Acta 1864:916–931
6. Huber SC, Hardin SC (2004) Numerous posttranslational modifications provide opportunities for the intricate regulation of metabolic enzymes at multiple levels. Curr Opin Plant Biol 7:318–322
7. Allfrey VG, Faulkner R, Mirsky AE (1964) Acetylation and methylation of histones and their possible role in the regulation of RNA synthesis. Proc Natl Acad Sci U S A 51:786–794
8. Yang X-J, Seto E (2008) Lysine acetylation: codified crosstalk with other posttranslational modifications. Mol Cell 31:449–461
9. Kleff S, Andrulis ED, Anderson CW et al (1995) Identification of a gene encoding a yeast histone H4 acetyltransferase. J Biol Chem 270:24674–24677
10. Drazic A, Myklebust LM, Ree R et al (2016) The world of protein acetylation. Biochim Biophys Acta 1864:1372–1401
11. Koskela MM, Brünje A, Ivanauskaite A et al (2018) Chloroplast acetyltransferase NSI is required for state transitions in Arabidopsis thaliana. Plant Cell 30(8):1695–1709
12. Wagner GR, Payne RM (2013) Widespread and enzyme-independent Nε-acetylation and Nε-succinylation of proteins in the chemical conditions of the mitochondrial matrix. J Biol Chem 288:29036–29045
13. König A-C, Hartl M, Boersema PJ et al (2014) The mitochondrial lysine acetylome of Arabidopsis. Mitochondrion 19:252–260
14. Hosp F, Lassowskat I, Santoro V et al (2017) Lysine acetylation in mitochondria: from inventory to function. Mitochondrion 33:58–71
15. Alinsug MV, Yu C-W, Wu K (2009) Phylogenetic analysis, subcellular localization, and expression patterns of RPD3/HDA1 family histone deacetylases in plants. BMC Plant Biol 9:37
16. Shen Y, Wei W, Zhou D-X (2015) Histone acetylation enzymes coordinate metabolism and gene expression. Trends Plant Sci 20:614–621
17. Pandey R, Müller A, Napoli CA et al (2002) Analysis of histone acetyltransferase and histone deacetylase families of Arabidopsis thaliana suggests functional diversification of chromatin modification among multicellular eukaryotes. Nucleic Acids Res 30:5036–5055
18. König A, Hartl M, Pham PA et al (2014) The Arabidopsis class II sirtuin is a lysine deacetylase and interacts with mitochondrial energy metabolism. Plant Physiol 164:1401–1414
19. Choudhary C, Mann M (2010) Decoding signalling networks by mass spectrometry-based proteomics. Nat Rev Mol Cell Biol 11:427–439
20. Zhang K, Zheng S, Yang JS et al (2013) Comprehensive profiling of protein lysine acetylation in Escherichia coli. J Proteome Res 12:844–851
21. Henriksen P, Wagner SA, Weinert BT et al (2012) Proteome-wide analysis of lysine acetylation suggests its broad regulatory scope in Saccharomyces cerevisiae. Mol Cell Proteomics 11:1510–1522
22. Lundby A, Lage K, Weinert B et al (2012) Proteomic analysis of lysine acetylation sites in rat tissues reveals organ specificity and subcellular patterns. Cell Rep 2:419–431
23. Weinert BT, Wagner SA, Horn H et al (2011) Proteome-wide mapping of the Drosophila acetylome demonstrates a high degree of conservation of lysine acetylation. Sci Signal 4:ra48
24. Svinkina T, Gu H, Silva JC et al (2015) Deep, quantitative coverage of the lysine acetylome using novel anti-acetyl-lysine antibodies and an optimized proteomic workflow. Mol Cell Proteomics 14:2429–2440
25. Zhou H, Finkemeier I, Guan W et al (2018) Oxidative stress-triggered interactions between the succinyl- and acetyl-proteomes of rice leaves. Plant Cell Environ 41:1139–1153
26. Walley JW, Shen Z, McReynolds MR et al (2018) Fungal-induced protein hyperacetylation in maize identified by acetylome profiling. Proc Natl Acad Sci U S A 115:210–215
27. Finkemeier I, Laxa M, Miguet L et al (2011) Proteins of diverse function and subcellular location are lysine acetylated in Arabidopsis. Plant Physiol 155:1779–1790
28. Wu X, Oh M-H, Schwarz EM et al (2011) Lysine acetylation is a widespread protein modification for diverse proteins in Arabidopsis. Plant Physiol 155:1769–1778
29. Hartl M, Füßl M, Boersema PJ et al (2017) Lysine acetylome profiling uncovers novel histone deacetylase substrate proteins in Arabidopsis. Mol Syst Biol 13:949
30. Füßl M, Lassowskat I, Née G et al (2018) Beyond histones: new substrate proteins of lysine deacetylases in Arabidopsis nuclei. Front Plant Sci 9:461
31. He D, Wang Q, Li M et al (2016) Global proteome analyses of lysine acetylation and succinylation reveal the widespread involvement of both modification in metabolism in the embryo of germinating rice seed. J Proteome Res 15:879–890
32. Smith-Hammond CL, Hoyos E, Miernyk JA (2014) The pea seedling mitochondrial Nε-lysine acetylome. Mitochondrion 19:154–165
33. Xiong Y, Peng X, Cheng Z et al (2016) A comprehensive catalog of the lysine-acetylation targets in rice (Oryza sativa) based on proteomic analyses. J Proteomics 138:20–29
34. Zhang Y, Song L, Liang W et al (2016) Comprehensive profiling of lysine acetylproteome analysis reveals diverse functions of lysine acetylation in common wheat. Sci Rep 6:21069
35. Weinert BT, Iesmantavicius V, Moustafa T et al (2014) Acetylation dynamics and stoichiometry in Saccharomyces cerevisiae. Mol Syst Biol 10:716
36. Lassowskat I, Hartl M, Hosp F et al (2017) Dimethyl-labeling-based quantification of the lysine acetylome and proteome of plants. In: Fernie AR, Bauwe H, Weber APM (eds) Photorespiration. Springer New York, New York, NY, pp 65–81
37. Wiśniewski JR, Zougman A, Nagaraj N et al (2009) Universal sample preparation method for proteome analysis. Nat Methods 6:359–362
38. Boersema PJ, Raijmakers R, Lemeer S et al (2009) Multiplex peptide stable isotope dimethyl labeling for quantitative proteomics. Nat Protoc 4:484–494
39. Tyanova S, Temu T, Cox J (2016) The MaxQuant computational platform for mass spectrometry-based shotgun proteomics. Nat Protoc 11:2301–2319
Chapter 19
A Versatile Workflow for the Identification of Protein–Protein Interactions Using GFP-Trap Beads
and Mass Spectrometry-Based Label-Free Quantification
Guillaume Née, Priyadarshini Tilak, and Iris Finkemeier
Abstract
Protein functions often rely on protein–protein interactions. Hence, knowledge about the protein interac-
tion network is essential for an understanding of protein functions and plant physiology. A major challenge
of the postgenomic era is the mapping of protein–protein interaction networks. This chapter describes a
mass spectrometry-based label-free quantification approach to identify in vivo protein interaction networks.
The procedure starts with the extraction of intact protein complexes from transgenic plants expressing the
protein of interest fused to a GFP-Tag (bait-GFP), as well as plants expressing a free GFP as background
control. Enrichment of the GFP-tagged protein together with its interaction partners, as well as the free
GFP, is performed by immunoaffinity purification. The pull-down quality can be evaluated by simple
gel-based techniques. In parallel, the captured proteins are trypsin-digested and relatively quantified by
label-free mass spectrometry-based quantification. The relative quantification approach largely relies on the
normalization of protein abundances of background-binding proteins, which occur in both bait-GFP and
free GFP pull-downs. Therefore, relative quantification of the protein pull-down is superior to methods
that solely rely on protein identifications and removal of often copurified high-abundance proteins from the
bait-GFP pull-downs, which might remove real interaction partners. A further strength of this method is
that it can be applied to any soluble GFP-tagged protein.
Key words Protein–protein interactions, GFP-trap, Label-free quantitative proteomics
1 Introduction
The harmonious regulation of plant development and metabolism is
achieved by a complex network of gene products [1]. Considerable
effort in the last two decades has led to the sequencing, assembly,
and annotation of plant genomes [1–3]. However, the mere identification
of genes provides only little information about the molecular
functions of the encoded proteins. In the current postgenomic
era, mapping of protein–protein interaction networks is a major
challenge in plant biology [4]. This is particularly relevant for studying
proteins that require the formation of a protein complex [5–8],
proteins that associate with regulatory subunits [9], and protein
localization [10, 11]. Protein–protein interaction studies are also
essential to understand how plant metabolic fluxes can be controlled
through the formation of metabolons [12]. Moreover, many protein
activities are regulated by posttranslational modifications
(PTM); therefore, it is necessary to identify the underlying modifying
enzymes, which often only transiently interact with these proteins
[13, 14]. Hence, mapping the physical protein interaction
network brings a higher level of understanding than solely the
observation of protein spatial and temporal co-occurrences
[15]. Moreover, a non-negligible fraction of the genes (around 13%
in Arabidopsis) is annotated with an unknown function
[16, 17]. Knowing the physical partners of the encoded proteins can
lead to a substantial advancement in their characterization [18, 19].
Several molecular or biochemical techniques are available to
study protein–protein interactions such as (1) two-hybrid systems
[20], (2) bimolecular fluorescence complementation [21], (3) blue
native-PAGE gel electrophoresis [22], (4) cocrystallization [23],
(5) size exclusion chromatography [24], (6) affinity purification
[25, 26], and (7) immunoprecipitations [27], among others.
While the first two are limited to analyzing the direct interaction of
only two proteins, the others need to be complemented with
quantitative mass spectrometry (MS), which allows
the identification of protein–protein interaction networks at a large
scale [28, 29] and the distinction of background copurifying proteins
from real interactors. Since antibodies directed against the protein
of interest are not always available, it is often more easily achievable
to express the protein of interest as a tagged protein (e.g., GFP-tag,
Myc-tag, Flag-tag) in the living organism, for which immobilized
antibody matrices are available [30, 31].
Guillaume Née and Priyadarshini Tilak have contributed equally to this work.
[Fig. 1: schematic of the workflow. Panels: Isolation of proteins; Bait enrichment; Washes; Elution; Pull-down procedure analysis (optional; total staining and Western blot of 1. Input, 2. Non-bound, 3. Last wash, 4. Elution for EXP and CTR); Tryptic digest and desalting (C18 Stage tip); LC MS/MS analysis; In silico data processing; Statistical analysis (volcano plot of log2 fold enrichment versus -log10 p-value).]
Fig. 1 Workflow for the mass spectrometry-based label-free quantification of protein interaction partners of
GFP-tagged proteins. Proteins are extracted from tissue under native conditions. The GFP-tagged protein and
possible interaction partners (EXP) are enriched on an agarose-coupled GFP antibody matrix. Next to the
proteins of interest, background-binding proteins (grey and black dots) are copurified.
These unspecific binders are important for the correct normalization of the label-free quantification. After
washing, the bound proteins are eluted by acidic denaturation. The quality of the pull-down can be assessed
by gel electrophoresis and Western blotting. The eluted protein samples can be directly processed by tryptic
digestion followed by a C18 desalting step. Peptide samples are analyzed by LC-MS/MS and the acquired
spectra are processed by computational analysis using MaxQuant [32, 33]. Label-free quantification values
(LFQ) are used to calculate the fold-enrichment and a t-test p-value from the biological replicates. Data can be
presented as a volcano plot, in which the interacting proteins (blue dots), which satisfy quality thresholds (e.g.,
adjusted p-value <0.05 and at least twofold enrichment), appear on the upper right part, while the unspecific
GFP-agarose interactors (black dots) are located outside the threshold limits. SN supernatant, P pellet, ML
molecular ladder, Trp trypsin
In this chapter, we describe a workflow (Fig. 1) to identify
native protein interaction networks from Arabidopsis expressing
GFP-tagged bait proteins.
This procedure uses plant tissue expressing either the bait
protein fused to a GFP-tag or GFP only as starting material. After
protein extraction under native conditions, free GFP or the bait-GFP
protein is enriched by immunoaffinity purification together
with their putative interaction partners and background-binding
proteins. This setup enables a comparative mass spectrometry-based
label-free quantification, which can discriminate unspecific
protein interactions from real interactors [18, 19], and which
should be preferred over using blacklists of so-called nonspecific
proteins [26]. A commercially available GFP-matrix coupled with a
single-domain antibody from camelids termed VHH is used for the
enrichment of the GFP-tagged proteins [31]. Trapped protein
complexes are then eluted from the beads and subjected to trypsin
digestion. The resulting peptides are analyzed by LC-MS/MS,
and proteins are identified and quantified by label-free
quantification using the MaxQuant software [32, 33]. Further
processing of the data using the Perseus computational platform makes
it possible to evaluate the reproducibility of the data and to separate
candidate interacting proteins from unspecific interactions by calculating
a fold-enrichment together with a p-value [34]. In addition, this
workflow includes a fast and easy gel-based procedure to evaluate
the quality of the pull-down. Although the protocol is described for
leaf tissue, it can be applied to any other plant tissue or organelle
preparation.
In Arabidopsis, we previously applied this workflow to successfully
identify interacting proteins of a mitochondrial sirtuin-type
deacetylase and of a seed protein with an unknown molecular
function, leading to substantial advances in the understanding of
their respective roles in plant biology [18, 19].
2 Materials
Prepare all the buffers using ultrapure water (double distilled or
MilliQ water), unless otherwise specified.
2.1 Plant Tissue
1. Leaf tissue harvested from Arabidopsis thaliana plants expressing
the bait-GFP protein and leaf tissue harvested from Arabidopsis
thaliana plant lines expressing a free GFP (see Note 1).
2.2 Immunoaffinity Pull-Down Procedure
1. Extraction buffer: 50 mM Tris–HCl, pH 7.5, 150 mM NaCl,
10 % (v/v) glycerol, 2 mM EDTA, 5 mM dithiothreitol (DTT)
(see Note 2), 1 % (v/v) Triton X-100, protease inhibitor cocktail
for plant cell and tissue extracts (Sigma P9599) at a dilution
of 1:100 (v/v).
2. Equilibration/wash buffer: 50 mM Tris–HCl, pH 7.5,
150 mM NaCl, 10 % (v/v) glycerol, 2 mM EDTA, 5 mM DTT.
3. Elution buffer: 0.2 % (v/v) trifluoroacetic acid (TFA).
4. Neutralization buffer: 0.1 M Tris–HCl, pH 8.0, 1 mM CaCl2.
5. ChromoTek GFP-Trap® A beads (see Note 3).
6. Pierce 660 nm Protein Assay (see Note 4).
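The amounts to weigh for the equilibration/wash buffer follow directly from the final concentrations listed above. The following is a minimal sketch for a chosen batch volume. The molar masses are standard values for the named compounds; the choice of Tris base (titrated to pH 7.5 with HCl) and of the disodium dihydrate salt of EDTA are assumptions, since the protocol does not specify the reagent forms, and all names are illustrative.

```python
# Molar masses in g/mol (standard values; check the label of the reagent used)
MOLAR_MASS = {
    "Tris base": 121.14,                 # assumption: Tris base titrated with HCl
    "NaCl": 58.44,
    "EDTA disodium dihydrate": 372.24,   # assumption: this salt form is used
    "DTT": 154.25,
}

# Final concentrations in mol/L, from the equilibration/wash buffer recipe
WASH_BUFFER_M = {
    "Tris base": 0.050,
    "NaCl": 0.150,
    "EDTA disodium dihydrate": 0.002,
    "DTT": 0.005,
}


def grams_needed(volume_ml, recipe=WASH_BUFFER_M):
    """Return the mass in grams of each solid for the given batch volume."""
    litres = volume_ml / 1000.0
    return {name: conc * litres * MOLAR_MASS[name] for name, conc in recipe.items()}


def glycerol_ml(volume_ml, percent_v_v=10.0):
    """Return the volume of glycerol for a % (v/v) final concentration."""
    return volume_ml * percent_v_v / 100.0


for name, grams in grams_needed(100).items():
    print(f"{name}: {grams:.3f} g")
print(f"glycerol: {glycerol_ml(100):.1f} mL")
```

The extraction buffer uses the same base recipe, with Triton X-100 and the protease inhibitor cocktail added on top.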
2.3 SDS–Polyacrylamide Gel Electrophoresis
1. Resolving gel buffer: 1.5 M Tris–HCl, pH 8.8.
2. Stacking gel buffer: 0.5 M Tris–HCl, pH 6.8.
3. SDS 10 % (w/v).
4. Ammonium persulfate (APS) 10 % (w/v).
5. 30 % acrylamide–bisacrylamide solution, 37.5:1 (see Note 5).
6. N,N,N′,N′-Tetramethylethylenediamine (TEMED).
7. 5× protein loading buffer: 0.25 M Tris–HCl, pH 6.8, 25 %
glycerol, 10 % SDS, 0.1 % bromophenol blue, 12.5 % (v/v)
mercaptoethanol.
8. 10× running buffer: 1 % SDS, 1.92 M glycine, 0.25 M Tris; the
pH of the buffer should be 8.3, no pH adjustment is required.
9. Electrophoresis apparatus such as Mini-PROTEAN® containing
spacer plates with 1.0 mm integrated spacers, casting system
(Bio-Rad).
10. Gel staining solution: Oriole™ Fluorescent Gel Stain
(Bio-Rad).
2.4 Western Blot
1. Transfer buffer: 25 mM Tris, 190 mM glycine, 20 % methanol,
0.1 % SDS. The pH of the buffer should be 8.3, no pH adjustment
is required.
2. 10× Tris-buffered saline (TBS): 1.5 M NaCl, 0.1 M Tris–HCl,
pH 7.4.
3. TBS containing 0.05 % (v/v) Tween-20 (TBS-T).
4. Blocking solution: 5 % (w/v) skim milk powder (SERVA
Electrophoresis) prepared in TBS.
5. Nitrocellulose membrane (e.g., Amersham Protran 0.45 μm
nitrocellulose Western blotting membranes).
6. Grade 3MM cellulose chromatography paper (GE Healthcare
Life Science Whatman™).
7. Clean plastic or glass container.
8. Anti-GFP antibody raised in mouse (Roche #11814460001).
9. Secondary anti-mouse HRP-linked antibody (Sigma A3562).
10. Rocking shaker (VWR® Mini Blot Mixer).
11. Ponceau S staining solution: 0.1 % (w/v) Ponceau S dissolved
in a 5 % (v/v) acetic acid solution.
12. ECL detection kit for Western blot: Amersham ECL Prime
Western Blotting Detection Reagent kit (GE Healthcare Life
Science).
13. Semi-dry electrophoretic transfer apparatus (Biometra).
14. Chemiluminescence imaging system: ChemiDoc™ MP Imaging
system (Bio-Rad).
2.5 Mass Spectrometry
1. Ammonium bicarbonate (ABC) solution: 0.05 M NH4HCO3.
2. Chloroacetamide (CAA) stock solution: 550 mM CAA
prepared in ABC solution.
3. Dithiothreitol (DTT) stock solution: 500 mM DTT.
4. Proteomics-grade trypsin stock solution: 1 μg/μL in
0.1 M HCl.
5. 10 % formic acid.
6. 100 % methanol.
7. Buffer A: 0.5 % formic acid (FA) in H2O.
8. Buffer B: 80 % acetonitrile (ACN), 0.5 % FA.
9. Buffer A∗: 2 % ACN, 0.1 % TFA.
10. A high-resolution mass spectrometer for LC-MS/MS analysis
(e.g., QExactive HF coupled to a nano-liquid chromatography system,
Thermo Scientific).
11. Software: MaxQuant [32, 33] and Perseus [34] (see Note 6).
3 Methods
All steps are performed at room temperature (RT), unless specified otherwise.
3.1 Pull-Down Procedure
3.1.1 Extraction of Intact Protein Complexes
1. Homogenize around 2 g of leaf tissue per sample in liquid
nitrogen (see Note 7).
2. Transfer the frozen powder into a prechilled tube and add cold
extraction buffer at a ratio of 1:3 (g:mL). Make sure that
all the tissue powder is completely submerged by inverting
the tube.
3. Incubate sample for 30–60 min at 4 °C on a test tube rotator
(12 rpm) and centrifuge for 20 min at 4 °C and 18,000 × g.
4. Transfer supernatant to a new tube, and if necessary, repeat the
centrifugation step until the supernatant is clear (see Note 8).
5. Perform protein quantification for each sample using the Pierce
660 nm Protein Assay kit (see Note 9).
6. Adjust all samples to a similar protein concentration by diluting
samples with extraction buffer (see Note 10). The adjusted
protein samples are named “Input fractions”.
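The adjustment in step 6 is a simple dilution calculation. The following is a minimal sketch, assuming that all samples are brought to the concentration of the most dilute sample and that the same starting volume is used for every sample; the function name and the example readings are hypothetical.

```python
def dilute_to_common(conc_mg_per_ml, volume_ml):
    """Return (target concentration, mL of extraction buffer to add per sample).

    Assumption: every sample is diluted to the lowest measured concentration.
    """
    target = min(conc_mg_per_ml.values())
    plan = {}
    for name, conc in conc_mg_per_ml.items():
        final_volume = conc * volume_ml / target
        plan[name] = final_volume - volume_ml
    return target, plan


# Hypothetical protein assay results in mg/mL
measured = {"bait-GFP": 4.0, "free GFP": 5.0}
target, plan = dilute_to_common(measured, volume_ml=2.0)
for name, buffer_ml in plan.items():
    print(f"{name}: add {buffer_ml:.2f} mL extraction buffer")
```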
3.1.2 GFP-Trap® Capture of Protein Complexes
1. Add 25 μL of commercially available GFP-Trap® A beads (50 %
slurry) to 500 μL of ice-cold equilibration buffer in a 2 mL low
protein binding tube. Prepare one tube per sample and invert
tubes three times (see Note 11).
2. Centrifuge beads for 1 min at 4 °C, 1000 × g and remove
supernatant (see Note 12).
3. Resuspend beads in 500 μL of cold equilibration buffer and
repeat step 2.
4. Add a volume corresponding to 1–5 mg total protein (maximum
volume 1.5 mL) of adjusted samples (Input fractions
from Subheading 3.1.1) to each tube of equilibrated
GFP-Trap beads.
5. Incubate for 2–4 h at 4 °C on a test tube rotator at 12 rpm (see
Note 13).
6. Centrifuge at 4 °C and 1000 × g for 1 min and remove
supernatant (named “nonbound” fraction).
7. Wash the beads by slowly adding 500 μL of ice-cold equilibration/wash
buffer to the GFP-Trap beads and carefully invert
the tube to resuspend the beads. Do not vortex.
8. Centrifuge at 4 °C and 1000 × g for 1 min and discard
supernatant.
9. Repeat steps 7 and 8 one to four times to eliminate most but
not all the background-binding proteins. Keep 50 μL of the last
wash for the downstream gel analysis of the pull-down quality
(see Note 14).
10. Elute bait-GFP-containing protein complexes by adding
35 μL of elution buffer to the matrix. Wait for about 5 min,
centrifuge for 1 min at RT and 1000 × g, and
transfer the supernatant to a new tube. Directly neutralize the elution
fraction by adding 35 μL of neutralization buffer.
11. Repeat step 10 and combine eluates (see Note 15).
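The input volume for step 4 follows from the adjusted protein concentration. The following is a minimal sketch that enforces the two limits given in the step (1–5 mg total protein, at most 1.5 mL per tube); the function name and the example values are hypothetical.

```python
MAX_LOAD_ML = 1.5  # maximum sample volume per tube, from step 4


def input_volume_ml(conc_mg_per_ml, target_mg):
    """Return the volume of Input fraction that contains target_mg protein."""
    if not 1.0 <= target_mg <= 5.0:
        raise ValueError("step 4 specifies 1-5 mg total protein")
    volume = target_mg / conc_mg_per_ml
    if volume > MAX_LOAD_ML:
        raise ValueError(
            f"{volume:.2f} mL exceeds {MAX_LOAD_ML} mL; "
            "lower the protein amount or use a more concentrated extract"
        )
    return volume


# Hypothetical Input fraction at 4 mg/mL, 3 mg protein per pull-down
print(f"{input_volume_ml(4.0, 3.0):.2f} mL per tube")
```

Using the same protein amount for the bait-GFP and the free GFP samples keeps the two pull-downs comparable.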
3.2 Validation of the Pull-Down Procedure
3.2.1 SDS-PAGE
1. Prepare 10 mL of a 10 % SDS-PAGE resolving gel solution by
mixing 2.5 mL of resolving gel buffer, 3.33 mL of a 30 %
acrylamide–bisacrylamide (37.5:1) solution, 4 mL of distilled water,
100 μL of a 10 % SDS solution, 66 μL of a 10 % APS solution,
and 14 μL of TEMED.
2. Cast two gels using clean Mini-PROTEAN® Spacer Plates with
1.0 mm integrated spacers by pouring 4.5 mL resolving gel
solution per gel. Make sure that no air bubbles are trapped in
the resolving gel solution.
3. Gently overlay with isopropanol (see Note 16).
4. Let the gel polymerize for about 30–40 min.
5. Prepare 2.2 mL of a 4.2 % stacking gel solution by mixing
550 μL of stacking gel buffer, 308 μL of 30 % acrylamide–bisacrylamide
(37.5:1) solution, 1.3 mL of distilled water, 22 μL of a
10 % SDS solution, 11 μL of a 10 % APS solution, and 9 μL of
TEMED.
6. Remove isopropanol by inverting the gel cassette. Pour stacking
gel solution on top of the polymerized resolving gel and
immediately insert a 10-well gel comb without introducing air
bubbles. Let it polymerize for about 60 min (see Note 17).
7. Prepare samples in 5× protein loading buffer. Prepare 20 μg of
input fraction, 20 μg of nonbound fraction, 20 μL of the last
wash, and 20 μL of the elution fraction. Heat samples at 95 °C
for 5 min to denature proteins.
8. Place the gels in the electrophoresis tank and fill with 1× SDS
running buffer. Load two identical gels with either GFP-bait or
control samples.
9. Perform electrophoresis at 15 mA per gel until the sample has
entered the gel. Then apply 25 mA per gel until the dye front
reaches the bottom of the gel (see Note 18).
10. Following electrophoresis, open the gel plates using a spatula
and recover the resolving gels.
11. Stain one gel with Oriole™ Fluorescent Gel Stain (Bio-Rad)
for 90 min under gentle agitation (see Note 19). Keep the
second gel for the Western blot analysis using an anti-GFP
antibody.
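The gel recipes in steps 1 and 5 can be verified, and scaled to another number of gels, with a few lines of arithmetic. The following is a minimal sketch that recomputes the final acrylamide percentage from the listed volumes, assuming the 30 % acrylamide–bisacrylamide stock from Subheading 2.3; the dictionary keys and function names are illustrative.

```python
STOCK_ACRYLAMIDE_PERCENT = 30.0  # acrylamide-bisacrylamide stock, Subheading 2.3

# Volumes in uL, taken from steps 1 and 5
RESOLVING_UL = {"gel buffer": 2500, "acrylamide": 3330, "water": 4000,
                "10 % SDS": 100, "10 % APS": 66, "TEMED": 14}
STACKING_UL = {"gel buffer": 550, "acrylamide": 308, "water": 1300,
               "10 % SDS": 22, "10 % APS": 11, "TEMED": 9}


def final_acrylamide_percent(recipe_ul):
    """Return the final acrylamide percentage of a gel solution."""
    total = sum(recipe_ul.values())
    return STOCK_ACRYLAMIDE_PERCENT * recipe_ul["acrylamide"] / total


def scale(recipe_ul, factor):
    """Scale every component, e.g. factor=2 for twice the number of gels."""
    return {name: volume * factor for name, volume in recipe_ul.items()}


print(f"resolving gel: {final_acrylamide_percent(RESOLVING_UL):.1f} %")
print(f"stacking gel: {final_acrylamide_percent(STACKING_UL):.1f} %")
```

Scaling leaves the percentages unchanged, so the same check applies to any batch size.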
3.2.2 Western Blot Analysis
1. Rinse the gel with water and transfer it to a container filled with
Western blot transfer buffer.
2. Cut a nitrocellulose membrane to the size of the gel and
immerse it in Western blot transfer buffer (see Note 20).
3. Soak four to eight pieces of Whatman filter paper (same size as
the nitrocellulose membrane) in the transfer buffer and place
them onto the anode side of the transfer apparatus. Position
the nitrocellulose membrane onto the Whatman paper. Place
the gel on top of the membrane and place four to eight pieces
of Whatman filter paper soaked in the transfer buffer on top (see
Note 21).
4. Assemble the blotting system and transfer proteins at 1.2 mA
per cm² of membrane for 1 h.
5. After transfer, carefully recover the membrane and place it,
protein face up, into a clean plastic container.
6. Assess the quality of the transfer by immersing the membrane
in a Ponceau S staining solution for 2–5 min.
7. Remove Ponceau S staining by washing the membrane with
TBS buffer.
8. Cover membrane with blocking solution. Incubate for 1 h with
gentle agitation.
9. Remove the blocking solution and wash the membrane with
TBS-T for 10 min.
10. Dilute the primary anti-GFP antibody 1:2500 in TBS-T.
11. Incubate the membrane with the primary antibody for 1 h at
RT with gentle agitation.
12. Remove primary antibody solution and wash the membrane
three times with TBS-T for 10 min.
13. Dilute the secondary antibody cross-linked to horseradish
peroxidase (1:20,000) in TBS-T.
14. Incubate the membrane with the secondary antibody for 1 h at
RT with gentle agitation.
15. Remove secondary antibody solution and wash the membrane
three times with TBS-T for 10 min.
16. Incubate the membrane with the Amersham ECL Prime Western
Blotting Detection Reagent or similar (see Note 22).
17. Visualize the GFP-tagged protein of interest and the free GFP
with the ChemiDoc™ MP system using the accumulation
mode from 30 s to 5 min.
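The antibody dilutions in steps 10 and 13 and the transfer current in step 4 depend on the working volume and the membrane size, which are not fixed by the protocol. The following is a minimal sketch with a hypothetical incubation volume of 10 mL and a hypothetical membrane of 8.5 × 6 cm; the function names are illustrative.

```python
def antibody_ul(dilution_factor, final_volume_ml):
    """Return uL of antibody stock for a 1:dilution_factor dilution."""
    return final_volume_ml * 1000.0 / dilution_factor


def transfer_current_ma(width_cm, height_cm, ma_per_cm2=1.2):
    """Return the semi-dry transfer current for one membrane (step 4)."""
    return width_cm * height_cm * ma_per_cm2


# Hypothetical working volume of 10 mL TBS-T per incubation
print(f"primary (1:2500): {antibody_ul(2500, 10):.1f} uL")
print(f"secondary (1:20,000): {antibody_ul(20000, 10):.1f} uL")
# Hypothetical membrane of 8.5 cm x 6 cm
print(f"transfer current: {transfer_current_ma(8.5, 6.0):.0f} mA")
```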
3.3 Sample Preparation for Mass Spectrometry Analysis
1. Add 1 μL of a 500 mM DTT stock solution to 100 μL of each
elution sample to reach a final concentration of 5 mM DTT.
Incubate samples for 30 min in the dark.
2. Add 3 μL of a 550 mM CAA stock solution to each sample to
reach a final concentration of 15 mM CAA. Incubate sample
for 1 h in the dark (see Note 23).
3. Quench excess CAA by addition of 2.5 μL of a 500 mM DTT
stock solution to reach a final concentration of 12 mM DTT
followed by incubation for 10 min in the dark.
4. Add 440 μL of ABC buffer (0.05 M NH4HCO3) for a 5×
dilution.
5. Add trypsin at a trypsin-to-protein ratio of 1:100
(minimum concentration 5 ng/μL) and incubate at 37 °C for 16 h.
6. Stop the digestion by adding 55 μL of a 10 % formic acid
solution to reach a final concentration of 1 %.
7. Load a 200 μL pipette tip with a C18 matrix to prepare a STop
And GO Extraction Tip (STAGE-Tip) (see Note 24).
8. Stepwise equilibrate the matrix with 60 μL of methanol, 60 μL of
buffer B, and 60 μL of buffer A.
9. Slowly load peptide sample onto the C18 matrix.
10. Wash twice with 30 μL of buffer A (see Note 25).
11. Elute peptides with 20 μL of buffer B twice.
12. Evaporate the peptide-containing eluates using a vacuum
concentrator.
13. Resuspend sample in 10–12 μL of buffer A∗ for LC-MS/MS
analysis.
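The final concentrations quoted in steps 1–6 are rounded values. The following is a minimal sketch that tracks the sample volume through the additions and recomputes each concentration, which is useful when the starting volume differs from 100 μL. It interprets the 12 mM DTT in step 3 as the newly added DTT only, which is an assumption; the helper and variable names are illustrative.

```python
def add(volume_ul, amounts, reagent, add_ul, stock):
    """Add add_ul of a stock (in mM or %) and return the new total volume.

    Amounts are tracked as stock concentration x uL, so dividing by the
    current volume gives the concentration in the units of the stock.
    """
    amounts[reagent] = amounts.get(reagent, 0.0) + add_ul * stock
    return volume_ul + add_ul


amounts = {}
volume = 100.0                                      # eluate in uL (step 1)
volume = add(volume, amounts, "DTT", 1.0, 500.0)    # step 1
dtt_step1_mm = amounts["DTT"] / volume
volume = add(volume, amounts, "CAA", 3.0, 550.0)    # step 2
caa_mm = amounts["CAA"] / volume
volume = add(volume, amounts, "DTT", 2.5, 500.0)    # step 3
dtt_added_mm = 2.5 * 500.0 / volume                 # assumption: added DTT only
before_abc = volume
volume = add(volume, amounts, "ABC", 440.0, 50.0)   # step 4
dilution_factor = volume / before_abc
min_trypsin_ug = 5.0 * volume / 1000.0              # step 5: at least 5 ng/uL
volume = add(volume, amounts, "FA", 55.0, 10.0)     # step 6, stock in %
fa_percent = amounts["FA"] / volume

print(f"DTT {dtt_step1_mm:.1f} mM, CAA {caa_mm:.1f} mM, "
      f"quench DTT {dtt_added_mm:.1f} mM")
print(f"dilution {dilution_factor:.1f}x, trypsin >= {min_trypsin_ug:.2f} ug, "
      f"FA {fa_percent:.2f} %")
```

With the volumes given above, the recomputed values are close to the rounded figures in the steps (about 5 mM DTT, 16 mM CAA, 12 mM added DTT, a roughly fivefold dilution, and about 0.9 % formic acid).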
3.4 LC-MS/MS Data Acquisition, Data Processing, and Statistical Analysis
1. Separate peptides in a stepped gradient of 0–55 % solvent B
(80 % ACN, 0.1 % FA) at a flow rate of 300 μL/min for 60 min
followed by wash steps (see Note 26).
2. Acquire mass spectra in the Orbitrap analyzer with a Top15
method and a resolution of 120,000 in MS1, and 15,000 in
MS2. Use a scanning mass range from 300 to 1750 m/z. Set
collision energy to 25 (see Note 27). Exclude peptides with a
charge of +1, >+6 and peptides which are not assigned to any
charge state from fragmentation. Accumulate ions to a target
value of 3 × 10^6 (MS1) and 5 × 10^4 (MS2).
3. Process the raw spectrum files with label-free quantification
enabled in MaxQuant and search against the Arabidopsis
TAIR10 database including the sequence from the
GFP-tagged protein (http://www.arabidopsis.org). Use preset
standard search settings of MaxQuant. Activate the “match
between runs” option (see Note 28).
4. Perform quantitative statistical analysis of the protein groups
table using Perseus or similar software.
5. Define experiment groups and filter proteins for “two in at least
one group”, to increase the stringency of the dataset.
6. Use log2 transformed LFQ intensity values to calculate the
relative protein abundance between samples and perform a
statistical analysis (e.g., two-sample t-test).
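Steps 5 and 6 can be illustrated for a single protein group. This is a minimal sketch under stated assumptions, not the Perseus implementation: the intensities are hypothetical, missing values are represented by `None`, the function names are invented for illustration, and only the Student's t statistic (equal variances assumed) is computed, without a p-value or multiple-testing correction.

```python
import math
from statistics import mean, variance

def passes_filter(groups, min_valid=2):
    """True if at least one group has >= min_valid quantified values."""
    return any(
        sum(v is not None and v > 0 for v in group) >= min_valid
        for group in groups
    )

def log2_values(values):
    """log2-transform quantified values, skipping missing ones."""
    return [math.log2(v) for v in values if v is not None and v > 0]

def students_t(a, b):
    """Two-sample Student's t statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))

# Hypothetical LFQ intensities of one protein group in three replicates each
bait = [8.0e7, 1.6e8, 1.6e8]
control = [1.0e7, 2.0e7, None]

keep = passes_filter([bait, control])
bait_log2, control_log2 = log2_values(bait), log2_values(control)
log2_fc = mean(bait_log2) - mean(control_log2)   # relative abundance (log2 ratio)
t_stat = students_t(bait_log2, control_log2)

print(f"keep={keep}, log2 fold change={log2_fc:.2f}, t={t_stat:.2f}")
```

In practice, missing values are often imputed before testing and the resulting p-values are corrected for multiple testing; both steps are omitted here for brevity.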
4 Notes
1. The quality of the starting transgenic plant material is a key
factor to successfully identify protein interactions from plant
tissues. Hence, several aspects must be considered: (1) We
recommend evaluating if the fusion protein construct is cor-
rectly expressed in the tissue of interest. (2) The levels and the
spatiotemporal expression pattern of a protein are important
features of its function. Expressing the fusion construct under
its native promoter is preferable over the use of a constitutive
promoter. (3) In some cases, the presence of a tag fused
to the protein of interest might alter its activity or
interactions. Therefore, it is recommended to validate the
functionality of the fusion protein by genetic complementation
assays whenever possible. Consequently, a functionally complemented
line expressing the protein of interest under its native
promoter can be considered the most suitable starting plant
material. (4) The method described here aims to discriminate
interacting proteins from most of the proteins that are nonspe-
cifically copurified with the bead matrix. It relies on the label-
free quantitative mass spectrometry analysis of the protein
abundances of the control pull-down versus the pull-down
from the bait-sample (Fig. 1). Thus, for a reliable statistical
analysis, the GFP stoichiometry matters. Therefore, care must
be taken during selection of a suitable line expressing a free
GFP for background control. This plant line should ideally
accumulate similar amounts of the GFP protein in comparison
to the bait-protein plant line. A Western blot analysis for the
detection of the GFP protein will help in selecting a suitable
control line. (5) Protein subcellular localization is a determi-
nant of the protein interaction network. Consequently, if the
protein of interest is targeted to organelles, it is recommended
to include the same or similar targeting amino acid sequence
fused to the free GFP. (6) In a minimal experimental setup,
tissue from nontransgenic wild-type plants can also be included as
background control. However, this might increase the number
of false-positive enriched proteins. In any case, the biological data
must be interpreted carefully to draw meaningful conclusions,
and the protein–protein interactions will require confirmation
by other methods, as listed in the introduction.
2. DTT is used to reduce disulfide bonds of proteins. It is recom-
mended to prepare DTT solutions freshly.
3. GFP-Trap® A beads (ChromoTek) are recommended for this
procedure. This commercial matrix offers the advantage to
limit contamination of the sample with antibody chains at the
elution step and therefore improves MS identification of inter-
acting proteins.
4. This protein quantification kit is fully compatible with the
extraction procedure presented herein. If you intend to modify
the extraction buffer or use another protein quantification
method, ensure that all the components of the extraction
buffer are compatible with the quantification assay.
5. Purchase a commercially available acrylamide–bisacrylamide
solution (37.5:1) and store it at 4 °C.
6. Download MaxQuant and Perseus from http://www.maxquant.org/.
7. Freshly harvested tissues are recommended, but tissues stored
at −80 °C might also be suitable, depending on the stability of
the protein–protein interactions. To allow for a statistical anal-
ysis of the data, the bait-protein and the control pull-downs
need to be carried out in at least biological triplicates each.
Cool down mortars and pestles before homogenization to
avoid protein degradation.
8. Be careful to avoid the lipid layer that could be present on top
of the aqueous phase.
9. At this point, it is possible to check if the protein of interest is
present in the native extract by Western blot analysis. If the
protein is not detected, you may need to optimize the compo-
sition of the extraction buffer (varying the glycerol, salt, and
reductant concentration might help).
10. The sample protein concentrations should be adjusted to
between 1 and 5 mg/mL depending on the abundance of
the bait-GFP protein in the extract. Protein concentrations
have to be kept similar between all samples to avoid loading
bias during the pull-down.
11. A volume of 25 μL of GFP-Trap® A beads is, in principle,
sufficient to immobilize 0.5 nmoles of GFP-tagged protein.
12. To avoid loss of beads, first carefully remove around 75% of the
supernatant with a 1 mL pipette. Carefully remove the remain-
ing supernatant using a gel-loader tip.
13. Some pull-down experiments may require overnight incuba-
tion, especially in the case of very low in vivo abundance of the
bait-GFP protein.
14. For label-free quantification analysis, it is important to not
wash away all the background-binding proteins. Absence of
background proteins makes normalization and statistical anal-
ysis impossible. You might have to optimize the number of
washing steps in case you see too much or too little
background.
15. Eluted protein samples can be stored at −20 °C until further
processing.
16. Adding isopropanol to cover the gel helps to get an even gel
surface and helps removing possible air bubbles trapped in
between the glass plates.
17. It is important to let the gel polymerize well for the best
resolution. However, do not let the gel polymerize for more
than 2 h to avoid unwanted drying. Covering the gel-casting
chamber with a wet paper towel can preserve humidity if the
gels will not be used immediately.
18. Gel electrophoresis is carried out in a Mini-PROTEAN®
(Bio-Rad) apparatus, for example.
19. No fixation or destaining steps are required. The Oriole™
Fluorescent Gel Stain (Bio-Rad) can detect nanograms of pro-
tein in the elution fraction. Proteins can be visualized using the
standard setting for ethidium bromide of the ChemiDoc™ MP
(Bio-Rad) imager for example.
20. Nitrocellulose membrane should be handled only at the cor-
ners with adapted flat-ended forceps.
21. Gently roll over the sandwich with a clean glass tube to remove
any air bubbles, which may have formed within the blotting
sandwich.
22. Mix an equal volume of solution A and B from the Amersham
ECL Prime Western Blotting Detection Reagent kit and use
immediately. Make sure that all the membrane is evenly cov-
ered with the reagent. For an even distribution of the reagent,
you can place a Parafilm® sheet on top of the membrane.
23. Complete cysteine alkylation is essential to optimize the iden-
tification rate. The pH of the reaction mixture must be around
eight to ensure that all cysteine residues are deprotonated and
limit unspecific alkylation by CAA. You can check the pH of the
reaction mixtures by applying 5 μL from each sample onto
colorimetric pH paper.
24. Sample processing via STAGE tips can be achieved by applying
pressure onto the tip either using a syringe or by centrifugation
at 1000 × g (this requires a STAGE tip centrifuge (Sonation
GmbH, Biberach) or an adaptor to fit the tip onto a 1.5 mL
reaction tube).
25. If your sample will not be analyzed immediately, do not elute
the peptides and store the STAGE tips at −20 °C. Peptides are
more stable when bound to the C18 matrix during storage in
comparison to storage in solution.
26. Peptides are separated using in-house packed C18 fused silica
emitters (75 μm inner diameter, SilicaTip™ PicoTip™ Emit-
ter, New Objective), which are cut to about 17 cm in length
and heated to 50 °C in a column oven.
27. Any high-resolution nano-UHPLC-MS/MS setup can be used
for sample analyses. We analyze samples using an EASY-nLC™
1200 coupled to a Q Exactive™ HF Hybrid Quadrupole-
Orbitrap Mass Spectrometer (Thermo Fisher Scientific).
28. The “match between runs” option allows for using the MS1
precursor intensity for quantification purposes in samples
where the MS2 identification is missing. In such cases, the
precursor identity can be matched to another sample where
an MS2 spectrum has been acquired and where the MS1 mass
is present at the same elution time window.
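The idea behind Note 28 can be sketched as follows. This is a deliberately simplified illustration and not the MaxQuant algorithm: the feature dictionaries, the function name, and the tolerances (10 ppm, 0.7 min) are assumptions chosen for the example, and retention time alignment and charge state matching are ignored.

```python
def match_between_runs(identified, unidentified, mz_tol_ppm=10.0, rt_tol_min=0.7):
    """Transfer peptide identities to MS1 features that lack an MS2 identification."""
    transferred = []
    for feature in unidentified:
        for ref in identified:
            ppm_error = abs(feature["mz"] - ref["mz"]) / ref["mz"] * 1e6
            rt_error = abs(feature["rt"] - ref["rt"])
            if ppm_error <= mz_tol_ppm and rt_error <= rt_tol_min:
                # Same mass within tolerance and same elution time window
                transferred.append({**feature, "peptide": ref["peptide"]})
                break
    return transferred

# Hypothetical run A: peptide identified by MS2
run_a = [{"peptide": "PEPTIDEK", "mz": 500.2500, "rt": 30.0}]

# Hypothetical run B: MS1 features without MS2 identification
run_b = [
    {"mz": 500.2502, "rt": 30.3, "intensity": 1.2e6},
    {"mz": 620.1000, "rt": 30.1, "intensity": 5.0e5},
]

transferred = match_between_runs(run_a, run_b)
for feature in transferred:
    print(feature["peptide"], feature["intensity"])
```

Only the first feature of run B falls within both tolerances, so its MS1 intensity becomes usable for quantification of the peptide identified in run A.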
Acknowledgments
We gratefully acknowledge the Deutsche Forschungsgemeinschaft
(DFG) for financial support through the project grants NE2296/1-1
and FI1655/6-1, and the infrastructure grant INST211/744-1.
References
1. Bolger M, Schwacke R, Gundlach H et al (2017) From plant genomes to phenotypes. J Biotechnol 261:46–52
2. Schatz MC, Witkowski J, Mccombie WR (2012) Current challenges in de novo plant genome assembly. Genome Biol 13:2–7
3. Hümann U, Woetzel S, Madrid-Herrero E et al (2017) Improving and correcting the contiguity of long-read genome assemblies of three plant species using optical mapping and chromosome conformation capture data. Genome Res 27:778–786
4. Bontinck M, Van Leene J, Gadeyne A et al (2018) Recent trends in plant protein complex analysis in a developmental context. Front Plant Sci 9:1–14
5. Sako K, Yanagawa Y, Kanai T et al (2014) Proteomic analysis of the 26S proteasome reveals its direct interaction with transit peptides of plastid protein precursors for their degradation. J Proteome Res 13:3223–3230
6. Dose A, Sindlinger J, Bierlmeier J et al (2016) Interrogating substrate selectivity and composition of endogenous histone deacetylase complexes with chemical probes. Angew Chem Int Ed 55:1192–1195
7. Inoshima MM, Ikuchi KK (2015) Chemical tools for probing histone deacetylase (HDAC) activity. Anal Sci 31:287–292
8. Kramer K, Finkemeier I, Humpf H et al (2016) The SAGA complex in the rice pathogen Fusarium fujikuroi: structure and functional characterization. Mol Microbiol 102:951–974
9. Gao X, Liu CZ, Li DD et al (2016) The Arabidopsis KIN βγ subunit of the SnRK1 complex regulates pollen hydration on the stigma by mediating the level of reactive oxygen species in pollen. PLoS Genet 12(7):e1006228
10. Editor D (2014) Dual-targeting of Arabidopsis chloroplasts and peroxisomes involves interaction with Trx m2 in the cytosol. Mol Plant 7:252–255
11. Garagounis C, Kostaki K, Hawkins TJ et al (2017) Microcompartmentation of cytosolic aldolase by interaction with the actin cytoskeleton in Arabidopsis. J Exp Bot 68:885–898
12. Zhang Y, Beard KF, Swart C (2017) Protein-protein interactions and metabolite channelling in the plant tricarboxylic acid cycle. Nat Commun 8:15212
13. Liebert MA, Rouhier N, Villarejo A et al (2005) Identification of plant glutaredoxin targets. Antioxid Redox Signal 7:919–929
14. Hao Y, Wang H, Qiao S et al (2016) Histone deacetylase HDA6 enhances brassinosteroid signaling by inhibiting the BIN2 kinase. Proc Natl Acad Sci U S A 113:2–7
15. Jones AM, Xuan Y, Xu M et al (2014) Border control – a membrane-linked interactome of Arabidopsis. Science 1:711–717
16. Krishnakumar V, Hanlon MR, Contrino S et al (2015) Araport: the Arabidopsis information portal. Nucleic Acids Res 43:D1003–D1009
17. Luhua S, Hegie A, Suzuki N et al (2013) Linking genes of unknown function with abiotic stress responses by high-throughput phenotype screening. Physiol Plant 148:322–333
18. Née G, Kramer K, Nakabayashi K et al (2017) DELAY of GERMINATION1 requires PP2C phosphatases of the ABA signalling pathway to control seed dormancy. Nat Commun 8:1–8
19. König A, Hartl M, Pham PA et al (2014) The Arabidopsis class II sirtuin is a lysine deacetylase and interacts with mitochondrial energy metabolism. Plant Physiol 164:1401–1414
20. Trigg SA, Garza RM, MacWilliams A et al (2017) CrY2H-seq: a massively multiplexed assay for deep-coverage interactome mapping. Nat Methods 14:819–825
21. Bock R (2016) Lighting the way to protein-protein interactions: recommendations on best practices for bimolecular fluorescence complementation analyses. Plant Cell 28:1002–1008
22. Senkler J, Senkler M, Eubel H et al (2017) The mitochondrial complexome of Arabidopsis thaliana. Plant J 89:1079–1092
23. Cui B, Fang S, Xing Y et al (2015) Crystallographic analysis of the Arabidopsis thaliana BAG5 – calmodulin protein complex research communications. Acta Crystallogr F Struct Biol Cryst Commun 71:870–875
24. Bai Y (2015) Detecting protein-protein interactions by gel filtration chromatography. In: Protein-protein interactions: methods and applications, 2nd edn. Humana Press, New York, NY, pp 223–232
25. Ströher E, Dietz K (2006) Concepts and approaches towards understanding the cellular redox proteome. Plant Biol 8:407–418
26. Van LJ, Eeckhout D, Cannoot B et al (2015) An improved toolbox to unravel the plant cellular machinery by tandem affinity purification of Arabidopsis protein complexes. Nat Protoc 10:169–187
27. Birkenbihl RP, Kracher B, Ross A et al (2018) Principles and characteristics of the Arabidopsis WRKY regulatory network during early MAMP-triggered immunity. Plant J 96:487–502
28. Hein MY, Hubner NC, Poser I et al (2015) A human interactome in three quantitative dimensions organized by stoichiometries and abundances. Cell 163:712–723
29. Keilhauer EC, Hein MY, Mann M (2015) Accurate protein complex retrieval by affinity enrichment mass spectrometry (AE-MS) rather than affinity purification mass spectrometry (AP-MS). Mol Cell Proteomics 14:120–135
30. Moree WJ, Mitchell M, Widger W et al (2016) Observations on different resin strategies for affinity purification mass spectrometry of a tagged protein. Anal Biochem 515:26–32
31. Rothbauer U, Zolghadr K, Tillib S et al (2006) Targeting and tracing antigens in live cells with fluorescent nanobodies. Nat Methods 3:887–889
32. Cox J, Mann M (2008) MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat Biotechnol 26:1367–1372
33. Tyanova S, Temu T, Cox J (2016) The MaxQuant computational platform for mass spectrometry-based shotgun proteomics. Nat Protoc 11:2301–2319
34. Tyanova S, Temu T, Sinitcyn P et al (2016) The Perseus computational platform for comprehensive analysis of (prote)omics data. Nat Methods 13:731
Chapter 20
In Vivo Cross-Linking to Analyze Transient Protein–Protein Interactions
Heidi Pertl-Obermeyer and Gerhard Obermeyer
Abstract
Cross-linking converts noncovalent interactions between proteins into covalent bonds. The now artificially
fused molecules are stable during purification steps (e.g., immunoprecipitation). In combination with a
variety of techniques, including Western blotting, mass spectrometry (MS), and bioinformatics, this
technology provides improved opportunities for modelling structural details of functional complexes in
living cells and protein–protein interaction networks. The presented strategy of immunoaffinity purification
and mass spectrometry (AP-MS) coupled with in vivo cross-linking can easily be adapted as a robust
workflow in interactome analyses of various species, including nonmodel organisms.
Key words In vivo cross-linking, Protein–protein interaction, Immunoaffinity purification,
In-solution digestion, Mass spectrometry, Pollen
1 Introduction
Protein–protein interactions are essential for living cells. Several
methods have been developed to study these interactions at the
level of individual molecules and at the global scale. Among these,
coimmunoprecipitation (co-IP) is a very common key technique for
the analysis of protein–protein interactions, including interactions
of subunits within a protein complex [1]. By the use of an antibody,
which specifically recognizes one of the known components of the
multiprotein complex, the entire complex can be isolated using the
specific antibody covalently attached to magnetic beads or protein
A/G agarose beads. The immunoprecipitated samples can then be
analyzed by SDS-PAGE or subjected to in-solution trypsin diges-
tion [2] followed by mass spectrometry analysis to identify the
proteins. Unfortunately, this method is limited (1) by the require-
ment of a specific antibody for each protein of interest, (2) by the
cross-reactivity of the antibody that very often leads to the identifi-
cation of some false positives and, most important, (3) by the loss of
transient or very weak interactions during the purification
procedure. To overcome some of these limitations, a chemical
cross-linker can be added to cell cultures to prevent the loss of
certain components of the protein complex and additionally, strin-
gent washing conditions can remove unspecifically bound proteins
and contaminants. Many protein cross-linking reagents are com-
mercially available and are defined by a minimum of two reactive
groups and classified according to their reactivity (primary amines,
e.g., lysine, or sulfhydryls, e.g., cysteine), membrane permeability,
water solubility, cleavability, and spacer length between the two
reactive groups [1]. Homobifunctional (identical reactive groups
at either end of a spacer arm), amine-reactive NHS
(N-hydroxysuccinimide) esters or imidates, and heterobifunctional
(different reactive groups at either end), amine-reactive, photoacti-
vatable phenyl azides are the most commonly used cross-linkers. An
extensive list of cross-linking reagents including a cross-linker selection
tool is available from Thermo Fisher Scientific
(https://www.thermofisher.com). A very useful reagent for stabilizing
protein–protein interactions in vivo is formaldehyde [3]. It is an inexpensive
and commonly used protein cross-linker, which reacts primarily
with lysine residues and additionally, is membrane permeable,
which allows for rapid fixation of transient processes in living cells
and consequently, inactivation of many cellular proteases. Formal-
dehyde is also a short cross-linker (spacer arm length ~ 2 Å) that
allows the capture of protein associations in close proximity and
simultaneously, minimizes the risk of identification of false positives.
A major advantage of this technique is the reversibility of the
induced covalent bonds, especially when subsequent analysis of
the sample by mass spectrometry is planned.
To maximize the yield of chemically cross-linked peptides, an
optimization of the cross-linking reaction (i.e., concentration of
formaldehyde, incubation time) is mandatory (see Fig. 1). In this
chapter we present a simple, inexpensive and robust procedure for
cross-linking proteins in living organisms for mass spectrometry-
based analysis of protein–protein interactions.
2 Materials
Prepare all buffers or solutions using double distilled or ultrapure
water and analytical grade reagents. Perform all centrifugation steps
at 4 °C (unless specified otherwise).
2.1 Plant Material
All plant tissues or adequate plant cell cultures, for example, pollen
cultures from lily (Lilium longiflorum Thunb.), thale cress (Arabi-
dopsis thaliana), tobacco (Nicotiana tabacum), and tomato (Sola-
num lycopersicum) or seedlings as well as liquid cultures from
specific plant tissues or organs (explants from roots, stems, leaves,
flowers) as well as yeast cell cultures can be used for in vivo cross-
linking experiments.
[Figure 1: three rows of gel panels (a–c), each showing Coomassie stain, anti-14-3-3 immunoblot, and anti-PM H+ ATPase immunoblot of cell lysates. (a) PFA concentration series, 20 min; (b) time course 0, 10, 20, 30, 45, 60 min in 0.5 % PFA; (c) lysates (0.5 % PFA, 20 min) denatured at RT or 95 °C for 5–30 min. M, molecular weight marker in kDa; bands labeled 14-3-3s and PM H+ ATPase]
Fig. 1 Optimization of in vivo cross-linking of lily pollen grains. (a) To determine the optimal PFA concentration,
pollen grains from 5 flowers (~0.5 g fresh weight) were incubated in 0.0625–1% (w/v) PFA for 20 min at room
temperature. Cell lysate proteins were then separated by SDS-PAGE and visualized by Coomassie stain (left)
and analyzed by immunodetection with monoclonal anti-14-3-3 antibody (middle) and monoclonal anti-PM H+
ATPase antibody (right). (b) To define the optimal incubation time, pollen grains were incubated for 10–60 min
in 0.5% (w/v) PFA at RT. Incubation times of 20 min and more resulted in band smearing, which
indicates excessive cross-linking. 0 = untreated pollen grains; 5 μl cell lysate were loaded per
gel lane; (a) and (b) 15 min denaturing of samples at RT prior to SDS-PAGE. (c) To reverse PFA cross-links, cell
lysates (pollen grains in 0.5% (w/v) PFA for 20 min) were mixed with 6× sample loading buffer, incubated for
5–30 min at 95 °C and analyzed by CBB (Coomassie Brilliant Blue R250) or Western Blot. A shift from 14-3-3
proteins cross-linked in complexes (∗) to 14-3-3 monomers and a loss of high molecular weight PM H+
ATPase complexes (arrow) are clearly visible. 5 μl cell lysate were loaded per gel lane [4]
2.2 Formaldehyde Cross-Linking of Cells
1. Cross-link solution: 10% (w/v) paraformaldehyde (PFA), 0.5 g
PFA in 5 ml culture medium (see Note 1). Always prepare
fresh!
2. Stop buffer: 1.25 M glycine stock solution, weigh 0.938 g in
10 ml double distilled water. Prepare fresh solution each time.
2.3 Preparation of Cell Lysates
Prepare fresh solutions for each extraction.
1. Germination medium (Med B): 292 mM sucrose, 1.6 mM
H3BO3, 1 mM KCl, 0.1 mM CaCl2, pH 5.6 (see [5]), weigh
10 g sucrose in 90 ml double distilled water, add 100 μl of 1 M
KCl stock solution, add 100 μl of 100 mM CaCl2 stock solution,
add 1.6 ml of 100 mM H3BO3 stock solution, and fill up
to 100 ml. The pH value should be 5.6, adjust with Tris or
MES if necessary.
2. 8 μm filter.
3. Vacuum filtration unit.
4. Liquid nitrogen.
5. Precooled mortar and pestle (see Note 2).
6. Phosphate-buffered saline (PBS, pH 7.4): 1 mM NaH2PO4,
5 mM Na2HPO4, 140 mM NaCl, 5 mM ethylenediaminetetraacetic
acid (EDTA), 1% (v/v) Triton X-100. Weigh in 0.16 g
NaH2PO4·H2O, 0.98 g Na2HPO4·2H2O, 8.10 g NaCl
and fill up with double distilled water to 1 l.
7. Lysis buffer: Weigh 0.093 g EDTA, add 40 ml PBS, add 50 μl
leupeptin of 10 mM stock solution (see Note 3), add 5 μl
pepstatin A of 10 mM stock solution (see Note 4) add 50 μl
PMSF of 1 M stock solution (see Note 5), add 100 μl E-64 of
1 mM stock solution (see Note 6), and add 500 μl Triton
X-100. Make up to 50 ml with PBS. Always prepare fresh and
keep on ice! Protease inhibitors like PMSF degrade in aqueous
solutions with a half-life of ca. 30 min at pH 8.
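The final inhibitor concentrations that result from the lysis buffer recipe in item 7 can be derived from the stock concentrations and volumes given there. The snippet below is a minimal sketch for orientation; the function and variable names are hypothetical, and it only accounts for the additions listed in item 7 in a final volume of 50 ml.

```python
FINAL_VOLUME_ML = 50.0

# name: (stock concentration in mM, volume added in ul), taken from item 7
ADDITIONS = {
    "leupeptin": (10.0, 50.0),
    "pepstatin A": (10.0, 5.0),
    "PMSF": (1000.0, 50.0),
    "E-64": (1.0, 100.0),
}

def final_micromolar(stock_mm, added_ul, final_ml=FINAL_VOLUME_ML):
    """Final concentration in uM after making up to final_ml."""
    # mM * ul = nmol; nmol / ml = uM
    return stock_mm * added_ul / final_ml

final_um = {name: final_micromolar(*spec) for name, spec in ADDITIONS.items()}

# 500 ul Triton X-100 added to a final volume of 50 ml
triton_percent = 100.0 * 0.5 / FINAL_VOLUME_ML

for name, conc in final_um.items():
    print(f"{name}: {conc:g} uM")
print(f"Triton X-100 added: {triton_percent:g} % (v/v)")
```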
2.4 Immunoaffinity Purification
Prepare fresh solutions for each purification. Solutions can be kept
at room temperature, if not otherwise indicated.
1. Magnetic beads (see Note 7).
2. Magnetic particle concentrator.
3. PBS, pH 7.4.
4. Wash Buffer: PBS + 0.1% (w/v) BSA (bovine serum albumin), weigh
0.01 g BSA and fill up to 10 ml with PBS. Store at 4 °C.
5. Storage Buffer: PBS + 0.1% (w/v) BSA + 0.02% (v/v) NaN3,
weigh 0.01 g BSA, add 20 μl of 1 M NaN3 stock solution and
fill up to 10 ml with PBS. Store at 4 °C.
6. 0.2 M triethanolamine, pH 8.2: Weigh 0.371 g of triethanolamine
and add 8 ml PBS. Adjust pH with NaOH. Fill up to
10 ml with PBS.
7. Cross-link solution II: 20 mM dimethyl pimelimidate dihydrochloride
(DMP) in 0.2 M triethanolamine, pH 8.2 (see
Note 8), weigh 5.1 mg DMP and add 1 ml of 0.2 M triethanolamine
solution. Prepare just before use! This solution is used for
covalently binding a primary mouse antibody to the beads.
8. 50 mM Tris, pH 7.5: Weigh 0.303 g Tris and add 40 ml double
distilled water. Adjust pH value with 1 N HCl. Fill up to 50 ml
with water.
9. Elution Buffer: 0.1 M glycine hydrochloride, pH 2.5, 1% (v/v)
Triton X-100. Weigh 0.375 g glycine and add 30 ml double
distilled water. Add 500 μl Triton X-100. Adjust pH value with
1 N HCl and fill up to 50 ml with water.
10. Saturated 1 M Tris solution: Weigh 6.057 g Tris and fill up to
50 ml with water.
2.5 In-Solution Trypsin Digestion
Always use the highest grade reagents available. To avoid keratin
contamination of the samples, wear powder-free gloves and work
clean. To avoid contamination with released softening agents, do
not use autoclaved tips and tubes.
1. 10 mM Tris–HCl, pH 8.0: Weigh 0.121 g Tris and make up to
100 ml with double distilled water. Adjust pH value with 1 N
HCl. Store at room temperature.
2. UTU Buffer: 6 M urea, 2 M thiourea, in 10 mM Tris–HCl,
pH 8.0, weigh 9.009 g urea, 3.806 g thiourea and fill up to
25 ml with 10 mM Tris–HCl, pH 8.0. Store as 1 ml aliquots at
−20 °C.
3. Reduction buffer: 6.5 mM DTT (DL-dithiothreitol), weigh
5 mg DTT and add 5 ml of 10 mM Tris–HCl, pH 8.0. Store
500 μl aliquots at −20 °C.
4. Alkylation buffer: 27 mM iodoacetamide (IAA), weigh
24.9 mg iodoacetamide and add 5 ml of 10 mM Tris–HCl,
pH 8.0. Keep the 500 μl aliquots protected from light at
−20 °C.
5. Endoproteinase Lys-C: 0.5 μg/μl stock solution, dissolve
15 μg Lys-C in 30 μl resuspension buffer (provided by manufacturer).
Keep at −20 °C for short-term storage or at −80 °C
for long-term storage.
6. Sequencing grade modified trypsin: 0.5 μg/μl stock solution,
dissolve 20 μg trypsin in 40 μl resuspension buffer (provided by
manufacturer). Keep at −20 °C for short-term storage or at
−80 °C for long-term storage.
7. 10% (v/v) trifluoroacetic acid (TFA): Add 1 ml TFA to 9 ml
double distilled water in a glass bottle. Store at room
temperature.
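The amount of solid to weigh for a buffer of a given molarity follows from mass = concentration × volume × molar mass. The sketch below recomputes this for the reagents of this section; the molar masses are standard values supplied here as assumptions (they are not stated in the protocol), and the function name is hypothetical.

```python
# Molar masses in g/mol (assumed standard values, not given in the protocol)
MOLAR_MASS = {
    "urea": 60.06,
    "thiourea": 76.12,
    "DTT": 154.25,
    "iodoacetamide": 184.96,
}

def mass_mg(conc_mm, volume_ml, molar_mass):
    """Mass in mg needed for conc_mm (mM) in volume_ml (ml)."""
    # mM * ml = umol; umol * g/mol = ug; ug / 1000 = mg
    return conc_mm * volume_ml * molar_mass / 1000.0

recipes = {
    "urea": mass_mg(6000.0, 25.0, MOLAR_MASS["urea"]),            # 6 M in 25 ml
    "thiourea": mass_mg(2000.0, 25.0, MOLAR_MASS["thiourea"]),    # 2 M in 25 ml
    "DTT": mass_mg(6.5, 5.0, MOLAR_MASS["DTT"]),                  # 6.5 mM in 5 ml
    "iodoacetamide": mass_mg(27.0, 5.0, MOLAR_MASS["iodoacetamide"]),  # 27 mM in 5 ml
}

for name, mg in recipes.items():
    print(f"{name}: {mg:.2f} mg")
```

Recomputing the amounts in this way before weighing is a simple safeguard against transcription errors in a recipe.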
2.6 Peptide Desalting with C18-StageTips
Prepare fresh solutions and store them at room temperature. Use
analytical grade or hypergrade reagents.
1. Acetonitrile (ACN).
2. Buffer A: 5% (v/v) ACN, 0.1% (v/v) TFA, to 9.4 ml ultra-pure
water add 500 μl ACN, and 100 μl of 10% (v/v) TFA in a glass
bottle.
3. Buffer B: 80% (v/v) ACN, 0.1% (v/v) TFA, to 1.9 ml ultrapure
water add 8 ml ACN, and 100 μl of 10% (v/v) TFA in a glass
bottle.
4. C18 StageTips (see Note 9).
5. Centrifuge.
6. Spin adaptors (Fig. 2).
7. Vacuum concentrator.
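The compositions of buffers A and B can be verified from the pipetted volumes. This is a minimal sketch with a hypothetical helper; it assumes that the volumes of water, ACN, and 10% (v/v) TFA are additive, which is only approximately true for water and acetonitrile mixtures.

```python
def composition(water_ml, acn_ml, tfa10_ml):
    """Return (total volume in ml, % ACN, % TFA) for the pipetted volumes."""
    total = water_ml + acn_ml + tfa10_ml
    acn_percent = 100.0 * acn_ml / total
    tfa_percent = 10.0 * tfa10_ml / total   # the TFA stock is 10 % (v/v)
    return total, acn_percent, tfa_percent

buffer_a = composition(9.4, 0.5, 0.1)   # 9.4 ml water, 500 ul ACN, 100 ul 10 % TFA
buffer_b = composition(1.9, 8.0, 0.1)   # 1.9 ml water, 8 ml ACN, 100 ul 10 % TFA

for name, (total, acn, tfa) in (("Buffer A", buffer_a), ("Buffer B", buffer_b)):
    print(f"{name}: {total:.1f} ml, {acn:.1f} % ACN, {tfa:.2f} % TFA")
```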
Fig. 2 Production of stop-and-go-extraction tips (C18-StageTips) according to
[6]. A double-disk StageTip is inserted into a spin adapter (e.g., screw insulator,
www.skiffy.com) and placed into a 2 ml microcentrifuge tube where the lid has
been removed. These 2 ml tubes (W) are used to collect solutions for
equilibration and wash steps. For elution of the bound tryptic peptides the spin
adapter with the C18-StageTip is placed into a fresh 1.5 ml tube (E)
3 Methods
3.1 Formaldehyde Cross-Linking and Preparation of Cell Lysates
This workflow was developed to study putative interaction partners
for the PM H+ ATPase in lily pollen [4]. Although the following
protocol uses pollen cultures as starting material, it can be easily
adapted to and optimized for other plant tissues or cell cultures.
1. Incubate pollen grains of 5 flowers (~0.5 g fresh weight) in
10 ml germination medium (Med B) for 10 min at room
temperature in petri dishes.
2. For optimization of the cross-linking reaction, incubate pollen
grains in germination medium with different formaldehyde
concentrations (0.0625–1% (w/v) PFA; see Fig. 1a and
Note 10) for 20 min at room temperature. In addition, it is
very important to define the optimal incubation time for the
cross-linking reaction (see Fig. 1b). Therefore, incubate pollen
grains in the previously determined optimal PFA concentration (e.g.,
0.5% (w/v) PFA) for 0, 10, 20, 30, 45, and 60 min at room
temperature. Keep controls (= untreated, not cross-linked pollen
grains, 0% PFA, 0 min) on ice.
3. Stop the cross-linking reactions by adding glycine to a final
concentration of 125 mM (see Note 11) and incubate for
10 min at room temperature.
4. Filter the pollen grain culture through an 8 μm filter using a
vacuum filtration unit and immediately transfer the residue
on the filter (= pollen grains) with a spatula into a prechilled
glass beaker filled with liquid nitrogen. Avoid thawing of the
pollen grains!
5. Homogenize the pollen grains in liquid N2 using a precooled
mortar and pestle to a very fine powder. Avoid thawing of the
powder!
6. Transfer the fine powder with a spatula into a precooled (liquid
N2) 1.5 ml microcentrifuge tube. Let the liquid nitrogen evap-
orate and cautiously add 1 ml ice-cold Lysis buffer (see Note
12).
7. Incubate on a rotational shaker at 4 °C for 10 min.
9. After centrifugation, collect the supernatant (= cell lysate)
using a pipette or a syringe with a fine needle. Do not disturb
the pellet.
10. Analyze the cross-linking reactions by SDS-PAGE and/or
immunodetection (see Fig. 1a and b). To analyze the
sample by mass spectrometry, reverse the PFA cross-links of the
cell lysates by incubating them in 6× sample loading
buffer for 5–30 min at 95 °C and analyze by SDS-PAGE
and/or Western Blot (see Fig. 1c).
11. Store cell lysates at −80 °C.
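The volumes of 10% (w/v) PFA stock and 1.25 M glycine stock needed in steps 2 and 3 can be estimated as follows. This is a minimal sketch under stated assumptions: the stocks are added directly to the 10 ml culture from step 1, the added volume is taken into account, and the twofold concentration series is an illustrative choice within the range given in step 2. The function name is hypothetical.

```python
def stock_volume_ml(start_ml, stock_conc, final_conc):
    """Volume of stock to add to start_ml so that the mixture reaches final_conc."""
    if final_conc >= stock_conc:
        raise ValueError("final concentration must be below the stock concentration")
    # From stock_conc * v = final_conc * (start_ml + v)
    return start_ml * final_conc / (stock_conc - final_conc)

CULTURE_ML = 10.0          # germination medium per sample (step 1)
PFA_STOCK_PERCENT = 10.0   # cross-link solution
GLYCINE_STOCK_MM = 1250.0  # stop buffer
GLYCINE_FINAL_MM = 125.0   # step 3

plan = []
for pfa_percent in (0.0625, 0.125, 0.25, 0.5, 1.0):   # assumed twofold series
    pfa_ml = stock_volume_ml(CULTURE_ML, PFA_STOCK_PERCENT, pfa_percent)
    glycine_ml = stock_volume_ml(CULTURE_ML + pfa_ml, GLYCINE_STOCK_MM, GLYCINE_FINAL_MM)
    plan.append((pfa_percent, pfa_ml, glycine_ml))
    print(f"{pfa_percent:g} % PFA: add {pfa_ml:.3f} ml PFA stock, "
          f"then {glycine_ml:.3f} ml glycine stock")
```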
3.2 Covalent Coupling of Antibodies to Magnetic Beads
This protocol uses a monoclonal primary antibody for immunoprecipitation,
but can be used for polyclonal antibodies, too.
1. Resuspend the magnetic beads in the original vial by briefly
vortexing and transfer the required amount of beads
(100–500 μl beads) into a 1.5 ml microcentrifuge tube (see
Note 13).
2. Place the tube in the particle concentrator (= magnet) for
2 min at room temperature.
3. Remove the supernatant and discard it. Do not touch the collected
magnetic beads inside the tube with the pipette tip.
4. Remove the tube from the magnet and add 1 ml PBS, pH 7.4
to the beads. Mix well by pipetting up and down.
5. Repeat steps 2–4 three times in total. Take a 10 μl aliquot (=
“untreated beads”; washed beads before antibody capture, see
Fig. 3) for SDS-PAGE analysis.
6. Place the tube on the magnet for 2 min at room temperature.
The now washed magnetic beads are ready for capture of target
Ig (= primary antibody).
7. Discard the supernatant and add 100–1000 μl primary mono-
clonal antibody to the corresponding amount of magnetic
beads into the tube, for example, to 500 μl washed magnetic
beads 1 ml monoclonal antibody against a PM H+ ATPase
(hybridoma clone 46E5B11F6) [7] is added (see Note 14).
8. Incubate tube with slow rotation mixing for 2 h at 4 °C.
10. Take out the supernatant (= unbound primary antibody) and
store at 4 °C (see Note 15).
11. Wash magnetic beads with 1 ml PBS, pH 7.4 by slowly pipetting
up and down.
13. Remove from the magnet and pipette off the supernatant and
discard.
14. Repeat steps 11–13 three times in total. Take a 5 μl aliquot
from the third washing step (= “beads – DMP”; magnetic
beads before DMP treatment, see Fig. 3) for SDS-PAGE
analysis.
15. Place the tube on the magnet for 2 min, pipette off the supernatant
and discard. Remove the tube from the magnet and add
1 ml of 0.2 M triethanolamine, pH 8.2 to the beads. Mix well
by cautiously pipetting up and down for 2 min.
[Figure 3: immunoblot with lanes Ab, untreated beads, beads − DMP, beads + DMP (Xlink Ig to beads) and W1, W3, E1, E2 (mock elution); molecular weight markers 15–230 kDa]
Fig. 3 Coupling antibody to magnetic beads for immunoaffinity purification. Anti-PM H+
ATPase beads were prepared by covalently binding a monoclonal Ab
(antibody) against the PM H+ ATPase to magnetic IgG beads with dimethyl
pimelimidate (DMP). A mock elution of the beads was performed to check the
coupling efficiency and to remove uncoupled material. Ab = monoclonal
antibody against the PM H+ ATPase (heavy and light chains), 0.5 μl;
−DMP = beads incubated with anti-PM H+ ATPase Ab for 2 h at 4 °C but not
covalently coupled with DMP, 5 μl; +DMP = Ab covalently coupled to beads with
DMP, 5 μl; 10 min denaturing at 95 °C; W1–W3 = wash in lysis buffer, 20 μl;
E1/E2 = eluates 1 and 2, 20 μl; 20 min denaturing at RT. Membrane was
incubated with secondary antibody only [4]
16. Repeat step 15 three times in total.
17. Place the tube on the magnet for 2 min, discard the supernatant,
and resuspend the beads in 1 ml of cross-link solution
II. Prepare the solution immediately before adding it to the
beads and check the pH value.
18. Incubate the tube with rotational mixing for 30 min at 20 °C.
19. Place the tube on the magnet for 2 min and discard
supernatant.
20. Remove from the magnet and stop the cross-linking reaction
by resuspending the beads in 1 ml of 50 mM Tris, pH 7.5 and
incubate for 15 min at RT by rotational mixing.
21. Place the tube on the magnet for 2 min and discard
supernatant.
22. Wash beads with 1 ml Wash Buffer by cautiously pipetting
up-and-down.
23. Repeat steps 21–22 three times in total. Take a 5 μl aliquot
from the third washing step (= “beads + DMP”; magnetic
beads after DMP treatment, see Fig. 3) for SDS-PAGE analysis.
24. Resuspend the beads in 1 ml Storage Buffer and store the now
activated beads at 4 °C.
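
The antibody-to-bead proportions of step 7 can be sketched as a small helper. This is only an illustration under stated assumptions: the 2:1 ratio is taken from the single worked example above (1 ml antibody for 500 μl beads), the function name and the range check are hypothetical, and Note 14 still applies (the actual amounts should follow the manufacturer's recommendations and the antibody concentration).

```python
def antibody_volume_ul(bead_volume_ul, ratio=2.0):
    """Scale the antibody volume (in µl) to the bead volume (in µl).

    ratio=2.0 is an assumption derived from the worked example
    (1000 µl antibody per 500 µl washed beads).
    """
    # The protocol states 100-500 µl beads per 1.5 ml tube (step 1).
    if not 100 <= bead_volume_ul <= 500:
        raise ValueError("bead volume outside the 100-500 µl range of step 1")
    return bead_volume_ul * ratio


if __name__ == "__main__":
    for beads in (100, 250, 500):
        print(f"{beads} µl beads -> {antibody_volume_ul(beads):.0f} µl antibody")
```

A fixed ratio is the simplest possible choice; a titration of the antibody against a constant bead volume would be a more reliable way to find the required amount.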
3.3 Immunoprecipitation (Antigen Binding to Ig-Coated Beads)

Prior to elution of the captured target antigen or cross-linked
antigen complexes, a mock elution using Lysis Buffer without any
additional proteins has to be performed to eliminate coelution of
impurities (see Fig. 3).
1. Place the tube with activated magnetic beads on the magnet for
2 min, and pipette off and discard supernatant.
2. Wash beads with 1 ml PBS, pH 7.4 for 2 min by slowly pipet-
ting up-and-down.
3. Place the tube with activated magnetic beads on the magnet for
2 min, and pipette off and discard supernatant.
4. Add 1 ml Lysis Buffer and cautiously vortex beads for 2 min.
5. Repeat steps 3–4 three times in total. Take a 20 μl aliquot from
each wash step (= W1–W3) for SDS-PAGE analysis (see Fig. 3).
6. Add 200 μl Elution Buffer to the beads, vortex slowly for 2 min
and place the tube on the magnet for 2 min.
7. Pipette off supernatant, transfer into a fresh 1.5 ml microcen-
trifuge tube and immediately add 20 μl of saturated 1 M Tris
solution. Take a 20 μl aliquot (E1) for SDS-PAGE.
8. Repeat steps 6–7 for second eluate (E2).
9. Wash beads with 1 ml PBS, pH 7.4 and add 1 ml Storage
Buffer. Store beads at 4 °C or proceed with binding of the target
antigen.
10. To bind the target antigen, place the tube on the magnet again
for 2 min, then pipette off and discard the supernatant.
11. Remove from the magnet, add 1 ml PBS, pH 7.4, and slowly
pipette up-and-down for washing the beads.
12. Repeat steps 10–11 three times.
13. Thaw 1 ml of cross-linked cell lysate (see Subheading 3.1) on
ice and add fresh protease inhibitors again (10 μM leupeptin,
1 μM pepstatin A, 1 mM phenylmethanesulfonyl fluoride
(PMSF), 2 μM E-64). Mix well.
14. Incubate with slow rotation on a rotational shaker for 2 h at
4 °C.
15. Place the tube on the magnet for 2 min, pipette off the supernatant
(= unbound protein) and transfer into a 1.5 ml microcentrifuge
tube. Keep the tube on ice.
16. Remove the tube from the magnet, add 1 ml Lysis Buffer and
carefully wash beads with captured target proteins for 2 min by
cautiously vortexing.
17. Place the tube on the magnet for 2 min, and transfer superna-
tant into fresh 1.5 ml tubes (wash fraction, W1).
18. Repeat steps 16–17 three times in total. Keep the wash frac-
tions on ice (W1–W3).
19. Remove tube from the magnet, and add 200 μl Elution Buffer
to the beads. Vortex gently for 2 min.
20. Place the tube on the magnet for 2 min.
21. Transfer the supernatant into a fresh 1.5 ml microcentrifuge
tube and immediately add 20 μl of saturated 1 M Tris solution
to the elution fraction. Keep the tube on ice (= E1, eluate 1).
22. Repeat steps 19–21 for second eluate (= E2, eluate 2).
23. Store all samples at −20 °C until used for analysis via
immunodetection and/or mass spectrometry.
24. Remove the tube from the magnet, add 1 ml PBS, pH 7.4, and
wash beads by slowly pipetting up-and-down.
25. Place the tube on the magnet for 2 min, and pipet off the
supernatant and discard.
26. Remove the tube from the magnet, add 1 ml Storage
Buffer, and store beads at 4 °C.
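
The inhibitor additions in step 13 follow from a simple dilution calculation (stock volume = sample volume × working concentration / stock concentration). The sketch below is an illustration, not part of the original protocol. The stock concentrations for leupeptin, pepstatin A, and E-64 are those given in Notes 3, 4, and 6; the 1 M value for PMSF is an assumption inferred from Note 5 (1.742 g in 10 ml DMSO), and the small volume increase caused by the additions is neglected.

```python
# Stock concentrations in mol/l. Leupeptin, pepstatin A and E-64 follow
# Notes 3, 4 and 6; the PMSF value is an assumption inferred from Note 5.
STOCK_M = {"leupeptin": 10e-3, "pepstatin A": 10e-3, "PMSF": 1.0, "E-64": 1e-3}

# Working concentrations in mol/l as listed in step 13.
WORKING_M = {"leupeptin": 10e-6, "pepstatin A": 1e-6, "PMSF": 1e-3, "E-64": 2e-6}


def inhibitor_volumes_ul(lysate_ml):
    """Return the stock volume (µl) of each inhibitor for a lysate volume (ml)."""
    lysate_ul = lysate_ml * 1000.0
    return {
        name: lysate_ul * WORKING_M[name] / STOCK_M[name]
        for name in WORKING_M
    }


if __name__ == "__main__":
    for name, volume in inhibitor_volumes_ul(1.0).items():
        print(f"{name}: {volume:.2f} µl")
```

For 1 ml of lysate the pepstatin A volume comes out well below 1 μl, so an intermediate dilution of that stock may be more practical to pipette.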
3.4 In-Solution Digestion

In order to analyze the cross-linked samples by mass spectrometry,
the covalent bonds of the protein complexes (= PFA cross-links) in
the elution fractions have to be reversed.
1. Incubate the cross-linked samples (eluates E1 and/or E2,
200 μl) for 20 min at 95 °C.
2. Dry down the sample in a vacuum concentrator to complete
dryness.
3. Dissolve the pellet in 50 μl UTU Buffer, and ultrasonicate for
10 min at room temperature (RT) to solubilize proteins.
4. Add 1 μl of reduction buffer (per 50 μg of total protein) for
reduction of cysteine residues, and incubate for 30 min at RT.
5. Add 1 μl of alkylation buffer (per 50 μg of total protein) and
incubate for 20 min at RT protected from light.
6. Add 0.5 μl endoproteinase Lys-C (per 50 μg of total protein,
1:100 ratio), and incubate for 3 h at RT.
7. Dilute with 4 volumes of 10 mM Tris–HCl, pH 8.0 (see

Note 16).
8. Add 1 μl of trypsin solution per 50 μg of total protein (1:100
ratio) and incubate at 37 °C overnight.
9. After digestion stop the reaction by acidifying the digest to
0.2% (v/v) TFA final concentration. Use a 10% (v/v) TFA
stock solution.
10. Centrifuge samples at 10,600 × g for 5 min at RT to remove
insoluble material.
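
The per-50-μg reagent amounts of steps 4–9 can be collected in one small calculation. The following is a minimal sketch under stated assumptions: the function and key names are hypothetical, the sample volume defaults to the 50 μl of UTU Buffer from step 3, and the few microliters of added reagents are ignored when computing the dilution and the TFA volume.

```python
def digestion_volumes_ul(protein_ug, sample_ul=50.0):
    """Reagent volumes (µl) for the in-solution digest, scaled per 50 µg protein."""
    units = protein_ug / 50.0          # number of 50 µg portions
    tris_ul = 4.0 * sample_ul          # step 7: dilute with 4 volumes of Tris-HCl
    digest_ul = sample_ul + tris_ul    # volume at the end of the digest (approx.)
    # Step 9: V_stock * 10% = (V_digest + V_stock) * 0.2%
    tfa_ul = digest_ul * 0.2 / (10.0 - 0.2)
    return {
        "reduction buffer": 1.0 * units,
        "alkylation buffer": 1.0 * units,
        "Lys-C": 0.5 * units,
        "trypsin": 1.0 * units,
        "10 mM Tris-HCl": tris_ul,
        "10% TFA": tfa_ul,
    }


if __name__ == "__main__":
    for reagent, volume in digestion_volumes_ul(100.0).items():
        print(f"{reagent}: {volume:.2f} µl")
```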
3.5 Peptide Desalting and Purification

1. Insert the 200 μl C18-StageTip into a spin adapter (see Fig. 2)
and place it into a fresh 2 ml microcentrifuge tube from which the lid
has been removed.
2. Place into a centrifuge and load 50 μl of Buffer B into the C18
tip for equilibration. Centrifuge at 1000 × g for 2 min at RT.
3. Load 100 μl of Buffer A into the C18 tip, and centrifuge at
1000 × g for 3 min at RT.
4. Repeat step 3.
5. Remove the tube from the centrifuge, discard the waste from
the microcentrifuge tube, and insert the tube with the C18
back into the centrifuge.
6. Load the tryptic peptide mixture into the C18 tip. Centrifuge
at 2650 × g for 5 min at RT. If necessary, repeat this step
to force the sample solution through.
7. Load 100 μl of Buffer A into the C18 tip for washing, and
centrifuge at 1000 × g for 3 min at RT.
8. Repeat step 7.
9. Remove the tube from the centrifuge and discard the waste
from the microcentrifuge tube. Place the C18 tip with the spin
adapter into a fresh 1.5 ml microcentrifuge tube, and insert the
tube into the centrifuge.
10. Load 20 μl Buffer B into the C18 tip to elute the bound tryptic
peptides from the C18 matrix. Centrifuge at 1000 × g for
2 min at RT.
11. Repeat step 10.
12. Spin down the final volume of the eluate (= 40 μl) to dryness in
a centrifugal vacuum concentrator (~45 min).
13. Store samples at −80 °C until used for LC-MS/MS.
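
The centrifugation steps above are given as relative centrifugal force (× g), whereas many benchtop centrifuges are set in rpm. A minimal conversion sketch follows, using the standard relation RCF = 1.118 × 10⁻⁵ × r × rpm² (r in cm). The 8 cm rotor radius in the usage example is purely hypothetical; the value for the rotor actually used has to be taken from its documentation.

```python
import math

# Standard conversion constant for radius in cm and speed in rpm.
RCF_CONSTANT = 1.118e-5


def rpm_for_rcf(rcf, radius_cm):
    """Rotor speed (rpm) needed to reach a given RCF (x g) at radius_cm."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


def rcf_for_rpm(rpm, radius_cm):
    """RCF (x g) generated at a given rotor speed (rpm) and radius_cm."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


if __name__ == "__main__":
    radius_cm = 8.0  # hypothetical rotor radius, not from the protocol
    for rcf in (1000, 2650, 10600):
        print(f"{rcf} x g -> {rpm_for_rcf(rcf, radius_cm):.0f} rpm")
```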
4 Notes
1. The PFA stock solution is prepared by heating the solution in
culture medium (e.g., lily pollen germination medium Med B)
to ~60 °C for 30 min and adding 1–2 NaOH pellets. The
solution is then cooled to room temperature and filtered through
a 0.22 μm filter. Wear gloves and work in a chemical fume
hood. We noticed precipitation in mannitol-containing media.
2. Besides breaking and homogenizing frozen cells with mortar
and pestle, cell lysis can be performed by other methods
(e.g., homogenization with a Teflon Potter-Elvehjem–type
homogenizer). It is very important to keep the homogenate
cold; therefore, work in an ice bath and as quickly as
possible. Ultrasonication is not recommended because it
heats the protein solution.
3. Leupeptin is an inhibitor of serine and cysteine proteases. To
prepare a 10 mM stock solution add 1050 μl ultrapure water to
5 mg leupeptin. Prepare 100–250 μl aliquots and store at
−20 °C.
4. Pepstatin A is a highly selective inhibitor of acid proteases
(aspartyl peptidases). To prepare a 10 mM stock solution add
729 μl DMSO (dimethyl sulfoxide) to 5 mg pepstatin A. Aliquot
and store at −20 °C.
5. PMSF inhibits serine and cysteine proteases. To prepare a 1 M
stock solution, weigh 1.742 g PMSF and make up to 10 ml with DMSO.
Prepare 1 ml aliquots and store at −20 °C. Work cautiously
because PMSF is highly toxic.
6. E-64 is a highly selective cysteine protease inhibitor which,
unlike other cysteine protease inhibitors, does not inhibit serine
proteases. To prepare a 1 mM stock solution add 2798 μl ultrapure
water to 1 mg E-64. Store at −20 °C.
7. Dynabeads (Dynabeads™ M-280 Sheep anti-mouse IgG,
Thermo Fisher Scientific, Waltham, MA, USA) are uniform,
superparamagnetic, polystyrene beads with affinity purified
sheep anti-mouse IgG covalently attached onto the bead sur-
face. The beads will bind specific antigens via a mouse primary
antibody. If only a polyclonal primary antibody is available to
catch your protein of interest the use of sheep anti-rabbit IgG
beads is necessary. These beads efficiently bind rabbit IgGs of
all subclasses.
8. Always prepare the cross-link solution freshly. DMP is stored at
−20 °C; allow it to equilibrate to room temperature
before use. Check the pH of the cross-link solution, as
it should not be less than 8.0. If necessary, adjust the pH with
3 M NaOH.
9. The 200 μl C18 StageTips are commercially available (e.g.,

catalog number: 87784, Pierce C18 Tips, Thermo Fisher Sci-
entific, Waltham, MA, USA) or can be manufactured in-house
according to [6].
10. For 0.0625% (w/v) PFA add 62.5 μl of a 10% (w/v) PFA stock
solution, for 0.125% (w/v) add 125 μl of a 10% (w/v) PFA
stock solution, for 0.25% (w/v) add 250 μl of a 10% (w/v) PFA
stock solution, for 0.5% (w/v) add 500 μl of a 10% (w/v) PFA
stock solution, and for 1% (w/v) add 1 ml of a 10% (w/v) PFA
stock solution to 10 ml pollen cultures.
11. Add 1 ml of a 1.25 M glycine stock solution to 10 ml pollen
cultures.
12. Add the precooled lysis buffer drop by drop. If the buffer is
added too fast, residual liquid nitrogen starts to boil and the
fine frozen cell powder easily spurts out of the tube.
13. The required amount of beads depends on the concentration
of the used primary antibody solution.
14. Calculate the amount of antibody and beads according to the
manufacturer’s recommendations. The required amount of
antibody depends on the concentration, specificity and affinity
of the used primary antibody and the amount and specificity of
your target antigen in the sample. A highly specific (monoclo-
nal or polyclonal) antibody against the desired target protein,
for which putative interaction partners should be identified, in
an adequate amount is a prerequisite for immunoaffinity
purification.
15. Because the titer (i.e., the concentration) of most antibody
solutions is quite high and the used amount of primary anti-
body is mostly in excess, the recovered antibody solution can
still be used for some further immunodetection experiments.
16. This dilution step is necessary to reduce the high salt
concentration before the subsequent trypsin digestion, which
favors optimal trypsin activity.
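
The stock recipes in Notes 3–6 and the PFA additions in Note 10 can be cross-checked with two short calculations. The sketch below is illustrative only: the molar masses are assumptions taken from commonly listed values (they are not stated in the text, and the leupeptin value refers to the hemisulfate salt), and the function names are hypothetical. The second function also shows that the nominal PFA percentages of Note 10 ignore the volume added to the 10 ml culture.

```python
# Molar masses in g/mol. These are assumptions (commonly listed values),
# not figures given in the protocol.
MOLAR_MASS = {
    "leupeptin hemisulfate": 475.6,
    "pepstatin A": 685.9,
    "PMSF": 174.19,
    "E-64": 357.41,
}


def solvent_volume_ul(mass_mg, molar_mass, target_mM):
    """Solvent volume (µl) that dissolves mass_mg to a target_mM stock."""
    # mg / (g/mol) = mmol; mmol / (mmol/l) = l; l * 1e6 = µl
    return mass_mg / molar_mass / target_mM * 1e6


def pfa_final_percent(stock_ul, culture_ml=10.0, stock_percent=10.0,
                      include_added_volume=False):
    """Final PFA concentration (% w/v) after adding stock to the culture."""
    culture_ul = culture_ml * 1000.0
    total_ul = culture_ul + stock_ul if include_added_volume else culture_ul
    return stock_percent * stock_ul / total_ul


if __name__ == "__main__":
    print(round(solvent_volume_ul(5, MOLAR_MASS["pepstatin A"], 10)), "µl DMSO")
    for stock_ul in (62.5, 125, 250, 500, 1000):
        nominal = pfa_final_percent(stock_ul)
        actual = pfa_final_percent(stock_ul, include_added_volume=True)
        print(f"{stock_ul} µl stock: nominal {nominal:.4f}%, actual {actual:.4f}%")
```

The difference between nominal and actual concentration is negligible for the lowest PFA additions and grows to roughly a tenth for the 1 ml addition.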
Acknowledgments
The research work is partially financed by the Austrian Research

Fund (FWF, P29626).
References
1. Miernyk JA, Thelen JJ (2008) Biochemical approaches for discovering
protein-protein interactions. Plant J 53:597–609
2. Engelsberger WR, Erban A, Kopka J et al (2006) Metabolic labeling of
plant cell cultures with K15NO3 as a tool for quantitative analysis of
proteins and metabolites. Plant Methods 2:14
3. Vasilescu J, Guo X, Kast J (2004) Identification of the protein-protein
interactions using in vivo cross-linking and mass spectrometry.
Proteomics 4:3845–3854
4. Pertl-Obermeyer H, Schulze WX, Obermeyer G (2014) In vivo cross-linking
combined with mass spectrometry analysis reveals receptor-like kinases
and Ca2+ signalling proteins as putative interaction partners of pollen
plasma membrane H+ ATPases. J Proteome 108:17–29
5. Pertl-Obermeyer H, Obermeyer G (2013) Pollen-cultivation and preparation
for proteome studies. Methods Mol Biol 1072:435–449
6. Rappsilber J, Mann M, Ishihama Y (2007) Protocol for micro-purification,
enrichment, pre-fractionation and storage of peptides for proteomics
using StageTips. Nat Protoc 2:1896–1906
7. Villalba JM, Lützelschwab M, Serrano R (1991) Immunolocalisation of the
plasma membrane H+ ATPase in maize coleoptiles and enclosed leaves.
Planta 185:458–461
Chapter 21
Proteome Analysis of 14-3-3 Targets in Tomato Fruit Tissues

Yongming Luo, Yu Lu, Junji Yamaguchi, and Takeo Sato
Abstract
Tomato is a major crop plant and an important constituent of the human diet. Exclusive features such as
bearing fleshy fruits and undergoing a phase transition from partially photosynthetic to fully heterotrophic
metabolism make tomato fruit a model system for fruit development studies. Although the tomato genome
has been completely sequenced, functional proteomics studies are still at their starting stage. Proteomics
technologies, especially the combination of multiple approaches, provide a very powerful tool to accurately
identify functional proteins and investigate certain sets of proteins in more detail. The direct binding of
plant 14-3-3 proteins to their multiple target proteins modulates the functions of the latter, suggesting that
these 14-3-3 proteins are directly involved in various physiological pathways. This chapter outlines methods
for the identification of 14-3-3 protein complexes in tomato fruit tissues. These methods include detailed
protocols for protein extraction, coimmunoprecipitation, SDS-PAGE, SYPRO Ruby staining, in-gel trypsin
digestion, and LC-MS/MS analysis for 14-3-3 interactomics.
Key words Tomato fruit, 14-3-3 protein, Interactome
1 Introduction
The tomato (Solanum lycopersicum L.) is one of the most economically
important crops in commercial production worldwide. In contrast
to model plants that bear dry fruits, tomato plants constitute the
ideal model system for studying the development of fleshy fruit
[1, 2]. Although the full genome of this species was sequenced in
2012 [3], few studies have investigated protein–protein interac-
tions in tomato plants.
The 14-3-3 proteins are highly conserved, versatile regulatory
proteins that participate in a diverse range of cellular processes
through direct binding to their target proteins. In plants, 14-3-3
target proteins are distributed across a large number of physiological
pathways, including those involved in primary metabolism,
hormone signaling, cell growth and division, and response to multiple
environmental stresses (Fig. 1). In Arabidopsis, proteomic
analyses have identified over 300 14-3-3 target proteins, reflecting
the complicated regulatory network of 14-3-3 proteins in plant
physiology [4, 5]. Arabidopsis 14-3-3 proteins serve as carbon–nitrogen
(C/N) nutrient balance regulators, and their abundance is essential
for the plant response to C/N nutrient status [6–9]. Because carbon
and nitrogen nutrients also profoundly affect fruit development and
quality, identifying 14-3-3 target proteins in tomato fruit tissue is
critical for understanding how 14-3-3 proteins modulate primary
metabolism in this tissue.

Fig. 1 Illustration of plant 14-3-3 protein functions in various physiological pathways. The 14-3-3 proteins
preferentially form saddle-shaped homo- or heterodimers, in which a broad central groove is able to bind to
target proteins. These 14-3-3 proteins bind to the target protein mainly through the phosphorylated 14-3-3
binding motifs in the latter

This chapter describes a protocol for the identification of 14-3-3
target proteins. It includes coimmunoprecipitation of these target
proteins from transgenic tomato fruit expressing FLAG tag-fused
14-3-3 protein, followed by LC-MS/MS analysis of the precipitated
proteins. With this method, 106 proteins were identified,
including key enzymes involved in carbon metabolism and
photosynthesis [10], suggesting the need for further research on the
functions of 14-3-3 proteins in tomato fruit.
2 Materials
2.1 Plant Material

The tomato (Solanum lycopersicum L.) cultivar Micro-Tom expressing
FLAG tag-fused 14-3-3λ driven by the CaMV 35S promoter
(FLAG-14-3-3λ) and wild-type Solanum lycopersicum L. were
employed [10] (see Note 1). Tomato plants are grown in 9 cm
pots containing peat moss-based Jiffy-Mix soil (Sakata Seed,
Japan), supplemented with HYPONeX nutrient mixture
(N:P:K = 6:10:5) (HYPONeX JAPAN, Japan), at 25 °C under a 16:8 h
light–dark cycle.
2.2 Immunoprecipitation of 14-3-3 Complexes from Tomato Fruit Tissue

1. Protein extraction buffer: 100 mM Tris–HCl, pH 7.5, 10%
glycerol (v/v), 150 mM NaCl, 5 mM MgCl2, 1 mM EDTA,
0.5% Triton X-100 (v/v), 10 μM MG132, 1× complete protease
inhibitor mixture (Roche, Switzerland), 1× PhosSTOP
phosphatase inhibitor cocktail (Roche, Switzerland).
2. Wash buffer: 100 mM Tris–HCl, pH 7.5, 10% glycerol (v/v),

150 mM NaCl, 5 mM MgCl2, 1 mM EDTA, 0.5% Triton
X-100 (v/v).
3. Bradford reagent (Bio-Rad, USA).
4. Immunoprecipitation bead: anti-FLAG M2 antibody conju-
gated to magnetic beads (Sigma-Aldrich, USA).
5. Elution buffer: 3× FLAG peptides (5 mg/mL) (Sigma-Aldrich,
USA).
2.3 SDS-PAGE and SYPRO Ruby Stain

1. 2× SDS sample buffer: 4% SDS, 100 mM dithiothreitol,
125 mM Tris–HCl, pH 6.8, 20% glycerol, 0.02% bromophenol
blue (BPB).
2. Unstained protein molecular weight markers (Bio-Rad, USA).
3. Ready-made 10% polyacrylamide (w/v) gels (Perfect NT Gel;
DRC, Japan).
4. SDS-PAGE running buffer: 25 mM Tris, 192 mM glycine,
0.1% SDS (w/v).
5. SYPRO Ruby Protein Gel Stain (Invitrogen, USA).
6. Wash solution: 10% methanol (v/v), 7% acetic acid (v/v).
2.4 In-Gel Trypsin Digestion

1. Dehydration solution: 100% acetonitrile.
2. Wash buffer: 50 mM ammonium bicarbonate.
3. Reduction solution: 10 mM dithiothreitol, 50 mM ammonium
bicarbonate.
4. Alkylation solution: 55 mM iodoacetamide, 50 mM ammo-
nium bicarbonate.
5. Trypsin solution: Sequencing grade modified trypsin (Promega,
USA), 10 μg/mL in 50 mM ammonium bicarbonate.
6. Extraction solution I: 50% acetonitrile (v/v), 5% formic acid
(v/v).
7. Extraction solution II: 70% acetonitrile (v/v), 5% formic acid
(v/v).
8. Dissolving solution: 5% acetonitrile (v/v), 0.1% formic acid
(v/v).
9. Hitech tube crystal (M-50001; HITECH, Japan).
10. Peptide low absorbable micropipette tip (BM2051; BM Bio,
Japan).
11. Ultrafree-MC centrifugal filters (pore size 0.45 μm; Millipore,
USA).
2.5 LC-MS/MS Analysis

1. Reverse-phase (RP) chromatography buffer A (RPB-A): 0.1%
formic acid (v/v).
2. Reverse-phase (RP) chromatography buffer B (RPB-B): 0.1%
formic acid (v/v) in acetonitrile.
3. Reverse-phase (RP) chromatography column: nano-HPLC
capillary column (NTCC-360/75-3-125; Nikkyo Technos,
Japan).
4. EASY-nLC 1000 liquid chromatograph (ThermoFisher Scien-
tific, USA).
5. Orbitrap Elite mass spectrometer (ThermoFisher Scientific,
USA).
3 Methods
3.1 Immunoprecipitation of 14-3-3 Complex from Tomato Fruit Tissue

1. Harvest the expanding green tomato fruit from WT and
FLAG-14-3-3λ plants and grind the fruits in liquid nitrogen
using mortar and pestle [10] (see Note 2).
2. Equilibrate 20 μL of anti-FLAG M2 magnetic beads for each
sample with extraction buffer at least twice to remove
the stock buffer.
3. Add 3 μL protein extraction buffer supplemented with inhibi-
tors per mg fresh weight of tomato fruit; transfer the suspen-
sion to a prechilled 1.5 mL tube and place on ice.
4. Centrifuge twice at 20,000 × g for 5 min at 4 °C and remove
the insoluble residues.
5. Determine the protein concentration in the supernatant by the
Bradford method and transfer lysates containing 3 mg proteins
to new 1.5 mL tubes. Add sufficient extraction buffer to equal-
ize lysate volume between negative control (WT) and FLAG-
14-3-3λ.
6. Add 20 μL anti-FLAG M2 magnetic beads to the protein
extracts.
7. Incubate samples at 4 °C for 1 h on a rotary shaker.
8. Spin down the magnetic beads at 8000 × g for 30 s at 4 °C,
place the tube in the appropriate magnetic separator, and
discard the supernatant. Then wash the beads three times
with wash buffer and add 140 μL of wash buffer.
9. Add 60 μL of 3× FLAG peptide solution and incubate at 4 °C
for 1 h on a rotary shaker to elute the FLAG-14-3-3λ proteins
from the beads.
10. Centrifuge at 8000 × g for 30 s at 4 °C, place the tube in
the appropriate magnetic separator, and transfer the
supernatant into a new 1.5 mL tube.
11. Centrifuge at 20,000 × g for 1 min to remove any remaining
beads completely and transfer 180 μL of the supernatant
into a new 1.5 mL tube.
12. Add 720 μL of prechilled 100% acetone, vortex mix for 10 s,
and incubate at −30 °C for over 2 h.
13. Centrifuge at 13,000 × g for 10 min at 4 °C and discard the
supernatant.
14. Dry the precipitate in a vacuum centrifuge for 15 min at
37 °C.
15. Add 36 μL of 1× SDS sample buffer and incubate the samples
at 37 °C for 1 h.
16. Centrifuge at 20,000 × g for 5 min to pellet any residual
particles before loading onto the SDS-PAGE gel.
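
The volumes used in this subsection follow from a few proportional rules: 3 μL extraction buffer per mg fresh weight (step 3), lysate containing 3 mg protein (step 5), equal lysate volumes for WT and FLAG-14-3-3λ (step 5), and four volumes of acetone (720 μL for 180 μL eluate, step 12). A minimal sketch is given below; the function names, the sample names, and the example Bradford concentrations are hypothetical.

```python
def extraction_buffer_ul(fresh_weight_mg):
    """Step 3: 3 µL extraction buffer per mg fresh weight."""
    return 3.0 * fresh_weight_mg


def lysate_volume_ul(protein_mg_per_ml, target_mg=3.0):
    """Step 5: lysate volume (µL) that contains target_mg protein."""
    return target_mg * 1000.0 / protein_mg_per_ml


def top_up_volumes_ul(lysate_volumes_ul):
    """Step 5: extraction buffer (µL) to add so all samples have equal volume."""
    target = max(lysate_volumes_ul.values())
    return {name: target - volume for name, volume in lysate_volumes_ul.items()}


def acetone_volume_ul(sample_ul, volumes=4.0):
    """Step 12: acetone volume (µL); 4 volumes reproduces 720 µL for 180 µL."""
    return sample_ul * volumes


if __name__ == "__main__":
    # Hypothetical Bradford results in mg/mL for the two genotypes.
    lysates = {
        "WT": lysate_volume_ul(5.0),
        "FLAG-14-3-3": lysate_volume_ul(4.0),
    }
    print(lysates)
    print(top_up_volumes_ul(lysates))
    print(acetone_volume_ul(180.0))
```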
3.2 SDS-PAGE and SYPRO Ruby Staining

1. Load immunoprecipitated samples and molecular weight markers
onto the lanes of a 10% SDS–polyacrylamide gel and start
the electrophoresis.
2. Stop the electrophoresis when the BPB dye front has migrated
one third of the way down the gel.
3. Transfer the gel carefully into a container containing SYPRO
Ruby staining solution. Cover the container with aluminum
foil and gently shake the gel for 3 h at room temperature (see
Note 3).
4. Discard the SYPRO Ruby staining solution and wash the gel
twice for 30 min each with wash solution.
5. Discard the wash solution and wash the gel with Milli-Q water for
10 min.
6. Observe the protein bands with a Safe Imager™ 2.0 Blue-
Light Transilluminator (see Note 4).
3.3 In-Gel Trypsin Digestion

1. Excise the entire gel of each lane with a clean scalpel and chop
the excised gels into pieces of approximately 1 × 1 mm; transfer
these pieces into a 1.5 mL microcentrifuge tube (see Notes 5–7).
2. Add 400 μL 100% acetonitrile to each tube and shake for
15 min at room temperature (see Note 6).
3. Remove the acetonitrile and dry the gel with a vacuum centrifuge
(approx. 15 min at 37 °C).
4. Add 400 μL reduction solution to each tube and allow the dry
gel pieces to soak by shaking at 56 °C for 45 min.
5. Cool the tube to room temperature, remove the reduction
solution, and add 400 μL alkylation solution; shake for
30 min in the dark.
6. Discard the alkylation solution and wash the gel samples with
400 μL gel washing buffer for 10 min.
7. Discard the washing buffer and dehydrate again with 400 μL
dehydration solution for 10 min. Repeat the dehydration
protocol.
8. Dry the gel with a vacuum centrifuge for 15 min at 37 °C.
9. Add a sufficient amount of trypsin solution to each dried gel
sample, making sure the trypsin solution covers all the gel
pieces, and incubate at 37 °C for 16 h (see Note 8).
10. Add 100 μL extraction solution I and shake for 30 min at room
temperature. Transfer the supernatant to a new tube (see
Note 9). Repeat this procedure with extraction solution II.
11. Dry the solution with a vacuum centrifuge (approx. 1.5 h at
37 °C).
12. Add 20 μL dissolving solution to dissolve the dried peptides,
and filter each sample with an Ultrafree-MC Centrifugal Filter to
remove any residual gel pieces.
3.4 LC-MS/MS Analysis and Protein Identification

1. Transfer the trypsin-digested peptide solution into an HPLC
vial (11-19-102) (AMR, Japan) suitable for the autosampler
(Accela AS) (ThermoFisher Scientific, USA).
2. Inject the samples in the HPLC apparatus; peptides are
separated on an analytical nano-capillary column with an
EASY-nLC 1000 liquid chromatograph system (ThermoFisher
Scientific, USA).
3. Elute the peptides at a column flow rate of 300 nL/min by
applying a three-step linear gradient: 0–55 min, 0–35% RPB-B;
55–60 min, 35–100% RPB-B; 60–68 min, 100% RPB-B.
4. Survey the full-scan spectra obtained with the Orbitrap mass
analyzer (ThermoFisher Scientific, USA).
The ten most intense precursor ions, ranging from 300 to
1500 m/z, are scanned and measured in the mass spectrometer
at 120,000 resolution at m/z 400. These ions are sequentially
isolated and fragmented (collision-induced dissociation at
35 eV), with the corresponding fragment ions measured in
the linear ion trap.
5. Search the TAIR10 (http://www.arabidopsis.org/index.jsp)
and ITAG2.4 (http://solgenomics.net/) databases using the
SEQUEST algorithm embedded in Proteome Discoverer 1.4
software (ThermoFisher Scientific, USA).
6. Use the following parameters for the searches (see Note 10):
(a) Precursor ion tolerance of 10 ppm,
(b) Product ion mass tolerance of 0.8 Da,
(c) Trypsin as the proteolytic enzyme, allowing up to two

missed cleavages,
(d) Carbamidomethylation on cysteine as a fixed modifica-
tion, and,
(e) Oxidation of methionine as a variable modification.
7. Employ an automatic decoy database strategy to estimate the false
discovery rate (FDR), and filter the resulting peptides to present
only those proteins with <1% FDR. Accept only those
matched peptides that meet XCorr thresholds of 1.5, 2.0, and
2.5 for singly (z = 1), doubly (z = 2), and triply (z = 3) charged
ions, respectively. Consider positively identified peptides to be
putative interactors with tomato 14-3-3λ only when at least
two of three replicates have SEQUEST scores >10 when compared
with negative control WT plants.
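
Two of the steps above translate directly into small calculations: the three-step linear gradient of step 3 and the charge-dependent XCorr cutoffs of step 7. The sketch below is illustrative and does not reproduce any Proteome Discoverer function. Treating the cutoffs as inclusive (≥) and rejecting charge states above 3 are assumptions, because the text does not specify either.

```python
# Gradient breakpoints from step 3: (time in min, % RPB-B).
GRADIENT = [(0.0, 0.0), (55.0, 35.0), (60.0, 100.0), (68.0, 100.0)]

# XCorr cutoffs per charge state from step 7.
XCORR_MIN = {1: 1.5, 2: 2.0, 3: 2.5}


def percent_b(t_min):
    """Linearly interpolated % RPB-B at time t_min of the gradient."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time outside the 0-68 min gradient")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


def passes_xcorr(charge, xcorr):
    """True if a peptide-spectrum match meets the cutoff for its charge.

    Assumptions: the cutoff is inclusive, and charge states without a
    listed cutoff (z > 3) are rejected.
    """
    threshold = XCORR_MIN.get(charge)
    return threshold is not None and xcorr >= threshold


if __name__ == "__main__":
    for t in (0, 27.5, 55, 57.5, 64):
        print(f"{t} min: {percent_b(t):.1f}% RPB-B")
    print(passes_xcorr(2, 2.1), passes_xcorr(3, 2.4))
```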
4 Notes
1. FLAG-tag is a polypeptide epitope tag, having the amino acid

sequence DYKDDDDK, that can be added to a protein for
immunoprecipitation.
2. Expanding green fruits were harvested in this experiment. The
culture period and fruit condition are dependent on the pur-
pose of the experiment [1].
3. Staining incubation time can be shortened with a microwave
oven, as described in the manufacturer’s protocol (Invitrogen,
USA).
4. The successful immunoprecipitation of FLAG-14-3-3λ can be
confirmed by western blotting with anti-FLAG antibody as
well as by SYPRO Ruby staining.
5. Gels stained with CBB or silver stain should be destained.
6. In our protocol, each tube contains approximately 100 μL of gel
pieces and 400 μL of solution.
7. Tubes should be tolerant to organic solvents such as acetoni-
trile, with low absorbability of proteins and peptides.
8. The solution should cover all gel pieces. After 30 min, check
the samples and add more trypsin solution
if the liquid has been absorbed by the gel pieces.
9. Beginning with this step, pipette tips with low absorbability of
peptides should be used.
10. These parameters can be adjusted, depending on the purpose
of these experiments.
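
To make two of the search parameters from Subheading 3.4 concrete (trypsin with up to two missed cleavages, and a 10 ppm precursor tolerance), a minimal sketch is given below. It is not the SEQUEST implementation. The rule "cleave after K or R, but not before P" is a common convention and an assumption here, and the FLAG sequence from Note 1 serves only as a short demonstration input.

```python
import re


def tryptic_peptides(sequence, max_missed=2):
    """In-silico tryptic digest allowing up to max_missed missed cleavages.

    Assumed rule: cleave C-terminal to K or R unless the next residue is P.
    """
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", sequence) if f]
    peptides = []
    for start in range(len(fragments)):
        last = min(start + max_missed, len(fragments) - 1)
        for end in range(start, last + 1):
            peptides.append("".join(fragments[start:end + 1]))
    return peptides


def within_ppm(observed_mz, theoretical_mz, tolerance_ppm=10.0):
    """True if the mass error of a precursor ion is within the ppm tolerance."""
    error_ppm = abs(observed_mz - theoretical_mz) / theoretical_mz * 1e6
    return error_ppm <= tolerance_ppm


if __name__ == "__main__":
    print(tryptic_peptides("DYKDDDDK"))
    print(tryptic_peptides("DYKDDDDK", max_missed=0))
    print(within_ppm(1000.005, 1000.0))
```

Raising the number of allowed missed cleavages or widening the tolerance enlarges the search space, which is one reason why these parameters are adjusted to the purpose of the experiment.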
Acknowledgments
This work was supported by a Grant in-Aid for Scientific Research

to T.S. [Nos. 15K18819 and 17K08190], by Grants in Aid for
Scientific Research to J.Y. [Nos. 15H0116705, 262921888, and
18H02162] from the Japan Society for the Promotion of Science
(JSPS), and by a grant from The NOASTEC foundation, Hokkaido
University Young Scientist Support Program to T.S. Lu was supported
by a JSPS Research Fellowship (2016-2018) and a JSPS Postdoctoral
Fellowship for Research in Japan (2018-2020). Luo was
supported by a Support Grant for Self-Supported International
Graduate Student (Hokkaido University Faculty of Science:
2017-2018). This work was also supported by a Cooperative
Research Grant of the Plant Transgenic Design Initiative, Gene
Research Center, University of Tsukuba.
References
1. Tohge T, Alseekh S, Fernie AR (2014) On the regulation and function of
secondary metabolism during fruit development and ripening. J Exp Bot
65:4599–4611
2. Shikata M, Hoshikawa K, Ariizumi T et al (2016) TOMATOMA update:
phenotypic and metabolite information in the Micro-Tom mutant resource.
Plant Cell Physiol 57:e11
3. The Tomato Genome Consortium (2012) The tomato genome sequence provides
insights into fleshy fruit evolution. Nature 485:635–641
4. Oecking C, Jaspert N (2009) Plant 14-3-3 proteins catch up with their
mammalian orthologs. Curr Opin Plant Biol 12:760–765
5. Chang IF, Curran A, Woolsey R et al (2009) Proteomic profiling of tandem
affinity purified 14-3-3 protein complexes in Arabidopsis thaliana.
Proteomics 9:2967–2985
6. Sato T, Maekawa S, Yasuda S et al (2009) CNI1/ATL31, a RING-type
ubiquitin ligase that functions in the carbon/nitrogen response for
growth phase transition in Arabidopsis seedlings. Plant J 60:852–864
7. Sato T, Maekawa S, Yasuda S et al (2011) Identification of 14-3-3
proteins as a target of ATL31 ubiquitin ligase, a regulator of the C/N
response in Arabidopsis. Plant J 68:137–146
8. Yasuda S, Sato T, Maekawa S et al (2014) Phosphorylation of Arabidopsis
ubiquitin ligase ATL31 is critical for plant carbon/nitrogen nutrient
balance response and controls the stability of 14-3-3 proteins. J Biol
Chem 289:15179–15193
9. Yasuda S, Aoyama S, Hasegawa Y et al (2017) Arabidopsis CBL-interacting
protein kinases regulate carbon/nitrogen-nutrient response by
phosphorylating ubiquitin ligase ATL31. Mol Plant 10:605–618
10. Lu Y, Yasuda S, Li X et al (2016) Characterization of ubiquitin ligase
SlATL31 and proteomic analysis of 14-3-3 targets in tomato fruit tissue
(Solanum lycopersicum L.). J Proteome 143:254–264
Chapter 22
The Use of Proteomics in Search of Allele-Specific Proteins

in (Allo)polyploid Crops
Abstract
Most organisms are diploid, meaning they only have two copies of each chromosome (one set inherited
from each parent). Polyploid organisms have more than two paired (homologous) sets of chromosomes.
Many plant species are polyploid. Polyploid species cope better with stresses thanks to the redundancy in
chromosome copy number, which gives them greater flexibility in gene expression. Allopolyploid
species are polyploids that contain chromosome sets derived from a cross of two (or more) species.
Gene variants unique to a preferential phenotype are the most probable candidate markers controlling the
observed phenotype. Organ- or tissue-specific silencing or overexpression of one parental homeolog is quite
common. It is very challenging to find those tissue-specific gene variants. High-throughput proteomics is a
successful method to discover them. This chapter proposes two possible workflows depending on the
available resources and the knowledge of the species. An example is given for an AAB hybrid and an ABB
hybrid. Allele-specific gene responses are picked up in this workflow as gene loci displaying genotype-
specific differential expression that often have single amino acid polymorphisms. If the resources are
sufficient, a genotype-specific mRNAseq database is recommended where a link is made to the allele-
specific transcription levels. If the resources are limited, allele-specific proteins can be detected by the
detection of genotype-specific peptides and the identification against existing genomics libraries of the
parents.
Key words Polyploidy, Homeolog, LC-MS/MS, Allelic variance
1 Introduction
Agricultural productivity results from the interaction between plant
genotype, environment, and farm management (G × E × M). For
sustainable agriculture, the right genotype must be grown in the right
environment. The flexibility of cultivars toward the environment is
determined by genetic diversity (G), and exploiting it requires a deeper
understanding of how this diversity shapes the phenotype. An absolute
priority in current breeding programs is the identification of sources
of natural variation with the potential to raise tolerance toward
unfavorable (a)biotic constraints while minimizing the yield penalty.
Across the vascular plant genera, a considerable proportion is polyploid.
298 Sebastien Christian Carpentier
Fig. 1 Example of an allele-specific protein in an AAB hybrid. The A genome is depicted in blue, the B genome
in red. The sequence of this protein, which is encoded on chromosome 11, is shown with its two SAAPs indicated in
yellow
Many of our crops in particular are polyploid, with complex heterologous
genomes. Barker and coworkers performed a study to assess
the polyploidy among the plant species and found that 24% of the
investigated species were polyploid [1]. Polyploids and especially
allopolyploids likely have an evolutionary advantage [1]. Genome
reorganization and the associated greater flexibility in gene expres-
sion allow coping with immediate stresses which may be beyond the
tolerances of their progenitors/parents. Gene loss or silencing,
neo- and/or subfunctionalization, intergenomic transfer, allele
dominance/codominance, differences in transcription/translation
efficiency and posttranslational modifications exemplify how the
genome, transcriptome, and proteome are regulated following
polyploidization events. A(n) (allo)polyploid genome is thus a
patchwork of gene variants enabling many genotype × environment
interactions (Fig. 1). Gene variants unique for a preferential phe-
notype are most probable candidate markers controlling the
observed phenotype. GWAS (genome-wide association studies)
have been successfully applied in many crops, but this is challenging
for a complex multigene trait. Moreover, organ- or tissue-specific
silencing or overexpression of one parental homeolog is quite
common [2–5]. The link between gene or protein abundance and
SNP (single nucleotide polymorphism) or single amino acid poly-
morphisms (SAAP) and a preferential phenotype is more stringent
with knowledge of the tissue, time point, and environmental con-
dition of the gene/protein activity [6]. RNA sequencing effectively
combines gene expression quantification with gene sequencing and
Identification and Quantification of Homeologs via LC MSMS 299
allows for SNP calling [5]. Yet most current read-mapping
software has difficulty processing complex (polyploid) genomes.
The read mapping efficiency to the reference genome might be
biased, and the degree of heterozygosity greatly increases compu-
tational effort, hampering quantitative results per genome
[7]. Consequently, the RNA reads are not separated and traced
back to their (sub)genome. Algorithms like PolyCat and HANDS2
process reads based on classification toward their genome of origin
but heavily depend on the presence and the quality of reference
genomes [7, 8]. Since not all cultivars of interest carry the reference
sequence, mapping efficiency biases can still occur when one refer-
ence genome is more closely linked to a constituting genome than
the other [7]. Proteomics allows for picking up and quantifying the
actual differential products without previous genome knowledge
[3, 9]. 2DE-based proteomics is very useful to identify allele-
specific protein isoforms in complex protein families [10]. This is
exemplified in banana for the HSP70 protein family [4]. However,
2DE is no longer the tool of choice in high-throughput differential
proteomics because of the labor and time involved in producing
2DE gels. Via LC-MSMS, multiple allele-specific products (specific
tryptic peptides) can be quantified in a high-throughput manner
without requiring prior knowledge [11]. Proteomes clearly provide
insights into the consequences of genomic merging and reorgani-
zation [3, 5, 12–15].
2 Methods
2.1 Workflow 1: No Resources Available for mRNA Seq (Fig. 2)
When no resources are available to construct a proper mRNA seq
library, genotype-specific proteins can be identified by analyzing a
sufficient number of biological replicates and confirming the allele
specificity by identifying the genotypic peptides against a relevant
library. An example is worked out for an AAA and an AAB genotype
of banana/plantain. The first is the well-known dessert banana; the
other gives a fruit that is higher in starch content and is consumed
cooked as a starch source. How can plantain (AAB)-specific
proteins be identified? The plantain genome is not known. Reference
genomes are available for a double haploid AA variety [16], which has
been resequenced [17], and for a B genome, for which the genomic reads
were aligned to the AA reference genome to determine the chromosome
locus [18]. The first plantain proteome was published in
2018 [13].
1. Download the publicly available FASTA files of both parent
genotypes AA and BB (e.g., in the case of banana https://
banana-genome-hub.southgreen.fr/).
2. Download the common proteomic contaminants (keratins
and trypsin) (https://www.uniprot.org/).
Fig. 2 Workflow for the identification of allele-specific proteins when no resources are available to generate
mRNA seq libraries. Example of an AAB hybrid
3. Merge the AA and BB libraries with the contaminants and,
preferably, remove identical proteins with CD-HIT [19]
(see Note 1).
4. Characterize the peptides via label-free shotgun proteomics (see
Note 2) and search all runs against your constructed library
and the decoy database.
5. Quantify the peptides based on their abundance (Fig. 3) (see
Note 3).
Fig. 3 Label-free quantification of the peptides. The visualization of the peptide abundance in 3D (retention
time, intensity, m/z) helps to confirm the absence of the B allele–specific peptide in the AAA genotype (left
picture). Even if the peptide was not selected for MSMS in a run the abundance pattern confirms that the
peptide is present or absent
6. Construct a volcano plot (see Note 4).

7. Select candidate peptides (see Tables 1 and 2 “Volcano plot
selected”).
8. Confirm the peptide identification at a minimum confidence
level of 95% (see Tables 1 and 2 “Max probability”) (see Note
5).
9. BLAST each peptide against your library (see Tables 1 and
2 “Match in chromosome”). For stand-alone and API BLAST,
see https://blast.ncbi.nlm.nih.gov/Blast.cgi.
10. Eliminate all peptides with a 100% match with more than one
gene locus (see Tables 1 and 2 “Locus specific”).
11. Confirm by spectral counting that the spectrum is unique for
the genotype (see Tables 1 and 2 “Allele counting”).
12. Submit all amino acid substitutions in the protein to PANTHER
[20] (http://pantherdb.org/tools/csnpScoreForm.jsp) to
judge their impact.
13. Submit all the differential proteins to PANTHER [20] for a GO
enrichment.
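Steps 9 and 10 can be sketched in a few lines of Python. This is a minimal illustration only: an exact substring match stands in for a 100% BLAST hit, the locus names are hypothetical, and the two toy protein sequences are made up around peptides from Table 1 so that the example is self-contained. The tryptic-context check (the residue preceding the peptide must be K or R) mirrors the reasoning of footnote c of Table 1 and ignores refinements such as missed cleavages.

```python
def tryptic_loci(peptide, library):
    """Return loci where the peptide occurs and is preceded by K/R or the protein start."""
    hits = set()
    for locus, seq in library.items():
        start = seq.find(peptide)
        while start != -1:
            if start == 0 or seq[start - 1] in "KR":
                hits.add(locus)
            start = seq.find(peptide, start + 1)
    return hits


def classify_peptides(peptides, library):
    """Flag each peptide as locus specific when it maps to exactly one locus."""
    result = {}
    for pep in peptides:
        hits = tryptic_loci(pep, library)
        result[pep] = {"loci": sorted(hits), "locus_specific": len(hits) == 1}
    return result


# Hypothetical library: invented sequences, not real banana gene models.
library = {
    "chr1_A_locus": "MKRSIVGETCNQIARSCLDACRHGLAIISLKL",
    "chr1_B_locus": "MKRSIVSETCNQIARSCLDACRRGLAIISLKL",
}

peptides = ["SIVGETCNQIAR", "SIVSETCNQIAR", "SCLDACR", "GLAIISLK"]
report = classify_peptides(peptides, library)
for pep, info in report.items():
    print(pep, info["loci"], "locus specific" if info["locus_specific"] else "shared")
```

In this toy case SCLDACR is shared between both loci and would be eliminated in step 10, whereas GLAIISLK is retained for the B locus only because the preceding histidine in the A sequence rules out a tryptic cleavage there.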
2.2 Workflow 2: Resources Are Available to Generate mRNA Seq Libraries (Fig. 4)
When sufficient resources are available to construct a proper mRNA
seq library, genotype-specific proteins can be identified by analyzing
a sufficient number of biological replicates and confirming the
allele specificity by identifying the genotypic peptides against the
genotype's own mRNA library. An example is worked out for an AAA and an
ABB genotype of banana. How can B genome–derived
proteins be identified in the ABB genotype?
Table 1
Example of protein inference between allelic isoforms
Identified             Max probability  Match in chromosome  Locus   Volcano plot  Spectral counts(b)  Allele
peptide(a)             (%)              1B        1A         unique  selected      AAA       AAB       specific
ADPNVDFAFCSQSLR        100              1         -          1       1             0         1         B
GDPNVDFTFCSQSLR        100              -         1          1       0             10        1         A
GLAIISLK               99               1         1          1(c)    1             0         33        B
SAAGSDGADLHGLAIISLK    100              -         1          1       0             10        1         A
SAAGSDGADLR            100              1         -          1       1             0         2         B
SCLDACR                100              1         1          0       0             46        80        0
RSIVGETCNQIAR          100              -         1          1       0             27        4         A
SIVGETCNQIAR           100              -         1          1       0             40        25        A
SIVSETCNQIAR           100              1         -          1       1             0         30        B
VVYADAASELR            100              1         -          1       0             0         3         0(d)
(a) Single amino acid polymorphism is indicated in bold. The SAAP G48A in GDPNVDFTFCSQSLR has been evaluated by
PANTHER as probably damaging
(b) Total number of spectral counts for 21 AAA samples and 30 AAB samples
(c) The peptide GLAIISLK should not be present in the AAA sample since in the AA genome the sequence is (H)
GLAIISLK(L) while in the BB genome the sequence is (R)GLAIISLK(L)
(d) A BLAST suggests that VVYADAASELR is also allele specific since the AA genome sequence would code for
EVYADAASDLR, but the low spectral counts and the high variability in abundance prevent us from confirming this
peptide
1. Check the quality of your reads via FastQC.

2. Map the reads against the reference genome [21] (or assemble
them de novo if no reference genome is available) and generate Binary
Alignment Map (BAM) files.
3. Concatenate the BAM files and apply variant calling with
VCFtools and SAMtools [22, 23].
4. Generate a FASTA file from each library.
5. Download the common proteomic contaminants and merge
them with the mRNA-generated FASTA.
6. Characterize the peptides via label-free shotgun proteomics (see
Note 2) and search all runs against your constructed library
and the decoy database.
7. Quantify the peptides based on their abundance (see Note 3).
Table 2
Example of protein inference and distinction between paralogous and allelic isoforms
Columns: Peptide(a); Maximum probability (%); Match in chromosome (6B, 6A, 7B, 6A, 9A, 4B); Locus unique; Volcano plot selected; Spectral counts(b) (AAA, AAB); Allele specific
APGGCNNPCTVFK 100 1 1 1 0 0 89 131 0
CAADINGQCPAALK 100 1 1 1 0 13 B
CAADINGQCPAALKAPGGCNNPCTVFK 100 1 1 1 0 1 B
CSYTVWAAAVPGGGR 100 1 1 0 0 71 126 0
DDQTSTFTCPGGANYR 100 1 1 1 0 25 B
DDQTSTFTCPGGTNYR 100 1 1 0 47 74 A
NCPDAYSYPK 100 1 1 1 1 1 0 0 48 92 0
NCPDAYSYPKDDQTSTFTCPGGANYR 100 1 1 1 0 75 B
NNCPDAYSYPKDDATSTFTCPGGTNYR 100 1 1 0 0 19 40 0
QLNQGQSWTINVNAGTTGGR 100 1 1 0 0 32 37 0
RNCPDAYSYPK 100 1 1 0 0 7 9 0
RNCPDAYSYPKDDQTSTFTCPGGANYR 100 1 1 1 0 16 B
TDQYCCNSGSCGPTDYSR 100 1 1 1 0 76 B
TGCSFDGSGR 100 1 1 0 0 193 342 0
TGCSFDGSGRGR 100 1 1 0 0 8 25 0
(a) Single amino acid polymorphism is indicated in bold. The SAAP T218A in DDQTSTFTCPGGTNYR has been evaluated by PANTHER as probably benign. A BLAST
additionally confirms that CAADINGQCPAALK is also allele specific since the AA genome sequence would code for CAADINGQCPGALK. A BLAST confirms that
TDQYCCNSGSCGPTDYSR is also allele specific, since the AA genome sequence would code for TDQYCCNSGSCSPTDYSR
(b) Total number of spectral counts for 21 AAA samples and 30 AAB samples
Fig. 4 Workflow for the identification of allele-specific proteins when resources
are available to generate mRNA seq libraries. Example of an ABB hybrid
8. Construct a volcano plot (see Note 4).

9. Select candidate peptides.
10. Confirm the peptide identification at a minimum confidence
level of 95% (see Note 5).
11. BLAST the peptide against all your libraries.
12. Eliminate all peptides that have a 100% match with more than
one gene locus and/or library (see Note 6).
13. Upload your BAM files and the reference genome to visualize
the read count per allele [24].
14. Confirm by read counting in the Integrative Genomics Viewer (IGV)
that the peptide is unique for the genotype, confirm the
different alleles, and quantify the allele expression (see Note 7).
15. Submit all the amino acid substitutions in the protein to PANTHER [20]
(http://pantherdb.org/tools/csnpScoreForm.jsp) to judge their
impact.
16. Submit all the differential proteins to PANTHER [20] for a GO
enrichment.
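The allele quantification of step 14 can be illustrated with a small Python sketch. It assumes that the reads supporting each allele at a SNP position have already been counted (for example, read off in IGV); the counts below are hypothetical. Comparing the observed fractions with the fractions expected from subgenome dosage alone (one A copy and two B copies in an ABB hybrid) is a simplifying assumption introduced here for illustration, not a procedure prescribed by this chapter.

```python
def allele_fractions(read_counts):
    """Convert per-allele read counts at one SNP into expression fractions."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads cover this position")
    return {allele: n / total for allele, n in read_counts.items()}


def dosage_expectation(genome_formula):
    """Fractions expected if every subgenome copy were expressed equally, e.g. 'ABB'."""
    return {g: genome_formula.count(g) / len(genome_formula)
            for g in set(genome_formula)}


def expression_bias(read_counts, genome_formula):
    """Observed minus expected fraction per subgenome (positive = overrepresented)."""
    observed = allele_fractions(read_counts)
    expected = dosage_expectation(genome_formula)
    return {g: observed.get(g, 0.0) - expected[g] for g in expected}


# Hypothetical read counts for the A and B alleles at one SNP in an ABB hybrid.
counts = {"A": 10, "B": 50}
bias = expression_bias(counts, "ABB")
for genome in sorted(bias):
    print(genome, round(bias[genome], 3))
```

A bias close to zero would suggest expression proportional to gene dosage, while a strongly positive or negative value would point to preferential expression or silencing of one homeolog in the sampled tissue.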
3 Notes
1. It is preferred to eliminate duplicate proteins because the number
of proteins has an influence on the search statistics and the
decoy database.
2. It is preferred to quantify the peptides via label-free MSMS
because it is cheaper and because in this way the peptides are
quantified at the MS level. Consequently, interesting allele-
specific peptides can be flagged even if they were of lower
abundance and not selected for MSMS. Alternatively, a faster
hybrid MS can be applied that can perform MS and MSMS in
parallel.
3. Alternatively, spectral counting can be applied though less
powerful (see Tables 1 and 2, Fig. 3).
4. Volcano plots are commonly used to display the results of
omics experiments. A volcano plot is a type of scatterplot that
shows statistical significance (P value) versus magnitude of
change (fold change). Volcano plots can be made in many software
packages such as Microsoft Excel or RStudio. Since we are
looking here for proteins that are unique to a certain
allele, the peptide should not be detected in the genotype
that lacks the allele. Theoretically the fold difference is
then infinite. However, in practice a false-positive (low) abundance
value can occur due to mismatching between LC runs.
To avoid too many false-negative results, the cut-off value for the
fold change can be set lower than infinity. The minimum
recommended cut-off value for the ANOVA P value corrected for
multiple testing (e.g., Benjamini–Hochberg correction [25]) is 0.01.
5. Some of the statistically interesting peptides (unique for a
genotype class) might not result in a hit if the exact peptide
sequence is not present in the database. If the spectrum is of
good quality, an error-tolerant or cross-species search,
de novo sequencing, or spectral clustering [26] can
be used to identify the peptide.
6. In this BLAST search you can find paralogous genes within the
same library/genome and homologous genes in the other
mRNA libraries.
7. Realize that the quantification of the differential peptides in
proteomics is only possible via absolute quantification with a
peptide standard.
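The candidate selection described in Note 4 can be sketched as follows. This is a minimal Python illustration, not the software used by the author: the Benjamini–Hochberg adjustment is written out by hand, the peptide names, abundances, and P values are invented, and both the pseudo-count of 1 (which keeps the fold change finite when a peptide is absent from one genotype) and the log2 fold-change threshold of 3 are assumptions chosen for the example. Only the 0.01 cut-off for the corrected P value comes from the note.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value to the smallest
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_candidates(peptides, alpha=0.01, min_abs_log2_fc=3.0, pseudo=1.0):
    """peptides: list of (name, mean_genotype_1, mean_genotype_2, p_value)."""
    adjusted = benjamini_hochberg([p for _, _, _, p in peptides])
    selected = []
    for (name, mean_1, mean_2, _), q in zip(peptides, adjusted):
        # The pseudo-count avoids an infinite fold change for absent peptides.
        log2_fc = math.log2((mean_2 + pseudo) / (mean_1 + pseudo))
        if q <= alpha and abs(log2_fc) >= min_abs_log2_fc:
            selected.append(name)
    return selected


# Hypothetical peptide abundances (AAA mean, AAB mean) and ANOVA P values.
example = [
    ("pep_B_specific", 0.0, 5000.0, 0.0001),
    ("pep_shared", 4000.0, 4200.0, 0.6),
    ("pep_small_change", 100.0, 150.0, 0.002),
]
print(select_candidates(example))
```

Peptides that pass both cut-offs correspond to the upper outer regions of the volcano plot; lowering `min_abs_log2_fc` trades false negatives for false positives, as discussed in the note.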
Acknowledgments
Jelle van Wezemael, Nadia Campos, Farhana Bhuiyan, and Kusay
Arat are gratefully acknowledged for technical assistance.
References
1. Barker MS, Arrigo N, Baniaga AE et al (2016) On the relative abundance of autopolyploids and allopolyploids. New Phytol 210:391–398
2. Adams KL, Cronn R, Percifield R, Wendel JF et al (2003) Genes duplicated by polyploidy show unequal contributions to the transcriptome and organ-specific reciprocal silencing. Proc Natl Acad Sci U S A 100:4649–4654
3. Carpentier SC, Panis B, Renaut J et al (2011) The use of 2D-electrophoresis and de novo sequencing to characterize inter- and intra-cultivar protein polymorphisms in an allopolyploid crop. Phytochemistry 72:1243–1250
4. Vanhove A-C, Vermaelen W, Swennen R et al (2015) A look behind the screens: characterization of the HSP70 family during osmotic stress in a non-model crop. J Proteome 119:10–20
5. Wesemael J, Hueber Y, Kissel E et al (2018) Homeolog expression analysis in an allotriploid non-model crop via integration of transcriptomics and proteomics. Sci Rep 8:1353
6. Zivy M, Wienkoop S, Renaut J et al (2015) The quest for tolerant varieties: the importance of integrating “omics” techniques to phenotyping. Front Plant Sci 6:448
7. Page JT, Gingle AR, Udall JA (2013) PolyCat: a resource for genome categorization of sequencing reads from allopolyploid organisms. G3 (Bethesda) 3:517–525
8. Khan A, Belfield EJ, Harberd NP (2016) HANDS2: accurate assignment of homoeallelic base-identity in allopolyploids despite missing data. Sci Rep 6:29234
9. Samyn B, Sergeant K, Carpentier S et al (2007) Functional proteome analysis of the banana plant (Musa spp.) using de novo sequence analysis of derivatized peptides. J Proteome Res 6:70–80
10. Carpentier SC (2016) 2-D PAGE map analysis. Springer, Berlin, pp 215–235
11. Carpentier SC, America T (2014) Plant proteomics. Springer, Cham, pp 333–346
12. Soltis DE, Misra BB, Shan S et al (2016) Polyploidy and the proteome. Biochim Biophys Acta 1864:896–907
13. Campos NA, Swennen R, Carpentier SC (2018) The plantain proteome, a focus on allele specific proteins obtained from plantain fruits. Proteomics 18:1700227
14. Koh J, Chen S, Zhu N et al (2012) Comparative proteomics of the recently and recurrently formed natural allopolyploid Tragopogon mirus (Asteraceae) and its parents. New Phytol 196:292–305
15. Hu G, Koh J, Yoo M-J et al (2014) Proteomics profiling of fiber development and domestication in upland cotton (Gossypium hirsutum L.). Planta 240:1237–1251
16. D’hont A, Denoeud F, Aury J-M et al (2012) The banana (Musa acuminata) genome and the evolution of monocotyledonous plants. Nature 488:213
17. Martin G, Baurens F-C, Droc G et al (2016) Improvement of the banana “Musa acuminata” reference sequence using NGS data and semi-automated bioinformatics methods. BMC Genomics 17:243
18. Davey MW, Gudimella R, Harikrishna JA et al (2013) A draft Musa balbisiana genome sequence for molecular genetics in polyploid, inter- and intra-specific Musa hybrids. BMC Genomics 14:683
19. Li W, Godzik A (2006) Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22:1658–1659
20. Thomas PD, Campbell MJ, Kejariwal A et al (2003) PANTHER: a library of protein families and subfamilies indexed by function. Genome Res 13:2129–2141
21. Dobin A, Davis CA, Schlesinger F et al (2013) STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29:15–21
22. Li H, Handsaker B, Wysoker A et al (2009) The sequence alignment/map format and SAMtools. Bioinformatics 25:2078–2079
23. Li H (2011) A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27:2987–2993
24. Thorvaldsdóttir H, Robinson JT, Mesirov JP (2012) Integrative genomics viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinform 14:178–192
25. Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Royal Stat Soc B 57:289–300
26. Johansson P, Alm R, Emanuelsson C et al (2006) SPECLUST: a web tool for clustering of mass spectra. J Proteome Res 5(4):785–792
Chapter 23
Methods for Optimization of Protein Extraction and Proteogenomic Mapping in Sweet Potato
Thualfeqar Al-Mohanna, Norbert T. Bokros, Nagib Ahsan,
George V. Popescu, and Sorina C. Popescu
Abstract
The complexity in chemical composition alongside the genomic complexity of crop plants poses significant
challenges for the characterization of their proteomes. This chapter provides specific methods that can be
used for the extraction and identification of proteins from sweet potato, and a proteogenomic method for
the subsequent peptide mapping on the haplotype-derived sweet potato genome assembly. We outline two
basic methods for extracting proteins expressed in root and leaf tissues for the label-free quantitative
proteomics—one phenol-based procedure and one polyethylene glycol (PEG) 4000-based fractionation
method—and discuss strategies for the organ-specific protein extraction and increased recovery of
low-abundance proteins. Next, we describe computational methods for improved proteome annotation
of sweet potato based on aggregated genomics and transcriptomics resources available in our and public
databases. Lastly, we describe an easily customizable proteogenomics approach for mapping sweet potato
peptides back to their genome location and exemplify its use in improving genome annotations using a mass
spectrometry data set.
Key words Sweet potato proteomics, Phenol protein extraction, Polyethylene glycol protein extrac-
tion, Proteogenomics analysis
1 Introduction
Sweet potato (Ipomoea batatas (L.) Lam) is an important global
commercial crop from the morning glory family (Convolvulaceae)
[1]. Worldwide, sweet potato is the sixth most important food crop
following rice, wheat, potato, maize, and cassava. The sweet potato
has a complex genome, being hexaploid (2n = 6x = 90) and highly
polymorphic. As a consequence, sweet potato genome sequencing,
assembly, and annotation has progressed slowly and convolutedly
over the past 10 years. At present, only a sparse collection of
genomic resources and annotations exists for sweet potato; this
includes a haplotype-resolved assembly of I. batatas genome [2],
assembly and gene annotations of progenitor genomes (I. trifida
and I. triloba) [3, 4], the proteome annotations of Ipomoea nil [5], and
transcriptomics datasets available in public databases [6, 7]. Using a
new technique of haplotype-resolved genome assembly, Yang et al.
[2] de novo sequenced and assembled the hexaploid I. batatas
without the guidance of wild related diploid genomes. The study
examined the probable evolutionary history of the cultivated sweet
potato. It found that the sweet potato genome contains two B1 and four
B2 component genomes (B1B1B2B2B2B2) and was proposed to
have resulted from an initial crossing between a tetraploid ancestor
and a diploid progenitor, followed by a whole genome duplication
event [2].
Current extraction methodologies and present technologies
enable analysis of almost complete proteomes in humans and ani-
mals [8]. In plant systems, although discovery and comparative
proteomics approaches have accelerated the pace of breakthroughs
in experimental and crop plants, significant challenges remain. The
depth to which proteomes are being explored and analyzed, and
the means of enhancing the confidence level of protein identification,
continue to be important issues in this field [8, 9]. Consequently,
approaches for protein extraction and separation from
tissues are constantly evaluated for performance on parameters
such as protein extraction and detection, accurate quantification,
post-extraction artifacts [10], and amenability to combinatorial
utilization or multiplexing for improving the extraction of low-
abundance proteins and increased throughput [11]. In addition
to these generic challenges, additional hurdles in obtaining protein
preparations of sufficient quality and quantity for mass spectrome-
try have been described for sweet potato, such as low-protein
content in storage roots and high abundance of secondary meta-
bolites in leaves.
Given the rapid advances in genomics and proteomics technol-
ogies, proteogenomic methods are being developed to take advan-
tage of available genomics resources for resolving proteomes as well
as to improve annotations of complex genomes [12, 13]. A proteogenomic
analysis of Arabidopsis thaliana uncovered evidence
that 13% of the Arabidopsis proteome was incomplete due to
missing and incorrect gene models, and it led to the identification of
778 new protein-coding genes and the refined annotation of 695 gene models
[14]. For complex plant genomes with high ploidy, such as sweet
potato, proteogenomics tools can significantly improve genome
annotations while providing a basis for improved peptide
identification.
Recently we have performed a root and leaf LC-MS/MS anal-
ysis and identified 4321 nonredundant proteins from sweet potato
[15]. In this chapter, we describe two methods for protein extrac-
tion and solubilization using sweet potato leaves and tuberous
roots and a proteogenomic method to identify the composition of
leaf and root proteomes using a high-throughput label-free
methodology.
Optimization of Protein Identification for Label-Free Quantitative. . . 311
2 Materials
2.1 Materials for Phenol Procedure, Method 1 (M1)
1. Acetone.
2. Ammonium acetate.
3. β-Mercaptoethanol.
4. Disodium EDTA.
5. Ethanol.
6. 8-Hydroxyquinoline.
7. Hydrochloric acid.
8. Methanol.
9. Small ball bearing.
10. Sodium hydroxide (NaOH).
11. Sucrose.
12. Tris base and Tris–HCl.
2.1.1 Stock Solutions
1. Make 50 mL of 6 M sodium hydroxide:
(a) Add 12 g of NaOH to 40 mL of Milli-Q H2O and stir
until completely dissolved.
(b) Bring the volume to 50 mL with Milli-Q H2O and store it
at RT.
2. Make 50 mL of 0.2 M EDTA, pH 8.0:
(a) Add 1.86 g disodium EDTA to 30 mL of Milli-Q H2O
and stir to dissolve the EDTA.
(b) Add 1 mL of 6 M NaOH and monitor the pH until it reaches
8.0.
(c) Bring the volume to 50 mL with Milli-Q H2O and store it
at RT.
3. Make 500 mL of Extraction buffer, pH 8.0.
(a) Add 154 g of sucrose (final concentration 0.9 M) and
6.0 g of Tris base (final concentration 100 mM) to
350 mL of Milli-Q H2O.
(b) Add 25 mL of 0.2 M EDTA stock (final concentration
10 mM) and stir vigorously until dissolving completely.
(c) Adjust the pH to 8.0 by using 6 M HCl.
(d) Bring the volume to 500 mL with Milli-Q H2O.
(e) Filter sterilize and store it at 4 °C.
4. Make 500 mL of Precipitation Solution:
(a) Add 3.85 g of ammonium acetate to 400 mL of methanol
(purity >99%) and stir vigorously until completely
dissolved.
(b) Bring the volume to 500 mL with high-purity methanol.
(c) Store it at −20 °C.
5. Make 500 mL of phenol saturated buffer, pH 8.0:
(a) Mix 500 mL of phenol and 500 mL of Tris–HCl, pH 8.0,
in a dark glass bottle.
(b) Stir the mixture vigorously for 20 min and incubate it at
4 °C for 2 h; then, remove the Tris buffer in the upper
phase by using a vacuum aspirator in the fume hood.
(c) Add 500 mL of fresh Tris–HCl buffer, pH 8.0.
(d) Repeat step b.
(e) Add 0.5 g of 8-hydroxyquinoline to a final concentration
of 0.1% and stir the solution vigorously for 20 min.
(f) Repeat steps c and d.
(g) Check the pH to ensure that it is 8.0. Otherwise, repeat
step f.
(h) Store it at 4 °C.
6. Ball bearings (BBs).
(a) Rinse BBs with 3 volumes of phenol saturated buffer,
pH 8.0 for 10 min at RT with inversion, and decant the
solution.
(b) Add 3 volumes of precipitation solution for 5 min at RT
with inversion and decant the solution.
(c) Wash 3 times with 100% methanol and dry them in an oven
at 65 °C on a small tray in a single layer.
(d) Store them in sterile falcon tubes.
(e) These ball bearings can be reused if they are not rusty.
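The reagent masses given for the Method 1 stock solutions can be checked with a short calculation (mass = molar mass × molarity × volume). The Python sketch below is an illustration only. The molar masses are standard values, and the hydrate form assumed for disodium EDTA (the dihydrate, 372.24 g/mol) is an assumption, since the protocol does not specify it.

```python
# Molar masses in g/mol (standard values; the EDTA hydrate form is an assumption).
MOLAR_MASS = {
    "NaOH": 40.00,
    "Na2EDTA_dihydrate": 372.24,
    "sucrose": 342.30,
    "Tris_base": 121.14,
    "ammonium_acetate": 77.08,
}


def grams_needed(compound, molarity, volume_ml):
    """Mass of solid required for a given molarity (mol/L) and volume (mL)."""
    return MOLAR_MASS[compound] * molarity * volume_ml / 1000.0


def molarity_from_mass(compound, grams, volume_ml):
    """Molarity (mol/L) obtained when a given mass is made up to a given volume."""
    return grams / MOLAR_MASS[compound] / (volume_ml / 1000.0)


print(round(grams_needed("NaOH", 6.0, 50), 2))        # 6 M NaOH, 50 mL
print(round(grams_needed("sucrose", 0.9, 500), 1))    # 0.9 M sucrose, 500 mL
print(round(grams_needed("Tris_base", 0.1, 500), 2))  # 100 mM Tris base, 500 mL
print(round(molarity_from_mass("ammonium_acetate", 3.85, 500), 3))
print(round(molarity_from_mass("Na2EDTA_dihydrate", 1.86, 50), 3))
```

The NaOH, sucrose, and Tris values reproduce the masses listed above, and 3.85 g of ammonium acetate in 500 mL corresponds to about 0.1 M. Under the dihydrate assumption, 1.86 g of disodium EDTA in 50 mL works out to about 0.1 M, whereas a 0.2 M stock would require about 3.72 g, so it is worth verifying this value against the formula weight of the reagent in hand.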
2.2 Materials for Polyethylene Glycol (PEG) 4000 Procedure, Method 2 (M2)
1. β-Mercaptoethanol.
2. Ethanol.
3. Magnesium chloride.
4. NP-40 detergent.
5. Phenylmethylsulfonyl fluoride (PMSF).
6. Polyethylene glycol (PEG) 4000.
7. Sodium hydroxide (NaOH).
8. Sucrose.
9. Tris–HCl.
2.2.1 Stock Solutions
1. Make 50 mL of 6 M NaOH as described above.
2. Make 500 mL of extraction buffer I:
(a) Add 15.35 g of Tris–HCl and 18.50 g of Tris base (0.5 M
Tris buffer) to 350 mL of Milli-Q H2O and stir until
completely dissolved.
(b) Add 10 mL of NP-40 (2% v/v) to the Tris buffer.
(c) Dissolve 87.10 mg of PMSF (1 mM) in acetonitrile; then,
transfer it to the Tris buffer.
(d) Add 0.95 g of MgCl2 (20 mM) to the buffer and stir
until completely dissolved.
(e) Add 10 mL of β-mercaptoethanol (2% v/v) to the buffer.
(f) Adjust the pH to 8.3 if necessary (the pH of the Tris buffer is
supposed to be 8.3).
(g) Bring the volume to 500 mL with Milli-Q H2O.
3. Make 250 mL of extraction buffer II:
(a) Add 7.68 g of Tris–HCl and 9.25 g of Tris base (0.5 M
Tris buffer) to 150 mL of Milli-Q H2O and stir until
completely dissolved.
(b) Add 5 mL of NP-40 (2% v/v) to the Tris buffer.
(c) Dissolve 43.55 mg of PMSF (1 mM) in acetonitrile; then,
transfer it to the Tris buffer.
(d) Add 0.48 g of MgCl2 (20 mM) to the buffer and stir
until completely dissolved.
(e) Add 59.90 g of sucrose (0.7 M) to the buffer.
(f) Add 5 mL of β-mercaptoethanol (2% v/v) to the buffer.
(g) Adjust the pH to 8.3 if necessary.
(h) Bring the volume to 250 mL with Milli-Q H2O.
4. Make 200 mL of 50% PEG 4000:
(a) Add 100 g of PEG 4000 to 70 mL of Milli-Q H2O.
(b) Stir the solution vigorously until completely mixed.
(c) Bring the volume to 200 mL with Milli-Q H2O.
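The statement that the Tris buffer of extraction buffers I and II "is supposed to be" pH 8.3 can be rationalized with the Henderson–Hasselbalch equation, pH = pKa + log10([base]/[acid]). The sketch below is a rough check under stated assumptions: a Tris pKa of about 8.07 at 25 °C, molar masses of 121.14 g/mol (Tris base) and 157.60 g/mol (Tris–HCl), and ideal behavior that ignores ionic strength, temperature, and the other buffer components.

```python
import math

TRIS_BASE_MW = 121.14  # g/mol
TRIS_HCL_MW = 157.60   # g/mol
TRIS_PKA_25C = 8.07    # approximate value at 25 degrees C (assumption)


def tris_buffer(tris_hcl_g, tris_base_g, volume_ml):
    """Return (total Tris molarity, estimated pH) for a Tris-HCl / Tris base mixture."""
    acid_mol = tris_hcl_g / TRIS_HCL_MW
    base_mol = tris_base_g / TRIS_BASE_MW
    total_molar = (acid_mol + base_mol) / (volume_ml / 1000.0)
    ph = TRIS_PKA_25C + math.log10(base_mol / acid_mol)
    return total_molar, ph


for label, args in [("buffer I", (15.35, 18.50, 500)), ("buffer II", (7.68, 9.25, 250))]:
    molarity, ph = tris_buffer(*args)
    print(label, round(molarity, 2), "M, estimated pH", round(ph, 2))
```

Both recipes come out at about 0.5 M total Tris with an estimated pH close to 8.3, which is consistent with the protocol's remark that adjustment is needed only "if necessary". Because the pKa of Tris drops noticeably with rising temperature, the pH should still be measured at the working temperature.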
3 Methods
3.1 Overview of the Protein Extraction and Optimization Methodology
In this section, we describe two protein extraction methods and the
optimized protocols for sweet potato leaf and root tissue processing.
The pipeline for sample processing and protein extraction and
solubilization is presented in Fig. 1a. Two distinct protocols were
optimized for the extraction of proteins from lyophilized sweet
potato tissue, a phenol-based Method 1 (M1) and polyethylene
glycol (PEG) 4000 fractionation-based method (M2) [16]. Briefly,
Fig. 1 Methodology for sweet potato protein extraction and identification. (a) Workflow describing the
methodology for tissue processing, the phenol-based procedure (M1) and PEG fractionation (M2). (b and c)
Representative two-dimensional SDS-PAGE of total protein preparations obtained from sweet potato roots.
Total protein extracted using the phenol-based procedure is shown in (b), and total protein extracted using the
PEG 4000 procedure is shown in (c). Molecular weights (MW, kDa) of proteins in the marker lanes are listed on
the left. (d) Phenol-based procedure for the extraction of proteins from root and leaves showing the upper
phase following phenol extraction
in M1, proteins are extracted with a Phenol Extraction buffer,
followed by ammonium acetate–based precipitation, acetone wash
of the pellet, and resuspension in ethanol for storage. In M2,
proteins are extracted with NP-40 Extraction buffer, mixed with
PEG 4000 up to 15% final concentration, precipitated with acetone
(4 volumes of acetone for leaf proteins and 2 volumes for
root proteins), followed by resuspension in ethanol for storage.
Unlike organic solvents such as phenol, polyethylene glycols
do not generally denature proteins [17, 18]. Examples of total
protein preparations obtained from sweet potato roots using
phenol and PEG 4000 are shown in representative 2D gels (Fig. 1b
and c). In our hands, the phenol procedure extracted more protein
from both storage roots and leaves than the PEG-based
method. However, cumulatively, the two methods extracted a
larger diversity of protein classes.
3.1.1 Tissue Collection
Sweet potato leaves and storage roots are utilized for protein
extraction. Leaf samples harvested from plants in the vegetative
stage yield more protein than leaves from mature plants. Storage
roots are collected from fully developed mature plants. Following
collection, samples are immediately frozen in liquid nitrogen
and stored at −80 °C until further processing. Frozen tissues can be
lyophilized or ground to a powder in liquid nitrogen using mortar
and pestle.
3.1.2 Protein Extraction Using Phenol Procedure (M1)
In this section, we describe a phenol-based method for extraction and
solubilization of proteins from sweet potato root and leaf tissue
(see Notes 1–4).
1. Add 4 μL of β-mercaptoethanol per mL of extraction
buffer to a final concentration of 0.4%.
2. Weigh 200 mg of powdered sample into a 2 mL tube and add
3 BBs.
3. Add 750 μL of phenol saturated buffer, pH 8, and 750 μL of
extraction buffer, and vortex vigorously for 1 h at RT until
homogeneous.
4. Centrifuge the sample at 15,550 × g for 10 min at 4 °C. The
results for root and leaf are shown in Fig. 1d.
5. Transfer around 330 μL of the upper phase to a screw-top
2 mL tube.
6. Add 5 volumes (1650 μL) of ice-cold precipitation solution
(stored at −20 °C), add 3 BBs, and vortex the sample
briefly to mix it [5].
7. Store the sample at −20 °C overnight (16 h) to precipitate the
proteins.
8. Invert the sample containing the precipitated proteins to mix it.
9. Centrifuge at 3,312 × g for 10 min at RT.
10. Decant the supernatant and collect the pellets using a magnetic
stand.
11. Add 1 mL of ice-cold precipitation solution and vortex
vigorously.
12. Centrifuge the sample at 15,550 × g for 5 min at RT.
13. Repeat step 10.
14. Wash the pellets with 1 mL 80% ice-cold acetone (stored at
−20 °C) and vortex vigorously.
16. Wash the pellets with 1 mL 70% ice-cold ethanol (stored at
17. Repeat step 15.
18. Resuspend the pellets in 300 μL 70% ethanol and store them at
−20 °C.
3.1.3 Protein Extraction Using Polyethylene Glycol 4000 Procedure (M2)
In this section, we describe a PEG 4000-based method for extraction
and solubilization of proteins from sweet potato root and leaf tissue.
1. Weigh 500 mg of powdered sample into a 15 mL tube.
2. Add 5 mL of extraction buffer I and vortex vigorously for
15 min at RT until the mixture is homogeneous.
3. Centrifuge the sample at 828 × g for 15 min at 4 °C.
(a) Supernatant: keep it for step 4.
(b) Pellets: add 5 mL of extraction buffer II, and repeat
steps 2 and 3.
4. Filter the supernatant through a 2.0 μm filter to remove any
impurities or insoluble residues.
5. Add 50% PEG 4000 to the supernatant to a final concentration
of 15% and mix until the solution is homogeneous.
6. Incubate the sample on ice for 30 min.
7. Centrifuge the sample at 13,250 × g for 15 min at 4 °C and
keep the supernatant for the next step [6, 7].
8. Add cold acetone (stored at −20 °C) to precipitate the
proteins: 2 volumes for sweet potato root and 4 volumes for
sweet potato leaves.
9. Incubate the sample at −20 °C for 30 min.
10. Centrifuge at 15,550 × g for 5 min [8].
11. Keep the pellets for the next step.
12. Wash the pellets with 1 mL 80% ice-cold acetone (stored at
13. Centrifuge the sample at 15,550 × g for 5 min at RT.
14. Decant the supernatant and collect the pellets.
15. Wash the pellets with 1 mL 70% ice-cold ethanol (stored at
17. Resuspend the pellets in 500 μL of 70% ethanol and store them at
−20 °C.
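Step 5 requires a dilution calculation, since the added PEG stock increases the total volume. A minimal Python sketch follows, assuming the stock and final concentrations are expressed in the same units; the helper names and the example volumes are hypothetical. The relation used is C_stock × V_stock = C_final × (V_sample + V_stock).

```python
def stock_volume_to_add(sample_ml, stock_pct=50.0, final_pct=15.0):
    """Volume of stock (mL) to add so that the mixture reaches final_pct."""
    if not 0 < final_pct < stock_pct:
        raise ValueError("final_pct must lie between 0 and stock_pct")
    return sample_ml * final_pct / (stock_pct - final_pct)


# Acetone volumes per volume of supernatant (step 8)
ACETONE_VOLUMES = {"root": 2, "leaf": 4}


def acetone_volume(supernatant_ml, tissue):
    """Volume of cold acetone (mL) for root or leaf supernatant."""
    return supernatant_ml * ACETONE_VOLUMES[tissue]


peg_ml = stock_volume_to_add(5.0)       # about 2.14 mL for 5 mL of sample
leaf_acetone_ml = acetone_volume(1.0, "leaf")
```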
Fig. 2 Workflow for peptide search and identification by LC-MS/MS. A concatenated database containing
“target” and “decoy” sequences was employed to estimate the false discovery rate (FDR) [20]. Peptide
assignments from the database search were filtered down to a 1% FDR by a logistic spectral score as
previously described [20, 21]
3.2 LC-MS/MS and Peptide Identification
Purified sweet potato protein preparations were analyzed by
LC-MS/MS. The LC-MS/MS analysis was performed as described
in [19] and outlined in Fig. 2. Briefly, the peptides were separated
by a linear reversed-phase gradient on a C18 column.
Survey full-scan MS spectra (m/z 400–1800) were acquired at a
resolution of 17,500. MS/MS spectra were searched against the
NCBI Ipomoea taxon (txid4119) protein dataset containing 58,282
proteins (NCBI; downloaded 2/12/2018) using MASCOT v. 2.4
(Matrix Science, Ltd., London, UK). The resulting peptide-
spectrum matches (PSMs) were reduced to sets of unique PSMs by
eliminating lower-scoring duplicates. To provide high-confidence
data, the MASCOT results were filtered for Mowse score (>20).
Peptide assignments from the database search were then filtered
down to a 1% FDR as previously described [19, 20].
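The target-decoy filtering step can be illustrated with a short Python sketch. It is not the logistic spectral score procedure cited above: it assumes that each PSM carries a single numeric score, estimates the FDR as decoys divided by targets at or above a cutoff (one common estimator), and ignores tied scores. All names are illustrative.

```python
def cutoff_for_fdr(psms, max_fdr=0.01):
    """Return the lowest score cutoff whose estimated FDR is <= max_fdr.

    psms: iterable of (score, is_decoy) pairs from a concatenated
    target/decoy search. Returns None if no cutoff qualifies.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    targets = decoys = 0
    cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # FDR estimate for all PSMs scoring at least `score`
        if targets and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


def accepted_targets(psms, cutoff):
    """Scores of target PSMs at or above the cutoff."""
    return [s for s, is_decoy in psms if not is_decoy and s >= cutoff]
```

With real data, `psms` would be built from the search engine output, flagging every match to a decoy sequence as `is_decoy=True`.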
Fig. 3 Workflow for the proteogenomic analysis of sweet potato
3.3 Proteogenomic Analysis Workflow
Unique peptides from the LC-MS/MS analysis were queried using
tBLASTn against either the haplotype-resolved sweet potato
genome or the transcriptome available at
http://public-genomes-ngs.molgen.mpg.de/SweetPotato/DOWNLOADS/.
To implement a new classification of BLAST search results that
allows peptides to be mapped to existing genome annotations while
also allowing the discovery of novel peptides, composition-based
filtering was turned off, the word size was decreased to 2, and the
e-value cutoff was raised to 1000 to increase the number of
potential hits returned for each query peptide [22]; only the top
50 hits per query were reported to limit the size of the output
files. tBLASTn results were next parsed to identify perfect and
imperfect matches along the sweet potato genome and transcriptome.
Perfect matches were defined as hits in which the peptide matched
along its entire length with no mismatches or gaps. Imperfect
matches were defined as hits that mapped along >90% of the length
of the query peptide with a sequence identity >80% (this category
excluded peptides already classified as perfect matches). Multiple
hits were allowed for individual peptides, as they were expected
given the large number of estimated genes, the numerous predicted
genome duplication events, and the high ploidy of sweet potato.
The proteogenomics analysis workflow is outlined in Fig. 3.
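The match definitions above translate into a small decision rule. The Python sketch below is one possible reading of them, assuming that coverage is the number of aligned query residues divided by the peptide length; the function names and the "none" label are illustrative.

```python
def classify_hit(query_len, aligned_len, mismatches, gaps, pident):
    """Classify a single tBLASTn hit as 'perfect', 'imperfect' or 'none'."""
    if aligned_len == query_len and mismatches == 0 and gaps == 0:
        return "perfect"
    coverage = aligned_len / query_len
    if coverage > 0.90 and pident > 80.0:
        return "imperfect"
    return "none"


def classify_peptide(hit_classes):
    """Summarize all hits of one peptide; a perfect hit takes precedence,
    so peptides with a perfect match are never counted as imperfect."""
    if "perfect" in hit_classes:
        return "perfect"
    if "imperfect" in hit_classes:
        return "imperfect"
    return "none"
```

For example, a 20-residue peptide aligned over 19 residues with one mismatch (95% coverage, 94.7% identity) would be reported as imperfect.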
3.4 Proteogenomic Analysis Method (See Note 5)
Here we describe the computational methods needed to analyze the
peptides identified in the proteomics screen. The method assumes that
genome and transcriptome annotations are available for matching
and validating identified peptides. The method allows for the
discovery and improvement of genomic annotations in the target genome.
3.4.1 Data Input Processing for Proteogenomic Analysis
1. Obtain the MASCOT output files for the individual tissues,
extraction methods, and replicates.
2. Parse the uniquely identified peptides from all tissues, extraction
methods, and replicates, remove feature annotations, and pool
them into a FASTA format query file.
(a) Read MASCOT output files into R using the “read.xlsx”
function in the “openxlsx” package.
(b) Parse unique peptides from the imported MASCOT files using
the R code: “all_peptides <- unique(c(file1$peptide_column,
file2$peptide_column, filex$peptide_column))”.
(c) Use the “str_replace_all” function of the “stringr” package
to remove nonalphanumeric characters.
(d) Format the unique peptides from target datasets into
FASTA format using the “write.fasta” function in the
“seqinr” package.
Example FASTA formatted file for use in successive steps:
>SWPT_1
MGKGPGLYTDIGKK
>SWPT_2
KKKPVTVSYNGEDKPGFLKK
>SWPT_3
MTLGAGGSSVVVPRN
>SWPT_4
MASLLLPGGRT
...
Alternatively, MASCOT output files can be parsed using Excel.
3. Download a FASTA formatted genome for a target species of
interest:
(a) Obtain the sweet potato genome from
http://public-genomes-ngs.molgen.mpg.de/SweetPotato/DOWNLOADS/.
(b) Download the sweet potato transcriptome datasets from
the same site.
Other transcriptome annotations may be available at the
following sites:
GT4SP: http://sweetpotato.plantbiology.msu.edu/index.shtml
SRA: https://www.ncbi.nlm.nih.gov/sra
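For readers who prefer Python to R, step 2 can be sketched with the standard library alone. This is an illustrative equivalent rather than the authors' script: the peptide lists are assumed to have been read from the MASCOT exports already, and the "SWPT_" prefix simply mirrors the example file above.

```python
import re


def clean_peptide(raw):
    """Remove nonalphanumeric characters (e.g., modification marks)."""
    return re.sub(r"[^A-Za-z0-9]", "", raw).upper()


def pool_unique(peptide_lists):
    """Pool peptides from several result sets, keeping first-seen order."""
    seen = {}
    for peptides in peptide_lists:
        for raw in peptides:
            peptide = clean_peptide(raw)
            if peptide:
                seen.setdefault(peptide, None)
    return list(seen)


def to_fasta(peptides, prefix="SWPT_"):
    """Format peptides as FASTA records numbered from 1."""
    return "".join(
        f">{prefix}{i}\n{pep}\n" for i, pep in enumerate(peptides, start=1)
    )
```

The returned string can be written to a file and used directly as the tBLASTn query.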
3.4.2 Blast Peptides against Genome and Transcriptome Annotations
1. Build a nucleotide database from the genome using the
“makeblastdb” function of BLAST 2.2.31.
2. Run tBLASTn to map peptides against the genome using the
following parameters:
(a) db: nucleotide database created in step 1,
(b) query: peptide query file created in step 2 of Subheading 3.4.1,
(c) word_size: 2,
(d) outfmt: 6,
(e) comp_based_stats: F,
(f) evalue: 1000,
(g) max_target_seqs: 50.
3. Repeat steps 1 and 2 for the transcriptome of the target species.
4. Read the BLAST output files into R to generate the peptide list
mapping:
(a) Append the original query peptide sequences as a new
column for each hit using the base R “merge” function (or a
join function such as “left_join” from the “dplyr” package).
(b) Calculate the length of each peptide for each hit using the
“nchar” function.
(c) Calculate the coverage of each query peptide along a hit
within the genome as the length of the query hit divided by
the length of the full peptide.
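The steps above can be sketched in Python as follows. The argument lists use standard BLAST+ options and can be passed to `subprocess.run`; the file names are placeholders. The parser assumes the default 12-column layout of `-outfmt 6` and takes the length of the query hit as `qend - qstart + 1`, which is an interpretation of step 4(c).

```python
import csv
import io

COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def makeblastdb_args(fasta, db_name):
    """Command line for building a nucleotide BLAST database."""
    return ["makeblastdb", "-in", fasta, "-dbtype", "nucl", "-out", db_name]


def tblastn_args(db, query, out):
    """Command line for the relaxed peptide search described above."""
    return ["tblastn", "-db", db, "-query", query, "-out", out,
            "-word_size", "2", "-outfmt", "6", "-comp_based_stats", "F",
            "-evalue", "1000", "-max_target_seqs", "50"]


def read_hits(handle, peptides):
    """Parse tabular hits; peptides maps query IDs to peptide sequences."""
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue
        hit = dict(zip(COLUMNS, row))
        sequence = peptides[hit["qseqid"]]
        aligned = int(hit["qend"]) - int(hit["qstart"]) + 1
        hit["peptide"] = sequence
        hit["peptide_length"] = len(sequence)
        hit["coverage"] = aligned / len(sequence)
        hits.append(hit)
    return hits
```

A file handle from `open("genome_hits.tsv")` (hypothetical name) or an `io.StringIO` object can be passed as `handle`.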
3.4.3 Classify Peptides and Generate New Annotations (See Note 6)
1. Design a peptide hit classification scheme based on perfect or
imperfect matches within the genome or transcriptome and
generate the corresponding category lists. Use the following
definitions:
(a) Perfect match:
• Coverage: 100%.
• Mismatches: 0.
• Gaps: 0.
(b) Imperfect match:
• Peptide hit was not previously classified as a perfect
match.
• Coverage: >90%.
• Sequence identity: >80%.
2. Generate lists of classified peptide hits as follows:
(a) Perfect match on the genome and perfect match on the
transcriptome: these are the peptides that support the
current genome sequence and gene model annotations.
(b) Perfect match on the genome and imperfect match on the
transcriptome: this category includes short exon
extensions.
(c) Perfect match on the genome and no match on the
transcriptome: this category includes putative coding
sequences in intergenic regions, large exon extensions,
intron retentions, exon alternative ORFs, and 5′ and
3′ UTRs.
(d) Imperfect match on the genome and no match on the
transcriptome: this category represents a combination of
events, namely putative coding sequences in intergenic
regions, exon extensions, intron retentions, exon alternative
ORFs, and 5′ and 3′ UTRs combined with genome
assembly errors, single nucleotide polymorphisms
(SNPs), and indels.
(e) Perfect match on the transcriptome and imperfect match on
the genome: this includes SAAVs, indels, synonymous
SNPs, and genome assembly errors.
(f) Perfect match on the transcriptome and no match on the
genome: this includes alternative splice junctions, indels,
and genome assembly errors.
(g) Imperfect match on the transcriptome and imperfect match
on the genome: this includes SAAVs, indels, and combinations
of events including synonymous SNPs, alternative splice
junctions, genome assembly errors, and short exon
extensions.
(h) Imperfect match on the transcriptome and no match on the
genome: this includes combinations of events including
SNPs, alternative splice junctions, indels, and genome
assembly errors.
3. Generate annotations for the peptides classified above: analyze
the novel peptides in categories (b) to (h) from step 2 and classify
each peptide, using definitions from [13, 23], in one of the
following classes:
(a) Intergenic.
(b) Intron retention.
(c) Exon extension.
(d) 5′ UTR.
(e) 3′ UTR.
(f) Exon alternative ORFs.
(g) Alternative splicing event.
(h) SAAVs.
(i) Putative indels.
(j) Putative synonymous SNPs.
(k) Genome assembly error.
4. Generate BED format annotations corresponding to the novel
peptides classified in the previous step.
5. Visualize the genome, transcriptome, and peptide annotations
using IGV [24]. Superimpose the peptide BED format
annotations as tracks on the sweet potato genome and
transcriptome annotations.
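Steps 2 and 4 can be sketched in Python as a lookup table and a coordinate conversion. The category letters follow the list in step 2; the match labels ("perfect", "imperfect", "none") and function names are illustrative. The BED conversion assumes that tBLASTn reports 1-based subject coordinates and lists minus-strand hits with the start larger than the end.

```python
# (genome match, transcriptome match) -> category letter from step 2
CATEGORY = {
    ("perfect", "perfect"): "a",
    ("perfect", "imperfect"): "b",
    ("perfect", "none"): "c",
    ("imperfect", "none"): "d",
    ("imperfect", "perfect"): "e",
    ("none", "perfect"): "f",
    ("imperfect", "imperfect"): "g",
    ("none", "imperfect"): "h",
}


def category(genome_match, transcriptome_match):
    """Return the category letter, or None for unmatched peptides."""
    return CATEGORY.get((genome_match, transcriptome_match))


def bed_line(sseqid, sstart, send, name):
    """Convert 1-based hit coordinates to a six-column BED record."""
    strand = "+" if sstart <= send else "-"
    start = min(sstart, send) - 1   # BED starts are 0-based
    end = max(sstart, send)         # BED ends are exclusive
    return f"{sseqid}\t{start}\t{end}\t{name}\t0\t{strand}"
```

Writing one such line per novel peptide hit yields a track that IGV can load alongside the genome annotation.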
Fig. 4 Visualization of sweet potato genome, transcriptome, and peptide
annotation files in the Integrative Genomics Viewer
3.5 Novel Peptide Analysis and Validation
For well-annotated genomes, newly identified peptides in shotgun
proteomics are typically assigned to one of the following categories
[13, 23]: pseudogenes, lncRNA, intergenic region, exon extension,
intron retention, alternative splicing, alternative ORF, 5′ or 3′
UTR, and single amino acid variants (SAAV). However, the sweet
potato genome is only partially resolved, and the limited
transcriptomic data do not fully capture gene expression.
The peptides mapped in the previous step were therefore first
classified into eight categories based on their matches to existing
genome and transcriptome annotations. The novel peptides were then
assigned to one of the eleven classes and used to improve the genome
and transcriptome annotations. The peptides were visualized in the
Integrative Genomics Viewer (IGV) [24] together with genomic and
transcriptomic annotations (Fig. 4). Selected novel peptides can
be further validated using orthogonal validation methods as
in [23].
The protocols described herein provide a baseline set of tools
that facilitate streamlined extraction of proteins for mass spectrom-
etry applications and mapping of peptides to a target genome for
genome annotations through the use of proteogenomics. Proteo-
genomics analysis methods described here provide additional evi-
dence for currently annotated genes and transcripts and predict
novel ORFs and splice variants of annotated genes in complex
polyploid plants. While the optimized protein extraction and pro-
teogenomics methods described in this chapter were exemplified
for the analysis of sweet potato, they are easily customizable for
other plant proteomes and can be used for further improvement of
genome and proteome annotations.
4 Notes
1. Handling many samples at once can be difficult and conducive
to errors. We recommend processing 24 samples or fewer in
parallel.
2. 20 mL of phenol, 70 mL of precipitation solution, 20 mL of
extraction buffer, and 150 ball bearings are assumed to be
sufficient for 24 samples.
3. Before starting the extraction, incubate the phenol at 4 °C for
30 min to phase-separate, remove the small clear upper phase,
and use the lower phase as described in the phenol (M1) method.
4. Bring samples to room temperature (RT) and keep them
on ice for 5–10 min before weighing. For samples intended for
long-term storage, weighing should take less than 30 min.
5. If additional genomic data is available, such as annotated pseu-
dogenes, SAAVs, SNPs, ncRNAs, alternative ORFs, transfrags,
the peptide classification can be extended with additional cate-
gories of annotations (Pseudogene, ncRNA, short ORFs, etc.).
6. Multiple hits were retained along different regions of the
genome and transcriptome for both perfect and imperfect
matches.
Acknowledgments
Funding for this work was provided by the USDA-National Institute
of Food and Agriculture Hatch project MIS-145120, and the
Mississippi Agricultural & Forestry Experiment Station to SCP,
RR, and MS. GVP acknowledges the support from the USDA-
Agricultural Research Unit through the Big Data: Biocomputing,
Bioinformatics, and Biological Discovery project
6066-21310-004-25-S.
References
1. International Potato Center (2017) Sweetpotato facts and figures. Accessed 15 Jan 2019. http://www.cipotato.org/sweetpotato/
2. Yang J, Moeinzadeh M-H, Kuhl H et al (2017) Haplotype-resolved sweet potato genome traces back its hexaploidization history. Nat Plants 33:696–703
3. Hirakawa H, Okada Y, Tabuchi H et al (2015) Survey of genome sequences in a wild sweet potato, Ipomoea trifida (HBK) G. Don. DNA Res 22:171–179
4. Wu S, Lau KH, Cao Q et al (2018) Genome sequences of two diploid wild relatives of cultivated sweetpotato reveal targets for genetic improvement. Nat Commun 9:4580
5. Hoshino A, Jayakumar V, Nitasaka E et al (2016) Genome sequence and analysis of the Japanese morning glory Ipomoea nil. Nat Commun 7:13295
6. Leinonen R, Sugawara H, Shumway M et al (2010) The sequence read archive. Nucleic Acids Res 39:D19–D21
7. NCBI (2019) The Sequence Read Archive online at: https://www.ncbi.nlm.nih.gov/sra. Projects: SRA PRJNA79717, SRA PRJEB4145, SRA PRJNA72435
8. Aebersold R, Mann M (2016) Mass-spectrometric exploration of proteome structure and function. Nature 537:347
9. Omenn GS, Lane L, Lundberg EK et al (2015) Metrics for the human proteome project 2015: Progress on the human proteome and guidelines for high-confidence protein identification. J Proteome Res 14:3452–3460
10. Rose JKC, Bashir S, Giovannoni JJ et al (2004) Tackling the plant proteome: practical approaches, hurdles and experimental tools. Plant J 39:715–733
11. Erickson BK, Rose CM, Braun CR et al (2017) A strategy to combine sample multiplexing with targeted proteomics assays for high-throughput protein signature characterization. Mol Cell 65:361–370
12. Jaffe JD, Berg HC, Church GM (2004) Proteogenomic mapping as a complementary method to perform genome annotation. Proteomics 4:59–77
13. Nesvizhskii AI (2014) Proteogenomics: concepts, applications and computational strategies. Nat Methods 11:1114–1125
14. Castellana NE, Payne SH, Shen Z et al (2008) Discovery and revision of Arabidopsis genes by proteogenomics. Proc Natl Acad Sci U S A 105:21034–21038
15. Al-Mohanna T, Ahsan N, Bokros NT et al (2019) Tissue-specific proteomic and proteogenomic analysis of sweetpotato (Ipomoea batatas). J Proteome Res 18:2719–2734
16. Lee DG, Ahsan N, Lee SH et al (2007) A proteomic approach in analyzing heat-responsive proteins in rice leaves. Proteomics 7:3369–3383
17. Atha DH, Ingham KC (1981) Mechanism of precipitation of proteins by polyethylene glycols. Analysis in terms of excluded volume. J Biol Chem 256:12108–12117
18. Ingham KC (1990) Precipitation of proteins with polyethylene glycol. Methods Enzymol 182:301–306
19. Ahsan N, Belmont J, Chen Z et al (2017) Highly reproducible improved label-free quantitative analysis of cellular phosphoproteome by optimization of LC-MS/MS gradient and analytical column construction. J Proteomics 165:69–74
20. Elias JE, Gygi SP (2007) Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat Methods 4:207
21. Yu K, Sabelli A, DeKeukelaere L et al (2009) Integrated platform for manual and high-throughput statistical validation of tandem mass spectra. Proteomics 9:3115–3125
22. Zhou K, Panisko EA, Magnuson JK et al (2008) Proteomics for validation of automated gene model predictions. United States. Accessed 15 Jan 2019. https://www.osti.gov/servlets/purl/1241230
23. Zhu Y, Orre LM, Johansson HJ et al (2018) Discovery of coding regions in the human genome by integrated proteogenomics analysis workflow. Nat Commun 9:903
24. Robinson JT, Thorvaldsdóttir H, Winckler W et al (2011) Integrative genomics viewer. Nat Biotechnol 29:24
Chapter 24
In Silico Analysis of Class III Peroxidases: Hypothetical
Structure, Ligand Binding Sites, Posttranslational
Modifications, and Interaction with Substrates
Sabine Lüthje and Kalaivani Ramanathan
Abstract
Functional analyses of peroxidases are a major challenge. In silico analysis appears to be a powerful tool to
overcome at least some of the problems that arose from (1) the numerous possible functions of peroxidases,
(2) their low substrate specificity, and (3) the compensation of knockout mutants by other isoenzymes.
Amino acid sequences and crystal structures of peroxidases were used for the prediction of tertiary
structures, posttranslational modifications, ligand and substrate binding sites, and so on of uncharacterized
peroxidases. This protocol presents tools and their applications for an in silico analysis of soluble and
membrane-bound peroxidases, but it may be used for other proteins, too.
Key words AtPrx47, AtPrx64, HRP, Tertiary structure, Topology, Posttranslational modification,
Substrate channel analysis
1 Introduction
Plant peroxidases of the secretory pathway (EC 1.11.1.7; class III
peroxidases, donor: H2O2 oxidoreductases) are a huge protein
family of heme-containing enzymes that bear at least an N-terminal
signal peptide directed to the endoplasmic reticulum (ER) and show
high structural conservation, N-glycosylation, and other
posttranslational modifications [1, 2]. Numerous possible functions, low
substrate specificity, and compensation of knockout mutants by
other isoenzymes make functional analyses of these enzymes a
major challenge [3–5]. An in silico analysis appears to be a powerful
tool to solve at least a part of these problems [6–10].
Manifold bioinformatic tools are available in the World Wide
Web. The Bioinformatics Resource Portal of the Expert Protein
Analysis System (ExPASy) provides a list of freely accessible predic-
tion programs (https://www.expasy.org/proteomics). Tools for
(1) protein sequences and identification, (2) proteomic
experiments, (3) function analysis, (4) sequence sites, features and
motifs, (5) protein modifications, (6) protein structure, (7) protein
interactions, and (8) similarity search and alignment can be found
at this site [11]. Different programs, or different versions of the
same program, may return different results for the same prediction
because of the algorithms used.
Proteins can be positively identified by the Basic Local Alignment
Search Tool for proteins (BLASTp) in any protein database as
long as the detected peptides cover a sufficient part of the amino
acid sequence. Examples of protein databases are the servers of
the Universal Protein database (UniProt) and the National Center
for Biotechnology Information (NCBI), which collect amino acid
sequences with experimental evidence [12, 13]. The ARAMEMNON
database makes amino acid sequences of thale cress (Arabidopsis
thaliana (L.) Heynh.) and membrane proteins available
[14]. PeroxiBase holds amino acid sequences of 11,236 peroxidase
entries (as of January 2019), 6554 of which are class III
peroxidases [15]. At least 220 entries in PeroxiBase are from
Arabidopsis. Crystal structures of several soluble peroxidases are
available at the Protein Data Bank (PDB, https://www.rcsb.org/)
[16]. For example, 958 peroxidase templates are provided at PDB
(as of January 2019). Twelve of these templates are isoenzymes of
Arabidopsis.
The Fast Approximation of Smith & Waterman Algorithm
(FASTA)-format of amino acid sequences [17] allows for predic-
tion of physicochemical properties [18], posttranslational modifi-
cations [19–22], topology [23–25], signal peptides and cellular
localization of a protein [26, 27]. Tools like SwissModel and Pro-
tein Homology/AnalogY Recognition Engine (Phyre2) predict
tertiary structures of proteins by templates with high sequence
similarity and confidence that enable analysis of structural compo-
nents and substrates [28, 29]. Due to the fact that the structure of
the cleavable N-terminal signal peptide is missing in templates of
soluble peroxidases, this part is missing in models predicted by
SwissModel. In contrast, Phyre2 calculates this part by an ab initio
method. However, confidence of this part may be low. For predic-
tion of ligand binding-sites, hypothetical models (PDB format) can
be submitted to 3D-LigandSite [30]. On the Protein Data Bank
Europe the interactive tool Proteins, Interfaces, Structures and
Assemblies (PDBePISA) is available that allows for exploration of
macromolecular interfaces [31, 32].
Different tools visualize predicted macromolecular structures.
For example, University of California, San Francisco (UCSF) Chi-
mera [33] or PyMOL Molecular Graphics System (Schrödinger,
LLC, New York, USA), a Python based open-source viewer, are
frequently used for protein images. PyMOL allows for the calcula-
tions of electrostatics with the Adaptive Poisson-Boltzmann Solver
(APBS) plugin as well as ligand docking and binding-site analyses
by the Autodock/Vina plugin [34–36]. Autodock
(http://autodock.scripps.edu/), FireDock
(http://bioinfo3d.cs.tau.ac.il/FireDock/), PatchDock, or SwissDock
can be used for prediction
of molecular interactions between a target protein and a small
molecule [37–41]. For docking analyses, templates of chemical
compounds are available at the ZINC database [42].
This protocol used two plasma membrane-bound class III
peroxidases from Arabidopsis (AtPrx47 and AtPrx64) as examples
[10, 43] to show application of some of these tools for the in silico
analysis of plant peroxidases. Soluble horseradish peroxidase (HRP,
1hch.pdb) was used for comparison.
2 Materials
2.1 Amino Acid Sequences
FASTA format of amino acid sequences.
2.1.1 PeroxiBase (RedOxiBase)
http://peroxibase.toulouse.inra.fr/
2.2 Physicochemical Properties
2.2.1 ProtParam v. 1.0
https://web.expasy.org/protparam/
2.3 Topology
2.3.1 TMHMM v. 2.0
http://www.cbs.dtu.dk/services/TMHMM/
2.3.2 HMMTOP v. 2.0
http://www.enzim.hu/hmmtop/
2.4 Signal Peptides and Localization
2.4.1 SignalP v. 4.1
http://www.cbs.dtu.dk/services/SignalP/
2.4.2 PSORT v. 1.0
http://psort1.hgc.jp/form.html
2.5 Posttranslational Modifications
2.5.1 Pyrrolidone Carboxylic Acid Modification (PROSITE v. 20.0)
https://prosite.expasy.org/
2.5.2 N-Glycosylation (NetNGlyc v. 1.0)
http://www.cbs.dtu.dk/services/NetNGlyc/
2.5.3 Palmitoylation (CSS-PALM v. 2.0)
http://csspalm.biocuckoo.org/
2.5.4 GPI-Anchor (GPI-SOM)
http://gpi.unibe.ch/
2.6 Tertiary Structure
For modeling of the three-dimensional structure of peroxidases by
SwissModel or Phyre2, the most similar crystallized class III
peroxidases with sufficient sequence similarity (>30%) to the
peroxidase amino acid sequence were used as templates.
2.6.1 Modeling of Protein Structure
SwissModel: https://swissmodel.expasy.org/
Phyre2 v. 2.0: http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index
2.7 Interactive Visualization of Structures
Protein images were prepared either by UCSF Chimera v. 1.13.1 or
by PyMOL v. 2.2. The electrostatics of protein surfaces were
calculated with the APBS plugin using PyMOL generated PQR and
existing hydrogens and termini. Docking models were prepared
with UCSF Chimera.
2.7.1 PyMOL v. 2.2
https://pymol.org/2/
APBS Plugin: https://raw.githubusercontent.com/Pymol-Scripts/Pymol-script-repo/master/plugins/apbsplugin.py
2.7.2 UCSF Chimera v. 1.13.1
https://www.cgl.ucsf.edu/chimera/
2.8 Docking Analyses
2.8.1 3DLigandSite v. 1.0
http://www.sbg.bio.ic.ac.uk/3dligandsite/
2.8.2 PatchDock Server v. 1.3
https://bioinfo3d.cs.tau.ac.il/PatchDock/
2.8.3 SwissDock Server
http://www.swissdock.ch/docking
Target Templates
1. HRP (1hch).
2. AtPrx47 (modeled by 5twt.1.A using SwissModel).
3. AtPrx64 (modeled by 3hdl.1.A using SwissModel).
Substrate Templates (ZINC v. 12)
http://zinc.docking.org/
1. ZINC00057908: Esculetin (6,7-dihydroxycoumarin).
2. ZINC00057733: Scopoletin (7-hydroxy-6-methoxycoumarin,
7-hydroxy-6-methoxychromen-2-one).
3. ZINC00056615: DAB (diaminobenzidine).
4. ZINC36470923: Indole-3-acetic acid (IAA).
5. ZINC30320649: Nicotinamide adenine dinucleotide, reduced
form (NADH).
6. ZINC00058258: Ferulic acid.
7. ZINC13512224: Guaiacol.
8. ZINC12359045: Coniferyl alcohol
(4-(3-hydroxy-1-propenyl)-2-methoxyphenol).
9. ZINC00001554: Salicylic acid.
10. ZINC12418399: Sinapyl alcohol
(4-(3-hydroxyprop-1-enyl)-2,6-dimethoxyphenol).
11. ZINC01532486: Caffeyl alcohol
(4-(3-hydroxy-1-propen-1-yl)-1,2-benzenediol).
3 Methods
3.1 Amino Acid Sequences
Amino acid sequences (FASTA format) of peroxidases were downloaded
from PeroxiBase.
3.1.1 PeroxiBase
1. Open PeroxiBase.
2. Go to multicriteria search.
3. Enter Arabidopsis, choose peptide (PEP), and search (see Note
1).
4. Add class III peroxidase and search again.
5. Choose the peroxidase of interest (e.g., AtPrx47, AtPrx64).
6. Export the amino acid sequence(s) in FASTA format.
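Exported FASTA files can be read programmatically before the sequences are pasted into the prediction servers. The following Python sketch is a minimal parser under the assumption that the record name is the first word of each header line; the function name is illustrative.

```python
def parse_fasta(text):
    """Return a dict mapping record names to sequences."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else ""
            records[name] = []
        elif name is not None:
            # Sequence lines may be wrapped; collect and join them later
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}
```

Reading a downloaded file is then a matter of `parse_fasta(open(path).read())`, where `path` points to the exported file.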
3.2 Physicochemical Properties
1. Open ProtParam v. 1.0.
2. Paste target sequence into the text box (see Note 2).
3. Click the Compute parameters button.
4. Extract predicted data of interest (number of amino acids,
molecular weight, theoretical isoelectric point (pI), etc.).
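Two of the listed values, the number of amino acids and the molecular weight, can be approximated locally as a plausibility check. The Python sketch below is not ProtParam's implementation: it sums standard average residue masses and adds one water molecule, ignores cofactors and posttranslational modifications, and does not estimate the pI.

```python
from collections import Counter

# Average residue masses in Da (amino acid minus water)
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.015


def composition(sequence):
    """Count each amino acid in the sequence."""
    return Counter(sequence.upper())


def molecular_weight(sequence):
    """Approximate average molecular weight (Da) of an unmodified chain."""
    sequence = sequence.upper()
    unknown = set(sequence) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
```

Deviations from the server output are expected for mature peroxidases, which carry heme, calcium ions, and glycans.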
3.3 Topology
3.3.1 TMHMM
1. Open TMHMM v. 2.0 (see Note 3).
2. Paste target sequence(s) into the text box.
3. Choose the output format (e.g., Output format: extensive, with
graphics).
4. Click the submit button.
5. Extract data of interest (e.g., number of transmembrane helices,
expected number of amino acids in transmembrane helices,
and amino acid range of transmembrane helices) (see Note 4).
3.3.2 HMMTOP
1. Open HMMTOP v. 2.0.
2. Go to submit.
3. Paste target sequence into the text box.
4. Click the submit button.
5. Extract predicted data of interest (e.g., number of transmembrane
helices, amino acid range of transmembrane helices,
location of N- and C-termini, etc.).
3.4 Signal Peptides and Localization
3.4.1 SignalP
1. Open SignalP v. 4.1.
2. Paste target sequence into the text box (see Note 5).
3. Select Organism group (e.g., Eukaryotes).
4. Use the defaults for the other parameters or adapt them to the
experimental setup.
5. Submit.
6. Extract data of interest.
3.4.2 PSORT
1. Open PSORT v. 1.0.
2. Select source of input sequence (e.g., plant).
3. Type a sequence ID (e.g., AtPrx47).
5. Submit.
6. Extract final results and data of interest.
3.5 Posttranslational Modifications
3.5.1 Pyrrolidone Carboxylic Acid (PCA)
1. Open PROSITE v. 20.0.
2. Paste target sequence into the text box (Quick Scan mode of
ScanProsite) (see Note 6).
3. Use the default settings or disable them.
4. Scan.
5. Extract predicted features as far as useful.
3.5.2 N-Glycosylation
1. Open NetNGlyc v. 1.0.
2. Paste target sequence(s) into the text box.
3. Use the default settings or change them.
4. Submit.
5. Extract data of interest (e.g., position and potential of predicted
N-glycosylation sites).
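Candidate N-glycosylation positions can be listed beforehand with a simple motif scan, which is useful for cross-checking the positions reported by the server. The Python sketch below only looks for the consensus sequon Asn-Xaa-Ser/Thr with Xaa other than Pro; it does not reproduce the NetNGlyc prediction and assigns no potential to a site.

```python
import re

# Lookahead keeps overlapping sequons (e.g., NNTS contains NNT and NTS)
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(sequence):
    """Return (1-based position, tripeptide) for each N-X-S/T sequon."""
    sequence = sequence.upper()
    return [
        (m.start() + 1, sequence[m.start():m.start() + 3])
        for m in SEQUON.finditer(sequence)
    ]
```

Whether a listed sequon is actually glycosylated depends on its context, which is what the server prediction evaluates.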
3.5.3 Palmitoylation
1. Open CSS-PALM v. 2.0.
2. Select online.
3. Paste target sequence into the text box (see Note 7).
4. Use the default settings or change the threshold as needed.
5. Submit.
6. Extract results or visualize.
7. For FAQs, go to documentation.
3.5.4 GPI-Anchor
1. Open GPI-SOM.
2. Paste target sequence(s) into the text box.
3. Create a job name.
4. Click the GO button.
5. Extract results.
3.6 Tertiary Structure
3.6.1 Modeling of Protein Structure
SwissModel
1. Start modeling.
2. Paste target sequence into the text box (see Note 8).
3. Add a project name.
4. Add your e-mail address to be informed at the end of
modeling.
5. Click the Build Model button (see Note 9).
6. Choose the model with the best QMEAN value.
Phyre2
1. Open Phyre2 v. 2.0 (see Note 10).
2. Enter your e-mail address to be informed when the job has finished.
4. Set modeling mode to intensive.
5. After the job has finished, the model (PDB) and a link to the
results will be sent by e-mail.
6. Open the results link.
7. Extract data of interest and, in the case of multiple-template
prediction, the templates as well (see Note 11).
3.7 Interactive Visualization of Structures
Protein images have been prepared by PyMOL (Fig. 1a–f) and
UCSF Chimera (Fig. 1g–o) as described below.
3.7.1 PyMOL
PyMOL is free for educational use. For publication of PyMOL
images, a license is necessary. Tutorials, a script library, plugins,
commands, and so on are available at
https://pymolwiki.org/index.php/Main_Page.
Tertiary Structure
1. Open the target PDB (e.g., 1hch) in PyMOL v. 2.2 via File and
Open in the Upper Control Panel (see Note 12).
2. Modify the protein structure in the PyMOL Viewer window by
the Object Control Panel, in layer target (e.g., 1HCH): H
(Hide) everything; S (show) cartoon.
3. Visualize cofactors (heme and calcium atoms) of the peroxidase
by the following steps. PyMOL Viewer window, Selection
Tools: S (sequence).
4. PyMOL Viewer window, Selection Tools: Selecting: Atoms.
5. Move the gray bar at the bottom of the sequence to the other
end and select the heme (HEM).
6. Modify the molecule in the PyMOL Viewer window. Use the
Object Control Panel, choose the layer of the selected HEM
(sele): S (Show) sticks, C (Color) grays and gray 60.
7. Deselect the heme by a click to the background in the
Display Area.
8. Rotate the molecule according to the Mouse Legend (left
button and mouse movement). Select the δ-meso edge and the
center atom of the heme in the Display Area.
9. Change the color of both atoms in the PyMOL Viewer win-
dow, use the Object Control Panel: layer (sele): C (Color)
reds, red.
10. Deselect by click to background in the Display Area.
11. Select both calcium (CA) atoms at the end of the sequence.
12. Visualize the atoms in the PyMOL Viewer window. Use the
Object Control Panel; choose the layer of the selected CA
(sele): S (Show) spheres, C (Color) cyans, cyan.
14. Highlight β-Sheets in the PyMOL Viewer Window, Use Selec-
tion Tools: Selecting: Residues
15. Select the three amino acids of each β-sheet in the structure
according to the Mouse Control Legend. Mouse Mode:
3-Button Viewing, find β-sheets by rotation of the molecule
(left button and mouse movement). Zoom in if necessary
(right button and mouse movement).
Fig. 1 In silico analyses of AtPrx47 and AtPrx64 in comparison to HRP. Electrostatic potential surface of (a) the
reference protein HRP is shown in comparison to Phyre2 models of (b) AtPrx47 and (c) AtPrx64. Negative
electrostatic potential (red); positive electrostatic potential (blue); neutral electrostatic potential (white). (d)
HRP with helices (pink), β-sheets (yellow), heme (gray sticks), and Ca2+ ions (blue spheres) and four disulfide
bonds (blue sticks). Alignment of models predicted by SwissModel (orange) and Phyre2 (blue) for (e) AtPrx47
and (f) AtPrx64. Alignment of AtPrx47 models by the two prediction programs clearly reveals the missing
β-sheets from the modeling. (g) Ferulic acid binding in HRP, (h) caffeyl alcohol binding in AtPrx47, (i) sinapyl
alcohol binding in AtPrx64. Helices (green), substrates (blue sticks) showing different substrate affinity pattern
based on the channel electrostatics. (j) Active site of HRP with ferulic acid, (k) active site of AtPrx47 with
caffeyl alcohol, (l) active site of AtPrx64 with sinapyl alcohol. Substrate channel of (m) HRP, (n) AtPrx47, and
(o) AtPrx64. Hydrophobic residues (yellow), polar residues (green), basic residues (blue) and acidic residues
(red) are shown. Figures a–f were created by the PyMOL Molecular Graphics system and a–c with the APBS
plugin. Figures g–o were created by UCSF Chimera Molecular Visualization Application. Further results of in
silico analysis of AtPrx47 and AtPrx64 may be found in Lüthje and Martinez-Cortes [10]
16. Modify the color of β-sheets in the PyMOL Viewer window:
use the Object Control Panel, layer (sele): C (Color) yellows,
yellow.
17. Deselect by clicking on the background of the Display Area.
18. Rotate the molecule for optimal perspective (left button and
mouse movement).
19. Upper Control Panel, Ray.
20. Upper Control Panel, File, Save Image as, for example, png.
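The GUI steps for coloring β-sheets and saving a ray-traced image can also be written as a PyMOL command script. The following is a minimal sketch, not the authors' procedure: it swaps the manual residue selection for PyMOL's secondary-structure selector (ss s), and the file names target.pdb and target.png are hypothetical. The Python function only builds the script text; the resulting lines would be run inside PyMOL.

```python
def sheet_highlight_script(pdb_path, image_path, sheet_color="yellow"):
    """Return PyMOL command lines that color beta-sheets and save an image.

    Assumption: sheets are picked with the selector 'ss s' instead of
    clicking residues by hand as described in the protocol.
    """
    lines = [
        f"load {pdb_path}",               # File > Open
        "select sheets, ss s",            # residues assigned as beta-strand
        f"color {sheet_color}, sheets",   # C (Color) > yellows > yellow
        "deselect",                       # click on the background
        "ray",                            # Upper Control Panel > Ray
        f"png {image_path}",              # File > Save Image as > png
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(sheet_highlight_script("target.pdb", "target.png"))
```

The printed lines can be saved as a .pml file and opened in PyMOL; rotating the molecule to the preferred perspective (step 18) remains a manual step.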
Alignment
1. Open the target PDBs in PyMOL via the Upper Control Panel: File,
Open.
2. PyMOL Viewer window, Object Control Panel, layer all: H
(Hide) everything; S (Show) cartoon.
3. PyMOL Viewer Window, Object Control Panel, layer target 1:
C (Color) for example, blues, blue.
C (Color) for example, oranges, orange.
A (Action) align to molecule, Object: target 2.
7. Upper Control Panel, Ray.
8. Upper Control Panel, File, Save Image as, for example, png.
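As a rough scripted counterpart to the alignment steps, the sketch below assembles PyMOL commands that load two models, show them as cartoons in blue and orange, and align the first onto the second. The object names target1 and target2 and all file names are assumptions made for this example; the align command is used here as the command-line equivalent of "A (Action) align to molecule".

```python
def alignment_script(pdb1, pdb2, image_path):
    """Return PyMOL command lines that superpose two models.

    Object names 'target1' and 'target2' are hypothetical labels
    chosen for this sketch.
    """
    lines = [
        f"load {pdb1}, target1",
        f"load {pdb2}, target2",
        "hide everything, all",      # H (Hide) everything
        "show cartoon, all",         # S (Show) cartoon
        "color blue, target1",       # C (Color) blues, blue
        "color orange, target2",     # C (Color) oranges, orange
        "align target1, target2",    # A (Action) align to molecule
        "ray",
        f"png {image_path}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical file names for a Phyre2 and a SwissModel prediction.
    print(alignment_script("phyre2_model.pdb", "swissmodel_model.pdb",
                           "alignment.png"))
```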
APBS Plugin
1. Open the target in PDB format in PyMOL via the Upper
Control Panel: File, Open.
2. Upper Control Panel, choose Plugin and APBS Tools.
3. Use “PyMOL generated PQR and PyMOL generated Hydrogens
and termini” (see Note 13).
4. Set Grid and Run APBS (see Note 14).
5. Open visualization.
6. Update.
7. In the Molecular Surface Box choose solvent accessible surface
and Show.
8. Upper Control Panel, open File and Save Image as, for exam-
ple, png in the main menu for all Images of interest.
9. Upper Control Panel, type “rotate y, 180” in the command line
to get the back view.
10. Upper Control Panel, type “rotate x, 90” in the command line
to get the top view.
3.7.2 UCSF Chimera
Tertiary Structure
1. Open UCSF Chimera v.1.13.1.
2. Select the File menu and open the required PDB model.
3. Open the Actions menu and select the Surface submenu to show the
surface of the protein.
4. Open the Actions menu and set the transparency as required
(e.g., 30%).
5. Open the Select menu; the Residue option helps to highlight the
amino acids based on their properties.
Active Center
1. Select file menu and Open the required PDB model.
2. Select, Structure, Protein.
3. Action, Ribbon, hide.
Surface by Properties of Residues
1. Select file menu and Open the required PDB model.
2. Open Select menu, select Residue, amino acid category (e.g.,
Hydrophobic).
3. Open Actions menu, select Color, yellow.
4. Open Select menu, select Residue, amino acid category, polar.
5. Open Actions menu, select Color, green.
6. Open Select menu, select Residue, standard amino acids, basic
amino acids: LYS, ARG, HIS.
7. Open Actions menu, select Color, blue.
8. Open Select menu, select Residue, standard amino acids, acidic
amino acids: GLU, ASP.
9. Open Actions menu, select Color, red.
10. Open Select menu, select residue, all nonstandard, HEM.
11. Open Actions menu, select Color, by heteroatom.
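The coloring steps are applied one after another, so a residue that belongs to more than one category ends up with the color of the last step that touches it. The sketch below illustrates this ordering with the color scheme of the Fig. 1 legend (hydrophobic yellow, polar green, basic blue, acidic red). Only the basic (LYS, ARG, HIS) and acidic (GLU, ASP) sets come from the protocol; the hydrophobic and polar sets are illustrative assumptions and are not taken from Chimera's category definitions.

```python
# Ordered coloring steps: later entries override earlier ones.
COLOR_STEPS = [
    # Assumed membership, for illustration only.
    ("hydrophobic", {"ALA", "VAL", "LEU", "ILE", "MET", "PHE", "TRP", "PRO"},
     "yellow"),
    # Assumed to include charged residues, so the next two steps matter.
    ("polar", {"SER", "THR", "ASN", "GLN", "TYR", "CYS",
               "LYS", "ARG", "HIS", "GLU", "ASP"}, "green"),
    # From the protocol text.
    ("basic", {"LYS", "ARG", "HIS"}, "blue"),
    ("acidic", {"GLU", "ASP"}, "red"),
]


def residue_color(resname):
    """Return the final color of a residue after all steps, or None."""
    color = None
    for _category, members, step_color in COLOR_STEPS:
        if resname.upper() in members:
            color = step_color
    return color


if __name__ == "__main__":
    for name in ("LEU", "SER", "LYS", "ASP", "HEM"):
        print(name, residue_color(name))
```

Residues outside all four sets, such as the heme (HEM), return None here; in the protocol the heme is colored separately by heteroatom.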
3.8 Docking Analyses
3.8.1 3DLigandSite
1. Open 3DLigandSite v. 1.0.
2. Paste target sequence into the text box (see Note 15).
3. Enter your e-mail address.
4. Enter a job description.
5. Start prediction by the 3DLigandSite search button.
6. A link with results will be sent by e-mail.
7. Extract the predicted model (PDB format) and data of interest.
3.8.2 Protein-Heme Docking
1. The Protein Data Bank (PDB) format file from SwissModel was used in
the PatchDock server v. 1.3 to obtain a heme-bound peroxidase
structure.
2. The receptor molecule and the substrate (heme) molecule are
given as PDB files.
3. Clustering RMSD (Root-Mean-Square Deviation) value is nor-
mally kept as 4.0.
4. An e-mail address is given to get the results.
5. The results obtained are refined by the FireDock server and
ranked based on the ACE (Atomic Contact Energy) values.
6. The best heme bound peroxidase PDB model is chosen for
substrate docking.
3.8.3 SwissDock Analysis
Protein–Substrate Docking
1. The heme-bound protein (peroxidase) (PDB file) is used as the
target in the SwissDock server.
2. The substrate (ligand) obtained from ZINC database v. 12.0 is
uploaded in MOL2 format.
3. The job name and the e-mail address are given and the docking
procedure is initiated by clicking the Start Docking button.
4. A link to a zip file with all docking possibilities is sent to the
e-mail address.
3.8.4 Visualization of Docking Results
1. UCSF Chimera v. 1.13.1 was used for docking analysis.
2. After opening the required PDB file (target.pdb) obtained
from the SwissDock online tool, the substrate interaction is
analyzed.
3. Under Tools menu, Surface/binding analysis submenu, View
Dock option is used.
4. Open Dock results tab pops up, which helps in selecting the
substrate cluster file, “Clusters.dock4.pdb”.
5. Then the type selection is made by clicking “Dock 4, 5 or 6”
option.
6. This gives all the possible docking results of the substrate.
7. Then the substrate docking is checked manually and the plau-
sible product is decided by its access to the substrate channel
and based on “Delta G values.”
8. This procedure has been repeated for each protein with all
substrates tested (Table 1).
Table 1
Docking analyses of natural and artificial peroxidase substrates
Substrate                         HRP –ΔG   AtPrx47 –ΔG   AtPrx64 –ΔG
1-methoxynaphthalene              5.86      5.42          5.84
Caffeyl alcohol                   5.70      6.56          6.48
Coniferyl alcohol                 5.45      6.21          6.16
Esculetin/6,7-dihydroxycoumarin   5.61      6.11          6.27
Ferulic acid                      6.29      5.65          6.01
Sinapyl alcohol                   6.11      5.64          6.54
Salicylic acid                    5.40      No result     5.53
IAA                               5.95      5.62          5.49
NADH                              7.13      8.34          9.54
DAB                               5.40      No result     5.23
Guaiacol                          5.43      4.96          5.49
Docking analyses have been done by UCSF Chimera v. 1.13.1 with models predicted by SwissDock server using
SwissModels and ZINCdock files as templates [42]. Structures of AtPrx47 and AtPrx64 were predicted by SwissModel
using templates from class III peroxidases from switchgrass (5twt1A) and highly glycosylated peroxidase from royal palm
tree (3hdl1A), respectively [32, 44]. Crystal structure of HRP (1hch) was used for comparison [45]. Caffeyl alcohol fits
best for AtPrx47, whereas sinapyl alcohol appears to be preferred by AtPrx64. Macromolecular structure of AtPrx47 will
need further elucidation, because β-sheets were predicted neither by SwissModel nor by Phyre2 [10] and salicylic acid or
3,3′-diaminobenzidine (DAB) could not be fitted to the model
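The comparison of –ΔG values across substrates can be sketched in a few lines of code. The values below are copied from Table 1; the function name and the option to exclude substrates are assumptions of this example. Note that NADH has the largest –ΔG in every column, so the preferences named in the table legend (caffeyl alcohol for AtPrx47, sinapyl alcohol for AtPrx64) correspond to the ranking only when NADH is left out. That exclusion is an assumption of this sketch, not a step stated in the protocol, and the manual check of access to the substrate channel is not modeled.

```python
PROTEINS = ("HRP", "AtPrx47", "AtPrx64")

# -Delta G values from Table 1; None marks "No result".
NEG_DELTA_G = {
    "1-methoxynaphthalene": (5.86, 5.42, 5.84),
    "Caffeyl alcohol": (5.70, 6.56, 6.48),
    "Coniferyl alcohol": (5.45, 6.21, 6.16),
    "Esculetin": (5.61, 6.11, 6.27),
    "Ferulic acid": (6.29, 5.65, 6.01),
    "Sinapyl alcohol": (6.11, 5.64, 6.54),
    "Salicylic acid": (5.40, None, 5.53),
    "IAA": (5.95, 5.62, 5.49),
    "NADH": (7.13, 8.34, 9.54),
    "DAB": (5.40, None, 5.23),
    "Guaiacol": (5.43, 4.96, 5.49),
}


def best_substrate(protein, exclude=()):
    """Return the substrate with the largest -Delta G for one protein."""
    column = PROTEINS.index(protein)
    scored = {
        name: values[column]
        for name, values in NEG_DELTA_G.items()
        if values[column] is not None and name not in exclude
    }
    return max(scored, key=scored.get)


if __name__ == "__main__":
    for protein in PROTEINS:
        print(protein, best_substrate(protein, exclude=("NADH",)))
```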
4 Notes
1. Alternatively, enter short name of peroxidase of interest (e.g.,
AtPrx64).
2. Alternatively, enter accession number (ACC) (e.g., Q9SZB9 or
Q43872) or a sequence identifier (ID) and select endpoints of
sequence in the next step before computing parameters.
3. Older version may be used alternatively.
4. Plots can be extracted in postscript, as script for gnuplot, or raw
data for plotting if necessary.
5. Alternatively, upload a file in FASTA format or use multiple
sequences in FASTA format.
6. UniProtKB ACC.No. or identifiers or PDB identifiers are
possible.
7. Input can be multiple sequences in FASTA format or an
uploaded file.
8. Or upload a target sequence file in FASTA format.
9. Alternatively, search for templates before starting modeling.
10. Login to have expert mode with more options.

11. Models predicted by Phyre2 will automatically be submitted to
3DLigandSite and a link for the prediction will be given in the
results.
12. Alternatively, Upper Control Panel, Plugin, PDB Loader Service,
enter the 4-character PDB code (e.g., 1hch).
13. If this does not work choose one of the other options.
14. In case of unassigned atoms, delete those in the Object Control
Panel of the PyMOL Viewer window. Use the newly created
layer (unassigned), go to A (Action) and remove atoms. Run
APBS again.
15. Submit your own protein structure.
Acknowledgments
This work was supported by a PhD student grant to K. R. from the
Dr. Elisabeth Appuhn Foundation.
References
1. Welinder KG, Justesen AF, Kjaersgard IV et al (2002) Structural diversity and transcription of class III peroxidases from Arabidopsis thaliana. Eur J Biochem 269:6063–6081
2. Zámocký M, Furtmüller PG, Obinger C (2010) Evolution of structure and function of class I peroxidases. Arch Biochem Biophys 500:45–57
3. Hiraga S, Sasaki K, Ito H et al (2001) A large family of class III plant peroxidases. Plant Cell Physiol 42:462–468
4. Passardi F, Cosio C, Penel C et al (2005) Peroxidases have more functions than a Swiss army knife. Plant Cell Rep 24:255–265
5. Cosio C, Dunand C (2009) Specific functions of individual class III peroxidase genes. J Exp Bot 60:391–409
6. Lüthje S, Meisrimler CN, Hopff D et al (2011) Phylogeny, topology, structure and functions of membrane-bound class III peroxidases in vascular plants. Phytochemistry 72:1124–1135
7. Herrero J, Esteban-Carrasco A, Zapata JM (2013a) Looking for Arabidopsis thaliana peroxidases involved in lignin biosynthesis. Plant Physiol Biochem 67:77–86
8. Herrero J, Fernández-Pérez F, Yebra T et al (2013b) Bioinformatic and functional characterization of the basic peroxidase 72 from Arabidopsis thaliana involved in lignin biosynthesis. Planta 237:1599–1612
9. Shigeto J, Nagano M, Fujita K et al (2014) Catalytic profile of Arabidopsis peroxidases, AtPrx-2, 25 and 71, contributing to stem lignification. PLoS One 9:e105332
10. Lüthje S, Martinez-Cortes T (2018) Membrane-bound class III peroxidases: unexpected enzymes with exciting functions. Int J Mol Sci 19:E2876
11. Artimo P, Jonnalagedda M, Arnold K et al (2012) ExPASy: SIB bioinformatics resource portal. Nucleic Acids Res 40:W597–W603
12. UniProt Consortium T (2018) UniProt: the universal protein knowledgebase. Nucleic Acids Res 46:2699
13. Sayers EW, Agarwala R, Bolton EE et al (2019) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 47:D23–D28
14. Schwacke R, Schneider A, Van Der Graaff E et al (2003) ARAMEMNON, a novel database for Arabidopsis integral membrane proteins. Plant Physiol 131:16–26
15. Fawal N, Li Q, Savelli B et al (2013) PeroxiBase: a database for large-scale evolutionary analysis of peroxidases. Nucleic Acids Res 41:D441–D444
16. Berman HM, Westbrook J, Feng Z et al (2000) The Protein Data Bank. Nucleic Acids Res 28:235–242
17. Lipman DJ, Pearson WR (1985) Rapid and sensitive protein similarity searches. Science 227:1435–1441
18. Gasteiger E, Hoogland C, Gattiker A et al (2005) Protein identification and analysis tools on the ExPASy server. In: Walker JM (ed) The proteomics protocols handbook. Humana Press, New York, pp 571–607
19. Ren J, Wen L, Gao X et al (2008) CSS-Palm 2.0: an updated software for palmitoylation sites prediction. Protein Eng Des Sel 21:639–644
20. Sigrist CJA, de Castro E, Cerutti L et al (2012) New and continuing developments at PROSITE. Nucleic Acids Res 21:D344–D347
21. Blom N, Sicheritz-Ponten T, Gupta R et al (2004) Prediction of post-translational glycosylation and phosphorylation of proteins from the amino acid sequence. Proteomics 4:1633–1649
22. Fankhauser N, Mäser P (2005) Identification of GPI anchor attachment signals by a Kohonen self-organizing map. Bioinformatics 21:1846–1852
23. Krogh A, Larsson B, von Heijne G et al (2001) Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol 305:567–580
24. Tusnády GE, Simon I (2001) The HMMTOP transmembrane topology prediction server. Bioinformatics 17:849–850
25. Möller S, Croning MDR, Apweiler R (2001) Evaluation of methods for the prediction of membrane spanning regions. Bioinformatics 17:646–653
26. Nielsen H, Krogh A (1998) Prediction of signal peptides and signal anchors by a hidden Markov model. Proc Int Conf Intell Syst Mol Biol 6:122–130
27. Nakai K, Horton P (1999) PSORT: a program for detecting the sorting signals of proteins and predicting their subcellular localization. Trends Biochem Sci 24:34–35
28. Waterhouse A, Bertoni M, Bienert S et al (2018) SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic Acids Res 46:W296–W303
29. Kelley LA, Mezulis S, Yates CM et al (2015) The Phyre2 web portal for protein modeling, prediction and analysis. Nat Protoc 10:845–858
30. Wass MN, Kelley LA, Sternberg MJ (2010) 3DLigandSite: predicting ligand-binding sites using similar structures. Nucleic Acids Res 38(Suppl):W469–W473
31. Krissinel E, Henrick K (2007) Inference of macromolecular assemblies from crystalline state. J Mol Biol 372:774–797
32. Moural TW, Lewis KM, Barnaba C et al (2017) Characterization of class III peroxidases from Switchgrass. Plant Physiol 173:417–433
33. Pettersen EF, Goddard TD, Huang CC et al (2004) UCSF chimera a visualization system for exploratory research and analysis. J Comput Chem 25:1605–1612
34. Grosdidier A, Zoete V, Michielin O (2011) SwissDock, a protein-small molecule docking web service based on EADock DSS. Nucleic Acids Res 39:W270–W277
35. Morris GM, Huey R, Lindstrom W et al (2009) Autodock4 and AutoDockTools4: automated docking with selective receptor flexibility. J Comput Chem 16:2785–2791
36. Seeliger D, de Groot BL (2010) Ligand docking and binding site analysis with PyMOL and autodock/Vina. J Comput Aided Mol Des 24:417–422
37. Nanda T, Tripathy K, Ashwin P (2011) Integration of Bioinformatics Tools for Proteomics Research. J Comput Sci Syst Biol S13. https://doi.org/10.4172/jcsb.S13-002
38. Hetenyi C, van der Spoel D (2011) Toward prediction of functional protein pockets using blind docking and pocket search algorithms. Protein Sci 20:880–893
39. Andrusier N, Nussinov R, Wolfson HJ (2007) FireDock: fast interaction refinement in molecular docking. Proteins 69:139–159
40. Mashiach E, Schneidman-Duhovny D, Andrusier N et al (2008) FireDock: a web server for fast interaction refinement in molecular docking. Nucleic Acids Res 36:W229–W232
41. Schneidman-Duhovny D, Inbar Y, Nussinov R et al (2005) PatchDock and SymmDock: servers for rigid and symmetric docking. Nucleic Acids Res 33:W363–W367
42. Irwin JJ, Sterling T, Mysinger MM et al (2012) ZINC: a free tool to discover chemistry for biology. J Chem Inf Model 52:1757–1768
43. Lee Y, Rubio MC, Alassimone J et al (2013) A mechanism for localized lignin deposition in the endodermis. Cell 153:402–412
44. Watanabe L, de Moura PR, Bleicher L et al (2010) Crystal structure and statistical coupling analysis of highly glycosylated peroxidase from royal palm tree (Roystonea regia). J Struct Biol 169:226–242
45. Berglund GI, Carlsson GH, Smith AT et al (2002) The catalytic pathway of horseradish peroxidase at high resolution. Nature 417:463
Chapter 25
MALDI Mass Spectrometry Imaging of Peptides in Medicago truncatula Root Nodules
Caitlin Keller, Erin Gemperline, and Lingjun Li
Abstract
Mass spectrometry imaging is routinely used to visualize the distributions of biomolecules in tissue sections.
In plants, mass spectrometry imaging of metabolites is more often conducted, but the imaging of larger
molecules is less frequently performed despite the importance of proteins and endogenous peptides to the
plant. Here, we describe a matrix-assisted laser desorption/ionization mass spectrometry imaging method
for the imaging of peptides in Medicago truncatula root nodules. Sample preparation steps including
embedding in gelatin, sectioning, and matrix application are described. The method described is employed
to determine the spatial distribution of hundreds of peptide peaks.
Key words Medicago truncatula, Root nodules, Peptides, Mass spectrometry imaging, MSI, MALDI
1 Introduction
Matrix-assisted laser desorption/ionization mass spectrometry
imaging (MALDI-MSI) is a powerful tool to visualize the spatial
distribution of molecules in a tissue [1]. In MALDI-MSI, a laser is
fired at discrete positions across a matrix-covered tissue. At each
position, a mass spectrum is collected. Once the instrument has collected
mass spectra at all of the positions, software generates a
pixel for each discrete position and extracts the ion intensity for a
particular m/z across all pixels to create an image, or heat map, for
that m/z. In this way, hundreds of images can be generated from a
single instrument run. To prepare a sample for analysis, the general
sample preparation steps are flash freezing and embedding, section-
ing, and applying a suitable matrix. Sample preparation is a critical
step to preserve the sample and to achieve good signal of the chosen
analytes [2, 3]. For example, the matrix coating, which assists in
ionizing analyte molecules in the tissue section, can influence the
type of analytes in a sample that will ionize and the spatial resolution
of the imaging experiment. MALDI-MSI has been applied to many
different analyte types, including metabolites [4, 5], neuropeptides
[6], and proteins [7] in many different organisms. However, applications
of the technique to plants have focused on small molecules
[8], with only a few focusing on larger molecules [9–12].
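The way an ion image is built from per-position spectra, as described above, can be illustrated with a small sketch. It assumes each pixel's spectrum is available as a list of (m/z, intensity) pairs keyed by its (x, y) position; this data layout and the function name are hypothetical, and the 5 ppm default is borrowed from the tolerance used later in the data processing section. It is not the algorithm of any particular imaging software.

```python
def ion_image(pixels, target_mz, tol_ppm=5.0):
    """Sum the intensity within a ppm window around target_mz per pixel.

    pixels: dict mapping (x, y) -> list of (mz, intensity) tuples
            (an assumed in-memory layout for this example).
    Returns a dict mapping (x, y) -> summed intensity (the "heat map").
    """
    half_width = target_mz * tol_ppm / 1e6  # ppm window converted to m/z units
    image = {}
    for position, spectrum in pixels.items():
        image[position] = sum(
            intensity
            for mz, intensity in spectrum
            if abs(mz - target_mz) <= half_width
        )
    return image


if __name__ == "__main__":
    # Two made-up pixels; only peaks within 5 ppm of m/z 1000 contribute.
    demo = {
        (0, 0): [(1000.000, 10.0), (1000.004, 5.0), (1000.020, 7.0)],
        (1, 0): [(999.000, 3.0)],
    }
    print(ion_image(demo, 1000.0))
```

Repeating the call for each m/z of interest gives one image per m/z from the same set of spectra.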
Here, we provide a detailed protocol focusing on applying
MALDI-MSI to investigate peptides present in the root nodules
of Medicago truncatula (Medicago) [9]. Medicago forms
specialized organs, called root nodules, on its roots as a result of a
symbiotic relationship with rhizobia bacteria for biological nitrogen
fixation. Plant peptides are involved in the formation of the nodule
on the roots of the plant, as well as in plant growth and develop-
ment [13, 14]. For example, nodule-specific cysteine-rich peptides
are involved in the differentiation of bacteria into bacteroids in the
root nodules [15], and CLAVATA3/embryo-surrounding region
(CLE) peptides are involved in autoregulation of nodulation
[16, 17]. Thus, the protocol here aims to provide a method that
can be used to determine the spatial distribution of plant peptides
via MALDI-MSI to further our understanding about these impor-
tant biomolecules.
2 Materials
Prepare all solutions using ultrapure water (Milli-Q) and
high-performance liquid chromatography (HPLC) grade organic solvents,
unless otherwise noted. Reagents can be stored at room
temperature, unless otherwise noted.
2.1 Embedding Nodules
1. Plant material: Medicago truncatula plants inoculated with
Sinorhizobium meliloti (Rm1021).
2. Embedding media: 100 mg/mL gelatin (BD Difco™). Dissolve
the gelatin in a 37 °C water bath.
3. Plastic embedding containers suitable for storage at −80 °C.
4. Dry ice.
2.2 MALDI-MSI Sample Preparation
1. Optimal cutting temperature (OCT) compound.
2. 25 × 75 mm glass slides.
3. 50% Methanol: methanol, water (50:50 v:v).
4. 50% Methanol 0.1% FA: methanol, water (50:50 v:v), 0.1%
formic acid (FA).
5. DHB matrix solution: 40 mg/mL 2,5-dihydroxybenzoic acid
(DHB) in 50% methanol 0.1% FA. Sonicate the matrix until
completely dissolved.
6. 50% Acetonitrile: acetonitrile, water (50:50 v:v)
7. 50% Acetonitrile 0.1% FA: acetonitrile, water (50:50 v:v), 0.1%
formic acid
8. CHCA matrix solution: 5 mg/mL α-cyano-4-hydroxycinnamic
acid (CHCA) in 50% acetonitrile 0.1% FA. Sonicate the
matrix until completely dissolved.
9. SA matrix solution: 5 mg/mL sinapic acid (SA) in 50% aceto-
nitrile 0.1% FA. Sonicate the matrix to completely dissolve it.
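The amount of matrix to weigh out follows directly from the concentrations listed above (mass = concentration × volume). The short sketch below shows the arithmetic; the volumes in the usage example are hypothetical, and the function name is an assumption of this example.

```python
# Matrix concentrations from the Materials list, in mg/mL.
MATRIX_MG_PER_ML = {"DHB": 40.0, "CHCA": 5.0, "SA": 5.0}


def matrix_mass_mg(matrix, volume_ml):
    """Return the mass of matrix (mg) needed for a given solution volume."""
    return MATRIX_MG_PER_ML[matrix] * volume_ml


if __name__ == "__main__":
    # Hypothetical volumes, for illustration only.
    print("DHB, 5 mL:", matrix_mass_mg("DHB", 5), "mg")
    print("CHCA, 2 mL:", matrix_mass_mg("CHCA", 2), "mg")
```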
3 Methods
3.1 Embedding Nodules
1. Figure 1 shows the sample workflow for MALDI-MSI
of Medicago root nodules.
2. Trim nodules from the plant with about 2–4 mm of surround-
ing roots (see Note 1).
3. Place nodule in a plastic cup or similar holding container of
appropriate size for your sample (for example, a
5 mm × 5 mm × 5 mm square plastic cup for very small
samples) with a drop of 100 mg/mL gelatin (see Note 2).
4. Place on dry ice and wait for nodule and gelatin to freeze. The
gelatin will turn white when frozen.
5. Once the nodule is frozen, fill the embedding container with
100 mg/mL gelatin. Wait for the entire embedding container
with gelatin to freeze. Once the gelatin is completely white, the
nodule can be stored at −80 °C (see Note 3) prior to MSI
analysis.
Fig. 1 MALDI-MSI scheme showing the sample preparation, instrument analysis, and data analysis steps for a
typical experiment
3.2 MALDI-MSI Sample Preparation
1. Take the embedded nodule and trim the sample to a rectangle with a
couple of mm of gelatin surrounding the tissue on all sides. Do
this quickly to minimize the time the sample is at room
temperature.
2. Attach sample to a cryostat chuck with a drop of OCT compound
(see Note 4).
3. Allow sample on the chuck to equilibrate in the cryostat at
−20 °C for 15 min.
4. Align the sample so that the cryostat is cutting sections evenly
across the root and root nodule. This can be done by taking
about five sections and adjusting the chuck if part of the sample
is being missed (see Note 5).
5. Once the center of the nodule (or other desired depth) is
reached, thaw mount sections onto a glass slide by warming
the back of the slide against your hand and then placing the
front of the slide gently onto the tissue section.
6. Continue until a desired number of sections across the z stack
(i.e., the depth) of the root nodule are obtained.
7. Keep the sections in a dry environment (i.e., dry box) while
preparing the TM Sprayer for matrix application (see Note 6).
8. Turn nitrogen gas on TM Sprayer to 10 psi, and the solvent
pump to 0.25 mL/min. The solvent for the pump should
match what the matrix is dissolved in (without the FA), so for
DHB this would be 50% methanol and for CHCA this would
be 50% acetonitrile. Turn on the TM Sprayer and laptop (see
Note 7).
9. Set the temperature in the software to the appropriate temperature
for the desired solvent and TM Sprayer system (see
Note 8). As a starting point, 80 °C is the appropriate temperature
for 50% methanol.
10. Load the dissolved matrix (i.e., DHB, CHCA, SA; see Note 9)
into the sample loop with the knob in the load position.
11. Load the TM Sprayer method and manually change gas pres-
sure and flow rate if method differs from the initial parameters
of 10 psi and 0.25 mL/min. The TM Sprayer has recom-
mended methods for specific matrices and analyte types,
although method parameters may need to be optimized for a
specific application. For DHB imaging of peptides, method
parameters typically used are 1250 velocity, 0.1 mL/min,
12 passes, 30 s dry time, rotate and offset (cc pattern), 10 psi,
80 °C. For imaging of peptides using CHCA and SA as
matrices, method parameters to start from are 1100 velocity,
0.2 mL/min, 8 passes, 30 s dry time, rotate and offset (cc pattern),
10 psi, 85 °C (see Note 10).
12. Once the TM Sprayer has reached the appropriate temperature,
add slides containing sample to the sample holder. Secure slides
in place as necessary to prevent movement during matrix
application.
13. Switch the sample loop knob to the spray position. Once
matrix is coming out of the nozzle, start the TM Sprayer
program.
14. After the matrix application is finished, cool down the system
while flushing with the solvent the matrix is dissolved in (for
DHB, this would be 50% methanol) at 0.25 mL/min. Rinse
the sample loop three times with solvent and toggle the knob.
Once the system is below 50 °C, the system can be turned off.
15. Store the sample in a dry box at −20 °C if running on the
instrument the following day.
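The sprayer parameters in step 11 and the adjustment suggested in Note 10 (half the flow rate, twice the passes) can be compared with a simple relative index. The sketch below assumes that the matrix delivered per unit of nozzle travel scales as concentration × flow rate × passes / velocity; this index is an assumption of the example, ignores track spacing and solvent evaporation, and is only meant for comparing one method against another, not for computing an absolute matrix density.

```python
def deposition_index(conc_mg_per_ml, flow_ml_per_min, passes, velocity):
    """Relative amount of matrix deposited (arbitrary units).

    Assumed proportionality for this sketch:
    concentration * flow rate * passes / nozzle velocity.
    """
    return conc_mg_per_ml * flow_ml_per_min * passes / velocity


def drier_variant(flow_ml_per_min, passes):
    """Apply Note 10: half the flow rate, double the number of passes."""
    return flow_ml_per_min / 2, passes * 2


# DHB method from step 11: 40 mg/mL, 0.1 mL/min, 12 passes, 1250 velocity.
dhb = deposition_index(40, 0.1, 12, 1250)
flow, passes = drier_variant(0.1, 12)
dhb_dry = deposition_index(40, flow, passes, 1250)

if __name__ == "__main__":
    print("standard:", dhb)
    print("drier   :", dhb_dry, "with", flow, "mL/min and", passes, "passes")
```

Under this assumption both variants give the same index, which matches the statement in Note 10 that the matrix density is unchanged while each pass is drier.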
3.3 MSI Data Acquisition on the MALDI LTQ Orbitrap XL
1. Place glass slide(s) with sample into the slide adapter. If importing
the image of the glass slide, scan the slide in the adapter
with a scanner. Then add the backing plate and insert the plate
into the instrument. Alternatively, the slide can be scanned after
inserting the plate into the instrument with the camera in the
instrument (see Note 11).
2. Open the plate image in the MALDI source dialog box in the
Tune software. Zoom in as necessary to see sample, depending
on sample size. Draw boxes around the areas to be imaged (see
Note 12). Save this as a MALDI position file. For MS1 imag-
ing, using a rectangle box and raster motion works best. Also
set the desired spatial step size (75 μm is the smallest raster size
without oversampling).
3. In Xcalibur software (Thermo), set up the sequence by adding
the file name, path location, instrument method, and MALDI
position file. The instrument method contains parameters
controlling the mass resolution, mass range, and centroid/
profile data. The instrument method also requires a tune file,
which controls the laser energy and the microscans (micro-
scans/step is controlled in the instrument file). The microscans
and microscans/step should match to ensure that one pixel is
one mass spectrum in the data file.
4. Check the laser energy by shooting the laser on a matrix only
area that is not being imaged and checking the signal level. You
can adjust the laser energy in your tune file as necessary to get
the optimal signal.
5. Start the sequence.
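Because each raster position yields one mass spectrum (one pixel), the size of the drawn box and the step size determine how many spectra a run will contain. The sketch below estimates this for a rectangular region; the region dimensions in the example are hypothetical, the 75 μm default is the step size mentioned in step 2, and rounding up to whole steps is an assumption about how the edges are handled.

```python
import math


def pixel_grid(width_um, height_um, step_um=75.0):
    """Return (columns, rows, total pixels) for a rectangular raster.

    Assumes partial steps at the edges are rounded up to a full pixel.
    """
    columns = math.ceil(width_um / step_um)
    rows = math.ceil(height_um / step_um)
    return columns, rows, columns * rows


if __name__ == "__main__":
    # Hypothetical 3.0 mm x 1.5 mm box around a root nodule section.
    print(pixel_grid(3000, 1500))
```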
3.4 Data Processing
1. Once the data is collected, it can be viewed in ImageQuest, or
exported to another software program. To visualize the data in
ImageQuest, use the average spectra within a selected area tool
to view an average spectrum of a certain area of the sample. In
the bottom window of ImageQuest, there should be a spec-
trum from the sample. Figure 2 shows example spectra aver-
aged over the nodules for peptide imaging results with DHB,
CHCA, and SA matrices.
2. Look through the peaks in the collected spectrum, zooming in
as appropriate, and when one wants to visualize the distribution
of a certain peak in the tissue, select add new data set. Select the
single dataset option with plot type Mass Range/TIC. Use the
m/z for the mass range and select the desired tolerance window
(e.g., 5 ppm). Repeat as necessary to visualize other m/z values in the
sample. Under the 2D tab, there are other color bar options as
well as smoothing options.
3. To view in MSiReader [18], export the data in ImageQuest
into an imzML format, keeping the data in profile.
4. Load the imzML file into MSiReader and select the desired
mass tolerance, image smoothing, and color bar parameters.
Insert an m/z that is localized across the sample to visualize the
sample (one can find a good m/z for this in ImageQuest).
Normalize to the total ion count (TIC). To pull out m/z
unique to the sample, use the polygon tool to create interro-
gated and reference zones. Outline around the sample to create
an interrogated zone, then create a matrix only region for the
reference zone.
5. Use the extract peaks unique to the interrogated zone tool to
create a list of m/z present in the image. One will need to set
percentage numbers for the threshold an m/z needs to be above
in the interrogated zone and the threshold an m/z needs to be
below in the reference zone to be added to the list. Also set the
algorithm for peak centroid calculation (typically parabolic
centroid works well).
6. Once the list has been created, use the generate an image for
each peak in a list tool to create images for all the m/z. Manu-
ally go through the images and remove any bad images (i.e.,
images that have signal in the matrix as well as the sample or
do not appear to have any signal anywhere). Figure 3 shows
example MALDI-MSI images generated from peptide imaging
of root nodules with either DHB or CHCA as the matrix.
Different distributions across the root and root nodules are
observed.
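The idea behind extracting peaks unique to the interrogated zone (steps 4 and 5) can be sketched as a two-threshold filter. The example below is a conceptual illustration only: it interprets the percentage thresholds as the fraction of pixels in each zone where an m/z has signal, which is an assumption and may differ from how MSiReader defines them. The data layout, threshold defaults, and function names are hypothetical.

```python
def fraction_present(intensities, min_intensity=0.0):
    """Fraction of pixels in a zone whose intensity exceeds min_intensity."""
    if not intensities:
        return 0.0
    hits = sum(1 for value in intensities if value > min_intensity)
    return hits / len(intensities)


def unique_peaks(peaks, min_in_interrogated=0.5, max_in_reference=0.1):
    """Keep m/z values common in the sample zone and rare in the matrix zone.

    peaks: dict mapping m/z -> (interrogated intensities, reference intensities),
           one intensity per pixel in each zone (assumed layout).
    """
    kept = []
    for mz, (interrogated, reference) in sorted(peaks.items()):
        if (fraction_present(interrogated) >= min_in_interrogated
                and fraction_present(reference) <= max_in_reference):
            kept.append(mz)
    return kept


if __name__ == "__main__":
    # Made-up intensities for three m/z values across four pixels per zone.
    demo = {
        1000.5: ([5, 0, 7, 9], [0, 0, 0, 0]),   # sample-specific
        855.1: ([4, 4, 4, 4], [3, 3, 0, 3]),    # also present in matrix
        1200.7: ([0, 0, 0, 1], [0, 0, 0, 0]),   # too sparse in the sample
    }
    print(unique_peaks(demo))
```

Images generated from the retained m/z values still need the manual review described in step 6.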
Fig. 2 Example spectra averaged over the entire root nodules for MALDI-MSI on
the root nodules with different matrices. The matrices are CHCA (a), DHB (b), and
SA (c)
Fig. 3 MALDI-MSI images of peptides with either DHB (a, b) or CHCA (c, d, e) as the matrix. The images are
generated at 5 ppm
4 Notes
1. For best results, select nodules that are red in color and elongated
(rod-shaped) rather than round. These are the nodules in
which the symbiosis is well developed.
2. To make the sectioning process easier, ensure that the nodule is
as flat as possible with the root in line with the nodule. This will
help to get both the root and the nodule in the same plane
when sectioning.
3. If the nodule is not completely frozen when covered in gelatin,
it will not stick to the bottom of the cup and instead will float
up to the middle or top of the cup. This makes the nodule
harder to find and may result in the positioning of the nodule
being lost. After adding the gelatin, the cup should be kept
level while waiting for the rest of the gelatin to freeze. If the
gelatin freezes at an angle, it will be harder to level the nodule
while sectioning to get both the root and root nodule in a
single section. Avoid air bubbles close to the nodule when
adding the gelatin, as this also will make the nodules harder
to section.
4. OCT compound is beneficial as it provides a way to “glue” the
sample to the cryostat chuck during sectioning. However, it is a
polymeric species and will suppress analyte signal if it comes
into contact with the sample. Thus, care should be taken to
ensure that the OCT compound does not come into contact
with the sample or with the blade or stage of the cryostat. By
placing a small drop of OCT to the back of the gelatin sur-
rounded sample, where the OCT does not contact the sample
or blade, the sample can be secured onto the chuck without any
interference from the OCT.
5. For plant root nodules, our lab typically uses 16 μm sections, but other
section thicknesses between 8 and 35 μm [19] can be used.
6. After sectioning and before matrix application, washing steps
to remove highly abundant lipid species can increase signal
intensity and observed protein peaks [20]. For protein imag-
ing, ethanol washes and potentially a Carnoy wash are typically
used to remove the lipid species that can suppress protein
signal. For endogenous peptide imaging, washes may (or may
not) remove the target peptides, depending on the chemical
properties of the peptides. Thus, care should be taken when
using washing techniques with peptides to ensure that they are
not being removed in the washing steps.
7. Here the TM Sprayer is used to apply the matrix evenly across
the sample. It is important that the matrix is applied in a
homogenous manner at all points on the tissue so that matrix
inhomogeneity does not skew the results. A matrix application
method should be reproducible run-to-run to ensure that
results remain consistent. Other automatic sprayers can be
used (i.e., home-built or the Bruker ImagePrep). Other matrix
application techniques include the airbrush and sublimation
[21]. Airbrush application can be achieved easily with minimal
expense, however, user-to-user variation can be high and
reproducibility can be a challenge. Sublimation provides very
small crystal size and good imaging results for metabolomics
studies, but due to the dry application, the method requires
further recrystallization steps for analysis of larger molecules
(i.e., peptides and proteins) [22].
8. The temperature of the TM Sprayer should be about 5 °C
below the temperature at which the “puffing” sound starts.
This sound indicates that the matrix is not being sprayed in a
consistent manner. If run at a temperature when the solvent is
“puffing” the matrix will not cover the sample homogeneously,
which will negatively affect results.
9. There are many different matrices to choose from. DHB and
CHCA are both common matrices and can be used for a variety
of analytes. Other matrices may be used primarily for larger
peptides and proteins (e.g., SA) or primarily for negative mode
(e.g., 9-aminoacridine). Matrices other than DHB and
CHCA may work well depending on your desired analyte.
10. If this method is too wet, you can cut the flow rate in half and
double the number of passes to achieve the same matrix density
but with a drier spray.
11. The preferred scanning method depends on the sample and
time considerations. For the nodules, scanning in with the
camera on the instrument provides good alignment and
image quality, but this takes 25 min per slide. For larger tissues,
the scanner separate from the instrument works well and
saves time.
12. To check the alignment of the image to the slide in the instru-
ment you can click a point on the image and check the cursor
position on the camera box on the tune page to see where the
actual position is. It can also be helpful to check the outside of
the boxes to ensure the sample is not being cut off.
Acknowledgments
This work was supported in part by funding from the National
Science Foundation (NSF) Division of Integrative Organismal
Systems (IOS) RESEARCH PGR award #1546742, the University of
Wisconsin-Madison Graduate School and the Wisconsin Alumni
Research Foundation (WARF). The MALDI-Orbitrap and
Q-Exactive instruments were purchased through an NIH shared
instrument grant (NCRR S10RR029531). LL acknowledges a
Vilas Distinguished Achievement Professorship and Charles
Melbourne Johnson Professorship with funding provided by the WARF
and University of Wisconsin-Madison School of Pharmacy.
References
1. Caprioli RM, Farmer TB, Gile J (1997) Molecular imaging of biological samples: localization of peptides and proteins using MALDI-TOF MS. Anal Chem 69:4751–4760
2. Goodwin RJ, Pennington SR, Pitt AR (2008) Protein and peptides in pictures: imaging with MALDI mass spectrometry. Proteomics 8:3785–3800
3. Buchberger AR, DeLaney K, Johnson J et al (2018) Mass spectrometry imaging: a review of emerging advancements and future insights. Anal Chem 90:240–265
4. Ye H, Gemperline E, Venkateshwaran M et al (2013) MALDI mass spectrometry-assisted molecular imaging of metabolites during nitrogen fixation in the Medicago truncatula-Sinorhizobium meliloti symbiosis. Plant J 75:130–145
5. Gemperline E, Jayaraman D, Maeda J et al (2015) Multifaceted investigation of metabolites during nitrogen fixation in Medicago via high resolution MALDI-MS imaging and ESI-MS. J Am Soc Mass Spectrom 26:149–158
6. Chen RB, Li L (2010) Mass spectral imaging and profiling of neuropeptides at the organ and cellular domains. Anal Bioanal Chem 397:3185–3193
7. Chaurand P, Norris JL, Cornett DS et al (2006) New developments in profiling and imaging of proteins from tissue sections by MALDI mass spectrometry. J Proteome Res 5:2889–2900
8. Lee YJ, Perdian DC, Song Z et al (2012) Use of mass spectrometry for imaging metabolites in plants. Plant J 70:81–95
9. Gemperline E, Keller C, Jayaraman D et al (2016) Examination of endogenous peptides in Medicago truncatula using mass spectrometry imaging. J Proteome Res 15:4403–4411
10. Poth AG, Mylne JS, Grassl J et al (2012) Cyclotides associate with leaf vasculature and are the products of a novel precursor in petunia (Solanaceae). J Biol Chem 287:27033–27046
11. Cavatorta V, Sforza S, Mastrobuoni G et al (2009) Unambiguous characterization and tissue localization of Pru P 3 peach allergen by electrospray mass spectrometry and MALDI imaging. J Mass Spectrom 44:891–897
12. Grassl J, Taylor NL, Millar AH (2011) Matrix-assisted laser desorption/ionisation mass spectrometry imaging and its development for plant protein imaging. Plant Methods 7:11
13. Tavormina P, De Coninck B, Nikonorova N et al (2015) The plant peptidome: an expanding repertoire of structural features and biological functions. Plant Cell 27:2095–2118
14. Batut J, Mergaert P, Masson-Boivin C (2011) Peptide signalling in the rhizobium-legume symbiosis. Curr Opin Microbiol 14:181–187
15. Van de Velde W, Zehirov G, Szatmari A et al (2010) Plant peptides govern terminal differentiation of bacteria in symbiosis. Science 327:1122–1126
16. Mortier V, Den Herder G, Whitford R et al (2010) CLE peptides control Medicago truncatula nodulation locally and systemically. Plant Physiol 153:222–237
17. Mortier V, De Wever E, Vuylsteke M et al (2012) Nodule numbers are governed by interaction between CLE peptides and cytokinin signaling. Plant J 70:367–376
18. Robichaud G, Garrard KP, Barry JA et al (2013) MSiReader: an open-source interface to view and analyze high resolving power MS imaging files on matlab platform. J Am Soc Mass Spectrom 24:718–721
19. Qin L, Zhang Y, Liu Y et al (2018) Recent advances in matrix-assisted laser desorption/ionisation mass spectrometry imaging (MALDI-MSI) for in situ analysis of endogenous molecules in plants. Phytochem Anal 29:351–364
20. Seeley EH, Oppenheimer SR, Mi D et al (2008) Enhancement of protein sensitivity for MALDI imaging mass spectrometry after chemical treatment of tissue sections. J Am Soc Mass Spectrom 19:1069–1077
21. Gemperline E, Rawson S, Li L (2014) Optimization and comparison of multiple MALDI matrix application methods for small molecule mass spectrometric imaging. Anal Chem 86:10030–10035
22. Yang J, Caprioli RM (2011) Matrix sublimation/recrystallization for imaging proteins by mass spectrometry at high spatial resolution. Anal Chem 83:5728–5734
Chapter 26
Cystatin Activity–Based Protease Profiling to Select Protease Inhibitors Useful in Plant Protection
Marie-Claire Goulet, Frank Sainsbury, and Dominique Michaud
Abstract
Protease inhibitors of the cystatin protein superfamily show potential in plant protection for the control of
herbivorous pests. Here, we describe a cystatin activity–based profiling procedure for the selection of potent
cystatin candidates, using single functional variants of tomato cystatin SlCYS8 and digestive Cys proteases
of the herbivore insect Colorado potato beetle as a case study. The procedure involves the capture of target
Cys proteases with biotinylated versions of the cystatins, followed by the identification and quantitation of
captured proteases by mass spectrometry. An example is given to illustrate the usefulness of the approach as an
alternative to current procedures for recombinant inhibitor selection based on in vitro assays with synthetic
peptide substrates. A second example is given showing its usefulness as a tool to compare the affinity spectra
of inhibitor variants toward different subsets of target protease complements.
Key words Plant protease inhibitors, Herbivorous insect digestive proteases, Cystatin activity–based
protease profiling, Cys protease capture, Biotinylated cystatins
1 Introduction
Many authors have discussed the potential of plant protease
inhibitors to protect crops from herbivorous pests [1, 2], and the
implementation of protease inhibitor–expressing plant lines in agri-
cultural fields has been documented in recent years [3, 4]. These
proteins act as competitive pseudosubstrate inhibitors to enter the
active site cleft of target proteases and prevent peptide bond hydro-
lysis [5]. Inhibited enzymes in the pest midgut may no longer
process plant proteins, causing dietary protein wastage, amino
acid shortage, developmental delays, and eventual death of the
herbivore [6].
Significant efforts have been deployed over the years to
improve the potency of protease inhibitors for pest control, mostly
involving the rational design of inhibitor variants with improved
activity toward model target proteases and/or the engineering of
hybrid inhibitor fusions integrating multiple functional domains
[7]. At present, a complementary task to harness the full potential
of these proteins in plant protection consists of developing
analytical tools adapted to the functional characterization and proper
selection of potent inhibitor candidates. Current procedures to
compare the potency of protease inhibitors against herbivore diges-
tive proteases typically rely on in vitro protease inhibitor assays with
synthetic peptide substrates to calculate dissociation constants (Kd)
toward model proteases or to determine threshold inhibitory con-
centrations (e.g., IC50) for the inhibition of specific protease
families in midgut extracts [8]. Such measurements, although giv-
ing basic information about the relative inhibitory potency of
inhibitor candidates, say little about the eventual value of these
proteins in actual plant–pest contexts. Peptide substrates for diag-
nostic purposes are selected based on their specificity toward well-
defined protease families but their resistance to some isoforms of
these families, or their susceptibility to isoforms of other protease
families, cannot be ruled out a priori when assessing complex
protease complements such as those usually found in herbivorous
arthropods [9, 10]. Most importantly, in vitro assays to monitor
the activity of model protease subsets do not consider the whole
complement of protease targets in the herbivore, and hence the full
range of protease isoforms eventually staying active in the midgut
after inhibitor intake [11].
In practice, a straightforward way to select protease inhibitors
among a collection of possible variants may be not only to test their
inhibitory potency against a few selected protease models, but also
to compare their effective binding range against the whole range of
eventual protease targets in the pest midgut. To this end, we
devised an activity-based functional proteomics approach that
allows for a direct comparison of inhibitor affinity profiles toward
whole midgut protease complements in crude protein extracts of
herbivorous insects [12]. The approach involves the capture of
insect cysteine (Cys) proteases with Cys-type inhibitors of the
plant cystatin protein family [13], followed by liquid chromatogra-
phy–tandem mass spectrometry (LC-MS/MS) peptide analysis of
the captured proteases. Unlike in vitro activity assays with peptide
substrates, a picture of direct protease–inhibitor interactions in
source extracts is obtained, with no masking or confounding effects
to generate over- or underestimation of target protease binding
ranges [11]. A step-by-step protocol is here described for the
procedure, using single functional variants of tomato cystatin
SlCYS8 and digestive proteases of the major coleopteran pest Col-
orado potato beetle as a “plant (inhibitor)–insect (proteases)”
model of agronomic significance [14].
2 Materials
2.1 Biotinylated Plant Cystatins
1. Biotinylated cystatins expressed in Escherichia coli for
cystatin–protease complex enrichment on avidin-embedded agarose
beads [12, 15] (see Note 1).
2.2 Insect Midgut Proteins
1. Insect proteins from snap-frozen Colorado potato beetle
(Leptinotarsa decemlineata) fourth-instar larvae reduced to a fine
powder in liquid nitrogen (Praxair) [11].
2.3 Laboratory Tools and Materials
1. Reduced glutathione–Sepharose agarose beads (GE Healthcare).
2. Novagen™ human factor Xa (EMD Millipore).
3. Pierce NeutrAvidin™ agarose beads (Thermo Fisher
Scientific).
4. Bio-Rad Protein Assay Kit™ (Bio-Rad).
5. Sequencing grade trypsin (Promega).
6. Mini-PROTEAN Tetra Cell™ unit for protein 1D gel electro-
phoresis (Bio-Rad).
7. Multi-Mix™ tube rotator (VWR).
8. Gel scanner and image analysis software for protein densitome-
try in polyacrylamide slab gels (see Note 2).
9. Temperature-controlled shaker.
11. Centrifugal vacuum concentrator.
12. Basic UV–visible spectrophotometer.
2.4 Media, Buffers and Other Solutions
Culture media, buffers, and solvents are made up as aqueous
solutions with ultrapure water and analytical grade reagents. All
solutions are stored at 4 °C unless otherwise indicated. Working
buffers, reagents, and standard step-by-step protocols for
SDS-PAGE are used as described in [16].
1. Luria–Bertani (LB) medium: 10 g/L tryptone, 5 g/L yeast
extract, 5 g/L NaCl, pH 7.0.
2. 10 μg/mL chloramphenicol (Sigma-Aldrich).
3. 100 μg/mL carbenicillin (Sigma-Aldrich).
4. 1 mM isopropyl β-D-1-thiogalactopyranoside (Sigma-Aldrich).
5. 50 μM D-biotin (Sigma-Aldrich).
6. Phosphate-buffered saline (PBS), pH 7.4.
7. 100 mM citrate phosphate buffer, pH 6.0.
8. Agarose beads washing buffer: 100 mM citrate phosphate,
pH 6.0, supplemented with 250 mM NaCl and 10 mM L-cysteine.
9. 100 mM ammonium bicarbonate.
10. 50% (v/v) acetonitrile.
11. 10 mM dithiothreitol (Sigma-Aldrich).
12. 55 mM iodoacetamide (Sigma-Aldrich).
13. 2.0% (v/v) acetonitrile/1.0% (v/v) formic acid.
14. 50% (v/v) acetonitrile/1.0% (v/v) formic acid.
15. 0.1% (v/v) formic acid.
3 Methods
3.1 Capture of Target Proteases with Biotinylated Cystatins
The whole procedure in this chapter includes two steps consisting
of (1) capturing cystatin-sensitive target protease isoforms with
biotinylated AviTagged cystatins (this section) (Fig. 1); and
(2) quantifying the captured proteases by LC-MS/MS analysis
(Subheading 3.2).
3.1.1 Heterologous Expression and Purification of the AviTagged Cystatins
1. Grow 5-mL E. coli cultures overnight at 37 °C in LB medium
supplemented with 10 μg/mL chloramphenicol and 100 μg/mL
carbenicillin (see Notes 3 and 4).
2. Transfer the overnight cultures into 500 mL of LB medium
containing 100 μg/mL carbenicillin.
3. Allow the bacteria to multiply at 37 °C under agitation, until
reaching an OD600 of 0.4–0.6.
4. Add 1 mM isopropyl β-D-1-thiogalactopyranoside
(Sigma-Aldrich) to induce protein expression, and 50 μM D-biotin
(Sigma-Aldrich) to induce AviTag peptide biotinylation (see
Note 5).
5. Grow bacteria for 16 h at 37 °C under agitation (see Note 6).
6. Centrifuge the cultures at 6000 × g for 5 min at 4 °C, and
discard the supernatants.
7. Submit the pellets to several freeze/thaw cycles at −20 °C to
break bacterial cells. A minimum of three freeze–thaw cycles is
needed to obtain proper lysis of the bacteria.
8. Affinity-purify the AviTagged cystatins with reduced
glutathione–Sepharose agarose beads (GE Healthcare) as described by
the provider (see Note 3).
9. Remove the GST moiety by cleavage with Novagen™ human
factor Xa (EMD Millipore) as described by the provider.
Fig. 1 Schematic overview of the cystatin activity-based protease capture procedure. Phase 1: Bacterially
expressed AviTagged inhibitors (e.g., AviTagged cystatins) are ligated to D-biotin by the action of a biotin ligase,
in vivo during their expression in E. coli or in vitro following their recovery from the bacteria (Subheading
3.1.1). Phase 2: The biotinylated inhibitors (cystatins) are incubated with NeutrAvidin™ agarose beads to
generate cystatin-embedded agarose matrices (or beads) for protease capture (Subheading 3.1.2). Phase 3:
The beads are incubated with a crude protein extract of the putative target proteases (Subheading 3.1.3) to
capture those protease isoforms that show affinity for the tested inhibitor variants (Subheading 3.1.4). These
beads bound to the inhibitor–target protease complexes then serve as source material for LC-MS/MS analyses
(Subheading 3.2)
10. Proof-check the overall quality of the purified inhibitor
preparations on Coomassie blue-stained polyacrylamide slab gels
following 12% (w/v) SDS-PAGE [16].
11. Quantify the purified cystatins by densitometric analysis of
Coomassie blue-stained bands using a high-resolution gel
scanner and appropriate image analysis software, after
generating a protein standard curve with bovine serum albumin as
a reference.
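The densitometric quantification in step 11 amounts to fitting a standard curve to the BSA bands and reading the cystatin band off that curve. A minimal sketch follows, assuming a linear intensity response over the loaded range; the intensity values are hypothetical and would normally come from the image analysis software.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def protein_amount_ug(band_intensity, slope, intercept):
    """Invert the standard curve to estimate the protein amount in a band."""
    return (band_intensity - intercept) / slope


# Hypothetical BSA standards: amount loaded (ug) and band intensity (a.u.)
bsa_ug = [0.5, 1.0, 2.0, 4.0]
bsa_intensity = [1000.0, 2000.0, 4000.0, 8000.0]

slope, intercept = fit_line(bsa_ug, bsa_intensity)

# Hypothetical intensity measured for a purified cystatin band
cystatin_ug = protein_amount_ug(3000.0, slope, intercept)
print(f"Estimated cystatin in band: {cystatin_ug:.2f} ug")
```

Bands falling outside the range of the standards should be reloaded at a different dilution rather than extrapolated, since Coomassie staining is not linear over an unlimited range.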
3.1.2 Binding of AviTagged Cystatins to NeutrAvidin™ Agarose Beads
1. Centrifuge the NeutrAvidin™ agarose beads for 2 min at
500 × g to remove commercial storage buffer (see Note 7).
2. Wash the beads with one volume of PBS, pH 7.4, and
centrifuge for 2 min at 500 × g.
3. Repeat step 2 twice.
4. Add biotinylated AviTagged cystatins (Subheading 3.1.1) in
excess concentration (see Note 8) and incubate the beads at
20 °C for 30 min with gentle agitation on a VWR Multi-Mix
tube rotator.
5. Wash the beads with 10 volumes of PBS, pH 7.4, and
centrifuge for 2 min at 500 × g.
6. Repeat step 5 twice.
7. Submit 5 μL of the cystatin-embedded beads to 12% (w/v)
SDS-PAGE and stain with Coomassie blue to confirm proper
cystatin binding. The beads can be stored at 4 °C until use,
pending adequate stability of the recombinant inhibitor.
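Centrifugation steps in this protocol are given as relative centrifugal force (× g). On centrifuges that are set in rpm, the speed depends on the rotor radius. The sketch below uses the standard relation RCF = 1.118 × 10⁻⁵ × r × rpm², with r in cm. The 7 cm radius is a hypothetical value; use the radius stated for your own rotor.

```python
import math

RCF_CONSTANT = 1.118e-5  # standard factor for radius in cm and speed in rpm


def rpm_for_rcf(rcf, rotor_radius_cm):
    """Rotor speed (rpm) needed to reach a given relative centrifugal force."""
    return math.sqrt(rcf / (RCF_CONSTANT * rotor_radius_cm))


def rcf_for_rpm(rpm, rotor_radius_cm):
    """Relative centrifugal force (x g) produced at a given rotor speed."""
    return RCF_CONSTANT * rotor_radius_cm * rpm ** 2


# Hypothetical microcentrifuge rotor radius (cm)
radius_cm = 7.0
for target_rcf in (500, 1000, 6000, 20000):
    rpm = rpm_for_rcf(target_rcf, radius_cm)
    print(f"{target_rcf:>6} x g  ->  {rpm:,.0f} rpm")
```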
3.1.3 Extraction of Target Proteases
1. Extract insect powder proteins in two volumes (e.g., 2 mL
buffer/g fresh powder) of 100 mM citrate phosphate buffer,
pH 6.0.
2. Keep the mixture on ice for 10 min.
3. Discard insoluble material by centrifugation at 20,000 × g for
10 min at 4 °C.
4. Assay soluble proteins in the supernatant using the Bio-Rad
Protein Assay™ kit (Bio-Rad), as described by the provider.
5. Use the supernatant as freshly prepared for Subheading 3.1.4,
or keep it at −80 °C until use.
3.1.4 Target Protease Capture on Cystatin-Embedded Agarose Beads
1. Incubate 20 μL of cystatin-embedded beads (Subheading
3.1.2) with 5.5 mg of insect proteins (Subheading 3.1.3) in
900 μL of 100 mM citrate-phosphate buffer, pH 6.0, for
40 min at 20 °C with gentle agitation (see Note 9).
2. Collect the beads by centrifugation for 2 min at 1000 × g.
3. Wash by resuspension in 900 μL of agarose beads washing
buffer (see Note 10).
4. Centrifuge for 2 min at 1000 × g.
5. Repeat steps 3 and 4 twice.
6. Add 10 μL of concentrated (4×) SDS-PAGE loading buffer
[16] and 10 μL of agarose beads washing buffer to 20 μL of
agarose beads.
7. Heat for 5 min at 95 °C.
8. Submit 25-μL samples of the resulting mixtures to 12% (w/v)
SDS-PAGE and stain the resolved proteins with Coomassie
blue [16].
9. Using a scalpel, collect protein band(s) in the gel
corresponding to cystatin-captured proteases (see Note 11). A
representative example of protein fraction profiles during the
enrichment process is shown in Fig. 2 for AviTagged
SlCYS8-captured proteases in Colorado potato beetle crude protein
extracts.
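Setting up the binding reaction in step 1 requires converting the assayed protein concentration of the extract into a pipetting volume. The sketch below assumes that 900 μL is the final reaction volume (extract plus buffer), which is one reading of step 1 and is not stated explicitly. The 10 mg/mL concentration is hypothetical.

```python
def binding_reaction_volumes(extract_mg_per_ml, protein_mg=5.5,
                             final_volume_ul=900.0):
    """Return (extract volume, buffer volume) in uL for the capture reaction.

    Assumption: the 900 uL in step 1 is the final volume of extract plus
    100 mM citrate-phosphate buffer, pH 6.0.
    """
    extract_ul = protein_mg / extract_mg_per_ml * 1000.0
    if extract_ul > final_volume_ul:
        raise ValueError(
            "Extract too dilute to deliver the protein load in the final "
            "volume; consider concentrating it first (see Note 9)."
        )
    return extract_ul, final_volume_ul - extract_ul


# Hypothetical concentration from the protein assay (mg/mL)
extract_ul, buffer_ul = binding_reaction_volumes(10.0)
print(f"Extract: {extract_ul:.0f} uL, buffer: {buffer_ul:.0f} uL")
```

Under this assumption, an extract below about 6.1 mg/mL cannot deliver 5.5 mg in 900 μL, which the function reports as an error instead of returning a negative buffer volume.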
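[Figure 2 image: Coomassie blue-stained SDS-PAGE gel. Lane labels: Crude extract; Flow, Wash, and Beads for biotin/SlCYS8 and for biotin/Q47P; Beads + biotin; Beads (empty). Mr markers (kDa): 200, 116, 97, 66, 45, 31, 21, 14.4, 6.5. Boxes a, b, and c mark the bands described in the caption.]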
Fig. 2 Avidin-affinity enrichment of Colorado potato beetle Cys proteases captured using biotinylated
AviTagged SlCYS8. Q47P, a single functional variant of SlCYS8 with limited inhibitory activity against
papain-like Cys proteases [17], was here used as a negative control for the protease capture step. Biotinylated
cystatins bound to the avidin beads were incubated with the insect protein extract for target protease capture
(Subheading 3.1.4). Proteins in test and control (Q47P) samples were visualized by Coomassie Blue staining
following 12% (w/v) SDS-PAGE. The crude (Crude extract), flow-through (Flow), washing (Wash), and beads-
bound (Beads) protein fractions are shown on the gel. A 30-kDa protein was readily detected in the Beads
fraction using wild-type SlCYS8 (Box a), corresponding to the previously described Cys protease LdP30
purified from Colorado potato beetle midgut extracts by affinity chromatography with the model plant cystatin
OCI as a ligand [18]. Boxes b and c correspond to avidin and AviTagged SlCYS8 recovered from the affinity
beads, respectively. Mr, on the left, refers to molecular weight protein markers (kDa)
3.2 Mass Spectrometric Analysis of Captured Proteases
Gel slices collected in Subheading 3.1.4 are used as source material
to identify and quantify cystatin-captured protease isoforms. This
part of the procedure first involves protein sample preparation for
mass spectrometry, followed by the LC-MS/MS analysis per se,
peptide-based identification of the captured proteases, and their
quantitation based on peptide spectral count sampling statistics.
3.2.1 Sample Preparation for Mass Spectrometry
1. Wash the gel slices for 5 min in water and destain proteins three
times with equal volumes of 100 mM ammonium bicarbonate
and 50% (v/v) acetonitrile.
2. Dry the gel slices by washing for 10 min in 50% (v/v)
acetonitrile.
3. Reduce and alkylate entrapped proteins with 10 mM
dithiothreitol and 55 mM iodoacetamide, respectively.
4. Hydrolyze the proteins for 18 h at 37 °C with 125 nM
sequencing grade trypsin (Promega) as described by the provider.
5. Extract resulting peptides from the gel matrix by incubation for
10 min in 2% (v/v) acetonitrile/1.0% (v/v) formic acid.
6. Perform a second extraction in 50% (v/v) acetonitrile/1.0%
(v/v) formic acid.
7. Pool the two extractions and dry peptides in a centrifugal
vacuum concentrator.
8. Resuspend in 12 μL of 0.1% (v/v) formic acid, from which 5 μL
are taken for LC-MS/MS analysis.
3.2.2 LC-MS/MS Analysis
Our LC-MS/MS analyses are performed at the Proteomics Platform
of CHU de Québec Research Center (http://proteomique.ulaval.ca),
Québec, QC, Canada. In brief, peptide samples produced
in Subheading 3.2.1 are resolved by online reversed-phase
nanoscale capillary LC and analyzed by electrospray MS/MS. An
Eksigent ekspert™ nanoLC425 System is used, coupled to a
Triple-TOF 5600 plus mass spectrometer equipped with a
nanoelectrospray ion source (Sciex). Peptide separations take place in
self-pack PicoFrit columns [75 μm ID/15 μm tip] (New Objective)
packed with Reprosil-Pur C18 AQ media composed of 3-μm
particles with pores of 120 Å (Dr. Maisch, Woburn, MA, USA). The
peptides are eluted at 300 nL/min over 35 min along a 5–35%
(v/v) acetonitrile–0.1% (v/v) formic acid linear gradient. Full-scan
mass spectra [400–1250 m/z] are acquired under a data-dependent
acquisition mode using the Analyst software, version 1.7 (Sciex).
The 20 most intense ions are selected for collision-induced
dissociation, with the dynamic exclusion period set at 20 s and a
peptide ion mass tolerance of 100 ppm.
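Two of the acquisition settings above lend themselves to a quick calculation: the absolute width of a 100 ppm tolerance at a given m/z, and the acetonitrile percentage at a given time along the linear 5–35% gradient. The sketch below is illustrative arithmetic only and does not reproduce any instrument software; the function names are hypothetical.

```python
def ppm_window(mz, tolerance_ppm):
    """Lower and upper m/z bounds for a tolerance expressed in ppm."""
    delta = mz * tolerance_ppm / 1e6
    return mz - delta, mz + delta


def gradient_percent_acn(t_min, start=5.0, end=35.0, duration_min=35.0):
    """Acetonitrile percentage at time t_min along a linear gradient."""
    if not 0 <= t_min <= duration_min:
        raise ValueError("time is outside the gradient")
    return start + (end - start) * t_min / duration_min


low, high = ppm_window(1000.0, 100)
print(f"100 ppm at m/z 1000: {low:.1f} to {high:.1f}")

midpoint = gradient_percent_acn(17.5)
print(f"Acetonitrile at 17.5 min: {midpoint:.1f}% (v/v)")
```

The tolerance window scales with m/z, so 100 ppm corresponds to ±0.04 at m/z 400 and ±0.125 at m/z 1250, the two ends of the full-scan range.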
3.2.3 Identification of Captured Proteases
MGF peak list files are generated with the Protein Pilot software,
version 4.5 (Sciex) and analyzed using the Mascot software, version
2.5.1 (Matrix Science) to search the Uniprot protein sequence
database (http://www.uniprot.org/). Search parameters for
protein matching are set as follows: a fragment ion mass tolerance of
0.1 Da; a parent ion tolerance of 0.1 Da; iodoacetamide derivatives
of Cys residues as fixed modification; oxidized Met residues as
variable modification; and a maximum of two missed trypsin
cleavages. MS/MS-based peptide and protein identifications are
validated using the SCAFFOLD software, version 4.7.1 (Proteome
Software). A false discovery rate of 1%, as determined with the
Scaffold Local FDR algorithm, is applied for both peptides and
proteins. Proteins that contain similar peptides and cannot be
differentiated based on the MS/MS spectra are grouped to satisfy
the principle of parsimony.
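The "two missed trypsin cleavages" setting determines which theoretical peptides a database search considers. The sketch below illustrates the idea with a simple in silico digestion. It assumes the common rule that trypsin cleaves after Lys or Arg except when the next residue is Pro; this is an illustration, not the search engine's own implementation, and the test sequence is hypothetical.

```python
def tryptic_peptides(sequence, max_missed=2):
    """List theoretical tryptic peptides with up to max_missed missed cleavages.

    Assumed rule: cleave C-terminal to K or R, but not before P.
    """
    # Positions where the chain is cut (index of the residue after the cut)
    cut_points = [0]
    for i, residue in enumerate(sequence[:-1]):
        if residue in "KR" and sequence[i + 1] != "P":
            cut_points.append(i + 1)
    cut_points.append(len(sequence))

    peptides = []
    for start_idx in range(len(cut_points) - 1):
        for missed in range(max_missed + 1):
            end_idx = start_idx + 1 + missed
            if end_idx >= len(cut_points):
                break
            peptides.append(sequence[cut_points[start_idx]:cut_points[end_idx]])
    return peptides


# Hypothetical sequence: the R followed by P is not a cleavage site
demo = "AAKCCRPDDKEE"
print(tryptic_peptides(demo, max_missed=0))
print(tryptic_peptides(demo, max_missed=2))
```

Allowing more missed cleavages enlarges the search space, which is why the setting is usually kept small.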
3.2.4 Quantitation of Captured Protease Peptides
Quantitative analysis of MS spectra is performed using spectral
count sampling statistics [19] on those peptides that correspond
to the digestive Cys proteases (or intestains) of Colorado potato
beetle [12]. Differential numbers of captured proteases for the
SlCYS8 variants are discriminated statistically with a significance
threshold of 5%, considering spectral count mean values greater
than 4 for at least one inhibitor variant [20].
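The filtering and normalization described above can be sketched as follows: proteins are kept when the mean spectral count exceeds 4 for at least one inhibitor variant, and means are then expressed relative to wild-type SlCYS8, as in Figs. 3 and 4. The statistical test itself is not reproduced here. The protein names and counts are hypothetical, and the data layout is an assumption made for illustration.

```python
from statistics import mean


def filter_and_normalize(counts, reference="WT", min_mean=4.0):
    """Keep proteins with a mean count above min_mean for at least one
    variant, and express mean counts relative to the reference variant.

    counts: {protein: {variant: [replicate spectral counts]}}
    """
    result = {}
    for protein, by_variant in counts.items():
        means = {variant: mean(reps) for variant, reps in by_variant.items()}
        if max(means.values()) <= min_mean:
            continue  # too few spectra for any variant
        ref = means[reference]
        result[protein] = {
            variant: (m / ref if ref > 0 else float("nan"))
            for variant, m in means.items()
        }
    return result


# Hypothetical spectral counts for three insect replicates per variant
counts = {
    "IntB_example": {"WT": [10, 12, 8], "P2V": [20, 22, 18]},
    "IntD_example": {"WT": [1, 2, 0], "P2V": [3, 2, 4]},
}
relative = filter_and_normalize(counts)
print(relative)
```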
3.3 Working Examples
Spectral count data in Subheading 3.2.4 may be used to address
questions of practical or scientific interest. We used the approach in
recent years to identify potent inhibitor variants for Colorado
potato beetle control [11, 12, 21] (Subheading 3.3.1). We also
used it to address basic questions about the evolution and
structure/function relationships of protease–inhibitor interactions in
plant/insect systems, again taking the Colorado potato beetle as a
model [10, 22] (Subheading 3.3.2).
3.3.1 Example 1: The Protease Capture Approach as a Decision Tool to Select Cystatins Useful in Herbivore Pest Control
Attempts to implement resistance to Colorado potato beetle in
potato using recombinant protease inhibitors have been hampered
by the onset of multiple compensatory responses in this insect, like
the expression of “insensitive” proteases or an increased
consumption of leaf tissue to counterbalance the loss of digestive protease
functions following inhibitor intake [23]. Despite obvious
constraints in practice, protein engineering efforts have led over the
years to the development of improved recombinant inhibitors
eventually useful in plant protection [7], such as the SlCYS8
variants P2V and T6R, both shown to exhibit improved inhibitory
potency against Colorado potato beetle cathepsin L-like and
cathepsin B-like midgut protease activities [14]. Unexpectedly,
transgenic potato lines engineered to express P2V showed strong
detrimental effects against Colorado potato beetle fourth-instars
while T6R-expressing plant lines showed no effect for similar levels
of recombinant cystatin in leaves [11]. This apparent discrepancy
between the in vivo (feeding assay) and in vitro (protease
assay) data could be explained using the cystatin-based protease
capture approach, which indicated a broader affinity range for P2V
toward Colorado potato beetle proteases despite similar inhibitory
activities measured for the two cystatin variants using synthetic
peptide substrates (Fig. 3).
3.3.2 Example 2: The Protease Capture Approach as an Analytical Tool to Address Basic Questions on the Evolution and Protease Binding Preferences of Plant Cystatins
Complex protease inhibitor complements in plants are the result of
evolutionary processes often involving gene duplication and
positive selection of nonsynonymous mutations at functionally
significant amino acid sites [24]. A well-documented case is the 8-domain
potato multicystatin, an 88-kDa protease inhibitor induced in leaf
tissue by Colorado potato beetle feeding [25]. The eight domains
of this protein present hypervariable amino acid sites at conserved
protease inhibitory motifs, assumed to be instrumental in its broad
[Figure 3 image. Panel A: structure model of SlCYS8 with P2, T6, Loop 1, and Loop 2 labeled. Panel B: inhibition rate (%) versus inhibitor concentration (nM) for Z-Phe-Arg-MCA (WT, T6R, P2V), and inhibition rate (%) per SlCYS8 variant (1 µM) for Z-Arg-Arg-MCA (WT, T6R, P2V). Panel C: relative no. of spectra per SlCYS8 variant (WT, T6R, P2V).]
Fig. 3 Affinity spectra of tomato SlCYS8 and single functional variants T6R and P2V toward Colorado potato
beetle digestive Cys proteases. (a) Structure model for SlCYS8 (GenBank Accession No. AF198390) showing
the approximate position of residues Pro-2 (P2) and Thr-6 (T6) targeted to produce P2V and T6R. Details for
the in silico modeling are given in ref. [10]. (b) Z-Phe-Arg-methylcoumarin (MCA) (cathepsin L-like) and
Z-Arg-Arg-MCA (cathepsin B-like) hydrolyzing activities in larval midgut extracts preincubated with the three cystatin
variants. Data on this panel were inferred from ref. [14]. Each bar is the mean of three independent (insect
replicate) values ± SE. (c) Relative spectral counts for digestive Cys protease (intestain) peptides captured with
biotinylated SlCYS8, P2V, or T6R in midgut extracts of fourth-instars. Data on this panel were inferred from ref.
[11]. Spectral counts are expressed relative to total spectra counted for wild-type SlCYS8 (mean value of 1).
Each bar is the mean of three independent (insect replicate) values ± SE
inhibitory range against insect digestive proteases [26]. An
unsolved question at this point is the influence of structural
constraints to amino acid variability on the contribution of positive
selection to cystatin function. Amino acid substitutions retained in
plant cystatins during their evolution often involved closely related
residues [22] and the actual effects of positively selected amino
acids on cystatin functional diversity remain to be fully explored.
Toward this goal, we used the cystatin-based protease capture
approach to compare the affinity spectra of SlCYS8 single variants
bearing a leucine (L), an isoleucine (I) or a valine (V) in place of the
original proline-2 (P2) at positively selected amino acid site 2 in the
N-terminal region [22] (see Fig. 3a for a visual representation of P2
on SlCYS8). L, I, and V differ from each other only by the spatial
orientation of their terminal methyl groups and/or the distance
between these functional groups and the α-carbon atom. We
previously reported roughly similar inhibitory spectra for the P2I, P2L,
and P2V variants toward Colorado potato beetle midgut proteases,
based on in vitro assays with diagnostic peptide substrates for
cathepsin L-like and cathepsin B-like activities [14]. A closer look
at protease targets of the three variants using the protease capture
approach, however, revealed an altered Cys protease binding profile
for P2L [22]. Whereas wild-type SlCYS8, and single variants P2I
and P2V, showed a net preference for the insect “intestain B”
(IntB) protease subfamily, P2L showed a well-balanced dual affinity
pattern for isoforms of the IntB and “intestain D” (IntD)
[Figure 4 image. Panel A: relative # of spectra for Total, IntB, and IntD intestains, each plotted for WT, P2I, P2L, and P2V. Panel B: pie charts of IntB versus IntD proportions for WT, P2I, P2L, and P2V.]
Fig. 4 Affinity spectra of wild-type SlCYS8 and single variants P2I, P2L, and P2V toward Colorado potato beetle
digestive Cys proteases. (a) Relative spectral counts for intestain peptides captured with biotinylated wild-type
SlCYS8 (WT), P2I, P2L, or P2V in midgut extracts of fourth instars. Spectral counts are expressed relative to
wild-type SlCYS8 (mean value of 1) for all detected intestains (Total, corresponding to subfamilies IntA–F) or
for peptides specific to major intestain subfamilies IntB and IntD [12]. Each bar is the mean of three
independent (insect replicate) values ± SE. (b) Intestain subfamily preference patterns of wild-type SlCYS8,
P2I, P2L, and P2V for major intestain families IntB and IntD. Pie charts illustrate the relative proportions of
IntB- vs IntD-specific peptides detected in the insect crude extract. Data on this figure were inferred from
reference [22]
subfamilies (Fig. 4). These observations, showing different protease
isoform preferences for P2I, P2L, and P2V, point to an
effective contribution of closely related amino acids to the positive
selection-driven diversification of plant cystatin function. From an
experimental standpoint, they also suggest the usefulness
of our protease capture approach to address basic scientific questions
about the evolution, structure, and function of these ubiquitous
plant proteins.
4 Notes
1. As an example, we here use functional variants of tomato
cystatin SlCYS8 [14] bearing a D-biotin-bound AviTag peptide
(SGGLNDIFEAQKIEWHE∗ [15]) at the C-terminus (see ref.
[12]).
2. Several image analysis software programs may be used for
protein densitometry, available on the market or freely
distributed. We here use the Phoretix 2-D Expression software,
v. 2005 (NonLinear USA).
3. Wild-type tomato cystatin SlCYS8 and single functional var-
iants of this inhibitor (P2I, P2L, P2V, and T6R [14]) are here
used for the demonstrations. All cystatin variants are expressed
in and purified from E. coli, strain AVB101 (Avidity LLC) using
the glutathione S-transferase (GST) gene fusion system
(GE Healthcare). Gene constructs for the GST fusions are
described in refs [12, 22]. AVB101 E. coli cells express a biotin
ligase, BirA, driving the in vivo biotinylation of AviTag peptides
(see Note 4).
4. As an alternative to AVB101 E. coli cells, biotinylation can be
performed in vitro following cystatin affinity purification using
a commercial preparation of the BirA biotin ligase
(EC 6.3.4.15) (Avidity LLC). This procedure is typically com-
pleted within 1 h. Detailed protocols for in vitro biotin ligation
of AviTag peptides are available on the provider’s website
(https://www.avidity.com/resources/protocols).
5. D-biotin is prepared as a 5 mM stock solution in warm 10 mM
bicine, pH 8.3, and filter-sterilized through a 0.2-μm filter
before use.
6. Optimal conditions for heterologous expression may vary from
one protein to another. Temperature may be reduced to 20 °C
at this step if the protein tends to form inclusion bodies.
7. Different resins and agarose beads are available for avidin-based
enrichment. We previously worked with the TetraLink Avidin™
resin from Promega [12], but this product is no longer
commercially available. NeutrAvidin™ was selected here
given its high specificity and strong affinity for biotin
(Kd = 10⁻¹⁵ M). Strong denaturing conditions are required for
protein elution, which ensures retention of biotinylated
proteins on agarose beads throughout the protease capture
process.
8. Ten microliters of NeutrAvidin™ agarose beads can support
approximately 4 μg of purified cystatin, corresponding to
approximately 320 pmol of inhibitor. If inhibitors of different
molecular weights are compared in a given experiment,
amounts applied to the beads must be adjusted so as to use
equimolar concentrations of inhibitor. In the present case, we
added 8 μg of cystatin per 10 μL of beads, that is, twice their
binding capacity, to ensure saturation with the biotinylated
inhibitors. Sufficient volumes of reaction mixture must be
prepared at this step to ensure proper mixing of the solution
during the 30-min incubation.
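The equimolar adjustment described in this note can be sketched as a short calculation. This is a minimal illustration only: the molecular weight of 12,500 Da is simply the value implied by "4 μg, approximately 320 pmol", and the 25,000 Da inhibitor is a hypothetical example.

```python
def pmol_from_mass(mass_ug: float, mw_da: float) -> float:
    """Amount in pmol for a protein mass in micrograms and a molecular weight in Da (g/mol)."""
    # 1 ug = 1e-6 g and 1 pmol = 1e-12 mol, so ug / (g/mol) = 1e6 pmol
    return mass_ug * 1e6 / mw_da


def mass_for_pmol(amount_pmol: float, mw_da: float) -> float:
    """Mass in micrograms that corresponds to a given amount in pmol."""
    return amount_pmol * mw_da / 1e6


# Assumption: 12,500 Da is the molecular weight implied by "4 ug ~ 320 pmol" in this note.
CYSTATIN_MW_DA = 12_500
capacity_pmol = pmol_from_mass(4.0, CYSTATIN_MW_DA)  # binding capacity of 10 uL of beads
loaded_pmol = 2 * capacity_pmol                      # twofold excess, as used in this note

# Hypothetical second inhibitor of 25,000 Da, loaded at the same molar amount
other_mass_ug = mass_for_pmol(loaded_pmol, 25_000)
print(capacity_pmol, loaded_pmol, other_mass_ug)
```

With these assumed values, an inhibitor twice as heavy requires twice the mass per 10 μL of beads to reach the same molar excess.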
9. The volume of protein extract added must be optimized based
on the abundance of Cys proteases in the source extract. A
prefiltration or precipitation step may be necessary before protease
capture for those extracts (e.g., plant leaf extracts) that contain
dilute amounts of proteases. The pH of the binding reaction could
also require adjustment for some extracts given its possible
influence on inhibitor–protease interactions. The binding
reaction was performed here at pH 6.0, corresponding to the
overall pH optimum of Colorado potato beetle midgut Cys
proteases.
10. The washing step must be optimized so as to minimize
nonspecific binding while maintaining target protease binding to the
immobilized inhibitor. The agarose bead washing buffer was
supplemented here with 250 mM NaCl to minimize nonspecific
binding and with 10 mM L-cysteine to provide reducing
conditions for Cys protease activity. L-Cysteine may also be
included in the binding reaction mixture to maintain target
proteases in an active form.
11. AviTagged inhibitors and captured proteases could, in some
cases, exhibit similar molecular weights. Optimization of elec-
trophoretic conditions before protease band recovery might be
indicated in such cases to avoid the masking of captured pro-
teases following Coomassie blue staining and eventual interfer-
ence by the inhibitor, found in large amounts in the bead
eluate, during the MS/MS analysis.
Acknowledgments
This work was supported by Discovery and Discovery Accelerator Supplement
grants from the Natural Sciences and Engineering Research
Council of Canada to D.M.
References
1. Schlüter U, Benchabane M, Munger A et al (2010) Recombinant protease inhibitors for herbivore pest control: a multitrophic perspective. J Exp Bot 61:4169–4183
2. Macedo MLR, de Oliveira CFR, Costa PM et al (2015) Adaptive mechanisms of insect pests against plant protease inhibitors and future prospects related to crop protection: a review. Protein Pept Lett 22:149–163
3. Chen M, Shelton A, Ye GY (2011) Insect-resistant genetically modified rice in China: from research to commercialization. Annu Rev Entomol 56:81–101
4. Li Y, Hallerman EM, Liu Q et al (2016) The development and status of Bt rice in China. Plant Biotechnol J 14:839–848
5. Birk Y (2003) Plant protease inhibitors. Springer, New York, NY
6. Broadway RM (2000) The adaptation of insects to protease inhibitors. In: Michaud D (ed) Recombinant protease inhibitors in plants. CRC Press, Boca Raton, FL, pp 80–88
7. Sainsbury F, Benchabane M, Goulet MC, Michaud D (2012) Multimodal protein constructs for herbivore insect control. Toxins 4:455–475
8. Michaud D, Nguyen-Quoc B (2000) Using natural and modified protease inhibitors. In: Michaud D (ed) Recombinant protease inhibitors in plants. CRC Press, Boca Raton, FL, pp 114–127
9. Srinivasan A, Giri AP, Gupta VS (2006) Structural and functional diversities in lepidopteran serine proteases. Cell Mol Biol Lett 11:132–154
10. Vorster J, Rasoolizadeh A, Goulet MC et al (2015) Positive selection of digestive Cys proteases in herbivorous Coleoptera. Insect Biochem Mol Biol 65:10–19
11. Rasoolizadeh A, Munger A, Goulet MC et al (2016) Functional proteomics-aided selection of protease inhibitors for herbivore insect control. Sci Rep 6:38827
12. Sainsbury F, Rhéaume AJ, Goulet MC et al (2012) Discrimination of differentially inhibited cysteine proteases by activity-based profiling using cystatin variants with tailored specificities. J Proteome Res 11:5983–5993
13. Benchabane M, Schlüter U, Vorster J et al (2010) Plant cystatins. Biochimie 92:1657–1666
14. Goulet MC, Dallaire C, Vaillancourt LP et al (2008) Tailoring the specificity of a plant cystatin toward herbivorous insect digestive cysteine proteases by single mutations at positively selected amino acid sites. Plant Physiol 146:1010–1019
15. Beckett D, Kovaleva E, Schatz PJ (1999) A minimal peptide substrate in biotin holoenzyme synthetase-catalyzed biotinylation. Protein Sci 8:921–929
16. Smith BJ (1984) SDS polyacrylamide gel electrophoresis of proteins. In: Walker JM (ed) Methods in molecular biology, Proteins, vol 1. Humana Press, Clifton, NJ, pp 41–55
17. Arai S, Watanabe H, Kondo H et al (1991) Papain-inhibitory activity of oryzacystatin, a rice seed cysteine proteinase inhibitor, depends on the central Gln-Val-Val-Ala-Gly region conserved among cystatin superfamily members. J Biochem 109:294–298
18. Visal-Shah SD, Vrain TC, Yelle S et al (2001) An electroblotting, two-step procedure for the detection of proteinases and the study of proteinase/inhibitor complexes in gelatin-containing polyacrylamide gels. Electrophoresis 22:2646–2652
19. Zhang B, VerBerkmoes NC, Langston MA et al (2006) Detecting differential and correlated protein expression in label-free shotgun proteomics. J Proteome Res 5:2909–2918
20. Old WM, Meyer-Arendt K, Aveline-Wolf L et al (2005) Comparison of label-free methods for quantifying human proteins by shotgun proteomics. Mol Cell Proteomics 4:1487–1502
21. Oppert B, Rasoolizadeh A, Michaud D (2014) The coleopteran gut and targets for pest control. In: Hoffmann K (ed) Insect molecular biology and ecology. CRC Press, Boca Raton, FL, pp 291–317
22. Rasoolizadeh A, Goulet MC, Sainsbury F et al (2016) Single substitutions to closely related amino acids contribute to the functional diversification of an insect-inducible, positively selected plant cystatin. FEBS J 283:1623–1635
23. Cingel A, Savic J, Lazarevic J et al (2016) Extraordinary adaptive plasticity of Colorado potato beetle: “ten-striped spearman” in the era of biotechnological warfare. Int J Mol Sci 17:1538
24. Christeller JT (2005) Evolutionary mechanisms acting on proteinase inhibitor variability. FEBS J 272:5710–5722
25. Bouchard E, Cloutier C, Michaud D (2003) Oryzacystatin I expressed in transgenic potato induces digestive compensation in an insect natural predator via its herbivorous prey feeding on the plant. Mol Ecol 12:2439–2446
26. Kiggundu A, Goulet MC, Goulet C et al (2006) Modulating the proteinase inhibitory profile of a plant cystatin by single mutations at positively selected amino acid sites. Plant J 48:403–413
Chapter 27
A Pipeline for Metabolic Pathway Reconstruction in Plant Orphan Species
Cristina López-Hidalgo, Mónica Escandón, Luis Valledor,
and Jesus V. Jorrin-Novo
Abstract
In the era of high-throughput biology, it is necessary to develop a simple pipeline for metabolic pathway
reconstruction in plant orphan species. However, obtaining a global picture of the plant metabolism may be
challenging, especially in nonmodel species. Moreover, the use of bioinformatics tools and statistical
analyses is required. This chapter describes how to use different software and online tools for the recon-
struction of metabolic pathways of plant species using existing pathway knowledge. In particular, Quercus
ilex omics data is employed to develop the present pipeline.
Key words Metabolic pathways, Enzymes, Metabolomics, Proteomics, Transcriptomics
1 Introduction
Plants have an extraordinary level of metabolic diversity. This diversity has provided a vast source of natural products that are indispensable resources for humans, especially for our health and survival. Emphasizing this point, Rai et al. [1] reported that over 60% of the drugs introduced in the past 20 years are based on plant extracts or their close derivatives. Owing to this importance, studying the diversity of plant metabolism is essential, and doing so requires the elucidation and reconstruction of metabolic pathways. Metabolic pathways are defined as the full set of biochemical reactions that occur sequentially in biological systems. The substrates and products of these reactions are the metabolites, whose transformations are catalyzed by enzymes. Nevertheless, the discovery of full metabolic pathways and metabolites in plants is far from complete [2], and the reconstruction of metabolic pathways is vital to close this gap. In fact, despite the availability of several complete plant genomes, such as those of the model plant Arabidopsis thaliana (GCA_000001735.2; https://www.ncbi.nlm.nih.gov/assembly/GCF_000001735.4) and the forest tree Quercus suber (GCA_002906115.1; https://www.ncbi.nlm.nih.gov/assembly/GCF_002906115.1), and the growing amount of transcriptomic, proteomic, and metabolomic data currently available, making sense of all these data at the metabolic level remains a major issue for plant scientists.
Obstacles to effective metabolic pathway reconstruction include the total number of metabolites in the plant kingdom, estimated at between 100,000 and 200,000 [3], and the high degree of compartmentation in plants [4]. In addition, pathway elucidation is made more difficult by the impressive range of secondary metabolites that plants produce to cope with biotic or abiotic stressors, and by changes in metabolic composition under different physiological and environmental conditions [5]. Finally, plant metabolomes can also reflect different genetic backgrounds, with metabolic changes related to the environmental conditions at the origin of the population [6]. These difficulties are aggravated in nonmodel species, such as the vast majority of trees, due to incomplete or unsequenced genomes, poor availability of structurally and functionally annotated databases [7], and the lack of optimized protocols for wet-lab and in silico analyses that allow for the acquisition of reliable genetic, transcriptomic, proteomic, and metabolomic data.
Despite the difficulties they pose as orphan and recalcitrant plant species, forest trees have been studied at the systems level, as have model plants [6, 8, 9]. These studies have involved multidisciplinary approaches, from visual phenotyping to molecular -omics, through physiological and biochemical approaches. At present, knowledge of the biosynthetic and metabolic pathways of tree natural products is largely incomplete, but genomic and metabolomic information is expected to give clues to missing enzymes and reactions for the biosynthesis of diverse chemical substances, including those with medicinal and nutritional value, and to help elucidate the mechanisms underlying cellular physiology by deciphering relationships between genotype and phenotype [10].
To help fill this gap, we combine the available high-throughput -omics data and implement the required methodology to provide a model workflow for the reconstruction of metabolic pathways of plant species using existing pathway knowledge, starting with some software guidelines, followed by the graphical representation of the metabolic pathways. This method was implemented in [11].
2 Materials
The reconstruction of metabolic pathways requires information from the multiple molecular levels that constitute a metabolic pathway, such as enzymes (transcript or protein sequences) and metabolites (the substrates and products of the metabolic reactions). For this, the use of different omics technologies, such as transcriptomics, proteomics, and metabolomics, is essential. The workflow begins with the omics analysis of the tissues of interest. The omics data obtained (Subheading 2.1), consisting of sets of identified transcripts, proteins, or annotated metabolites, are integrated into several metabolic pathways. Different software and online tools are employed to carry out this integration (Subheading 2.2). These tools provide the visual representation of metabolic pathways. The workflow is shown in Fig. 1.
2.1 Datasets
The data employed in this work come from previously published works [11–13].
2.1.1 Transcriptomics Datasets
The transcript FASTA format file is obtained as indicated in [13]. In this work, a complete annotation of the Q. ilex transcriptome is carried out using the Sma3s v2 annotator (http://www.bioinfocabd.upo.es/node/11). Further information is described in Chapter 4 of this book.
2.1.2 Proteomics Datasets
The protein FASTA format file is obtained as indicated in [13]. Once the proteins are identified, the Proteome Discoverer software version 2.2 allows for the export of amino acid sequences in a FASTA format file.
2.1.3 Metabolomics Datasets
The metabolites were obtained as indicated in [11]. Several pipelines are available for metabolite identification, including both commercial software such as Compound Discoverer 3.0 (Thermo Scientific™) and Progenesis QI (Nonlinear Dynamics) and open and free software packages such as MZmine2 [14], XCMS [15], and MSDIAL [16]. The former group of software identifies compounds using online database search tools, including mzCloud™, Chemspider™, KEGG, and METLIN [17], and local or in-house databases. In the data employed here, raw data are analyzed with AMDIS (http://www.amdis.net/) and metabolites are “tentatively assigned” based on GC retention times (RT) and m/z values through searches in different databases, including the Golm Metabolome Database [18], Alkane, Fiehn library 1 and 2 [19], GC-TSQ, MoSys, and NIST/EPA/NIH Mass Spectral Library.
The annotated metabolites are named using the KEGG compound reference database. For MapMan visualization, the names of the metabolites must be compatible with the MapMan metabolite identifiers. These identifiers can be found in “MappingMetabolites download”, located in the MapMan store (https://mapman.gabipd.org/mapmanstore), or in the result file “Mercator_result” when the Mercator transcript or protein annotation is carried out.
Fig. 1 The workflow for metabolic pathway reconstruction is divided into four steps: omics data collection, bioinformatics for (semi)quantitative analyses, bioinformatics for annotation, and data visualization. Employed software and tools are referenced (transcriptomics, proteomics, and metabolomics)
2.2 Integration Tools
Different resources and web applications are employed to integrate the multiple omics information. The first one, Mercator [20], is an online tool to batch classify protein or gene sequences into MapMan functional plant categories. This tool allows for the automatic structuring of whole plant transcriptomes and/or proteomes. Once the annotation and functional plant categorization have been conducted, KEGG (Kyoto Encyclopedia of Genes and Genomes; http://www.genome.jp/kegg/) and MapMan (http://mapman.gabipd.org/) [21, 22] are used to visualize the data in different plant metabolic pathways.
3 Methods
3.1 Functional Plant Categorization
1. Go to the MERCATOR sequence annotation website (http://www.plabipd.de/portal/mercator-sequence-annotation).
2. Upload the FASTA format file (transcript or protein sequences) (Fig. 2a) (see Note 1).
3. Press START.
4. On the following page, the process status is displayed (Fig. 2b). This can take several minutes.
5. Once the process is completed, the functional categories pie chart is displayed (Fig. 3).
6. Download the result. It consists of a simple table (txt format file) that lists the classified transcripts or proteins in the different functional categories together with their descriptions. The DESCRIPTION column contains the annotated transcript or protein description, with information as shown in Table 1.
7. Extract the enzyme-related transcripts or proteins from the fourth column of the result table (DESCRIPTION column). These enzymes usually have an Enzyme Commission (EC) number (http://www.enzyme-database.org/) (see Note 2).
8. Create a txt format file with a single column containing the EC numbers (transcripts or proteins) and C numbers (metabolites).
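Steps 7 and 8 can also be scripted. The following is a minimal sketch in Python, not part of the published pipeline. It assumes that the downloaded result is a tab-separated table with one header line and the description in the fourth column, and that EC numbers appear in the form "EC 4.1.1.39" as in Table 1. The example row, including its bin name and sequence identifier, is hypothetical.

```python
import csv
import io
import re

# Assumption: EC numbers are written as "EC" followed by four dot-separated integers.
EC_PATTERN = re.compile(r"\bEC[\s:]+(\d+\.\d+\.\d+\.\d+)")


def extract_ec_numbers(table_text: str) -> list[str]:
    """Return the unique EC numbers found in the fourth column, in order of appearance."""
    reader = csv.reader(io.StringIO(table_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader, None)  # skip the header line
    found: list[str] = []
    for row in reader:
        if len(row) < 4:
            continue  # incomplete line, no description column
        for ec in EC_PATTERN.findall(row[3]):
            if ec not in found:
                found.append(ec)
    return found


# Hypothetical two-line table mimicking the layout described in Fig. 3b
example = (
    "BINCODE\tNAME\tIDENTIFIER\tDESCRIPTION\tTYPE\n"
    "1.3.1\texample bin\tseq_0001\tRibulose bisphosphate carboxylase "
    "large chain precursor (EC 4.1.1.39)\tT\n"
)
print("\n".join(extract_ec_numbers(example)))
```

Printing one EC number per line gives the single-column layout requested in step 8. The C numbers of the annotated metabolites would be appended to the same list.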
Fig. 2 (a) Screenshot of the Mercator sequence annotator. This tool performs Blast searches against Arabidopsis TAIR 10, Swiss-Prot, and Uniref90. Other available databases are shown in the picture. The results are filtered by a threshold (Blast_cutoff). (b) Screenshot of the Mercator sequence annotator status process. (c) Screenshot of the Mercator finished status. When the status process indicates that it is finished, the results can be downloaded
3.2 KEGG Metabolic Pathways
The KEGG metabolic pathway database contains a collection of pathway maps that allow for the representation of molecular interactions and reactions. Both transcripts and proteins can be employed. To map both the transcripts and the proteins related to enzymes, the process must be conducted twice, once for each data set.
1. Copy the EC and C numbers into the KEGG mapper (https://www.genome.jp/kegg/tool/map_pathway1.html) or upload the previously generated file with the list of these numbers (Fig. 4).
2. Select the organism to search against (press org and type ath for Arabidopsis thaliana (thale cress) or the code of another species, such as pop for Populus trichocarpa).
3. Press execute.
4. On the following page, the result is displayed (Fig. 5a) (see Note 3).
5. Choose the metabolic pathway by clicking on its name (e.g., ath00020 Citrate cycle (TCA cycle) - Arabidopsis thaliana (thale cress)).
6. A picture of the metabolic pathway with the metabolic reactions is shown (Fig. 5b). The detected items are highlighted in red.
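Before pasting the list in step 1, it can help to remove duplicates and malformed entries. The sketch below is one possible way to do this and is not part of the published method. It assumes that EC numbers consist of four dot-separated integers and that KEGG compound identifiers are the letter C followed by five digits. The function name and the example identifiers are chosen for illustration only.

```python
import re

EC_RE = re.compile(r"^\d+\.\d+\.\d+\.\d+$")  # e.g. 4.1.1.39
COMPOUND_RE = re.compile(r"^C\d{5}$")        # e.g. C00158


def build_query_list(ec_numbers, compound_ids):
    """Split identifiers into well-formed ones (duplicates removed) and rejected ones."""
    accepted, rejected, seen = [], [], set()
    for item in list(ec_numbers) + list(compound_ids):
        token = item.strip()
        if not token or token in seen:
            continue  # skip blanks and repeated identifiers
        seen.add(token)
        if EC_RE.match(token) or COMPOUND_RE.match(token):
            accepted.append(token)
        else:
            rejected.append(token)
    return accepted, rejected


accepted, rejected = build_query_list(
    ["4.1.1.39", "1.1.1.37", "4.1.1.39"],  # example EC numbers, one repeated
    ["C00158", "citrate"],                 # one C number and one free-text name
)
print("\n".join(accepted))  # one identifier per line, ready to copy
print("Check manually:", rejected)
```

Entries in the rejected list, such as free-text metabolite names, need to be converted to C numbers by hand. Well-formed identifiers can still go unmatched for the reasons given in Note 3.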
Fig. 3 (a) Functional categorization and distribution in percentage of the proteins or genes, according to the categories established by MERCATOR. The pie chart shows different functional categories: PS (Photosynthesis), major CHO metabolism, minor CHO metabolism, glycolysis, fermentation, gluconeogenesis/glyoxylate cycle, OPP (Oxidative Pentose Phosphate), TCA/org transformation, mitochondrial electron transport/ATP synthesis, cell wall, lipid metabolism, N-metabolism, amino acid metabolism, S-assimilation, metal handling, secondary metabolism, hormone metabolism, cofactor and vitamin metabolism, tetrapyrrole synthesis, stress, redox, polyamine metabolism, nucleotide metabolism, biodegradation of xenobiotics, C1-metabolism, miscellaneous, RNA, DNA, protein, signaling, cell, micro RNA, natural antisense, development, transport, and not assigned. (b) Screenshot of the functional categorization result file (txt format), which lists for each gene or protein the BINCODE in the first column, the BINCODE name in the second column, the fasta file identifier in the third column, the description of the annotated gene or protein in the fourth column, and the type of molecular component in the fifth column (T is transcript; P is protein; and M is metabolite). The description contains the information indicated in Table 1
Table 1
Example of the information indicated in the DESCRIPTION column
Annotated transcript/protein: (p48715|rbl_sinal: 98.2) Ribulose bisphosphate carboxylase large chain precursor (EC 4.1.1.39) (RuBisCO large subunit) (Fragment) - Sinapis alba (White mustard) (Brassica hirta) & (atcg00490: 97.1) large subunit of RUBISCO. Protein is tyrosine-phosphorylated, and its phosphorylation state is modulated in response to ABA in Arabidopsis thaliana seeds. RBCL
Functions in: Ribulose-bisphosphate carboxylase activity
Involved in: Response to cadmium ion, carbon fixation, peptidyl-cysteine S-nitrosylation, response to abscisic acid stimulus
Located in: 10 components
Expressed in: 24 plant structures
Expressed during: 14 growth stages
Contains InterPro domain/s: Ribulose bisphosphate carboxylase, large subunit, C-terminal (InterPro:IPR000685), Ribulose bisphosphate carboxylase, large subunit, ferredoxin-like N-terminal (InterPro:IPR017443), Ribulose bisphosphate carboxylase, large subunit, N-terminal (InterPro:IPR017444), Ribulose bisphosphate carboxylase, large chain, active site (InterPro:IPR020878)
BEST Arabidopsis thaliana protein match: Ribulose bisphosphate carboxylase large chain, catalytic domain (TAIR:AT2G07732.1). & (reliability: 194.2) & (original description: 526 nucleotides)
Component type: T
The annotated transcript/protein information contains the EC numbers of enzyme-related proteins and transcripts
Fig. 4 Screenshot of the Search Pathway mapping tool. This tool searches the given objects (genes, transcripts, proteins, and metabolites) against KEGG pathway maps
3.3 MapMan Metabolic Representation
1. Start MapMan 3.6.0RC1 (https://mapman.gabipd.org/mapman-download) (see Note 4).
2. Load your mapping. The mapping file is the MERCATOR annotation file. To do so, choose the side panel Mapping (Fig. 6), right-click on Mapping, and press “New Mapping.” Choose “From file” and select the MERCATOR annotation file.
3. Once it has been loaded, import the metabolite and transcript or protein list in the Experiment folder (Fig. 6). This must be a txt format file with two columns (“identifier” and “value”). For qualitative data, the value is one (Fig. 6). This way, all the omics items appear with the same color. For quantitative data, the values should contain log fold changes between a treatment and a reference (see Note 5).
4. Choose a pathway from the Pathways folder to visualize the metabolic pathway of interest.
5. A picture of the selected pathway is shown (e.g., the citrate cycle pathway; Fig. 7).
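The two-column experiment file described in step 3 can be generated with a few lines of Python. This is a minimal sketch under stated assumptions: the columns are tab-separated with the header names "identifier" and "value", and the log fold change is computed in base 2, which the protocol does not specify. The function names and example identifiers are hypothetical.

```python
import math


def qualitative_table(identifiers):
    """Presence/absence table: every detected item receives the value 1."""
    lines = ["identifier\tvalue"]
    lines += [f"{name}\t1" for name in identifiers]
    return "\n".join(lines) + "\n"


def quantitative_table(treatment, reference):
    """Table of log2(treatment / reference) for items measured in both conditions."""
    lines = ["identifier\tvalue"]
    for name, treated in treatment.items():
        control = reference.get(name)
        if control is None or control <= 0 or treated <= 0:
            continue  # the log ratio is undefined, so the item is left out
        lines.append(f"{name}\t{math.log2(treated / control):.3f}")
    return "\n".join(lines) + "\n"


# Hypothetical identifiers and abundances
print(qualitative_table(["seq_0001", "seq_0002"]))
print(quantitative_table({"seq_0001": 8.0, "seq_0002": 1.0},
                         {"seq_0001": 2.0, "seq_0002": 4.0}))
```

Items with missing or zero abundance are skipped here because their ratio cannot be log-transformed. How such cases should be handled depends on the experimental design.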
All the tools and software mentioned in this chapter allow for the creation and visualization of different metabolic pathways. This integration is qualitative, but there are other tools that allow for quantitative and qualitative multi-omics data integration. As examples, Omics Visualizer with Cytoscape [23] and pRocessomics (https://github.com/Valledor/pRocessomics) are bioinformatics platforms for visualizing molecular interaction networks, allowing for the integration of high-throughput data sets. These graphical representations are useful for the biological interpretation of metabolic pathways and for making metabolic sense of the multiple levels of omics data.
Fig. 5 (a) Screenshot of the KEGG pathways mapper results. The results consist of a list of the assigned
transcripts/proteins and the metabolites to each KEGG metabolic pathway. (b) Screenshot of the citrate cycle
pathway. The detected transcripts and metabolites are indicated in red
Fig. 6 Workflow with screenshots of the steps required to visualize the data on maps of biological processes. The data must be uploaded to the Experiment and Mapping folders
Fig. 7 Screenshots of different means to visualize the citrate cycle pathway in MapMan. (a) Core metabolism
overview. (b) Metabolites. (c) TCA representation. Each red square represents a metabolite or a transcript/
protein. More details can be found in [22]
4 Notes
1. It is strongly advised to first check the FASTA format file with the Mercator4 Fasta Validator tool (http://plabipd.de/portal/mercator-fasta-validator).
2. The EC numbers can be extracted using software such as Excel or R.
3. Often, some EC and C numbers are not found. This is because many enzymes and metabolites are not associated with any pathway. It may also be because the EC number is deprecated.
4. The installation instructions are available at https://mapman.gabipd.org/web/guest/mapman-download-instructions.
5. It is important that the transcript and metabolite identifiers are the same as those in the mapping file (third column, IDENTIFIER (Fig. 3b)).
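The identifier check described in Note 5 can be automated before loading the files into MapMan. The following is a minimal sketch, assuming that both files are tab-separated with one header line, that the mapping file holds the identifier in its third column, and that identifiers may be wrapped in single quotes. The function name and the example data are hypothetical.

```python
def unmatched_identifiers(experiment_text, mapping_text):
    """Return the experiment identifiers that do not occur in the mapping file."""
    mapping_ids = set()
    for line in mapping_text.splitlines()[1:]:  # skip the header line
        fields = line.split("\t")
        if len(fields) >= 3:
            # Assumption: surrounding single quotes are not part of the identifier
            mapping_ids.add(fields[2].strip().strip("'"))
    missing = []
    for line in experiment_text.splitlines()[1:]:
        if not line.strip():
            continue  # ignore empty lines
        name = line.split("\t")[0].strip().strip("'")
        if name not in mapping_ids:
            missing.append(name)
    return missing


# Hypothetical mapping and experiment contents
mapping = (
    "BINCODE\tNAME\tIDENTIFIER\tDESCRIPTION\tTYPE\n"
    "'8.1'\t'example bin'\t'seq_0001'\t'description'\tT\n"
)
experiment = "identifier\tvalue\nseq_0001\t1\nSeq_0002\t1\n"
print(unmatched_identifiers(experiment, mapping))
```

The comparison is deliberately exact, including letter case, so that any difference between the two files is reported rather than silently tolerated.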
References
1. Rai A, Saito K, Yamazaki M (2017) Integrated omics analysis of specialized metabolism in medicinal plants. Plant J 90:764–787
2. Viant MR, Kurland IJ, Jones MR et al (2017) How close are we to complete annotation of metabolomes? Curr Opin Chem Biol 36:64–69
3. Ernst M, Silva DB, Silva RR et al (2014) Mass spectrometry in plant metabolomics strategies: from analytical platforms to data acquisition and processing. Nat Prod Rep 31:784
4. Allen DK, Libourel IGL, Shachar-Hill Y (2009) Metabolic flux analysis in plants: coping with complexity. Plant Cell Environ 32:1241–1257
5. Fiehn O (2002) Metabolomics - the link between genotypes and phenotypes. Plant Mol Biol 48:155–171
6. Meijón M, Feito I, Oravec M et al (2016) Exploring natural variation of Pinus pinaster Aiton using metabolomics: is it possible to identify the region of origin of a pine from its metabolites? Mol Ecol 25:959–976
7. Valledor L, Carbó M, Lamelas L et al (2018) When the tree let us see the forest: systems biology and natural variation studies in forest species. In: Progress in botany. Springer, Berlin, Heidelberg, pp 345–367
8. Correia B, Valledor L, Hancock RD et al (2016) Integrated proteomics and metabolomics to unlock global and clonal responses of Eucalyptus globulus recovery from water deficit. Metabolomics 12:141
9. Pascual J, Cañal MJ, Escandón M et al (2017) Integrated physiological, proteomic and metabolomic analysis of UV stress responses and adaptation mechanisms in Pinus radiata. Mol Cell Proteomics 16:485–501
10. Qi Q, Li J, Cheng J (2014) Reconstruction of metabolic pathways by combining probabilistic graphical model-based and knowledge-based methods. BMC Proc 8:1–10
11. López-Hidalgo C, Guerrero-Sánchez VM, Gómez-Gálvez I et al (2018) A multi-omics analysis pipeline for the metabolic pathway reconstruction in the orphan species Quercus ilex. Front Plant Sci 9:1–16
12. Guerrero-Sanchez VM, Maldonado-Alconada AM, Amil-Ruiz F et al (2017) Holm oak (Quercus ilex) Transcriptome. De novo sequencing and assembly analysis. Front Mol Biosci 4:70
13. Guerrero-Sanchez VM, Maldonado-Alconada AM, Amil-Ruiz F et al (2019) Ion torrent and Illumina, two complementary RNA-seq platforms for constructing the holm oak (Quercus ilex) transcriptome. PLoS One 7454228:1–18
14. Pluskal T, Castillo S, Villar-Briones A et al (2010) MZmine 2: modular framework for processing, visualizing, and analyzing mass spectrometry-based molecular profile data. BMC Bioinformatics 11:395
15. Gowda H, Ivanisevic J, Johnson CH et al (2014) Interactive XCMS online: simplifying advanced metabolomic data processing and subsequent statistical analyses. Anal Chem 86:6931–6939
16. Tsugawa H, Cajka T, Kind T et al (2015) MS-DIAL: data-independent MS/MS deconvolution for comprehensive metabolome analysis. Nat Methods 12:523–526
17. Guijas C, Montenegro-Burke JR, Domingo-Almenara X et al (2018) METLIN: a technology platform for identifying knowns and unknowns. Anal Chem 90(5):3156–3164
18. Nielsen J, Jewett M (2007) Metabolomics. A powerful tool in systems biology. Springer, Heidelberg
19. Kind T, Wohlgemuth G, Lee DY et al (2009) FiehnLib – mass spectral and retention index libraries for metabolomics based on quadrupole and time-of-flight gas chromatography/mass spectrometry. Anal Chem 81:10038–10048
20. Lohse M, Nagel A, Herter T et al (2014) Mercator: a fast and simple web server for genome scale functional annotation of plant sequence data. Plant Cell Environ 37:1250–1258
21. Thimm O, Bläsing O, Gibon Y et al (2004) MAPMAN: a user-driven tool to display genomics data sets onto diagrams of metabolic pathways and other biological processes. Plant J 37:914–939
22. Usadel B, Poree F, Nagel A et al (2009) A guide to using MapMan to visualize and compare Omics data in plants: a case study in the crop species, maize. Plant Cell Environ 32:1211–1229
23. Shannon P, Markiel A, Ozier O et al (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13:6
Chapter 28
Detection of Plant Low-Abundance Proteins by Means of Combinatorial Peptide Ligand Library Methods
Egisto Boschetti and Pier Giorgio Righetti
Abstract
The detection and identification of low-abundance proteins from plant tissues is still a major challenge.
Among the reasons are the low protein content, the presence of few very high-abundance proteins, and the
presence of massive amounts of other biochemical compounds. In the last decade numerous technologies
have been devised to resolve the situation, in particular with methods based on solid-phase combinatorial
peptide ligand libraries. This methodology, allowing for an enhancement of low-abundance proteins, has
been extensively applied with the advantage of deciphering the proteome composition of various plant
organs. This general methodology is here described extensively along with a number of possible variations.
Specific guidelines are suggested to cover peculiar situations or to comply with other associated analytical
methods.
Key words Plant proteome, Low-abundance proteins, Combinatorial peptide ligand library
1 Introduction
In the last 10 years plant proteomics has experienced fast growth, especially thanks to the development or optimization of relevant techniques, allowing for an in-depth discovery of proteins present in various organs. In contrast to animal proteomics, there are specific difficulties that hamper proper discoveries [1–3]. One of the major drawbacks is that in some tissues (like leaves) a few proteins dominate the landscape and prevent proper discovery of low-abundance polypeptides. This is further aggravated by the presence of various plant constituents (polyphenols, polysaccharides) that strongly interfere with various sample manipulations, such as protein capture via various chromatographic means and analyses via different electrophoretic methodologies. In spite of the relative paucity of genomics data, extensive progress has been made. However, additional efforts could be useful to assist scientists in tackling the sequencing of more and more plant genomes (most of the papers published so far deal with the proteomes of Arabidopsis thaliana and rice, Oryza sativa, and focus on profiling organs, tissues, cells, or subcellular proteomes). The present chapter deals with an emerging and powerful methodology for the detection of low-abundance proteins already extensively adopted in animal proteomics [4–6], namely, the combinatorial peptide ligand library (CPLL) technique.
2 Plant Proteins: a Minor Component of Plant Extracts with Specific Properties
The protein content in plant cells is about 20 times lower than in animal cells since the major part of the biomass is constituted of thick polysaccharidic cell walls. Proteins present in various organs are of very different molecular masses, up to very large constructs comprising a number of hydroxyproline molecules [7]. Due to their intricate combination with polysaccharides (e.g., cellulose, lignins, hemicelluloses, and pectins), the solubilization of plant proteins can be challenging, requiring for instance the presence of calcium ions in aqueous buffers as well as other salts or chaotropes [8] or even complex deglycosylation processes [9]. Often, stringent extraction methods are necessary to dissociate proteins tightly bound to cell walls, entailing the use of complex procedures such as sequential washings with a buffered 0.2 M CaCl2 solution followed by boiling extraction with 62 mM Tris–HCl buffer, pH 6.8, 2% SDS, 10% v/v glycerol, and 100 mM dithiothreitol [10]. In addition to all these differences from animal proteins, plant proteins are relatively resistant to commonly used proteases, impairing their analysis with classical methods based on polypeptide breakdown and peptide sequencing. Many low-concentration enzymes are present (proteases, lipases, nucleases, oxidases, and hydrolases involved in signaling), not only suggesting a great level of dynamic roles but also necessitating specific preliminary treatments like delipidation, nucleic acid hydrolysis, and sugar extraction.
Although plant proteins are scarcely present in plant tissues,
some of them are much more abundant as compared to others, thus
creating an extremely large dynamic concentration range. This
situation is organ-dependent. A typical case is RuBisCO in leaf
extracts, whose abundance prevents the detection of many other
low-abundance leaf species [11]. In seeds, a massive protein
presence is related to storage proteins [12]. Among high-abundance
proteins, prolamins and gliadins in wheat [13], vicilin in maize
embryo [14], and beta-conglycinin and glycinin (dominant proteins
in soybean seeds) can be mentioned [15].
The above-described situations necessitate a drastic reduction
of protein dynamic concentration range to access LAP otherwise
masked in mass spectrometry by strong signals generated by HAP.
In electrophoretic separation techniques the large surface area
occupied by concentrated proteins (spots in 2-DE and thick
bands in SDS-PAGE) overlaps with LAPs and prevents their detection.
However, prior to resolving the question of the reduction of
the dynamic concentration range, pretreatments of plant extracts
are most of the time mandatory.
3 Pretreatments of Plant Extracts to Eliminate Interfering Material
Since the initial plant crude extracts comprise many interfering
substances incompatible with the use of CPLLs, special treatments
are necessary to prepare a “clean” protein solution [16]. Improved
protein extraction from recalcitrant tissues, nonproteinaceous
material extraction and protein precipitation are the most popular
strategies.
For plant tissue extraction several points have to be considered.
Plant cells are rich in proteases, which requires the presence of
inactivating agents, and rich in polysaccharides (many of them
polyanionic) interacting directly with CPLLs and thus interfering with
protein capture. The presence of various pigments, lipids, polyphe-
nols, and secondary metabolites completes the list of products that
may be problematic with protein separation and analyses [17].
The following general rules should be adopted: (1) the aqueous
extraction should be performed in relatively low ionic strength to
prevent the solubilization of nucleic acids; (2) with highly viscous
material, such as latex and honey, a dilution is recommended;
(3) when dealing with proteins that are engaged within the cell
wall, such as pollen proteins, some amounts of nonionic detergent
(less than 0.5–1%) and urea (less than 3 M) should be used at a
concentration compatible with CPLLs. Examples are given in the
literature with detailed technical information [18]. A preliminary
lipid removal step is particularly recommended with plant seeds
such as soya beans, peanuts, corn, sunflower, and many others.
Simple extractions with nonpolar organic solvents are possible
with some risks of protein denaturation. Other methods are also
described involving a sequence of operations [19]. Pigments and
polyphenols can be eliminated by a phenol treatment
associated with some amounts of polyvinylpyrrolidone.
Precipitation is an essential step in the preparation of the pro-
tein solution. TCA alone or associated with acetone with reducing
agents allows for precipitating most proteins leaving in the super-
natant plenty of undesired materials [20]. Pellets containing pro-
teins are then separated by centrifugation and redissolved in the
selected buffer.
Protein precipitation can also be performed with ammonium
sulfate or with polyethylene glycol to collect precipitates that are
free or almost free of CPLL-interfering substances. Ammonium
sulfate precipitation of all proteins present in a plant extract is
performed at 80–90% saturation. At the end of this operation the
salt must be removed. This task is accomplished by simple dialysis
at a low molecular mass cutoff (e.g., 3500 Da), by centrifugation
using appropriate filtration-integrated devices, or even by
desalting chromatography.
Another possible approach is acidic precipitation addressing
some categories of proteins; it can be operated by acidifying the
solution with acetic acid at pH 3–4.
To complete the picture, a less popular method is protein
precipitation by a chloroform–methanol mixture with water in
1:4:3 proportions [21].
Several variants to the abovementioned methods are also
described to precipitate proteins [22]; however, the protein solubi-
lization protocol to follow is not always easy and frequently neces-
sitates the presence of zwitterionic surfactants and chaotropes.
Combining the elimination of undesirable material with protein
precipitation can sometimes be a good option. For instance
pigments are eliminated by using a Tris-HCl solution saturated
with phenol followed by a protein precipitation with ammonium
sulfate in the presence of methanol [23]. This possibility is espe-
cially recommended when the analysis of proteins is based on
two-dimensional electrophoresis (2-DE), but it depends on the
plant organ from which the protein extract is obtained.
All the above sample pretreatments contribute to significantly
better analytical results, especially when using
two-dimensional electrophoresis and related methods. However,
not all plant-derived biological material, such as wine, needs to be
submitted to a preliminary treatment [24].
All the above-described preliminary operations not only
remove undesired materials but also help to concentrate
many proteins that are present in very low amounts.
Although many cleanup protocols specific to plant extracts are
available, some of them are not compatible with CPLLs. Figure 1
illustrates possible options for four typical plant extracts used in
conjunction with CPLLs.
4 The Reduction of Protein Dynamic Range with Low-Abundance Protein Enhancement
In spite of the presence of many proteins, plant proteomic analysis
suffers from the very low level of gene expression and the presence
of individual proteins with particularly high concentration
compared to all others. This situation results in proteomes where
the concentrations of individual components span several
orders of magnitude. Many proteins are present in only a few copies
and are consequently very difficult to detect. In this context,
depletion and enrichment procedures have been devised to improve the
situation, as in animal proteomes. Precipitation [25, 26],
fractionation [27], depletion [28], and enrichment [29, 30] are the
major approaches.
Fig. 1 Summary of plant extracts pretreatment possibilities as a function of their source
Affinity-based selective separation methods are another way to
enrich for protein categories. As an example, for the analysis of
the phosphoproteome, a labeling of phosphate groups on serine and
threonine residues by a biotin tag followed by a separation using
avidin affinity chromatography has been described [31].
Unfortunately, the abovementioned approaches are labor-intensive
and carry an intrinsic risk of losing very low-abundance proteins
due to the multiple manipulations. Moreover, they do not
concentrate all the polypeptides present in trace amounts
at the same time. The method described in this chapter
drastically reduces the dynamic range of protein concentration,
allowing for a large analytical scan of the entire proteome. It proved
its efficiency with a number of plant extracts such as maize seeds
[32], spinach [33] and Arabidopsis thaliana leaves [34], rubber
plant latex [35], and fruits [36].
In the last decade a method has been devised and progressively
optimized to compress the dynamic concentration range in order
to decrease the concentration of high-abundance proteins, thus
reducing the signal coverage and at the same time concentrating
the rare species. This process is operated by the so-called
combinatorial peptide ligand libraries, or CPLLs, which have been
extensively described for a number of biological extracts including
plants [37–39].
CPLLs are a mixture of small beads (ca. 65 μm diameter) to
which hexapeptides are covalently linked; they are commercialized
under the trade name of ProteoMiner. The number of peptides
reaches several millions depending on the number of amino acids
used for the synthesis; however, each bead carries a single type of
peptide in a large number of copies. This is thus a mixed bed of
beads different from each other and individually capable of
capturing a protein or a group of them. When a plant protein extract is
exposed to such a solid phase under large overloading conditions,
each bead with affinity to an abundant protein will rapidly become
saturated and the vast majority of the same protein will remain
unbound. In contrast, trace proteins will not saturate the
corresponding partner beads unless the sample volume is large
enough to provide for increasing amounts of proteins. Once the
excess of unbound proteins is eliminated by filtration or centrifuga-
tion, all captured proteins can be harvested by elution at a much
lower dynamic concentration range than in the original biological
sample. Proteins present at trace levels become thus detectable by
current analytical methods.
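The saturation principle described above can be illustrated with a deliberately simplified numerical sketch. It assumes that each protein has its own bead population with an identical, fixed binding capacity, and that the captured amount is the smaller of the load and that capacity. The protein names, loads, and capacity are hypothetical values chosen for illustration, not measurements.

```python
import math

def cpll_capture(loads_ug, bead_capacity_ug):
    """Captured amount per protein: the load, capped at the bead capacity."""
    return {name: min(load, bead_capacity_ug) for name, load in loads_ug.items()}

def dynamic_range_log10(amounts):
    """Orders of magnitude between the most and least abundant species."""
    values = [v for v in amounts.values() if v > 0]
    return math.log10(max(values) / min(values))

# Hypothetical protein loads (micrograms) offered to the beads
extract = {
    "RuBisCO": 5000.0,
    "storage_protein": 500.0,
    "enzyme_A": 5.0,
    "trace_factor": 0.05,
}
capacity = 10.0  # assumed capacity (micrograms) of each bead population

captured = cpll_capture(extract, capacity)
print("before:", round(dynamic_range_log10(extract), 2), "orders of magnitude")
print("after: ", round(dynamic_range_log10(captured), 2), "orders of magnitude")
```

In this toy model the abundant species are clipped at the bead capacity while the trace species is retained in full, so the concentration range narrows from five to about 2.3 orders of magnitude. Real libraries behave less ideally because affinities and capacities differ from bead to bead.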
To succeed, two essential technical conditions are to be met:
(1) each single bead must contain copies of one unique hexapeptide
ligand (one-bead-one-peptide) [40] and (2) an oversaturated loading
condition [41] is required. In agreement with these statements,
Huhn et al. [42] and Rivers et al. [43] showed that the loading
volume is critical for the reduction of the dynamic range, the
number of protein identifications increasing with the sample
volume.
The entire protein adsorption mechanism is regulated by sev-
eral physicochemical parameters such as pH [44], buffer ionic
strength, presence of competitors, temperature and protein
concentration.
To date a large number of applications is available. When just
considering plant proteomics applications, this technique extends
the elucidation of proteome compositions from various organs
[36], the detection of allergens [6], and the differential protein
expression upon stress conditions [34].
5 Materials and Methods
Ammonium bicarbonate, ammonium sulfate, CHAPS
(3-[(3-cholamidopropyl)-dimethylammonio]-1-propanesulfonate),
chloroform, citric acid, dithiothreitol, ethylene glycol, formic acid,
guanidine, glycine, iodoacetamide, methanol, potassium phosphate
monobasic, sodium chloride, sodium dodecyl sulfate, sodium phosphate
dibasic dihydrate, thiourea, tris(hydroxymethyl)aminomethane, and
urea are chemicals and biochemicals of high purity grade from
Sigma-Aldrich, Saint Louis, MO. Protease inhibitor cocktail is from
Roche Diagnostics, Basel, Switzerland, or from Sigma-Aldrich,
Saint Louis, MO. Rapigest SF is from Waters Corp.,
Milford, MA.
ProteoMiner (a CPLL), a solid-phase combinatorial peptide
ligand library as a mixed bed, is available from Bio-Rad Labora-
tories, Hercules, CA, USA (see Note 1).
Vortex mixer and benchtop centrifuge are from Thermo Fisher Scientific.
Centricon centrifugal filters with cutoffs of 3000 or 10,000 Da are
from Millipore Corp. Milford, MA.
6 Protein Capture with Concomitant Dynamic Range Reduction
The capture of proteins by CPLLs is operated according to specific
situations. For instance, when attempting to analyze the
low-abundance proteome the recommended general procedure is
to use physiological buffer conditions of pH and ionic strength
at room temperature. By modulating these generic capture
conditions it is possible to target the reduction of dynamic
concentration range on various categories of proteins (see Fig. 2 and
detailed descriptions below). A number of other preliminary
considerations have to be taken into account in order to reach
optimized results. To this end refer to Notes 1–7.
Another point of interest to be mentioned is the nature of the
plant sample. To prevent possible negative interferences with
CPLLs the protein extract must be clear, without products in
suspension. It must also be DNA-free and should not contain lipids,
pigments, polysaccharides, and other chemicals that prevent a
proper capture of proteins. For complete information see Subhead-
ing 3.
6.1 General Capture Method under Physiological Conditions
The amount of neutral salt to reach physiological conditions is
150 mM. Most frequently the buffer used is a 25 mM phosphate
buffer containing 0.15 M sodium chloride, pH 7.2 (PBS). These
buffers mimic the conditions that reign within a cell; by definition
these conditions fully preserve the biological functions of proteins.
(a) Equilibrate the plant protein clear sample with the selected
physiological buffer (e.g., PBS). This operation can be performed
in different ways. For instance, if the protein sample is a
lyophilized material, dissolve the powder in the buffer and
clarify by centrifugation. Otherwise a dialysis or a diafiltration
operation or a desalting chromatography or even a desalting
by centrifugation using dedicated membrane devices can be
adopted. For information on protein amounts and concentra-
tions, see Notes 3, 8–11. If the protein sample contains pro-
teases (this is frequently the case) a tablet of a cocktail of
protease inhibitors is added.
Fig. 2 Protein capture phase of the dynamic range reduction process by means
of CPLLs. The initial clear aqueous protein extract is loaded on CPLLs under
one of five conditions (low ionic strength, PBS, acidic pH, alkaline pH, or
lyotropic salts) to collect low-abundance proteins as, respectively, a large
collection, a stringent collection, acidic proteins, alkaline proteins, or
hydrophobic proteins. The operation is most generally performed under
physiological conditions; however, the reduction of ionic strength allows for
capturing more proteins, especially those that have weak affinity for the
CPLLs. Specific conditions can be used to enhance the capture of either
acidic, alkaline, or hydrophobic low-abundance proteins
(b) Equilibrate the CPLLs using a physiological buffer; this is the
same buffer used to dissolve the protein sample. Then drain
out the excess of liquid by low-speed centrifugation at about
1250 × g at 20 °C for a few min.
(c) Bring the protein sample into contact with the CPLL beads; stir
gently to maintain the beads suspended within the liquid. The
contact time should be at least 2–3 h or overnight. Room
temperature incubation is the most common option; however,
the protein capture can be performed at a different temperature,
for instance at 4 °C (see Note 12).
(d) Eliminate the excess of supernatant by centrifugation. Wash
the CPLL beads with the equilibration buffer to remove the
excess of proteins and store at 4 °C while waiting for the
protein elution (see Subheading 7).
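Benchtop centrifuges are often set in rpm, whereas the centrifugation steps above are specified as relative centrifugal force (about 1250 × g). A minimal sketch of the standard conversion, RCF = 1.118 × 10^-5 × r × rpm^2 with r in cm, is given below. The 8 cm rotor radius is a hypothetical value and must be replaced by the radius of the rotor actually used.

```python
import math

RCF_CONSTANT = 1.118e-5  # standard factor for radius in cm and speed in rpm

def rcf_for_rpm(rpm, rotor_radius_cm):
    """Relative centrifugal force (x g) produced at a given rotor speed."""
    return RCF_CONSTANT * rotor_radius_cm * rpm ** 2

def rpm_for_rcf(rcf_g, rotor_radius_cm):
    """Rotor speed (rpm) needed to reach a given relative centrifugal force."""
    return math.sqrt(rcf_g / (RCF_CONSTANT * rotor_radius_cm))

radius_cm = 8.0  # hypothetical rotor radius
for target in (1250, 2000, 10000):  # forces quoted in this chapter
    print(f"{target} x g -> {rpm_for_rcf(target, radius_cm):.0f} rpm")
```

Because the force scales with the square of the speed, doubling the rpm quadruples the force applied to the beads.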
6.2 Protein Capture in Low-Ionic Strength
The reduction of ionic strength of the capture buffer promotes or
intensifies electrostatic interactions. Weakly charged proteins can
thus be more easily attracted by the electrical charge of the beads.
The result of the decrease of ionic strength is an increase of binding
capacity, as is the case in ion exchange chromatography. Under these
conditions the amount of proteins offered to the beads should at
least be doubled to reach optimized conditions for the reduction of
dynamic protein concentration range.
The technical methodology remains exactly the same as for the
protein capture under physiological conditions (see Subheading 6.1).
6.3 The Capture of Dominantly Acidic Proteins
Proteins carry electrical charges of both signs depending on the
pH. At low pH the positive charge is exacerbated and proteins that
at neutral pH are negatively charged will reverse their electrical sign.
In this case proteins that were captured by CPLLs because of
attractive electrical sign could be repelled by them. This is why
for a good reproducibility a perfect control of pH is mandatory.
Since proteins are captured by the beads thanks also to other
interactions, the variation of pH may contribute to weaken the
interaction intensity with certain protein species to the point that
no capture occurs.
The range of pH values where the CPLLs can be operated is
between 3 and 10 [44]. Beyond these limits virtually all proteins are
charged, respectively, either positively or negatively.
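The dependence of net protein charge on pH described above can be sketched with the Henderson–Hasselbalch relation. The example below is only a rough illustration: the pKa values are approximate textbook figures assumed for free side chains, the two peptide sequences are hypothetical, and effects of the local protein environment are ignored.

```python
# Approximate side-chain pKa values (assumed, textbook-style figures)
POSITIVE_PKA = {"K": 10.5, "R": 12.5, "H": 6.0}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1}
N_TERM_PKA = 9.0
C_TERM_PKA = 3.1

def net_charge(sequence, ph):
    """Estimate the net charge of a peptide at a given pH."""
    # Terminal amino group (positive when protonated)
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))
    # Terminal carboxyl group (negative when deprotonated)
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))
    for residue in sequence:
        if residue in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[residue]))
        elif residue in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[residue] - ph))
    return charge

acidic_peptide = "DEDEGADE"  # hypothetical sequence rich in Asp/Glu
basic_peptide = "KRKRGAKR"   # hypothetical sequence rich in Lys/Arg

for ph in (2.0, 4.0, 7.2, 9.0):
    print(ph,
          round(net_charge(acidic_peptide, ph), 2),
          round(net_charge(basic_peptide, ph), 2))
```

The acidic peptide is negatively charged at pH 7.2 and becomes slightly positive at pH 2, which is the sign reversal mentioned in the text, while the basic peptide stays positive over the whole range shown.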
As a general rule when the operation is performed in acidic
conditions the capture of anionic proteins is enhanced.
(a) Adjust the protein extract at the desired acidic pH (most
generally pH 4) by adding dropwise either acetic acid or citric
acid up to pH stabilization. This operation can also be per-
formed by buffer exchange (dialysis or diafiltration or gel
filtration). Remove possible materials in suspension.
(b) Equilibrate the CPLL beads using an acidic buffer of the same
pH selected for the protein extract. Then drain out the excess
of liquid by centrifugation under low-speed (about 1250 × g
at 20 °C for a few min).
(c) Mix the protein sample and the CPLL beads; stir gently to
maintain the beads in suspension for at least 2–3 h or over-
night at constant room temperature. A majority of acidic
proteins are captured.
(d) The protein capture extent will depend also on the ionic
strength of the buffer as described in Subheading 4.
(e) Eliminate the excess of supernatant by centrifugation.
(f) Wash 2–3 times the CPLL beads with the equilibration buffer
to remove the excess of proteins and proceed for the elution of
captured proteins by one of the methods described in
Subheading 7.
6.4 The Capture of Dominantly Cationic Proteins
As stated above (Subheading 6.3), when varying the buffer pH,
proteins acquire a different net electrical charge. In alkaline
conditions the dominant charge is negative for proteins having an
isoelectric point below the environmental pH.
As a general rule when the operation is performed in alkaline
conditions the capture of cationic proteins is enhanced.
(a) Adjust the protein extract at the desired alkaline pH (most
generally pH 9) by adding dropwise a solution of ammonium
hydroxide or of Tris base up to pH stabilization. This opera-
tion can also be performed by buffer exchange (dialysis or
diafiltration or gel filtration). Remove possible material in
suspension.
(b) Equilibrate the CPLL beads using an alkaline buffer of the
same pH selected for the protein extract. Then drain out the
excess of liquid by centrifugation at about 1250 × g at 20 °C
for a few min.
(c) Mix the protein sample and the CPLL beads; stir gently to
maintain the beads in suspension for at least 2–3 h or
overnight at constant room temperature. A majority of cationic
proteins are captured while the majority of anionic proteins
may stay in solution. The capture extent will depend also on
the ionic strength of the buffer as described in Subheading 4.
(d) Eliminate the excess of supernatant by centrifugation.
(e) Wash 2–3 times the CPLL beads with the equilibration buffer
to remove the excess of proteins and proceed for the elution of
captured proteins by one of the methods described in
Subheading 7.
6.5 Focus on Hydrophobic Protein Capture
Among natural amino acids composing proteins the most
hydrophobic are isoleucine, leucine, valine and phenylalanine. They
contribute to confer a certain degree of hydrophobicity to the entire
polypeptidic construct. A typical method of separating this
category of proteins is hydrophobic chromatography [45]. This
method is based on the use of structuring salts selected from the
Hofmeister series. The most common process is to equilibrate the
columns by using a buffer comprising at least 1 M ammonium
sulfate. Under these conditions the most hydrophobic proteins
are adsorbed by the CPLL beads and are thus subtracted from the
protein solution. Electrostatic interactions are minimized because
of the presence of strong salt ions. Within the present context with
an entire proteome, the capture of hydrophobic protein by CPLLs
can easily be enhanced. The technical details are as follows:
(a) To the protein extract add the desired amount of lyotropic salt
(generally this is 1 M ammonium sulfate final concentration).
In case of difficulties with protein precipitation the user should
refer to Note 13. The protein sample can alternatively be
equilibrated with a buffer comprising the lyotropic salt by
buffer exchange (dialysis or diafiltration or gel filtration).
Cloudy material that may appear in the supernatant
should be removed by centrifugation at 10,000 × g for
10 min.
(b) Equilibrate the CPLL beads using a buffer containing the
same amount of ammonium sulfate adopted for sample
conditioning. Then drain out the excess of liquid by centrifugation
under low-speed (about 1250 × g at 20 °C for a few min).
(c) Mix the protein sample and the CPLL beads; stir gently just to
maintain the beads in suspension for at least 2–3 h or
overnight at constant room temperature. A majority of
hydrophobic proteins are captured.
(d) Eliminate the excess of supernatant by centrifugation. Wash
2–3 times the CPLL beads with the equilibration buffer con-
taining the lyotropic salt to remove the excess of biological
material and proceed for the elution of captured proteins by
one of the methods described in Subheading 7.
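Reaching the 1 M ammonium sulfate final concentration used in steps (a) and (b) requires a small calculation. The sketch below covers two common situations: adding a concentrated stock to the sample, and weighing solid salt for the equilibration buffer. The 4 M stock concentration and the sample volume are hypothetical, and volumes are assumed to be additive on mixing, which is only an approximation for concentrated salt solutions.

```python
AMMONIUM_SULFATE_MW = 132.14  # g/mol, (NH4)2SO4

def stock_volume_ml(sample_ml, stock_molar, final_molar=1.0):
    """Volume of stock to add to a salt-free sample to reach final_molar.

    Assumes additive volumes: final = stock * V / (sample + V).
    """
    if stock_molar <= final_molar:
        raise ValueError("stock must be more concentrated than the target")
    return sample_ml * final_molar / (stock_molar - final_molar)

def salt_mass_g(final_volume_ml, molar=1.0, mw=AMMONIUM_SULFATE_MW):
    """Mass of salt to dissolve and bring to final_volume_ml of buffer."""
    return molar * final_volume_ml / 1000.0 * mw

# Hypothetical example: 3 mL of extract and a 4 M ammonium sulfate stock
print(stock_volume_ml(3.0, 4.0), "mL of stock to add")
print(round(salt_mass_g(100.0), 3), "g of salt for 100 mL of 1 M buffer")
```

Adding the salt as a concentrated solution rather than as a solid avoids local high concentrations that could precipitate proteins, at the cost of diluting the sample.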
7 Recovery Protocols of Plant Protein from CPLLs
Interaction forces between proteins and CPLL beads are of different
nature. The dominant forces are electrostatic interactions,
hydrophobic associations, hydrogen bonding, and van der
Waals interactions. They can act singly or collectively according
to the sequence of the hexapeptides.
Electrostatic interactions are probably the most representative
forces; they depend on the environmental pH and can be
challenged by the presence of salt ions. These forces are attractive
or repulsive depending on the sign of the electrical charge. Changes
in temperature influence the intensity of electrostatic interactions:
for instance a decrease in temperature increases the interaction.
Hydrophobic associations are only attractive. They are attrib-
uted to the presence of hydrophobic amino acids throughout the
protein sequence. This interaction results from an association of
cooperative molecules capable of repelling water. As a result water
molecules around these associations are particularly structured,
thus contributing to strengthen the association with the global
reduction of entropy. Hydrophobic associations are modulated by the
environmental temperature (an increase of temperature up to a certain
level reinforces the molecular association) and by the presence of
structuring salts.
The dissociation of hydrophobic interactions (this is the pur-
pose of this section) is produced by competing molecules such as
heavy alcohols, glycols and detergents and water-destructuring
molecules (chaotropic agents such as urea and guanidine). A simple
reduction of ionic strength may also contribute to decrease the
strength of weak hydrophobic associations.
Hydrogen bonding is largely present in polypeptidic structures.
It takes its origin from two electronegative atoms that share the
same hydrogen atom. For instance, protonated glutamic and
aspartic acid side chains act as donating groups and contribute to
the creation of hydrogen bonds. These interactions occur when the
distance between molecular species is short: the shorter the
distance, the stronger the hydrogen bond.
Hydrogen bonding is quite sensitive to pH changes, competi-
tors and water destructuring agents (guanidine and urea). In cer-
tain cases, analog molecules (arginine and citrulline) could act as
competitors of hydrogen bonding.
Within the context of the protein interactions with CPLLs,
grafted hexapeptides may comprise chemical groups capable of
interacting in a mixed mode. In this case, concomitant electrostatic
interactions, hydrophobic associations, and hydrogen bonding may
be present. This situation is to be considered when designing a
proper elution protocol for protein harvesting. Global protein
elution from CPLLs is the most common option; however, alterna-
tive fractionated elution may facilitate the analytical process with
the delivery of more detailed information in terms of proteome
composition. A summary of various options is given in Fig. 3.
7.1 Global Protein Harvesting
Global protein elution from CPLLs is the most frequent option in
protein harvesting for proteomics analysis. To this end all involved
interaction forces have to be challenged. It has frequently
been observed that after elution some proteins are still present on
the beads. They are polypeptides retained with high association
constants, among them low-abundance proteins. If they are not
eluted they escape the proteomics analysis, with a significant
reduction in the efficiency of the CPLL treatment. It is within this
context that several global elution methods can be devised.
7.1.1 Global Protein Elution with SDS-Containing Buffers
This is one of the most efficient elution methods. It involves
sodium dodecyl sulfate (SDS) as repeatedly described [46]. SDS is
known in electrophoresis to confer to proteins a similar global
charge by sticking on the proteins via hydrophobic associations,
thus exposing strong sulfate groups. With this profound
restructuring, proteins desorb from the solid CPLL phase. This
operation is performed in the presence of dithiothreitol, which
prevents the formation of disulfide bonds while enhancing the
solubility of proteins. In addition the high temperature of treatment
(boiling water bath) shortens the elution procedure to just a few
minutes.
(a) Prepare an aqueous solution of 3% SDS (this concentration
could be as high as 10%) and add dithiothreitol (DTT) up to a
final concentration of 25 mM.
(b) To 100 μL of CPLL beads loaded with proteins add 200 μL of
SDS-DTT solution. Mix gently while preventing the forma-
tion of foam and then put in a boiling bath for 10 min.
(c) Cool down the bead suspension and separate the supernatant
by low-speed centrifugation (e.g., 2000 × g for 10 min).
Fig. 3 Protein elution phase from CPLLs. A variety of protein desorption methods
can be combined either as global protein elution or as fractionated elution. The
latter can be composed of two, three, or more desorption steps. When more than
one desorption is involved, proteins are collected as a function of elution
stringency or by challenging individually the elemental molecular interactions
(d) Make another protein extraction with an additional 200 μL of
SDS-DTT solution under the same conditions and separate
the supernatant. Pool the latter with the first eluate and store
in the cold while waiting for proteomic analyses. For compati-
bility with the following analytical determinations see Sub-
heading 8. In a number of cases SDS present in the eluate
must be eliminated; Note 14 gives detailed instructions for
protein precipitation. To check that all proteins are desorbed
from CPLL beads, a recommendation is given in Note 15.
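The quantities needed for the SDS-DTT eluent in steps (a) to (d) can be worked out as in the following sketch. It assumes that the 3% SDS is expressed as weight per volume, which the protocol does not state explicitly, and it scales the two 200 μL extractions per 100 μL of beads linearly with bead volume. The 10 mL batch size is a hypothetical example.

```python
DTT_MW = 154.25  # g/mol, dithiothreitol

def sds_dtt_recipe(volume_ml, sds_percent_wv=3.0, dtt_mm=25.0):
    """Masses of SDS (g) and DTT (mg) for a given volume of eluent."""
    sds_g = sds_percent_wv / 100.0 * volume_ml
    # mM x mL gives micromoles; x g/mol gives micrograms; / 1000 gives mg
    dtt_mg = dtt_mm * volume_ml * DTT_MW / 1000.0
    return {"sds_g": sds_g, "dtt_mg": dtt_mg}

def eluent_volume_ul(bead_volume_ul, extractions=2, ratio=2.0):
    """Total eluent for the protocol: two extractions at 2 volumes each."""
    return bead_volume_ul * ratio * extractions

print(sds_dtt_recipe(10.0))     # hypothetical 10 mL batch
print(eluent_volume_ul(100.0))  # eluent needed for 100 uL of beads
```

Since DTT oxidizes in solution, preparing only the volume needed for the day's samples is a reasonable design choice.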
7.1.2 Global Protein Elution with Guanidine Hydrochloride Solutions
Another efficient agent capable of desorbing proteins from complex
affinity columns is guanidine hydrochloride. Such a solution is used
at a quite high concentration. It easily competes against
electrostatic interactions. Guanidine is a strong chaotropic agent able to
weaken hydrogen bonds and hydrophobic associations. The
final result is the total desorption of proteins that are captured by
CPLLs. Naturally after exposure to guanidine hydrochloride,
desorbed proteins are destructured and hence denatured.
(a) Prepare an aqueous solution of 6 M guanidine and adjust the
pH to 6 by addition of 3–6 M hydrochloric acid.
(b) To the CPLL beads loaded with proteins add
the guanidine elution solution, mix gently for 10 min.
(c) Separate the proteins that are in solution in the supernatant by
low speed centrifugation (e.g., 2000 × g for 10 min). Then
repeat the operation with the recovered CPLL pellet to be sure
that all proteins located within the bead pores are extracted.
The second supernatant also recovered by centrifugation is
pooled with the first one.
(d) The assembled eluate solution is not directly analyzable by
current methods because of the presence of high concentra-
tions of guanidine. The protein solution must thus be dialyzed
against any appropriate buffer and if necessary concentrated
and precipitated.
(e) To check that all proteins are desorbed from CPLL beads a
recommendation is given with Note 15.
7.2 Fractionated Elution Approaches
Several sequential elution methods have been reported. They are
also detailed in a dedicated book where variations are described
[47]. The principle is to start with a relatively mild elution step
followed by other desorption steps, each of them being performed
with chemical agents or displacers of increased stringency. The
reason behind this approach is first to be sure that all proteins are
desorbed and second that each fraction is populated by a lower
number of species compared to a global protein elution, thus
facilitating the following analytical procedures.
7.2.1 Two-Step Elution with Increased Stringency
(a) Prepare two different desorbing aqueous solutions. The first is
composed of 4 M urea, 1% CHAPS, 5% acetic acid; the second
is 6 M guanidine-HCl, pH 6.0.
(b) To the CPLL beads loaded with proteins add
the first elution solution. Mix gently for about 10 min.
(c) Separate proteins that are in the supernatant from the beads by
low speed centrifugation (e.g., 2000 × g for 10 min). Treat
the CPLL beads a second time under exactly the same condi-
tions and pool the two supernatants. Store this first eluate in
the cold.
(d) Then mix the CPLL bead pellet with 200 μL of guanidine-HCl
solution and gently shake for 10 min. Separate the supernatant
by centrifugation and repeat the operation. Separate the sec-
ond supernatant by low-speed centrifugation and pool with
the first one. Store this second eluate in the cold.
(e) The two eluates are ready for protein analysis by chromatog-
raphy, mass spectrometry or electrokinetic methodologies.
7.2.2 Three-Step Elution with Increased Stringency (Option 1)
(a) Prepare three different desorbing aqueous solutions. The first
is composed of 2 M thiourea, 7 M urea, and 2% CHAPS (here
named TUC). The second solution is composed of 9 M urea
acidified to pH 3 by acetic acid or citric acid (here named
UCA). The third solution is a mixture of acetonitrile,
isopropanol, ammonia at 20% and water (6, 12, 10 and 72%
respectively) (here named AIAW).
(b) To the CPLL beads loaded with proteins add the
TUC solution; mix gently for 10 min.
(c) Separate proteins that are in the supernatant from the beads by
low speed centrifugation (e.g., 2000 × g for 10 min). Treat
the CPLL beads a second time under exactly the same condi-
tions and pool the two supernatants. Store this first eluate in
the cold.
(d) Proceed similarly to obtain the second and the third
eluates.
(e) Store the three eluates in the cold.
(f) The eluates are ready for protein composition analysis by mass
spectrometry, chromatography, or electrokinetic methodolo-
gies. For compatibility with analytical determinations see
Subheading 8.
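The AIAW mixture in step (a) is defined by percentages, so the volume of each component for a given batch follows directly. The sketch below assumes that the percentages are volume fractions, which the protocol does not state explicitly, and checks that they add up to 100%. The 10 mL batch size is a hypothetical example.

```python
# Composition of AIAW as given in step (a), assumed to be % by volume
AIAW_PERCENT = {
    "acetonitrile": 6.0,
    "isopropanol": 12.0,
    "ammonia_20_percent": 10.0,
    "water": 72.0,
}

def component_volumes_ml(total_ml, percents=AIAW_PERCENT):
    """Volume of each component (mL) for a batch of total_ml."""
    if abs(sum(percents.values()) - 100.0) > 1e-9:
        raise ValueError("percentages must sum to 100")
    return {name: total_ml * pct / 100.0 for name, pct in percents.items()}

for name, volume in component_volumes_ml(10.0).items():
    print(f"{name}: {volume:.2f} mL")
```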
7.2.3 Three-Step Increased Stringency Elution (Option 2)
(a) Prepare three different desorbing aqueous solutions. The first
is composed of 1 M sodium chloride, the second of
3 M guanidine-HCl, pH 6.0, and the third of 9 M
urea titrated with citric acid to pH 3–3.5.
(b) Proceed as for the three-step elution described above in
Subheading 7.2.2.
(c) Store the three eluates in the cold.
(d) The eluates are ready for protein composition analysis by mass
spectrometry, chromatography or electrokinetic methodolo-
gies (see Subheading 8).
7.3 Direct on-Bead Protein Digestion
When the analysis of captured proteins is performed by the
so-called shotgun approach, the most direct way to proceed is to
digest the captured proteins directly on the beads. The
method is derived from the in-solution digestion of proteins
[48]. The operation requires some excess of trypsin, since part of
it will be captured by the CPLL beads. Basically the process is as
follows:
(a) After protein capture on the peptide library beads (whatever
the method or the physicochemical conditions), the beads are
rapidly washed twice with 200 μL of 100 mM ammonium
bicarbonate containing 0.1% Rapigest (this is not mandatory,
but it facilitates the proteolysis process; the solution is obtained
by adding 1 mL of 100 mM ammonium bicarbonate to the 1 mg
Rapigest vial lyophilizate and shaking gently for a few minutes).
The bead suspension is then vortexed for a few min.
(b) Add 300 μL of 10 mM DTT and heat the bead suspension at
65 °C for 1 h under gentle stirring or occasional shaking.
(c) Add 300 μL of 55 mM iodoacetamide, mix, and store in the
dark for 60 min at room temperature.
(d) Add 60 μL of 0.2 μg/μL sequencing-grade trypsin.
(e) Vortex the bead suspension and incubate overnight at 37 °C
under gentle shaking.
(f) Add 200 μL of 500 mM formic acid, vortex for a few seconds,
and incubate for about 40 min at room temperature.
(g) Recover the supernatant by filtration (30,000 MWCO)
under centrifugation (e.g., 10,000 × g for 20 min) in order to
separate insoluble material and beads.
(h) To fully extract the remaining peptides, wash the beads
once under centrifugation with 50 μL of 500 mM formic acid
and combine the wash with the previous filtrate.
(i) The stripped beads can then be kept at −20 °C for possible
further analysis.
(j) The peptide solution is then dried in a SpeedVac and
redissolved in 20 μL of HPLC solvent for LC–mass spectrometry
analysis.
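The reagent quantities in steps (b) to (d) can be checked with a few lines of arithmetic. The following is a minimal sketch rather than part of the protocol: the dictionary layout and function names are assumptions, while the volumes and concentrations are those given in the steps above. It uses the identity 1 mM = 1 nmol/μL.

```python
# Volumes (uL) and concentrations (mM) taken from steps (b) and (c).
STEPS = {
    "DTT": {"volume_ul": 300.0, "conc_mM": 10.0},
    "iodoacetamide": {"volume_ul": 300.0, "conc_mM": 55.0},
}

# Trypsin addition from step (d).
TRYPSIN_VOLUME_UL = 60.0
TRYPSIN_CONC_UG_PER_UL = 0.2


def nanomoles(volume_ul: float, conc_mM: float) -> float:
    """Amount of reagent in nmol, using 1 mM = 1 nmol/uL."""
    return volume_ul * conc_mM


def trypsin_mass_ug() -> float:
    """Total mass of trypsin added in step (d), in micrograms."""
    return TRYPSIN_VOLUME_UL * TRYPSIN_CONC_UG_PER_UL


if __name__ == "__main__":
    dtt = nanomoles(**STEPS["DTT"])
    iaa = nanomoles(**STEPS["iodoacetamide"])
    print(f"DTT: {dtt:.0f} nmol")
    print(f"Iodoacetamide: {iaa:.0f} nmol ({iaa / dtt:.1f}-fold molar excess over DTT)")
    print(f"Trypsin: {trypsin_mass_ug():.0f} ug")
```

The molar excess of iodoacetamide over DTT matters because any DTT left over would otherwise consume the alkylating agent before the protein thiols are blocked.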
8 Compatibility Between Protein Elution from CPLLs and Analysis
Proteomic analysis of samples obtained after treatment with
CPLLs cannot always be performed directly. However, there are
situations in which the analysis can be applied immediately after
protein harvesting.
Proteomics analysis frequently starts with SDS-PAGE separation,
for which the sample composition is critical. In this respect,
protein harvesting by boiling the CPLL beads in a solution of
sodium dodecyl sulfate in the presence of reducing agents is
fully compatible with SDS-PAGE and requires no preliminary
reformulation [49]. In other circumstances the proteins are eluted
with a mixture of chemical agents that is compatible with
isoelectric focusing. This is the case for elution with TUC (see
Subheading 7.2.2). After this first separation dimension it is
then possible to proceed to two-dimensional electrophoresis and
to protein spot identification. Nevertheless, treatment of CPLLs
with TUC solutions may not elute 100% of the proteins from the
beads and should be completed by another, orthogonal desorption
operation. To circumvent this situation, the TUC solution can be
supplemented with cysteic acid, which produces an almost
exhaustive desorption of proteins. In this case, because cysteic
acid has a very low pI and collects at the anode, two-dimensional
electrophoresis can also be performed easily [50].
When 2D-DIGE is used for the two-dimensional electrophoresis
analysis, the only elution that is compatible with this technique
is with 20 mM Tris buffer containing 7 M urea, 2 M thiourea,
and 4% CHAPS, pH 8.5 (sodium carbonate could also be used
instead of Tris).
For direct ELISA-based assays of eluted proteins, the denaturing
desorption agents that are compatible are 0.2 M glycine-HCl,
2% NP-40, pH 2.4; 0.1 M acetic acid, 2% NP-40; 1 M NaCl,
2% NP-40; or 0.1 M acetic acid containing 40% ethylene glycol.
If a single eluent does not desorb all proteins from the beads,
these solutions can be used in sequence and the eluates pooled.
In all other circumstances, the protein solution collected from
the CPLL beads needs to be equilibrated in an appropriate buffer
by diafiltration, extensive dialysis, or gel filtration.
Note that protein elution from the beads may not be necessary.
This is the case when trypsin digestion is performed directly on
the beads and the resulting peptides are analyzed directly by
LC-MS/MS [41]. This approach is recommended especially for
small samples involving small volumes of beads, because it saves
time and greatly reduces protein losses.
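The compatibility rules in this section can be summarized as a simple lookup from the downstream analysis to the eluents named in the text. This is only a sketch of one way to record them: the dictionary keys, the function name, and the convention that an empty list means "buffer exchange required" are assumptions introduced here, not terms from the chapter.

```python
from typing import Dict, List

# Eluents named in this section, keyed by downstream analysis (keys are hypothetical labels).
ELUENTS_BY_ANALYSIS: Dict[str, List[str]] = {
    "SDS-PAGE": ["boiling SDS solution with reducing agent"],
    "IEF / 2-DE": ["TUC", "TUC supplemented with cysteic acid"],
    "2D-DIGE": ["20 mM Tris, 7 M urea, 2 M thiourea, 4% CHAPS, pH 8.5"],
    "ELISA": [
        "0.2 M glycine-HCl, 2% NP-40, pH 2.4",
        "0.1 M acetic acid, 2% NP-40",
        "1 M NaCl, 2% NP-40",
        "0.1 M acetic acid, 40% ethylene glycol",
    ],
    "LC-MS/MS (shotgun)": ["no elution: on-bead trypsin digestion"],
}


def compatible_eluents(analysis: str) -> List[str]:
    """Return the eluents listed for an analysis.

    An empty list means the text names no directly compatible eluent, so the
    eluate must first be exchanged into a suitable buffer (diafiltration,
    dialysis, or gel filtration).
    """
    return ELUENTS_BY_ANALYSIS.get(analysis, [])


if __name__ == "__main__":
    for name in ("2D-DIGE", "ELISA", "capillary electrophoresis"):
        options = compatible_eluents(name)
        print(name, "->", options if options else "buffer exchange required")
```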
9 Practical Application Examples of CPLL-Treated Plant Extracts
Examples are numerous in the literature, and a general review of
the subject is beyond the scope of this chapter. The main
application examples concern the analysis of plant proteomes [51],
the discovery of specific proteins [52], the detection of
expression differences under specific conditions [53], and the
discovery of plant allergens [54]. Overall, extracts from various
plant organs have been analyzed with the help of combinatorial
peptide ligand libraries, and a few illustrative examples are
given.
Within the domain of low-abundance proteome investigations,
studies on particularly recalcitrant plant proteomes should be
mentioned. Examples are the analyses of avocado and banana
pulps [54, 55]. In both cases, about 1% total protein is embedded
either in solid oil (avocado) or in huge amounts of polysaccharides
(banana). In order to improve the discovery of low-abundance
species, a denaturing solubilization protocol based on 3% boiling
SDS was implemented in parallel with the standard extraction
under native conditions (SDS is anathema in CPLL treatments,
since it would completely inhibit protein capture). This issue was
circumvented in two ways: (1) SDS removal by the classical
acetone–methanol precipitation and (2) dilution of the SDS from
3 to 0.1% in the presence of another, CPLL-compatible surfactant
such as 0.5% CHAPS. This procedure allowed the identification of
1012 unique proteins; 174 of them were in common with the
control, untreated sample and 190 were present only in the
control. Overall, 648 new proteins were detected via CPLLs. In
the case of banana, out of a total of 1131 proteins identified,
849 were attributed to the CPLL technology.
Fig. 4 SDS–polyacrylamide gel electrophoresis analysis of various fruit pulp protein extracts before and after
treatment with combinatorial peptide ligand library. (a) banana pulp [55]; (b) mango pulp [57]; (c) lemon pulp
[58]; (d) orange pulp [59]; (e) wolfberry pulp [60]; (f) avocado pulp [54]; (g) olive pulp [56]. By courtesy of
Boschetti and Righetti [36]
From olive fruit pulp [56], where only native extraction was
applied, the number of unique gene products found was only
252, but this was already much higher than what was known from
the literature. Examples of the analysis of fruit proteins before
and after treatment with CPLLs are illustrated in Fig. 4.
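The protein counts quoted above for the SDS-based procedure and for banana can be cross-checked with simple arithmetic. The sketch below assumes that 1012 is the total across the control and the CPLL-treated samples, which is the reading under which the quoted figures agree; the function name is hypothetical.

```python
def cpll_only(total: int, shared_with_control: int, control_only: int) -> int:
    """Proteins seen only after CPLL treatment, given the overall total."""
    return total - shared_with_control - control_only


if __name__ == "__main__":
    # Figures quoted in the text for the SDS-based procedure.
    new_proteins = cpll_only(total=1012, shared_with_control=174, control_only=190)
    print(f"Proteins detected only via CPLLs: {new_proteins}")

    # Banana: share of identifications attributed to the CPLL technology.
    banana_share = 849 / 1131
    print(f"Banana, CPLL-attributed share: {banana_share:.0%}")
```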
In addition to the long list of known plant protein allergens,
there are allergenic molecules present below the detection limits.
They can be revealed after treatment with CPLLs. One of the most
representative examples is the discovery of low-abundance
allergens from cypress pollen [61]. Upon exposure to patient sera,
the list of cypress pollen allergens has been enriched with
several new, never-described species such as chaperone protein
HSP104, a Sigma factor SigB regulation protein (a hydrolase
involved in stress regulation mechanisms), and a Rab-like protein.
A number of other allergens have been discovered using CPLLs in
Hevea latex [35], mango [62], and banana [55].
The use of CPLLs to detect plant protein markers induced by
unexpected environmental factors has been extensively reviewed
[36]. Biotic (e.g., pathogen attacks) and abiotic factors
(temperature changes, flooding, drought, contact with heavy
metals, etc.) have been described. In most cases defense
mechanisms involving signaling proteins as well as antioxidative
complexes are involved.
10 Notes
1. ProteoMiner, a CPLL (combinatorial peptide ligand library), is
a beaded and porous mixed-bed, affinity-like solid phase
designed for proteomics applications. It is commercialized by
Bio-Rad Laboratories, Hercules, USA.
Prior to use, commercial CPLLs need to be conditioned
for optimal efficiency. When CPLLs are delivered dry, they
need to be fully rehydrated so that the gel pores again allow
free diffusion of proteins. Because the beads carry different
hexapeptides (from very hydrophilic to highly hydrophobic
sequences), they do not all have the same swelling properties.
To cover the various situations, it is advised first to slurry
100 mg of dry beads in 2 mL of methanol for 30 min while
shaking gently and then to add 2 mL of phosphate buffer (e.g.,
25 mM, pH 7). Rehydration is extended overnight at
room temperature. The rehydrated beads are then washed
extensively with the buffer selected for the capture of proteins,
as described above for the aqueous slurry. Rehydrated and
buffer-equilibrated beads can be stored in the cold at 4 °C
and used within the day.
2. It is recommended not to reuse hexapeptide beads because
(1) some level of carryover may appear, with consequent mis-
interpretation of data, and (2) some hexapeptides may have
been modified as a consequence of stringent elution conditions
from previous operations.
3. The sample should be clear and not contain lipids in suspen-
sion. Large amounts of nucleic acids or viscous polysaccharides,
when present, should also be removed using current methods.
Samples should not contain a large amount of detergents or
denaturing agents. For example, nonionic detergents are toler-
ated at concentrations not exceeding 0.5% (wt/vol); urea is
also tolerated at a concentration not exceeding 3 M.
The method can be applied to a large variety of plant
protein extracts after appropriate elimination of interfering
biopolymers. Nevertheless, specific aspects of optimization
might have to be considered according to the issues
encountered. If, for instance, the protein concentration is below
0.1 mg/mL, it may be useful to concentrate the sample
beforehand. This would improve the capture by CPLLs in case the
affinity is too low. Possible concentration methods include
dialysis followed by lyophilization, or membrane concentration
under centrifugation.
The presence of proteases, relatively frequent in plant
extracts, is deleterious for the integrity of proteins. Their activ-
ity must be stopped with various selected inhibitors or inactiva-
tion agents prior to contact with CPLLs.
4. It may happen that during the capture stage, bead aggregation
occurs. In this case the supernatant must be separated by high-
speed centrifugation and the collected solid material is to be
washed extensively with PBS under strong shaking (e.g., vor-
tex) up to the dissociation of beads from each other. Chemical
agents are not recommended since they may desorb captured
proteins.
5. An insufficient decrease of high-abundance proteins treated
with CPLLs may mean that the amount of proteins in the initial
sample was not sufficient to saturate the beads. This could be
resolved by either increasing the amount of sample or by
decreasing the volume of CPLL beads.
6. Note that the enrichment of low-abundance proteins
renders the sample to be analyzed more complex (many more
proteins are detectable). The subsequent analytical results
might become difficult to interpret. To facilitate the analysis
it is advised to fractionate the collected proteins or to elute
the captured proteins sequentially.
7. Generally CPLL treatments are highly reproducible; however,
if results are not exactly the same from one experiment to
another, it is advised to check the ionic strength and the pH
of the initial sample. Even small modifications of these
parameters alter the affinity of proteins for the hexapeptide
baits grafted on the CPLL beads, with consequent modification of
the molecular interaction process.
8. The protein concentration offered to CPLLs should be
between 1 and 10 mg/mL. Lower concentrations may render
the capture of very low-abundance proteins challenging when
the dissociation constant is too high.
9. The total amount of plant protein from the sample should be
at least 50 mg for 100 μL of hexapeptide ligand library
beads.
10. To increase the probability of finding low-copy species the
loading should be increased.
11. When the volume of the sample is very small the volume of
CPLLs should be decreased; however, the smallest volume of
beads usable without losing too much selectivity is around
10 μL.
12. Incubation of plant proteins with CPLLs should be performed
at room temperature. An increase of temperature may engender
stronger hydrophobic associations; a decrease of temperature
may result in an acceleration of electrostatic interactions.
Low temperatures also increase the viscosity of the sample, which
makes diffusion within the pores of the gel beads more difficult.
Temperature fluctuations between serial experiments may render
reproducibility challenging. Always use exactly the
same incubation temperature throughout similar experiments.
13. The presence of ammonium sulfate in the plant protein sample
may engender partial protein precipitation with consequent
protein losses. To prevent this phenomenon, the concentration
of lyotropic salts has to be adjusted case-by-case below the
critical level of precipitation.
14. The precipitation of proteins by methanol–chloroform in view
of eliminating sodium dodecyl sulfate can be performed by
adding four volumes of cold pure methanol to the protein
solution while stirring vigorously for a few minutes. Then three
volumes of pure cold chloroform are added with continuous
stirring. Finally three additional volumes of deionized water are
added. The protein precipitation process is complete within
10–20 min at room temperature. Proteins are removed by
centrifugation at 15,000 × g for about 5 min at 4 °C (aggregated
proteins will be located at the liquid interface). The aqueous
layer is then pipetted out and discarded. Four other
volumes of methanol are added while stirring for a few minutes.
The supernatant is removed again by centrifugation at
15,000 × g for about 5 min at 4 °C without disturbing the
protein precipitate. A last wash with acetone may facilitate the
removal of methanol. Protein pellets are dissolved by using an
appropriate buffer compatible with subsequent operations.
15. After elution the CPLL beads are theoretically free of proteins.
To check for the absence of protein, 100 μL of the “eluted” beads
is mixed with 10% SDS solution containing 25 mM DTT and
boiled for 10 min. The supernatant is then recovered and
directly analyzed by SDS-PAGE. Staining must be very sensitive
(e.g., silver staining). The presence of protein bands indicates
incomplete protein desorption.
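The numerical limits scattered through Notes 3, 8, 9, and 11 can be gathered into a single pre-flight check before a sample is loaded. The following is a minimal sketch under stated assumptions: the class, field, and function names are hypothetical, the thresholds are those quoted in the Notes, and the check is no substitute for the qualitative advice (clarity of the sample, protease inhibition, constant temperature).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    protein_mg_per_ml: float
    volume_ml: float
    bead_volume_ul: float
    nonionic_detergent_pct: float = 0.0  # % (wt/vol)
    urea_molar: float = 0.0


def loading_warnings(s: Sample) -> List[str]:
    """Compare a planned CPLL load with the limits quoted in Notes 3, 8, 9, and 11."""
    warnings: List[str] = []
    if s.protein_mg_per_ml < 0.1:
        warnings.append("concentration below 0.1 mg/mL: concentrate the sample first (Note 3)")
    elif not 1.0 <= s.protein_mg_per_ml <= 10.0:
        warnings.append("concentration outside the 1-10 mg/mL range (Note 8)")
    total_mg = s.protein_mg_per_ml * s.volume_ml
    required_mg = 50.0 * s.bead_volume_ul / 100.0  # at least 50 mg per 100 uL of beads
    if total_mg < required_mg:
        warnings.append(f"only {total_mg:g} mg protein for {required_mg:g} mg required (Note 9)")
    if s.bead_volume_ul < 10.0:
        warnings.append("bead volume below about 10 uL (Note 11)")
    if s.nonionic_detergent_pct > 0.5:
        warnings.append("nonionic detergent above 0.5% (Note 3)")
    if s.urea_molar > 3.0:
        warnings.append("urea above 3 M (Note 3)")
    return warnings


if __name__ == "__main__":
    for message in loading_warnings(Sample(0.5, 20.0, 100.0, urea_molar=4.0)):
        print("WARNING:", message)
```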
References
1. Jorrín-Novo JV, Maldonado AM, Echevarría-Zomeño S et al (2009) Plant proteomics update (2007–2008): second-generation proteomic techniques, an appropriate experimental design, and data analysis to fulfill MIAPE standards, increase plant proteome coverage and expand biological knowledge. J Proteome 72:285–314
2. Agrawal GK, Rakwal R (2008) Plant proteomics: technologies, strategies, applications. Wiley, Hoboken
3. Agrawal GK, Job D, Zivy M et al (2011) Time to articulate a vision for the future of plant proteomics - a global perspective: an initiative for establishing the international plant proteomics. Proteomics 11:1559–1568
4. Boschetti E, Hernandez-Castellano LE, Righetti PG (2019) Progress in farm animal proteomics: the contribution of combinatorial peptide ligand libraries. J Proteome 197:1–13
5. Boschetti D’AA, Candiano G, Righetti PG (2018) Protein biomarkers for early detection of diseases: the decisive contribution of CPLLs. J Proteome 188:1–14
6. Boschetti E, Fasoli E, Righetti PG (2015) The discovery of low-abundance allergens by proteomics analysis involving combinatorial peptide ligand libraries. Jacobs J Allergy Immunol 2:015
7. Hijazi M, Velasquez SM, Jamet E et al (2014) An update on post-translational modifications of hydroxyproline-rich glycoproteins: toward a model highlighting their contribution to plant cell wall architecture. Front Plant Sci 5:395–405
8. Millar DJ, Whitelegge JP, Bindschedler LV et al (2009) The cell wall and secretory proteome of a tobacco cell line synthesising a secondary wall. Proteomics 9:2355–2372
9. Xu MS, Chen S, Wang WQ et al (2013) Employing bifunctional enzymes for enhanced extraction of bioactives from plants: flavonoids as an example. J Agric Food Chem 61:7941–7948
10. Cho WK, Hyun TK, Kumar D et al (2015) Proteomic analysis to identify tightly-bound cell wall protein in rice calli. Mol Cells 38:685–696
11. Demirevska-Kepova K, Simova-Stoilova L, Kjurkchiev S (1999) Barley leaf RuBisCO, RuBisCO-binding protein and RuBisCO activase and their protein/protein interactions. Bulg J Plant Physiol 25:31–44
12. Li G, Nallamilli BR, Tan F et al (2008) Removal of high-abundance proteins for nuclear subproteome studies in rice (Oryza sativa) endosperm. Electrophoresis 29:604–617
13. Ribeiro M, Nunes-Miranda JD, Branlard G (2013) One hundred years of grain omics: identifying the glutens that feed the world. J Proteome Res 12:4702–4716
14. Xiong E, Wu X, Yang L et al (2014) Chloroform-assisted phenol extraction improving proteome profiling of maize embryos through selective depletion of high-abundance storage proteins. PLoS One 9:e112724
15. Tavakolan M, Alkharouf NW, Matthews B et al (2014) SoyProLow: a protein database enriched in low abundant soybean proteins. Bioinformation 10:599–601
16. Carpentier SC, Panis B, Vertommen A et al (2008) Proteome analysis for non-model plants: a challenging but powerful approach. Mass Spectrom Rev 27:354–377
17. Gengenheimer P (1990) Preparation of extracts from plants. Methods Enzymol 182:174–193
18. Boschetti E, Bindschedler L, Tang C et al (2009) Combinatorial peptide ligand libraries and plant proteomics: a winning strategy at a price. J Chromatogr A 1216:1215–1222
19. Wang W, Vignani R, Scali M et al (2004) Removal of lipid contaminants by organic solvents from oilseed protein extract prior to electrophoresis. Anal Biochem 329:139–141
20. Méchin V, Damerval C, Zivy M (2007) Total protein extraction with TCA-acetone. Methods Mol Biol 355:1–8
21. Wessel D, Flugge UI (1984) A method for the quantitative recovery of proteins in dilute solutions in the presence of detergents and lipids. Anal Biochem 138:141–143
22. Isaacson T, Damasceno CM, Saravanan RS et al (2006) Sample extraction techniques for enhanced proteomic analysis of plant tissues. Nat Protoc 1:769–774
23. Faurobert M, Pelpoir E, Chaïb J (2007) Phenol extraction of proteins for proteomic studies of recalcitrant plant tissues. Methods Mol Biol 355:9–14
24. Cereda A, Kravchuk AV, D’Amato A et al (2010) Proteomics of wine additives: mining for the invisible via combinatorial peptide ligand libraries. J Proteome 73:1732–1739
25. Kim YJ, Wang Y, Gupta R et al (2015) Protamine sulfate precipitation method depletes abundant plant seed-storage proteins: a case study on legume plants. Proteomics 15:1760–1764
26. Lee HM, Gupta R, Kim SH et al (2015) Abundant storage protein depletion from tuber proteins using ethanol precipitation method: suitability to proteomics study. Proteomics 15:1765–1769
27. Alam I, Sharmin S, Kim KH et al (2013) An improved plant leaf protein extraction method for high resolution two-dimensional polyacrylamide gel electrophoresis and comparative proteomics. Biotech Histochem 88:61–75
28. Mortezai N, Harder S, Schnabel C et al (2010) Tandem affinity depletion: a combination of affinity fractionation and immunoaffinity depletion allows the detection of low-abundance components in the complex proteomes of body fluids. J Proteome Res 9:6126–6134
29. Mithoe SC, Menke FL (2015) Phosphopeptide immuno-affinity enrichment to enhance detection of tyrosine phosphorylation in plants. Methods Mol Biol 1306:135–146
30. Wu XN, Xi L, Pertl-Obermeyer H et al (2017) Highly efficient single-step enrichment of low abundance phosphopeptides from plant membrane preparations. Front Plant Sci 8:1673
31. Kwon SJ, Choi EY, Seo JB et al (2007) Isolation of the Arabidopsis phosphoproteome using a biotin-tagging approach. Mol Cells 24:268–275
32. Fasoli E, Pastorello EA, Farioli L et al (2009) Searching for allergens in maize kernels via proteomic tools. J Proteome 72:501–510
33. Fasoli E, D’Amato A, Kravchuk AV et al (2011) Popeye strikes again: the deep proteome of spinach leaves. J Proteome 74:127–136
34. Fröhlich A, Gaupels F, Sarioglu H et al (2012) Looking deep inside: detection of low-abundant proteins in leave extracts of Arabidopsis thaliana and phloem exudates of Cucurbita maxima. Plant Physiol 159:902–914
35. D’Amato A, Bachi A, Fasoli E et al (2010) In-depth exploration of Hevea brasiliensis latex proteome and “hidden allergens” via combinatorial peptide ligand libraries. J Proteome 73:1368–1380
36. Righetti PG, Boschetti E (2016) Global protein expression analysis in plants by means of peptide libraries. J Proteome 143:3–14
37. Nguyen-Kim H, San Clemente H, Balliau T et al (2016) Arabidopsis thaliana root cell wall proteomics: increasing the proteome coverage using a combinatorial peptide ligand library and description of unexpected Hyp in peroxidase amino acid sequences. Proteomics 16:491–503
38. Zhu W, Xu X, Tian J et al (2016) Proteomic analysis of Lonicera japonica immature flower buds using combinatorial peptide ligand libraries and polyethylene glycol fractionation. J Proteome Res 15:166–181
39. Ye Z, Zhou S, Thannhauser TW et al (2014) Identification of drought-induced leaf proteomes in switchgrass. Proc Plant Animal Genome Conference, San Diego
40. Righetti PG, Boschetti E (2013) Combinatorial peptide libraries to overcome the classical affinity-enrichment methods in proteomics. Amino Acids 45:219–229
41. Thulasiraman V, Lin S, Gheorghiu L et al (2005) Reduction of the concentration difference of proteins in biological liquids using a library of combinatorial ligands. Electrophoresis 26:3561–3571
42. Huhn C, Ruhaak LR, Wuhrer M et al (2012) Hexapeptide library as a universal tool for sample preparation in protein glycosylation analysis. J Proteome 75:1515–1528
43. Rivers J, Hughes C, McKenna T (2011) Asymmetric proteome equalization of the skeletal muscle proteome using a combinatorial hexapeptide library. PLoS One 6:e28902
44. Fasoli E, Farinazzo A, Sun CJ et al (2010) Interaction among proteins and peptide libraries in proteome analysis: pH involvement for a larger capture of species. J Proteome 73:733–742
45. Eriksson KO, Belew M (2011) Hydrophobic interaction chromatography. Methods Biochem Anal 54:165–181
46. Candiano G, Dimuccio V, Bruschi M et al (2009) Combinatorial peptide ligand libraries for urine proteome analysis: investigation of different elution systems. Electrophoresis 30:2405–2411
47. Boschetti E, Righetti PG (2013) Low-abundance protein discovery: state of the art and protocols. Elsevier, Waltham
48. Fonslow BR, Carvalho PC, Academia K et al (2011) Improvements in proteomic metrics of low abundance proteins through proteome equalization using ProteoMiner prior to MudPIT. J Proteome Res 10:3690–3700
49. Righetti PG, Boschetti E, Zanella A et al (2010) Plucking, pillaging and plundering proteomes with combinatorial peptide ligand libraries. J Chromatogr A 1217:893–900
50. Farinazzo A, Fasoli E, Kravchuk AV et al (2009) En bloc elution of proteomes from combinatorial peptide ligand libraries. J Proteome 72:725–730
51. Jorrín-Novo JV, Valledor-González L, Castillejo-Sánchez MA et al (2018) Proteomics analysis of plant tissues based on two-dimensional gel electrophoresis, in Advances in Plant Ecophysiology Techniques
52. Campos NA, Swennen R, Carpentier SC (2018) The plantain proteome, a focus on allele specific proteins obtained from plantain fruits. Proteomics 18:1700227
53. Singh P, Pitambara, Rajput RS et al (2018) Proteomics approaches to study host pathogen interaction. J Pharmacogn Phytochem 7:1649–1654
54. Esteve C, D’Amato A, Marina ML et al (2012) Identification of avocado (Persea americana) pulp proteins by nanoLC-MS/MS via combinational peptide ligand libraries. Electrophoresis 33:2799–2805
55. Esteve C, D’Amato A, Marina ML et al (2013) In-depth proteomic analysis of banana (Musa spp.) fruit with combinatorial peptide ligand libraries. Electrophoresis 34:207–214
56. Esteve C, D’Amato A, Marina ML et al (2012) Identification of olive (Olea europaea) seed and pulp proteins by nLC-MS/MS via combinatorial peptide ligand libraries. J Proteome 75:2396–2403
57. Fasoli E, Righetti PG (2013) The peel and pulp of mango fruit: a proteomic samba. Biochim Biophys Acta 1834:2539–2545
58. Fasoli E, Colzani M, Aldini G et al (2015) Lemon peel and Limoncello liqueur: A proteomic duet. Biochim Biophys Acta 1834:1484–1491
59. Lerma-García MJ, D’Amato A, Simó-Alfonso EF et al (2016) Orange proteomic fingerprinting: from fruit to commercial juices. Food Chem 196:739–749
60. D’Amato A, Esteve C, Fasoli E et al (2013) Proteomic analysis of Lycium barbarum (Goji) fruit via combinatorial peptide ligand libraries. Electrophoresis 34:1729–1736
61. Shahali Y, Sénéchal H, Poncet P (2018) The use of combinatorial hexapeptide ligand library (CPLL) in allergomics. Methods Mol Biol 1871:393–403
62. Gomez Cardona EE, Heathcote K, Teran ML et al (2018) Novel low-abundance allergens from mango via combinatorial peptide libraries treatment: a proteomics study. Food Chem 269:652–660
Chapter 29
iTRAQ-Based Proteomic Analysis of Rice Grains

Marouane Baslam, Kentaro Kaneko, and Toshiaki Mitsui
Abstract
Cereal proteins have formed the basis of the human diet worldwide, and their level of consumption is expected
to increase. Knowledge of the protein composition and variation of cereal grains is helpful for
characterizing cereal varieties and for identifying biomarkers of tolerance mechanisms. Grains produce a wide
array of proteins, which differ according to conditions. Quantitative proteomics is a powerful approach that
allows the identification of proteins expressed under defined conditions and may contribute to understanding
the complex biological systems of grains. Isobaric tags for relative and absolute quantitation (iTRAQ) is a
mass spectrometry–based quantitative approach that allows simultaneous protein identification and
quantification from multiple samples with high coverage. One of the challenges in identifying grain
proteins is the relatively high content (~90–95%) of carbohydrate (starch) and the low protein (~4–10%)
and lipid (~1%) fractions. In this chapter, we present a robust workflow for iTRAQ quantification
of starchy rice grains.
Key words Protein biomarkers, Chalkiness, Isobaric tags for relative and absolute quantification
(iTRAQ), Oryza sativa, Seed proteomics
1 Introduction
Although starch is the dominant component of the rice grain,
it does not explain all variation in grain quality between rice
cultivars. Total protein content also influences rice grain
quality, but it too does not account for all known variation.
Variation in rice grain protein composition influences the taste
and texture of cooked rice. Differences in protein abundance are
associated with different genotypic and phenotypic traits [1–4].
Thus, proteomics can directly and globally explore protein levels
and their respective posttranslational modifications [5].
Proteomics could be a powerful tool to better understand the
genetic basis of plant responses to environmental cues by directly
comparing protein abundance under stress conditions between
genotypes differing in their stress responses. Mass spectrometry
(MS)–based quantitative proteomics has recently become
indispensable for gaining insights into biological systems at the
molecular level. Isobaric
tags for relative and absolute quantitation (iTRAQ) is one of the
most popular chemical tagging approaches and allows multiplexing
of up to eight samples in a single run with high coverage. The
iTRAQ method screens for global proteomic changes to identify
differentially regulated proteins and the activated transduction
pathways. The candidate proteins may then be used to elucidate
the molecular mechanism underlying the response of plants to a
particular environmental condition. This field of research aims
to identify molecular features that can be developed as
biomarkers for crop improvement and provides genetic resources
underlying grain chalkiness, one of the principal targets for the
improvement of rice characteristics. iTRAQ technology has been
applied to rice [6–8], wheat [9–13], cotton [14], and other crop
species. Recently, this technique has been employed in studies
of grain development [15–17] and of chalkiness under high
temperature to identify potential sources of tolerance for variety
improvement [6]. Such studies therefore provide an excellent
starting point for further elucidating the molecular and
biochemical basis of grain characteristics and crop improvement.
While several recent studies have examined the quantitative
proteomics of leaves, roots, and stems, it has been challenging
to focus on the “subdiscipline” of grains such as rice, owing to
their complexity, their relatively low protein content, and the
usually high amount of interfering compounds, mainly starch (and
others, e.g., rigid cell walls and phenolic compounds). In order
to overcome the problem of the starchy endosperm, we have
optimized the conditions for rice grain proteome analysis by
iTRAQ LC-MS/MS. The experimental workflow presented here lays
the basis for further in-depth grain studies in the field of
proteomics.
2 Materials
For accurate mass spectrometry analysis, it is recommended to use
chromatography- and mass spectrometry-grade reagents and to
prepare all the solutions in ultrapure water (18 MΩ·cm resistivity
at 25 °C).
2.1 Plant Material
Seeds from rice plants (Oryza sativa L. cv. Koshihikari) grown in
paddy field or controlled (Biotron LPH-1.5PH-NCII, Nihon-ika,
Tokyo, Japan) conditions (see Note 1).
2.2 Protein Extraction and Quantification
1. Rice grain grader (RGQI20A, Satake, Hiroshima, Japan).
2. Viewer (Fujicolor lightbox New-5000 Inverter, Fuji film Co.,
Tokyo, Japan).
3. Grain huller.
4. Coffee mill (MJ-51, Melitta Japan).
5. Rice milling machine (KETT Electric Laboratory, Tokyo,
Japan).
6. Razor blade.
7. Mortar and pestle.
8. High-speed microcentrifuge (Himac CF-RXII, HITACHI).
9. Mixer (Delta mixer Se-08, TAITEC).
10. Extraction solution: 7 M urea, 2 M thiourea, 1% (w/v)
CHAPS, 1% (w/v) Triton X-100, 10 mM dithiothreitol
(DTT).
11. Methanol.
12. Chloroform.
13. Pierce bicinchoninic acid (BCA) protein assay kit (Thermo
Fisher Scientific Pierce, Rockford, IL).
14. Pierce™ 660 nm Protein Assay Reagent (Thermo Fisher
Scientific Pierce, Rockford, IL).
15. Bovine serum albumin (BSA) protein standard.
2.3 Protein Digestion and iTRAQ Labeling
1. Block bath (CB-100A, AS ONE Corporation, Osaka, Japan).
2. Urea, 8.0 M.
3. Endoproteinase Lys-C (Wako, Tokyo, Japan), 1 μg μL⁻¹ (see
Note 2).
4. Trypsin (Wako, Tokyo, Japan), 1 μg μL⁻¹ (see Note 2).
5. iTRAQ Reagent Multiplex kit (4-plex: 114, 115, 116, 117;
AB SCIEX, Foster City, CA).
6. iTRAQ Dissolution buffer provided in iTRAQ kit.
7. Absolute ethanol.
8. Reducing buffer: 50 mM Tris-(2-carboxyethyl) phosphine
(TCEP) (see Note 2).
9. Alkylating solution: 200 mM methyl methanethiosulfonate
(MMTS) (see Note 2).
2.4 Cation Exchange Liquid Chromatography and Peptide Desalting
1. 0.5 mL syringe (Hamilton).
2. Speed vac (CC-105, TOMY, Tokyo, Japan).
3. MonoSpin® C18 columns (GL science).
4. ICAT cation exchange buffer pack (Applied Biosystems),
including elution, loading, cleaning, and storage buffers, which
contain acetonitrile.
5. Formic acid 1% (v/v).
6. Acetonitrile: 5% (v/v) in 0.1% (v/v) formic acid.
7. Activation solution: 80% acetonitrile in 1% (v/v) formic acid.
2.5 Mass Spectrometry
1. Liquid chromatography system: EASY-nLC 1000 and DiNa-A
(KYA Tech Corporation).
2. ESI nano stage (KYA Tech Corporation).
3. LTQ Orbitrap XL mass spectrometer (Thermo Fisher
Scientific).
4. MonoCap C18 High Resolution 2000; 0.1 mm i.d. × 2000 mm
(GL Science).
5. Solvent A: 2% (v/v) acetonitrile in 0.1% (v/v) formic acid.
6. Solvent B: 80% (v/v) acetonitrile in 0.1% (v/v) formic acid.
3 Methods
3.1 Sample Preparation
1. Husk rice seeds with a grain huller (see Note 3).
2. Polish 10 g of grain samples for 30–40 min using a rice milling
machine in order to remove the embryo and aleurone layer.
3.2 Protein Extraction
1. Resuspend 200 mg of powdered starchy grain sample in
0.4 mL of extraction buffer (see Notes 4 and 5) by vortexing
for 15 s at high speed.
2. Centrifuge at 20,000 × g for 10 min at 4 °C. Transfer the
resulting supernatant to an Eppendorf tube.
3. Add 400 μL of methanol to 100 μL of supernatant and mix by
vortexing.
4. Add 100 μL of chloroform and 300 μL of ultrapure water and
vortex for 5 s.
5. Centrifuge for 1 min at 10,000 × g at 4 °C and remove the
upper aqueous phase.
6. Add 400 μL methanol and vortex thoroughly.
7. Centrifuge at 10,000 × g at 4 °C for 15 min. Discard the
supernatant and keep the pellet.
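Steps 3 to 6 use fixed volumes for 100 μL of supernatant (400 μL methanol, 100 μL chloroform, 300 μL water, then a further 400 μL methanol). The sketch below keeps those 4:1:3:4 ratios for other sample volumes. Scaling the precipitation in this way is an assumption, not something the protocol states, and the function names and the 1500 μL tube capacity are illustrative.

```python
# Sketch: scale the methanol/chloroform precipitation volumes (steps 3-6).
# The ratios come from the protocol's 100 uL example; scaling is an assumption.

def precipitation_volumes(sample_ul):
    """Return reagent volumes (uL) for a given supernatant volume (uL)."""
    if sample_ul <= 0:
        raise ValueError("sample volume must be positive")
    return {
        "methanol_first_ul": 4 * sample_ul,
        "chloroform_ul": 1 * sample_ul,
        "water_ul": 3 * sample_ul,
        "methanol_second_ul": 4 * sample_ul,
    }


def fits_in_tube(sample_ul, tube_capacity_ul=1500):
    """Check that the largest liquid volume before phase separation fits."""
    v = precipitation_volumes(sample_ul)
    total = sample_ul + v["methanol_first_ul"] + v["chloroform_ul"] + v["water_ul"]
    return total <= tube_capacity_ul


if __name__ == "__main__":
    print(precipitation_volumes(100))
    print(fits_in_tube(100))
```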
3.3 Protein Digestion
1. Immediately resuspend the protein pellet in 8 M urea. If
necessary, solubilize the protein sample completely by
incubating overnight at 4 °C or, alternatively, by sonication.
2. Determine the protein concentration with the Pierce 660 nm
Protein Assay kit (Thermo Fisher Scientific) using bovine serum
albumin (BSA) as a standard.
3. Add to the protein solution (50 μg total protein) 2 μL of
dissolution buffer from iTRAQ kit and 2 μL of reducing buffer
(TCEP); mix well by vortexing for 15 s, and spin down briefly.
4. Incubate the mixture at 60 °C for 1 h, and spin down the
solution.
5. Alkylate by adding 1 μL of the cysteine blocking reagent from
the iTRAQ kit. Mix well by vortexing for 15 s and spin down.
6. Incubate at 37 °C for 1 h (see Note 6), and spin down.
7. Dilute with an equal volume of iTRAQ dissolution buffer
provided in the kit. For protein digestion, add 5 μL of
endoproteinase Lys-C (1 μg μL⁻¹) and incubate at 37 °C for 3–4 h.
Dilute 10 times with ultrapure water (see Note 7). Vortex for
30 s and spin down the solution.
8. Add 5 μL of the trypsin solution to each sample tube for
further digestion. Vortex to mix for 1 min and spin down
briefly to bring all the solution to the bottom of the tube.
Incubate at 37 °C for 12–16 h (overnight) (see Note 8).
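Two small calculations sit behind this section: the volume of protein solution that contains 50 μg, and the urea concentration that remains after the dilutions in step 7. The sketch below is illustrative only. It assumes that "equal volume" is a 2-fold dilution and "10 times" a 10-fold dilution, it ignores the few microlitres of reducing and alkylating reagents, and the 2.5 μg/μL concentration in the usage lines is a hypothetical value.

```python
# Sketch: sample volume for a target protein mass, and urea concentration
# after serial dilution. All names and example values are illustrative.

def volume_for_mass(target_ug, conc_ug_per_ul):
    """Volume (uL) of protein solution containing target_ug of protein."""
    if conc_ug_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return target_ug / conc_ug_per_ul


def concentration_after_dilutions(start_molar, fold_dilutions):
    """Apply successive fold dilutions to a starting concentration (M)."""
    conc = start_molar
    for fold in fold_dilutions:
        if fold < 1:
            raise ValueError("fold dilution must be >= 1")
        conc /= fold
    return conc


if __name__ == "__main__":
    # Hypothetical measured concentration of 2.5 ug/uL.
    print(volume_for_mass(50, 2.5), "uL for 50 ug")
    # 8 M urea, diluted 2-fold (step 7) and then 10-fold before trypsin.
    print(concentration_after_dilutions(8.0, [2, 10]), "M urea")
```

Under these assumptions the urea is at about 4 M during the Lys-C digestion and about 0.4 M when trypsin is added, which is the purpose of the dilution described in Note 7.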
3.4 iTRAQ Peptide Labeling
1. Bring the iTRAQ reagents for peptide labeling, provided as a
set of four (iTRAQ® Reagent 114, 115, 116, and 117), from the
freezer to room temperature. Spin down to bring the solution
to the bottom of the vial.
2. Add 500 μL of absolute ethanol provided in iTRAQ kit to each
vial of the iTRAQ Reagent. Vortex each vial for 30 s and then
spin down the solution.
3. Transfer the entire contents of each freshly prepared iTRAQ
reagent to their respective tryptic peptide sample tube. Vortex
each tube for 30 s to mix, then spin.
4. Incubate the iTRAQ labeling reaction tubes for 1 h at room
temperature (see Notes 9–12).
5. Add 400 μL of ultrapure water, vortex for 30 s, and spin down.
6. Combine the content of each iTRAQ reagent-labeled sample
tube into one tube.
7. Vortex to mix, then spin.
3.5 Cation Exchange Chromatography
1. Set the cation exchange column in a 0.5 mL syringe and clean it
with 1 mL of cleaning buffer to condition the cartridge. Keep
the injection flow in this and following steps at 1 drop per
second. Divert to waste.
2. Inject 2 mL of the Cation Exchange Buffer-Load. Divert to
waste.
3. Slowly inject (1 drop/s) the mixed iTRAQ-labeled
peptide samples onto the cation-exchange cartridge and collect
the flow-through in a sample tube (see Notes 13 and 14).
4. Wash with 1 mL of loading buffer (see Note 15).
5. To elute the peptides, slowly inject (1 drop/s) 500 μL of
elution buffer. Collect the eluted peptides as a single fraction
in an Eppendorf tube.
6. Add 1 mL of cleaning buffer to wash the undigested proteins
(i.e., trypsin) from the cation-exchange cartridge and collect
the flow-through in two fractions of 0.5 mL each.
7. Dry the concentrated iTRAQ-labeled peptide samples (50 μL)
in a speed vac for further fractionation.
8. Wash the column with 2 mL of storage buffer. Seal the column
with Parafilm to prevent it from drying out. Store the cartridge
at 2–8 °C.
3.6 C18 Spin Columns and Peptide Desalting
1. Add 500 μL of 1% (v/v) formic acid to acidify the peptide
solution.
2. Add 100 μL of 80% acetonitrile in 1% formic acid.
3. Centrifuge the C18 cartridge at 5000 × g for 2 min.
4. Equilibrate the C18 column by adding 1% formic acid to the
cartridge. Centrifuge at 10,000 × g for 2 min.
5. Transfer the iTRAQ-labeled peptides completely onto the
equilibrated C18 cartridge. Centrifuge at 10,000 × g for
2 min and collect the flow-through. Load the flow-through
fraction onto the C18 column again and centrifuge at
10,000 × g for 1 min (see Note 16).
6. Wash the column by adding 1.5 mL of 1% formic acid (see
Note 17).
7. Elute the peptides by adding 600 μL of 80% acetonitrile in 1%
formic acid. Collect the eluted peptides in a new Eppendorf
tube by centrifugation at 10,000 × g for 2 min.
8. Dry the desalted peptides in a speed vac for further analysis
by MS/MS.
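The centrifugation steps above are given as relative centrifugal force (× g), whereas many benchtop centrifuges are set in rpm. The sketch below uses the standard relation RCF = 1.118 × 10⁻⁵ × r × rpm², with r the rotor radius in cm. The 8 cm radius in the usage lines is a hypothetical value; the real radius should be taken from the rotor documentation.

```python
# Sketch: convert between relative centrifugal force (x g) and rpm.
# RCF = 1.118e-5 * radius_cm * rpm**2 (standard relation).
import math

RCF_CONSTANT = 1.118e-5


def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g) at a given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf, radius_cm):
    """Rotor speed (rpm) needed to reach a given RCF at a given radius."""
    if rcf < 0 or radius_cm <= 0:
        raise ValueError("rcf must be >= 0 and radius must be > 0")
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


if __name__ == "__main__":
    radius_cm = 8.0  # hypothetical rotor radius
    for rcf in (5000, 10000, 20000):
        print(f"{rcf} x g -> {rpm_from_rcf(rcf, radius_cm):.0f} rpm")
```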
3.7 Mass Spectrometry
1. Reconstitute the iTRAQ-labeled peptides in 20 μL of 2%
acetonitrile in 0.1% formic acid.
2. Load the iTRAQ-labeled peptides (20 μL) onto a trap column
(HiQ sil C-18W-3; 0.5 mm i.d. × 1 mm, 3 μm particle size)
with buffer A using a DiNa-A system (KYA Tech., Tokyo,
Japan).
3. For the MS calibration parameters, apply a linear gradient from
0 to 33% buffer B for 600 min, followed by another linear
gradient 33–100% buffer B for 10 min, and back to 0% buffer
B in 15 min.
4. Load the peptides eluted from the HiQ sil C-18W-3 column
directly onto a separation column (MonoCap C18 High
Resolution 2000; 0.1 mm i.d. × 2000 mm). The separated
peptides are then introduced into an LTQ Orbitrap XL mass
spectrometer (Thermo Fisher Scientific) at a flow rate of
300 nL/min and an ionization voltage of 1.7–2.5 kV. The LTQ
Orbitrap XL mass spectrometer includes an octupole that acts
as a collision cell and enables an alternative peptide
fragmentation mode termed higher energy collision-induced
dissociation (HCD).
5. Operate the liquid chromatography-MS/MS (LC-MS/MS)
system using Xcalibur 2.0 software (Thermo Fisher
Scientific). The mass range selected for the MS scan is set to
350–1600 m/z and the top three peaks are subjected to
MS/MS analysis. The full MS scan is detected in the Orbitrap,
and the MS/MS scans are detected in the linear ion trap and
Orbitrap. The normalized collision energy for MS/MS is set to
35 eV for collision-induced dissociation (CID) and 45 eV for
higher-energy C-trap dissociation (HCD). The resolution of the
Fourier transform mass spectrometer (FTMS) is maintained at
60,000.
Doubly or triply charged ions are subjected to MS/MS analysis
in dynamic exclusion mode. Because the peaks obtained from
LC-MS are detected as doubly or triply charged ions, a mass
difference of 1 Da appears as an m/z difference of only about
0.5 or 0.3. Such a small difference cannot be detected
accurately by the LC-MS system. Therefore, the peptides
containing Asn/Gln are analyzed by MS/MS to distinguish
between deamidated peptides and isomerized peptides.
Proteins are identified with Proteome Discoverer v. 1.4
software (Thermo Fisher Scientific) using the SEQUEST HT and
MS Amanda [18] search tools against the UniProt (http://www.
uniprot.org/) O. sativa subsp. japonica database (63,535
proteins) with the following parameters: enzyme, trypsin;
maximum missed cleavage sites, 2; peptide charge, 2+ or 3+; MS
tolerance, 5 ppm; MS/MS tolerance, 0.5 Da; dynamic
modifications, carboxymethylation (C), oxidation (H, M, W),
iTRAQ 4-plex (K, Y, N-terminus). It has been suggested that
a higher proportion of the proteome can be quantified by using
multiple search engines [19]. The false discovery rate must be
<1%.
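The m/z arithmetic in step 5 can be checked numerically. The sketch below shows how a 1 Da mass difference shrinks on the m/z axis as the charge state increases, and how wide a 5 ppm precursor tolerance is in m/z units. The function names and the example m/z of 1000 are illustrative assumptions and do not correspond to any setting in Proteome Discoverer.

```python
# Sketch: m/z spacing versus charge state, and ppm tolerance windows.
# Names and example values are illustrative, not software settings.

def mz_shift(mass_shift_da, charge):
    """Observed m/z difference for a given mass difference and charge."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return mass_shift_da / charge


def ppm_window(mz, tolerance_ppm):
    """Lower and upper m/z bounds for a symmetric ppm tolerance."""
    half_width = mz * tolerance_ppm / 1e6
    return (mz - half_width, mz + half_width)


if __name__ == "__main__":
    for z in (1, 2, 3):
        print(f"charge {z}+: 1 Da appears as {mz_shift(1.0, z):.2f} m/z")
    low, high = ppm_window(1000.0, 5.0)
    print(f"5 ppm at m/z 1000: {low:.4f} to {high:.4f}")
```

At m/z 1000 a 5 ppm tolerance corresponds to ±0.005 m/z, which is much narrower than the 0.5 Da tolerance applied to the MS/MS fragments.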
4 Notes
1. Seed samples can be stored in a cool, dry room at 4–10 °C
until they are used.
2. Prepare fresh immediately before use.
3. This step can be omitted if seed samples are limited.
4. If the rice flour forms a cake-like mass, add 0.8 mL of water
(two volumes of extraction buffer) (Fig. 1).
5. This step should be carried out on ice.
Fig. 1 Photograph of starchy grain aspects during the steps of protein extraction process
6. Do not heat above 37 °C, since the urea degradation products
might modify amino acid residues of proteins.
7. For effective trypsin digestion, dilute each protein preparation
so that the final concentrations of the detergents and other
reagents do not inhibit trypsin activity.
8. The total volume of the digestion mixture must be less than
300 μL. If it is higher, lyophilize and reconstitute with 300 μL
of dissolution buffer.
9. Allow iTRAQ vials to warm up first.
10. The iTRAQ vials may be provided in different volumes by the
vendor, so the final volumes will not necessarily be the same
for all the tags.
11. If the pH of the peptide/iTRAQ mixture is less than 7.5, the
labeling efficiency is significantly reduced. For optimal
labeling efficiency, the pH must be between 7.5 and 8.5.
12. Labeling with the 4-plex iTRAQ reagent takes 1 h, whereas the
8-plex reagent requires a reaction time of 2 h.
13. Samples should be loaded with a relatively slow flow rate to
maximize the binding of the peptides to the column.
14. Test the pH of the sample by placing 0.5 μL of the solution
onto pH paper. If the pH is not between 2.3 and 3.3, adjust
by adding more Cation Exchange Buffer-Load.
15. The eluted solution should be collected in a new tube so that
it can be recovered if a problem arises.
16. The flow-through loaded for the second time should be
collected in a fresh tube to check the recovery rate of the
peptides.
17. The washed fractions should be collected in a new Eppendorf
tube. In the case of the sample eluted with Clean Buffer
(1.0 M), 2 mL of 0.1% formic acid should be added.
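Several of the notes above give numeric limits that are easy to overlook at the bench: the labeling pH (Note 11), the cation-exchange loading pH (Note 14), and the digestion volume (Note 8). The sketch below collects them into a simple pre-run check. It is an illustrative helper under those stated limits, with hypothetical function and parameter names, and not part of the published protocol.

```python
# Sketch: checks based on the limits stated in Notes 8, 11, and 14.
# Function and parameter names are hypothetical.
LABELING_PH_RANGE = (7.5, 8.5)     # Note 11
SCX_LOADING_PH_RANGE = (2.3, 3.3)  # Note 14
MAX_DIGEST_VOLUME_UL = 300.0       # Note 8: must be less than 300 uL


def in_range(value, bounds):
    """True if value lies within the inclusive (low, high) bounds."""
    low, high = bounds
    return low <= value <= high


def check_sample(labeling_ph, loading_ph, digest_volume_ul):
    """Return a list of warnings; an empty list means all checks passed."""
    warnings = []
    if not in_range(labeling_ph, LABELING_PH_RANGE):
        warnings.append("labeling pH outside 7.5-8.5")
    if not in_range(loading_ph, SCX_LOADING_PH_RANGE):
        warnings.append("loading pH outside 2.3-3.3; add more loading buffer")
    if digest_volume_ul >= MAX_DIGEST_VOLUME_UL:
        warnings.append("digest volume not below 300 uL; lyophilize and reconstitute")
    return warnings


if __name__ == "__main__":
    print(check_sample(labeling_ph=8.0, loading_ph=3.0, digest_volume_ul=250.0))
    print(check_sample(labeling_ph=7.0, loading_ph=4.0, digest_volume_ul=350.0))
```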
Acknowledgments
This research was supported by KAKENHI Grants-in-Aid for
Scientific Research (A) (15H02486) from Japan Society for the
Promotion of Science, Strategic International Collaborative Research
Program by the Japan Science and Technology Agency (JST
SICORP), and Grant for Promotion of KAAB Projects (Niigata
University) from the Ministry of Education, Culture, Sports,
Science, and Technology, Japan.
References
1. Tsutsui K, Kaneko K, Hanashiro I et al (2013) Characteristics of opaque and translucent parts of high temperature stressed grains of rice. J Appl Glycosci 60:61–67
2. Wakasa Y, Yasuda H, Oono Y et al (2011) Expression of ER quality control-related genes in response to changes in BiP1 levels in developing rice endosperm. Plant J 65:675–689
3. Lin CJ, Li CY, Lin SK et al (2010) Influence of high temperature during grain filling on the accumulation of storage proteins and grain quality in rice (Oryza sativa L.). J Agric Food Chem 58:10545–11055
4. Lin SK, Chang MC, Tsai YG et al (2005) Proteomic analysis of the expression of proteins related to rice quality during caryopsis development and the effect of high temperature on expression. Proteomics 5:2140–2156
5. Ralhan R, DeSouza LV, Matta A et al (2008) Discovery and verification of head-and-neck cancer biomarkers by differential protein expression analysis using iTRAQ labeling, multidimensional liquid chromatography, and tandem mass spectrometry. Mol Cell Proteomics 7:1162–1173
6. Kaneko K, Sasaki M, Kuribayashi N et al (2016) Proteomic and glycomic characterization of rice chalky grains produced under moderate and high-temperature conditions in field system. Rice 9:26
7. Wang SZ, Chen WY, Xiao WF et al (2015) Differential proteomic analysis using iTRAQ reveals alterations in hull development in rice (Oryza sativa L.). PLoS One 10:10 e0133696
8. Wang ZQ, Xu XY, Gong QQ et al (2014) Root proteome of rice studied by iTRAQ provides integrated insight into aluminum stress tolerance mechanisms in plants. J Proteome 98:189–205
9. Fu Y, Zhang H, Mandal SN et al (2016) Quantitative proteomics reveals the central changes of wheat in response to powdery mildew. J Proteome 130:108–119
10. Kang GZ, Li GZ, Wang LN et al (2014) Hg-responsive proteins identified in wheat seedlings using iTRAQ analysis and the role of ABA in Hg stress. J Proteome Res 14:249–267
11. Alvarez S, Choudhury SR, Pandey S (2014) Comparative quantitative proteomics analysis of the ABA response of roots of drought-sensitive and drought-tolerant wheat varieties identifies proteomic signatures of drought adaptability. J Proteome Res 13:1688–1701
12. Ge P, Hao PC, Cao M et al (2013) iTRAQ-based quantitative proteomic analysis reveals new metabolic pathways of wheat seedling growth under hydrogen peroxide stress. Proteomics 13:3046–3058
13. Ford KL, Cassin A, Bacic A (2011) Quantitative proteomic analysis of wheat cultivars with differing drought stress tolerance. Front Plant Sci 2:44
14. Liu J, Pang CY, Wei HL et al (2015) iTRAQ-facilitated proteomic profiling of anthers from a photosensitive male sterile mutant and wild type cotton (Gossypium hirsutum L.). J Proteome 126:68–81
15. Cui Y, Yang MM, Dong J et al (2017) iTRAQ-based quantitative proteome characterization of wheat grains during filling stages. J Integr Agric 16:20156–22167
16. Yang MM, Yang J, Dong WC et al (2016) Characterization of proteins involved in early stage of wheat grain development by iTRAQ. J Proteome 136:157–166
17. Ma CY, Zhou JW, Chen GX et al (2014) iTRAQ-based quantitative proteome and phosphoprotein characterization reveals the central metabolism changes involved in wheat grain development. BMC Genomics 15:1029
18. Dorfer V, Pichler P, Stranzl T et al (2014) A universal identification algorithm optimized for high accuracy tandem mass spectra. J Proteome Res 13:3679–3684
19. Elias JE, Haas W, Faherty BK et al (2005) Comparative evaluation of mass spectrometry platforms used in large-scale proteomics investigations. Nat Methods 2:667–675
https://doi.org/10.1007/978-1-0716-0528-8, © Springer Science+Business Media, LLC, part of Springer Nature 2020