You are on page 1of 2417

Werner Dubitzky

Olaf Wolkenhauer
Kwang-Hyun Cho
Hiroki Yokota
Editors

Encyclopedia of
Systems Biology

1 3Reference
Encyclopedia of Systems Biology
Werner Dubitzky • Olaf Wolkenhauer
Kwang-Hyun Cho • Hiroki Yokota
Editors

Encyclopedia of
Systems Biology

With 830 Figures and 148 Tables


Editors
Werner Dubitzky Kwang-Hyun Cho
Biomedical Sciences Research Institute Department of Bio and Brain Engineering
University of Ulster Korea Advanced Institute of Science and
Coleraine, UK Technology (KAIST)
Daejeon, Republic of Korea
Olaf Wolkenhauer
Department of Computer Science Hiroki Yokota
University of Rostock Department of Biomedical Engineering
Rostock, Germany Rensselaer Polytechnic Institute
Troy, NY, USA

ISBN 978-1-4419-9862-0 ISBN 978-1-4419-9863-7 (eBook)


ISBN 978-1-4419-9864-4 (Print and electronic bundle)
DOI 10.1007/ 978-1-4419-9863-7
Springer New York Heidelberg Dordrecht London

Library of Congress Control Number: 2013931182

# Springer ScienceþBusiness Media LLC 2013


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the
material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology
now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection
with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and
executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this
publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher’s
location, in its current version, and permission for use must always be obtained from Springer. Permissions
for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to
prosecution under the respective Copyright Law.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
While the advice and information in this book are believed to be true and accurate at the date of publication,
neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or
omissions that may be made. The publisher makes no warranty, express or implied, with respect to the
material contained herein.

Printed on acid-free paper

Springer is part of Springer ScienceþBusiness Media (www.springer.com)


Preface

Systems biology refers to the quantitative analysis of the dynamic interactions among
multiple components of a biological system and aims to understand the behavior of
the system as a whole. Systems biology involves the development and application of
system-theoretic concepts for the study of complex biological systems through
iteration over mathematical modeling, computational simulation and biological
experimentation. Systems biology could be viewed as a new concept and tool to
increase our understanding of biological systems, to develop more directed experi-
ments, and to allow accurate predictions.
Systems biology is among the most rapidly growing fields of scientific research
and technology development. Over the past ten to fifteen years, the research activities
in this field have been increasing exponentially. This is evidenced by the number of
published papers as well as the growing number of conferences, journals, research
institutions and centers, and academic events on the topic of systems biology. While
research, development and applications in this field are becoming more widespread,
one major challenge is the lack of a single reference resource providing up-to-date
overviews and definitions of the major concepts and subjects across the various sub-
areas of systems biology. The Encyclopedia of Systems Biology is addressing this
lack and provides a unified information resource. It has been conceived as
a comprehensive reference work covering all aspects of systems biology, including
the investigation of living matter through a tight coupling of biological experimen-
tation, mathematical modeling and computational analysis and simulation. The main
aim of the Encyclopedia is to provide a complete reference of established knowledge
in systems biology – a ‘one-stop shop’ for someone seeking information on key
concepts in this field. As a result, the Encyclopedia comprises a broad range of topics
relevant in the context of systems biology.
The audience targeted by the Encyclopedia includes researchers, developers,
teachers, students and practitioners who are studying or working in the field of
systems biology. Keeping in mind the varying needs of the potential readership, we
have structured and presented the content in a way that is accessible to readers from
a wide range of backgrounds. In contrast to encyclopedic online resources, which
often rely on the general public to author their content, a key consideration in the
development of the Encyclopedia of Systems Biology was to have subject matter
experts define the concepts and subjects of systems biology. We believe that there is
no other treatment of this subject in terms of depth and authority of the field.
Ultimately, systems biology represents a trans-disciplinary scientific paradigm, in
which researchers from diverse backgrounds work jointly using a shared conceptual
framework and combined disciplinary-specific approaches to address complex life
science R&D problems. Thus, the Encyclopedia of Systems Biology derives its

v
vi Preface

raison d’eˆtre from a modern trans-disciplinary perspective on how to think about and
explore complex biological phenomena. We believe that this new publication will
help to define the meaning and scope of systems biology, and to extend the impact that
systems biology will have on future developments in the life sciences.

Acknowledgments

It has taken several years of hard work by many people to bring this Encyclopedia to
life. We would like to give our sincere thanks to the members of the Editorial Board,
the contributing authors and the reviewers. Without their expertise, dedication and
continuous efforts this comprehensive reference work would not exist.
We wish to thank Joseph Burns (Senior Editor), Sandra Fabiani (Executive Editor)
and Melanie Tucker (Editor) who proposed the project to us and provided invaluable
counsel in shaping and developing this reference work. We wish to express our
profound gratitude to Sylvia Blago and Simone Giesler from Springer for their
outstanding editorial efforts in producing this exciting Encyclopedia.

March 2013 Werner Dubitzky, Olaf Wolkenhauer, Kwang-Hyun Cho,


Hiroki Yokota
Editors-in-Chief
Coleraine / Rostock / Daejeon / Indianapolis
Editors-in-Chief

Werner Dubitzky Biomedical Sciences Research Institute, University of Ulster,


Coleraine, UK
Olaf Wolkenhauer Department of Computer Science, University of Rostock,
Rostock, Germany
Kwang-Hyun Cho Department of Bio and Brain Engineering, Korea Advanced
Institute of Science and Technology (KAIST), Daejeon, Republic of Korea
Hiroki Yokota Department of Biomedical Engineering, Rensselaer Polytechnic
Institute, Troy, NY, USA

vii
About the Editors

Professor Werner Dubitzky currently occupies the Chair of Bioinformatics at the at


the University of Ulster, Coleraine, UK. In 1991, he received an MSc in electrical
engineering from the University of Applied Sciences, Augsburg, Germany, and in
1997 a Ph.D. in computer science from the University of Ulster at Jordanstown, UK.
His main research areas include bioinformatics and computational systems biology,
artificial intelligence, data mining and management, multi-scale modelling and
simulation, and large-scale computing technology.
I was always fascinated by the diversity and complexity in the life sciences. Looking
at the intricate inner workings of only a single cell, one wonders if humans will
ever fully understand its complicated ways. It is clear to me that a comprehensive
understanding of complex life phenomena requires a discipline-crossing approach.
To make such an approach efficient, we require a shared trans-disciplinary conceptual
framework. I am excited to be involved in the development of the Encyclopedia of
Systems Biology, as I see it as an important contribution in the construction of such
a framework.

ix
x About the Editors

Professor Olaf Wolkenhauer received first degrees in control engineering from the
University of Applied Sciences in Hamburg, Germany and the University of
Portsmouth, U.K. in 1994 and his Ph.D. from UMIST in Manchester in 1997). Before
moving to the University of Rostock, where he occupies the Chair in Systems Biology &
Bioinformatics, he a research lectureship at the Control Systems Centre in Manchester,
a joint senior lectureship between the Department of Biomolecular Sciences and the
Department of Electrical Engineering and Electronics, at UMIST. Since October 2004
he is an Adjunct Professor in the Department of Electrical Engineering and Computer
Science at Case Western Reserve University, Cleveland, USA an became 2005 a fellow
of the Stellenbosch Institute for Advanced Study (STIAS). Olaf Wolkenhauer’s
research interest is mathematical modelling and data analysis, focussing on nonlinear
dynamical systems in molecular and cell biology. Further, detailed information can be
obtained from his webpage at www.sbi.uni-rostock.de.
Systems biology should not be considered a discipline but an approach to under-
stand complex biological systems. Nonlinear spatio-temporal phenomena, crossing
multiple levels of structural and functional organization are the reason why mathe-
matical modelling and simulation can play an important role. The fact that so many
different topics come together in systems biology is a challenge but also an additional
motivation to advance our understanding of living systems.
About the Editors xi

Professor Kwang-Hyun Cho received B.S., M.S., and Ph.D. degrees in electrical
engineering from the Korea Advanced Institute of Science and Technology (KAIST)
in 1993 (summa cum laude), 1995, and 1998, respectively. He was an Associate
Professor at the College of Medicine, Seoul National University, Korea until 2007
and is currently the Chair Professor in the Department of Bio and Brain Engineering
at the KAIST and a director of the Laboratory for Systems Biology and Bio-Inspired
Engineering (http://sbie.kaist.ac.kr). He was the recipient of the IEEE/IEEK Joint
Award for Young IT Engineer, the Young Scientist Award from the President of
Korea, and the Walton Fellow Award from Science Foundation of Ireland. He has
been working on systems biology with biomedical applications, network evolution,
and bio-inspired engineering. His innovative contribution to systems biology and
bio-inspired engineering research by combining an engineering approach with
biochemical experimentation has led to over 126 high-profile international
journal publications. He is currently the Editor-in-Chief of IET Systems Biology
(IET, London, U.K.).
As a systems scientist, I was intrigued by the complex and puzzling dynamics of
living systems. In particular, their self-organizing behaviors and emergent properties
led me to think about the fundamental difference of living systems compared to
non-living systems from a systems science point of view. Systems biology is the way
of approaching answers to such questions and I hope the Encyclopedia of Systems
Biology to be a useful travel guide for that expedition.
xii About the Editors

Dr Hiroki Yokota is a Professor of Biomedical Engineering at the Purdue School of


Engineering and Technology, at Indiana University Purdue University Indianapolis
(IUPUI), U.S.A. He is also a Director of Biomechanics and Biomaterials Research
Centre at IUPUI. He received a PhD in astronautics from the University of Tokyo
(Japan) in 1983, and a PhD in molecular, cellular and developmental biology from
Indiana University (U.S.A.) in 1993. He is a fellow of American Institute for Medical
and Biological Engineering since 2009. Currently, his main research interests are
skeletal biology, molecular and cellular mechanics, and applications of control
theories in metabolic processes.
Unlike a construction crane’s arm, our long bones are constantly remodeled by
a series of destruction events followed by rebuilding. This simplified comparison
through my research of skeletal systems highlights two independent systems design
principles in the crane and bone. My sincere hope is that the Encyclopaedia of
Systems Biology will assist our understanding of adaptive biological systems and
stimulate useful applications of their design principles.
Section Editors

Applications in Drug Discovery


Nagasuma Chandra Bioinformatics Centre and SERC, Indian Institute of Science,
Malleshwaram, Bangalore, India

Artificial Intelligence and Machine Learning


Daniel Berrar Tokyo Institute of Technology, Interdisciplinary Graduate School of
Science and Engineering, Nagatsuta, Midori-ku, Yokohama, Japan
Werner Dubitzky University of Ulster, Coleraine, UK

Aritificial Life
Jan T. Kim University of East Anglia, School of Computing Sciences, Norwich, UK

Cancer
Carlos Sonnenschein Department of Anatomy and Cellular Biology, Tufts Univer-
sity School of Medicine, Boston, MA, USA
Ana Soto Department of Anatomy and Cellular Biology, Tufts University School of
Medicine, Boston, MA, USA

Cell Cycle
Attila Csikász-Nagy CoSBi, The Microsoft Research, Centre for Computational
and Systems Biology, University of Trento, Povo (Trento), Italy

Computational MicroRNA Biology


Ulf Schmitz Department of Systems Biology and Bioinformatics, University of
Rostock, Rostock, Germany
Julio Vera Group of Systems Biology for Cancer and Aging, Department of
Systems Biology and Bioinformatics, University of Rostock, Rostock, Germany

Gene Regulatory Networks: Modeling, Reconstruction and Analysis


Luonan Chen Department of Electrical Engineering and Electronics, Osaka Sangyo
University, Daito, Osaka, Japan

xiii
xiv Section Editors

Large-Scale and High-Performance Computing


Mario Cannataro School of Informatics and Biomedical Engineering, University
“Magna Græcia” of Catanzaro, Viale Europa (Località Germaneto), Catanzaro, Italy

Mathematical Theory and Methods


Daniel A. Beard Department of Physiology, Biotechnology and Bioengineering
Center, Medical College of Wisconsin, Milwaukee, WI, USA
Helen Byrne Centre for Mathematical Medicine and Biology, School of Mathemat-
ical Sciences, University of Nottingham, Nottingham, UK

Metabolic Networks
Osbaldo Resendis-Antonio Systems Biology Group, Instituto Nacional de
Medicina Genomica, (INMEGEN), Tlalpan, Mexico City, DF, Mexico

Model Databases
Jacky Snoep Department of Biochemistry, University of Stellenbosch, Matieland,
South Africa

Philosophical and Foundational Issues


Ulrich Krohs Department of Philosophy, University of Hamburg, Hamburg,
Germany

Quantitative Experiment Design


Ann E. Rundell Biomedical Engineering, Purdue University, West Lafayette,
IN, USA

Resources
Eduardo Mendoza Department of Computer Science, University of the Philippines,
Diliman, Quezon City, Philippines

Single Cell Experiments


Feng-Chun Yang Herman B Wells Center for Pediatric Research, Indiana Univer-
sity School of Medicine, Indianapolis, IN, USA

Standards, Exchange Formats, Guidelines and Ontologies


Chris Taylor EMBL, The European Bioinformatics Institute Wellcome Trust
Genome Campus, Hinxton, Cambridgeshire, UK

Statistical Theory and Methods


Eugene Kolker Biomedical and Health Informatics Division, Medical Education
and Biomedical Informatics Department, School of Medicine, Seattle, WA, USA
Section Editors xv

Structural Network Analysis


Nitin Bhardwaj Molecular Biophysics and Biochemistry, Yale University, New
Haven, CT, USA

Systems Approaches to Translational Biomedical Research: Focus


on Diagnostic and Prognostic Applications
Francisco Azuaje Laboratory of Cardiovascular Research, CRP-Sante,
Luxembourg

Systems Immunology
Sudipto Saha Centenary Campus, Bose Institute, Kolkata, West Bengal, India

Text Mining
J€
org Hakenberg Department of Computer Science and Department of Biomedical
Informatics, Arizona State University, Tempe, AZ, USA

Toponomics
Walter Schubert Molecular Pattern Recognition Research Group, Medical Faculty,
Otto-von-Guericke-University Magdeburg, Magdeburg, Germany

Transcriptional Regulation
Masami Horikoshi Institute of Molecular and Cellular Biosciences, University of
Tokyo, Bunkyo-ku, Tokyo, Japan

Translational Control and Systems Biology


Katsura Asano Division of Biology, Kansas State University, Manhattan, KS, USA
Contributors

Adele A. Abrahamsen Center for Research in Language, University of California,


San Diego, La Jolla, CA, USA

Pragyan Acharya Department of Biochemistry, Indian Institute of Science, Ban-


galore, Karnataka, India

Naruhiko Adachi Structure-guided Drug Development Project, JBIC Research


Institute, Japan Biological Informatics Consortium, Tokyo, Japan

Christoph Adami Department of Microbiology and Molecular Genetics, Michigan


State University, East Lansing, MI, USA

Giuseppe Agapito Department of Experimental Medicine and Clinic, University


Magna Graecia of Catanzaro, Catanzaro, Italy

Shipra Agrawal BioCOS Life Sciences Pvt. Limited, Institute of Bioinformatics


and Applied Biotechnology, Bangalore, Karnataka, India

Baltazar D. Aguda Mathematical Biosciences Institute, Ohio State University,


Columbus, OH, USA

Jesús Aguilar-Ruiz School of Engineering, Pablo de Olavide University, Escuela


Politécnica Superior, Seville, Spain

Mario Albrecht Computational Biology and Applied Algorithmics, Max Planck


Institute for Informatics, Saarbr€ucken, Germany

Annette A. Alcasabas BioSyntha Technology, Welwyn Garden City, UK

Roberta Alfieri Institute for Biomedical Technologies – CNR (Consiglio Nazionale


delle Ricerche), Segrate, Milan, Italy

Valencia Alfonso Structural Biology and BioComputing Programme, Spanish


National Cancer Research Centre (CNIO), Madrid, Spain

Syed Toufeeq Ali-Ahmed Department of BioMedical Informatics, Vanderbilt


University, Nashville, TN, USA

Frank Allg€ ower Institute for Systems Theory and Automatic Control, University of
Stuttgart, Stuttgart, Germany

Ronnie Alves Instituto de Informática, Universidade Federal do Rio Grande do Sul,


Porto Alegre, Brazil

xvii
xviii Contributors

Flavio Amara Dipartimento di Scienze Biomolecolari e Biotecnologie, Università


di Milano, Milan, Italy
Francesco Amato Department of Clinical and Experimental Medicine, Magna
Graecia University of Catanzaro, Catanzaro, Italy
Sophia Ananiadou National Centre for Text Mining, School of Computer Science,
University of Manchester, Manchester, UK
Hanne Andersen Department of Physics and Astronomy, Centre for Science
Studies, Aarhus University, Aarhus, Denmark
James Anderson Department of Engineering Science, University of Oxford,
Oxford, UK
Hifzur Rahman Ansari Bioinformatics Centre, Institute of Microbial Technology,
Chandigarh, Chandigarh, India
Erick Antezana Department of Biology, Systems Biology Group, Norwegian
University of Science and Technology, Trondheim, Norway
Dagoberto Armenta-Medina Departamento de Ingenierı́a Celular y Biocatálisis,
Instituto de Biotecnologı́a, Universidad Nacional Autónoma de México, Cuernavaca,
Morelos, México
Katsura Asano Molecular, Cellular, and Developmental Biology Program,
Division of Biology, Kansas State University, Manhattan, KS, USA
Diana Ascencio Laboratorio Nacional de Genómica para la Biodiversidad,
CINVESTAV-IPN, Irapuato, Guanajuato, Mexico
Teresa K. Attwood Faculty of Life Sciences and School of Computer Science,
University of Manchester, Manchester, UK
Claudio Avignone-Rossa Department of Microbial and Cellular Sciences,
Faculty of Health and Medical Sciences, University of Surrey, Guildford, Surrey, UK
M. Madan Babu MRC Laboratory of Molecular Biology, University of Cambridge,
Cambridge, UK
Walter de Back Centre for High Performance Computing (ZIH), Technical
University Dresden, Dresden, Germany
David A. Bader School of Computational Science and Engineering, College of
Computing, Georgia Institute of Technology, Atlanta, GA, USA
urg B€
J€ ahler Department of Genetics, Evolution and Environment and UCL Cancer
Institute, University College London, London, UK
José Ignacio Agulleiro Baldó Supercomputing-Algorithms Group, University of
Almerı́a, Almerı́a, Spain
Eva Balsa-Canto Ministerio de Economı́a y Competitividad, Bioprocess Engineer-
ing Group, IIM-CSIC, Vigo, Spain
Julio R. Banga Ministerio de Economı́a y Competitividad, Bioprocess Engineering
Group, IIM-CSIC, Vigo, Spain
Contributors xix

Mary Helen Barcellos-Hoff Department of Radiation Oncology and Cell Biology,


New York University School of Medicine, New York, NY, USA
André Bardow Institute of Technical Thermodynamics, RWTH Aachen
University, Aachen, Germany
Neil Andrew D. Bascos Protein Structure and Immunology Laboratory, National
Institute of Molecular Biology and Biotechnology, University of the Philippines,
Quezon City, Philippines
Riza Theresa Batista-Navarro National Centre for Text Mining, Manchester
Interdisciplinary Biocentre, Manchester, UK
William A. Baumgartner Jr University of Colorado Denver, Aurora, CO, USA
Sean Bechhofer School of Computer Science, University of Manchester,
Manchester, UK
William Bechtel Department of Philosophy and Center for Chronobiology, Univer-
sity of California, San Diego, La Jolla, San Diego, CA, USA
Elena Beisswanger Jena University Language and Information Engineering
(JULIE) Laboratory, Friedrich-Schiller-University Jena, Jena, Germany
Xavier Bellés Institut de Biologia Evolutiva, CSIC-UPF (Universitat Pompoi
Fabra), Barcelona, Spain
Bertrand Bellier Immunology-Immunopathology-Immunotherapy, UPMC Univer-
sity Paris 06, UMR7211, CNRS, UMR7211, INSERM, U959, Paris, France
Daniel Berrar Interdisciplinary Graduate School of Science and Engineering,
Tokyo Institute of Technology, Midori-ku, Yokohama, Japan
Dany Beste Division of Microbial Sciences, Faculty of Health and Medical
Sciences, University of Surrey, Guildford, Surrey, UK
Fabrizio Bezzo CAPE-Laboratory, Dipartimento di Ingegneria Industriale, Univer-
sity of Padova, Padova, Italy
Upinder S. Bhalla Neurobiology, National Center for Biological Sciences, Tata
Institute of Fundamental Research, Bangalore, India
M. M. Srinivas Bharath Department of Neurochemistry, National Institute of
Mental Health and Neurosciences (NIMHANS), Bangalore, Karnataka, India
P. Jayadeva Bhat School of Bioscience and Engineering, Indian Institute of
Technology, Mumbai, India
Animesh Bhattacharya Department of Dermatology, Venereology and
Allergology, University of Leipzig, Leipzig, Germany
Leonardo Bich Department of Logic and Philosophy of Science, IAS-Research
Center for Life, Mind, and Society, University of the Basque Country, UPV/EHU,
Donostia-San Sebastián, Spain
José Román Bilbao-Castro Supercomputing-Algorithms Group, University of
Almerı́a, Almerı́a, Spain
xx Contributors

Zhao Bing Department of Biochemistry, University of Western Ontario, London,


ON, Canada
Emanuele G. Biondi CR1-CNRS-Laboratory of Biochemistry and Integrated
Structural Biology, Institut de Recherche Interdisciplinaire - IRI CNRS USR3078,
Villeneuve d’Ascq, France
Jesus Bisbal Departament de Tecnologies de la Informació i les Comunicacions,
Universitat Pompeu Fabra, Barcelona, Spain
Hagen Blankenburg Computational Biology and Applied Algorithmics, Max
Planck Institute for Informatics, Saarbr€
ucken, Germany
Rainer Blasczyk Hannover Medical School, Institute for Transfusion Medicine,
Hannover, Germany
Hendrik Blockeel Department of Computer Science, Katholieke Universiteit
Leuven, Heverlee, Belgium
Leiden Institute of Advanced Computer Science, Universiteit Leiden, Leiden, The
Netherlands
Olivier Bodenreider Lister Hill National Center for Biomedical Communications,
US National Library of Medicine, Bethesda, MD, USA
Davide Bolchini School of Informatics, Indiana University, Indianapolis, IN, USA
Daniel R. Boutz Institute for Cellular and Molecular Biology, University of Texas,
Austin, TX, USA
Vani Brahmachari Dr. B. R. Ambedkar Center for Biomedical Research,
University of Delhi, Delhi, India
Ryan R. Brinkman Terry Fox Laboratory, BC Cancer Agency, BC Cancer
Research Centre, Vancouver, BC, Canada
Department of Medical Genetics, University of British Columbia, Vancouver, BC,
Canada
Randall D. Britten Auckland Bioengineering Institute, University of Auckland,
Auckland, New Zealand
Alistair J. P. Brown Institute of Medical Sciences, University of Aberdeen,
Aberdeen, UK
Gerhard Buck-Sorlin Institut de Recherche en Horticulture et Semences, UMR
1345 (INRA/Agrocampus-ouest/Université d’Angers) Agrocampus Ouest, Angers,
France
Scott M. Bugenhagen Computational Bioengineering Group, Biotechnology and
Bioengineering Center, Medical College of Wisconsin, Milwaukee, WI, USA
Marc Bullock Cancer Research UK Centre, University of Southampton
Cancer Sciences Division, Southampton University Hospital NHS Trust, Southamp-
ton, UK
Richard M. Burian Department of Philosophy, Virginia Tech, Blacksburg, VA, USA
Contributors xxi

Michael E. Bushell Department of Microbial and Cellular Sciences, University of


Surrey, Guildford, Surrey, UK
Gregery T. Buzzard Department of Mathematics, Purdue University, West
Lafayette, IN, USA
Laurence Calzone Institut Curie – INSERM U900 – Mines ParisTech, Paris, France
Anyela Camargo Institute of Biological, Environmental and Rural Sciences,
Aberystwyth University, Aberystwyth, Ceredigion, UK
Mario Cannataro Department of Medical and Surgical Sciences, University Magna
Graecia of Catanzaro, Catanzaro, Italy
Virginio Cantoni Department of Computer Engineering and Systems Science,
University of Pavia, Pavia, Italy
Computational Biology, KTH Royal Institute of Technology, Stockholm, Sweden
Martin Carrier Department of Philosophy, Bielefeld University, Bielefeld, Germany
Eugenio Cesario ICAR-CNR (Istituto di Calcolo e Reti ad Alte Prestazioni), Rende,
CS, Italy
Guilhem Chalancon MRC Laboratory of Molecular Biology, University of
Cambridge, Cambridge, UK
Alan Champneys Department of Engineering Mathematics, University of Bristol,
Bristol, UK
Mark R. Chance Center for Proteomics and Bioinformatics and Department of
Genetics, School of Medicine, Case Western Reserve University, Cleveland, OH,
USA
Nagasuma Chandra Department of Biochemistry, Indian Institute of Science,
Bangalore, India
Rupanjali Chaudhuri G.N. Ramachandran Knowledge Centre for Genome Infor-
matics, Institute of Genomics and Integrative Biology, Delhi, India
Vijayalakshmi Chelliah EMBL European Bioinformatics Institute, Hinxton,
Cambridgeshire, UK
Katherine C. Chen Department of Biological Sciences, Virginia Polytechnic
Institute and State University, Blacksburg, VA, USA
James J. Chen U.S. Food and Drug Administration, National Center for Toxico-
logical Research (HFT-20), Jefferson, AR, USA
Li-Hsunan Chen Department of Chemical Engineering, National Chung Cheng
University, Chiayi, Taiwan
Xi Chen Advanced Modeling and Applied Computing Laboratory, Department of
Mathematics, University of Hong Kong, Hong Kong, China
Wai-Ki Ching Advanced Modeling and Applied Computing Laboratory, Depart-
ment of Mathematics, University of Hong Kong, Hong Kong, China
xxii Contributors

Miew Keen Choong Centre for Health Informatics, Australian Institute for Health
Innovation, University of New South Wales, Sydney, NSW, Australia
I-Chun Chou The Wallace H. Coulter Department of Biomedical Engineering,
Georgia Institute of Technology and Emory University, Atlanta, GA, USA
Yunfei Chu Artie McFerrin Department of Chemical Engineering, Texas A&M
University, College Station, TX, USA
Andrea Ciliberto IFOM – The FIRC Institute of Molecular Oncology, Milan, Italy
Jean Clairambault Equipe-Projet BANG, INRIA Paris-Rocquencourt BP 105,
Le Chesnay, Paris, France
Amanda Clare Department of Computer Science, Aberystwyth University,
Aberystwyth, UK
Tim Clark Department of Neurology, Massachusetts General Hospital and Harvard
Medical School, Boston, MA, USA
Richard H. Clayton Department of Computer Science, University of Sheffield,
Sheffield, UK
Gilles Clermont Center for Inflammation and Regenerative Modeling, University
of Pittsburgh, Pittsburgh, PA, USA
George M. Coghill Institute for Complex System and Mathematical Biology,
University of Aberdeen, Aberdeen, UK
Kevin Bretonnel Cohen Center for Computational Pharmacology, University of
Colorado, Aurora, CO, USA
Matteo Comin Department Information Engineering, University of Padova,
Padova, Italy
David Cook AstraZeneca, Translational Patient Safety and Enabling Science,
Macclesfield, Chester, UK
Carlo Cosentino Department of Clinical and Experimental Medicine, Magna
Graecia University of Catanzaro, Catanzaro, Italy
Richard G. Côté European Molecular Biology Laboratory, European Bioinformat-
ics Institute (EBI), Hinxton, Cambridge, UK
Carl F. Craver Philosophy and Philosophy-Neuroscience-Psychology Program,
Philosophy Department, Washington University in Saint Louis, St. Louis, MO, USA
Attila Csikász-Nagy The Microsoft Research - University of Trento Centre for
Computational and Systems Biology, Trento, Italy
Lalit Darunte Department of Chemical Engineering, Indian Institute of Techno-
logy, Mumbai, India
Ranjan K. Dash Department of Physiology, Biotechnology and Bioengineering
Center, Medical College of Wisconsin, Milwaukee, WI, USA
Jonathan F. Davies Fondazione Bruno Kessler, Trento, Italy
Contributors xxiii

Barbara J. Davis Section of Pathology, Tufts Cummings School of Veterinary


Medicine Biomedical Sciences, North Grafton, MA, USA

Thomas S. Deisboeck Harvard-MIT (HST) Athinoula A. Martinos Center for


Biomedical Imaging, Massachusetts General Hospital, Charlestown, MA, USA

David S. DeLuca Cancer Genome Analysis Group, The Broad Institute, Cambridge,
MA, USA

Alexander DeLuna Laboratorio Nacional de Genómica para la Biodiversidad,


CINVESTAV-IPN, Irapuato, Guanajuato, Mexico

Andreas Deutsch Center for Information Services and High Performance Compu-
ting (ZIH), Technical University Dresden, Dresden, Germany

Rohan Dhiman Center for Pulmonary and Infectious Disease Control, University of
Texas Health Science Center at Tyler, Tyler, TX, USA

Akos Dobay Institute of Evolutionary Biology and Environmental Studies (IEU),


University of Zurich, Zurich, Switzerland

Maria Pamela Dobay Department of Physics, Ludwig-Maximilians University,


Munich, Germany

Andreas Doncic Department of Biology, Stanford University School of Medicine,


Stanford, CA, USA

Brecht M. R. Donckels Department of Mathematical Modelling, Statistics and


Bioinformatics, Ghent University, Ghent, Belgium

Piercarlo Dondi Department of Computer Engineering and Systems Science,


University of Pavia, Pavia, Italy

Francis J. Doyle III Department of Chemical Engineering, Institute for


Collaborative Biotechnologies, University of California, Santa Barbara, CA, USA

Andreas Dr€
ager Center for Bioinformatics, University of Tuebingen, Tuebingen,
Germany

Dirk Drasdo Institute National de Recherche en Informatique et en Automatique,


Paris–Rocquencourt, France
Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany

Andreas Dress University of Bielefeld, Bielefeld, Germany


Key Laboratory for Computational Biology (PICB), Shanghai Institutes
for Biological Sciences (SIBS), Chinese Academy of Science and Max Planck
Society (CAS–MPG) Partner Institute, Shanghai, China
Science Center at “infinity3 GmbH”, Bielefeld, Germany

Franco du Preez SysMO-DB team, Manchester Centre for Integrative Systems


Biology, University of Manchester, Manchester, UK

Werner Dubitzky Biomedical Sciences Research Institute, University of Ulster,


Coleraine, UK
xxiv Contributors

Włodzisław Duch Department of Informatics, Nicolaus Copernicus University,


Toruń, Poland
School of Computer Engineering, Nanyang Technological University, Singapore,
Singapore
Franco duPreez SysMO-DB team, University of Manchester, Manchester Centre
for Integrative Systems Biology, Manchester, UK
Harsh Dweep Medical Research Center, Medical Faculty Mannheim, University of
Heidelberg, Mannheim, Germany
Gaurav Dwivedi The Wallace H. Coulter Department of Biomedical Engineering,
Georgia Institute of Technology and Emory University, Atlanta, GA, USA
Oliver Ebenh€ oh Institute for Complex Systems and Mathematical Biology,
University of Aberdeen, Kings College, Old Aberdeen, Aberdeen, UK
Karen Eilbeck Department of Biomedical Informatics, University of Utah, Salt
Lake City, UT, USA
Department of Human Genetics, Eccles Institute of Human Genetics, University of
Utah, Salt Lake City, UT, USA
Yasser EL-Manzalawy Center for Computational Intelligence, Learning, and Dis-
covery, Computer Science, Iowa State University, Ames, IA, USA
Heiko Enderling Center of Cancer Systems Biology, St. Elizabeth’s Medical Cen-
ter - CBR 115D, Tufts University School of Medicine, Boston, MA, USA
David Epstein Mathematics Institute, University of Warwick, Warwick, UK
Arantza Etxeberria Department of Logic and Philosophy of Science, IAS-
Research Center for Life, Mind, and Society, University of the Basque Country,
UPV/EHU, Donostia - San Sebastián, Spain
Anthony Faiola School of Informatics, Indiana University, Indianapolis, IN, USA
Dana Faratian Centre for Molecular Pathology, Institute of Cancer Research,
Surrey, UK
Malcolm Farrow School of Mathematics & Statistics, Newcastle University,
Newcastle upon Tyne, UK
Adrien Fauré Graduate School of Science and Engineering, Yamaguchi Univesity,
Yamaguchi, Japan
Nnenna A. Finn The Wallace H. Coulter Department of Biomedical Engineering,
Georgia Institute of Technology and Emory University, Atlanta, GA, USA
Darren R. Flower School of Life and Health Sciences, Aston University, Birming-
ham, UK
Thomas Fober Philipps-Universit€at Marburg, Marburg, Germany
Luı́s L. Fonseca Department of Biomedical Engineering, Georgia Institute of
Technology and Emory University, Atlanta, GA, USA
Contributors xxv

Christian V. Forst Department of Clinical Sciences, Southwestern Medical Center,


University of Texas, Dallas, TX, USA
Erika De Francesco Exeura, Rende (CS), Italy
Sherry Freiesleben Department of Systems Biology and Bioinformatics, Institute
of Computer Science, University of Rostock, Rostock, Germany
Peter Frick Vanderbilt University, Nashville, TN, USA
Tamaki Fujimori Human Metabolome Technologies Inc., Tsuruoka, Yamagata, Japan
Johannes F€
urnkranz Technical University of Darmstadt, Darmstadt, Germany
Alessandro Gaggia Department of Computer Engineering and Systems Science,
University of Pavia, Pavia, Italy
Ester M. Garzón Department of Computer Architecture and Electronics, Univer-
sity of Almerı́a, Almerı́a, Spain
Riccardo Gatti Department of Computer Engineering and Systems Science,
University of Pavia, Pavia, Italy
Hao Ge School of Mathematical Sciences and Centre for Computational Systems
Biology, Fudan University, Shanghai, China
Martin Gerner Faculty of Life Sciences, University of Manchester, Manchester, UK
Andrew D. Gewitz Institute for Computational and Mathematical Engineering and
School of Medicine, Stanford University, Stanford, CA, USA
Anne Gieseler Molecular Pattern Recognition Research (MPRR) Group, Otto-von-
Guericke-University Magdeburg Medical Faculty, Magdeburg, Germany
Véronique Giudicelli Laboratoire d’ImmunoGénétique Moléculaire, Institut de
Génétique Humaine UPR 1142, Université Montpellier 2, Montpellier, France
Carole Goble School of Computer Science, University of Manchester, Manchester, UK
Andrés Fernando González Barrios Department of Chemical Engineering,
Universidad de los Andes, Bogotá, Colombia
Orland Gonzalez Institute for Bioinformatics, Ludwig-Maximilians-University
Munich, Munich, Germany
Sara Green Department of Physics and Astronomy, Centre for Science Studies,
Aarhus University, Aarhus, Denmark
Norbert Gretz Medical Research Center, Medical Faculty Mannheim, University
of Heidelberg, Mannheim, Germany
Manish Grover Department of Biochemistry, Indian Institute of Science, Banga-
lore, Karnataka, India
Daniel V. Guebel Department of Systems Biology and Bioinformatics, Institute of
Computer Sciences, University of Rostock, Rostock, Germany
Concettina Guerra Department Information Engineering, University of Padova,
Padova, Italy
xxvi Contributors

Shailendra K. Gupta Indian Institute of Toxicology Research (CSIR), Lucknow,


India
Pietro Hiram Guzzi Department of Experimental Medicine and Clinic, University
Magna Gratia of Catanzaro, Catanzaro, Italy
Gordon L. Hager Laboratory of Receptor Biology and Gene Expression, National
Cancer Institute, National Institutes of Health, Bethesda, MD, USA
Juergen Hahn Artie McFerrin Department of Chemical Engineering, Texas A&M
University, College Station, TX, USA
Udo Hahn Jena University Language and Information Engineering (JULIE) Labo-
ratory, Friedrich-Schiller-University Jena, Jena, Germany
J€org Hakenberg Department of Computer Science and Department of Biomedical
Informatics, Arizona State University, Tempe, AZ, USA
Marta Halina Department of Philosophy and Science Studies Program, University
of California, San Diego, San Diego, La Jolla, CA, USA
David Harrison School of Medicine, University of St Andrews, St Andrews,
Scotland, UK
G. V. HarshaRani Neurobiology, National Center for Biological Sciences, Tata
Institute of Fundamental Research, Bangalore, India
Jan Hasenauer Institute for Systems Theory and Automatic Control, University of
Stuttgart, Stuttgart, Germany
Haralambos Hatzikirou Department of Pathology, School of Medicine, University
of New Mexico, Albuquerque, NM, USA
Michael Hauhs Ecological Modeling, University of Bayreuth, Bayreuth, Germany
Winston Haynes Seattle Children’s Research Institute, Seattle, WA, USA
Yongzheng He Department of Pediatrics, Indiana University School of Medicine,
Indianapolis, IN, USA
Michael Hecker Department of Neurology, Institute of Immunology, University of
Rostock, Rostock, Germany
Jan G. Hengstler Leibniz Research Center for Working Environment and Human
Factors, Dortmund, Germany
Alexander Herbig Center for Bioinformatics T€ubingen, Faculty of Science,
University of T€
ubingen, T€
ubingen, Germany
Georgina Hernández-Montes Departamento de Medicina Molecular y
Bioprocesos, Universidad Nacional Autónoma de México, Cuernavaca, Morelos,
México
Roger Higdon Seattle Children’s Research Institute, Seattle, WA, USA
Reyk Hillert Molecular Pattern Recognition Research (MPRR) Group, Otto-von-
Guericke-University, Magdeburg, Germany
Contributors xxvii

Stefan Hoehme Interdisciplinary Center for Bioinformatics, University of Leipzig,


Leipzig, Germany
Carson Holt Department of Human Genetics, University of Utah, Salt Lake City,
UT, USA
Vasant Honavar Center for Computational Intelligence, Learning, and Discovery,
Computer Science, Iowa State University, Ames, IA, USA
Christian I. Hong Department of Molecular and Cellular Physiology, University of
Cincinnati College of Medicine, Cincinnati, OH, USA

Seok-Hee Hong School of IT, University of Sydney, Sydney, NSW, Australia


Katsuhisa Horimoto Computational Biology Research Center, National Institute of
Advanced Industrial and Science and Technology, Koto-ku, Tokyo, Japan
Anna Horváth Department of Applied Biotechnology and Food Science, Budapest
University of Technology and Economics, Budapest, Hungary
Péter Horvatovich Department of Pharmacy, Analytical Biochemistry Research
Group, University of Groningen, Groningen, The Netherlands
C. Vyvyan Howard Centre for Molecular Biosciences, University of Ulster,
Coleraine, UK

Yufei Huang Picower Institute for Learning and Memory, Massachusetts Institute
of Technology, Cambridge, MA, USA
Greehey Children’s Cancer Research Institute, University of Texas at Texas Health
Science Center at San Antonio, San Antonio, TX, USA
Department of Epidemiology and Biostatistics, University of Texas at Texas Health
Science Center at San Antonio, San Antonio, TX, USA
Fuyan Hu Institute of Systems Biology, Shanghai University, Shanghai, China

Michael Hucka Computing and Mathematical Sciences, California Institute of


Technology, Pasadena, CA, USA
Eyke H€
ullermeier Philipps-Universit€at Marburg, Marburg, Germany
Andreas H€
uttemann Department of Philosophy, University of Cologne, Cologne,
Germany
Philippe Huneman Institut d’Histoire et de Philosophie (IHPST), des Sciences et
des Techniques, Université Paris 1 Panthéon-Sorbonne, Paris, France
Ying Hung Department of Statistics and Biostatistics, Rutgers University,
Piscataway, NJ, USA

C. Anthony Hunt Department of Bioengineering and Therapeutic Sciences,


University of California, San Francisco, CA, USA
Akira Ishihama Department of Frontier Bioscience, Hosei University, Tokyo,
Japan
xxviii Contributors

Koichi Ito Graduate School of Frontier Sciences, University of Tokyo, Chiba,


Tokyo, Japan
Ravi Iyengar Department of Pharmacology and Systems Therapeutics, Mount Sinai
School of Medicine, New York, NY, USA
James W. Jacobberger Case Comprehensive Cancer Center, Case Western
Reserve University, Cleveland, OH, USA
Shruti Jain Dr. B. R. Ambedkar Center for Biomedical Research, University of
Delhi, Delhi, India
Lars Juhl Jensen Novo Nordisk Foundation Center for Protein Research,
University of Copenhagen, Copenhagen, Denmark
Paul A. Jensen Department of Biomedical Engineering, University of Virginia,
Charlottesville, VA, USA
Euna Jeong Institute of Medical Science, University of Tokyo, Tokyo, Japan
Di Jia Department of Surgery/Harvard Medical School, Vascular Biology Program/
Children’s Hospital Boston, Boston, MA, USA
Guangxu Jin Systems Medicine and Bioengineering, Bioengineering and Bioinfor-
matics Program, The Methodist Hospital Research Institute, Weill Medical College,
Cornell University, Houston, TX, USA
Andrew R. Jones Institute of Integrative Biology, University of Liverpool,
Liverpool, UK
Bj€orn Junker Systems Biology and Bioinformatic, Leibniz Institute of Plant Gene-
tics and Crop Plant Research (IPK), OT Gatersleben, Stadt Seeland, Germany
Nick Juty EMBL-European Bioinformatics Institute, Wellcome Trust Genome
Campus, Hinxton, Cambridgeshire, UK
Lars Kaderali Institute for Medical Informatics and Biometry Technical
University, Dresden, Germany
Marie I. Kaiser Department of Philosophy, University of Cologne, Cologne,
Germany
Aleksi Kallio CSC, IT Center for Science Ltd, Espoo, Finland
Orsolya Kapuy Department of Biochemistry, Oxford Centre for Integrative Sys-
tems Biology, University of Oxford, Oxford, UK
Mahendra Kavdia Department of Biomedical Engineering, Wayne State
University, Detroit, MI, USA
Gota Kawai Department of Life and Environmental Sciences, Chiba Institute of
Technology, Narashino, Chiba, Japan
Junya Kawauchi Department of Biochemical Genetics, Medical Research Institute,
Tokyo Medical and Dental University, Tokyo, Japan
Sarah M. Keating EMBL -European Bioinformatics Institute, Wellcome Trust
Genome Campus, Hinxton, Cambridgeshire, UK
Contributors xxix

C. Maria Keet KRDB Research Centre, Free University of Bozen-Bolzano,


Bolzano, Italy
Douglas Bruce Kell School of Chemistry and Manchester Interdisciplinary
Biocentre, University of Manchester, Manchester, UK
Evelyn Fox Keller Massachusetts Institute of Technology (MIT), Cambridge,
MA, USA
Melissa L. Kemp The Wallace H. Coulter Department of Biomedical Engineering,
Georgia Institute of Technology and Emory University, Atlanta, GA, USA
Stefan Kempa Integrative Metabolomics and Proteomics, BIMSB (Berlin Institute
for Medical Systems Biology), Berlin, Germany
Noel Kennedy School of Biomedical Sciences, University of Ulster, Coleraine, UK
Carsten Kettner Beilstein-Institut zur F€orderung der Chemischen Wissenschaften,
Frankfurt am Main, Germany
Melissa Key Center for Computational Diagnostics, Indianapolis, IN, USA
Michael Khan Department of Biological Science, Biomedical Research Institute,
University of Warwick, Coventry, UK
Javed Mohammed Khan Department of Chemistry and Biomolecular Sciences and
ARC Center of Excellence in Bioinformatics, Macquarie University, Sydney, NSW,
Australia
Narsis Aftab Kiani ViroQuant Research Group Modeling, BioQuant, University of
Heidelberg, Heidelberg, Germany
Andrzej M. Kierzek Division of Microbial Sciences, Faculty of Health and Medical
Sciences, University of Surrey, Guildford, Surrey, UK
Jan T. Kim School of Computing Sciences, University of East Anglia, Norwich,
Norfolk, UK
Jin-Dong Kim Database Center for Life Science, Tokyo, Japan
Angelika Kimmig Departement Computerwetenschappen, Katholieke Universiteit
Leuven, Heverlee, Belgium
John R. King School of Mathematical Sciences, University of Nottingham,
Nottingham, UK
Alexandros Kiparissides Biological Systems Engineering Laboratory, Department
of Chemical Engineering, Centre for Process Systems Engineering, Imperial College,
London, UK
Max Kistler IHPST, Université Paris 1 Panthéon-Sorbonne, Paris, France
Shigetaka Kitajima Department of Biochemical Genetics, Medical Research
Institute, Tokyo Medical and Dental University, Tokyo, Japan
David Klatzmann Immunology-Immunopathology-Immunotherapy, UPMC Uni-
versity Paris 06, UMR7211, CNRS, UMR7211, INSERM, U959, Paris, France
xxx Contributors

Edda Klipp Theoretical Biophysics, Humboldt-Universit€at zu Berlin, Berlin,


Germany
Dirk Koczan Department of Neurology, Institute of Immunology, University of
Rostock, Rostock, Germany
Tetsuro Kokubo Department of Supramolecular Biology, Graduate School of
Nanobioscience, Yokohama City University, Yokohama, Kanagawa, Japan
Michal Kolář Institute of Molecular Genetics, Academy of Sciences of the Czech
Republic, Prague, Czech Republic
Eugene Kolker Bioinformatics and High–throughput Analysis Laboratory, Seattle
Children’s Research Institute, Seattle, WA, USA
Departments of Biomedical Informatics and Medical Education and Pediatrics,
University of Washington, Seattle, WA, USA
Arun S. Konagurthu Clayton School of Information Technology, Monash
University, Clayton, VIC, Australia
K. K. Koutroumpas Automatic Control Laboratory, ETH Zurich, Zurich,
Switzerland
Martin Krallinger Structural Biology and BioComputing Programme, Spanish
National Cancer Research Centre (CNIO), Madrid, Spain
Clemens Kreutz Institute for Physics, Freiburger Center for Data Analysis
and Modeling (FDM), University of Freiburg, Freiburg, Germany
Ulrich Krohs Department of Philosophy, University of M€unster, M€unster, Germany
Andreas Krusche Molecular Pattern Recognition Research (MPRR) Group,
Otto-von-Guericke-University, Magdeburg, Germany
Kai Kruse MRC Laboratory of Molecular Biology, University of Cambridge,
Cambridge, UK
Martin Kuiper Department of Biology, Systems Biology Group, Norwegian
University of Science and Technology, Trondheim, Norway
Sanjeev Kumar BioCOS Life Sciences Private Limited, Bangalore, Karnataka, India
Manfred Kunz Department of Dermatology, Venereology and Allergology,
University of Leipzig, Leipzig, Germany
Hitoshi Kurumizaka Graduate School of Advanced Science and Engineering,
Waseda University, Tokyo, Japan
Xin Lai Department of Systems Biology and Bioinformatics, Institute of Computer
Science, University of Rostock, Rostock, Germany
Camille Laibe EMBL-European Bioinformatics Institute, Wellcome Trust Genome
Campus, Hinxton, Cambridgeshire, UK
Tai Ning Lam Department of Bioengineering and Therapeutic Sciences, University
of California, San Francisco, CA, USA
Contributors xxxi

Tze Hau Lam Department of Microbiology, Yong Loo Lin School of Medicine,
National University of Singapore, Singapore, Singapore
Christoph H. Lampert IST Austria, Klosterneuburg, Austria
Qizong Lao Laboratory of Receptor Biology and Gene Expression, National Cancer
Institute, National Institutes of Health, Bethesda, MD, USA
Sneh Lata Cold Spring Harbor Laboratory, Genome Research Center, Woodbury,
NY, USA
Reinhard Laubenbacher Virginia Bioinformatics Institute, Virginia Tech,
Blacksburg, VA, USA
Sang Yup Lee Department of Chemical and Biomolecular Engineering and
Department of Bio and Brain Engineering, Korea Advanced Institute of Science
and Technology (KAIST), Daejeon, Korea
Tae Lee School of Biology and School of Physics, Georgia Institute of Technology,
Atlanta, GA, USA
Yun Lee The Wallace H. Coulter Department of Biomedical Engineering, Georgia
Institute of Technology and Emory University, Atlanta, GA, USA
Marie-Paule Lefranc Laboratoire d’ImmunoGénétique Moléculaire, Institut de
Génétique Humaine UPR 1142, Université Montpellier 2, Montpellier, France
Jinzhi Lei Zhou Pei-Yuan Center for Applied Mathematics, Tsinghua University of
Beijing, Beijing, China
Florian Leitner Structural Biology and BioComputing Programme, Spanish
National Cancer Research Centre (CNIO), Madrid, Spain
Sabina Leonelli ESRC Centre for Genomics in Society, University of Exeter,
Exeter, Devon, UK
Ulf Leser Institute for Computer Science, Humboldt-Universit€at zu Berlin, Berlin,
Germany
Arthur M. Lesk Department of Biochemistry and Molecular Biology, and Huck
Institutes of Genomics, Proteomics, and Bioinformatics, Pennsylvania State Univer-
sity, University Park, PA, USA
Wentian Li The Robert S. Boas Center for Genomics and Human Genetics,
Feinstein Institute for Medical Research, Manhasset, NY, USA
Yan Li Department of Pediatrics, Indiana University School of Medicine, Indiana-
polis, IN, USA
Zhenping Li School of Information, Beijing Wuzi University, Beijing, China
Wolfram Liebermeister Institut f€ur Biochemie, Charité - Universit€atsmedizin
Berlin, Berlin, Germany
Frank Lin Centre for Health Informatics, Australian Institute for Health Innovation,
University of New South Wales, Sydney, NSW, Australia
Ke-Qin Liu Institute of Systems Biology, Shanghai University, Shanghai, China
xxxii Contributors

Xiaojun Liu Internal Medicine, The Second Hospital of Hebei Medical University,
Shijiazhuang, Hebei, China
Xiaoping Liu Institute of Systems Biology, Shanghai University, Shanghai, China
Zhi-Ping Liu Key Laboratory of Systems Biology, Shanghai Institutes for Biolo-
gical Sciences, Chinese Academy of Sciences, Shanghai, China
Catherine M. Lloyd Auckland Bioengineering Institute, University of Auckland,
Auckland, New Zealand
Luca Lombardi Department of Computer Engineering and Systems Science,
University of Pavia, Pavia, Italy
Isaac F. López-Moyado Center for Genomic Sciences-UNAM, Universidad
Nacional Autonoma de Mexico, Cuernavaca (Morelos), Mexico
Jingky Lozano-K€
uhne Department of Public Health, University of Oxford,
Oxford, UK
Long Jason Lu Division of Biomedical Informatics, Cincinnati Children’s Hospital
Research Foundation, Cincinnati, OH, USA
John Lygeros Automatic Control Laboratory, ETH Zurich, Zurich, Switzerland
Zoi Lygerou School of Medicine, Laboratory of General Biology, University of
Patras, Patras, Greece
Shuangge Ma Yale School of Public Health, Yale University, New Haven, CT, USA
Hong-Wu Ma School of Informatics, University of Edinburgh, Edinburgh, UK
Lars Malmstr€ om Institute of Molecular Systems Biology IMSB, ETH Zurich,
Zurich, Switzerland
Marcos Malumbres Spanish National Cancer Research Centre (CNIO), Madrid,
Spain
Sebastian Mana-Capelli Department of Molecular Genetics and Microbiology,
University of Massachusetts, Worcester, MA, USA
Athanasios Mantalaris Biological Systems Engineering Laboratory, Department
of Chemical Engineering, Centre for Process Systems Engineering, Imperial College,
London, UK
Samuel Marguerat Department of Genetics, Evolution and Environment and UCL
Cancer Institute, University College London, London, UK
Alberto Marin-Sanguino Faculty of Mechanical Engineering, Specialty
Division for Systems Biotechnology, Technical University Munich, Garching,
Germany
José Antonio Martı́nez Department of Computer Architecture and Electronics,
University of Almerı́a, Almerı́a, Spain
Elio Mattia Centre for Systems Chemistry, Stratingh Institute for Chemistry,
Rijksuniversiteit Groningen, Groningen, The Netherlands
Contributors xxxiii

Dannel McCollum Department of Molecular Genetics and Microbiology,


University of Massachusetts, Worcester, MA, USA

Johnjoe McFadden Division of Microbial Sciences, Faculty of Health and Medical


Sciences, University of Surrey, Guildford, Surrey, UK

Barry McMullin The Rince Institute, Dublin City University, Dublin, Ireland

Hendrik Mehlhorn Leibniz Institute of Plant Genetics and Crop Plant Research (IPK),
Gatersleben, Stadt Seeland, Germany

Francisco Melo Pontificia Universidad Católica de Chile, Santiago, Chile


Millennium Institute on Immunology and Immunotherapy, Santiago, Chile

Nuno D. Mendes Knowledge Discovery and Bioinformatics (KDBIO) Group,


INESC–ID, Lisbon, Portugal
Network Modelling Group, Instituto Gulbenkian de Ciência (IGC), Oeiras,
Portugal

Eduardo Mendoza Department of Computer Science, University of the Philippines


Diliman, Quezon City, Philippines

Donna L. Mendrick Division of Systems Biology, National Center for


Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA

Jia Meng Picower Institute for Learning and Memory, Massachusetts Institute of
Technology, Cambridge, MA, USA

Carsten Mente Center for Information Services and High Performance


Computing (ZIH), Technical University Dresden, Dresden, Germany

Ivan Merelli Institute for Biomedical Technologies – CNR (Consiglio Nazionale


delle Ricerche), Segrate, Milan, Italy

Marco Mernberger Philipps-Universit€at Marburg, Marburg, Germany

Luciano Milanesi Institute for Biomedical Technologies – CNR (Consiglio


Nazionale delle Ricerche), Segrate, Milan, Italy

Andrew K. Miller Auckland Bioengineering Institute, University of Auckland,


Auckland, New Zealand

Alex Mirnezami Cancer Sciences, Faculty of Medicine, University of


Southampton, Southampton, UK
Cancer Research UK Centre, Cancer Sciences Division, School of Medicine,
Southampton General Hospital, Southampton, UK

Vladimir Mironov Department of Biology, Systems Biology Group, Norwegian


University of Science and Technology, Trondheim, Norway

Satoru Miyano Institute of Medical Science, University of Tokyo, Tokyo, Japan

Raúl Montañez Institut de Biologia Evolutiva CSIC-UPF (Universitat Pompoi


Fabra), Barcelona, Spain
xxxiv Contributors

Emily A. Moon Functional Genomics Laboratory, Department of Psychiatry and


Human Behavior, University of California, Irvine, CA, USA
Ion Moraru R.D. Berlin Center for Cell Analysis and Modeling, University of
Connecticut Health Center, Farmington, CT, USA
Alvaro Moreno Department of Logic and Philosophy of Science, University of the
Basque Country, San Sebastian, Spain
Rafael Moreno-Sánchez Department of Biochemistry, National Institute for
Cardiology “Ignacio Chávez”, Mexico City, Mexico
Sergio Moreno Instituto de Biologı́a Molecular y Celular del Cáncer, CSIC/
Universidad de Salamanca, Salamanca, Spain
Hisao Moriya Okayama University Research Core for Interdisciplinary Sciences,
Kita-ku, Okayama, Japan
Ettore Mosca Institute for Biomedical Technologies – CNR (Consiglio Nazionale
delle Ricerche), Segrate, Milan, Italy
Marsha A. Moses Department of Surgery/Harvard Medical School, Vascular
Biology Program/Children’s Hospital Boston, Boston, MA, USA
Lenny Moss Department of Sociology and Philosophy, University of Exeter,
Exeter, Devon, UK
Matteo Mossio IAS-Research Philosophy of Biology Group, Department of
Logic and Philosophy of Science, University of the Basque Country (UPV/EHU),
Donostia – San Sebastian, Spain
Sumanta Mukherjee Bioinformatics Centre, Indian Institute of Science,
Bangalore, Karnataka, India
Ivan Mura The Microsoft Research - University of Trento Centre for Computa-
tional and Systems Biology, Trento, Italy
Katsuhiko S. Murakami Department of Biochemistry and Molecular Biology,
Pennsylvania State University, University Park, PA, USA
Yota Murakami Department of Chemistry, Hokkaido University, Sapporo, Japan
Mark A. Musen Stanford Center for Biomedical Informatics Research, Stanford
University, Stanford, CA, USA
Marco Muzi-Falconi Dipartimento di Scienze Biomolecolari e Biotecnologie,
Università di Milano, Milan, Italy
Rajeswara Babu Mythri Department of Neurochemistry, National Institute of
Mental Health and Neurosciences (NIMHANS), Bangalore, Karnataka, India
Jose C. Nacher Department of Information Science, Faculty of Science, Toho
University, Funabashi, Chiba, Japan
Haroon Naeem Institute for Bioinformatics, Ludwig-Maximilians-University
Munich, Munich, Germany
Masao Nagasaki Institute of Medical Science, University of Tokyo, Tokyo, Japan
Contributors xxxv

Grzegorz Nalepa Department of Pediatrics, Section of Pediatric Hematology-


Oncology, Riley Hospital for Children, Indiana University School of Medicine,
Indianapolis, IN, USA
Nobukazu Nameki Gunma University, Kiryu, Japan
Jeyakumar Natarajan Data Mining and Text Mining Laboratory, Department of
Bioinformatics, Bharathiar University, Coimbatore, India
Tim W. Nattkemper Biodata Mining Group, Faculty of Technology, Bielefeld
University, Bielefeld, Germany
Chrystopher L. Nehaniv Royal Society/Wolfson Bio Computation
Research Laboratory, School of Computer Science, University of Hertfordshire,
Hatfield, UK
Patrick Nelson CCMB 2017 Palmer Commons, University of Michigan, Ann
Arbor, MI, USA
Goran Nenadic Manchester Interdisciplinary Biocentre, University of Manchester,
Manchester, UK
David P. Nickerson Auckland Bioengineering Institute, University of Auckland,
Auckland, New Zealand
Poul M. Nielsen Department of Engineering Science, University of Auckland,
Auckland, New Zealand
Kay Nieselt Center for Bioinformatics T€ubingen, Faculty of Science, University of
T€
ubingen, T€
ubingen, Germany
Siegfried Nijssen Katholieke Universiteit, Leuven, Belgium
Svetoslav Nikolov Institute of Mechanics and Biomechanics-BAS, Bulgarian
Academy of Science, Sofia, Bulgaria
Chris J. Norbury Sir William Dunn School of Pathology, University of Oxford,
Oxford, UK
Béla Novák Department of Biochemistry, Oxford Centre for Integrative Systems
Biology, University of Oxford, Oxford, UK
Nicolas le Novère EMBL-European Bioinformatics Institute, Wellcome Trust
Genome Campus, Hinxton, Cambridgeshire, UK
Geoff Nunns Auckland Bioengineering Institute, University of Auckland,
Auckland, New Zealand
Michael F. Ochs Department of Oncology, Johns Hopkins University, Baltimore,
MD, USA
Maureen A. O’Malley Department of Philosophy, University of Sydney, Sydney,
NSW, Australia
Sandra Orchard EMBL Outstation, European Bioinformatics Institute, Hinxton,
Cambridge, UK
Gerard J. Ostheimer Department of Agriculture, Washington, DC, USA
xxxvi Contributors

Casey L. Overby Medical Education and Biomedical Informatics, University of


Washington School of Medicine, Seattle, WA, USA
Brigitte Katrin Paap Department of Neurology, Institute of Immunology, Univer-
sity of Rostock, Rostock, Germany
Graham Packham Cancer Research UK Centre, University of Southampton
Cancer Sciences Division, Southampton University Hospital NHS Trust,
Southampton, UK
Niall Palfreyman Biotechnology and Bioinformatics, Weihenstephan-Triesdorf
University of Applied Science, Freising, Germany
Alida Palmisano Department of Biological Sciences and Department of Computer
Science, Department of Biological Sciences Virginia Tech, Virginia Polytechnic
Institute and State University, Blacksburg, VA, USA
Luigi Palopoli Dipartimento di Elettronica, Informatica e Sistemistica Università
della Calabria, Rende, Italy
Paolo De Paoli Centro di Riferimento Oncologico (CRO) – IRCCS, Aviano, PN, Italy
Antonis Papachristodoulou Department of Engineering Science, University of
Oxford, Oxford, UK
Oxford Centre for Integrative Systems Biology, University of Oxford, Oxford, UK
Jason A. Papin Department of Biomedical Engineering, University of Virginia,
Charlottesville, VA, USA
Sunwon Park Department of Chemical and Biomolecular Engineering, Korea
Advanced Institute of Science and Technology (KAIST), Daejeon, Korea
Chaitali Paul G.N. Ramachandran Knowledge Centre for Genome Informatics,
Institute of Genomics and Integrative Biology, Delhi, India
Shayn M. Peirce Department of Biomedical Engineering, University of Virginia,
Charlottesville, VA, USA
Luiz O. F. Penalva Department of Cellular and Structural Biology, Greehey
Children’s Cancer Research Institute, University of Texas Health Science Center,
San Antonio, TX, USA
Livia Pérez-Hidalgo Instituto de Biologı́a Molecular y Celular del Cáncer, CSIC/
Universidad de Salamanca, Salamanca, Spain
Ernesto Perez-Rueda Departamento de Ingenieŕıa Celular y Biocatálisis, Instituto
de Biotecnologia, Universidad Nacional Autonóma de México, Cuernavaca, More-
los, Mexico
Nikolai Petrovsky Department of Endocrinology, Flinders Medical Centre/Flinders
University, Adelaide, Australia
Steve R. Pettifer Faculty of Life Sciences and School of Computer Science,
University of Manchester, Manchester, UK
Andrés Mauricio Pinzón Velasco Department of Systems Engineering,
Universidad Jorge Tadeo Lozano, Bogotá, Colombia
Contributors xxxvii

Efstratios Pistikopoulos Biological Systems Engineering Laboratory, Department


of Chemical Engineering, Centre for Process Systems Engineering, Imperial College,
London, UK

Conrad Plake Biotechnology Center (BIOTEC), Technische Universit€at Dresden,


Dresden, Germany

Hannes Planatscher Natural and Medical Sciences Institute at the University of


Tuebingen, Reutlingen, Germany

Paolo Plevani Dipartimento di Scienze Biomolecolari e Biotecnologie, Università


di Milano, Milan, Italy

Daniel Polani Adaptive Systems Research Group, School of Computer Science,


University of Hertfordshire, Hatfield, UK

Joseph R. Pomerening Department of Biology, Indiana University, Bloomington,


IN, USA

Ranjan K. Pradhan Department of Physiology, Biotechnology and Bioengineering


Center, Medical College of Wisconsin, Milwaukee, WI, USA

Karyala Prashanthi Bioinformatics Centre, Indian Institute of Science, Bangalore,


Karnataka, India

Corrado Priami Microsoft Research-University of Trento Centre for Computa-


tional and Systems Biology and DISI, University of Trento, Povo, Trento, Italy

Ela Pustulka-Hunt GS SystemsX, ETH Zurich, Zurich, Switzerland

Sampo Pyysalo Department of Computer Science, University of Tokyo, Tokyo, Japan

Zhen Qi The Wallace H. Coulter Department of Biomedical Engineering, Georgia


Institute of Technology and Emory University, Atlanta, GA, USA

Hong Qian Department of Applied Mathematics, University of Washington,


Seattle, WA, USA

Yu-Qing Qiu Department of Chemical Pathology, The Chinese University of Hong


Kong, Hong Kong, China

Novi Quadrianto SML-NICTA and RSISE-ANU, Canberra, ACT, Australia

Andreas Quandt Institute of Molecular Systems Biology IMSB, ETH Zurich,


Zurich, Switzerland

Vito Quaranta The Vanderbilt-Ingram Cancer Center, Nashville, TN, USA

Anna Rácz-Mónus Department of Applied Biotechnology and Food Science,


Budapest University of Technology and Economics, Budapest, Hungary

Gajendra Raghava Bioinformatics Centre, Institute of Microbial Technology,


Chandigarh, Chandigarh, India

Kalpana Raja Data Mining and Text Mining Laboratory, Department of Bioinfor-
matics, Bharathiar University, Coimbatore, India
xxxviii Contributors

Meghna Rajvanshi Department of Chemical Engineering, Indian Institute of


Technology Bombay, Powai, Mumbai, Maharashtra, India
Jorge Mario Gomez Ramirez Department of Chemical Engineering, Universidad
de los Andes, Bogotá, Colombia
Jan Ramon Declarative Languages and Artificial Intelligence Group, Katholieke
Universiteit Leuven, Leuven, Belgium
Shoba Ranganathan Department of Chemistry and Biomolecular Sciences and
ARC Center of Excellence in Bioinformatics, Macquarie University, Sydney,
NSW, Australia
Department of Biochemistry, Yong Loo Lin School of Medicine, National University
of Singapore, Singapore
Rajni Rani Molecular Immunogenetics Group, National Institute of Immunology,
New Delhi, Delhi, India
M. R. Satyanarayana Rao Chromatin Biology Laboratory, Molecular Biology and
Genetics Unit, Jawaharlal Nehru Centre for Advanced Scientific Research, Banga-
lore, Karnataka, India
Andreas Raue Institute for Physics, University of Freiburg, Freiburg, Germany
Dietrich Rebholz-Schuhmann European Bioinformatics Institute, Hinxton, UK
Ee chee Ren Laboratory of Immunogenetics and Viral Host-Pathogen Genomics,
Singapore Immunology Network, Department of Microbiology, Yong Loo Lin
School of Medicine, National University of Singapore, Singapore, Singapore
Xianwen Ren Key Laboratory of Systems Biology of Pathogens, Ministry of
Health, Institute of Pathogen Biology, Chinese Academy of Medical Sciences and
Peking Union Medical College, Beijing, China
Osbaldo Resendis-Antonio Center for Genomics Sciences-UNAM, Universidad
Nacional Autonóma de México, Cuernavaca, Morelos, Mexico
Silvia Restrepo Department of Biological Sciences, Universidad de los Andes,
Bogotá, Colombia
Steven D. Rhodes School of Medicine, Indiana University, Indianapolis, IN, USA
Robert C. Richardson Department of Philosophy, University of Cincinnati,
Cincinnati, OH, USA
Gregory Riddick Neuro-Oncology Branch, National Cancer Institute, Bethesda,
MD, USA
Andrea Rocco Division of Microbial Sciences, Faculty of Health and Medical
Sciences, University of Surrey, Guildford, Surrey, UK
Carlos Rodrı́guez-Caso Complex Systems Laboratory, University Pompeu Fabra,
Barcelona, Spain
Domingo Rodrı́guez-Baena School of Engineering, Pablo de Olavide University,
Escuela Politécnica Superior, Seville, Spain
Contributors xxxix

Maria Rodriguez-Fernandez Department of Chemical Engineering, Institute for


Collaborative Biotechnologies, University of California, Santa Barbara, CA, USA
M. Carmen Romano Institute for Complex Systems and Mathematical Biology,
University of Aberdeen, Aberdeen, UK
Institute of Medical Sciences, University of Aberdeen, Aberdeen, UK
Simona E. Rombo Istituto di Calcolo e Reti ad Alte Prestazioni, Consiglio
Nazionale delle Ricerche, Rende (CS), Italy
Dipartimento di Elettronica, Informatica e Sistemistica, Università della Calabria,
Rende, Italy
Ricardo Cruz-Herrera del Rosario Genome Institute of Singapore, Singapore,
Singapore
Sigrid Rouam Université Paris-Sud, Villejuif, France
Jianhua Ruan Department of Computer Science, University of Texas at San
Antonio, San Antonio, TX, USA
Fabian Rudolf Department of Biosystems Science and Engineering, ETH Zurich,
Basel, Switzerland
Oliver Ruebenacker Center for Cell Analysis and Modeling, University of Con-
necticut Health Center, Farmington, CT, USA
Kepa Ruiz-Mirazo Department Logic and Philosophy of Science, University of the
Basque Country, Donostia – San Sebastián, Spain
Biophysics Unit (CSIC–UPV/EHU), University of the Basque Country, Leioa, Spain
Ann E. Rundell Biomedical Engineering, Purdue University, West Lafayette, IN, USA
Mika O. Ruonala NeuroToponomics Group, Center for Membrane Proteomics
Campus Riedberg, Goethe University of Frankfurt, Frankfurt, Germany
Emma Saavedra Department of Biochemistry, National Institute for Cardiology
“Ignacio Chávez”, Mexico City, Mexico
Sudipto Saha School of Medicine, Center for Proteomics and Bioinformatics, Case
Western Reserve University, Cleveland, OH, USA
Kazuki Saito Graduate School of Frontier Sciences, University of Tokyo, Chiba,
Tokyo, Japan
Shigeru Saito Infocom Corporation, Tokyo, Japan
Taiichi Sakamoto Chiba Institute of Technology, Narashino, Japan
Kishore R. Sakharkar Infectious Diseases Cluster, Universiti Sains Malaysia,
Pulau Pinang, Malaysia
Meena K. Sakharkar Graduate School of Life and Environmental Science,
University of Tsukuba, Tsukuba, Ibaraki, Japan
Alberto Sanguino Specialty Division for Systems Biotechnology, Technical
University of Munich, Garching, Germany
xl Contributors

Giuliana Di Santo Dipartimento di Elettronica, Informatica e Sistemistica


Università della Calabria, Rende, Italy

Akinori Sarai Department of Bioscience and Bioinformatics, Kyushu Institute of


Technology (KIT), Iizuka, Japan

Kazunori Sasaki Human Metabolome Technologies Inc., Tsuruoka, Yamagata, Japan

Masamitsu Sato Department of Biophysics and Biochemistry, Graduate School of


Science, University of Tokyo, Bunkyo–ku, Tokyo, Japan
PRESTO, Japan Science and Technology Agency, Kawaguchi, Saitama, Japan

Emre Sayan Cancer Research UK Centre, University of Southampton Cancer


Sciences Division, Southampton University Hospital NHS Trust, Southampton, UK

Jutta Schickore Department of History and Philosophy of Science, Indiana Uni-


versity, Bloomington, IN, USA

Tamar Schlick Department of Chemistry and Courant Institute of Mathematical


Sciences, New York University, New York, NY, USA

Oliver Schmidt School of Biomedical Sciences, University of Ulster, Coleraine, UK

Bernhard Schmierer Department of Biochemistry, Oxford Centre for Integrative


Systems Biology (OCISB), University of Oxford, Oxford, UK

Ulf Schmitz Department of Systems Biology and Bioinformatics, Institute of Com-


puter Science, University of Rostock, Rostock, Germany

Christian Sch€onbach Department of Bioscience and Bioinformatics, Kyushu Insti-


tute of Technology, Iizuka, Fukuoka, Japan

Falk Schreiber Leibniz Institute of Plant Genetics and Crop Plant Research (IPK),
OT Gatersleben, Stadt Seeland, Germany
Martin Luther University Halle–Wittenberg, Halle, Germany

Walter Schubert Molecular Pattern Recognition Research (MPRR) Group,


Otto–von–Guericke University Magdeburg, Magdeburg, Germany
Chinese Academy of Science and Max Planck Society (CAS–MPG) Partner Institute
for Computational Biology (PICB), Shanghai, China
Human Toponome Project, ToposNomos Ltd, Munich, Germany

Jean-Marc Schwartz Manchester Institute of Biotechnology, Faculty of Life


Sciences, University of Manchester, Manchester, UK

Masayuki Seki Graduate School of Pharmaceutical Sciences, Tohoku University,


Sendai, Miyagi, Japan

Rituparna Sen G.N. Ramachandran Knowledge Centre for Genome Informatics,


Institute of Genomics and Integrative Biology, Delhi, India

Toshiya Senda Biomedicinal Information Research Centre (BIRC), National Insti-


tute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
Contributors xli

Ab Rauf Shah G.N. Ramachandran Knowledge Centre for Genome Informatics,


Institute of Genomics and Integrative Biology, Delhi, India

Jagesh Shah Department of Systems Biology, Harvard Medical School and Renal
Division, Brigham and Women’s Hospital, Boston, MA, USA

Deepak Sharma Translational Health Science and Technology Institute, Gurgaon,


India

Nobuo Shimamoto Faculty of Life Sciences, Kyoto Sangyo University, Kyoto, Japan

Richard Simon Division of Cancer Treatment and Diagnosis, National Cancer


Institute, Rockville, Maryland, USA

Harpreet Singh Computational Biology Service Unit, Cornell University, Ithaca,


NY, USA
Biomedical Informatics Centre, Indian Council of Medical Research, New Delhi,
India

Adrien Six Immunology-Immunopathology-Immunotherapy, UPMC University


Paris 06, UMR7211, CNRS, UMR7211, INSERM, U959, Paris, France

Jan M. Skotheim Department of Biology, Stanford University School of Medicine,


Stanford, CA, USA

Brian D. Sleeman School of Mathematics, University of Leeds, Leeds, UK

Andrew D. Smith Molecular and Computational Biology Section, Division of


Biological Sciences, University of Southern California, Los Angeles, CA, USA

Jacky L. Snoep Department of Biochemistry, Stellenbosch University, Matieland,


South Africa
Manchester Institute for Biotechnology, University of Manchester, Manchester, UK
Molecular Cell Physiology, Vrije Universiteit Amsterdam, Amsterdam, The
Netherlands

Tatsuhiko Someya University of Tsukuba, Tsukuba, Japan

Pramod R. Somvanshi Department of Chemical Engineering, Indian Institute of


Technology Bombay, Powai, Mumbai, Maharashtra, India

Carlos Sonnenschein Department of Anatomy and Cellular Biology, Tufts Univer-


sity School of Medicine, Boston, MA, USA

Ana M. Soto Department of Anatomy and Cellular Biology, Tufts University


School of Medicine, Boston, MA, USA

Sandro J. de Souza Ludwig Institute for Cancer Research, Sao Paolo, Brazil

Irena Spasić School of Computer Science and Informatics, Cardiff University,


Cardiff, UK

Josef Spidlen Terry Fox Laboratory, BC Cancer Agency, BC Cancer Research


Centre, Vancouver, BC, Canada
xlii Contributors

Rahul Sreekumar Cancer Research UK Centre, University of Southampton Cancer


Sciences Division, Southampton University Hospital NHS Trust, Southampton, UK
Ramachandran Srinivasan G.N. Ramachandran Knowledge Centre for Genome
Informatics, Institute of Genomics and Integrative Biology, Delhi, India
Gesa Staats Centre for Molecular Biosciences, University of Ulster, Coleraine, UK
Larissa Stanberry Bioinformatics and High-throughput Analysis Laboratory,
Seattle Children’s Research Institute, Seattle, WA, USA
Ian Stansfield Institute of Medical Sciences, University of Aberdeen, Aberdeen, UK
J€orn Starruß Centre for High Performance Computing (ZIH), Technical University
Dresden, Dresden, Germany
Karl Staser Departments of Pediatrics and Biochemistry, Indiana University School
of Medicine, Indianapolis, IN, USA
Christiana Staudinger Department of Molecular Systems Biology, University of
Vienna, Vienna, Austria
Joerg Stelling Department of Biosystems Science and Engineering, ETH Zurich,
Basel, Switzerland
Achim Stephan Institute of Cognitive Science, University of Osnabr€uck,
Osnabr€
uck, Germany
Susan Stepney Department of Computer Science, University of York, York, UK
Carsten Sticht Medical Research Center, Medical Faculty Mannheim, University of
Heidelberg, Mannheim, Germany
Beatriz Stransky Universidade Federal do ABC, Santo Andre, Brazil
Suresh Subramani Data Mining and Text Mining Laboratory, Department of
Bioinformatics, Bharathiar University, Coimbatore, India
Chenglei Sun Institute of Systems Biology, Shanghai University, Shanghai,
Shanghai, China
Xiaojuan Sun Zhou Pei-Yuan Center for Applied Mathematics, Tsinghua
University of Beijing, Beijing, China
Myong-Hee Sung Laboratory of Receptor Biology and Gene Expression, National
Cancer Institute, National Institutes of Health, Bethesda, MD, USA
Avadhesha Surolia Molecular Biophysics Unit, Indian Institute of Science,
Bangalore, India
Ákos Sveiczer Department of Applied Biotechnology and Food Science, Budapest
University of Technology and Economics, Budapest, Hungary
Martin Swain Institute of Biological, Environmental, and Rural Sciences,
Aberystwyth University, Aberystwyth, Ceredigion, UK
Hideaki Tagami Graduate School of Natural Sciences, Nagoya City University,
Nagoya, Aichi, Japan
Contributors xliii

Kan Tanaka Chemical Resources Laboratory, Tokyo Institute of Technology,


Yokohama, Kanagawa, Japan
Jijun Tang Department of Computer Science and Engineering, University of South
Carolina, Columbia, SC, USA
Luis Tari Pharma Early Development Informatics, Hoffmann-La Roche Inc.,
Nutley, NJ, USA
Utpal Tatu Department of Biochemistry, Indian Institute of Science, Bangalore,
Karnataka, India
Giorgio Terracina Dipartimento di Matematica, Università della Calabria, Rende, Italy
Madhan Thamilarasan Department of Neurology, Institute of Immunology,
University of Rostock, Rostock, Germany
Denis Thieffry IBENS - UMR ENS - CNRS 8197 - INSERM 1024, Paris,
France
Véronique Thomas-Vaslin Immunology-Immunopathology-Immunotherapy,
UPMC University Paris 06, UMR7211, CNRS, UMR7211, INSERM, U959, Paris,
France
Paul Thompson School of Biomedical Sciences, University of Ulster, Coleraine, UK
Jens Timmer Institute for Physics, University of Freiburg, Freiburg, Germany
BIOSS Centre for Biological Signalling Studies and Freiburg Institute for Advanced
Studies (FRIAS), Freiburg, Germany
Department of Clinical and Experimental Medicine, Link€oping University,
Link€
oping, Sweden
Jared Toettcher University of California, San Francisco, CA, USA
Joo Chuan Tong Data Mining Department, Institute for Infocomm Research,
Singapore, Singapore
Department of Biochemistry, Yong Loo Lin School of Medicine, National University
of Singapore, Singapore
Weida Tong Division of Systems Biology, National Center for Toxicological
Research, US Food and Drug Administration, Jefferson, AR, USA
Néstor V. Torres Department of Biochemistry and Molecular Biology, University
of La Laguna, San Cristóbal de La Laguna, Islas Canarias, Tenerife, Spain
Giuseppe Tradigo Department of Experimental and Clinical Medicine, University
Magna Græcia of Catanzaro, Catanzaro, Italy
Baltasar Trancón y Widemann Ecological Modelling, University of Bayreuth,
Bayreuth, Germany
Guy Tsafnat Centre for Health Informatics, Australian Institute for Health
Innovation, University of New South Wales, Sydney, NSW, Australia
Krasimira Tsaneva-Atanasova Department of Engineering Mathematics,
University of Bristol, Bristol, UK
xliv Contributors

Nam-Kiu Tsing Advanced Modeling and Applied Computing Laboratory,


Department of Mathematics, University of Hong Kong, Hong Kong, China
Jarno Tuimala Finnish Red Cross Blood Service, Helsinki, Finland
Tamás Turányi Institute of Chemistry, E€
otv€
os University (ELTE), Budapest, Hungary
John J. Tyson Department of Biological Sciences, Virginia Polytechnic Institute
and State University, Blacksburg, VA, USA
Darren Tyson Vanderbilt University, Nashville, TN, USA
Jon Umerez IAS-Research Philosophy of Biology Group, Department of Logic and
Philosophy of Science, University of the Basque Country (UPV/EHU), Donostia –
San Sebastian, Spain
Leoš Shivaya Valášek Laboratory of Regulation of Gene Expression, Institute of
Microbiology AVCR, Prague, Czech Republic
Shireen Vali Cell Works Group Inc., Bangalore, India
Justin N. Vaughn Department of Biochemistry, Cellular, and Molecular Biology,
University of Tennessee, Knoxville, USA
Marquis P. Vawter Functional Genomics Laboratory, Department of Psychiatry
and Human Behavior, University of California, Irvine, CA, USA
Fransisco Vázquez Department of Computer Architecture and Electronics, Univer-
sity of Almerı́a, Almerı́a, Spain
Kann Vearasilp Department of Systems Biology and Bioinformatics, Institute of
Computer Science, University of Rostock, Rostock, Germany
Nubia Velasco Department of Industrial Engineering, Universidad de los Andes,
Bogotá, Colombia
Pierangelo Veltri Department of Medical and Surgical Sciences, University Magna
Græcia of Catanzaro, Catanzaro, Italy
Kareenhalli V. Venkatesh Department of Chemical Engineering, Indian Institute
of Technology Bombay, Powai, Mumbai, Maharashtra, India
Celine Vens Department of Computer Science, Katholieke Universiteit Leuven,
Leuven, Belgium
Julio Vera Department of Systems Biology and Bioinformatics, Institute of
Computer Science, University of Rostock, Rostock, Germany
Rajni Verma G.N. Ramachandran Knowledge Centre for Genome Informatics,
Institute of Genomics and Integrative Biology, Delhi, India
Karin Verspoor Victoria Research Laboratory, National ICT Australia, University
of Melbourne, Melbourne, VIC, Australia
Center for Computational Pharmacology, University of Colorado, Aurora, CO, USA
Basilio Vescio Department of Clinical and Experimental Medicine, Magne Graecia
University of Catanzaro, Catanzaro, Italy
Contributors xlv

Kalyan C. Vinnakota Computational Bioengineering Group, Biotechnology and


Bioengineering Center, Medical College of Wisconsin, Milwaukee, WI, USA
P. K. Vinod Department of Biochemistry, Oxford Centre for Integrative Systems
Biology, University of Oxford, Oxford, UK
Rosella Visintin IEO, European Institute of Oncology, Milan, Italy
Olga Vitek Department of Statistics, Department of Computer Science, Purdue
University, West Lafayette, IN, USA
Juan Antonio Vizcaı́no EMBL Outstation, European Bioinformatics Institute
(EBI), European Molecular Biology Laboratory Wellcome Trust Genome Campus,
Hinxton, Cambridge, UK
Dat T. Vo Department of Cellular and Structural Biology, Greehey Children’s Cancer
Research Institute, University of Texas Health Science Center, San Antonio, TX, USA
Eberhard O. Voit The Wallace H. Coulter Department of Biomedical Engineering,
Georgia Institute of Technology and Emory University, Atlanta, GA, USA
Albrecht G. von Arnim Department of Biochemistry, Cellular, and Molecular
Biology and Graduate Program in Genome Science and Technology, University of
Tennessee, Knoxville, USA
Anja Voss-B€ ohme Center for Information Services and High Performance Com-
puting (ZIH), Technical University Dresden, Dresden, Germany
Christine Waddington MOAC/University of Warwick, Coventry, UK
Andreas Wagner Institute of Evolutionary Biology and Environmental Studies,
University of Zurich, Zurich, Switzerland
Steffen Waldherr Institute for Systems Theory and Automatic Control, University
of Stuttgart, Stuttgart, Germany
Feng-Sheng Wang Department of Chemical Engineering, National Chung Cheng
University, Chiayi, Taiwan
Haiying Wang School of Computing and Mathematics, Computer Science
Research Institute, University of Ulster, Jordanstown, UK
Jiguang Wang Beijing Institute of Genomics, Chinese Academy of Sciences,
Beijing, Beijing, China
Lin Wang School of Computer Science and Information Engineering, Tianjin
University of Science and Technology, Tianjin, China
Rui-Sheng Wang Department of Physics, Pennsylvania State University, Univer-
sity Park, PA, USA
Ruiqi Wang Institute of Systems Biology, Shanghai University, Shanghai, China
Yong Wang Academy of Mathematics and Systems Sciences, Chinese Academy of
Sciences, Beijing, China
Zhihui Wang Harvard-MIT (HST) Athinoula A. Martinos Center for Biomedical
Imaging, Massachusetts General Hospital, Charlestown, MA, USA
xlvi Contributors

Lin Wanglin School of Computer Science and Information Engineering, Tianjin


University of Science and Technology, Tianjin, China
Eric Werner Department of Physiology, Anatomy and Genetics, University of
Oxford, Oxford, UK
Department of Computer Science, University of Oxford, Oxford, UK
Oxford Advanced Research Foundation, Fort Myers, FL, USA
Stefanie Wienkoop Department of Molecular Systems Biology, University of
Vienna, Vienna, Austria
Peter Wilkinson Vaccine and Gene Therapy Institute, Port St. Lucie, FL, USA
Jana Wolf Mathematical Modelling of Cellular Processes, MDC Max Delbr€uck
Center for Molecular Medicine, Berlin, Germany
Katy Wolstencroft School of Computer Science, University of Manchester,
Manchester, UK
Arno Wouters Department of Philosophy, Erasmus University Rotterdam,
Rotterdam, The Netherlands
Xiaohua Wu Department of Pediatrics, Herman B Wells Center for Pediatric
Research, Indiana University School of Medicine, Indianapolis, IN, USA
Kejia Xu Institute of Systems Biology, Shanghai University, Shanghai, China
Yuki Yamaguchi Tokyo Institute of Technology, Graduate School of Bioscience
and Biotechnology, Yokohama, Japan
Feng-Chun Yang Departments of Pediatrics, Herman B Wells Center for Pediatric
Research, School of Medicine, Indiana University, Indianapolis, IN, USA
Guang Yao Department of Molecular and Cellular Biology, University of Arizona,
Tucson, AZ, USA
Weiwei Yin Department of Biomedical Engineering, Georgia Institute of Technol-
ogy and Emory University, Atlanta, GA, USA
Elizabeth Yohannes Center for Proteomics and Bioinformatics, School of
Medicine, Case Western Reserve University, Cleveland, OH, USA
Lingchong You Biomedical Engineering Department, Duke University, Durham, NC,
USA
Tao You Computational Biology, Innovative Medicines, AstraZeneca,
Macclesfield, Cheshire, UK
Tommy Yu Auckland Bioengineering Institute, University of Auckland, Auckland,
New Zealand
Choamun Yun Department of Chemical and Biomolecular Engineering, Korea
Advanced Institute of Science and Technology (KAIST), Daejeon, Korea
Hongseok Yun Department of Chemical and Biomolecular Engineering, Korea
Advanced Institute of Science and Technology (KAIST), Daejeon, Korea
Contributors xlvii

Judit Zámborszky The Microsoft Research - University of Trento Centre for


Computational and Systems Biology, CoSBi, Trento, Italy
An-Ping Zeng Institute of Bioprocess and Biosystems, Technical University
Hamburg-Harburg, Hamburg, Germany
Yi Zeng Department of Pediatrics, University of Arizona, Tucson, AZ, USA
Marie Lisandra Zepeda-Mendoza Center for Genomics Sciences-UNAM,
Universidad Nacional Autónoma de México, Cuernavaca, Morelos, Mexico
Uwe Klaus Zettl Department of Neurology, Institute of Immunology, University of
Rostock, Rostock, Germany
Junhua Zhang Academy of Mathematics and Systems Science, Chinese Academy
of Sciences, Beijing, China
Key Laboratory of Random Complex Structures and Data Science, Chinese Academy
of Sciences, Beijing, China
National Center for Mathematics and Interdisciplinary Sciences, Chinese Academy
of Sciences, Beijing, China
Minlu Zhang Department of Computer Science, University of Cincinnati,
Cincinnati, OH, USA
Shihua Zhang National Center for Mathematics and Interdisciplinary Sciences,
Academy of Mathematics and Systems Science, Chinese Academy of Sciences,
Beijing, China
Shu-Qin Zhang School of Mathematical Sciences, Fudan University, Shanghai,
China
Xiujun Zhang Institute of System Biology, Shanghai University, Shanghai, China
Yan Zhang Key Laboratory of Systems Biology, Shanghai Institutes for Biological
Sciences, Chinese Academy of Sciences, Shanghai, China
Zhong-Yuan Zhang School of Statistics, Central University of Finance and Eco-
nomics, Beijing, China
Shan Zhao Department of Pharmacology and Systems Therapeutics, Mount Sinai
School of Medicine, New York, NY, USA
Xing-Ming Zhao Institute of System Biology, Shanghai University, Shanghai,
China
Yingdong Zhao Division of Cancer Treatment and Diagnosis, National Cancer
Institute, Rockville, Maryland, USA
Huiru Zheng School of Computing and Mathematics, Computer Science Research
Institute, University of Ulster, Jordanstown, UK
Tianshou Zhou School of Mathematics and Computational Sciences, Sun Yet-Sen
University, Guangzhou, Guangdong, China
A

5-azaCytosine References

Vani Brahmachari and Shruti Jain Kaminskas E, Farrell AT, Wang YC, Sridhara R, Pazdur R
(2005) FDA drug approval summary: azacitidine
Dr. B. R. Ambedkar Center for Biomedical Research,
(5-azacytidine, Vidaza) for injectable suspension. Oncolo-
University of Delhi, Delhi, India gist 10(3):176–182

Synonyms

Azacitidine 7-Aminoactinomycin D

Definition ▶ Lymphocyte Labeling, Cell Division Investigation

It is the non-methylable analogue of cytosine base. It is


not a naturally occurring base but can be incorporated
into DNA during replication and into RNA during
transcription by growing the cells in media containing ABCB1
5-azacytidine as a drug or incorporated during
in vitro reaction. It is also known to inhibit DNA ▶ ATP-binding Cassette B1 Transporter
methyltransferases (DNMTs) thereby, causing lack of
methylation in DNA sequence, affecting the interac-
tion between regulatory protein and the nucleic acid
target. The inhibition of methylation occurs through
the formation of stable complexes between the mole- Abduction
cule and DNMTs, thereby saturating cell methylation
machinery. It is known to target the CpG islands in the Angelika Kimmig
human genome, especially in the promoter regions of Departement Computerwetenschappen, Katholieke
genes susceptible to aberrant hypermethylation. Such Universiteit Leuven, Heverlee, Belgium
analogues can be used for modulation of DNA meth-
ylation (Kaminskas et al. 2005).
Definition
Cross-References
Abduction is the task of finding an explanation for an
▶ Epigenetics, Drug Discovery observation with respect to the given knowledge.

W. Dubitzky et al. (eds.), Encyclopedia of Systems Biology, DOI 10.1007/978-1-4419-9863-7,


# Springer Science+Business Media LLC 2013
A 2 Abortive Transcription

Characteristics References

The key principle underlying abduction is that Flach PA, Kakas AC (2000) Abduction and induction – essays
on their relation and integration. Kluwer, Dordrecht
of hypothesizing a set of facts that, together with the
Kakas AC, Kowalski RA, Toni F (1992) Abductive logic
available knowledge, explain a specific observation. For programming. J Log Comput 2(6):719–770
instance, given background knowledge about flying and Poole D (1993) Probabilistic Horn abduction and Bayesian
non-flying objects, including a rule stating that normal networks. Artif Intell 64:81–129
birds fly (8x:birdðxÞ ^ normalðxÞ ! fliesðxÞ), the
observation that Tweety flies (flies(tweety)) can be
explained by abducing that Tweety is a normal bird Abortive Transcription
({bird(tweety), normal(tweety)}).
More formally, given background knowledge B, Nobuo Shimamoto
a set A of abducibles (i.e., facts that can be part of Faculty of Life Sciences, Kyoto Sangyo University,
explanations), and a ground fact or observation o, the Kyoto, Japan
task of abduction is to find an explanation for o, that is,
a set of facts EA such that o can be inferred from B∪E
using ▶ deduction. Definition
Similarly to ▶ induction, abduction can be seen as
a form of inverted ▶ deduction. However, whereas The iterative synthesis of short oligo RNA of typically
induction finds general rules from several examples, 2–15 bases long by a promoter complex.
abduction assumes these rules to be part of the
background knowledge and instead finds ground facts Cross-References
explaining a single example (Flach and Kakas 2000).
The principle of abduction goes back to the ▶ Transcription in Bacteria
philosopher and logician Charles Sanders Peirce, who
viewed it as the part of scientific methodology seeking
explanations that turn surprising observations into Absolute Protein Quantification
plausible consequences of an existing theory.
In the context of Bayesian networks, finding the ▶ Selective Reaction Monitoring
most probable explanation (MPE), that is, the most
likely values of all non-observed variables given
observed values for a subset of variables, is some- Absorption Spectroscopy
times also considered abductive inference. A similar
idea, albeit in the context of a first order logic lan- Xiaohua Wu
guage and with a clear distinction between abducibles Department of Pediatrics, Herman B Wells Center for
and the rest of the theory, is used in probabilistic Pediatric Research, Indiana University School of
horn abduction (PHA) (Poole 1993). PHA associates Medicine, Indianapolis, IN, USA
probabilities to abducibles, thus allowing one to
choose explanations based on their probabilities. In
abductive logic programming (Kakas et al. 1992), Definition
integrity constraints are used to restrict the set of
possible explanations. A technique in which the power of a beam of light
measured before and after interaction with a sample is
compared.
Cross-References
Cross-References
▶ Deduction
▶ Induction ▶ Spectroscopy and Spectromicroscopy
Abstraction 3 A
(Giunchiglia et al. 1997) (▶ Knowledge Representa-
Abstraction tion) with as scope to simplify a logical theory of
a domain of interest or its visualization, using heuris-
A
C. Maria Keet tics, syntactic, or semantic abstraction, whereas
KRDB Research Centre, Free University of Bozen- the first approach to abstraction is also used to make
Bolzano, Bolzano, Italy distinctions between the implementation layer, logical
design layer, and conceptual modeling layer.

Synonyms
Characteristics
Aggregation; Generalization; Grouping
Abstraction Mechanisms
There are at least three principle mechanisms of
Definition abstraction in the second sense, which are graphically
depicted in Fig. 1, and each mechanism can have
Abstraction is used in two principal distinct senses: variations. For instance, for R-ABS, one can take the
1. The process to go from instances to a universal or abstraction from the constituent parts (▶ Mereology)
concept and, in turn, its meta-level or members to the whole (e.g., the individual sheep
2. The process to go from a detailed to simplified into a flock, enzymatic reaction into a metabolic path-
representation where one chooses to ignore certain way process) and for F-ABS, one “folds” (e.g., cus-
aspects, thereby reducing the scope toward salient tomer, payment, and book onto a “BookOrder” entity
semantics. or the steps in the MAPK cascade).
For systems biology, both senses of abstraction Implicitly, abstraction forces into existence a notion
are used, where the second sense comes into play of levels or layers, and successive abstractions then
only if the former becomes too wieldy at either the generate an abstraction hierarchy, which is a close
instance or the type level. Software support is often relative of levels of ▶ granularity and relates to the
required in order to deal with abstractions in systems notion of ▶ modularity and ▶ modularization of
biology, therefore much effort has gone into research a system. Not all approaches toward, and usages of,
of abstraction. The second sense enjoys attention abstraction consider levels and hierarchies explicitly
in particular in conceptual modeling for database and reify them into entities themselves that can be
systems (Talheim 2009) and artificial intelligence manipulated. For those that do, there are different

y y y

x
x x

abs(x) = y

Abstraction, Fig. 1 Three conceptually distinct types of remodeled as a function, F-ABS folding multiple entities and
abstraction operations; x and y are entities at a finer and coarser relations into a different type of entity, D-ABS deleting semanti-
level of abstraction (indicated with an oval), and x is abstracted cally less relevant entities and relations (Source: based on Keet
into y using an abstraction function abs. R-ABS the relation is (2007))
A 4 AC#

ways to formalize abstraction and its supporting levels


and hierarchies. In the first sense of abstraction, it is Activation Domain Shielding
common for the contents at each (explicit or implicit)
abstraction level to have its own language and vocab- Tetsuro Kokubo
ulary to represent the data, information, or knowledge Department of Supramolecular Biology, Graduate
at that level. In the second sense of abstraction, the School of Nanobioscience, Yokohama City
proposed formalizations aim to solve this within one University, Yokohama, Kanagawa, Japan
logic language.

Limitations Synonyms
To date, there is no overwhelming evidence that the
solutions proposed by computing are implementable Shielding of activation domain
either in information systems for systems biology
or modeling for systems biology. Where it already
can be useful is to provide guidelines to create Definition
abstraction levels in a consistent way, that is, not ad
hoc but with guiding principles and abstraction In budding yeast, several GAL genes including
procedures. GAL1, GAL10, and GAL7 are activated by Gal4p,
which is a transcriptional activator for the GAL
regulon. Expression of the GAL genes allows the cell
Cross-References to utilize the sugar galactose as a carbon source. In
the absence of galactose, Gal4p is bound to DNA but
▶ Granularity is inactive because its activation domain (AD) is
masked by Gal80p (Fig. 1a) (Wong and Struhl 2011).
Once galactose is added to the media, the signal
References transduction protein Gal3p binds the sugar along with
ATP to allow activation of Gal4p. Specifically, the
Giunchiglia F, Villafiorita A, Walsh T (1997) Theories of galactose- and ATP-bound form of Gal3p can
abstraction. AI Commun 10(3–4):167–176
form a stable complex with Gal80p, thereby titrating
Keet CM (2007) Enhancing comprehension of ontologies and
conceptual models through abstractions. In: Basili R, out the inhibitory effect of Gal80p on Gal4p
Pazienza MT (eds) 10th congress of the Italian association (Lavy et al. 2012). It is not known whether Gal80p
for artificial intelligence (AI*IA 2007), Springer Lecture is released from the transcription complex upon
Notes in Artificial Intelligence 4733, pp 814–822
activation or instead remains in complex with
Talheim B (2009) Abstraction. In: Liu L, Tamer Ozsu M (eds)
Encyclopedia of database system. Springer, New York Gal3p, possibly in a different position or orientation.
Regardless, upon binding of Gal3p and Gal80p, the
AD of Gal4p becomes able to recruit a range of
transcriptional coactivators. Such shielding of ADs
may represent the simplest mode of action by which
AC# transcriptional corepressors like Gal80p negatively
regulate gene expression.
▶ Database Accession Number The Tup1p-Cyc8p complex of budding yeast is
a well-known transcriptional corepressor that represses
transcription of several hundreds of genes. This com-
plex is widely conserved among eukaryotes and is
Activating Complexes therefore likely to be crucial for proper regulation of
gene expression. Previously, it was believed that
▶ Trithorax Complexes Tup1p-Cyc8p represses transcription by several
Activation Domain Shielding 5 A
Activation Domain
Shielding,
Fig. 1 Transcriptional
repression by shielding the A
activation domains (ADs) of
transcription activators.
(a) Regulation of Gal4p by
Gal80p. Under repressing
conditions, the AD of Gal4p is
shielded by Gal80p. In
derepressing conditions,
Gal80p dissociates from
Gal4p, exposing the AD of
Gal4p for recruitment of
various coactivators.
(b) Regulation of some
activators by Tup1p-Cyc8p.
Under repressing conditions,
Tup1p-Cyc8p prevents
activation by shielding the
ADs of these activators. Upon
derepression, these activators
may undergo specific
conformational changes that
affect their interactions with
Tup1p-Cyc8p, such that the
ADs can now recruit various
coactivators. In contrast to
Gal80p, Tup1p-Cyc8p
remains at the promoter,
where it contributes somehow
to the activation process.
When repression needs to be
reestablished, Tup1p-Cyc8p
may help to remove these
coactivators from the
promoter

distinct mechanisms that function redundantly, e.g., by Tup1p-depleted cells. Therefore, at least some tran-
recruiting histone deacetylases (HDACs), positioning scriptional repressors can be converted to transcrip-
nucleosomes at inhibitory sites, or inhibiting Mediator tional activators by exchange of their corepressor
function at promoters. Recently, however, it has been accessory factors for coactivating factors. However, it
proposed that the major function of Tup1p-Cyc8p is to remains possible that some other transcriptional repres-
shield ADs from interacting with coactivators such as sors may repress transcription more actively, e.g., in the
SWI/SNF, SAGA, and Mediator (Fig. 1b) (Wong and manner previously proposed for Tup1p-Cyc8p.
Struhl 2011). In this regard, Tup1p-Cyc8p and Gal80p
are functionally similar, although the former is
not specific for the GAL regulon and functions more Cross-References
widely than the latter. Consistent with this newer model,
the binding sites of Tup1p-Cyc1p are located very close ▶ Mechanisms of Transcriptional Activation and
to those of coactivators, as revealed by studies of Repression
A 6 Activator

References Definition

Lavy T, Kumar PR, He H, Joshua-Tor L (2012) The Gal3p Given a set of unlabeled training examples, an active
transducer of the GAL regulon interacts with the Gal80p
learning algorithm is allowed to select a number of
repressor in its ligand-induced closed conformation. Genes
Dev 26(3):294–303 training examples (one by one or in batches) and to
Wong KH, Struhl K (2011) The Cyc8-Tup1 complex inhibits obtain their target value at a certain cost per example.
transcription primarily by masking the activation domain of Next, the learning algorithm should predict the target
the recruiting protein. Genes Dev 25(23):2525–2539
value of a number of unseen test examples, incurring
a certain cost per mistake. The goal of the active
learning algorithm is to minimize the total cost.
Activator

Jianhua Ruan Characteristics


Department of Computer Science, University of Texas
at San Antonio, San Antonio, TX, USA Obtaining target values of training examples may be
expensive. For example, in experimental research
requiring microarrays, bioassays, patients, clinical tri-
Definition als, mass-spectrography, crystallography, or other
physical ▶ experiments in order to collect data, the
A DNA-binding protein that stimulates transcription of amount of available resources limits the amount of
one or more genes. Most activators recruit RNA poly- experimental data which can be obtained.
merase to promoter region by interacting directly with In an active learning setting, the learning algorithm
a subunit of DNA polymerase and serving as a liaison must take this economic reality into account. Every
between RNA polymerase and DNA, or change DNA target value has a cost, and once a model is learned
conformation to promote the transition to the open every prediction mistake also has a cost. The goal is to
complex required for initiation of transcription. be as economical as possible.

Problem Settings
Active Labeling During Cell Division There are several variations of the active learning
problem. First, active learning can be seen as learning
▶ Quantifying Lymphocyte Division, Methods by asking queries to some oracle (the teacher, experi-
mental setup, etc.) (Angluin 1988). Different query
types have been defined, among which the most impor-
Active Labeling Incorporation tant ones are:
• The membership query, which asks whether a par-
▶ Lymphocyte Labeling, Cell Division Investigation ticular example belongs to a particular class or not
• The equivalence query, which asks whether
a particular learned model is equivalent to the con-
Active Learning cept to be learned and asks for a counterexample if
the learned concept is not correct
Jan Ramon The equivalence query is very powerful and there-
Declarative Languages and Artificial Intelligence fore not very useful in practical applications. In partic-
Group, Katholieke Universiteit Leuven, Leuven, ular, it is usually not possible to verify whether a theory
Belgium is perfect (due to the absence of perfect experimenta-
tion equipment, resources or sufficiently knowledge-
able experts), and for many datasets it is even unlikely
Synonyms that a perfect theory can be derived. The membership
query lets the oracle return the target value of a given
Experimental design; Learning by queries example, and is more usual in most experimental
Active Learning 7 A
research. In the case of noisy experiments, the mem- the active learning algorithm is used directly by an
bership query may only give an approximative value experimental setup. One successful application was in
and it may be useful to ask the same query again to get the field of functional genomics (King et al. 2004).
A
a more precise measurement. Even so, the value of active learning strategies has
Second, variation is possible in the interaction pro- been demonstrated in several application areas, either
cedure between the learner and the oracle. Most com- on benchmark data or in comparative experiments with
mon is the situation where the learner asks queries one standard experimental designs. These domains include
by one and the oracle provides the answers immedi- drug discovery (De Grave et al. 2008), nanofiltration
ately. In practice, this is not always realistic. State-of- (Cano-Odena et al. 2010), sensor placement (Guestrin
the-art high-performance experimental setups et al. 2005), and robot gait optimization (Lizotte et al.
(microarrays, bioassays, . . .) process several experi- 2007).
ments simultaneously in batches or in pipelines. It is
therefore more efficient in practice to ask a next query Methods
before the first one has been answered. Several methods for active learning have been
Third, different prediction objectives are possible. described in the literature, for example, (Angluin
In the normal setting of active learning, the algorithm 1988; Tong and Koller 2001a; Lizotte et al. 2007;
is scored on prediction performance on a test set drawn Guestrin et al. 2005). Here, we mainly summarize
from the same distribution as the training set (Kearns a few general principles:
and Vazzirani 1994). However, one is not always most • When learning a predictive model, a common strat-
interested in learning a model of all aspects of the egy is to select instances for which the prediction
instance space. In some cases, one wants to find the using the current model is most uncertain. This is
best instance in the instance space according to some a particularly useful strategy for predictive models
criterion. Accordingly, one is interested in a model that which also report confidence, such as Gaussian
is especially accurate for the instances that are good processes (Guestrin et al. 2005).
while an accurate prediction of the value of bad • Alternatively, for version space–based algorithms,
instances is not very important. For example, in drug a good strategy is to try as well as possible to halve
discovery (De Grave et al. 2008), one wants to find the the version space with each query. For example,
“best” (most active, least toxic, . . .) molecule in when the parameters of a model are known to be in
a library of molecules. some region, it is good to choose instances to query
Finally, one can distinguish between membership such that this region is maximally reduced by the
query–based approaches where the learning algorithm answer. Tong and Koller (2001b) discuss the appli-
should provide the instances for which the class should cation of such strategies for support vector machines.
be obtained, and the membership query–based • When choosing several instances to query, a trade-
approaches where the algorithm can choose from off must be made between informativeness and
a predefined pool of instances. The latter is more real- diversity of the instances. In particular, choosing
istic, as in the first case it may happen that the learner several informative but very similar instances may
asks to perform a practically hard or impossible to not give a maximal total amount of new informa-
realize experiment. For example, for drug discovery tion. For example, in drug discovery one often
processes, large libraries of purchasable compounds investigates molecules similar to molecules known
are available. Other molecules can be tested too, but to have some activity in the hope to find a better
having to synthesize such molecules oneself drasti- variant, but one also performs a wider screening in
cally increases the cost of the experiment. the hope to find new regions in molecular space
where active compounds can be found.
Applications • Another idea arises from ▶ ensemble learning.
Active learning can be applied in a wide range of When several different models are learned on the
different applications. same data, they may agree on a part of the test
In principle, it is most useful in experimental instances and disagree on another part of the test
research. Only few systems have been implemented in instances. In such a case, Liere and Tadepalli (1997)
a completely automated closed loop where the output of argues that it is good to query the target values of
A 8 Actomyosin Ring

the instances on which the committee of classifiers Tong S, Koller D (2001b) Support vector machine active learn-
learned so far disagrees most. ing with applications to text classification. J Mach Learn Res
2:45–66
• When attempting to find the best instances in a set,
one will focus on the best instances according to the
quality criterion. A typical strategy among several
alternatives (De Grave et al. 2008) is to select Actomyosin Ring
instances optimistically, for example, by adding to
the current prediction a few standard deviations on Sebastian Mana-Capelli and Dannel McCollum
that prediction. This makes a trade-off between Department of Molecular Genetics and Microbiology,
selecting instances which will probably have University of Massachusetts, Worcester, MA, USA
a high quality and selecting instances in poorly
explored regions of the instance space.
Synonyms

Cross-References Contractile actomyosin ring (CAR); Contractile ring

▶ Ensemble
▶ Experiment Definition

The actomyosin ring is the most archetypical structure


of cytokinesis. The ring contains actin filaments cross-
References
linked by the motor protein type II MYOSIN as well
Angluin D (1988) Queries and concept-learning. Mach Learn
as numerous other actin cytoskeletal components.
2:319–342 Constriction of the actomyosin ring results in furrow
Cano-Odena A, Spilliers M, Dedroog T, De Grave K, Ramon J, ingression and cytokinesis. The precise mechanisms
Vankelecom IFJ (2010) Micropollutant removal via genetic that connect the contractile ring to the cell membrane
algorithms and high throughput experimentation. J Membr
Sci 366(12):25–32
are still not clear (Eggert et al. 2006). The ingression
De Grave K, Ramon J, De Raedt L (2008) Active learning force generated by this structure has been traditionally
for high throughput screening. In: Proceedings of the explained through the “purse and string” model
eleventh international conference on discovery science, (Schroeder 1972) whereby bipolar myosin filaments
Budapest. Lecture Notes in Computer Science, vol 5255,
pp 185–196
walk along actin filaments to bring about constriction
Guestrin C, Krause A, Singh AP (2005) Near-optimal of the ring in a manner similar to muscle constriction.
sensor placement in Gaussian processes. In: Proceedings of As constriction proceeds, acting filaments are
the 22nd international conference on machine learning, disassembled and as a result the thickness of the con-
Bonn, pp 265–272
Kearns M, Vazzirani U (1994) An introduction to computational
tractile ring remains approximately constant.
learning theory. MIT Press, Cambridge, MA
King RD, Whelan KE, Jones FM, Reiser PG, Bryant CH,
Muggleton SH, Kell DB, Oliver SG (2004) Functional geno- Cross-References
mic hypothesis generation and experimentation by a robot
scientist. Nature 427:247–252
Liere R, Tadepalli P (1997) Active learning with committees for ▶ Cytokinesis
text categorization. In: Proceedings of the 14th conference
of the American association for artificial intelligence
(AAAI-97), Providence, pp 591–596
Lizotte D, Wang T, Bowling M, Schuurmans D (2007) Auto- References
matic gait optimization with Gaussian process regression.
In: Proceedings of the 20th international joint conference Eggert US, Mitchinson TJ, Field CM (2006) Animal cytokinesis:
on artificial intelligence, Hyderabad, pp 944–949 from parts list to mechanisms. Annu Rev Biochem
Tong S, Koller D (2001) Active learning for structure in 75:543–566
Bayesian networks. In: Proceedings of the seventeenth Inter- Schroeder TE (1972) The contractile ring: II. Determining its
national joint conference on artificial intelligence (IJCAI), brief existence, volumetric changes, and vital role in cleaving
Seattle. Morgan Kaufman, Washington, pp 863–869 Arbacia eggs. J Cell Biol 53:419–434
Adaptation 9 A
of naturalists in front of the adaptation of species
Acute Upper Respiratory Tract Infections to their milieu has been explained by Darwin’s
hypothesis of natural selection as the main
A
▶ Viral Respiratory Tract Infections process accounting for organismal traits. Natural
selection explains the presence of a trait, but this
explanation is not complete if one does not know an
“adaptation for what” the trait is, i.e., to which selec-
Acyclic Digraph tive pressures it owes its existence (Brandon 1990).
Notably, a trait can be shown to be an adaptation, i.e.,
▶ Directed Acyclic Graph shaped by natural selection, whereas we do not know
an adaptation for what (e.g., Kreitmann tests on
genome sequences may show they result from selec-
tion but give no idea of the reasons for which they
Adaptation have been selected).
Yet in the context of the explanation of mainte-
Philippe Huneman nance of traits, some behavioral ecologists defined
Institut d’Histoire et de Philosophie (IHPST), des an adaptation as the fittest phenotypic variant, not-
Sciences et des Techniques, Université Paris 1 withstanding its evolutionary origin (Reeve and
Panthéon-Sorbonne, Paris, France Sherman 1993), especially because many evolution-
ary explanations do not take the historical context into
account.
Definition Concerning the link between adaptations and adapt-
edness, one could argue that the more adaptations an
“Adaptation” may name a process or a state, can organism has, the more adapted it is; in this sense,
be used in physiological or evolutionary contexts, evolutionary biologist Julian Huxley called organisms
and concerns organisms or traits. An individual “bundles of adaptations” (though this not exactly being
organism has abilities to physiologically adapt to its his own view), yet this position can be criticized as too
environment, for example, by changing the values much adaptationist. The set of adaptations characterizing
of some parameters of its metabolism (pulse, body organisms constitutes what is often called its “design.”
temperature, etc.). Its being adapted is the result Given that natural selection somehow increases inclu-
of such physiological process. In evolutionary sive fitness, it is plausible to say that organisms are
biology, it may be useful to talk of adaptedness designed to maximize their inclusive fitness.
of organisms, namely, their being adjusted to their However, two adaptations for distinct environmen-
environments, and of traits themselves as adapta- tal demands can be conflicting, and therefore they do
tions. Adaptedness is always relative to an environ- not increase the fitness of organisms in an additive
ment. Traits as adaptations, fitness and adaptedness of manner. For example, risky behavior in some species
organisms are basically related by natural selection; are shown to have the function of attracting
namely, the process by which the more adapted female mates (e.g., through the “handicap principle,”
organisms, having higher chances of survival and see ▶ Sexual Selection), yet it conflicts with many
reproduction (i.e., higher fitness), on the average adaptations which increase the survival chances of
leave more offspring and hence increase the fre- the organisms (fear reactions, etc.)
quency of their heritable traits in the next generations With a given genotype, some individuals may
and finally lead to the fixation of those traits, namely, display a range of various phenotypes adapted to their
the adaptations. environment – this is called “phenotypic plasticity.”
In neo-Darwinian biology, a trait is an adaptation For instance, some species of fish will turn into either
when it results from natural selection (Burian 2005). males or females depending on the temperature of
To this extent, adaptation is defined by natural water. It is useful to distinguish this ▶ plasticity from
selection; calling a trait an adaptation is therefore the evolutionary notion of adaptation, which refers to
a historical statement. The longstanding amazement the genotypic underpinning of traits.
A 10 Adaptationism

Cross-References empirical claim that organic nature is perfectly adapted


or that natural selection always leads to optimization.
▶ Cell Cycle Checkpoints
▶ Explanation, Evolutionary
Cross-References

References ▶ Explanation, Evolutionary

Brandon R (1990) Adaptation and environment. MIT Press,


Cambridge
References
Burian R (2005 [1983]) Adaptation. In: Burian R (ed) The
epistemology of development, evolution, and genetics. Cam- Godfrey-Smith P (2001) Three kinds of adaptationism. In: Orzack
bridge University Press, Cambridge, pp 54–80 SH, Sober E (eds) Adaptationism and optimality. Cambridge
Reeve HK, Sherman P (1993) Adaptation and the goals of University Press, Cambridge, pp 335–357
evolutionary research. Quart Rev Biol 68:1–32 Gould SJ, Lewontin R (1978) The spandrels of san marco
and the panglossian paradigm: a critique of the adaptationist
programme. Proc R Soc Lond B 205:581–598
Maynard-Smith J (1984) Optimization theory in evolution. Ann
Rev Ecol Syst 9:31–56

Adaptationism

Philippe Huneman Adaptive Immune Response Cascade


Institut d’Histoire et de Philosophie (IHPST), des
Sciences et des Techniques, Université Paris 1 ▶ Adaptive Immune System
Panthéon-Sorbonne, Paris, France

Adaptive Immune System


Definition
Shoba Ranganathan
A position which assumes that each trait of an organ- Department of Chemistry and Biomolecular
ism is the result of an optimization process, driven by Sciences and ARC Center of Excellence in
natural selection, and then is somehow optimal. Steven Bioinformatics, Macquarie University, Sydney,
Jay Gould and Richard Lewontin (1978) famously NSW, Australia
forged the term and criticized it because it overlooks Department of Biochemistry, Yong Loo Lin School of
the fact that organisms display a cohesive unity and Medicine, National University of Singapore,
because adaptationist explanations are often not Singapore
falsifiable. However, Godfrey-Smith (2001) usefully
distinguished three brands of adaptationism:
adaptationism can be either methodological Synonyms
(Maynard-Smith 1984) and then unlikely to be right
or wrong; or an empirical claim, whose testing is not Adaptive immune response cascade; Adaptive
yet uncontroversial; or finally explanatory, namely, immunity
expresses the belief that the most “interesting traits”
(e.g., complex adaptations) are due to selection (yet the
notion of “interesting” is irreducibly subjective). Definition
Methodological adaptationism emphasizes that opti-
mality models allow one to discover constraints – The adaptive immune system is a collective term
e.g., developmental or historical constraints – which given to a group of highly specialized, systematic cells
prevent real organisms to reach the optima predicted and processes that prevent vertebrates from certain
by the models; therefore, it does not underlie an death by pathogenic infections (Alberts et al. 2002).
Adjuvants 11 A
Cross-References
Adhesion Molecule
▶ Major Histocompatibility Complex (MHC)
A
▶ TR Recognition of MHC-Peptide Complexes ▶ Adhesin

References Adipocytes
Alberts B, Johnson A, Lewis J, Raff M, Roberts K, Walter P Steven D. Rhodes
(2002) The adaptive immune system. In: Molecular
School of Medicine, Indiana University, Indianapolis,
biology of the cell, 4th edn. Garland Science, New York,
pp 1363–1421 IN, USA

Definition

Adaptive Immunity Adipocytes are responsible for the storage of energy


in the form of lipids (fat). Adipocytes are derived
▶ Adaptive Immune System from mesenchymal stem cells and reside in the
adipose tissue. Adipocyte differentiation can be
induced in vitro from mesenchymal stem cell cultures
in the presence of factors such as dexamethasone,
isobutilmethylxanthine, indomethacin, and insulin.
Ada-Two-A-Containing Adipocytes can be identified phenotypically by posi-
tive Oil red O staining, indicating the accumulation
▶ ATAC of lipid droplets within the cytoplasm. Molecular
markers for adipocytes include peroxisome
proliferator-activated receptor gamma 2 (PPARg2),
C/EBPb, aP2, adipsin, leptin, and lipoprotein lipase.

Adhesin
Cross-References
Ramachandran Srinivasan
G.N. Ramachandran Knowledge Centre for Genome ▶ Single Cell Assay, Mesenchymal Stem Cells
Informatics, Institute of Genomics and Integrative
Biology, Delhi, India

Adjuvants
Synonyms
Gajendra Raghava
Adhesion molecule Bioinformatics Centre, Institute of Microbial
Technology, Chandigarh, Chandigarh, India

Definition
Definition
Adhesins are cell surface–localized proteins helping in
the process of adhesion. In pathogens, adhesins consti- These are molecules used to improve the efficacy of
tute major virulence factor helping in the process of vaccine mostly through delivery of the vaccine in
adherence and colonization. controlled release mode and at the desired location.
A 12 Adult T-Cell Leukemia

Adult T-Cell Leukemia, Table 1 Clinical subclasses of ATL


Adult T-Cell Leukemia ATL subclasses Symptoms
Acute High number of ATL cells; frequent skin
Christian Sch€onbach lesions, systemic lymphadenopathy, and
Department of Bioscience and Bioinformatics, Kyushu hepatosplenomegaly
Institute of Technology, Iizuka, Fukuoka, Japan Chronic Slightly elevated white blood cell count;
skin lesions, lymphadenopathy, and/or
hepatosplenomegaly
Smoldering Low number ATL cells; skin or pulmonary
Synonyms lesion
Lymphoma Systemic lymphadenopathy; low number of
Adult T-cell leukemia/lymphoma; ATL; ATLL abnormal cells in peripheral blood

Definition Cross-References

HTLV-1 is the only known human retrovirus with onco- ▶ HTLV, Cellular Transcription
genic potential. About 5% of infected HTLV-1 individ-
uals develop ATL, a non-Hodgkin type lymphoma with References
abnormal mature CD4+ T-cell phenotypes and extremely
long latency of up to 60 years. HTLV-1 infections and Bazarbachi A, Plumelle Y, Ramos JC, Tortevoye P, Otrock Z,
Tax-mediated transformation of CD4+ T-cells alone is Taylor G, Gessain A, Harrington W, Panelatti G, Hermine O
not sufficient to trigger ATL. Other factors that are (2010) Meta-analysis on the use of Zidovudine and
Interferon-Alfa in adult T-Cell leukemia/lymphoma showing
necessary to develop ATL include proviral load, pro- improved survival in the leukemic subtypes. J Clin Oncol
gressive weakening of the host immune system, genetic 28(27):4177–4183. Epub ahead of print
background, virus-mediated genetic and epigenetic Higuchi M, Fujii M (2009) Distinct functions of HTLV-1
changes resulting in DNA damage, and clonal expansion Tax1 from HTLV-2 Tax2 contribute key roles to viral path-
ogenesis. Retrovirology 6:117
of altered CD4+ T-cells (Higuchi and Fujii 2009). Takatsuki K (2005) Discovery of adult T-cell leukemia.
Infections with HTLV-2 may result in HAM/ Retrovirology 2:16
TSP-like disease, but do not cause ATL or other
forms of leukemia/lymphoma. Although HTLV-2 is
transforming infected CD4+ T-cells, differences in Tax
proteins of HTLV-1 and HTLV-2 and their molecular Adult T-Cell Leukemia/Lymphoma
interactions with host proteins are thought to be
responsible for their distinct disease outcomes ▶ Adult T-Cell Leukemia
(Higuchi and Fujii 2009).
Typical ATL symptoms are skin lesions, lymphade-
nopathy, and hepatosplenomegaly. ATL is often Adverse Events
accompanied by hypercalcemia and increased osteo-
clast numbers. Clinically, ATL is divided into four Ravi Iyengar
subclasses (Table 1) (Takatsuki 2005). Department of Pharmacology and Systems
At present, there is no effective treatment of ATL. Therapeutics, Mount Sinai School of Medicine,
Chemotherapy, INFA2 plus antiviral treatment, human- New York, NY, USA
ized monoclonal IL2 antibody therapy, or allogeneic
hematopoietic stem cell transplantation yield mixed
results (Takatsuki 2005). Nevertheless, INFA2 combined Definition
with antiviral zidovudine (AZT) treatment prolonged the
survival time and improved clinical symptoms in patients Complications and/or undesirable side effects that are
with acute, chronic, and smoldering ATL, except lym- associated with pharmacological treatments, which
phoma-type ATL (Bazarbachi et al. 2010). can be the result of unpredicted interactions.
Agent-based Modeling 13 A
Cross-References or in vivo biological experiments in order to validate
and calibrate these rules according to relevant data.
▶ Biomarker Discovery, Knowledge Base In the field of cancer research, the use of ABM to
A
▶ Systems Pharmacology simulate cancer growth per se is not new (Wang and
Deisboeck 2008). Some sophisticated ABMs have
been developed to identify and quantify the relation-
ship between individual molecular properties,
Aerobic Glycolysis their microenvironmental conditions, and the overall
tumor morphology. It should be noted that current
▶ Warburg Effect ABMs often model extracellular factors as continu-
ous quantities, thereby rendering the models hybrid
in nature. The ABM approach allows researchers to
investigate one of the most important problems in
Agent-based Modeling cancer research – tumor heterogeneity, an inherent
feature of cancer cells. This is achieved by addressing
Zhihui Wang and Thomas S. Deisboeck the role of diversity in cell populations and
Harvard-MIT (HST) Athinoula A. Martinos Center for also within each individual cell. In current ABMs,
Biomedical Imaging, Massachusetts General Hospital, tumor growth and invasion patterns are neither
Charlestown, MA, USA predefined nor intuitive, but emerge as a result of
individual dynamics, which include cell–cell and
cell–microenvironment interactions and intercellular
Synonyms signaling. However, a major drawback of ABMs is
that they are generally too detailed to simulate over
Individual-based modeling (IBM) a long period of time, particularly in large, 3D
domains, and as a consequence current ABMs can
only process a relatively small number of cells.
Definition Hence, a more innovative way of thinking about
ABM is required to solve this problem in order to
Agent-based modeling (ABM), also referred to as move ABMs toward clinical application. We have
individual-based modeling (IBM), simulates the inter- seen a number of new methods being developed in
actions of autonomous entities (i.e., the agents) with the field (interested readers should refer to Wang and
each other and their local environment to predict Deisboeck (2008) for details).
higher-level emergent phenomena (Bonabeau 2002).
In biomedical research, while an agent can
represent a part of a cell or a cluster of cells, the ideal Cross-References
candidate for a software agent is now more commonly
recognized to be an individual cell (Walker and ▶ Multilevel Modeling, Cell Proliferation
Southgate 2009), since models can benefit from such
a direct one-to-one mapping between real and
virtual cells in terms of parameter acquisition from
experiments and model validation. As a simulation References
progresses, agents (representing individual cells)
Bonabeau E (2002) Agent-based modeling: methods and tech-
interact or communicate with other agents and their
niques for simulating human systems. Proc Natl Acad Sci
common microenvironment according to a set of USA 99(Suppl 3):7280–7287
predefined, biomedical data-driven “rules.” Because Walker DC, Southgate J (2009) The virtual cell–a candidate co-
an ABM’s simulation results are highly dependent ordinator for “middle-out” modelling of biological systems.
Brief Bioinform 10:450–461
on these rules, it is necessary to tightly couple these
Wang Z, Deisboeck TS (2008) Computational modeling of brain
algorithms at all stages of model development with tumors: discrete, continuum or hybrid? Sci Model Simul
iterative in silico studies as well as available in vitro 15:381–393
A 14 Agent-based Models, Discrete Models and Mathematics

incorporate the stochasticity inherent in biology. Out-


Agent-based Models, Discrete Models puts of ABM are frequently graphical in nature and
and Mathematics easily characterized by patterning metrics that define
the architectural features of cellular organizations at
Shayn M. Peirce the tissue level. Together, these attributes make ABM
Department of Biomedical Engineering, University of a useful tool in biological and biomedical research for
Virginia, Charlottesville, VA, USA investigating how complex and even somewhat ran-
dom interactions among individual cells at lower
levels of spatial scale integrate to produce emergent
Synonyms results at higher levels of scale.

Individual-based modeling; Multi-agent simulation


Characteristics

Definition Assumptions
There are a number of underlying assumptions that are
Agent-based modeling (ABM) is a computational typically made when constructing an ABM to study
modeling approach whereby individual entities in biological tissue patterning, and these are summarized
a complex system (e.g., biological cells) are below:
represented by discrete agents that interact autono- • Agents are finite in number, and unless otherwise
mously in simulated space and time to produce emer- specified, each agent represents a single biological
gent, often nonintuitive outcomes at the population cell within a tissue.
(e.g., biological tissue) level. ABM has roots in the • Biological cells are treated as “black boxes” in that
field of artificial intelligence and can be considered an biological phenomena at the subcellular level (e.g.,
offshoot of cellular automata (CA) modeling gene expression or intracellular signaling) are
(▶ Cellular Automata), which was first introduced in implicit and reflected in the behaviors and attributes
the 1940s by John Von Neumann (von Neumann and of agents at the cell level.
Morgenstern 2007). Unlike CA, however, agents in an • Agents operate within a bounded space whose
ABM are not confined to stationary points on a grid boarders are analogous to those of the biological
(i.e., they are mobile), they do not have to perform their tissue being simulated.
actions simultaneously at each time step, and they can • Space and/or time is represented discretely.
be heterogeneous in size or type (Grimm and Railsback A typical two-dimensional ABM simulation space,
2005). With the advent of personal computers in the for example, is partitioned into a grid of pixels on
1980s, the adoption of ABM accelerated and expanded which agents act and interact with one another and
into different fields, including ecology, epidemiology, with their simulated environment.
finance and economics, political science, the social and • The state of knowledge in a given field can be
behavioral sciences, and more recently, biological pat- effectively captured by the rule set that governs
terning. As applied to biological and biomedical inves- agent behaviors in an ABM.
tigation, ABMs are particularly well suited for • If the prediction of the ABM matches indepen-
investigating multicellular phenomena, or processes dently measured outcomes of a biological analogue,
where the independent behaviors of individual cells the rule set may, but not necessarily, capture key
acting in response to their dynamic local environment mechanisms controlling the biological system.
give rise to emergent tissue-level adaptations. In this
context, ABMs have been used to simulate and study Formulation
a variety of physiological and pathological processes Historically, biological ABMs have been used for two
including tumorigenesis, inflammation, wound primary purposes: (1) to replicate the emergent behav-
healing, and angiogenesis (Peirce et al. 2006). The iors of a complex biological system in order to better
rules governing individual agent behaviors are central understand its underlying mechanisms and (2) to
to the outcomes/predictions of ABMs and often address specific fundamental questions/hypotheses
Agent-based Models, Discrete Models and Mathematics 15 A

Agent-based Models, Discrete Models and Mathematics, cells (yellow) visualized using immunohistochemistry and con-
Fig. 1 The multicell composition of a complex tissue (rat mes- focal microscopy (a) (▶ Fluorescence Microscopy). An ABM
entery) includes microvascular endothelial cells (green) represents cells as agents and can simulate patterning outcomes
arranged in a network of vessels and interstitial progenitor at the level of the microvascular network (b)

about the underlying mechanisms of complex biolog- (e.g., matrix metalloproteinase) or bind and release
ical systems. The process of formulating an ABM is growth factors.
iterative, and few, if any, formalized protocols for the A reasonable third step is the selection of initial
construction of biological ABMs exist; however, conditions, an appropriate time step (clock increments
a general recommended stepwise approach for devel- during which events in the simulation are scheduled),
oping an ABM is summarized below. and a total duration (i.e., temporal window) for the
First, the simulation space and the agents are simulation. Frequently, time steps are discrete and rep-
defined according to the tissue being simulated and resent from millisecond to hours. During each time step,
its cellular and acellular composition (Fig. 1). the agents survey the environment, make decisions
Specifically, the simulation boundaries are set by the based on their local environment that are dictated by
size of the tissue being simulated (in two dimensions or the rule set, and exhibit behaviors. ABMs can simulate
three dimensions). Spatial boundaries, or edges of temporal windows ranging from minutes to years.
the simulation space, can be periodic, or wrapping The fourth step involves determining the relevant
from left to right and top to bottom, in the case of output metrics that capture the salient features of the
a two-dimensional simulation. An alternative bound- emergent outcome at the tissue level. It is often useful
ary condition is a closed boundary, wherein the to design metrics that quantitatively reflect the shape or
edges of the simulation space are rigid and serve as architecture of the tissue at different time steps, and/or
“walls” to prevent agents from exiting the space. The patterns of cell aggregates and multicell structures
number and phenotype of cells/agents contained within the tissue. An obvious guideline for selecting
within the tissue and the extracellular composition metrics is to include those that are utilized to describe
of the tissue are defined and ascribed to agent-types analogous patterning features in experimental data, as
in the ABM. this facilitates cross-validation of the ABM predictions
Second, the agents are endowed with the ability to against independent empirical studies.
exhibit the relevant physiological or pathological The fifth step involves compiling the set of rules that
behaviors. For example, cellular agents can proliferate, determine the behaviors of the individual agents in the
migrate, undergo apoptosis, differentiate, and/or ABM. Rules can be constructed as abstract representa-
secrete proteins, all in response to other signals in tions of known biological relationships or as specific
their simulated environment. Likewise, acellular representations of actual empirical data. Frequently
agents (e.g., collagen) may be endowed with the ability rules within a rule set are crafted as Boolean, true-
to remodel in response to enzymes in the environment false “if-then” statements, but they can also incorporate
A 16 Agent-based Models, Discrete Models and Mathematics

stochasticity and describe agent behaviors according to understanding and suggest specific experiments that
probability distributions. An example of a rule that would offer the most value moving forward.
might exist in a two-dimensional ABM simulation In summary, developing and using ABMs is an
meant to represent cell contact inhibition would be: “If iterative process that is most informative when married
an agent is within one pixel of a neighboring agent for with experimental studies to determine the underlying
more than 24 h, that agent will not divide.” The length of cell-level mechanisms of tissue patterning (Thorne
a biological ABM rule set can range from one to over et al. 2007). By integrating different types of experi-
200 rules, and can be limited by computational power mental data into a unified rule set and allowing agent/
and the available empirical data. cells to reveal the outcome of this integration in space
The first five steps described above pertain to the and time, ABMs can uncover new understanding that
design and assembly of the model, and the sixth step is is not obtainable using experimental approaches alone.
run and verify model by determining whether or not ABMs may play a role in translational research,
the output is internally consistent and, at least at an informing drug design and target identification, as
intuitive level, aligned with representative biological well as predicting side effects and determining mech-
outcomes. This step can also include performing addi- anisms of action (Vodovotz et al. 2010). Ultimately,
tional simulations in order to conduct a sensitivity ABMs should be considered as another tool in the
analysis on the parameter/rule set and to understand repertoire of systems biology approaches that is par-
if the unperturbed system reaches dynamic equilibrium ticularly useful in studying complex, multicell pro-
or steady state after multiple time steps. cesses and outcomes at the tissue level.
The final step in developing and using an ABM to
study biological processes involves performing an inde- Strengths and Weaknesses of ABM
pendent validation of the predictions by comparing Strengths
ABM outcomes to experimentally measured outcomes • Multicell behaviors can be represented at the level
and using the model to interrogate the underlying mech- of the tissue and individual cells can be tracked in
anisms of the complex system. This may include space and time in a relatively computationally effi-
performing simulations where agents or rules are sys- cient manner.
tematically removed or altered to understand the rela- • Heterogeneities in cell distribution and tissue com-
tive contributions of those factors to the overall system. position, as well as stochastic changes in biological
Removing the influences of key growth factors or events, can be modeled explicitly.
chemokines one by one, for example, would enable • ABMs produce graphical outputs that are easily
a “knockout” analysis to elucidate how the system interpreted by nonexperts in computational model-
performs in the absence of these factors. One might ing, enabling a potentially rich and fluid dialogue
perform such an analysis in order to independently between experimentalists and modelers.
predict phenotypes observed in an analogous transgenic • ABMs integrate diverse types of experimental data
knockout mouse. This step might also include testing and serve to instantiate existing and new hypotheses
alternative hypotheses by implementing them as rules in to elucidate mechanistic relationships across com-
the ABM rule set and comparing the different predic- plex biological systems.
tions to empirically measured results. This type of anal-
ysis might suggest a subset of plausible working Weaknesses
hypotheses to pursue experimentally, thus providing • Because ABMs can be very sensitive to small var-
a tool to inform the design of experiments in the wet iations in the rule set and rarely account for physical
lab and streamlining and/or accelerating the overall conservation laws explicitly, their predictions can
discovery process (Hunt et al. 2009). A common be highly irregular or nonbiological. Therefore, the
misconception about ABMs is that they are only infor- risk of trivial or irrelevant predictions should be
mative if they recapitulate features of the biological mitigated by rigorous independent validation
system. On the contrary, ABMs can be highly instruc- against experimental data.
tive when they fail to accurately predict measured • ABMs that incorporate stochastic rules require mul-
biological outcomes because they can point to specific tiple runs to fully explore the extent of possible
voids, inconstancies, and/or errors in data or outcomes.
Aging 17 A
• Depending on the size and scope of the model, von Neumann J, Morgenstern O (2007) Theory of games and
ABMs can have large numbers of rules and param- economic behavior (commemorative edition). Princeton
Classic Editions, Princeton
eters, which may make parameter identification dif-
A
ficult and require extensive sensitivity analyses to
determine the robustness of the predictions.
• ABMs do not explicitly account for intracellular Agglomerative Hierarchical Clustering
events and serve as abstractions of the biology at
this level, thus limiting their ability to provide mean- ▶ Hierarchical Agglomerative Clustering
ingful results regarding gene regulation, metabolic
processes, and intracellular signaling, for example.

ABM Tools Agglomerative Hierarchical Data


There are a number of different ABM software plat- Segmentation
forms that have been developed by individuals and
consortiums. NetLogo, MASON, Repast, and ▶ Hierarchical Agglomerative Clustering
SWARM have general applicability across a range of
disciplines, including biology, and have some of the
more active user communities. Each of these software
platforms has distinguishing features that make them Aggregation
more appealing than others for certain applications, as
reviewed previously (Railsback and Lytinen 2006). ▶ Abstraction
Other ABM software platforms have been developed
for specific fields of study (e.g., finance, education,
social sciences).
Aggregation Relation

Cross-References ▶ Mereology
▶ Meronymy
▶ Cellular Automata
▶ Fluorescence Microscopy

Aging
References
Rajeswara Babu Mythri1, Shireen Vali2 and
Grimm V, Railsback SF (2005) Individual-based modeling and
M. M. Srinivas Bharath1
ecology, Princeton series in theoretical and computational 1
biology. Princeton Classic Editions, Princeton Department of Neurochemistry, National Institute of
Hunt CA, Ropella GE, Lam TN, Tang J, Kim SH, Engelberg JA, Mental Health and Neurosciences (NIMHANS),
Sheikh-Bahaei S (2009) At the biological modeling and Bangalore, Karnataka, India
simulation frontier. Pharm Res 26:2369–400 2
Cell Works Group Inc., Bangalore, India
Peirce SM, Skalak TC, Papin JA (2006) Coupling intracellular
networks with tissue-level physiology. IBM J Res 50:1–15
Railsback SF, Lytinen SL (2006) Agent-based simulation plat-
forms: review and development recommendations. Definition
Simulations 82:609–623
Thorne BC, Bailey AM, Peirce SM (2007) Combining
experiments with multi-cell agent-based modelling to study Aging is classically defined as a long-term physiological
biological tissue patterning. Brief Bioinform 8:245–257 process associated with morphological and functional
Vodovotz Y, Constantine G, Faeder J, Mi Q, Rubin J, Bartels J, changes in the human body at the tissue and cellular
Sarkar J, Squires RH Jr, Okonkwo DO, Gerlach J, Zamora R,
level, aggravated by injury resulting in a progressive
Luckhart S, Ermentrout B, An G (2010) Translational sys-
tems approaches to the biology of inflammation and healing. imbalance of the regulatory systems including neuronal,
Immunopharmacol Immunotoxicol 32:181–95 hormonal, and immune mechanisms. Human life
A 18 Agonist

expectancy has nearly doubled since the twentieth cen- An agonist may be endogenous, i.e., a natural ligand
tury causing tremendous increase in aged population that binds to its receptor under in vivo conditions, or it
(65 years and older). The increasing number and sever- can be a synthetic, exogenous ligand.
ity of health problems associated with aging presents
one of the major health care challenges in the world.
This includes the increased propensity of aged popula-
tion to encounter neurological diseases. Akaike’s Information Criterion (AIC)
Physiological aging is an important causative factor
underlying the onset of sporadic PD. This is because the Kejia Xu
cumulative changes in the midbrain contribute to spe- Institute of Systems Biology, Shanghai University,
cific degenerative pathways and PD pathology. For Shanghai, China
example increased oxidative stress manifested by accu-
mulation of oxidatively modified proteins increases
with age. This increase magnifies several folds in late Definition
age due to a combination of increased production of
reactive species, decreased antioxidant activity, and Akaike’s Information Criterion (AIC) is a technique
impaired ability to repair or remove the modified pro- that measures the goodness of an estimated statistical
teins. Additionally, activities of the protein clearing model and selects a model from a set of candidate
cellular machinery including the ubiquitin-proteasome models. The chosen model is the one that is expected
system (UPS) and autophagy decline with age thereby to minimize the difference between the model and the
reducing the neuron’s ability to remove damaged or truth. Given a data set, several competing models may
modified proteins or protect itself from damaging free be ranked according to their corresponding AIC, and
radicals. This also leads to neurotoxic aggregation of the one having the lowest AIC will be the best.
uncleared proteins in the cell. Such cellular insults are In the general case, the AIC is defined as
compounded by other age-associated pathways such as
mitochondrial dysfunction, accumulation of iron, AIC ¼ 2k  2ln L; (1)
increased DA oxidation, increased deposits of lipids,
etc. which increase with aging. where k is the number of parameters in the statistical
model, and L is the maximized value of the likelihood
function for the estimated model.
Cross-References

▶ Disease System, Parkinson’s Disease References

Akaike H. A new look at the statistical model identification.


IEEE Trans Autom Contr. 1974;19(6):716–23

Agonist

Melissa L. Kemp
The Wallace H. Coulter Department of Biomedical Alarmins
Engineering, Georgia Institute of Technology and
Emory University, Atlanta, GA, USA ▶ Damage-associated Molecular Patterns

Definition
Algorithmic Modeling Languages
An agonist is a ligand that binds to a cell surface
receptor and triggers a response from the cell. ▶ Cell Cycle Modeling, Process Algebra
AliBaba 19 A
enables users to quickly compare KEGG pathways
AliBaba to recent findings in the literature.
A
Conrad Plake1 and J€org Hakenberg2 Usage
1
Biotechnology Center (BIOTEC), Technische AliBaba builds on the Java Web Start technology and
Universit€at Dresden, Dresden, Germany launches a client on the user’s machine. The client
2
Department of Computer Science and Department of accepts a PubMed query, sends it to the AliBaba server
Biomedical Informatics, Arizona State University, who in turn retrieves and analyzes the results from
Tempe, AZ, USA PubMed and gets back annotated abstracts. The query
syntax and semantics are exactly the same as already
known to PubMed users, and AliBaba also keeps the
Definition order of results. The results are parsed and shown on
a graphical user interface mainly consisting of three
AliBaba is a text mining (see ▶ Applied Text Mining) panels that show different aspects of the gathered
system that analyzes scientific abstracts, extracts information (see Fig. 1).
biomedical entities such as genes and diseases
(▶ Named Entity Recognition), and displays the Use Case
results as a graph depicting relationships between A patient with cough received standard treatment with
these entities, such as protein–protein interactions codeine and fell unresponsive after a while. To search
(Plake et al. 2006). AliBaba is available at: alibaba. a reason, “codeine intoxication” is a meaningful
informatik.hu-berlin.de. PubMed query, but it results in 162 abstracts. Posing
this query to AliBaba immediately shows a path from
codeine through poisoning to morphine and further to
Characteristics CYP2D6 (Fig. 1). This indicates the solution for this
case, where the patient was suffering from a CYP2D6
Literature searches form a substantial part of day-to- mutation leading to the ultrarapid metabolism of
day work in life science–related domains. Researchers codeine to morphine.
track recent developments in their fields, search for
answers to specific questions, or obtain overviews Named Entity Recognition
of known facts on a given topic. The PubMed Recognition of named entities (names referring to
(see ▶ MEDLINE and PubMed) interface to Medline proteins, species, drugs, diseases, etc.) is based on
is the most renowned search engine in the life pre-compiled word lists. Recognition of these names
sciences, indexing over 20 million abstracts from is performed using class-specific fuzzy searches.
more than 5,000 journals. Manually searching this Slight spelling variations (plural, capitalization,
vast amount is a tedious task, especially when exhaus- hyphenation) are allowed for all classes. For species,
tive information are needed, or when information we expand the word lists with common types of
about a set of objects are required. AliBaba helps abbreviations (where they are not already contained
users in analyzing large PubMed search results in the database), such as “S. cerevisiae” for the entry
by automatically extracting facts that are important “Saccharomyces cerevisiae.” Protein names appear
in molecular biology and molecular medicine, and with different variations, such as Arabic to Roman
by arranging them in an intuitive graphical represen- numbering conversions (“factor 7” versus “factor
tation. The software extracts both objects and VII”). AliBaba maps all recognized entities to their
their interrelationships, forming edges and nodes respective entries in the source databases. This
of the graph, respectively. Users can also manually enables the user to quickly access further information,
edit graphs (adding, changing, or deleting nodes for instance, protein sequences or functional descrip-
and edges) to prepare them for further usage or tions not contained in the current set of PubMed
screenshots. An interface to KEGG allows to overlay abstracts. In addition, this mapping helps to overcome
extracted networks with known pathways, which the problem of synonymity. This arises when
A 20 AliBaba

AliBaba, Fig. 1 A drug-disease-protein network extracted likewise connected to morphine and CYP2D6. The reason is that
from a PubMed search result and shown in AliBaba. The net- codeine is bioactivated by CYP2D6 into morphine, which can
work shows a connection between codeine (marked in the graph lead to life-threatening intoxication in certain patients, who
with a blue frame), cough, poisoning, and CYP2D6. Poisoning is show an ultrarapid form of this metabolism mark

different authors use different names for the same 50% for protein–protein interactions, thus requiring
entity (for example, CD95, Fas, and Apo-1 for the different approaches. AliBaba’s second strategy is
same protein). based on aligning sentences against a predefined set
of patterns learned from a training corpus. This tech-
Relation Mining nique is used to find protein–protein interactions and
AliBaba follows two different strategies for finding cellular locations of proteins. For protein–protein
associations between entities. The first is based on the interactions, the method achieves a precision of
occurrence of two entities in the same sentence. In 79% and a recall of 52%, as evaluated on an indepen-
this case, AliBaba calculates the confidence score for dent corpus. An external evaluation on the
each pair of entities depending on the number of ▶ BioCreative II.5 and the FEBS Letters Experiment
entities that occur between them. Other studies have on Structured Digital Abstracts II IPS test corpus
shown that the precision of co-occurrence-based showed a recall of 69%. The alignment score also
methods can be as high as 94% for gene-disease and defines the quality of a potential match which is
other associations; but it is, for example, less than reflected in the confidence score.
Alternative Routes 21 A
Related Tools
Tools related to AliBaba are EBIMed (▶ Retrieving a-Helix, b-Sheet, Loop Carbon-Alpha
and Extracting Entity Relations from EBIMed) and Positioning A
iHOP.
▶ Protein Structure Metapredictors

Cross-References
Alternative Routes
▶ Applied Text Mining
▶ BioCreative II.5 and the FEBS Letters Experiment Ernesto Perez-Rueda
on Structured Digital Abstracts Departamento de Ingenieŕıa Celular y Biocatálisis,
▶ MEDLINE and PubMed Instituto de Biotecnologia, Universidad Nacional
▶ Named Entity Recognition Autonóma de México, Cuernavaca, Morelos, Mexico
▶ Retrieving and Extracting Entity Relations from
EBIMed
Definition

References Two main hypotheses have been proposed to explain


how the metabolic pathways actually originated: the
Plake C, Schiemann T, Pankalla M, Hakenberg J, Leser U (2006) ▶ retrograde hypothesis and the ▶ patchwork
AliBaba: PubMed as a graph. Bioinformatics 22(19):2444–
2445, doi: 10.1093/bioinformatics/btl408. http://dx.doi.org/ hypothesis.
10.1093/bioinformatics/btl408 Recently, another mechanism associated to the
metabolical growth suggests the existence of alterna-
tive routes or alternologs, defined as branches or
enzymes that, proceeding via different metabolites,
converge in a common end product. These alternative
Alignment, Protein Interaction Networks branches contribute to genetic buffering similar to
gene duplication. It has been proposed that the origin
▶ Graph Alignment, Protein Interaction Networks of alternative branches is closely related to different
environmental metabolite sources and lifestyles
among species.

Allergen Cross-References

▶ Evolution of Metabolism, Amino Acid Biosynthesis


Ramachandran Srinivasan
G.N. Ramachandran Knowledge Centre for Genome Pathways
Informatics, Institute of Genomics and Integrative
Biology, Delhi, India
References

Conant GC, Wagner A (2003) Asymmetric sequence divergence


Definition of duplicate genes. Genome Res 13:2052–2058
Hernandez-Montes G, Diaz-Mejia JJ, Perez-Rueda E, Segovia L
Allergens are substances (proteins, carbohydrates, (2008) The hidden universal distribution of amino acid
biosynthetic networks: a genomic perspective on their ori-
particles, pollen grains, etc.) to which the body
gins and evolution. Genome Biol 9:R95
mounts a hypersensitive immune response typically Horowitz NH (1945) On the evolution of biochemical syntheses.
of Type I. Proc Natl Acad Sci USA 31:153–157
A 22 Alternative Splicing

The role of alternative splicing is important as it is


Alternative Splicing a premier mechanism (>50 % of transcripts) for defin-
ing tissue specificity. As such, tissue-specific mecha-
Luiz O. F. Penalva nisms exist to dictate alternative splicing constituting
Department of Cellular and Structural Biology, mainly of tissue-specific splicing factors. For example,
Greehey Children’s Cancer Research Institute, the brain, where the highest number of alternative
University of Texas Health Science Center, San splicing occur in the body, express several brain-
Antonio, TX, USA specific splicing factors such as nPTB, NOVA1,
NOVA2, and the Hu/ELAV group of proteins.

Definition
Cross-References
For the majority of genes, RNA transcribed from
DNA must be processed to become a mature messen- ▶ Post-transcriptional Regulatory Networks
ger RNA, which is later translated into protein. One
RNA processing step is splicing, which involves
the removal of intervening sequences, or introns,
and connection of exons, which contain the genetic Amplification
information required for protein synthesis. Greater
than 95% of the annotated multi-exons contain Kareenhalli V. Venkatesh
protein-coding genes that are alternatively spliced. Department of Chemical Engineering, Indian Institute
Alternative splicing is an important mechanism for of Technology Bombay, Powai, Mumbai,
gene regulation as well as a mechanism to create Maharashtra, India
a diverse proteome given a limited number of genes
encoded by the genome. Biologically, alternative
splicing can create different protein products, which Definition
may be differentially expressed both temporally and
spatially. Splicing is catalyzed by the spliceosome, Signaling networks contain modules that help the cell
a large macromolecular complex composed of five in responding to weak extracellular stimulus. Such
main small nuclear ribonucleoproteins (snRNP) in a capability to respond arises from the ability of the
concert with many other auxiliary proteins. However, signaling pathway to amplify a weak stimulus and is
the decision to remove or include particular exons or termed as amplification. There are two kinds of signal
splice sites is dictated by two elements, sequence amplification in biological systems, namely, magni-
elements encoded by the RNA (which usually reside tude amplification and sensitivity amplification. Mag-
at the exon-intron junction) and trans-acting proteins. nitude amplification occurs when the output response
Depending on the splicing event desired, the trans- molecules are produced in larger number than the input
acting proteins are subdivided into four categories signal (stimulus) molecules, while the sensitivity
based on its action: exonic splicing enhancers, exonic amplification occurs when the percentage change in
splicing silencers, intronic splicing enhancers, and response is higher than the percentage change in stim-
intronic splicing silencers. ulus (Koshland et al. 1982).
Many common alternative splicing events can Amplification can be characterized by plotting the
occur; these include exon skipping, intron retention, steady state input-output response for a given input
and alternative 5’ splice site selection, alternative 3’ signal for an upstream and downstream effector in
splice site selection, mutually exclusive exons, alter- a signaling pathway. The shift in the curve of a dose
native promoters, and alternative polyadenylation. response toward the left, for activation of
Exon skipping is by far the most common splicing a downstream component as compared to an upstream
event. On an average, exons are 50–200 bp in length component demonstrates amplification (see Fig. 1).
while intronic sequences are usually thousands of base Amplification is quantified using the half-saturation
pairs long. constant (K0.5) obtained using Hill Equation of the
Amplification 23 A
Amplification, 1
Fig. 1 Amplification in
0.9

am
signaling pathways. Dotted

tre
curve: Response of the 0.8 A

s
wn
upstream component; Solid

Do
Fractional Response

am
0.7
curve: Response of the

re
st
downstream component 0.6

Up
Upstream
0.5 K 0.5
Amplification = =3/1=3
Downstream
0.4 K 0.5
0.3
0.2
0.1
0
0 1 2 3 4 5 6 7 8 9 10
Input Concentration in Arbitrary Units (AU)

input-output response curves. The ratio of the half- cascade is lost due to the saturation of the upstream
saturation constants for an upstream component to component. If the upstream component is saturated
that of a downstream component to changes in an (more than 90%) for a given input value and for the
input signal provides the measure of amplification, same input value if the downstream component is also
and is defined as follows: saturated, then the role of signal amplification is insig-
nificant (see Fig. 1). Amplification aids the cell to
Upstream respond to the smallest variations in the stimulus
K0:5
Amplification ¼ Downstream (1) allowing it to respond to a broad range of stimulus
K0:5
strengths. However, the downside of amplification is
Upstream
where K0:5 Downstream
and K0:5 are the half-saturation that the design motif also amplifies the signal noise
constants of the upstream and downstream compo- along with the input signal (Shibata and Fujimoto
nents of a signaling pathway obtained using Hill equa- 2005).
tion (see Fig. 1). Cascade of covalent modification
cycles is known to yield amplification. A classic exam-
ple is the activation of MAPK pathway in Xenopus Cross-References
oocytes. Hundredfold amplification in MAPK and
30-fold amplification in MAPKK have been observed ▶ Hill Equation
with respect to the upstream kinase, MAPKK (Huang ▶ Pathway Modeling, Metabolic
and Ferrell 1996). Signal amplification increases along ▶ Signaling Network Resources
the cascade and the amplification at particular step
depends upon the signal amplitude of the preceding
step (Heinrich et al. 2002). The extent of amplification References
depends on the ratio of the counteracting kinase to
phosphatase reaction rates, where the rate of phospha- Heinrich R, Neel BG, Rapoport TA (2002) Mathematical models
tase should be lesser than that of kinase to bring about of protein kinase signal transduction. Mol Cell 9(5):957–970
Huang CYF, Ferrell JE (1996) Ultrasensitivity in the mitogen-
signal amplification. activated protein kinase cascade. Proc Natl Acad Sci USA
The cascading of signals through multiple steps 93(19):10078–10083
helps in amplifying a weak signal along with increas- Koshland DE, Goldbeter A, Stock JB (1982) Amplification and
ing the response sensitivity. The extent of amplifica- adaptation in regulatory and sensory systems. Science
217:220–225
tion decreases with increasing strength of the signal Shibata T, Fujimoto K (2005) Noisy signal amplification in
and vice versa. There is certain threshold for the input ultrasensitive signal transduction. Proc Natl Acad Sci USA
stimulus above which the amplification ability of the 102(2):331–336
A 24 Analysis of Variance

If there is a strong treatment effect, one would expect


Analysis of Variance the variability between the treatments to be consider-
ably large than that within the treatment. To test the
Larissa Stanberry hypothesis, one can consider the ratio of the between- to
Bioinformatics and High-throughput Analysis within-variance. The ratio is large if there is a strong
Laboratory, Seattle Children’s Research Institute, treatment effect. If the data are random samples from
Seattle, WA, USA normal distribution and the variance is constant across
the treatment groups, under the null hypothesis of no
treatment effect, the ratio of the variances follows an
Synonyms F-distribution with parameters given by the degrees of
freedom of the numerator and denominator terms. The
ANOVA p-value is obtained by comparing the observed value of
the statistic against the F-distribution.
The following ANOVA table gives the summary of
Definition the analysis for the chick weight data. This is an exam-
ple of a one-factor ANOVA. The table gives the degrees
The analysis of variance evaluates the null hypothesis of of freedom, the sum of squares, and the mean square for
no difference in means across multiple treatment groups. the factor and the residuals. The mean square is given by
the ratio of the sum of squares over the corresponding
degrees of freedom. The mean squares value for “feed”
Characteristics gives an estimate of the variability between the different
feed types. The residuals mean squares measure the
Consider an experiment to compare the effectiveness within-treatment variability.
of various feed supplements on the growth rate of The F-statistic is computed as a ratio of the mean
chickens. Newly hatched chicks were randomly squares of “feed” to the mean squares of residuals. The
allocated into six groups, and each group was given p-value can be computed using Permutation Test or
a different feed supplement. Their weights in grams Randomization Test. Alternatively, assuming that
after 6 weeks were recorded along with feed types. The the data is normally distributed and the error variance
number of chickens for each supplement were: is constant, the F-statistic follows an F(5,65)-
casein:12, horsebean:10, linseed:12, distribution. The large value of the F-statistic indicates
meatmeal:11, soybean:14, sunflower:12. a significant effect of the feed type on chick weight at
The data set chickwts is freely available as a part 6 weeks. Note that before interpreting the results, it is
of R package. In this example, the response variable is important to verify the model assumptions.
chick weight at 6 weeks and the treatments are differ-
ent feed supplements. Analysis of Variance Table
Figure 1 shows the boxplots for each feed supple- Response: weight
ment. The weight at 6 weeks appears to vary Df Sum Sq Mean Sq F value Pr(>F)
considerable across the different feed types. Can the feed 5 231129 46226 15.365 5.936e -10 ***
variability in the weight be explained by feed type? To Residuals 65 195556 3009
answer this question, one could perform all possible
pairwise comparisons between the different treatment Although ANOVA results indicate the difference in
groups. However, this approach would inflate the mean weight for different feed types, they provide little
experiment-wise error rate. Analysis of variance guidance as to which treatments exactly differ from
(ANOVA) tests the hypothesis of no variation due to each other and in which direction. The investigators are
treatment against the alternative that there exists a pair presumably interested in understanding which feed is
of treatments for which the mean responses differ. associated with the largest/smallest weight measure-
ANOVA compares the variability across treatments ments. To answer this question, we can follow up the
to the variability within treatments. If there is no treat- ANOVA with a series of t-tests. Multiple comparisons
ment effect, the two measures should be about the same. increase the experiment-wise error rate. In this
Anaphase-Promoting Complex (APC/C) 25 A
Analysis of Variance,
Fig. 1 The boxplot showing °
the weight gain separately for
400
the two factors
° A

350

Weight at six weeks (gm) 300

250

°
200

150

100

casein horsebean linseed meatmeal soybean sunflower

example, performing all 15 pairwise comparisons with


the Type I error of 0.05 would imply the experiment- Analysis Situs
wise error of approximately 0.75. To control the
experiment-wise error rate, one can use the Bonferroni ▶ Topology and Toponomics
correction method, Fisher’s least-significant-
difference method, and other suitable procedures.

Anaphase-Promoting Complex (APC/C)


Cross-References
Sergio Moreno
▶ Hypothesis Testing
Instituto de Biologı́a Molecular y Celular del Cáncer,
▶ Multiple Hypothesis Testing
CSIC/Universidad de Salamanca, Salamanca, Spain
▶ Permutation Test
▶ Randomization Test
Synonyms
References
Cyclosome
Montgomery D (2009) Design and analysis of experiments,
7th edn. Wiley, Hoboken, NJ
Definition

Analysis of Variance (ANOVA) Tables Multiprotein E3 ligase complex involved in the


ubiquitylation of proteins for their degradation by the
▶ Experimental Design, Variability proteasome. It is activated by Cdc20 or Cdh1, which
A 26 Anaphase-Promoting Complex Inhibitors

specify substrate recognition, promoting anaphase pro- Plk1, securin and cyclin B (reviewed in Peters 2006).
gression, mitotic exit, or the maintenance of G1. Securin is an inhibitor of separase, a protease that
cleaves cohesin at anaphase onset. Securin is
Cross-References degraded when all kinetochores are attached by spin-
dle microtubules and therefore the cell is ready to go
▶ Anaphase-Promoting Complex Inhibitors into anaphase. Destruction of securin in turn activates
▶ CDK Inhibitors separase so that it can cleave cohesin. Another sub-
strate cyclin B is required for mitotic/meiotic progres-
sion until metaphase, but is downregulated by APC/C
Anaphase-Promoting Complex at anaphase onset onward. In general, APC/C sub-
Inhibitors strates contain motifs such as the destruction box
(D-box) or the KEN box. These motifs are directly
Masamitsu Sato recognized by APC/C co-activators Cdc20 and Cdh1
Department of Biophysics and Biochemistry, Graduate (see below).
School of Science, University of Tokyo, Bunkyo–ku,
Tokyo, Japan
PRESTO, Japan Science and Technology Agency, Co-activators of APC/C
Kawaguchi, Saitama, Japan APC/C is activated by co-activators Cdc20/Slp1/Fizzy
and Cdh1/Ste9/Fzr (Fizzy-related). These
co-activators contain WD40 repeats as well as the
Synonyms IR-tail at the C-termini. Those proteins are thought to
be important for substrate recognition and specificity
Anaphase-Promoting Complex (APC/C); Cyclosome of APC/C. One of the most important aspects of
APC/C function is how ordered destruction of sub-
Definition strates is achieved. As shown in Fig. 1, there are mainly
three stages of the cell cycle when APC/C operates:
APC/C (anaphase-promoting complex/cyclosome) is prometa-metaphase, anaphase, and G1 phase. The
an E3 enzyme (ubiquitin ligase) that degrades sub- potential activity of APC/C-Cdc20 increased during
strates including cyclin B and securin, to trigger ana- prometaphase, but the degradation of other substrates
phase onset. An E3 enzyme brings its substrates into such as cyclin B and securin is delayed by spindle
close proximity of a ubiquitin conjugating enzyme E2 assembly checkpoint (see below). During
so that E2 can transfer ubiquitin to the substrate. prometaphase, cyclin A is degraded, whereas securin
APC/C is a 1.5 MDa protein complex composed of at and cyclin B are degraded at anaphase onset. In late
least a dozen subunits. Activation of APC/C is essen- anaphase, “substrate switch” occurs because APC/C
tial for anaphase onset during mitosis. In meiosis, in degrades its co-activator Cdc20 and replaces it with
order to ensure two consecutive rounds of chromo- Cdh1. In budding yeast, Cdh1 is activated by Cdc14,
some segregation in meiosis I and meiosis II, APC/C the protein phosphatase that functions antagonistic to
activity needs to be partially inhibited during the Cdk1. APC/C-Cdh1 contributes to the degradation of
interkinesis period (the period between meiosis residual cyclin B and other substrates such as Plk1 and
I and II). This is operated by APC/C inhibitors Mes1 Cdc20. Both Cdc20 and Cdh1 recognize D-boxes, but
in fission yeast and Erp1/Emi2 in Xenopus oocytes. Cdh1 can also interact with KEN-boxes. This differ-
ential recognition system by Cdc20 and Cdh1 should
contribute to the ordered destruction of substrates. In
Characteristics addition, the processivity of polyubiquitination is also
important for temporal regulation of destruction of
Substrates of APC/C Cdh1 substrates (Rape et al. 2006). Among Cdh1 sub-
The substrates of APC/C are key regulators of mitosis, strates, for instance, securin is shown to be
including Aurora kinases and the Polo-like kinase polyubiquitinated by APC/C-Cdh1 in a processive
Anaphase-Promoting Complex Inhibitors 27 A
Ubiquitination Degraded proteins

Prometa Meta Ana Late ana - telo G1


A
APC/C APC/C APC/C
Cdc20 Cdc20 Cdh1

Ub
Ub Ub
Securin Securin

Ub
Ub Ub Ub Slow Ub Ub
Cyclin A Cyclin B Ub Ub Cyclin B Cyclin A Cyclin A
Ub
Cdk2 Cdk1 Cdk1
Ub Ub
Cdc20 Ub
Ub Ub
Securin Processive
Cdc14 Cdc14
Inactive Active

Cyclin A P Cdc20 Securin Cyclin A


Cdh1 Cdh1
Inactive Active

Anaphase-Promoting Complex Inhibitors, Fig. 1 A time- temporally organized by co-activators (Cdc20/Slp1/Fizzy and
line of APC/C activation and the substrate specificity. APC/C Cdh1/Ste9/Fzr). Red arrows indicate ubiquitination (Ub) by
has many kinds of substrates, but the timing of destruction is APC/C with co-activators

manner, while it takes long time for cyclin A to be Although budding yeast Mad3 contains a D-box,
polyubiquitinated (Fig. 1). Mad3 is not degraded by APC/C-Cdc20. Thus, Mad3
may be a pseudosubstrate inhibitor that docks to the
Inhibition of APC/C co-activator Cdc20 so that it cannot activate APC/C
For transition from meiosis I to meiosis II, inhibition of (Burton and Solomon 2007). In contrast, fission yeast
APC/C seems to be a conserved and essential system to Mad3 seems to be an actual substrate of APC/C to
ensure the consecutive M phases (see Fig. 2 of the inhibit cyclin degradation, which leads to metaphase
essay ▶ Meiosis). Moreover, inhibition of premature arrest if the checkpoint is activated (King et al. 2007).
activation of APC/C before anaphase is also an impor-
tant aspect of APC/C regulation. There are many kinds Mes1
of APC/C inhibitors that play roles in M phase pro- Fission yeast Mes1 is an APC/C inhibitor essential for
gression, as exemplified below. meiosis I–II transition. Mes1 is a small protein of
11.3 kDa and contains a KEN-box and a D-box to
The Spindle Assembly Checkpoint Components directly bind to the WD40 repeats of an APC/C
Mad3/BubR1 is a component of the spindle assembly co-activator Slp1/Cdc20/Fizzy. Mes1 is shown to be
checkpoint, which monitors if the attachment of micro- an actual substrate of APC/C (Kimata et al. 2008),
tubules to all kinetochores is established or not. When rather than a pseudosubstrate.
unattached kinetochores exist or tension between sister
chromatids is not sufficient, the spindle assembly Erp1/Emi2
checkpoint is activated to arrest the cell cycle. Mad2 An Erp1 homolog Emi1 is a pseudosubstrate inhibitor
recognizes unattached kinetochores, and Mad3/ of APC/C. An APC/C inhibitor Erp1 of Xenopus laevis
BubR1, together with Mad2, binds to Cdc20 to inhibit has a D-box and a zinc-binding region (ZBR). Muta-
premature activation of APC/C before anaphase. tion in the D-box lost the function of Erp1, and Erp1
A 28 Anaplasia

may be also a pseudosubstrate inhibitor of APC/C


(Nishiyama et al. 2007). Mutation in the ZBR also Anaplasia
caused loss of Erp1 function although it still can bind
to APC/C, suggesting that dual-inhibition mechanism Barbara J. Davis
may operate for Erp1 to inhibit APC/C. Section of Pathology, Tufts Cummings School of
Veterinary Medicine Biomedical Sciences,
Securin North Grafton, MA, USA
A recent study found that securin functions not only as
a separase inhibitor, but also as an APC/C inhibitor,
which assists accumulation of another APC/C sub- Definition
strate Cyclin B to promote mitosis (Marangos and
Carroll 2008). Anaplasdia is the lack of differentiation (to form back-
Thus, the molecular mechanisms how APC/C is ward) to represent the extent to which cells within the
inhibited can be differentiated depending upon species tumor resemble tissue counterparts.
and/or situations. It is interesting to note that the pri-
mary way to block the APC/C activity is to contain
the destruction motifs, irrespective of the actual or
pseudo substrates. Cross-References

▶ Cancer Pathology

Cross-References

▶ Meiosis Angiogenic Switch

Marsha A. Moses
Department of Surgery/Harvard Medical School,
References Vascular Biology Program/Children’s Hospital
Boston, Boston, MA, USA
Burton JL, Solomon MJ (2007) Mad3p, a pseudosubstrate inhib-
itor of APCCdc20 in the spindle assembly checkpoint. Genes
Dev 21:655–667
Kimata Y, Trickey M, Izawa D, Gannon J, Yamamoto M, Definition
Yamano H (2008) A mutual inhibition between APC/C and
its substrate Mes1 required for meiotic progression in fission
yeast. Dev Cell 14:446–454 The angiogenic switch refers to a key early stage in
King EM, van der Sar SJ, Hardwick KG (2007) solid tumor progression during which an avascular
Mad3 KEN boxes mediate both Cdc20 and Mad3 tumor lesion first becomes vascularized (Fig. 1)
turnover, and are critical for the spindle checkpoint. PLoS
(Harper and Moses 2006). The acquisition of this
ONE 2:e342
Marangos P, Carroll J (2008) Securin regulates entry into angiogenic phenotype is a rate-limiting step in
M-phase by modulating the stability of cyclin B. Nat Cell a tumor’s development in that these new capillaries
Biol 10:445–451 provide necessary nutrients and gas exchange for the
Nishiyama T, Ohsumi K, Kishimoto T (2007) Phosphorylation
nascent tumor. The acquisition of a capillary network
of Erp1 by p90rsk is required for cytostatic factor arrest in
Xenopus laevis eggs. Nature 446:1096–1099 is required for exponential growth of the tumor and for
Peters JM (2006) The anaphase promoting complex/cyclosome: its metastasis. It has been proposed that the angiogenic
a machine designed to destroy. Nat Rev Mol Cell Biol switch occurs as a function of a perturbation in the
7:644–656
Rape M, Reddy SK, Kirschner MW (2006) The processivity of
balance between angiogenic stimulators and inhibitors
multiubiquitination by the APC determines the order of sub- in favor of the positive regulators (Folkman 2002;
strate degradation. Cell 124:89–103 Harper and Moses 2006).
Anoxia 29 A
Anisokaryosis
A
Barbara J. Davis
Section of Pathology, Tufts Cummings School of
Veterinary Medicine Biomedical Sciences,
North Grafton, MA, USA

Definition
Angiogenic Switch, Fig. 1 An in vivo tumor model that reli-
ably recapitulates the angiogenic switch (Harper and Moses Anisokaryosis is variation in nuclear size.
2006). Preangiogenic tumor nodules (avascular, A) and angio-
genic tumor nodules (vascular, V) are shown. Scale bar ¼ 1 mm

Cross-References

Cross-References ▶ Cancer Pathology

▶ Regulation of Tumor Angiogenesis

References ANOVA
Folkman J (2002) Role of angiogenesis in tumor growth and
metastasis. Semin Oncol 29(6 Suppl 16):15–18
▶ Analysis of Variance
Harper J, Moses MA (2006) Molecular regulation of tumor
angiogenesis: mechanisms and therapeutic implications. In:
Cancer: cell structures, carcinogens and genomic instability.
Birkhauser, Basel, pp 223–268

Anoxia

Bernhard Schmierer
Anisocytosis Department of Biochemistry, Oxford Centre for
Integrative Systems Biology (OCISB), University of
Barbara J. Davis Oxford, Oxford, UK
Section of Pathology, Tufts Cummings School of
Veterinary Medicine Biomedical Sciences,
North Grafton, MA, USA Definition

Complete absence of oxygen, as opposed to physiolog-


Definition ically normal (normoxic) or unphysiologically low
(hypoxic) oxygen levels.
Anisocytosis – Variation in cell size.

Cross-References Cross-References

▶ Cancer Pathology ▶ Cell Cycle Signaling, Hypoxia


A 30 Antagonist

Antagonist Antigen–Antibody Interface Residue


Prediction
Melissa L. Kemp
The Wallace H. Coulter Department of Biomedical ▶ B Cell Epitope Prediction
Engineering, Georgia Institute of Technology and
Emory University, Atlanta, GA, USA

Antigen-Binding Site
Definition
▶ Paratope
Antagonist is the opposite of an agonist. An antagonist
is a ligand that binds to the cell surface receptor but
does not trigger any response. Instead, once bound to
its target receptor, an antagonist blocks ligand access Antigenic Determinant
to the receptor.
▶ Epitope
▶ Systems Immunology, Data Modeling and
Scripting in R
Antibody-dependant Immune Response

▶ B Cell-mediated Immune Response


Antigenic Drift

▶ Antigenic Drift and Shift


Antibody-mediated Immunity

▶ B Cell-mediated Immune Response


Antigenic Drift and Shift

Antigen Christian Sch€onbach


Department of Bioscience and Bioinformatics, Kyushu
Gajendra Raghava Institute of Technology, Iizuka, Fukuoka, Japan
Bioinformatics Centre, Institute of Microbial
Technology, Chandigarh, Chandigarh, India
Synonyms

Definition Antigenic Drift; Antigenic Shift; Change of Antigenic


Properties by Mutations; Change of Antigenic Proper-
Antigen is a molecule which is recognized as foreign ties by Rearrangement of Viral Genome Segments
by the host immune system. This may be protein, lipid,
or carbohydrate in nature.
Definition

Viral infections result in a constant competition


Antigen–Antibody Binding Site between the virus and host to prevail over each other.
Prediction The virus tries to avoid or defeat the host defense
mechanisms, whereas the host defense tries to elimi-
▶ B Cell Epitope Prediction nate the virus. Two mechanisms enable viruses to
Antimicrobial Peptides 31 A
escape the host immune response: antigenic drift and protect itself from different types of pathogenic bacte-
antigenic shift (Weber and Elliott 2002; Boni 2008). ria. These peptides are gene encoded, evolutionarily
RNA viruses (e.g., influenza A) produce their one conserved, ubiquitous, and produced virtually by all
A
RNA-dependent RNA-polymerase to replicate their classes of living organisms ranging from bacteria (to
RNA. Since the viral RNA polymerase lacks proof- exercise competitive exclusion), plants to higher ver-
reading functions, sequence mutations occur fre- tebrates (used as weapons to fight a variety of infec-
quently. If the mutations result in amino acid changes tious agents). Apart from killing bacteria, these
the antigenic properties of the viral protein may peptides generally show a broad spectrum of activity
change, enabling the virus to escape antibody recogni- against fungi, viruses, and even against cancer cells
tion (antigenic drift) (Boni 2008). Another mechanism and are considered potential therapeutic agents. Cur-
that produces viral protein variants with new antigenic rently, a great deal of interest is shown in antimicrobial
properties is the exchange of viral genome segments peptides, which seem to be promising to overcome the
during mixed viral infections (antigenic shift) (Weber growing problem of antibiotic resistance among
and Elliott 2002). bacteria.

Cross-References Characteristics

▶ Viral Respiratory Tract Infections Properties


Antimicrobial peptides are very diverse with respect
to amino acid sequence and secondary structure
References (Diamond et al. 2009). Such is the diversity of
sequences that even in two closely related species of
Boni MF (2008) Vaccination and antigenic drift in influenza. animals, same peptide sequence is rarely recovered.
Vaccine 26(Suppl 3):C8–C14
Despite this diversity, these peptides possess certain
Weber F, Elliott RM (2002) Antigenic drift, antigenic shift and
interferon antagonists: how bunyaviruses counteract the conserved features like these are generally 12–100
immune system. Virus Res 88(1–2):129–136 residues in length, amphipathic in nature, and are pre-
dominantly positively charged owing to the abundance
of Lysine and Arginine residues. They share certain
properties, such as affinity for the negatively charged
Antigenic Shift phospholipids that are present on the outer surfaces of
the cytoplasmic membrane of many microbial species.
▶ Antigenic Drift and Shift So, on the whole, amphipathicity and net charge are
characteristics understandably conserved among many
antimicrobial peptides.
Based on their structures, antimicrobial peptides are
Antimicrobial Peptides classified into four major classes: b-sheet, a-helical,
loop, and extended peptides (Powers and Hancock
Sneh Lata1 and Gajendra Raghava2 2003). The b-sheet and a-helical classes are most com-
1
Cold Spring Harbor Laboratory, Genome Research mon in nature. Cationic antimicrobial peptides share
Center, Woodbury, NY, USA a common trait: They have the ability to fold into
2
Bioinformatics Centre, Institute of Microbial amphipathic or amphiphilic conformations, often
Technology, Chandigarh, Chandigarh, India induced by interaction with membranes (Powers and
Hancock 2003).

Definition Selectivity of Action


A significant consideration in antimicrobial peptide
Antimicrobial peptides (AMPs) are important compo- action is the degree to which these peptides
nents of the innate immune system, used by the host to distinguishe between microbial and host cells in
A 32 Antimicrobial Peptides

settings of potential toxicity. Evidence continues to antimicrobial peptides act by bacterial membrane dis-
mount in support that it is the fundamental differences ruption, which is absolutely necessary for bacterial
in biochemical and biophysical properties of microbial physiology; so, the chances of developing resistance
versus host cells that provide selective toxicity to anti- are remote. Antimicrobial peptides also possess
microbial peptides. The composition of membrane immunomodulating activity. These peptides have been
provides an important determinant by which antimi- reported to direct chemotaxis, neutralize Lipopolysac-
crobial peptides target microbial versus host mem- charides (LPS), and play role in wound healing too.
branes. Since the surface of the bacterial membranes Several peptides and their derivatives have already
is more negatively charged than mammalian cells, passed clinical trials successfully (Hancock and
antimicrobial peptides will show different affinities Chapple 1999; Levy 2000) and several others are in
toward the bacterial membranes and mammalian cell pipeline as potential therapeutics (Hancock and
membranes. Cationic property of antimicrobial pep- Chapple 1999). Their role as food preservative is well
tides enables them to interact more effectively with established; in addition, they have a number of other
the relatively charged microbial membrane than with biotechnological applications, e.g., in veterinary &
the relatively neutral mammalian membranes. So, Medical science, in transgenic plants (Osusky et al.
charge affinity is likely an important means conferring 2000), in aquaculture, and as aerosol spray for patients
selectivity to antimicrobial peptides. In addition, the of cystic fibrosis, as “chemical condoms” to limit the
presence of cholesterol membrane as stabilizing agent spread of sexually transmitted diseases, can enhance the
in mammalian cells and its absence in bacterial cells potency of existing antibiotics in vivo, probably by
also affect selectivity. Alternatively, access to host facilitating access of antibiotics into the bacterial cell
tissues may be restricted by the localization and regu- (Zasloff 2002). Designing novel peptides with antimi-
lated expression of antimicrobial peptides. crobial activities requires prediction methods to narrow
down the candidate peptides. Computer-aided predic-
Mode of Action tion methods (Lata et al. 2010, 2007) are proving to be
Antimicrobial peptides are believed to have various of great help in this direction.
modes of action to kill bacteria. These cationic pep-
tides first interact with the relatively negatively
charged bacterial membranes, attach to them and Cross-References
their amphipathicity allows them to partition thorough
the membrane and insert into membrane bilayers to ▶ Antimicrobial Peptides
form pores. Pore formation in the cell membrane ▶ Innate Immunity
brings about the leaking of the cytosolic components ▶ Pharmaceutical Toxicology, Application of
also known as cytolysis, thus killing the bacteria. Biosimulation
Alternatively, these may penetrate the cell membrane ▶ Prediction Model Integration
and travel inside the cell to bind intracellular mole-
cules like DNA, RNA, and proteins, thus hampering
the crucial metabolic activities. However, in many
References
cases, the exact mechanism of killing is not known.
Diamond G, Beckloff N et al (2009) The roles of antimicrobial
Applications peptides in innate host defense. Curr Pharm Des
In the past few decades, antimicrobial peptides have 15(21):2377–2392
Hancock RE, Chapple DS (1999) Peptide antibiotics.
emerged as promising solutions to the problem of grow- Antimicrob Agents Chemother 43(6):1317–1323
ing antibiotic resistance among various strains of bacte- Lata S, Sharma BK et al (2007) Analysis and prediction of
ria. Researchers are focusing on antimicrobial peptides, antibacterial peptides. BMC Bioinform 8:263
which also play an important role in innate immunity, as Lata S, Mishra NK et al (2010) AntiBP2: improved version of
antibacterial peptide prediction. BMC Bioinform 11(1):S19
alternative drugs. Their short length and fast & efficient
Levy O (2000) Antimicrobial proteins and peptides of
action against microbes has made them potential candi- blood: templates for novel antimicrobial agents. Blood
dates as peptide drugs. Unlike conventional antibiotics, 96(8):2664–2672
Applied Text Mining 33 A
Osusky M, Zhou G et al (2000) Transgenic plants expressing morphological and biochemical changes that result in
cationic peptide chimeras exhibit broad-spectrum resistance cell death. The changes include blebbing; cell shrink-
to phytopathogens. Nat Biotechnol 18(11):1162–1166
age; nuclear and DNA fragmentation. Apoptosis of
Powers JP, Hancock RE (2003) The relationship between pep- A
tide structure and antibacterial activity. Peptides tumor cells is beneficial as it terminates the growth of
24(11):1681–1691 cells on one hand and decreases the chances of metas-
Zasloff M (2002) Antimicrobial peptides of multicellular organ- tasis of tumors on the other. On the other hand, exces-
isms. Nature 415(6870):389–395
sive and inappropriate cell death is seen in case of
neurological diseases such as Parkinson disease.

Anti-oncogen Cross-References

▶ Tumor Suppressor Gene ▶ Epigenetics, Drug Discovery

Application Ontology
Antiviral Interferons
▶ Cell Cycle Ontology (CCO)
▶ Pro-inflammatory Mediators

Applied Natural Language Processing


Apoptosis
▶ Applied Text Mining
Vani Brahmachari and Shruti Jain
Dr. B. R. Ambedkar Center for Biomedical Research,
University of Delhi, Delhi, India Applied Text Mining

Kevin Bretonnel Cohen1 and Karin Verspoor1,2


1
Synonyms Center for Computational Pharmacology, University
of Colorado, Aurora, CO, USA
2
Programmed cell death Victoria Research Laboratory, National ICT
Australia, University of Melbourne, Melbourne, VIC,
Australia
Definition

The process of death of a cell as an end point of a series Synonyms


of defined ordered biochemical reactions in response to
a signal-like DNA damage or activation of certain Natural Language Processing
genes is called apoptosis or programmed cell death.
Apoptosis can be contrasted with necrosis which is
unordered and unregulated degeneration of cells. Apo- Definition
ptosis does not release toxic by-products unlike necro-
sis which may produce cellular debris that can damage Applied text mining involves the application of methods
the surrounding tissue. Whereas, apoptosis involves from ▶ natural language processing or other text analy-
programmed cell death wherein the cell receives spe- sis methodologies to enhance biological data analysis
cific signals and all cellular debris are eliminated effec- with information extracted from the biological
tively by exocytosis. During apoptosis cells undergo literature.
A 34 Applied Text Mining

Characteristics the database curators and which can safely be ignored.


Document triage systems typically assume that there
There are three primary use cases for applied text are particular kinds of information whose presence in a
mining. One of the prototypical use cases in much document makes it of importance to the model organ-
current work in the field has been the needs of the ism database curator. For example, document triage
model organism database curator. These needs may systems have been built to select documents about
include information retrieval and document classifica- protein–protein interactions (Krallinger et al. 2008a),
tion; gene name detection and mapping to genomic the presence of Gene Ontology terms, embryonic
databases; and information extraction, or the automatic development, tumors, and mutant alleles (Hersh and
extraction of constrained types of facts from text. Voorhees 2008). The performance of document triage
Another common use case since the beginning of systems varies widely depending on the application,
the modern era in genomic text mining has been and it is sometimes difficult to compare between
database construction. Here the typical model of the applications because of the use of different metrics to
task is again information extraction; the goal is to assess their performance.
build systems that can assemble facts for a database Once a document has been triaged and selected for
of very specific types of assertions. the further attention of a model organism database
Finally, text mining has been used as a tool for curator, the next most basic practical application type
interpreting the results of high-throughput assays. In is the gene mention application. Gene mention systems
this use case, users never see the output of text mining are a type of named entity recognition system
itself – rather, text mining becomes an additional (see ▶ Natural Language Processing) whose goal is
knowledge source to use in interpreting a high- to recognize gene and protein names in text. Gene
throughput data set. mention systems can be used as part of an initial triage
step, giving the model organism database curator
Model Organism Database Curation some indication of what gene or genes are being
Model organism databases, such as Mouse Genome studied in a paper. They can also be used to assist in
Informatics, curate information from the published reading – when a gene mention system is used to
literature into databases of the genomes of a specific highlight every occurrence of a gene or protein name
species. The rate of publication in biomedicine is in a text, it helps to guide the reader’s attention to
currently growing exponentially (Hunter and relevant stretches of text. Finally, gene mention can
B̃retonnel Cohen 2006), so it is difficult for the model be helpful in selecting sentences from a text that are
organism databases to manually keep up with the likely to be of high interest to the model organism
flow of available information. For this reason, model database curator. Modern gene mention systems can
organism databases and associated efforts like the perform at about 0.90 F-measure (Smith et al. 2008).
Gene Ontology consortium have been supporters of It should be noted that it can be useful to find
applied text mining research for some years. This other varieties of named entities in texts, such as
has led to model organism database curators having Gene Ontology terms. However, work on locating
a large impact on the direction of research in biomed- other types of terms is mostly still in its infancy,
ical natural language processing, particularly through although considerable success has been reached
their participation in the BioCreative shared tasks with some types, such as mutations (Gregory Caporaso
(Hirschman et al. 2005; Krallinger et al. 2008b). et al. 2007). Other semantic types remain a fruitful
Characterizing the workflow of model organism area for further research.
databases in order to best understand the potential A closely related problem is gene normalization.
insertion points for text mining is a matter of current Here, the task is to map from some mention of a gene
research. However, it seems clear that a number of or protein in text to a specific entity in a database. This
practical application types can be of use to database has been attempted for the Entrez Gene database and for
curators. UniProt. It is made difficult by the facts that the same
The most basic of these is a document triage system. gene name may apply to different organisms and by the
The purpose of a document triage system is to take a set fact that the same gene symbol may map to multiple
of documents and determine which should be read by genes in the same organism.
Applied Text Mining 35 A
Finally, having selected documents for reading, It draws a picture of the network, and for any edge in the
recognized named entities in text, and normalized network, allows the user to display the set of sentences
those entities to specific database identifiers, the final that provide evidence for that particular edge.
A
practical application is to extract information about A more recent approach is the Hanalyzer system
specific relationships in the text (see ▶ Natural Lan- (Leach et al. 2009). The Hanalyzer has been applied
guage Processing). In the simplest example, the model primarily to gene expression data, but is applicable to
organism database curator might want to be shown any data source that can be expressed as a network.
interacting proteins. Much more difficult applications It works by building two networks: one showing linked
exist as well, such as finding relationships between data points in the experimental data (e.g., genes
genes and Gene Ontology terms. In any case, any whose expression is significantly correlated), and one
information extraction application for model organism showing connections between data points that come
database curation is complicated by the gene normal- from preexisting knowledge sources, such as databases
ization problem and normalization of any other entities of interacting proteins, known phenotypes, KEGG
of interest. annotations, and many others. (The Hanalyzer differs
from an application like ChiliBot in utilizing these
Database Construction other sources of data.) Overlaying these two networks
An application since the early days of the modern era can both explain observed patterns in the experimental
in genomic NLP has been database construction. This data, and suggest novel hypotheses for further explo-
is approached as an information extraction task. We ration. Where the experimental data overlaps with the
differ here between model organism database curation, data from the knowledge network, the tool provides an
which can be viewed as a type of database construction explanation for the observed experimental effects and
and the construction of databases that are focused not can help the experimentalist both in understanding
on a specific organism but on a specific type of phe- the experimental data and in the more basic task of
nomenon or class of biological entity. Both rule-based validating that the experimental data accords with
(Blaschke et al. 1999) and machine-learning-based previous knowledge about the phenomenon under
(Craven and Kumlien 1999) approaches have been investigation. When the experimental network does
applied with success. Examples of databases built not line up with the knowledge network, we see what
through text mining include MuTeXt (Horn et al. the new findings of the experiment are, and are
2004), a database of mutations in G protein–coupled suggested with hypotheses for further experimental
receptors, and RLIMS-P (Hu et al. 2005), a database of validation. One such application of the tool led to the
enzymes, their substrates, and the associated phos- discovery of the involvement of four genes in tongue
phorylation sites. Numerous challenges exist for the development in the mouse that were not previously
construction of such systems, including dealing with known to be involved in this process. The finding was
speculation and negation. experimentally verified and has furthered our under-
standing of genetic factors that may lead to the devel-
High-Throughput Data Analysis opment of cleft palate.
Unlike the previous use cases, when text mining is One of the important data sources in the Hanalyzer
used as part of a high-throughput data analysis system, is text mining output. Figure 1 shows an example of
users never see the actual output of the text mining a link in the knowledge network that is obtained through
system itself. Early work in this area included using text mining. It shows that the two genes in question are
text mining results to improve BLAST searches both found in the synapse. As can be seen, the PubMed
(Chang et al. 2001) and to analyze gene expression identifier of the relevant document is given.
microarray data (Blaschke et al. 2001). These applica- The Hanalyzer uses a variety of forms of applied
tions are well-reviewed in (Ng 2006). One impressive natural language processing. In its simplest form, gene
application is the ChiliBot system (Chen and Sharp names are analyzed for co-occurrence metrics, using
2004). This application is intended for analyzing a weighting approach that negates many of the draw-
gene lists from microarray experiments. Given a list backs of simple co-occurrence. The Hanalyzer also
of genes, it gathers information from the literature to applies a more sophisticated information extraction
build a network of associations between those genes. approach, using the OpenDMAP semantic parser
A 36 Applied Text Mining

Applied Text Mining, Fig. 1 Screen capture of the Hanalyzer system, showing the nature of an edge in the graph and the pointer to
the PubMed abstract that provides evidence for it

(Hunter et al. 2008). The net effect is a substantial uncovering bugs in programs than running those pro-
increase in the amount of knowledge available to the grams against huge test collections, and that in addition,
system just from preexisting databases. using such test suites is considerably more efficient
because it can have a much faster run time (Bretonnel
Software Testing and Quality Assurance in Cohen et al. 2008).
Applied Text Mining
The fact that applied text mining systems are intended to Acknowledgment The authors would like to thank Hannah
be applied to research in human health and disease Tipney for providing the Hanalyzer figure.
places a special burden on system builders to ensure
that their systems are built to the highest, industrial-
strength standards of software quality. Testing software Cross-References
whose input is language presents special challenges that
are not found in other types of software. Here it has been ▶ Natural Language Processing
found helpful to combine techniques from software
testing with techniques from the field of descriptive
linguistics, or the science of describing unknown lan-
References
guages. These two fields turn out to have much in
common, both in theoretical background and in terms Blaschke C, Andrade MA, Ouzounis C, Valencia A (1999)
of practical techniques. In this approach, the text mining Automatic extraction of biological information from scien-
application is modeled as a language that we do not tific text: protein–protein interactions. In: Intelligent Systems
for Molecular Biology, AAAI Press, Menlo Park, pp 60–67
know anything about. One finding of this research has
Blaschke CJ, Oliveros C, Valencia A (2001) Mining functional
been that using structured test suites containing information associated with expression arrays. Funct Integr
manufactured data can be a more effective tool for Genomics 1(4):256–268
Archetypes 37 A
Bretonnel Cohen K, Baumgartner WA Jr, Hunter L (2008) Soft- Dai HJ, Liu F, Chen YF, Sun CJ, Katrenko S, Adriaans P,
ware testing and the naturally occurring data assumption in Blaschke C, Torres R, Neves M, Nakov P, Divoli A,
natural language processing. In: Software Engineering, Test- Mana-Lopez M, Mata J, Wilber WJ (2008) Overview of
ing, and Quality Assurance for Natural Language Processing, BioCreative II gene mention recognition. Genome Biol A
Association for Computational Linguistics, Columbus, 9(Suppl 2): S2
pp 23–30
Chang JT, Raychaudhri S, Altman RB (2001) Including
biological literature improves homology search. In: Pacific
Symposium on Biocomputing, Mauna Lani, HI, pp 374–383
Chen H, Sharp BM (2004) Content-rich biological network (Approx.) Boundary Condition
constructed by mining pubmed abstracts. BMC Bioinformat-
ics 5:1471–2105
▶ Constraint
Craven M, Kumlien J (1999) Constructing biological knowledge
bases by extracting information from text sources. In: Intel-
ligent Systems for Molecular Biology, Heidelberg, pp 77–86
Gregory Caporaso J, Baumgartner WA Jr, Randolph DA,
Bretonnel Cohen K, Hunter L (2007) Mutationfinder:
a high-performance system for extracting point mutation
ARC
mentions from text. Bioinformatics 23:1862–1865
Hersh W, Voorhees E (2008) TREC genomics special issue ▶ Mediator
overview. Inf Retrieval 12:1
Hirschman L, Yeh A, Blaschke C, Valencia A (2005) Overview
of BioCreAtIvE: critical assessment of information extrac-
tion for biology. BMC Bioinformatics 6:S1
Horn F, Lau AL, Cohen FE (2004) Automated extraction of Archetypes
mutation data from the literature: application of MuteXt to
G protein-coupled receptors and nuclear hormone receptors. Jesus Bisbal
Bioinformatics 20(4):557–568
Hu ZZ, Narayanaswami M, Ravikumar KE, Vijay-Shanker K, Departament de Tecnologies de la Informació i les
Wu CH (2005) Literature mining and database annotation of Comunicacions, Universitat Pompeu Fabra,
protein phosphorylation using a rule-based system. Bioinfor- Barcelona, Spain
matics 21(11):2759–2765
Hunter L, B̃retonnel Cohen K (2006) Biomedical language
processing: what’s beyond PubMed? Mol Cell 21:589–594
Hunter L, Lu Z, Firby J, Baumgartner WA Jr, Johnson HL, Ogren Synonyms
PV, Bretonnel Cohen K (2008) OpenDMAP: an open-source,
ontology-driven concept analysis engine, with applications to Templates
capturing knowledge regarding protein transport, protein
interactions and cell-specific gene expression. BMC Bioin-
formatics 9(78) http://www.biomedcentral.com/1471-2105/9/
78 (doi:10.1186/1471-2105- 9-78) Definition
Krallinger M, Leitner F, Rodriguez-Penagos C, Valencia A
(2008a) Overview of the protein–protein interaction
annotation extraction task of BioCreative II. Genome Biol Archetypes are the formal representation of informa-
9(suppl 2):S4 tion concepts used in a given domain of application.
Krallinger M, Morgan A, Smith L, Leitner F, Tanabe L, An advanced approach to the modeling and man-
Wilbur J, Hirschman L, Valencia A (2008b) The BioCreative agement of information separates domain information
II – critical assessment for information extraction in biology
challenge. Genome Biol 9:1 models into three independent components: data,
Leach SM, Tipney H, Feng W, Baumgartner WA Jr, Kasliwal P, archetypes (i.e., structure), and semantics. Archetypes
Schuyler RP, Williams T, Spritz RA, Hunter L (2009) are used to define the different sets of related data
Biomedical discovery acceleration, with applications to cra- attributes, which together provide the necessary con-
niofacial development. PLoS Comput Biol 5(3):e1000215
Ng S-K (2006) Integrating text mining with data mining. In: text to adequately interpret the information which is to
Ananiadou S, McNaught J (eds) Text mining for biology and be stored, communicated, or shared. External ontol-
biomedicine. Artech House, Norwood ogies or terminological resources are optionally
Smith L, Tanabe LK, Johnson R, Kuo CJ, Chung IF, Hsu CN, referred to from the archetype definitions in order to
Lin YS, Klinger R, Friedrich CM, Ganchev K, Torii M,
Liu HF, Haddow B, Struble CA, Povinelli RJ, Vlachos A, annotate those definitions with semantically rich
Baumgartner WA Jr, Hunter L, Carpenter B, Tsai RTH, resources, facilitating interoperability and integration.
A 38 Area under the ROC Curve

This approach originated in the medical informatics Definition


community to standardize medical information
communication, but it is of general applicability. The area under a ▶ receiver operating characteristic
It advocates a principled design methodology to (ROC) curve, abbreviated as AUC, is a single scalar
information modeling. value that measures the overall performance of a
Archetypes are at the core of this methodology, binary classifier (Hanley and McNeil 1982).
which is often referred to as two-level modeling para- The AUC value is within the range [0.5–1.0], where
digm (Bisbal and Berry 2011). Its first level, the refer- the minimum value represents the performance of a
ence model, is a predefined set of very abstract classes random classifier and the maximum value would cor-
and aggregation rules that provide the flexibility to respond to a perfect classifier (e.g., with a classification
model any information concept. Its second level, the error rate equivalent to zero).
archetypes, add semantics and constraints to the poten- The AUC is a robust overall measure to evaluate the
tial instances of the reference model, in order to ensure performance of score classifiers because its calculation
that the desired semantics are captured by the clinical relies on the complete ROC curve and thus involves all
concepts defined by those archetypes. possible classification thresholds. The AUC is
The two-level modeling paradigm is being stan- typically calculated by adding successive trapezoid
dardized by the major bodies in medical informatics areas below the ROC curve. Figure 1 shows the ROC
(Europe’s CEN 13606, and USA’s HL7 RIM v3). curves for two score classifiers A and B. In this
While traditional information modeling methodolo- example, classifier A has a larger AUC value than
gies produce large and detailed domain-specific classifier B.
models, the two-level modeling paradigm does not The AUC has the important statistical property that
specify a predefined set of attributes that can be it represents the probability that a randomly chosen
represented. It therefore provides a higher level of positive instance will be ranked higher than
flexibility which is expected to shield information a randomly chosen negative instance. Therefore,
systems from software maintenance and evolutionary according to this measure, classifier A would have
costs. a better classification performance than classifier B.
The ROC curves of two score classifiers are shown.
The AUC for classifier B is shown as a dark gray filled
References area. The AUC for classifier A corresponds to the light
gray filled area plus the dark gray filled area.
Bisbal J, Berry D (2011) An analysis framework for electronic
health record systems: interoperation and collaboration in
shared healthcare. Methods Inf Med 50(2):180–189
1.0

A
0.8 B
True positive rate

Area under the ROC Curve 0.6

Francisco Melo
0.4
Pontificia Universidad Católica de Chile,
Santiago, Chile
Millennium Institute on Immunology and 0.2

Immunotherapy, Santiago, Chile


0.0
0.0 0.2 0.4 0.6 0.8 1.0
False positive rate
Synonyms
Area under the ROC Curve, Fig. 1 The AUC is used as an
AUC overall measure to compare classifiers
Artificial Evolution 39 A
References synonymously with ▶ artificial life, but in principle
the concepts are overlapping but different.
Hanley JA, McNeil BJ (1982) The meaning and use of the area
under a receiver operating characteristic (ROC) curve. A
Radiology 143:29–36
Characteristics

In Computer Science
Within computer science, artificial evolution is com-
Argonaute-mediated Cleavage monly implemented in terms of an evolutionary or
genetic algorithm (▶ Genetic Algorithms, ▶ Evolution
▶ Target Cleavage Programming) (GA). A GA is a search procedure that
implements the three essential elements of Darwinian
evolution - replication, mutation, and selection. Typi-
cally, the procedure acts on a population of candidate
Arrays solutions, which are scored according to a fitness
criterion and selected to be replicated according to
▶ DNA Microarrays a formula that takes the fitness score into account
(see Fig. 1).
This formula may be as simple as selecting
a percentage of the highest fitness individuals (elite
Arrow Ontology selection) or be a probability that is proportional to
fitness (roulette wheel selection) or a mixture of those.
▶ Edge Ontology The fitness function is usually determined by the user
so that the maximum of that function is achieved for
the optimal solution. In cases where it is not known
which function is maximized by the optimal solution to
Artificial Evolution a problem, the construction of an appropriate fitness
function can be a difficult research problem. After
Christoph Adami being selected for replication, individuals are mutated
Department of Microbiology and Molecular Genetics, and recombined with a given rate to create new
Michigan State University, East Lansing, MI, USA candidate solutions that have inherited the features of
their parents but potentially carry new features not
previously present in the population. Typically, GAs
Synonyms use a variety of mutation mechanisms to create
variation. Evolutionary algorithms are alternatives to
Simulated evolution more traditional random or heuristic search algorithms
and work best in large search spaces where the best
solution consists of partial solutions that are also fit.
Definition
In Engineering
Artificial evolution refers to any procedure that uses Artificial evolution can be used to solve engineering
the mechanism of Darwinian evolution to generate problems by applying evolutionary computation tech-
a product. While this definition encompasses the evo- niques to hardware rather than software. For example,
lution of organic life forms by means of artificial it is possible to design electronic circuits in
selection (breeding of animals, plants, or microorgan- reconfigurable hardware (Thompson 1996) that have
isms), artificial evolution commonly refers to the novel properties that exploit the material properties of
instantiation of evolution within a nonbiological the substrate. In another example of unconventional
medium. Artificial evolution is sometimes used computing (▶ Unconventional Computation), a team
A 40 Artificial Evolution

Generation 01: WDLTMNLT DTJBKWIRZREZLMQCO P


Generation 02: WDLTMNLT DTJBSWIRZREZLMQCO P
Generation 10: MDLDMNLS ITJISWHRZREZ MECS P
Generation 20: MELDINLS IT ISWPRKE Z WECSEL
Generation 30: METHINGS IT ISWLIKE B WECSEL
Generation 40: METHINKS IT IS LIKE I WEASEL
Generation 43: METHINKS IT IS LIKE A WEASEL

Artificial Evolution, Fig. 2 Selected generations on the line of


descent obtained by running Dawkins’ “weasel” program

and controlled by a neural architecture evolve in a


three-dimensional world with simulated physics.
This work highlighted the importance of complex
Artificial Evolution, Fig.1 Typical flow diagram of events in environments in the evolution of complexity.
a genetic algorithm

Digital Life
evolved a circuit with the goal of producing an oscil- Computer viruses (or, more generally, malware)
latory signal. One of the resulting circuits evolved evolve by artificial means when the malware creators
a radio receiver that captured the clock signal from react to a changed fitness landscape (e.g., new security
a computer on a nearby desk (Bird and Layzell 2002). countermeasures) by adapting their virus to these
changes. In this instantiation of artificial evolution,
In Evolutionary Biology replication is usually a key feature of the malware
Artificial evolution is used in evolutionary biology to program, while the fitness function is implicit to the
illustrate the process of evolution itself by instantiating environment within which the virus seeks to replicate,
evolution algorithmically, usually within a computer rather than specified externally. Variation is usually
(Adami 1998). One of the earliest uses of computers to directed by the malicious users but can sometimes be
study the process of evolution is Richard Dawkins’ autonomous. The concept of an evolving computer
thought experiment that demonstrates the process of program as an instantiation of evolution (as opposed
evolution in an artificial medium by evolving the target to a simulation of evolution) gave rise to the field of
phrase “Methinks it is like a weasel” from a randomly digital life, where self-replicating computer programs
generated sequence of letters drawn from an alphabet are executed on virtual (simulated) central processing
of 28 (Dawkins 1986). In this example, sequences are units (CPUs). By encasing programs in a virtual world,
copied 100 times (instantiating heredity), while 1 in 20 it is possible to design the language they are written in
characters is changed (instantiating the process of (their genetic code), to control the program’s rate of
mutation). The resulting sequences are compared to mutation, and the complexity of the world they live in.
the target phrase, and the sequence that is closest to The first working digital life system called “tierra”
the target is chosen to be copied again (instantiating was created by Tom Ray (Ray 1992). In tierra,
selection), see Fig. 2. self-replicating computer programs live in simulated
In the same book, Dawkins introduces the core memory and compete for CPU time and memory
“biomorph” program that evolves tree-like structures space. Because the ▶ fitness of any particular ▶ digital
whose appearance is determined by nine developmen- organism in tierra is solely determined by its ability to
tal “genes” that determine the tree’s appearance. The contribute offspring to future generations, there is no
genes that encode the tree are mutated and recombined a priori optimal program. The digital life system
at random and selected by the user that, thus, guides the “Avida” (see Figs. 3 and 4) was developed at the
process of evolution to generate desired shapes. California Institute of Technology in order to study
Another well-known example of artificial evolution fundamental aspects of evolutionary biology, such as
that probes the coevolution of morphology and the evolution of complexity (Adami and Brown 1994;
behavior is due to Karl Sims (Sims 1994), who studied Ofria et al. 1998). Digital life research can provide
how creatures built from interconnected blocks a system’s view of evolutionary biology because
Artificial Evolution 41 A
Artificial Evolution,
Fig. 3 The CPU acting on
Memory Input
a self-replicating program in
the Avida environment. In this
nand IP Output A
version, four threads (marked
nop-A OP1? Environment
“Heads”) are executing the
nop-B CPU
avidian code simultaneously,
akin to the transcription of nop-D
genetic code by four DNA h-search FLOW Registers
polymerases h-copy AX:FF0265DC
if-label READ
BX:00000100
nop-C
h-divide CX:1864CDFE
h-copy
mov-head WRITE
nop-B
Heads

Stacks

Artificial Evolution,
Fig. 4 An avidian genome in
the act of self-replication.
Letters a–z stand for one of the
26 possible instructions. The
solid black line follows the
forward execution, while red
denotes a backward jump

digital organisms are best understood within the study the process of evolution itself but can also
selective environment they evolved in, which includes be adapted to study the evolution of behavior and
the population of programs itself. intelligence as well as software design (Beckmann
Digital life research has given rise to a number of et al. 2008).
discoveries in evolutionary biology that have been
validated later in biological organisms, such as the
“survival-of-the-flattest” effect or the importance of Cross-References
▶ epistasis in the evolution of ▶ complexity
(Adami 2006). This type of work illustrates the ▶ Artificial Life
idea that the difference between evolution in the bio- ▶ Complexity
chemical realm and evolution in the computational ▶ Digital Organism
domain lies only in the substrate that carries ▶ Epistasis
the information that evolves, but that the processes ▶ Evolution Programming
that underlie the dynamics (the algorithmic rules) ▶ Fitness
are the same. Because of this generality, digital life ▶ Genetic Algorithms
systems such as Avida have been used not only to ▶ Unconventional Computation
A 42 Artificial Immune Systems

References behavioral biology, as well as aspects of computer


science, mathematics, engineering, and robotics. Pre-
Adami C (1998) Introduction to artificial life. Springer, decessor disciplines that have informed Artificial Life
New York
also include theoretical and mathematical biology and
Adami C (2006) Digital genetics: unravelling the genetic basis of
evolution. Nat Rev Genet 7:109–18 cybernetics. In turn, Artificial Life can be one of the
Adami C, Brown CT (1994) Evolutionary learning in the 2D predecessors of Systems Biology.
artificial life system “Avida”. In: Brooks R, Maess P (eds) 4th The central aim of Artificial Life is to identify,
international conference on the synthesis and simulation of
understand, and use the fundamental principles that
living systems. MIT Press, Boston, pp 377–381
Beckmann BE, Grabowski LM, McKinley PK, Ofria C (2008) underpin and organize biological systems. Special
Applying digital evolution to the design of self-adaptive emphasis is given to finding simple principles or rules
software. In: IEEE symposium on artificial life ‘09, IEEE that give rise to the complexity that is characteristic of
Computational Intelligence Society, Nashville, TN,
biological systems, as complexity is regarded as an
pp 100–107
Bird J, Layzell P (2002) The evolved radio and its implications emergent phenomenon (▶ Emergence). A central
for modelling the evolution of novel sensors. In: Evolution- methodological pattern of Artificial Life research is
ary computation, 2002. CEC ’02, IEEE Press, pp 1836–1841 to study dynamics or phenomena of biological systems
Dawkins R (1986) The blind watchmaker. Oxford University
by synthesizing or simulating them in entirely different
Press, Oxford, UK
Ofria C, Brown CT, Adami C (1998) The avida user’s manual. media. These media include software (e.g., using com-
In: Adami C (ed) Introduction to artificial life. Springer, putational or mathematical models to study evolution),
New York, pp 297–350 hardware (e.g., using robots to study behavior), and
Ray TS (1992) An approach to the synthesis of life. In: Langton
even wetware (i.e., building systems from scratch
CG, Taylor C, Farmer JD, Rasmusse S (eds) Second inter-
disciplinary workshop on the synthesis and simulation of using techniques from chemistry and molecular
living systems. Addison Wesley, Santa Fe, p 371 biology).
Sims K (1994) Evolving virtual creatures. In: Glassners A (ed) The Artificial Life approach is analogous to that
Proceedings of the 21st annual conference on computer
of Artificial Intelligence, which aims to engineer
graphics and interactive techniques. ACM Press, pp 15–22
Thompson A (1996) Silicon evolution. In: Koza JR, Goldberg intelligent machines and comprises various techniques
DE, Fogel DB, Riolo RL (eds) Genetic programming 1996: that are based on principles gleaned from existing
proceedings of the 1st annual conference (GP96). MIT Press, natural systems (e.g., the brain). The name “Artificial
pp 444–445
Life” should be understood as a reflection of this
analogy.
Interest in the synthesis of life-like systems may
Artificial Immune Systems derive from two distinct motivations. On the one
hand, bio-inspired mechanisms are adopted to engineer
▶ Lymphocyte Dynamics and Repertoires, Modeling systems that exhibit desirable properties of living sys-
tems. Examples for such properties include robustness
to unpredictable environments, learning, or other
forms of adaptation. On the other hand, such systems
Artificial Life may be built to be amenable to observation or experi-
mentation that is infeasible with the original object of
Jan T. Kim bioscientific inquiry. Computational simulations are
School of Computing Sciences, University of East paradigmatic of this approach, as they allow continu-
Anglia, Norwich, Norfolk, UK ous, nondestructive observation of dynamical pro-
cesses where such observation would be impossible
for objects in molecular biology.
Definition The biosciences typically focus on existing living
systems, and understanding of biological system there-
Artificial Life (Langton 1992b; Bedau et al. 2000) is fore tends to be descriptive rather than predictive. The
a highly interdisciplinary field of research which com- scope of Artificial Life goes beyond the existing “life
prises areas of the biological sciences including genet- as we know it,” and expressly includes “life as it
ics, evolutionary biology, ecology, neurobiology, and could be.” As an example, a traditional approach to
Artificial Life 43 A
computational modeling in biology might suggest that in capturing phenomena such as the emergence of
parameters, such as a mutation rate, should be based on genetic parasitism.
empirical measurement or observation, whereas it is Substantial research interest has focused on the
A
a typical Artificial Life approach to scan a range of major transitions in evolution (Smith and Szathmáry
settings for such a parameter. In this regard, the field 1995), and in particular in the transition from chemical
draws from the formal sciences within its interdisci- to biological evolution which took place upon the
plinary spectrum and focuses on defining and under- emergence of the first living systems. Hypercycles
standing state spaces, rather than on describing (Eigen and Schuster 1978), which may have formed
individual states. Artificial Life and Systems Biology in an RNA world (Gesteland et al. 2006), provide an
share this methodological feature, as well as the under- explanation of the emergence of self-replication, bio-
lying vision of developing predictive elements within logical evolution. They are therefore considered as
the biosciences. a key step in the transition to biological evolution.
Hypercycles are based on artificial chemistry-like
systems in which molecules catalyze synthesis of other
Characteristics molecules. The product of such synthesis may itself
have catalytic activity. A hypercycle is a set molecules
Evolution in which each molecule catalyzes synthesis of the next
Evolution has generated the living systems that exist one, with the last catalyzing synthesis of the first mol-
today. Biological evolution is assumed to be open ecule. Synthesis of molecules in a hypercycle will
ended: There is no limit to the diversity and complexity accelerate more quickly than that of molecules that
of the systems that it generates, and the space of evo- do not participate in a hypercycle. However, assuming
lutionary search is to be shaped and extended by the a homogeneous reaction mixture, hypercycles are vul-
evolving systems themselves. ▶ Artificial evolution nerable to molecules that are parasitic in the sense that
has been used both as a means to study evolution and they receive catalytic support but do not provide any
to explore possible uses of evolutionary mechanisms such support back to the hypercycle. However, as a key
for tackling various computing challenges. Effects of Artificial Life contribution, Boerlijst and Hogewag
evolving systems on the search space are a key (1992) have shown that this vulnerability does not
prerequisite for evolvability in generalized biology necessarily exist in a nonhomogeneous reactor.
(▶ Evolvability, Generalized Biology). While software continues to be a predominant
Computer models of evolution are often structurally medium for studying evolution, advances in molecular
similar to ▶ evolutionary algorithms. However, Artifi- biotechnology have enabled instantiations of evolu-
cial Life studies tend to place emphasis on modeling tionary systems in wetware. Examples include directed
biological features of evolution, such as open- and other forms of in vitro evolution (Gesteland et al.
endedness, whereas evolutionary algorithms are often 2006), and a study of mutation rate evolution during
developed for optimization purposes. In such applica- 10,000 generations of Escherichia coli (Sniegowski
tions, the fitness function and its domain are externally et al. 1997), illustrating that mutual exchange between
defined and static, i.e., the fitness value of an individual experimental wetware approaches and theoretical
does not depend on other individuals. In contrast to software-based investigation is a common pattern of
this, Artificial Life models of evolution often comprise Artificial Life and Systems Biology.
an intrinsic fitness concept (Bedau and Packard 1992),
where fitness of an individual depends on its interac- Gene Regulation and Gene Regulatory Networks
tions with others or on an environment that is shaped ▶ Gene regulation and the dynamics of gene expres-
by the actions of individuals. sion have attracted interest in theoretical biology and
In many models, fitness is an implicit or emergent cybernetics following the discovery of gene regulatory
property. For example, in the Tierra (Ray 1992) and mechanisms which led to the operon model introduced
Avida (Ofria et al. 1998) systems, programs on a by Jacob and Monod. Further abstraction from bio-
virtual machine are subject to Darwinian evolution. chemical details and numerical characteristics of indi-
These systems cannot straightforwardly be applied vidual regulatory mechanisms resulted in various
for optimization, but they have been highly successful Artificial Life models of gene regulation and
A 44 Artificial Life

▶ regulatory networks. In the NK model of Boolean three-dimensional morphological structures, ▶ neural


networks (Kauffman and Weinberger 1989), gene networks, to control their behavior. and an evolution-
activity is discretized into “off ” and “on” states, and ary component to generate creatures that achieve
each of the N genes is regulated by a Boolean function given tasks. The more recent Framsticks platform
taking its inputs from K regulators. Boolean networks (Adamatzky and Komosinski 2005) provides a similar
have subsequently been generalized to ▶ Probabilistic level of physical realism.
Boolean Networks (Zhang et al. 2007), which are used ▶ Lindenmayer systems (Prusinkiewicz and
in Systems Biology to model gene ▶ regulatory Lindenmayer 1990) use a small set of simple, local
networks. growth rules to simulate plant morphogenesis, thus
Numerous approaches to computationally capture making them a useful concept to study mechanisms
gene expression dynamics have been developed in of emergence of morphological complexity. While
Systems Biology and its forerunning interdisciplinary traditional Lindenmayer systems operate on trees,
areas. Integration of regulatory networks into models extensions that operate on general graphs have been
that also address other biological processes has been developed by Kniemeyer et al. (2004) to enable simu-
a major focus of interest in Artificial Life. As an lation, e.g., of multicellular structures that are not trees
example, the suggestion (by developmental biologist in the graph-theoretic sense. Other extensions of
Lewis Wolpert) that a regulatory network could create Lindenmayer systems enable their use in studying the
positional information and, on this basis, a striped interplay of gene expression dynamics and growth
“French flag” pattern has been taken up as (Kim 2001) and simulating evolution of morphology
a challenge by Artificial Life researchers, and it has and morphogenesis (Jacob 1996).
resulted in numerous computational models of mor-
phogenetic pattern formation, e.g., by Flann et al. Robots and Other Hardwares
(2005); Knabe et al. (2010). While “French flag” Robots and other hardware systems are part of the
models typically use a static spatial structure, such as physical world and therefore subject to physical forces.
a lattice, other models and systems include a dynamic This can be an essential advantage over software-based
morphogenetic structure which may feed back into virtual systems in which much efforts may be required
gene expression levels (Marnellos and Mjolsness to simulate physical aspects. Therefore, robots and
1998; Kim 2001). Also, evolution of regulatory net- other physical systems are used in Artificial Life as
works that organize morphogenesis has been studied a medium that is complementary to software-based
(Kim 2000). In summary, Artificial Life research has approaches.
provided insights into various aspects of gene regula- Solving technical problems by drawing inspirations
tory networks, such as their attractor structure and their from the biological object of scientific inquiry or inspi-
evolvability. These systems level properties continue ration frequently is a prominent motivation underpin-
to be relevant to Systems Biology as well. ning robot based research. As an example, the
subsumption architecture for robots (Steels and Brooks
Development and Morphogenesis 1995) aims to modularize and distribute the control
Development of patterns and morphological structures logic within the physical structure of a robot in a bio-
has inspired formal models such as reaction-diffussion inspired way. It also provides a demonstration how
systems (Meinhardt 1982) and Lindenmayer systems biological systems may achieve behavior that is adap-
(Prusinkiewicz and Lindenmayer 1990). Many of these tive to their environment without relying on an explicit
approaches were adopted and further developed representation of the environment. Another important
within Artificial Life. Kumar and Bentley (2003) concept in this work is embodiment, i.e., the idea that
provide a selection of computational approaches to cognitive or behavioral capacities are intimately linked
morphogenesis. with the physical structure of an agent.
Models which represent morphologies at a level of In evolutionary robotics (Nolfi and Floreano 2000),
physical realism often include further aspects to evolutionary algorithms are applied to build controllers
achieve biological plausibility or realism beyond of a robot’s behavior. In other lines of research, evolu-
visual appearance. For example, the model by tionary concepts are applied in robotics, e.g., to evolve
Sims (1994) comprises virtual creatures with controllers, robot structures, or to jointly evolve both.
Artificial Life 45 A
In ▶ unconventional computation, evolution is also of considerable complexity. Langton (1984) showed
invoked to design other types of computer hardware. that by omitting the requirement of universal compu-
tation, a much simpler lattice automaton enabling self-
A
Swarms and Multi-Agent Approaches replicating structure (sometimes called Langton loops)
Many biological systems consist of multiple units, e.g., is feasible. This raised the possibility that once self-
tissues consist of cells. While the individual units, such replication emerges, e.g., as explained by the theory of
as single neurons, are relatively simple, complexity the hypercycle, computing capabilities could gradually
results from their collective operation within evolve.
a system. Systems of this type represent emergence Cellular automata continue to be important to Arti-
of complexity from simple units and principles, and ficial Life. Langton (1992a) has outlined key condi-
therefore it is a focus of Artificial Life research. tions for complexity and universal computation in
As a classical example, boids (“bird-oids”) cellular automata. Wuensche (2011) has extensively
(Reynolds 1987; Carlson 2000 (Nov) are software investigated attractor properties of cellular automata.
agents follow simple rules of avoiding others, aligning He has also highlighted the close structural similarity
with others, and tending toward the swarm’s center of between cellular automata and Boolean network, indi-
mass for cohesion. These simple behavioral principles cating the potential of discrete dynamics approaches
result in complex and visually realistic swarm-like for regulatory network analysis.
behavior. In Systems Biology, multi-agent systems Langton (1992a) characterizes living systems as
are adopted for modeling intracellular processes and being determined by a dynamics of information, rather
tissues (Kiran et al. 2010). than by the dynamics of energy. Consequently,
a stream of biological modeling in which information
Self-Organization theory (▶ Information Theory and Toponomics)
Systems that are capable of maintaining their own (Cover and Thomas 1991) plays a prominent role
structure are called self-organizing or has developed within Artificial Life (▶ Information
autopoietic. Cells, with their ability to synthesize all in Biological Modeling). Formal approaches on
their constituent components, are paradigmatic of a behavioral level consider agents which are equipped
autopoiesis. Within Artificial Life, the concept of with sensors and actuators. Sensors provide the agent
▶ computational autopoiesis has been studied. with information from its environment (e.g., the direc-
Artificial chemistries (Dittrich et al. 2001) are sys- tion of a target object), and actuators enable the agent
tems comprised of a set of chemical substances and to manipulate its environment (e.g., by moving
a set of rules that describe reactions in terms of educts around). Using such a scenario, relevant information
and products. They provide a framework for studying provides insights into the amount of information
the emergence of ▶ organizations. Many computa- required to complete a given task, and empowerment
tional models of biochemical networks that are used provides a means of identifying advantageous strate-
in Systems Biology are structurally closely related gies when there is no known reward function.
to artificial chemistries and, thus, amenable to
analysis from a perspective of chemical organization.
Hypercycles (see above) are examples of chemical
organizations, and their emergence can be studied in Cross-References
artificial chemistry systems.
▶ Artificial Evolution
Computation and Information Theory ▶ Computational Autopoiesis
Information theory and theory of computation have ▶ Emergence
had a strong influence on Artificial Life. Realizing ▶ Evolutionary Algorithms
that self-replication is a fundamental capacity of living ▶ Evolvability, Generalized Biology
systems, John von Neumann designed a logical ▶ Gene Regulation
machine on a lattice automaton which is capable of ▶ Gene Regulatory Networks
self-replication and of universal computation. Com- ▶ Identification of Gene Regulatory Networks, Neural
bining these two key capabilities results in a system Networks
A 46 Artificial Life

▶ Information in Biological Modeling Kim JT (2001) Transsys: a generic formalism for modelling
▶ Information Theory and Toponomics regulatory networks in morphogenesis. In: Kelemen J,
Sosik P (eds) Advances in artificial life (ECAL 2001),
▶ Lindenmayer System vol 2159, Lecture notes in artificial intelligence. Springer
▶ Organization Verlag, Berlin/Heidelberg, pp 242–251
▶ Probabilistic Boolean Networks Kiran M, Richmond P, Holcombe M, Chin LS, Worth D,
▶ Reaction-Diffusion-Advection Equation Greenough C (2010) FLAME: simulating large populations
of agents on paralled hardware architectures. In: Proceedings
▶ Regulatory Networks of the 9th international conference on autonomous agents and
▶ Unconventional Computation multiagent systems: volume 1, vol 1. International Founda-
tion for Autonomous Agents and Multiagent Systems, Rich-
land, SC, pp 1633–1636, http://dl.acm.org/citation.cfm?
id¼1838206. 1838517. ISBN 978-0-9826571-1-9
References Knabe JF, Schilstra MJ, Nehaniv C (2010) Evolution and mor-
phogenesis of differentiated multicellular organisms: auton-
Adamatzky A, Komosinski M (eds) (2005) Framsticks: omously generated diffusion gradients for positional
a platform for modelings, simulating and evolving 3D crea- information. In: Bullock S, Noble J, Watson RA, Bedau
tures. Springer, Berlin/Heidelberg, pp 37–66 MA (eds) Artificial life XI. MIT Press, Cambridge, MA,
Bedau MA, Packard NH (1992) Measurement of pp 321–328
evolutionary activity, teleology, and life. In: Langton CG, Kniemeyer O, Buck-Sorlin GH, Kurth W (2004) A graph gram-
Taylor C, Farmer JD, Rasmussen S (eds) Artificial life II, mar approach to artificial life. Artif Life 10(4):413–431.
vol X of Santa Fe institute studies in the sciences of doi:10.1162/1064546041766451, ISSN 10654–5462
complexity, proceedings. Addison-Wesley, Redwood City, Kumar S, Bentley PJ (eds) (2003) On growth, from and com-
pp 431–461 puters. Elsevier, Amsterdam. ISBN 0-12-428765-4
Bedau MA, McCaskill JS, Packard NH, Rasmussen S, Adami C, Langton CG (1984) Self-reproduction in cellular automata.
Green DG, Ikegami T, Kaneko K, Ray TS (2000) Open Physica D 22:120–149
problems in artificial life. Artif Life 6:363–376 Langton CG (1992a) Life at the edge of chaos. In: Langton CG,
Boerlijst M, Hogewag P (1992) Self-structuring and selection: Taylor C, Farmer JD, Rasmussen S (eds) Artificial life II,
spiral waves as a substrate for prebiotic evolution. In: volume X of Santa Fe institute studies in the sciences of
Langton CG, Taylor C, Farmer JD, Rasmussen S (eds) Arti- complexity, proceedings. Addison-Wesley, Redwood City,
ficial life II, vol X of Santa Fe institute studies in the sciences CA, pp 41–91
of complexity, proceedings. Addison-Wesley, Redwood Langton CG (1992b) Preface. In: Langton CG, Taylor C,
City, pp 431–461 Farmer JD, Rasmussen S (eds) Artificial life II, volume X
Carlson S (2000) Artificial life. Boids of a feather flock together. of Santa Fe institute studies in the sciences of complexity,
Sci Am 283:112–114 proceedings. Addison-Wesley, Redwood City, CA,
Cover TM, Thomas JA (1991) Elements of information theory. pp xiii–xviii
Wiley, New York Marnellos G, Mjolsness E (1998) A gene network model of
Dittrich P, Ziegler J, Banzhaf W (2001) Artificial chemistries — resource allocation to growth and reproduction. In:
a review. Artif Life 7:225–275 Adami C, Belew RK, Kitanp H, Taylor C (eds) Artificial
Eigen M, Schuster P (1978) The hypercycle. a principle of life VI. MIT Press, Cambridge, MA, pp 433–437
natural self-organization. Part a: emergence of the Meinhardt H (1982) Models of biological pattern formation.
hypercycle. Naturwissenschaften 64:541–565 Academic, London
Flann N, Hu J, Bansal M, Patel V, Podgorski G (2005) Biological Nolfi S, Floreano D (2000) Evolutionary robotics. MIT Press,
development of cell patterns: characterizing the space of cell Cambridge, MA. ISBN 978-0-262-14070-6
chemistry genetic regulatory networks. In: Capcarrere M, Ofria C, Brown TC, Adami C (1998) Avida user’s manual. In:
Freitas AA, Bentley PJ, Johnson CG, Timmis J (eds) Adami C (ed) Introduction to artificial life. TELOS/Springer-
Advances in artificial life (ECAL 2005), vol 3630, Lecture Verlag, Berlin/Heidelberg
notes in artificial intelligence. Springer Verlag, Berlin/ Prusinkiewicz P, Lindenmayer A (1990) The algorithmic beauty
Heidelberg, pp 57–66 of plants. Springer, New York
Gesteland RF, Cech TR, Atkins JF (eds) (2006) The RNA Ray TS (1992) An approach to the synthesis of life. In:
World. Cold Spring Harbor Laboratory Press, Cold Spring Langton CG, Taylor C, Farmer JD, Rasmussen S (eds) Arti-
Harbor. ISBN 0-87969-739-3 ficial life II, volume X of Santa Fe institute studies in
Jacob C (1996) Evolution programs evolved. In: Voigt H-M, the sciences of complexity, proceedings. Addison-Wesley,
Ebeling W, Rechenberg I, Schwefel H-P (eds) PPSN-IV. Redwood City, CA, pp 371–408
Springer-Verlag, Berlin, pp 42–51 Reynolds CW (1987) Flocks, herds, and schools: a distributed
Kauffman SA, Weinberger EW (1989) The NK model of rugged behavioral model. Comput Graph 21:25–34, http://www.cs.
fitness landscapes and its application to maturation of the toronto.edu/dt/siggraph97-course/cwr87/
immune response. J Theor Biol 141:211–245 Sims K (1994) Evolving 3D morphology and behavior by com-
Kim JT (2000) LindEvol: artificial models for natural plant petition. In: Brooks RA, Maes P (eds) Artificial life IV.
evolution. K€unstliche Intelligenz 1(2000):26–32 Cambridge, MA, MIT Press, pp 28–39
Asymmetric Cell Division 47 A
Smith JM, Szathmáry E (1995) The major transitions in evolu- The importance of an association rule is often char-
tion. Oxford University Press, Oxford, UK. ISBN acterized with two measures:
978–0198502944
Support measures the fraction of all rows in the data-
Sniegowski PD, Gerrish PJ, Lenski RE (1997) Evolution of high A
mutation rates in experimental populations of e. coli. Nature base that satisfy both, body and head of the rule.
387:703–705 Rules with higher support are more important.
Steels L, Brooks RA (eds) (1995) The artificial life route Confidence measures the fraction of the rows that
to artificial intelligence: building embodied situated
agents. Lawrence Erlbaum, New Haven, CT. ISBN 978-0- satisfy the body of the rule, which also satisfy the
8058-1518-4 head of the rule. Rules with high confidence have
Wuensche A (2011) Esploring discrete dynamics. Luniver Press, a higher correlation between the properties
Frome, UK described in the head and the properties described
Zhang SQ, Ching WK, Ng MK, Akutsu T (2007) Simulation
study in probabilistic boolean network models for genetic in the body.
regulatory networks. Int J Data Mining Bioinformatics If the above rule has a support of 10% and a confidence
1:217–240 of 80%, this means that 10% of all people buy bread,
butter, milk, and cheese together, and that 80% of all
people that buy bread and butter also buy milk and
cheese.

Asf1
Cross-References
▶ CIA/Asf1
▶ Rule
▶ Rule-Based Methods

Assay

▶ Biological Assay
Asymmetric Cell Division

Heiko Enderling
Association Rule Center of Cancer Systems Biology, St. Elizabeth’s
Medical Center - CBR 115D, Tufts University School
Johannes F€
urnkranz of Medicine, Boston, MA, USA
Technical University of Darmstadt, Darmstadt,
Germany
Definition

Definition An asymmetric cell division yields two daughter cells


with different cellular fate, i.e., a cell of type A gives
An association rule is a ▶ rule where certain properties rise to a daughter cell of type A and a daughter cell
of the data in the body of the rule are related to other of type B.
properties in the head of the rule.
A typical application example for association rules
are product associations. For example, the rule A

Bread; butter ! milk; cheese A

B
specifies that people that buy bread and butter also tend
to buy milk and cheese.
A 48 Asymmetry of the Heart, Development

Cross-References

▶ Cancer Stem Cell Kinetics

HH 11
Asymmetry of the Heart, Development TBX18
WT1 PITX2
Philippe Huneman
Institut d’Histoire et de Philosophie (IHPST), des
PITX2 SNAI1 NODAL
Sciences et des Techniques, Université Paris 1
Panthéon-Sorbonne, Paris, France FGF8 NODAL CFC HH 4-6

BMP4 SHH

Definition

A developmental explanation appears as a cascade of + -


events involving signaling, production of morphogens, + -
HH 3
+ -
expression, and repression of gene products. Modeling
-
those cascades and identifying the morphogens as well +
as their propagation dynamics and the cells’ receptiv- R L
ity mechanisms yields an explanation of morphogene-
sis. The embryonic apparition of an asymmetry in the
Asymmetry of the Heart, Development, Fig. 1 The devel-
heart results from such cascade, where each moment of opmental cascade yielding the LR asymmetry of the cardiac
the process involves specific genes and molecules, and venuous pole
where important structures are often transitory,
a crucial fact first emphasized by advocates of epigen-
esis such as Caspar Friedrich Wolff or Karl Ernst
von Baer. The development of the asymmetry of the heart
At the venous pole of the embryonic vertebrate provides a clear example of our knowledge of specific
heart, a transitory structure, the proepicardium (PE) morphogenesis, its involving various levels of biolog-
produces the epicardium, coronary vasculature, and ical hierarchies, and its proper timing.
fibroblasts. In the chicken embryo, the PE displays
left-right asymmetry and develops only on the
right side, while on the left only a vestigial PE is Cross-References
formed, which subsequently gets lost by apoptosis
(Schlueter and Brand 2009). This asymmetry results ▶ Explanation, Developmental
from a cascade of signaling and gene expression
events which involves both sides of the PE, originally
symmetrical (Fig. 1). Transitory structures (here PE)
proves therefore to be crucial, since the symmetrical References
form yields the asymmetrical pattern. When the
Schlueter J, Brand T (2009) A right-sided pathway involving
asymmetric heart is built, the PE disappears by FGF8/Snai1 controls asymmetric development of the
being integrated into the structures built (Takano proepicardium in the chick embryo. PNAS
et al. 2007). 106(18):7485–7490
ATAC 49 A
Takano K, Ito Y, Obata S, Oinuma T, Komazaki O, Nakamura
M, Asashima M (2007) Heart formation and left-right asym- ATAC
metry in separated right and left embryos of a newt. Int J Dev
Biol 51:265–272 A
Tetsuro Kokubo
Department of Supramolecular Biology, Graduate
School of Nanobioscience, Yokohama City
University, Yokohama, Kanagawa, Japan
Asynchrony

Xiaojuan Sun and Jinzhi Lei Synonyms


Zhou Pei-Yuan Center for Applied Mathematics,
Tsinghua University of Beijing, Beijing, China Ada-Two-A-Containing

Definition
Definition
In mammals, yeast SAGA has diverged into two
Asynchrony, in the general meaning, is the state of related complexes: SAGA, which like yeast SALSA/
not being synchronized (▶ Synchronization). In sig- SLIK lacks Spt8, and ATAC (Ada-Two-A-Containing)
naling network, asynchrony refers to the situation in (Spedale et al. 2012). Human SAGA contains 18 sub-
which each unit in the network shows independent units (catalytic subunit Gcn5 or Pcaf, Taf5L, Taf6L,
responses to external signals, and no correlation Taf9, Taf10, Taf12, Ada1, Ada2b, Ada3, Spt3, Spt7,
among the units. Spt20, Sgf11/ATX7L3, Sgf29, Sgf73/ATX7, Usp22,
ENY2, and TRRAP), whereas human ATAC contains
12 subunits (catalytic subunit Gcn5 or Pcaf, catalytic
subunit Atac2, Ada2a, Ada3, Sgf29, Atac1, Hcf1,
Cross-References Wdr5, NC2b, Yeats2, MBIP, CHRAC17). SAGA and
ATAC can both contain either Gcn5 or Pcaf
▶ Synchronization (paralogues) as a catalytic subunit; thus, there are two

SAGA ATAC

Atac2 Atac2

Ada3 Ada2b Ada3 Ada2b Ada3 Ada2a Ada3 Ada2a

Sgf29 Gcn5 Sgf29 Pcaf Sgf29 Gcn5 Sgf29 Pcaf

histone acetyltransferase (HAT)


SAGA-specific Ada2 protein (Ada2b)
ATAC-specific Ada2 protein (Ada2a)

ATAC, Fig. 1 Comparison of human SAGA and ATAC


A 50 ATL

closely related but distinct complexes in each class


(Fig. 1). SAGA and ATAC share three subunits ATP-binding Cassette B1 Transporter
(Gcn5 or Pcaf, Ada3, and Sgf29) but contain distinct
Ada2 proteins (Ada2b in SAGA and Ada2a in ATAC). Vani Brahmachari and Shruti Jain
These two HAT complexes are recruited to different Dr. B. R. Ambedkar Center for Biomedical Research,
genomic loci, where they regulate different sets of University of Delhi, Delhi, India
genes. Furthermore, SAGA is recruited mainly to the
promoter regions of target genes, whereas ATAC is
recruited equally to promoter and enhancer regions Synonyms
(Krebs et al. 2011). ATAC binds to enhancers that are
not bound by p300, another human HAT. ATAC con- ABCB1; MDR1
tains two HAT subunits, Gcn5/Pcaf and Atac2, which
confer different substrate specificities: Gcn5/Pcaf acet-
ylates histone H3K9 and H3K14, whereas Atac2 acety- Definition
lates histone H4K16. The HAT activity of Gcn5 toward
mononucleosomes is greatly stimulated when it is in It belongs to the family of ATP-binding cassette (ABC)
complex with Ada2b-Ada3, but not when it is in com- transporters. It is a well-characterized human ABC
plex with Ada2a-Ada3. Thus, the HAT modules in transporter that was the first of its kind implicated in
SAGA and ATAC must play different roles in transcrip- multidrug resistance of cancer cells. Its normal expres-
tional regulation. Consistent with this, only SAGA con- sion is to prevent the uptake of some lipophilic drugs
tains the histone H2B deubiquitination (DUB) module into the brain and other key organs. It is over-expressed
(Usp22-Sgf11-Sgf73-ENY2), which appears to regulate in cancer cells resulting in drugs being pumped out of
the expression of some genes at the elongation stage. the cells faster than they can enter. This leads to a lower
concentration of the drug in the cell and reduces the
effectiveness of the drugs in killing cancer cells.
Cross-References

▶ Mechanisms of Transcriptional Activation and


Cross-References
Repression
▶ Epigenetics, Drug Discovery
References

Krebs AR, Karmodiya K, Lindahl-Allen M, Struhl K, Tora


L (2011) SAGA and ATAC histone acetyl transferase com-
plexes regulate distinct sets of genes and ATAC defines a class
ATP-dependent Nucleosome-
of p300-independent enhancers. Mol Cell 44(3):410–423 Remodeling Factors
Spedale G, Timmers HT, Pijnappel WW (2012) ATAC-king the
complexity of SAGA during evolution. Genes Dev Toshiya Senda1 and Naruhiko Adachi2
26(6):527–541 1
Biomedicinal Information Research Centre (BIRC),
National Institute of Advanced Industrial Science and
Technology (AIST), Tokyo, Japan
ATL 2
Structure-guided Drug Development Project, JBIC
Research Institute, Japan Biological Informatics
▶ Adult T-Cell Leukemia Consortium, Tokyo, Japan

ATLL Synonyms

▶ Adult T-Cell Leukemia Nucleosome-remodeling factor; Remodeling factor


Attractor 51 A
Definition Cross-References

ATP-dependent nucleosome-remodeling factors, ▶ Histone Post-translational Modification to


A
which belong to the ATP-dependent DNA/RNA Nucleosome Structural Change
helicase superfamily, promote the positional adjust-
ment of nucleosomes with ATP hydrolysis and are
involved in cellular processes such as gene expres- References
sion, genome duplication, repair of damaged DNA,
and chromosome recombination (Eberharter and Eberharter A, Becker PB (2004) ATP-dependent nucleosome
remodeling: factors and functions. J Cell Sci 117:3707–3711
Becker 2004). Several complexes with ATP-
Yamada K, Frouws TD, Angst B, Fitzgerald DJ, DeLuca C,
dependent nucleosome-remodeling activity have so Schimmele K, Sargent DF, Richmond TJ (2011) Structure
far been purified; these include SWI/SNF, RSC, and mechanism of the chromatin remodelling factor ISW1a.
NURF, ACF, CHRAC, NuRD/NRD, and INO80 Nature 472:448–453
complexes. Biochemical and biological studies
have shown that ATP-dependent nucleosome-
remodeling factors are large multi-subunit com- Attracter
plexes, which are composed of an ATPase subunit
and other associated subunits. The ATPase typically ▶ Attractor
belongs to the ATP-dependent DNA/RNA helicase
family. ATP-dependent nucleosome-remodeling
factors can be categorized by the ATPase subunit Attraction
into four families, the SWI/SNF, ISW1, CHD, and
INO80 families. The ATPase subunit of SWI/SNF ▶ Attractor
and RSC complexes belong to the SWI/SNF family.
Since the ATPase subunit of this family contains
a bromodomain, discovery of a functional relation-
ship with acetylated histones has been expected. Attractive Force
NURF, ACF, and CHRAC complexes possess an
ATPase subunit of the ISWI family. The C-terminal ▶ Attractor
region of the ATPase subunit in the ISWI family has
a SANT-like domain, which has DNA-binding activ-
ity. Indeed, the recent revelation of the crystal struc-
Attractiveness
ture of ISW1a showed that the SANT domain
interacts with DNA and suggested the structure– ▶ Attractor
function relationship of ATP-dependent nucleo-
some-remodeling factors (Yamada et al. 2011). The
NuRD/NRD complex can be categorized into the
Mi-2/CHD family. The ATPase subunit of this fam- Attractor
ily contains a chromodomain, which is known to
recognize methylated Lys residues in histones. Fuyan Hu
A functional relationship between ATP-dependent Institute of Systems Biology, Shanghai University,
nucleosome-remodeling factors and histone methyl- Shanghai, China
ation has been suggested. The INO80 complex
belongs to the INO80 family. The ATPase subunit
of the INO80 family contains an insertion of approx- Synonyms
imately 300 amino acid residues within their ATP-
dependent DNA/RNA helicase domain. Functions of Attracter; Attraction; Attractive force; Attractiveness;
the insertion are unknown. Magnet
A 52 Attribute Selection

Definition Biomedical Corpus (CALBC) the named entities from


different annotated corpora are harmonized to generate
In the study of dynamical systems, an attractor is the final corpus.
a “set,” “curve,” or “space” that is used to describe
a system toward which the system tends to evolve
regardless of the starting conditions of the system. It Introduction
is also known as a “limit set.” There are five known
types of attractors: point attractors, periodic point Biomedical text mining (TM) is seeking benchmarks
attractors, periodic attractors, strange attractors, and to assess existing TM solutions. A number of chal-
spatial attractors. lenges have been proposed to achieve this goal:
BioCreAtive I and II, JNLPBA and others (Hirschman
et al. 2005; Krallinger et al. 2008; Kim et al. 2004,
References 2009; LLL 05). In all these approaches, the organizers
deliver a set of manually annotated documents and ask
Ruelle D. Elements of differentiable dynamics and bifurcation the challenge participants (CPs) to reproduce the
theory. Boston: Academic; 1989. ISBN 0-12-601710-7.
results with their automatic methods.
The first CALBC Challenge is similar in the sense
that again an annotated corpus is provided to the chal-
lenge participants (CPs) for the reproduction of the
Attribute Selection annotations, but the first CALBC Challenge is differ-
ent with regards to the following modifications: (1) the
▶ Feature Selection annotated corpus has been generated automatically and
not manually (Silver Standard Corpus, SSC-I) and
(2) the size of the SSC-I is significantly bigger than
the corpora mentioned produced for the other chal-
AUC lenges (i.e., 150,000 annotated Medline abstracts for
training and testing). The automatic annotation of the
▶ Area under the ROC Curve corpus has been achieved by automatic harmonization
of the contributions from different automatic annota-
tion solutions (Rebholz-Schuhmann et al. 2010). Over-
all, the proposed approach of the CALBC project can
Automata cover a larger number of annotations due to the fact
that the annotations are produced automatically and
▶ Modeling Formalisms, Lymphocyte Dynamics and harmonized with automatic means.
Repertoires

Generation of the First and Second CALBC


Silver Standard Corpus (SSC-I and SSC-II)
Automated Corpus Generation (CALBC)
All initial project partners of the CALBC project (PP)
Dietrich Rebholz-Schuhmann annotated the corpus of 150,000 Medline abstracts
European Bioinformatics Institute, Hinxton, UK with their annotation solutions. All annotations were
delivered in the IeXML format and concept normali-
zation should make use of standard resources such as
Definition UMLS, UniProtKb, and EntrezGene, or should follow
the UMLS semantic type system (Rebholz-Schuhmann
A biomedical corpus contains a large number of bio- et al. 2006; Bodenreider and McCray 2003;
medical entities that have to be annotated for corpus Bodenreider 2004; The Universal Protein Resource
generation. In the Collaborative Annotation of a Large (UniProt) 2009; Maglott et al. 2007). The pair-wise
Automated Corpus Generation (CALBC) 53 A
alignment is based on the methods described in solutions lead to a bigger number of annotations that
Rebholz-Schuhmann et al. (2010a, b). All contribu- are captured. This achievement is counterbalanced by
tions from CPs were assessed against the SSC-I by the number of agreements that have to be available at
A
applying exact matching, nested matching, and cosine minimum to accept an annotation.
similarity matching with a 0.98 and 0.9 cosine similar- Second, for the same type of entity, e.g., “insulin,”
ity score (Rebholz-Schuhmann et al. 2010c). All sub- different annotation solutions use a different tag, e.g.,
missions from all participants have been evaluated PRGE instead of CHED and vice versa. The harmoni-
and the contributions with the best F-measure perfor- zation of the corpus can account for this, but will not
mance against the SSC-I have been selected for produce this type of polysemous annotation through-
the harmonization into the SSC-II, if the participants out the whole corpus, since not all mentions have been
did not train their solution on the training data. consistently annotated with the two different groups
over the whole corpus.
Third, inflections of terms, e.g., tumor vs. tumors
Results and “bear” versus “bearing,” lead to disagreements
between the different annotation solutions. In the first
All 4 PPs from the CALBC project and in addition, case, the inflectional variability could be resolved and
12 challenge participants (CPs) contributed annotated would lead to higher agreement, in the second case
data sets for evaluation against the SSC-I. CPs could assumptions about the usage of the verb or noun have
ignore the training data and deliver the annotations to be made to resolve conflicts.
from their annotation system, or could train
a machine-learning approach on the provided pre-
annotated data. In general, the performances of the Conclusions
annotation solutions were lower for the CHED and
PRGE in comparison to the identification of DISO The SSC-I delivers a large set of annotations
and SPE. One explanation is that the terminological (1,121,705) for a large number of documents
resources for DISO and SPE are better standardized as (100,000 Medline abstracts). The annotations cover
well as the mention of DISO and SPE in the literature. four different semantic groups and are sufficiently
The best performance over all semantic groups homogeneous to be reproduced with a trained classifier
were achieved from two annotation solutions that leading to an average F-measure of 85%.
have been trained on the SSC-I. The performances of Benchmarking the annotation solutions against the
the participants’ solutions were again measured SSC-II leads to better performance for the CPs’ anno-
against the SSC-II and with findings similar to the tation solutions in comparison to the SSC-I.
measurements against the SSC-I. For PRGE, a drop
in recall was noticed for the PPs solutions in combina-
tion with an increase in recall for the CPs solutions. Cross-References

▶ BioCreative II.5 and the FEBS Letters Experiment


Discussion on Structured Digital Abstracts
▶ BioNLP Shared Task
The manual analysis of the SSC-I and the SSC-II is ▶ Named Entity Recognition
ongoing work. Due to the size of the corpus, it requires ▶ Natural Language Processing
special IT solutions to oversee the regularities and ▶ Text Corpora, Relationship Extraction
irregularities in the corpus. A selection of irregularities
result from the methods applied. First, a number of
annotations are not captured (“false negatives,” FN,
References
reduced recall) if none of the solutions identifies the
entities. An increasing number of contributing annota- Bodenreider O (2004) The unified medical language system
tion solutions reduces the risk that annotations are (UMLS): integrating biomedical terminology. Nucleic
missed: A bigger number of included annotation Acids Res 32(Database issue):D267–D270
A 54 Automated Reasoning

Bodenreider O, McCray A (2003) Exploring semantic groups A formal language is designed in which a problem’s
through visual approaches. J Biomed Inform 36(6):414–432 assumptions and conclusion can be written down and
Hirschman L, Yeh A, Blaschke C, Valencia A (2005) Overview
of BioCreAtIvE: critical assessment of information extrac- algorithms are provided to solve the problem with
tion for biology. BMC Bioinform 6(1):S1 a computer in an efficient way.
Kim J.D, Ohta T, Tsuruoka Y, Tateishi Y, Collier N (2004)
Introduction to the bio-entity recognition task at JNLPBA.
In: Proceedings of the JNLPBA-04, Geneva, Switzerland,
pp 70–75 Characteristics
Kim J.D, Ohta T, Pyysalo S, Kano Y, Tsujii J (2009) Overview
of BioNLP’09 shared task on event extraction. In: Proceed- People employ reasoning informally by taking a set of
ings of the workshop on BioNLP: shared task, Colorado, premises and somehow arriving at a conclusion, that is,
pp 1–9
Krallinger M, Morgan A, Smith L, Leitner F, Ta-nabe L, Wilbur J, it is entailed by the premises (▶ Deduction), arriving at
Hirschman L, Valencia A (2008) Evaluation of textmining a hypothesis (▶ Abduction), or generalizing from facts
systems for biology: overview of the second BioCreAtIvE to an assumption (▶ Induction). Mathematicians and
community challenge. Genome Biol 9(2):S1 computer scientists developed ways to capture this
LLL’05 challenge. http://www.cs.york.ac.uk/aig/lll/lll05/
Maglott D, Ostell J, Pruitt KD, Tatusova T (2007) Entrez gene: formally with logic languages to represent the knowl-
gene-centered information at NCBI. Nucleic Acids Res 35 edge and rules that may be applied to the axioms so that
(Database issue):D26–D31 one can construct a formal proof that the conclusion
Proceedings of the First CALBC Workshop. http://www.ebi.ac. can be derived from the premises. This can be done by
uk/Rebholz-srv/CALBC/docs/1stworkshopProceeding.pdf
Rebholz-Schuhmann D, Kirsch H, Nenadic G (2006) IeXML: hand (Solow 2005) for small theories, but this does not
towards a framework for interoperability of text processing scale up when one has, say, 80 or more axioms even
modules to improve annotation of semantic types in biomed- though there are much larger theories that require
ical text. In: Proceedings of BioLINK, ISMB 2006, a formal analysis, such as checking that the theory
Fortaleza, Brazil
Rebholz-Schuhmann D, Jimeno Yepes AJ, van Mulligen E.M, can indeed have a model and thus does not contradict
Kang N, Kors J, Milward D, Corbett P, Buyko E, itself. To this end, much work has gone into automat-
Tomanek K, Beisswanger E, Hahn U (2010) The CALBC ing reasoning.
silver standard corpus for biomedical named entities: a study The remainder of this section introduces briefly
in harmonizing the contributions from four independent
named entity taggers. In: Proceedings of the LREC several of the many purposes and usages of automated
(to appear) reasoning, the principal mechanisms (being deduction
Rebholz-Schuhmann D et al (2010b) Assessment of NER solu- and to a lesser extent abduction and induction), its
tions against the first and second CALBC Silver Standard limitations, and types of automated reasoners.
Corpus. Semantic Mining in Biomedicine, Hinxton
Rebholz-Schuhmann D, Jimeno Yepes AJ, Van Mulligen EM,
Kang N, Kors J, Milward D, Corbett P, Buyko E, Purposes
Beisswanger E, Hahn U (2010c) CALBC silver standard Automated reasoning, and deduction in particular, has
corpus. J Bioinform Comput Biol 8(1):163–179 found applications in “everyday life.” A notable exam-
The Universal Protein Resource (UniProt) (2009) Nucleic Acids
Res 37(Database):D169–D174 ple is hardware and (critical) software verification,
which gained prominence after Intel had shipped its
Pentium processors with a floating point unit error in
1994 that lost the company about $500 million. Since
Automated Reasoning then, chips are routinely automatically proven to func-
tion correctly according to specification before taken
C. Maria Keet into production. A different scenario is scheduling
KRDB Research Centre, Free University of problems at schools to find an optimal combination
Bozen-Bolzano, Bolzano, Italy of course, lecturer, and timing for the class or
degree program, which used to take a summer to do
manually but can now be computed in a fraction of
Definition it using constraint programming. Much closer to sys-
tems biology is the demonstration of discovering
Automated reasoning is the approach of implementing (more precisely: deriving) novel knowledge about pro-
the ability to make inferences on a computing system. tein phosphatases by Wolstencroft and coauthors
Automated Reasoning 55 A
(in Baker and Cheung 2008). They represented the some results); if it is unsound, then false conclusions
knowledge about the subject domain of protein phos- can be derived from true premises, which his even
phatases in humans in a formal ▶ bio-ontology and more undesirable (Hedman 2004; Portoraro 2010).
A
classified the enzymes of both human and the fungus An example is included in ▶ Proof, proving the
Aspergillus fumigatus using an automated reasoner, validity of a formula (the class of the problem) in
which showed that (1) the reasoner was as good as propositional logic (the formal language) using tableau
human expert classification, (2) it identified additional reasoning (the way how to compute the solution) with
p-domains (an aspect of the phosphatases) so that the the Tree Proof Generator (the automated reasoner).
human-originated classification could be refined, and
(3) it identified a novel type of calcineurin phosphatase Deduction
like in other pathogenic fungi. The fact that one can use Deduction is a way to ascertain if a theory
an automated reasoner (in this case: deduction, using T represented in a logic language entails an axiom a
a Description Logics knowledge base) as a viable that is not explicitly asserted in T (written as T |¼ a),
method in science is an encouragement to explore that is, a can be derived from the premises
such avenues further. though repeated application of deduction rules for
instance, a theory that states that “each arachnid has
Basic Idea as part 8 legs” and “each tarantula is an arachnid,” then
Essential to automated reasoning are: one can deduce – it is entailed in the theory – that “each
1. The choice of the class of problems the software tarantula has as part 8 legs.” An example is included in
program has to solve, such as checking the consis- ▶ deduction, which formally demonstrates that
tency of a theory (i.e., there are no contradictions) a formula is entailed in a theory T using said rules.
or computing a classification hierarchy of concepts Thus, strictly speaking, a deduction does not reveal
subsuming one another based on the properties novel knowledge but only that what was already
represented in the logical theory represented implicitly in the theory. Nevertheless,
2. The formal language in which to represent the with large theories, it is often difficult to oversee all
problems, which may have more or less features to implications of the represented knowledge, and, hence,
represent the subject domain knowledge, such as the deductions may be perceived as novel from
cardinality constraints (e.g., a member of the arach- a domain expert perspective, such as with the example
nids has as part exactly eight legs), probabilities, or about the protein phosphatases. This is in contrast to
temporal knowledge (e.g., that a butterfly is ▶ abduction and ▶ induction, where the reasoner
a transformation of a caterpillar) “guesses” knowledge that is not already entailed in
3. The way how the program has to compute the solu- the theory.
tion, such as using natural deduction or resolution
4. How to do this efficiently, be this achieved through Abduction
constraining the language into one of low complex- Compared to deduction, there is less permeation of
ity or optimizing the algorithms to compute the automated reasoning for abduction. From a scientist’s
solution or both perspective, automation of abduction may seem
Concerning the first item, with a problem being, for appealing because it would help one generate
example, “find out if my theory is consistent,” then the a hypothesis based on the facts put into the reasoner
problem’s assumptions are the axioms in the logical (Aliseda 2004). Practically, it has been used for, for
theory and the problem’s conclusion that is computed instance, fault detection: given the knowledge about
by the automated reasoner is a “yes” or a “no” a system and the observed defective state, find the
(provided the language in which the assumptions are likely fault in the system.
represented is decidable and thus guaranteed to termi- To formally capture theory with assumptions and
nate with an answer). With respect to how this is done facts and find the conclusion, several approaches have
(item 3), two properties are important for the calculus been proposed, each with their specific application
used: soundness and completeness (T |- j if and only if areas, for instance, sequent calculus, belief revision,
M |¼ j). If it is incomplete, then there exist entail- probabilistic abductive reasoning, and ▶ Bayesian
ments that cannot be computed (hence, “missing” networks.
A 56 Automated Reasoning

Induction that is, they are languages such that the corresponding
Logic Induction allows one to arrive at a conclusion reasoning services are guaranteed to terminate with an
that actually may be false even though the premises are answer. Description Logics form the basis of most of
true. The premises provide a degree of support so as to the Web Ontology Languages OWL and OWL 2 and
infer a as an explanation of b. Such a “degree” can be are gaining increasing importance in the semantic web
based on probabilities (a statistical syllogism) or applications area. Giving up expressiveness, however,
analogy. For instance, we have a premise that does lead to criticism from the modelers’ community,
“the proportion of bacteria that acquire genes through as a computationally nice language may not have the
horizontal gene transfer is 95%” and the fact that features deemed necessary to represent the subject
“Staphylococcus aureus is a bacterium,” then we domain adequately.
induce that the probability that S. aureus acquires
genes through horizontal gene transfer is 95%. Induc- Tools
tion by analogy is weaker version of reasoning, in There are many tools for automated reasoning, which
particular in logic-based systems, and yields very dif- differ in which language they accept, the reasoning
ferent answers than deduction. For instance, let us services they provide, and, with that, the purpose they
encode that some instance, say, Tibbles is a cat and aim to serve.
we know that all cats have the properties of having There are, among others, generic first- and higher-
a tail, four legs, and are furry. When we encode that order logic theorem provers (e.g., Prover9, MACE4,
another animal, Tib, who happens to have four legs and Vampire, HOL4); SAT solvers that compute if there is
is also furry, then by inductive reasoning by analogy, a model for the formal theory (e.g., GRASP, Satz);
we conclude that Tib is also a cat, even though in constraint satisfaction programming for solving, for
reality it may well be an instance of cheetah. On the example, scheduling problems and reasoning with soft
other hand, by deductive reasoning, Tib will not be constraints (e.g., Eclipse); DL reasoners that are used for
classified as being an instance of cat but may be an reasoning over OWL ontologies using deductive reason-
instance of a superclass of cats (e.g., still within the ing to compute satisfiability, consistency, and perform
suborder Feliformia), provided that the superclass has taxonomic and instance classification (e.g., Fact++,
declared that all instances have four legs and are furry. RacerPro, Hermit, CEL, QuOnto); and inductive logic
Given that humans do perform such reasoning, there programming tools (e.g., PROGOL and Aleph).
are attempts to mimic this process in software applica-
tions, most notably in the area of machine learning and
inductive logic programming (Lavrac and Dzeroski Cross-References
1994). The principal approach with inductive logic
programming is to take as input positive examples + ▶ Abduction
negative examples + background knowledge and then ▶ Bio-Ontologies
derive a hypothesized logic program that entails all the ▶ Classification
positive and none of the negative examples. ▶ Closed World Assumption
▶ Deduction
Limitations ▶ Fuzzy Logic
While many advances have been made in specific ▶ Induction
application areas, the main limitation of the ▶ Knowledge Representation
implementations is due to the computational complex- ▶ Open World Assumption
ity of the chosen representation language and the ▶ Proof, Logic
desired automated reasoning services. This is being
addressed by implementations of optimizations of the
algorithms or by limiting the expressiveness of the References
language, or both. One family of logics that focus
Aliseda A (2004) Logics in scientific discovery. Found Sci
principally on “computationally well-behaved” lan- 9:339–363
guages is Description Logics, which are decidable Baader F, Calvanese D, McGuinness DL, Nardi D,
fragments of first order logic (Baader et al. 2008); Patel-Schneider PF (eds) (2008) The description logics
Automated Term Recognition 57 A
handbook: theory and applications, 2nd edn. Cambridge the linguistic realization of specialized concepts in
University Press, Cambridge, UK the biology domain. As the main purpose of terms is
Baker CJO, Cheung H (eds) (2008) Semantic web:
to classify scientific knowledge, they are used as
revolutionizing knowledge discovery in the life sciences. A
Springer, New York a means of scientific communication to convey biolog-
Hedman S (2004) A first course in logic – an introduction to ical concepts. Once extracted, they can be used to
model theory, proof theory, computability, and complexity. construct conceptual networks of knowledge, and
Oxford University Press, Oxford, UK
Lavrac N, Dzeroski S (1994) Inductive logic programming: may be organized into hierarchies denoting different
techniques and applications. Ellis Horwood, New York types of relationships, such as is-a, part-of, etc., in
Portoraro F (2010) Automated reasoning. In: Zalta E (ed) addition to domain-specific relations such as inhibits,
Stanford encyclopedia of philosophy. Stable URL, http:// induces, etc. These relationships can be subsequently
plato.stanford.edu/entries/reasoning-automated/
Solow D (2005) How to read and do proofs: an introduction used to develop ontologies (▶ Ontologies). Biological
to mathematical thought processes, 4th edn. Wiley, terms are complex, since they include a vast number of
Hoboken synonyms and variants, such as acronyms. As an
example, consider the term ERK2, which denotes
a protein (▶ Named Entity) in Homo sapiens. Within
the literature, a large number of synonymous term
Automated Term Recognition variants of ERK2 can occur, including the
following: MAPK1, MAP kinase 1, ERT1,
Sophia Ananiadou extracellular signal-regulated kinase 2, ERK-2,
National Centre for Text Mining, School of Mitogen-activated protein kinase 2, MAP kinase 2,
Computer Science, University of Manchester, MAPK2 (http://www.uniprot.org/uniprot/P28482).
Manchester, UK Addressing terminological variability is crucial to
ensure the accuracy of text-based search systems in
biology, to allow integration of different resources, and
Synonyms to facilitate the development of terminologies
and ontologies (▶ Ontologies). Existing terminologi-
Term extraction; Terminology management cal and ontological resources do not cover all instances
of terms found in the literature. Since most of these
resources are manually curated, keeping track of all
Definition new terms and variants that appear in the literature is
a virtually impossible task. Term extraction tools can
Automatic term recognition (ATR) refers to the be a great asset to the curators of biological databases
automatic extraction of technical terms from and terminological resources, in that they can signifi-
domain-specific texts. It additionally encompasses cantly reduce the burden of keeping the resources
the assignment of semantic categories of the extracted up-to-date. The challenges faced by such tools thus
terms and mapping of them to concepts contained include not only the automatic identification of
within terminological or ontological resources. biological terms, but also the identification of
synonyms belonging to the same concept, and the
disambiguation of terms that belong to different
Characteristics concepts. In order to address all of these challenges,
integrated techniques such as automatic term recogni-
The Need for Term Recognition tion, ▶ term normalization and ▶ term disambiguation
With an overwhelming number of textual resources are essential for the systematic collection and update of
becoming available in biomedicine, there is an terminologies and their integration into systems
increased interest in ▶ text mining techniques that biology applications.
can identify, extract, manage, and exploit biomedical
knowledge hidden in the literature (Ananiadou and Methods of Automatic Term Recognition (ATR)
McNaught 2006). In systems biology, terms are the The process of ATR consists of three steps: term
backbone of such knowledge, since they constitute recognition, term classification, and term mapping.
A 58 Automated Term Recognition

Term recognition identifies terms from the literature. collection of documents that deal with a specific
Term classification assigns coarse-grained semantic domain can denote their degree of termhood (Kageura
tags to the recognized terms, for example, metabolites, and Umino 1996), that is, the extent to which such
chemical compounds, genes, proteins, etc. These tags sequences represent domain-specific terms. Another
are determined by using ▶ named entity recognition way of looking at word distribution is to measure the
techniques. Lastly, the recognized terms are mapped attachment strength between the components of a term
to terminological or ontological resources such as (unithood). Techniques like ▶ mutual information,
UMLS (Bodenreider 2004), OBO (Open Biological log-likelihood ratios test, frequency of occurrence
Ontologies (http://www.obofoundry.org/)), Uberon and likelihood have all been used to measure word
(http://obofoundry.org/wiki/index.php/UBERON:Main_ distribution. These approaches are typically
Page), GO (Ashburner et al. 2000), and nomenclatures knowledge poor, that is, they do not make use of
such as HUGO. existing, domain-specific resources, and are thus
The main approaches to term recognition portable to many domains. A popular approach used
are dictionary-based, statistical, and hybrid, the latter in biology, based on the notion of termhood, is C-value
of which combines linguistic knowledge with (Frantzi et al. 2000). This is a hybrid measure, as it
statistics. combines linguistic knowledge with a number of
Dictionary-based approaches use existing termino- statistical characteristics. This approach facilitates the
logical resources or ontologies to identify terms in text. recognition of embedded subterms, which are very
If a particular sequence of words in a text matches an frequent in biology. C-Value is the measure underlying
entry in one of the existing resources, then it is the term extraction service TerMine (http://www.
considered to be a term. Although there are several nactem.ac.uk/software/termine/), provided by the
inventories of terms in biology, for example, ChEBI National Centre for Text Mining.
(de Matos et al. 2010), BioThesaurus (Liu et al. 2006),
NCBI taxonomy, etc., the use of such resources to aid Incorporating Term Variability and
term recognition has several drawbacks, for example, Disambiguation into Term Extraction
each resource has a different focus, they are often not The integration of techniques to recognize biological
linked together, and they do not have sufficient termi- terms and their variants into text search engines
nological coverage. As new terms are created daily in improves accuracy, since all possible variants of
biology, and as most resources are manually curated a particular term can be mapped to the same canonical
and are costly to build, term extraction tools are crucial entity. This means that if user provides any single
for populating these resources. Wide coverage variant of a term as input to the engine, documents
terminological resources, such as the BioLexicon containing all known variants of the term will be
(Thompson et al. 2011), can be helpful in the construc- returned. Orthographic variations are one of the
tion of these tools. The BioLexicon combines together common types of term variation (e.g., tumour vs
terms from a number of existing resources, which are tumor; NF-KB vs NF-kb; amino-acid vs amino acid).
further augmented with terms extracted through the However, one of the most prolific types of terminolog-
automatic analysis of MEDLINE abstracts, in order ical variation by far in biology is acronyms. By way of
to account for real, observed usage of terms in text. an example, the acronym ER has about 80 possible
The resource incorporates gene/protein term variants definitions (expansions), including estrogen receptor,
and other linguistic and semantic information required emergency room, receptor alpha, enhancement ratio,
to drive text mining applications, such as ▶ informa- etc. In turn, each expansion can have several different
tion extraction and ▶ information retrieval. variant forms. For example, variants of estrogen recep-
Statistical approaches are based on various statisti- tor include the following: oestrogen receptor, estrogen
cal distributions of words in text. Since the vast receptors, estrogen-receptor, estrogenic receptor,
majority of terms consist of nouns or sequences of etc. Techniques for the automatic extraction of
nouns, the main strategy is to extract specific noun acronyms and their expanded forms are useful for
phrases as term candidates and estimate their probabil- ▶ information retrieval and ▶ information extraction.
ity of being terms. The frequency with which For example, Wren and Garner (2002) reported that
sequences of words co-occur in a document or PubMed could retrieve 5,477 documents using the
Automatic Classification Methods 59 A
acroymn JNK as a search term, but only 3,773 docu- Cross-References
ments when using the expanded form as a search term,
that is, c-jun N-terminal kinase. As with term extrac- ▶ Applied Text Mining
A
tion, a common approach for extracting acronyms is ▶ Clustering
utilizing statistical clues in text, that is, co-occurrences ▶ Information Extraction
between short-forms (e.g., ER) and long-forms ▶ Information Retrieval
(e.g., estrogen receptor). The strength of these ▶ Mutual Information
co-occurrences can be measured via techniques such ▶ Named Entity
as ▶ mutual information, Dice coefficient, ▶ Named Entity Recognition
log-likelihood ratio, etc. An example of a service that ▶ Term Disambiguation, Text Mining
automatically recognizes acronyms, and expands them ▶ Term Normalization, Text Mining
into their definitions and variant forms is Acromine
(http://www.nactem.ac.uk/software/acromine/).
Term variation is not the only challenge for auto- References
matic term management. Terms are frequently associ-
ated with multiple meanings. For example, the Ananiadou S, McNaught J (eds) (2006) Text mining for biology
and biomedicine. Artech House, Boston/London
abbreviation PCR has 129 expanded forms that can
Ashburner M, Ball CA, Blake JA, Botstein D, Butler H,
be consolidated to 30 senses (e.g., polymerase chain Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT,
reaction, pathologic complete response, and Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S,
phosphocreatine). In general, each sense has more Matese JC, Richardson JE, Ringwald M, Rubin GM,
Sherlock G (2000) Gene ontology: tool for the unification
than one surface form (i.e., variant). Abbreviation
of biology. Nat Genet 25(1):25–29
disambiguation methods use resources such as Sense Bodenreider O (2004) The unified medical language system
Inventories (Okazaki et al. 2010) and apply ▶ cluster- (UMLS): integrating biomedical terminology. Nucleic
ing to the expanded definitions, using a number of Acids Res 32(Database issue):D267–D270
de Matos P, Alcantara R, Dekker A, Ennis M, Hastings J,
similarity methods to capture the context in which
Haug K, Spiteri I, Turner S, Steinbeck C (2010) Chemical
expanded forms appear. entities of biological interest: an update. Nucleic Acids Res
38(Database issue):D249–D254
Frantzi K, Ananiadou S, Mima H (2000) Automatic recognition
of multi-word terms: the C-value/NC-value method.
Conclusion Int J Digit Libr 3(2):115–130
Kageura K, Umino B (1996) Methods of automatic term
Given the frequency with which new terms appear recognition – a review. Terminology 3(2):259–289
in the literature, it is necessary to use techniques Liu H, Hu ZZ, Zhang J, Wu C (2006) BioThesaurus: a web-based
thesaurus of protein and gene names. Bioinformatics
such as the recognition of terms and their variants,
22(1):103–105
together with disambiguation. These techniques can Okazaki N, Ananiadou S, Tsujii J (2010) Building a high quality
facilitate the automatic extraction and classification sense inventory for improved abbreviation disambiguation.
of new terms, and allow them to be linked with Bioinformatics 26(9):1246–1253
Thompson P, McNaught J, Montemagni S, Calzolari N, Del
databases, ontologies, and controlled vocabularies.
Gratta R, Lee V, Marchi S, Monachini M, Pezik P,
An immediate application of automatic term extrac- Quochi V, Rupp C, Sasaki Y, Venturi G, Rebholz-
tion is to enhance the performance of text search Schuhmann D, Ananiadou S (2011) The BioLexicon:
engines. While existing search methods normally a large-scale terminological resource for biomedical text
mining. BMC Bioinformatics 12:397
restrict retrieval to exact matching of query terms,
Wren JD, Garner HR (2002) Heuristics for identification of
which can result in either low precision (irrelevant acronym-definition patterns within text: towards an auto-
information is retrieved) or low recall (relevant mated construction of comprehensive acronym definition
information is overlooked), term recognition and dictionaries. Methods Inf Med 41(5):426–434
term management methods can improve search
results by linking terms as they appear in text and in
various biological resources, discovering relation- Automatic Classification Methods
ships between terms and improving classification
and clustering methods. ▶ Clustering Methods
A 60 Autonomy

Characteristics
Autonomy
Although the idea of autonomy is commonly used, it is
Alvaro Moreno quite difficult to be defined. It is applied to very differ-
Department of Logic and Philosophy of Science, ent objects, in rather different contexts, and with dif-
University of the Basque Country, San Sebastian, ferent rigor. Systems as diverse as machines, control
Spain devices, robots, computer programs, human beings,
institutions, countries, animals, organs, cells, and
others, apparently deserve to be called “autonomous,”
Definition or at least to be studied and compared in terms of the
“autonomy” displayed in their behavior. Originally the
The concept of autonomy expresses the property of term was used in the context of law and sociology (in
a system that builds and actively maintains the rules the sense of self-government of the Greek polis) or
that define itself, as well as the way it behaves in human cognition and rationality (in the sense of
the world. So autonomy covers the main properties a cognitive agent that acts according to rationally
shown by any living system at the individual level: self-generated rules).
(1) self-construction (i.e., the fact that life is continu- Etymologically, autonomy means self-law (giving),
ously building, through cellular metabolisms, the namely, the capacity to act according to self-
components which are directly responsible for the determined principles. Broadly speaking, autonomy
behavior of the system) and (2) functional action is understood as the capacity to act according to self-
on and through the environment (i.e., the fact that determined principles. However, the idea of autonomy
organisms are agents, because they necessarily modify can adopt a more specific, minimal sense related to the
their boundary conditions in order to ensure their own capacity of a system to self-define, to construct its own
maintenance as far-from-equilibrium, dissipative identity. It is in this more basic sense that autonomy
systems). proves relevant for biology, to refer to the main feature
There are several key points. First, the notion of of the organization of living organisms, namely, the
cyclic causal organization or recursivity remains the fact that they are able to self-repair, self-produce, and
conceptual nucleus of autonomy as identity preserva- self-maintain.
tion. The system is constituted as a series of causal Although the idea of autonomy as a key concept to
processes (energy transduction, component produc- understand the organization of organisms has its roots
tion, etc.) converging to a given (initial) state, as an in Kant (1790), it was the theory of autopoiesis which
indefinite repetition of the same loop (Bechtel 2007). placed the notion of autonomy at the center of the
Second, this identity is something more than a mere biological understanding of living beings.
ordered pattern: It is a true set of mechanisms, whose Within the autopoietic theory, the concept of auton-
function is the very maintenance of the system, and omy has been applied to different biological domains:
therefore of themselves. Third, this organization is the basic metabolic domain, the immune domain, and
intrinsically precarious because it depends on the neural domain (Varela 1979). The root of the con-
a dynamical order whose cohesion is dissipative. cept, however, was the characterization of the basic
Fourth, an autonomous system is continuously organization of the living, namely, its metabolic orga-
performing actions, doing things that contribute to its nization. Actually, it was a simplified and rather
own maintenance. These interactions with the environ- abstract version of a cyclic idea of metabolism that
ment are, at the same time, a result of the constitutive inspired Maturana and Varela (1973; Varela et al.
processes of the system and a necessary condition for 1974) to define living beings as autopoietic (self-
their continuity. In other words, there is a reciprocal producing) systems. Thus, the concept of autopoiesis
dependence between what defines the “self” (or the refers to a recursive process of component production
subject) and the actions derived from its existence, that builds up its own physical border. The global
because it is not possible to separate the system’s network of component relations establishes a self-
doing from its being (Moreno et al. 2008). maintaining dynamics, which brings about the
Autonomy 61 A
constitution of the system as an operational unit. In Kauffman’s (2000) approach. Starting from the con-
sum, components and processes are entangled in cept of an autocatalytic set, the main condition
a cyclic, recursive production logic and they constitute required to consider a system as autonomous is that it
A
an identity. According to Maturana and Varela, auton- be capable of performing what this author calls (at
omous capacities stem from the logic of autopoiesis least) one “work-constraint cycle.” This capacity is
(Maturana and Varela 1973, 1980). implemented through a deep entanglement between
Although the stress is made in the circular logic of work and constraints: “work begets constraints begets
the system, (The idea that the essence of life consists in work,” as some form of closure in the abstract space of
a cyclic or causally closed organization is also shared catalytic tasks. This idea is based, on the one hand, in
by other authors like Rosen (1981) or Ganti (1975).) in Atkins’ view (1984) of work as a coordinated, coher-
other places these authors also consider the relation of ent, and constrained release of energy, and, on the
the system with its environment, admitting that auton- other hand, in the recognition that work is absolutely
omy implies the capacity to maintain the identity necessary to build constraints. To be autonomous,
through the active compensation of deformations a system must do work; to be capable of work it
(Maturana and Varela 1980). Thus, phenomena like requires constraints to channel the flow of energy in
tornadoes, whirlpools, and candle flames, which are to an appropriate way; and to build those constraints the
a certain degree self-organizing and self-maintaining system requires appropriately constrained energy
systems, are not autonomous, because they lack any flows, that is to say, work. This is the circularity of
form of active interaction exerted by the system. In that the “work-constraint cycle.” Kauffman’s account
sense, what distinguishes simple forms of self- envisages how functionality or utility (implicit in the
organization and self-maintenance from autonomy is idea of work) can come out of the causal circularity of
that the former lack an internal organization which the system, where this circularity is not only under-
would be complex enough to be recruited for stood in terms of abstract relations of component pro-
displaying selective actions, capable to actively ensure duction, but also as a thermodynamic logic sustaining
the maintenance of the system. (This is not to say that the specific chemical recursivity of the system: All the
simple self-maintaining systems are necessarily less processes ongoing in the system are constrained in
robust than autonomous ones). order to satisfy the condition of self-maintenance.
However, Maturana and Varela have not analyzed
which type of material organization may satisfy this
requirement. Actually, their view on autonomy was Implications
explicitly abstract and functionalist. More recently,
other authors have developed different approaches on The idea of autonomy can serve as a bridge from the
the concept of biological autonomy. Hooker, Collier, nonliving to the living domain (Ruiz-Mirazo et al.
Christensen, and Bickhard, for example (Christensen 2004). Autonomy is applicable to this gap between
and Hooker 2000; Collier 1998; Bickhard 2000), have chemistry and biology, beyond standard self-
stressed both the material-energetic and the interactive organization, because in a living organism component
dimension of autonomous systems, derived from the production relationships (chemical transformations,
fact that the systems are in far-from-equilibrium con- reaction feedbacks, mutual interactions, catalytic
ditions: since the cohesive organization of these sys- effects, etc.) can be interpreted as self-constraining
tems exists only in far-from-equilibrium conditions, processes, that is, as the generation by the very organ-
they must maintain an adequate interchange of matter ism of the local rules (constraints) that govern its
and energy with their environment in order to keep this dynamic behavior (Pattee 1972). It is through this
cohesion. Otherwise these systems will disintegrate. self-constraining action that the organism actually
Therefore, an autonomous system needs an internal defines itself, constructs an identity of its own.
mechanism able to organize and channel energy The concept of autonomy has been proposed for
flows for the system’s self-maintenance. defining the very nature of biological individuals,
A similar concern in formulating the idea of auton- because it is intended to account for the complex
omy in material terms is probably on the basis of material organization underlying any living organism,
A 62 Average Path Length

namely, its metabolism, understood as a cyclic, self- Universitaria. Santiago (1994, 3rd edition including new
maintaining network of reactions by means of which prefaces by each author)
Maturana H, Varela F (1980) Autopoiesis and cognition: the
the components of an organism are continually realization of the living. Reidel, Dordecht
produced. Moreno A, Etxeberria A, Umerez J (2008) The autonomy of
biological individuals and artificial models. Biosystems
91(2):309–319
Pattee H (1972) Laws and constraints, symbols and languages.
Cross-References In: Waddington CH (ed) Towards a theoretical biology,
vol 4, Essays. Edinburgh University Press, Edinburgh,
▶ Closure, Causal pp 248–258
▶ Constraint Rosen R (1991) Life itself: a comprehensive inquiry into the
nature, origin and fabrication of life. Columbia University
▶ Function Press, New York
▶ Organization Ruiz-Mirazo K, Moreno A (2004) Basic autonomy as
▶ Perturbation a fundamental step in the synthesis of life. Artif Life
▶ Robustness 10(3):235–259
Ruiz-Mirazo K, Pereto J, Moreno A (2004) A universal defini-
▶ Self-Organization tion of life: autonomy and open-ended evolution. Orig Life
▶ Systems, Autopoietic Evol Biosph 34:323–346
Varela F (1979) Principles of biological autonomy. Elsevier
North Holland, New York
Varela F, Maturana H, Uribe R (1974) Autopoiesis: the
References organization of living systems, its characterization and
a model. Biosystems 5:187–196
Atkins PW (1984) The second law. Freeman, New York
Bechtel W (2007) Biological mechanisms: organized to main-
tain autonomy. In: Boogerd F, Bruggeman F, Hofmeyr JH,
Westerhoff HV (eds) Systems biology: philosophical
foundations. Elsevier, Amsterdam, pp 269–302
Average Path Length
Bickhard M (2000) Autonomy, function, and representation.
Communication and Cognition - Artificial Intelligence ▶ Characteristic Path Length
17(3–4):111–131
Christensen W, Hooker C (2000) Anticipation in autonomous
systems: foundations for a theory of embodied agents. Int
J Comput Anticip Syst 5:135–154
Collier J (1998) Autonomy in anticipatory systems: significance Avidian
for functionality, intentionality and meaning. In: Dubois D
(ed) Proceedings of CASYS’98, the second international
▶ Digital Organism
conference on computing anticipatory systems. Springer-
Verlag, New York
Kant I (1790:1952) Critique of judgement. Oxford University
Press, Oxford
Kauffman S (2000) Investigations. Oxford University Press,
Oxford
Azacitidine
Maturana H, Varela F (1973) De máquinas y Seres Vivos.
Autopoiesis: La organización de lo Vivo. Editorial ▶ 5-azaCytosine
B

5-Bromo-2-deoxyuridine (BrdU) B Cell Epitope Prediction

▶ Lymphocyte Labeling, Cell Division Investigation Yasser EL-Manzalawy and Vasant Honavar
Center for Computational Intelligence, Learning, and
Discovery, Computer Science, Iowa State University,
Ames, IA, USA

B Cell Epitope
Synonyms
Ramachandran Srinivasan
G.N. Ramachandran Knowledge Centre for Genome Antigen–antibody binding site prediction; Antigen–
Informatics, Institute of Genomics and Integrative antibody interface residue prediction; Computational
Biology, Delhi, India methods for mapping B cell epitopes

Definition Definition

A B cell epitope is the region of the antigen recognized B cell epitopes, also known as antigenic determinants,
by soluble or membrane-bound antibodies. B cell are restricted parts of molecules that are recognized by
epitopes are classified as either linear or discontinuous immunoglobulin molecules (antibodies) either in their
epitopes. free form or as membrane-bound B cell receptors.
B cell epitopes typically belong to one of two classes:
linear (continuous or sequential) epitopes or confor-
Cross-References mational (discontinuous) epitopes. Linear epitopes are
short peptides that correspond to a contiguous amino
▶ B Cells acid sequence fragment of a protein. Linear epitopes
▶ Systems Immunology, Data Modeling and Scripting are usually identified using assays such as PEPSCAN.
in R Consequently, current experimental methods offer

W. Dubitzky et al. (eds.), Encyclopedia of Systems Biology, DOI 10.1007/978-1-4419-9863-7,


# Springer Science+Business Media LLC 2013
B 64 B Cell Epitope Prediction

# seqname source feature start end score N/A ? 1 11 21 31 41 51 60


# -------------------------------------------------------------------
AAT74874 bepipred-1.0b epitope 82 82 1.078 . . E NITNLCPFGEVFNATKFPSVYAWERKKISNCVADYSVLYNSTFFSTFKCYGVSATKLNDL 60
AAT74874 bepipred-1.0b epitope 83 83 0.947 . . E
.................EEEEEEEEEEEEEEEE...........................
AAT74874 bepipred-1.0b epitope 84 84 0.736 . . E
AAT74874 bepipred-1.0b epitope 85 85 0.796 . . E
CFSNVYADSFVVKGDDVRQIAPGQTGVIADYNYKLPDDFMGCVLAWNTRNIDATSTGNYN 120
AAT74874 bepipred-1.0b epitope 86 86 0.860 . . E ............................................EEEEEEEEEEEEEEEE
AAT74874 bepipred-1.0b epitope 87 87 0.685 . . E YKYRYLKHGKLRPFERDISNVPFSPDGKPCTPPALNCYWPLNDYGFYTTTGIGYQPYRVV 180
AAT74874 bepipred-1.0b epitope 88 88 0.534 . . E .....................EEEEEEEEEEEEEEEE.EEEEEEEEEEEEEEEE......
AAT74874 bepipred-1.0b epitope 89 89 0.077 . . . VLSFELLNAPATV 193
AAT74874 bepipred-1.0b epitope 90 90 0.043 . . . .............
AAT74874 bepipred-1.0b epitope 91 91 –0.195 . . .
AAT74874 bepipred-1.0b epitope 92 92 0.065 . . .
AAT74874 bepipred-1.0b epitope 93 93 0.494 . . E
AAT74874 bepipred-1.0b epitope 94 94 0.804 . . E
AAT74874 bepipred-1.0b epitope 95 95 0.411 . . E
AAT74874 bepipred-1.0b epitope 96 96 0.234 . . .
AAT74874 bepipred-1.0b epitope 97 97 0.024 . . .
AAT74874 bepipred-1.0b epitope 98 98 –0.016 . . .
AAT74874 bepipred-1.0b epitope 99 99 –0.356 . . .
AAT74874 bepipred-1.0b epitope 100 100 –0.569 . . .
AAT74874 bepipred-1.0b epitope 101 101 –0.779 . . .
AAT74874 bepipred-1.0b epitope 102 102 –1.256 . . .

B Cell Epitope Prediction, Fig. 1 Residue-based (left) residue is denoted with the letter “E.” In the case of peptide-
and peptide-based (right) linear B cell epitope predictions. In based predictions, the predicted epitopes are encoded with
the case of residue-based predictions, each predicted epitope a string of “E”s

little direct evidence indicating that each residue in the and output the predicted epitope(s). Some predictors,
epitope does in fact make contact with one or more called peptide-based predictors, require the user to
residues in the paratope (the part in the antibody that specify the length of the epitope as an input parameter.
binds to the antigen). The second class of B cell Peptide-based predictors are trained to classify amino
epitopes is called conformational (discontinuous) acid fragments into epitopes and other non-epitopes.
B cell epitopes that represent the vast majority of B cell Residue-based predictors, on the other hand, are trained
epitopes found in proteins. These epitopes are com- to score each residue in the query sequence; the higher
posed of amino acids that, although not contiguous in the score, the more likely it is that the corresponding
the primary sequence, are brought into close proximity residue belongs to an epitope. Figure 1 illustrates pep-
within the folded three-dimensional protein structure. tide-based and residue-based B cell epitope prediction.
In contrast to peptide-based predictors that can only
identify epitopes of specified length or antigenic
Characteristics regions, residue-based predictors can be used to infer
antigenic regions of unknown length or even conforma-
Predicting B Cell Epitopes tional B cell epitopes. Conformational B cell epitope
Identification of B cell epitopes is often a necessary predictors take as input an antigen structure (actual or
first step in developing safe and effective vaccines. predicted PDB coordinate file) or antigen sequence, and
Characterization of sequence and structural features extract some sequence- and/or structure-based feature
of B cell epitopes is important from the standpoint of representation of each residue in the input query anti-
understanding of pathogenicity and the adaptive gen. The output of the predictor is a per residue scores
immune response. Several experimental techniques assigned to each residue in the query antigen.
are currently available for mapping B cell epitopes.
However, with rapid increase in the number of fully or Predicting Linear B Cell Epitopes
partially sequenced pathogen genomes and prohibitive Although a vast majority of B cell epitopes are
cost and effort required for experimental identification believed to be conformational epitopes (Walter
of epitopes, there is an urgent need for cost-effective 1986), most of the existing experimental B cell-epitope
computational methods for reliable genome-wide mapping techniques and, perhaps consequently, most
identification of B cell epitopes. of the existing computational methods for B cell epi-
B cell epitope predictors can be categorized based tope prediction deal with linear B cell epitopes. Several
on the type of the B cell epitope predicted into linear computational-based methods for predicting linear
and conformational B cell epitope predictors. Linear B B cell epitopes have been proposed in literature
cell epitope predictors accept as input a suitable repre- (EL-Manzalawy and Honavar 2010). B cell epitopes
sentation of the amino acid sequence of a target antigen predicted using such methods can be easily
B Cell Epitope Prediction 65 B
synthesized and tested for their antibody-binding prop- binary labels (epitopes vs. non-epitopes) to obtain
erties. Existing computational methods for predicting peptide-based B cell epitope predictors. Alternatively,
linear B cell epitopes fall into two major categories: they can be trained using examples that are fixed length
(1) propensity scale–based methods; (2) machine amino acid sequence windows whose binary labels
learning–based methods. indicate whether or not the residue at the center of B
each sequence window is an epitope residue to obtain
Propensity Scale–Based Methods residue-based B cell epitope predictors. In either case,
Propensity scale–based methods (e.g., Parker and Guo a variety of amino acid residue–based and amino acid
1986; Pellequer et al. 1993) rely on the observed cor- sequence–based features for encoding the input to the
relations between specific physicochemical properties classifiers (EL-Manzalawy and Honavar 2010) have
(e.g., hydorophilicity, flexibility, or solvent accessibil- been utilized. Despite recent advances in machine
ity) of amino acids and the antigenic determinants in learning–based linear B cell epitope predictors, there
protein sequences to identify the location(s) of the is much room for improvement in the performance of
linear B cell epitope(s) in the query protein sequence. the state-of-the-art in computational prediction of lin-
The main idea is to assign a score to each amino acid in ear B cell epitopes (Greenbaum et al. 2007). This is due
a query protein sequence. These propensity scores at least in part to the limited amount of experimentally
measure the tendency of an amino acid to be part of characterized epitope data (and in particular, lack of
a B cell epitope (as compared to the background). The reliable negative, i.e., non-epitope data). Conse-
score for each target amino acid residue in a query quently, the development of more reliable linear
sequence is computed as the average of the propensity B cell epitope predictors remains a major challenge
values of the amino acids in a sliding window centered in computational immunology.
at the target residue. The propensity scores are then
used as a basis of predicting whether a given amino Predicting Conformational B Cell Epitopes
acid sequence residue is likely to be part of a linear Several experimental techniques can be used for iden-
B cell epitope. Figure 2 shows the analysis of receptor- tifying conformational B cell epitopes. The most accu-
binding domain (RBD) of Severe Acute Respiratory rate method relies on the determination of the structure
Syndrome coronavirus (SARS-CoV) spike protein of antigen–antibody complexes using X-ray crystal-
using Parker’s hydrophilic scale. lography (Fleury et al. 2000). Progress of computa-
Recently, Blythe and Flower (Blythe and Flower tional methods for conformational B cell epitope
2005) evaluated 484 amino acid propensity scales on prediction has been hindered at least in part by the
a data set of 50 proteins and concluded that the best limited number of solved antigen–antibody com-
achievable performance is only marginally better than plexes. As noted earlier, propensity scale–based
random guessing. This result underscores the need for methods can be used to predict both linear and confor-
more sophisticated methods (e.g., those that use state- mational B cell epitopes using only amino acid
of-the-art machine learning algorithms together with sequence information of antigens. Two recently devel-
appropriate data representations) for constructing oped methods for predicting conformational B cell
improved linear B cell epitope predictors. epitopes, DiscoTope and PEPITO, improve the accu-
racy of propensity scale–based methods by incorporat-
Machine Learning–based Methods ing information derived from the structure of the
Machine learning currently offers one of the most cost- antigens, e.g., solvent accessibility of residues. The
effective and hence widely used approaches to devel- task of predicting conformational B cell epitopes can
oping predictive models from data in bioinformatics be reduced to identifying protein–protein interface res-
applications (Baldi and Brunak 2001). Several idues on the surface of a target protein (antigen). This
machine learning–based linear B cell epitope predic- opens up the possibility of adapting the state-of-the-art
tors have been developed using Support Vector sequence and/or structure-based protein–protein inter-
Machine, Artificial Neural Network, Decision Tree, face residue prediction methods for developing con-
k-Nearest Neighbor, and Ensemble classifiers. Such formational B cell epitope predictors. A recent study
classifiers can be trained using examples that are (Ponomarenko and Bourne 2007) compared the per-
amino acid peptide sequences with the corresponding formance of six publicly available protein–protein
B 66 B Cell Epitope Prediction

B Cell Epitope Prediction, Fig. 2 Analysis of RBD domain of in the window. The higher the score, the more likely it is that the
SARS-CoV Spike protein using Parker’s hydrophilic scale. corresponding residue is an epitope residue. Positive peaks along
A sliding window of seven amino acids is used to assign the sequence in the plot of the residue scores indicate the pres-
a propensity score to the residue in the center of the window. ence of an epitope in that region of the sequence
The score is the sum of the epitope propensity of each amino acid

interface residue prediction tools on the conforma- (named mimotopes) that bind to the antibody with
tional B cell epitope prediction task. The reported high affinity. It is assumed that this panel of mimotopes
performance of such methods (with an average area mimics the physicochemical properties and spatial
under curve (AUC) no greater than 0.7) underscores organization of the genuine epitopes. Because the pre-
the need for developing protein–protein interface pre- cise identification of the epitope mimicked by the set of
dictors that are customized for the conformational B mimotopes is not straightforward since the epitope is
cell epitope prediction task. often discontinuous (conformational) and the epitope
A recently developed approach for identifying B cell and mimotopes do not necessarily share a high degree
epitopes uses a combination of both experimental and of sequence similarity, several computational methods
computational techniques. In this approach, a phage- have been proposed for localizing the panel of affinity-
display library of random peptides is scanned against selected peptides on the surface of a target antigen (e.g.,
an antibody of interest to obtain a panel of peptides Bublil et al. 2007).
B Cells 67 B
References Definition

Baldi P, Brunak S (2001) Bioinformatics: the machine learning B cell-mediated immune response is defined as the
approach, 2nd edn. MIT Press, Cambridge, MA
immune response cascade triggered by the binding of
Blythe M, Flower D (2005) Benchmarking B cell epitope prediction:
underperformance of existing methods. Protein Sci 14:246–248 antibodies (produced by the B cells) to the antigens B
Bublil E, Freund N, Mayrose I, Penn O, Roitburd-Berman A, and subsequent identification by the cell surface recep-
Rubinstein N, Pupko T, Gershoni J (2007) Stepwise predic- tors of macrophages, neutrophils, or other cells of
tion of conformational discontinuous B cell epitopes using
the B cell-mediated immunity to destroy the antigens.
the Mapitope algorithm. Proteins Struct Funct Bioinform
68:294–304, Wiley It is a type of adaptive immunity in vertebrates
EL-Manzalawy Y, Honavar V (2010) Recent advances in B cell (Alberts et al. 2002).
epitope prediction methods. Immunome Res 6:S2
Fleury D, Daniels R, Skehel J, Knossow M, Bizebard T (2000)
Structural evidence for recognition of a single epitope by two
distinct antibodies. Proteins 40:572 Cross-References
Greenbaum J, Andersen P, Blythe M, Bui H, Cachau R, Crowe J,
Davies M, Kolaskar A, Lund O, Morrison S, Mumey B, ▶ Major Histocompatibility Complex (MHC)
Ofran Y, Pellequer J, Pinilla C, Ponomarenko J, Raghava G,
▶ TCR Recognition of MHC-Peptide Complexes
van Regenmortel M, Roggen E, Sette A, Schlessinger A,
Sollner J, Zand M, Peters B (2007) Towards a consensus on
datasets and evaluation metrics for developing B cell epitope
prediction tools. J Mol Recognit 20:75–82 References
Parker J, Guo D (1986) New hydrophilicity scale derived from
high-performance liquid chromatography peptide retention
Alberts B, Johnson A, Lewis J, Raff M, Roberts K, Walter P
data: correlation of predicted surface residues with antige-
(2002) The adaptive immune system. In: Molecular biology
nicity and x-ray-derived accessible sites. Biochemistry
of the cell, 4th edn. Garland Science, New York,
25:5425–5432, American Chemical Society
pp 1363–1421
Pellequer J, Westhof E, Van Regenmortel M (1993) Correlation
between the location of antigenic sites and the prediction of
turns in proteins. Immunol Lett 36:83–99
Ponomarenko J, Bourne P (2007) Antibody-protein interactions:
benchmark datasets and prediction tools evaluation. BMC
Struct Biol 7:64, BioMed Central Ltd
Walter G (1986) Production and use of antibodies against syn-
B Cells
thetic peptides. J Immunol Methods 88:149–161
Ramachandran Srinivasan
G.N. Ramachandran Knowledge Centre for Genome
Informatics, Institute of Genomics and Integrative
Biology, Delhi, India
B Cell-mediated Immune Response

Shoba Ranganathan Synonyms


Department of Chemistry and Biomolecular
Sciences and ARC Center of Excellence in B lymphocyte
Bioinformatics, Macquarie University, Sydney,
NSW, Australia
Department of Biochemistry, Yong Loo Lin School of
Medicine, National University of Singapore, Definition
Singapore
B cells are a type of lymphocytes produced in the bone
marrow of mammals, which later migrate to spleen and
Synonyms lymph nodes. B cells mainly differentiate into memory
B cell and plasma cells. The plasma cells produce
Antibody-dependant immune response; Antibody- antibodies, thereby eliciting strong humoral immune
mediated immunity; B cell-mediated immunity response against antigens.
B 68 B Lymphocyte

Cross-References
Bacterial Transcriptional Cascade
▶ B Cell Epitope
▶ Systems Immunology, Data Modeling and Scripting ▶ Sigma Cascade
in R

Bagged Predictors
B Lymphocyte
▶ Bagging
▶ B Cells

Bagging
Backbone or Carbon-a Chain (Both with
the Sidechain) Celine Vens
Department of Computer Science, Katholieke
▶ Protein Structure Metapredictors Universiteit Leuven, Leuven, Belgium

Synonyms
Bacterial Artificial Chromosome (BAC)
Bagged predictors; Bootstrap aggregating
Myong-Hee Sung
Laboratory of Receptor Biology and Gene Expression,
National Cancer Institute, National Institutes of Definition
Health, Bethesda, MD, USA
Bagging is an ▶ ensemble learning method. Each mem-
ber of the ensemble is trained on a different ▶ bootstrap
Definition replicate of the training set. The outcomes of the indi-
vidual learning models are aggregated to obtain the
A bacterial artificial chromosome is a DNA construct outcome of the ensemble. Bagging often uses ▶ deci-
used for cloning a relatively large piece of DNA sion tree learners to create the individual models, but it
(100700 kb), usually by transforming E. coli. can be used with any unstable learning method.

Characteristics
Bacterial Cell Cycle
Constructing Bagged Predictors
▶ Cell Cycle, Prokaryotes Bagging proceeds by applying the same learning algo-
rithm to different bootstrap samples of the training set
(Breiman 1996a). Given a training set D, M new train-
ing sets Dk are generated, by uniformly sampling
Bacterial Transcription examples from D, with replacement. Usually, the sam-
pling procedure is terminated when each training set
▶ Transcription in Bacteria contains an equal amount of training examples as the
Basal Lamina 69 B
original set D. Thus, each example may appear zero, In each resampled training set, about one third of the
one, or multiple times in each of the bootstrap samples. instances are left out (actually 1/e in the limit). As
Algorithm 1 shows the pseudo-code to construct a result, out-of-bag estimates are based on combining
a bagged ensemble. only about one third of the total number of classifiers in
the ensemble. This means that they might overestimate B
Algorithm 1 Pseudo-code for constructing an ensemble using the error rate, certainly when a small number of trees
bagging. D denotes the training set, M is the number of models in
the ensemble. are used in the ensemble.
1: for k (1 to M do
2: Dk ( Bootstrap(D) Applications in Systems Biology
3: hk ( Learning Algorithm (Dk) Dudoit and Fridlyand (2003) propose a bagging pro-
4: end for cedure to improve the clustering of gene expression
5: return Uhk
data from cancer microarray studies. Schietgat et al.
The outputs of the base classifiers are aggregated (2010) use bagging of ▶ decision trees to predict the
(hence the name bagging, which is an acronym for functions of genes.
bootstrap aggregating) to form the output of the bag-
ging ensemble. Bagging is most popular in the context Implementations
of ▶ supervised learning, in which case the output Most data mining tools (e.g., Weka [Weka, Machine
corresponds to a prediction of the target attribute. For Learning Tool]) include an implementation of the
▶ classification tasks, the ▶ majority vote of the base bagging procedure.
classifiers’ predictions is taken; for ▶ regression, the
average is taken.
The number of models, M, is a parameter to be Cross-References
chosen by the user. Different values have been
reported in the literature. Breiman (1996a) uses ▶ Bootstrapping
50 models for classification tasks, and 25 models for ▶ Classification
regression. ▶ Decision Tree
Bagging improves the predictive performance of ▶ Ensemble
the base predictors if the base predictor is unstable. ▶ Learning, Supervised
This means that small changes in the training set can ▶ Regression
yield large changes in the predictive model. ▶ Deci-
sion trees and artificial neural networks are examples
of unstable predictors. Nearest neighbor methods, for References
instance, are not. In terms of the bias-variance trade-
Breiman L (1996a) Bagging predictors. Machine Learning
off, bagging is known to reduce variance, while 24(2):123–140
slightly increasing bias. Breiman L (1996b) Out-of-bag estimation. Technical Report.
University of California, Berkely ftp.stat.berkeley.edu/pub/
Out-Of-Bag Error Estimates users/breiman/OOBestimation.ps.Z. Accessed Aug 4, 2011
An advantage of using bagging is that out-of-bag error Dudoit S, Fridlyand J (2003) Bagging to improve the accuracy of
a clustering procedure. Bioinformatics 19(9):1090–1099
estimates (Breiman 1996b) can be used to estimate the Schietgat L, Vens C, Struyf J, Blockeel H, Kocev D, Džeroski S
generalization error of the ensemble, which removes (2010) Predicting gene function using hierarchical multi-
the need for a set aside test set. If the ensemble com- label decision tree ensembles. BMC Bioinformatics 11(2)
prises decision trees, then the out-of-bag error estima-
tion proceeds as follows: for every example in the
training set, a prediction is made, but only those trees
for which the example was not in the bootstrap sample Basal Lamina
are used. The error rate of this resulting out-of-bag
classifier is called the out-of-bag error estimate. ▶ Extracellular Matrix
B 70 Basal Transcription Factors

Basal Transcription Factors Bayes Rule

▶ General Transcription Factors Katsuhisa Horimoto


Computational Biology Research Center, National
Institute of Advanced Industrial Science and
Technology, Koto-ku, Tokyo, Japan
Basement Membrane

Marsha A. Moses Synonyms


Department of Surgery/Harvard Medical School,
Vascular Biology Program/Children’s Hospital Bayesian network model; Conditional independence
Boston, Boston, MA, USA

Definition
Definition
In probability theory, the definition of the conditional
The basement membrane is a thin layer of extracellular probability of A given B, P(A|B), is
components that lies underneath the epithelium of
many organs, as well as the basal surface of the endo- PðA; BÞ
thelium of the entire vasculature. Its main components PðAjBÞ ¼
PðBÞ
include, but not limited to, laminin, fibronectin,
entactin, proteoglycans, and collagen IV (Yurchenco
and Patton 2009). The function of the basement mem- where P(A) and P(B) are the probabilities of A and B,
brane is to provide structural support for organs and for respectively. Also, P(B|A) is, by symmetry,
the vascular architecture. Degradation of the basement
membrane by matrix metalloproteinases serves as PðA; BÞ
PðBjAÞ ¼
a key step during tumor angiogenesis by facilitating PðAÞ
endothelial cell sprouting and pericyte detachment
(Jakobsson and Claesson-Welsh 2008). Then Bayes rule is expressed as follows:

PðBjAÞPðAÞ
Cross-References PðAjBÞ ¼
PðBÞ

▶ Extracellular Matrix
In the situation where P(A|B) is difficult to compute
▶ Neovascularization
directly but we have direct information about P(B|A),
Bayes rule enables us to compute P(A|B) in terms of P
References
(B|A). If the denominator P(B) in the above equation is
Jakobsson L, Claesson-Welsh L (2008) Vascular basement a normalizing constant which can be computed by
membrane components in angiogenesis – an act of balance. marginalization, then
Scientific World Journal 8:1246–1249
Yurchenco PD, Patton BL (2009) Developmental and patho- X X
genic mechanisms of basement membrane assembly. Curr
PðBÞ ¼ PðBjAi Þ ¼ PðBjAi ÞPðAi Þ
i i
Pharm Des 15(12):1277–1294

Thus, Bayes rule is also written as


BAXS
PðBjAÞPðAÞ
PðAjBÞ ¼ P
▶ Bile Acid and Xenobiotic System i PðBjAi ÞPðAi Þ
Bayesian Inference 71 B
Cross-References Cross-References

▶ Bayesian Network Model ▶ Bayesian Inference


▶ Causal Relationship ▶ Optimal Experiment Design
▶ Conditional Independence B
▶ Correlation Relationship
References

Bayesian Smith JQ (2010) Bayesian decision analysis: principles and


practice. Cambridge University Press, Cambridge

▶ Bayesian Decision Analysis

Bayesian Inference
Bayesian Decision Analysis
Roger Higdon
Malcolm Farrow Seattle Children’s Research Institute, Seattle,
School of Mathematics & Statistics, Newcastle WA, USA
University, Newcastle upon Tyne, UK

Synonyms
Synonyms
Bayesian statistics
Bayesian; Decision theory

Definition
Definition
The principle of Bayesian inference is about assigning
Bayesian decision analysis (Smith 2010) is concerned probability to “the state of knowledge” of parameters
with making choices when the outcomes cannot be (y) related to an experiment. To do so Bayesian infer-
predicted with certainty. Probabilities are assigned to ence assigns probability distribution to all unknown
the various possible outcomes and so to values of the quantities in a statistical problem, parameters (y) as
reward or payoff, under each possible choice. We well as the data (X). This is in contrast to frequentist
choose between probability distributions of rewards. interference (▶ Frequentist Approach) where parame-
This is done by making the choice which maximizes ters are fixed unknown quantities that define the fre-
the expected utility (▶ utility function), that is, by quency of the data.
choosing the alternative which gives the probability
distribution of rewards with the greatest expectation of
utility, where utility is a function of the reward. For Characteristics
example, suppose that a person’s utility for small
financial rewards is an increasing linear function of The term “Bayesian” refers to Thomas Bayes
the monetary value. Suppose that this person is given (1702–1761), who proved a special case of what is
the choice between alternatives A and B with reward now called Bayes’ theorem (Bayes 1763). However,
distributions as follows: it was Pierre-Simon Laplace (1749–1827) who intro-
A: $1 with probability 0.6 or $2 with probability 0.4 duced a general version of the theorem and used it to
B: $0 with probability 0.2 or $2 with probability 0.8 approach problems in celestial mechanics, medical
The optimal choice is then B, with expectation $1.6, statistics, reliability, and jurisprudence.
while A has expectation $1.4. Bayesian inference provides a standardized
See also ▶ Utility Function. approach for analyzing statistical problems
B 72 Bayesian Inference

(Berger 1985). It is based on the posterior distribution, problems involve complex integration of the posterior
p(y|X), the distribution of parameters given in the distribution in order generate means or quantities such
data. The posterior can be represented as a constant as highest posterior density regions (the Bayesian
times product of the prior distribution and the analog to the confidence interval). These calculations
likelihood typically involve complex numerical approaches
such as Markov Chain Monte Carlo (MCMC)
pðyjXÞ ¼ k pðXjyÞpðyÞ (Brooks 1998).
A compromise between Bayesian and frequentist
The likelihood is the joint distribution of the data as approaches are empirical Bayes methods (Carlin and
a function of y. The prior distribution represents the Louis 2000). In empirical Bayes methods, the Bayes-
state of knowledge about the experimental parameters ian distributional framework is used; however, instead
before generating the data. of specifying the parameters in the prior distribution,
The ability to incorporate prior information is what the parameters are estimated from the data themselves.
gives Bayesian inference its power and makes it con- Empirical Bayes methods are quite commonly used in
troversial. The prior distribution is a vehicle by which the analysis of microarrays and other high-throughput
previous knowledge can be used to guide the statistical experimental data.
inference. One must be judicious in how the prior is There are Bayesian analogs to most classical statis-
specified since a very strong prior distribution can tical approaches such as t-tests, linear regression, or
overwhelm the evidence given by the data. One can chi-squared tests. Bayesian methods have become very
use an uninformative or reference prior if there is no popular in systems biology and bioinformatics because
strong initial belief about the parameters. In this case, they offer a more standardized way of handling com-
Bayesian inference gives very similar results to plex and noisy data. No specialized approaches are
maximum likelihood inference. needed if models are not fully specified or if data are
A simple example illustrating Bayesian inference is missing; this is not the case using traditional
batting averages in the sport of baseball. Suppose, frequentist methods. Examples of Bayesian methods
a player is observed to get six hits (successes) in ten in systems biology are sequence alignment (Durbin
at bats (trials) over the course of two games, the usual et al. 1998), motif finding (Zhou and Liu 2004), struc-
frequentist estimate of the player’s batting average is ture prediction (Schmidler et al. 2000), microarray
.600. This is a very unrealistic estimate since through- analysis (Gottardo et al. 2003), and Bayesian networks
out the history of the sport, only a few players have (Jansen et al. 2003).
ever batted even .400. If one puts a prior distribution on
the batting average p, where p is distributed beta
(47,127), thus giving p a mean of .270 and standard
Cross-References
deviation of .3 (this corresponds roughly to the typical
distribution of batting averages for baseball players).
▶ Bayesian Method
So this implies that the posterior distribution is as
▶ Frequentist Approach
follows.

pðyjXÞ / p6 ð1  pÞ4 p47 ð1  pÞ127 ¼ p53 ð1  pÞ131


References
A Bayesian estimate of the player’s batting average
can be found by taking the mean of the posterior Bayes T (1763) An essay towards solving a problem in the
doctrine of chances. Phil Trans 53:370–418
distribution, which in this case is .288, a far different Berger JO (1985) Statistical decision theory and Bayesian
and likely more realistic estimate of the player’s analysis, 2nd edn, Springer series in statistics. Springer,
batting average than the frequentist estimate. New York
The use of the beta distribution (the conjugate prior Brooks SP (1998) Markov chain Monte Carlo method and its
application. Statistician 47:69–100
for the binomial distribution) in the previous example Carlin BP, Louis TA (2000) Bayes and empirical Bayes methods
makes calculations based on the posterior distribution for data analysis, 2nd edn. Chapman & Hall/CRC Press,
quite easy. However, most Bayesian inference Boca Raton
Bayesian Method 73 B
Durbin R, Eddy SR, Krogh A, Mitchison G (1998) Biological References
sequence analysis: probabilistic models of proteins and
nucleic acids. Cambridge University Press, Cambridge Schwarz G (1978) Estimating the dimension of a model. Ann
Gottardo R, Pannucci JA, Kuske CR, Brettin T (2003) Statistical Stat 6(2):461–464
analysis of microarray data: a Bayesian approach.
Biostatistics 4:597–620
Jansen R, Yu H, Greenbaum D et al (2003) Bayesian networks
B
approach for predicting protein-protein interactions from
genomic data. Science 302:449–453
Schmidler SC, Liu JS, Brutlag DL (2000) Bayesian segmenta- Bayesian Method
tion of protein secondary structure. J Comput Biol 7:233–248
Zhou Q, Liu JS (2004) Modelling within-motif dependence for Lin Wang
transcription factor binding site predictions. Bioinformatics
20:909–916 School of Computer Science and Information
Engineering, Tianjin University of Science and
Technology, Tianjin, China

Bayesian Information Criterion (BIC)


Synonyms
Xing-Ming Zhao
Institute of System Biology, Shanghai University, Bayesian inference; Bayesian network model
Shanghai, China

Definition
Synonyms
Bayesian method refers to a probability method to con-
Schwarz criterion; Schwarz information criterion (SIC) struct a knowledge structure used to support decision
making, such as classification (▶ Classification;
▶ Identification of Gene Regulatory Networks,
Definition Machine Learning) and regression (▶ Regression
Analysis) tasks, processes, or analyses. Bayesian
In statistics, the Bayesian information criterion (BIC) methods are valuable whenever there is a need to extract
(Schwarz 1978) is a model selection criterion. It is a information from data that are uncertain or subject to
selection criterion for choosing between different any kind of error or noise (including measurement error
models with different numbers of parameters. and experiment error, as well as noise or random vari-
The BIC is an asymptotic result derived under the ation intrinsic to the process of interest).
assumption that the data distribution belongs to the
exponential family.
Suppose that: Characteristics
1. x ¼ the observed data.
2. n ¼ the number of data points in x, or equivalently, Traditional statistical techniques struggle to cope
the sample size. with complex nonlinear models that are only
3. k ¼ the number of free parameters to be estimated. partially observed. Due to the fact that the Bayesian
4. pðxjkÞ ¼ the probability of the observed data given statistical paradigm is fully probabilistic, there is no
the number of parameters. fundamental distinction between any of the unknowns
5. L ¼ the maximized value of the likelihood function in a statistical model – parameters, hidden variables,
for the estimated model. and observations are all treated together in a consistent
The formula for the BIC is: manner – and it is from this that the power of the
 2 ln pðxjkÞ  BIC ¼ 2 ln L þ k lnðnÞ methodology is derived. Provided that you can write
down a statistical model relating the quantities you are
Given any two estimated models, the model with interested in to the data you can observe (possibly
the lower BIC value is the one to be preferred. via many unobserved intermediary variables), then
B 74 Bayesian Method

you can carry out Bayesian method to extract factory #1 was the prior probability, PðH1 Þ, which
the information in the data to give fully probabilistic was 0.5. After observing the unqualified light
information on all unobserved model variables. bulb, we must revise the probability to PðH1 jEÞ;
which is 0.33.
Mathematical Formulation
In the Bayesian inference, we specify a sampling Practical Application of Bayesian Methods
model PðZjyÞ (density or probability mass function) The main limiting factor in applying Bayesian
for our data given the parameters, and a prior distribu- methods is computational. For nontrivial problems,
tion for the parameters PðyÞ reflecting our knowledge analytic approaches to Bayesian inference are not
about y before we see the data. We then compute the possible, and their numerical solution is often
posterior distribution challenging due to the need to solve high-dimensional
integration problems (which in the discrete case
PðZjyÞPðyÞ PðZjyÞPðyÞ translate to combinatorial summation problems).
PðyjZÞ ¼ ¼Ð ; (1) Advances in the speed of commodity computing
PðZÞ PðZjyÞPðyÞdy
hardware in recent decades have been paralleled by
developments in computationally intensive algorithms
The example shown in following illustrates for Bayesian inference. Arguably, the most important
the calculation of posterior probability for a hidden advance has been the development of a range of
variable. Suppose there are two light bulb factories. techniques based on ▶ Markov Chain Monte Carlo
Factory #1 has 40 qualified products and 10 unquali- (MCMC). The ideas originate from statistical physics,
fied products, while factory #2 has 30 qualified prod- but are now widely used for Bayesian inference.
ucts and 20 unqualified products. One person chooses Although by no means a panacea, carefully crafted
a factory at random, and then picks a light bulb at MCMC algorithms executed on fast computers are
random. It is assumed that the two factories are treated able to solve a phenomenal range of problems that
equally, likewise for the light bulbs. The light bulb would have been considered completely intractable
turns out to be an unqualified one. How probable is it only a few years ago. In the high-dimensional context,
that the light bulb is out of factory #1? it is often necessary to decompose the full problem
Intuitively, it seems clear that the answer should be according to the underlying conditional independence
less than a half, since there are less unqualified light structure of the model, and it is in this context that
bulbs in factory #1. The precise answer is given by graphical model (also known as ▶ Bayesian Network
Bayesian approach. Let H1 correspond to factory #1, Model) is particularly useful.
and H2 to factory #2. It is given that the bowls Gibbs sampler is just one of MCMC procedures for
are identical from the person’s point of view, sampling from posterior distributions. It uses condi-
thus PðH1 Þ ¼ PðH2 Þ, and the two must add up to 1, tional sampling of each parameter given the rest, and
so both are equal to 0.5. The event E is the observation is useful when the structure of the problem makes
of an unqualified light bulb. From the products of the this sampling easy to carry out. Specifically,
factories, we know that PðEjH1 Þ ¼ 10 50 ¼ 0:2 and a Markov chain is constructed with ▶ equilibrium
PðEjH2 Þ ¼ 2050 ¼ 0:4. Bayesian formula then yields probability distribution PðyjZÞ. Each iteration of the
sampler involves cycling through each component of
PðEjH1 ÞPðH1 Þ the K-dimensional vector y in order and sampling
PðH1 jEÞ ¼ from Pðyi jyi ; ZÞ; i ¼ 1;    ; K; where yi denotes
PðEjH1 ÞPðH1 Þ þ PðEjH2 ÞPðH2 Þ
the vector of all components of y except yi . Knowledge
0:2  0:5 (2)
¼ ; of the ▶ Bayesian network for the model can simplify
0:2  0:5 þ 0:4  0:5
the computation of these so-called full-conditional
¼ 0:33 distributions. In many cases, the full-conditionals
will be straightforward to sample directly, but in
Before we observed the unqualified light bulb, the others, a Metropolis-Hastings method will be
probability we assigned for the person having chosen required. Here, a proposed new value is simulated
Bayesian Network Model 75 B
from a largely arbitrary proposal distribution, Definition
qðyi jyi Þ and accepted with a probability.
A Bayesian network model is a probabilistic graphical
Bayesian Algorithms and Tools model of a set of variables for considering their condi-
There is a large variety of Bayesian algorithms and tional independencies in a directed acyclic graph B
tools; open source tools widely used in computational (DAG).
biology include Matlab (Matlab, Machine Learning
Tool and Matlab, Data Analysis Tool).
Characteristics

Cross-References Probabilistic Graphical Model


Let X ¼ (X1, X2,...,Xn) be a set of random variables, and
▶ Bayesian Network Model let xi be a value of Xi, the i-th component of X. Let
▶ Equilibrium Probability Distribution y ¼ ðxi ÞXi2Y be a value of Y  X. Then, a probabilistic
▶ Identification of Gene Regulatory Networks, graphical model for X is a graphical factorization of the
Machine Learning joint generalized probability density function,
▶ Machine Learning f(X ¼ x). The representation of this model is given by
▶ Markov Chain Monte Carlo two concepts: a structure and a set of local generalized
▶ Regression Analysis probability densities (Pearl 2000).
The structure S for X is a directed acyclic graph
(DAG) that describes a set of conditional indepen-
References dencies (Dawid 1979) about the variables on X: PaSi
represents the set of parents (variables from which an
Bolstad WM (2010) Understanding computational bayesian arrow is coming out in S) of the variable Xi in the
statistics. Wiley, Hoboken
Box GEP, Tiao GC (1973) Bayesian inference in statistical
probabilistic graphical model whose structure is
analysis. Wiley, Hoboken given by S. The structure S for X assumes that Xi and
Hastie T, Tibshirani R, Friedman J (2001) The elements of its non-descendants are independent given PaSi ,
statistical learning, Springer Series in Statistics. Springer, i ¼ 2,..., n. Therefore, the factorization can be written
New York
as follows:
Wilkinson DJ (2007) Bayesian methods in bioinformatics
and computational systems biology. Brief Bioinform
8(2):109–116 Y
n  
f ðxÞ ¼ f ðx1 ; x2 ; . . . : xn Þ ¼ f xi jpaSi (1)
i¼1

A representation of the models of the characteristics


described in Eq. 1 assumes that the local generalized
Bayesian Network Model probability densities depend on a finite set of parame-
ters uS 2 QS, and as a result, Eq. 1 can be rewritten as
Katsuhisa Horimoto1 and Shigeru Saito2 follows:
1
Computational Biology Research Center,
National Institute of Advanced Industrial and
Y
n  
Science and Technology, Koto-ku, Tokyo, Japan f ðxjyS Þ ¼ f xi jpaSi ; yi (2)
2
Infocom Corporation, Tokyo, Japan i¼1

where yS ¼ (y1, y2,..., yn).


Synonyms
Bayesian Network Model
Bayes rule; Causal relationship; Conditional Bayesian network model is the probabilistic graphical
independency model in the particular case of every variable Xi 2 X
B 76 Bayesian Network Model

Bayesian Network Model,


Fig. 1 Structure and resulting
factorization for a Gaussian
network with four variables

 
being discrete. If the variable Xi hasri possible  values, where N xi ; mi ; s2i is a univariate normal
x1i ; :::; xrii , the local distribution, g xi jpaj;S
i ; y i is an distribution with mean mi and variance vi ¼ s2i for
unrestricted discrete distribution: the i-th variable.
Taking this definition into account, an edge missing
   from Xj to Xi implies bji ¼ 0 in the former linear-

g xki paj;S
i ; y i ¼ yxk jpaj (3)
i i regression model. The local parameters are given by
ui ¼ (mi, bi, vi), where bi ¼ (b1i, b2i,..., bi1i)T is
qi;S
where pa1;Si ; :::; pai denotes the values of PaSi , that is, a column vector. A probabilistic graphical model
the set of parents of the variable Xi in the structure S; qi built from these local density functions is known as
is the number of different possible instantiations of the a Gaussian network (Shachter and Kenley 1989).

parent variables of Xi. The local parameters are given The components of the local parameters are
by yi ¼ ðyijk Þk ¼1 ri Þj ¼1 qi . In other words, the param- as follows: mi is the unconditional mean of Xi, vi is
eter yijk represents the conditional probability that var- the conditional variance of Xi given Pai, and bji
iable Xi takes its k-th value, knowing that its parent is a linear coefficient that measures the strength of
variables have taken their j-th combination of values. the relationship between Xj and Xi. Figure 1 is an
We assume that every yijk is greater than zero. example of a Gaussian network in a four-dimensional
space.
Gaussian Network Model For investigating how Gaussian networks and
Here, one example of Bayesian network model is illus- multivariate normal densities are related, the joint
trated, which assumes the joint density function to be density function of the continuous n-dimensional
a multivariate Gaussian density, Gaussian network variable X is by definition a multivariate normal dis-
model (Whittaker 1990). tribution iff
An individual x ¼ (x1, x2,..., xn) consists of
a continuous value in ℜn. The local density function f ðxÞ ¼ Nðx; m; SÞ
for the i-th variable Xi can be computed as the linear
S1 ðxmÞ
¼ ð2pÞ2 jSj2 e2ðxmÞ
n 1 1 T
regression model (5)

!
   X where m is the vector of means, S is covariance matrix
f xi paSi ; yi ¼ N xi ; mi þ bji ðxj  mj Þ; ni n  n, and |S| denotes the determinant of S. The
xj 2pai
inverse of this matrix, W ¼ S1, in which elements
(4) are denoted by wij, is known as the precision matrix.
BCR Receptor Diversity 77 B
This density can also be written as a product of n cannot represent feedback loops in the networks. It is
conditional densities using the chain rule, namely, well known that cyclic machinery is a common mech-
anism in various biological functions. Another is
Y
n
a more serious problem: Arbitral structure learning in
f ðxÞ ¼ f ðxi jx1 ; x2 ; . . . :xi1 Þ
i¼1
Bayesian network is an NP-hard problem, which B
! means that we are allowed to use only heuristic strat-
Y
n X
i1
¼ N x i ; mi þ bji ðxj  mj Þ; ni (6) egies for finding approximate solutions. But yet,
i¼1 j¼1 Bayesian network is of certain worth in terms of its
handiness and ease of interpretation.
where mi is the unconditional mean of Xi, vi is the
variance of Xi given X1, X2,..., Xi1, and bji is a linear
coefficient reflecting the strength of the relationship Cross-References
between variables Xj and Xi. This notation allows us to
represent a multivariate normal distribution as ▶ Bayes Rule
a Gaussian network, where for any bji 6¼ 0 with j < i ▶ Bayesian Method
this network will contain an edge from Xj to Xi. ▶ Causal Relationship
▶ Conditional Independency
Applications of Biological Network ▶ Probabilistic Graphical Model
The application of Bayesian network model to the
biological network is first found in Friedman et al.
(2000). One of the important research themes for apply- References
ing Bayesian network model to biological network is to
develop algorithm for obtaining accurate network struc- Dawid AP (1979) Conditional independence in statistical theory.
J Roy Stat Soc Ser B 41:1–31
ture in reasonable computational time, which is called
Friedman N, Linial M, Nachman I, Pe’er D (2000) Using
“structure learning.” Currently, approaches for structure Bayesian networks to analyze expression data. J Comput
learning are roughly divided into two categories. One is Biol 7:601–620
“score-based method,” which provides a structure by Pearl J (2000) Causality: models, reasoning, and inference.
Cambridge University Press, Cambridge/New York
maximizing some scoring function with respect to the
Shachter R, Kenley C (1989) Gaussian influence diagrams.
posterior probability of the structure, such as Bayesian Manag Sci 35:527–550
Dirichlet equivalent (BDe) and Bayesian information Whittaker J (1990) Graphical models in applied multivariate
criterion (BIC), by using several maximization statistics. Wiley, Chichester
methods, such as greedy, K2, and Markov chain
Monte Carlo. The other is “constraint-based method,”
which provides a structure by checking conditional
independency among nodes in the graph, such as SGS/ Bayesian Statistics
PC-algorithm and CI-algorithm. The score-based
methods try to get an optimal structure in terms of ▶ Bayesian Inference
likelihood (i.e., goodness of fit of the model). On the
other hand, the constraint-based method focuses on
consistency of conditional independencies in the graph
(local Markov property). B Cell-mediated Immunity
According to improvement of measurement tech-
nology like DNA microarray, Bayesian networks have ▶ B Cell-mediated Immune Response
become quite a popular approach for genetic network
inference. Nevertheless, although a lot of successful
applications of Bayesian networks under various bio-
logical settings can be enumerated, Bayesian networks BCR Receptor Diversity
still have several limitations. One is that Bayesian
networks accept only acyclic graphs, that is, they ▶ Immune Repertoire Diversity
B 78 Belief Elicitation

hypothesis testing. While the ▶ Bonferroni correction


Belief Elicitation relies on the Family Wise Error Rate (FWER),
Benjamini and Hochberg introduced the idea of a
▶ Prior Elicitation FDR to control for multiple hypotheses testing. In the
statistical context, discovery refers to the rejection of
a hypothesis. Therefore, a false discovery is an
incorrect rejection of a hypothesis and the FDR is
the likelihood such a rejection occurs. Controlling
Bench-to-Bedside Research
the FDR instead of the FWER is less stringent and
increases the method’s power. As a result, more
▶ Translational Research
hypotheses may be rejected and more discoveries
may be made.
In the Benjamini–Hochberg method, hypotheses
are first ordered and then rejected or accepted based
Benign on their p-values. A p-value is a data point for each
hypothesis describing the likelihood of an observation
Barbara J. Davis based on a ▶ probability distribution. The Benjamini–
Section of Pathology, Tufts Cummings School of Hochberg method begins by ordering the m hypothesis
Veterinary Medicine Biomedical Sciences, by ascending p-values, where Pi is the p-value at the ith
North Grafton, MA, USA position with the associated hypothesis Hi. Let k be the
largest i for which:

Definition i
Pi q
m
Localized disorganized tissue mass that does not spread
and is amenable to local excision. The tissue character- Reject hypotheses i ¼ 1, 2, 3,..., k. The Benjamini–
istics are more similar to the tissue it is derived from. Hochberg method has been proven to control the FDR
for all tests at a level of q.

Cross-References
References
▶ Cancer Pathology
Benjamini Y, Hochberg Y (1995) Controlling the false discovery
rate: a practical and powerful approach to multiple hypothe-
sis testing. J R Stat Soc B 57:289–300

Benjamini–Hochberg Method

Winston Haynes
Seattle Children’s Research Institute, Seattle, Beta Workbench
WA, USA
▶ BlenX

Definition

The Benjamini–Hochberg method controls the BetaWB


False Discovery Rate (FDR) using sequential
modified ▶ Bonferroni correction for ▶ multiple ▶ BlenX
Bifurcation 79 B
of a system causes a sudden “qualitative” or topolog-
Bias ical change in its behavior. Bifurcations can occur
in both continuous systems (described by ▶ ODEs,
Olga Vitek DDEs, or PDEs) and discrete systems (described
Department of Statistics, Department of Computer by maps). B
Science, Purdue University, West Lafayette, IN, USA

Characteristics
Definition
Bifurcation Diagram
Bias is a property of experimental design defined as In mathematics, particularly in dynamical systems,
a systematic error in data-derived conclusions regard- a bifurcation diagram shows the possible long-term
ing the unknowns. values (equilibria/fixed points or periodic orbits) of
a system as a function of a bifurcation parameter in
the system. It is usual to represent stable solutions with
Cross-References a solid line and unstable solutions with a dotted line.
An example is the bifurcation diagram of the logis-
▶ Designing Experiments for Sound Statistical tic map:
Inference
xnþ1 ¼ rxn ð1  xn Þ (1)

The bifurcation parameter r is shown on the hori-


Bifan zontal axis of the plot and the vertical axis shows the
possible long-term population values of the logistic
▶ Canonical Network Motifs function. Only the stable solutions are shown here;
there are many other unstable solutions which are not
shown in this diagram.
The bifurcation diagram nicely shows the forking of
Bifurcation the possible periods of stable orbits from 1–2 to 4–8,
etc. Each of these bifurcation points is a period-
Tianshou Zhou doubling bifurcation. The ratio of the lengths of
School of Mathematics and Computational Sciences, successive intervals between values of r for which
Sun Yet-Sen University, Guangzhou, Guangdong, bifurcation occurs converges to the first Feigenbaum
China constant (Fig. 1).

Bifurcation Types
Definition It is useful to divide bifurcations into two principal
classes:
Bifurcation means the splitting of a main body into two Local bifurcations, which can be analyzed entirely
parts. Bifurcation theory is the mathematical study of through changes in the local stability properties of
changes in the qualitative or topological structure of ▶ equilibria, periodic orbits, or other invariant sets
a given family, such as the integral curves of a family as parameters cross through critical thresholds.
of vector fields, and the solutions of a family of differ- Global bifurcations, which often occur when larger
ential equations. Most commonly applied to the invariant sets of the system “collide” with each
mathematical study of dynamical systems, a bifurca- other, or with equilibria of the system. They cannot
tion occurs when a small smooth change made to be detected purely by a stability analysis of the
the parameter values (the bifurcation parameters) equilibria (fixed points).
B 80 Bifurcation

1.0 bifurcation is either a saddle-node (often called fold


bifurcation in maps), transcritical, or pitchfork bifurca-
0.8 tion. If the eigenvalue is equal to  1, it is a period-
doubling (or flip) bifurcation, and otherwise, it is a Hopf
0.6
bifurcation.
x

Examples of local bifurcations include:


0.4
1. ▶ Saddle-node (fold) bifurcation
0.2 2. Transcritical bifurcation
3. Pitchfork bifurcation
0.0 4. Period-doubling (flip) bifurcation
2.4 2.6 2.8 3.0 3.2 3.4 3.6 3.8 4.0 5. ▶ Hopf bifurcation
r 6. Neimark (secondary Hopf) bifurcation
Bifurcation, Fig. 1 Bifurcation diagram of the logistic map
Saddle-Node Bifurcation In the mathematical area
of ▶ bifurcation theory a saddle-node bifurcation, tan-
Local Bifurcations gential bifurcation, or fold bifurcation is a local bifur-
A local bifurcation occurs when a parameter change cation in which two fixed points (or equilibria)
causes the stability of an equilibrium (or fixed point) to of a dynamical system collide and annihilate each
change. In continuous systems, this corresponds to the other. The term “saddle-node bifurcation” is most
real part of an eigenvalue of an equilibrium passing often used in reference to continuous dynamical sys-
through zero. In discrete systems (those described by tems. In discrete dynamical systems, the same bifurca-
maps rather than ODEs), this corresponds to a fixed tion is often instead called a fold bifurcation. Another
point having a ▶ Floquet multiplier with modulus name is blue skies bifurcation in reference to the sud-
equal to one. In both cases, the equilibrium is den creation of two fixed points.
nonhyperbolic at the bifurcation point. The topological If the phase space is one dimensional, one of the
changes in the phase portrait of the system can be equilibrium points is unstable (the saddle), while the
confined to arbitrarily small neighborhoods of the other is stable (the node).
bifurcating fixed points by moving the bifurcation The normal form of a saddle-node bifurcation is:
parameter close to the bifurcation point (hence
“local”). x_ ¼ r þ x2 (4)
More technically, consider the continuous dynami-
cal system described by the ODE: Here x is the state variable and r is the bifurcation
parameter.
x_ ¼ f ðx; lÞ; f : Rn  R ! R (2) If r < 0 there are two equilibrium points, a stable
pffiffiffiffiffiffi
equilibrium point at  r and an unstable one at
pffiffiffiffiffiffi
A local bifurcation occurs at ðx0 ; l0 Þ if the ▶ Jacobian þ r . At r ¼ 0 (the bifurcation point) there is
matrix Df ðx0 ; l0 Þ has an ▶ eigenvalue with zero real exactly one equilibrium point. At this point the fixed
part. If the eigenvalue is equal to zero, the bifurcation point is no longer hyperbolic. In this case the fixed
is a steady state bifurcation, but if the eigenvalue is point is called a saddle-node fixed point. If r > 0, then
nonzero but purely imaginary, this is a ▶ Hopf there are no equilibrium points.
bifurcation. A saddle-node bifurcation occurs in the consumer
For discrete dynamical systems, consider the equation if the consumption term is changed from px
system: to p, that is the consumption rate is constant and not in
proportion to resource x.
xnþ1 ¼ f ðxn ; lÞ (3) Saddle-node bifurcations may be associated with
hysteresis loops and catastrophes.
Then a local bifurcation occurs at ðx0 ; l0 Þ if the
matrix Df ðx0 ; l0 Þ has an eigenvalue with modulus Transcritical Bifurcation In ▶ bifurcation theory,
equal to one. If the eigenvalue is equal to one, the a field within mathematics, a transcritical bifurcation
Bifurcation 81 B
is a particular kind of local bifurcation, meaning that it of the pitchfork is given by the sign of the third
is characterized by an equilibrium having an eigen- derivative:
value whose real part passes through zero. 
Both before and after the bifurcation, there is one @f 3 <0 supercritical
ð0; r0 Þ (8)
unstable and one stable fixed point. However, their @x3 >0 subcritical B
stability is exchanged when they collide. So the unsta-
ble fixed point becomes stable and vice versa.
The normal form of a transcritical bifurcation is: Period-Doubling Bifurcation In mathematics,
a period-doubling bifurcation in a discrete dynamical
x_ ¼ rx  x2 (5) system is a bifurcation in which the system switches to
a new behavior with twice the period of the original
This equation is similar to logistic equation but in system. Period-doubling bifurcations can also occur in
this case we allow r and x to be positive or negative continuous dynamical systems, namely, when a new
(while in the logistic equation x and r must be nonneg- ▶ limit cycle emerges from an existing limit cycle, and
ative). The two fixed points are at x ¼ 0 and x ¼ r. the period of the new limit cycle is twice that of the old
When the parameter r is negative, the fixed point one (Fig. 2).
at x ¼ 0 is stable and the fixed point x ¼ r is unstable.
But for r > 0, the point at x ¼ 0 is unstable and the Supercritical/Subcritical Hopf Bifurcations The
point at x ¼ r is stable. So the bifurcation occurs at limit cycle is orbitally stable if a certain quantity called
r ¼ 0. the first Lyapunov coefficient is negative, and the bifur-
A typical example (in real life) could be the cation is supercritical. Otherwise it is unstable and the
consumer-producer problem where the consumption is bifurcation is subcritical.
proportional to the (quantity of) resource. The normal form of a Hopf bifurcation is:
For example:
 
z_ ¼ z l þ bjzj2 (9)
x_ ¼ rxð1  xÞ  px (6)
where z, b are both complex and l is a parameter. Write
where rxð1  xÞ is the logistic equation of resource b ¼ a þ ib. The number a is called the first Lyapunov
growth; and px is the consumption, proportional to coefficient.
the resource x. If a is negative, then there is a stable limit cycle for
l > 0:
Pitchfork Bifurcation In ▶ bifurcation theory, a field
within mathematics, a pitchfork bifurcation is zðtÞ ¼ reiot (10)
a particular type of local bifurcation. Pitchfork bifur-
cations, like ▶ Hopf bifurcations, have two types – where
supercritical or subcritical.
pffiffiffiffiffiffiffiffiffiffiffi
In flows, that is, continuous dynamical systems r¼ l=a and o ¼ br 2 (11)
described by ▶ ODEs, pitchfork bifurcations occur
generically in systems with symmetry. The bifurcation is then called supercritical.
An ODE: If a is positive then there is an unstable limit cycle
for l < 0. The bifurcation is called subcritical.
x_ ¼ f ðx; r Þ (7)
Global Bifurcation
described by a one parameter function f ðx; r Þwith r 2 R Global bifurcations occur when “larger” invariant sets,
satisfying:  f ðx; r Þ ¼ f ðx; r Þ ( f is an odd function such as periodic orbits, collide with equilibria.
@f @f 2
with regard to x), @x ð0; r0 Þ ¼ 0, @x 2 ð0; r0 Þ ¼ 0, This causes changes in the topology of the trajectories
@f 3 @f @f 2
@x3 ð 0; r 0 Þ ¼
6 0, @r ð 0; r 0 Þ ¼ 0, @x@r ð 0; r 0Þ ¼
6 0 has in the phase space which cannot be confined to a small
a pitchfork bifurcation at ðx; r Þ ¼ ð0; r0 Þ. The form neighborhood, as is the case with local bifurcations.
B 82 Bifurcation

Bifurcation, Bifurcation Diagram for the modified Phillips curve


Fig. 2 Bifurcation diagram
2.8
for the modified Phillips curve

2.6

2.4

2.2

πe 2

1.8

1.6

1.4
1 1.05 1.1 1.15 1.2 1.25 1.3 1.35 1.4 1.45 1.5
b

In fact, the changes in topology extend out to an arbi- Heteroclinic Bifurcation In mathematics, particu-
trarily large distance (hence “global”). larly dynamical systems, a heteroclinic bifurcation is
Examples of global bifurcations include: a global bifurcation involving a heteroclinic cycle.
1. Homoclinic bifurcation in which a ▶ limit cycle Heteroclinic bifurcations come in two types: resonance
collides with a saddle point (▶ Saddle-Node bifurcations and transverse bifurcations. Both types of
Bifurcation) bifurcations will result in the change of stability of the
2. Heteroclinic bifurcation in which a limit cycle col- heteroclinic cycle.
lides with two or more saddle points At a resonance bifurcation, the stability of the cycle
3. Infinite-period bifurcation in which a stable node and changes when an algebraic condition on the eigen-
saddle point simultaneously occur on a limit cycle values of the equilibria in the cycle is satisfied. This
4. Blue sky catastrophe in which a limit cycle collides is usually accompanied by the birth or death of a
with a nonhyperbolic cycle periodic orbit.
Global bifurcations can also involve more compli- A transverse bifurcation of a heteroclinic cycle is
cated sets such as chaotic attractors. caused when the real part of a transverse eigenvalue of
one of the equilibria in the cycle passes through zero.
Homoclinic Bifurcation In mathematics, This will also cause a change in stability of the
a homoclinic bifurcation is a global bifurcation which heteroclinic cycle.
often occurs when a periodic orbit collides with
a saddle point. Global Bifurcation In mathematics, an infinite-
The image below shows a phase portrait before, at, period bifurcation is a global bifurcation that can
and after a homoclinic bifurcation in 2D. The periodic occur when two fixed points emerge on a limit cycle.
orbit grows until it collides with the saddle point. At As the limit of a parameter approaches a certain critical
the bifurcation point the period of the periodic orbit has value, the speed of the oscillation slows down and the
grown to infinity and it has become a homoclinic orbit. period approaches infinity. The infinite-period bifurca-
After the bifurcation there is no longer a periodic orbit tion occurs at this critical value. Beyond the critical
(Fig. 3). value, the two fixed points emerge continuously from
Bifurcation 83 B

Bifurcation, Fig. 3 A homoclinic bifurcation occurs when bifurcation parameter increases, the limit cycle grows until it
a periodic orbit collides with a saddle point. Left panel: For exactly intersects the saddle point, yielding an orbit of infinite
small parameter values, there is a saddle point at the origin and duration. Right panel: When the bifurcation parameter increases
a limit cycle in the first quadrant. Middle panel: As the further, the limit cycle disappears completely

each other on the limit cycle to disrupt the oscillation Symmetry Breaking in Bifurcation Sets
and form two saddle points. In a dynamical system such as:

Blue Sky Catastrophe The blue sky catastrophe is


x€ þ f ðx; mÞ þ egðxÞ ¼ 0 (12)
a type of ▶ bifurcation of a periodic orbit. In other
words, it describes a sort of behavior that stable solu-
tions of a set of differential equations can undergo as which is structurally stable when m 6¼ 0, if
the equations are gradually changed. This type of a bifurcation diagram is plotted, treating m as the bifur-
bifurcation is characterized by both the period and cation parameter, but for different values of e, the case
the length of the orbit approaching infinity as the e ¼ 0 is the symmetric pitchfork bifurcation. When
control parameter approaches a finite bifurcation e 6¼ 0, we say we have a pitchfork with broken sym-
value, but with the orbit still remaining within metry. This is illustrated in Fig. 4.
a bounded part of the phase space, and without loss
of ▶ stability before the bifurcation point. In other Codimension of a Bifurcation
words, the orbit vanishes into the blue sky. The codimension of a bifurcation is the number of
The bifurcation has found application in, among parameters which must be varied for the bifurcation
other places, slow-fast models of computational neu- to occur. This corresponds to the codimension of the
roscience. The possibility of the phenomenon was parameter set for which the bifurcation occurs within
raised by David Ruelle and Floris Takens in 1971, the full space of parameters. Saddle-node bifurcations
and explored by R.L. Devaney and others in the fol- are the only generic local bifurcations which are really
lowing decade. A more compelling analysis was not codimension-one (the others all having higher
performed until the 1990s. codimension). However, often transcritical and pitch-
This bifurcation has also been found in the context fork bifurcations are also often thought of as
of fluid dynamics, namely, in double-diffusive convec- codimension-one, because the normal forms can be
tion of a small Prandtl number fluid. Double-diffusive written with only one parameter.
convection occurs when convection of the fluid is An example of a well-studied codimension-two
driven by both thermal and concentration gradients, bifurcation is the Bogdanov–Takens bifurcation.
and the temperature and concentration diffusivities
take different values. The bifurcation is found in an Catastrophe Theory
orbit that is born in a global saddle-loop bifurcation, In mathematics, catastrophe theory is a branch of
becomes chaotic in a period-doubling cascade, and bifurcation theory in the study of dynamical systems
disappears in the blue sky catastrophe. (▶ Dynamical Systems Theory, Bifurcation Analysis);
B 84 Bifurcation

Bifurcation,
1.5 ε = −0.1
Fig. 4 Symmetry breaking in
pitchfork bifurcation as the
parameter epsilon is varied.
e ¼ 0 is the case of symmetric 1
pitchfork bifurcation

0.5

x −0.5

−1

−1.5
−1 −0.8 −0.6 −0.4 −0.2 0 0.2 0.4 0.6 0.8 1
μ

it is also a particular special case of more general called the germs of the catastrophe geometries.
singularity theory in geometry. The degeneracy of these critical points can be unfolded
Bifurcation theory studies and classifies phenomena by expanding the potential function as a Taylor series
characterized by sudden shifts in behavior arising from in small perturbations of the parameters.
small changes in circumstances, analyzing how the When the degenerate points are not merely acciden-
qualitative nature of equation solutions depends on tal, but are structurally stable, the degenerate points exist
the parameters that appear in the equation. This may as organizing centers for particular geometric structures
lead to sudden and dramatic changes, for example, the of lower degeneracy, with critical features in the param-
unpredictable timing and magnitude of a landslide. eter space around them. If the potential function depends
Catastrophe theory, which originated with the work on two or fewer active variables, and four (respectively,
of the French mathematician René Thom in the 1960s, five) or fewer active parameters, then there are only
and became very popular due to the efforts of Christo- seven (respectively, eleven) generic structures for these
pher Zeeman in the 1970s, considers the special case bifurcation geometries, with corresponding standard
where the long-run stable equilibrium can be identified forms into which the Taylor series around the catastro-
with the minimum of a smooth, well-defined potential phe germs can be transformed by diffeomorphism (a
function (▶ Lyapunov Stability). smooth transformation whose inverse is also smooth).
Small changes in certain parameters of a nonlinear These seven fundamental types are now presented, with
system can cause equilibria to appear or disappear, or the names that Thom gave them.
to change from attracting to repelling and vice versa,
leading to large and sudden changes of the behavior of Fold Catastrophe
the system. However, examined in a larger parameter The potential function of one active variable is:
space, catastrophe theory reveals that such bifurcation
points tend to occur as part of well-defined qualitative V ¼ x3 þ ax (13)
geometrical structures.
Catastrophe theory analyses degenerate critical At negative values of a, the potential has two
points of the potential function – points where not extrema – one stable and one unstable. If the parameter
just the first derivative but one or more higher deriva- a is slowly increased, the system can follow the stable
tives of the potential function are also zero. These are minimum point. But at a ¼ 0 the stable and unstable
Bifurcation 85 B
One can also consider what happens if one holds b
constant and varies a. In the symmetrical case b ¼ 0,
x one observes a pitchfork bifurcation as a is reduced,
a
with one stable solution suddenly splitting into two
stable solutions and one unstable solution as the phys- B
ical system passes to a < 0 through the cusp point (0, 0)
(an example of spontaneous symmetry breaking).
Away from the cusp point, there is no sudden change
in a physical solution being followed: when passing
through the curve of fold bifurcations, all that happens
is an alternate second solution becomes available.
3x 2 +a = 0 A famous suggestion is that the cusp catastrophe
can be used to model the behavior of a stressed dog,
Bifurcation, Fig. 5 Stable and unstable pair of extrema disap- which may respond by becoming cowed or becoming
pear at a fold bifurcation angry. The suggestion is that at moderate stress (a > 0),
the dog will exhibit a smooth transition of response
extrema meet, and annihilate. This is the bifurcation from cowed to angry, depending on how it is provoked.
point. At a > 0 there is no longer a stable solution. If But higher stress levels correspond to moving to the
a physical system is followed through a fold bifurca- region (a < 0). Then, if the dog starts cowed, it will
tion, one therefore finds that as a reaches 0, the stability remain cowed as it is irritated more and more, until it
of the a < 0 solution is suddenly lost, and the system reaches the “fold” point, when it will suddenly, dis-
will make a sudden transition to a new, very different continuously snap through to angry mode. Once in
behavior. This bifurcation value of the parameter a is “angry” mode, it will remain angry, even if the direct
sometimes called the tipping point (Fig. 5). irritation parameter is considerably reduced.
Another application example is for the outer sphere
Cusp Catastrophe electron transfer frequently encountered in chemical
The potential function of one active variable is: and biological systems (Xu 1990).
Fold bifurcations and the cusp geometry are by far
V ¼ x4 þ ax2 þ bx (14)
the most important practical consequences of catastro-
The cusp geometry is very common, when one phe theory. They are patterns which reoccur again and
explores what happens to a fold bifurcation if again in physics, engineering, and mathematical
a second parameter, b, is added to the control space. modeling. They are the only way we currently have
Varying the parameters, one finds that there is now of detecting black holes and the dark matter of the
a curve (blue) of points in (a, b) space where stability is universe, via the phenomenon of gravitational lensing
lost, where the stable solution will suddenly jump to an producing multiple images of distant quasars.
alternate outcome. The remaining simple catastrophe geometries are
But in a cusp geometry the bifurcation curve loops very specialized in comparison, and presented here
back on itself, giving a second branch where this alter- only for curiosity value (Fig. 6).
nate solution itself loses stability, and will make a jump
back to the original solution set. By repeatedly increas- Swallowtail Catastrophe
ing b and then decreasing it, one can therefore observe The potential function of one active variable is:
hysteresis loops, as the system alternately follows one
solution, jumps to the other, follows the other back, V ¼ x5 þ ax3 þ bx2 þ cx (15)
then jumps back to the first.
However, this is only possible in the region of The control parameter space is three dimensional.
parameter space a < 0. As a is increased, the hysteresis The bifurcation set in parameter space is made up of
loops become smaller and smaller, until above a ¼ 0 three surfaces of fold bifurcations, which meet in two
they disappear altogether (the cusp catastrophe), and lines of cusp bifurcations, which in turn meet at
there is only one stable solution. a single swallowtail bifurcation point.
B 86 Bifurcation, Supercritical and Subcritical

Bifurcation, Fig. 6 (Left)


Cusp shape in parameter space 1
(a,b) near the catastrophe b x
point, showing the locus of a a
fold bifurcations separating
the region with two stable 2
solutions from the region with a<0 a>0
one; (right) pitchfork
bifurcation at a ¼ 0 on the
surface b ¼ 0
a = -6x 2, b = -8x 3 x(4x 2 + 2a) = 0

As the parameters go through the surface of fold Galan J, Freire E (1999) Chaos in a mean field model of coupled
bifurcations, one minimum and one maximum of the quantum wells; bifurcations of periodic orbits in a symmetric
hamiltonian system. Rep Math Phys 44(1):81–86
potential function disappear. At the cusp bifurcations, Gao J, Delos JB (1997) Quantum manifestations of bifurcations
two minima and one maximum are replaced by one of closed orbits in the photoabsorption spectra of atoms in
minimum; beyond them the fold bifurcations disappear. electric fields. Phys Rev A 56:356–364
At the swallowtail point, two minima and two maxima Gutzwiller MC (1990) Chaos in classical and quantum mechan-
ics. Springer, New York. ISBN 0-387-97173-4
all meet at a single value of x. For values of a > 0, beyond http://www.scholarpedia.org/article/Quantum_chaos
the swallowtail, there is either one maximum-minimum http://www.scholarpedia.org/article/User:Gutzwiller
pair, or none at all, depending on the values of b and c. Kleppner D, Delos JB (2001) Beyond quantum mechanics:
Two of the surfaces of fold bifurcations, and the two lines insights from the work of Martin Gutzwiller. Found Phys
31(4):593–612
of cusp bifurcations where they meet for a < 0, therefore Monteiro TS, Saraga DS (2001) Quantum wells in tilted fields:
disappear at the swallowtail point, to be replaced with semiclassical amplitudes and phase coherence times. Foun-
only a single surface of fold bifurcations remaining. dations Phys 31(2):355
Salvador Dalı́’s last painting, The Swallow’s Tail, was Peters AD, Jaffé C, Delos JB (1994) Quantum manifestations of
bifurcations of classical orbits: an exactly solvable model.
based on this catastrophe. Phys Rev Lett 73:2825–2828
Stamatiou G, Ghikas DPK (2007) Quantum entanglement
Butterfly Catastrophe dependence on bifurcations and scars in non-autonomous
The potential function of one active variable is: systems: the case of quantum kicked top. Phys Lett
A 368(3–4):206–214
Wieczorek S, Krauskopf B, Simpson TB, Lenstra D (2005) The
V ¼ x6 þ ax4 þ bx3 þ cx2 þ dx (16) dynamical complexity of optically injected semiconductor
lasers. Phys Rep 416(1–2):1–128
Xu F (1990) Application of catastrophe theory to the DG¼ 6
to –DG
Depending on the parameter values, the potential
relationship in electron transfer reactions. Zeitschrift f€ ur
function may have three, two, or one different local Physikalische Chemie Neue Folge 166:79–91
minima, separated by the loci of fold bifurcations. At
the butterfly point, the different three surfaces of fold
bifurcations, the two surfaces of cusp bifurcations, and
the lines of swallowtail bifurcations all meet up and Bifurcation, Supercritical and Subcritical
disappear, leaving a single cusp structure remaining
when a > 0. Tianshou Zhou
School of Mathematics and Computational Sciences,
References Sun Yet-Sen University, Guangzhou, Guangdong,
China
Blanchard P, Devaney RL, Hall GR (2006) Differential equations.
Thompson, London, pp 96–111
Courtney M et al (1995) Closed orbit bifurcations in continuum Definition
stark spectra. Phys Rev Lett 74:1538–1541
Founargiotakis M, Farantos SC, Skokos CH, Contopoulos G
(1997) Bifurcation diagrams of periodic orbits for unbound In bifurcation theory, a field within mathematics,
molecular systems: FH2. Chem Phys Lett 277(5–6):456–464 a pitchfork bifurcation is a particular type of local
Bile Acid and Xenobiotic System 87 B
bifurcation. Pitchfork bifurcations, like Hopf bifurca- and transporter-related functions that ensure the detox-
tions, have two types – supercritical and subcritical. ification and removal from the body of harmful xeno-
In flows, that is, continuous dynamical systems biotic and endobiotic compounds while ensuring that
described by ODE, pitchfork bifurcations occur gener- primary bile acids (essential for the emulsification and
ically in systems with symmetry. absorption of dietary fats and fat-soluble vitamins) are B
An ODE not eliminated and can be reused. Xenobiotics are
chemical compounds which are foreign to the living
dx
¼ f ðx; r Þ organism. They are not naturally found or expected to
dt be present in the organism as they are neither produced
described by a one parameter function f ðx; r Þ by it nor are they part of the organism’s natural diet.
with r 2 R satisfying:  f ðx; r Þ ¼ f ðx; r Þ ( f is an The process of xenobiotic metabolism ensures they are
@f
odd function with regard to x), @x ð0; r 0 Þ ¼ 0, broken down and either put to good use or detoxified
@f 2 @f 3 @f and removed from the organism. Examples of xenobi-
@x2 ð0; r 0 Þ ¼ 0, @x3 ð0; r 0 Þ 6¼ 0, @r ð0; r 0 Þ ¼ 0,
@f 2 otics are drugs, pesticides, and carcinogens. Endobi-
@x@r ð0; r 0 Þ 6¼ 0 has a pitchfork bifurcation at
otics are chemicals produced by an organism, such as
ðx; r Þ ¼ ð0; r 0 Þ. The form of the pitchfork is given by steroid hormones or bile acids. They are produced to
the sign of the third derivative: carry out a variety of functions, e.g., acting as

@f 3 < 0 supercritical a signaling molecule or facilitate absorption of dietary
ð 0; r Þ
@x3
0
> 0 subcritical fats; however, their concentrations need to be strictly
regulated as they can become toxic. The overall
▶ metabolic flux of the BAXS is primarily achieved
through the activities of nuclear receptors, which have
Bigraph the ability to directly bind to DNA and regulate ▶ gene
expression. Nuclear receptors can be thought of as
▶ Bipartite Graph metabolic sensors of exogenous and endogenous
toxins. Detailed knowledge of the factors that govern
the activity of nuclear receptors is required to under-
stand a range of physiological processes, such as drug-
Bile Acid and Xenobiotic System drug interactions, intracrine hormone metabolism,
▶ xenobiotic clearance, and cholesterol homeostasis
Noel Kennedy1, Paul Thompson1, Oliver Schmidt1, and lipid homeostasis.
Werner Dubitzky2 and Huiru Zheng3
1
School of Biomedical Sciences, University of Ulster,
Coleraine, UK Characteristics
2
Biomedical Sciences Research Institute, University of
Ulster, Coleraine, UK A System View
3
School of Computing and Mathematics, Computer ▶ Bile acids represent essential but also toxic biolog-
Science Research Institute, University of Ulster, ical reagents whose concentrations within the body
Jordanstown, UK require critical maintenance. Many of the genetic fac-
tors that dictate bile acid concentration also govern the
detoxification and removal from the body of many
Synonyms drugs and foreign compounds. These overlapping bio-
logical processes define a network or system termed
BAXS; Bile acid system the bile acid and xenobiotic system which involves the
coordinated activities of many genes in different
Definition tissues.
Bile acids are necessary for the emulsification and
The bile acid and xenobiotic system (BAXS) defines absorption of dietary fats and the regulation of choles-
an intricate physiological network of chemoprotective terol homeostasis. They are synthesized in the liver
B 88 Bile Acid and Xenobiotic System

Bile Acid and Xenobiotic


System, Fig. 1 Schematic Synthesis
illustration of enterohepatic Liver
circulation, illustrating the
circulation of biliary acids

Bile Acids
from the liver
Gall bladder

Bile Acids

Hepatic portal vein

95% reabsorbed

Duodenum IIeum

Bile Acids 5% lost to faeces

Small Intestine

from the catabolism of cholesterol forming the primary Given the multi-scale nature of the BAXS, it is difficult
bile acids, cholic and chenodeoxycholic acids. Bacte- to assess the exact role of individual receptors and their
rial flora dehydroxylates a portion of the primary bile activating/deactivating ▶ ligands with respect to the
acids in the intestinal lumen, resulting in secondary overall BAXS flux and its variation throughout
bile acids, deoxycholic and lithocholic acids. These a number of participating organs. A comprehensive
four bile acids possess detergent-like properties neces- description of the interacting components that govern
sary for the absorption of dietary lipids and fat-soluble BAXS ▶ gene expression would enable the identifica-
vitamins. When present at high concentrations, they tion of regulatory “nodes” as targets for treatment
can become toxic; therefore, ▶ bile acid concentra- regimes, facilitate a deeper understanding of the com-
tions need to be appropriately regulated and recycled ponents impacting drug-drug interactions, and provide
(Lefebvre et al. 2009). a framework for the design of large-scale, integrated
Similarly, the BAXS detects any accumulation of prediction studies.
xenobiotic and endobiotic compounds and facilitates
their detoxification and removal from the body. This Nuclear Receptors
is accomplished through a complex network of sensors Nuclear receptors are a superfamily of proteins which
in the form of nuclear receptors that function as can bind directly to DNA and upregulate or
ligand-activated transcription factors (Kliewer and downregulate the transcription of a gene. Within the
Willson 2002). The process of enterohepatic circula- BAXS, nuclear receptors act like a network of sensors
tion (as depicted in Fig. 1) ensures 95% of bile acids detecting compounds such as hormones or xenobiotics.
are recycled. These compounds, referred to as ligands, bind to the
The BAXS involves the coordinated activities of nuclear receptors and can either activate them, leading
a number of genes across multiple temporal and spatial to increased transcription, or deactivate them, leading
scales. Basic BAXS processes and their time scales to repression of gene transcription. A bound nuclear
include the binding of ▶ ligands to nuclear receptors receptor may also dimerize with another nuclear recep-
(seconds), ▶ gene expression and ▶ gene regulation tor to form a complex which then binds to response
(hours), ▶ transporter protein activity (minutes), and elements located in the promoter region of the gene.
metabolic enzyme activity (seconds). Spatially, This activates the gene and transcription is increased
BAXS components range from the dimension of mol- considerably. Nuclear receptors can also repress gene
ecules (e.g., nuclear receptors) to organs (e.g., liver). expression through competitive inhibition, whereby
Bile Acid and Xenobiotic System 89 B
Bile Acid and Xenobiotic
System, Fig. 2 The role of
nuclear receptors, illustrating
the competition between
nuclear receptors for the
ligands they bind and the B
genes targeted

the nuclear receptor competes with other receptors for The BAXS main process revolves around the circu-
the ligands they bind, or by binding to the promoter lation and critical maintenance of bile acid concentra-
region of the gene, thus reducing the efficacy of acti- tion and the detection and removal of harmful
vation and therefore gene transcription. Nuclear recep- compounds. This process is composed of multiple sub-
tors are classified as ▶ transcription factors and their processes (e.g., transactivation, interactions with
combined inhibitory and activating effects regulate co-activators or corepressors, heterodimerization)
gene expression. This is vital in controlling develop- operating on different time and space scales.
ment, metabolism and maintaining adult homeostasis In the BAXS, the initial stimuli leading to
(▶ Homeostasis). a physiological response would be the binding of
Nuclear receptors serve to detect fluctuations in a ligand by a nuclear receptor. Subsequently, the
concentration of many compounds and initiate a bound nuclear receptor binds to response elements in
physiological response by regulating the BAXS. the target genes, leading to increased ▶ gene expres-
▶ Transcriptional regulation by nuclear receptors sion and the cascading effects that would ensue. Fur-
involves both activating and repressive effects upon ther processes include conjugation and transporter
specific groups of genes. Figure 2 illustrates the overlap functions (Stieger and Meier 1998).
present between nuclear receptors and the genes they Examples of nuclear receptors within the BAXS
target and also the ligands that bind to and activate them. include the pregnane X receptor (PXR), farnesoid
It is these factors that contribute to the phenomenon X receptor (FXR), constitutive androstane receptor
of drug-drug interactions, e.g., between St. John’s wort (CAR), retinoid X receptor (RXR), and vitamin
and Cyclosporine (Barone et al. 2000), or St. John’s D receptor (VDR). A ligand can be either endogenous,
wort and an oral contraceptive (Hall et al. 2003). e.g., a hormone or bile acid, or exogenous such as
Positive feed-forward and negative feedback loops can a drug, e.g., Rifampicin, St. John’s wort, Dexametha-
also occur, e.g., within the cholesterol ▶ metabolic sone. A ligand binds to the nuclear receptor which may
pathway (Eloranta and Kullak-Ublick 2005). also bind to another receptor of the same type to form
B 90 Bile Acid and Xenobiotic System

Bile Acid and Xenobiotic


System, Fig. 3 Illustration of
PXR-mediated metabolism of
Hyperforin in the liver
inhibited by Ritonavir,
FXR-mediated bile acid
metabolism, and the transport
process

a homodimer (homodimerization) or with other through ligand binding, expression of CYP3A4 and
nuclear receptor complexes to form a heterodimer MDR1 are considerably increased, resulting in the
(heterodimerization). The overall complex then binds metabolism of Hyperforin and its transport from
to the ▶ promoter or ▶ enhancer of a specific gene and the cell into the intestinal lumen (Lin et al. 2009). In
this either upregulates or downregulates expression of the absence of receptor binding, Ritonavir inhibits
that gene depending on the complex formed. transcription of CYP3A4 and MDR1. In theory, this
could lead to accumulated levels of Hyperforin in the
BAXS Process and Example liver as unbound Ritonavir inhibits the metabolism
The example in Fig. 3 shows the effects of Ritonavir mechanism.
(an antiretroviral drug from the protease inhibitor class FXR is activated by primary and secondary bile
used to treat HIV infection and AIDS) on the metabo- acids, chenodeoxycholic acid (CDCA) and lithocholic
lism of Hyperforin (a phytochemical produced by acid (LCA). It upregulates transcription of CYP3A4,
some of the members of the plant, e.g., Hypericum multidrug-resistant protein 2 (MRP2) and bile salt
perforatum which is commonly known as St John’s efflux pump (BSEP), both members of the ATP-
wort) in the liver and the overlap of this process with binding cassette (ABC) transporter superfamily of
FXR-mediated primary and secondary bile acid metab- membrane-associated proteins responsible for
olism. Both Ritonavir and Hyperforin are activating transporting bile acids into the bile duct. In both pro-
ligands for PXR (Chang 2009; Willson and Kliewer cesses, an overlap occurs at CYP3A4. A patient taking
2002). CYP3A4 is a member of the cytochrome P450 Hyperforin will have increased expression of CYP3A4
superfamily of enzymes and one of the most important which may lead to a deficiency in ▶ bile acid concen-
involved in xenobiotic metabolism. Multidrug resis- tration as this gene metabolizes bile acids. Similarly,
tance protein 1 (MDR1) is a member of the ATP- a patient with high bile acid concentrations may reduce
binding cassette (ABC) transporter superfamily of the efficacy of Hyperforin (if taken) as transcription of
membrane-associated proteins responsible for CYP3A4 is increased. If Ritonavir is added, then bile
transporting a wide variety of substrates from the cell acids and Hyperforin could accumulate to toxic levels
(Koudriakova et al. 1998). Once PXR is activated in the liver.
Binding Affinity 91 B
Cross-References
Bile Acid System
▶ Bile Acid and Xenobiotic System
▶ Enhancer ▶ Bile Acid and Xenobiotic System
▶ Gene Expression B
▶ Gene Regulation
▶ Ligand
▶ Metabolic Flux Analysis Bimodality
▶ Metabolic Networks, Databases
▶ Metabolic Pathway Analysis ▶ Bistability
▶ Promoter
▶ Transcription Factor
▶ Transcriptional Regulation
▶ Xenobiotics Binding Affinity

Xiujun Zhang
Institute of System Biology, Shanghai University,
References
Shanghai, China
Barone G, Gurley B, Ketel B, Lightfoot M, Abul-Ezz S (2000)
Drug interaction between St. John’s wort and cyclosporine.
Ann Pharmacother 34(9):1013–1016 Definition
Chang T (2009) Activation of pregnane X receptor (PXR) and
constitutive androstane receptor (CAR) by herbal medicines.
AAPS J 11(3):590–601 Binding affinity is a measure of the tendency or
Eloranta JJ, Kullak-Ublick GA (2005) Coordinate transcrip- strength of interactions between molecules. The mol-
tional regulation of bile acid homeostasis and drug metabo- ecules that can bind together include proteins, DNA,
lism. Arch Biochem Biophys 433(2):397–412
antibodies, enzymes, and some other organic mole-
Hall SD, Wang Z, Huang S, Hamman MA, Vasavada N,
Adigun AQ, Hilligoss JK, Miller M, Gorski JC (2003) The cules such as drugs. The result of molecular binding
interaction between St John’s wort and an oral contraceptive is formation of a molecular complex such as protein-
[ast]. Clin Pharmacol Ther 74(6):525–535 protein, protein-DNA, and protein-drug complex.
Kliewer SA, Willson TM (2002) Regulation of xenobiotic and
Binding affinity can be quantified and detected by
bile acid metabolism by the nuclear pregnane X receptor.
J Lipid Res 43(3):359–364 some physical techniques (Karney et al. 2005). For
Koudriakova T, Iatsimirskaia E, Utkin I, Gangl E, example,
Vouros P, Storozhuk E, Orza D, Marinina J, Gerber N
(1998) Metabolism of the human immunodeficiency L þ P  LP (1)
virus protease inhibitors Indinavir and Ritonavir by human
intestinal microsomes and expressed cytochrome where L is a ligand, P is a protein, LP is the ligand-
P4503A4/3A5: mechanism-based inactivation of cyto- protein complex.
chrome P4503A by Ritonavir. Drug Metab Dispos
The concentration of the ligand-protein complex is
26(6):552–561
Lefebvre P, Cariou B, Lien F, Kuipers F, Staels B (2009) Role of given by the dissociation constant:
bile acids and bile acid receptors in metabolic regulation.
Physiol Rev 89(1):147–191 ½L ½P
Kd ¼ (1)
Lin YS, Yasuda K, Assem M, Cline C, Barber J, Li C, ½LP
Kholodovych V, Ai N, Chen JD, Welsh WJ, Ekins S, Schuetz
EG (2009) The major human pregnane X receptor (PXR) The binding affinity is defined as:
splice variant, PXR.2, exhibits significantly diminished
ligand-activated transcriptional regulation. Drug Metab kd
!
Dispos 37(6):1295–1304 NA
pKd ¼ log10 (2)
Stieger B, Meier PJ (1998) Bile acid and xenobiotic transporters 1 kmolm3
in liver. Curr Opin Cell Biol 10(4):462–467
Willson TM, Kliewer SA (2002) PXR, CAR and drug metabo-
lism. Nat Rev Drug Discov 1(4):259–266 where NA is the Avogadro constant.
B 92 Binomial Distribution Without Replacement

References
Biochemical Systems Optimization
Karney CF, Ferrara JE, Brunner S (2005) Method for Through Mathematical Programming
computing protein binding affinity. J Comput Chem
26(3):243–251
Julio Vera1 and Néstor V. Torres2
1
Department of Systems Biology and Bioinformatics,
Institute of Computer Sciences, University of Rostock,
Rostock, Germany
Binomial Distribution Without 2
Department of Biochemistry and Molecular Biology,
Replacement University of La Laguna, San Cristóbal de La Laguna,
Islas Canarias, Tenerife, Spain
▶ Hypergeometric Distribution

Synonyms

Bioactivity Metabolic engineering

▶ Biological Activity
Definition

Mathematical models can be used to predict the


Bioassay effect of costly complex interventions over biochemi-
cal systems with technological or biomedical purposes.
▶ Biological Assay Toward this end, mathematical modeling can be
combined with mathematical optimization techniques
giving support to microbiologists and biomedical
researchers in the biotechnological improvement
Biochart Diagram of microorganisms with industrial applications
(“what is the minimum set of enzymes that should
▶ Modeling Formalisms, Lymphocyte Dynamics and be modified in order to maximize the production of
Repertoires a desired end product?”) or in the design of new
therapeutic approaches (“what is the minimum set of
interactions in a biochemical network that must
be inhibited by specific drugs to subvert a given
Biochemical Kinetics pathological condition?”) (Torres and Voit 2002;
Vera et al. 2007).
▶ Mass Action Stochastic Kinetics The underlying idea is that a well-characterized and
calibrated mathematical model, in ordinary differential
equations, has predictive abilities that can be used
to obtain optimal configurations of the biochemical
Biochemical Network system regarding a biotechnological (or biomedical)
problem. This idea has been exploded by a number of
▶ Metabolic Networks groups in the past decade (Voit 1992; Torres et al.
1997; Hatzimanikatis et al. 1998). In a nutshell, simu-
lations with the mathematical model are used to predict
the behavior of the biochemical system. An optimiza-
Biochemical Pi-Calculus tion program, according to the biotechnological
problem under investigation, is established including
▶ Stochastic pi-Calculus the following: (1) one or more objectives are
Biochemical Systems Optimization Through Mathematical Programming 93 B
established, describing in mathematical terms the the form of kinetic model or GMA, and then
properties of the system that are to be improved; transferred to S-system.
(2) a set of constrains is established, accounting Step 2: Analysis of quality of the model. The S-system
for physiological or technological limitations; and modeling formalism provides us with tools to
(3) a set of potential interventions on the system is analyze its stability, robustness, and dynamic B
established, which may include modulation of the sys- response. It is important to ensure the validity
tem inputs, overexpression, or repression of enzymes, of the model, i.e., that it properly describes
but also inhibition of given biochemical processes. the known behavior of the system. Otherwise, it
Based on this program, optimization algorithms are is necessary to refine the model prior to its
used to estimate new configurations of the system optimization.
that optimized the established objectives via the choice Step 3: Construction and resolution program optimi-
of the appropriate interventions, which should be zation. This is the core of the procedure. All infor-
experimentally tested. mation gathered on the system and the limitations of
In many cases, the methodology here discussed biological or technological constraints on the
involves highly nonlinear optimization problems, solutions are converted into the mathematical
difficult to solve in an efficient manner. Some of equations that make up the program optimization.
these difficulties are circumvented for steady-state After ensuring the validity of the model and
optimization when canonical models like S-systems the adequacy of the components of the program
and power models are used. To illustrate the charac- optimization, the stated objectives and the condi-
teristics of this methodology and some of the advan- tions imposed on the system are moved into loga-
tages associated to the use of canonical models, we will rithmic space. In the logarithmic space, the
focus here on the so-called indirect optimization optimization problem becomes a linear program
method with S-systems models. and once the optimal solution is obtained it is trans-
ferred to the area of the original variables. In addi-
tion to the above, the optimized system must be at
Characteristics a steady state and fulfill some constraints. These
constraints, as well as the objective functions, are
Linear Programming and the IOM Method readily translated into linear functions and inequal-
The Indirect Optimization Method (IOM) is based on ities, and the optimization task becomes a linear
the fact that although S-system models are nonlinear, program.
their steady-state equations are linear when the vari- In the more general case (multiobjective optimiza-
ables are expressed in logarithmic coordinates. Since tion), after this point we can choose among three
a function and its logarithm assume their maxima for possible options (Vera et al. 2003, 2010): (1) pure
the same argument, yields or fluxes can thus be opti- multicriteria problem, which is the formulation of
mized with linear programs expressed in terms of the a multiple objective linear program, a direct exten-
logarithms of the original variables. Also, typical sion of the linear programming case; (2) weighted
constraints that the optimized system has to satisfy sum approach, where we assign a weight to each
reduce to linear equations in a logarithmic coordinate objective or function to be optimized (usually these
system. Thus, transporting the steady-state system weights are chosen between zero and one, such that
and the constraints into a logarithmic space reduces they sum up to one); and (3) goal programming,
the nonlinear optimization problem to a problem of where a goal level of achievement is established
straightforward linear programming (Voit 1992; Vera for each objective. These goal levels are soft
et al. 2003). The method is a procedure that system- constraints that are included in the definitions
atizes the steps to follow for optimum results. Figure 1 of the objective functions. Available software
illustrates the stages of the method in an outline for packages can produce the efficient solution set
the most general case. for multiobjective linear programming tasks such
Step 1: Getting the model. The first step is the as ADBASE or the Matlab optimization toolbox
development of a model system. This can be for the goal programming and weighted sum
modeled directly on the S-system formalism or in approach.
B 94 Biochemical Systems Optimization Through Mathematical Programming

Biochemical Systems Indirect Optimization Method


Optimization Through
Mathematical
Programming, biological
Fig. 1 Sketch of the indirect system
optimization method modelling

step 1
kinetic
model

S-system
model

model quality assesment

step 2
• stability
optimization engine • robustness
• dynamics
Design of multiobjective
linear program
1. objectives
max/min f1(y),...,fn(y)
2. subject to
• steady state equations

step 3
• metabolic burden logarithmic transformation
• fluxes constraints y = In(x)
• metabolites bounds
• enzymes bounds
• others

antilogarithmic transformation
efficient solutions set

solutions quality assesment


• stability
• robustness

step 4
• dynamics

selected
solutions

linear domain non-linear domain

Step 4: Analysis and refinement of the solution. Here Case Study 1. The Monoobjective Version of the
the system’s properties at the optimal state (stabil- IOM: Optimization of the L-()-Carnitine
ity, robustness, and dynamics) are evaluated. If the Biosynthesis by Escherichia Coli
optimal solution is sound, it is chosen as a solution; L-()-carnitine is a chiral compound widely distrib-
otherwise the imposed restrictions or the objective uted in nature and with a wide range of medical appli-
function are refined and the optimization process cations. A great deal of the L-carnitine is produced by
repeated until a satisfactory solution is reached. In chemical synthesis, with the disadvantage of produc-
the case where an S-system model was built from ing a racemic mixture that is necessary to separate in
a previous kinetic model, the S-system solution a costly process.
must be transferred to the original model. In the Figure 2 shows the experimental set up for the
following we will illustrate the application of this biotransformation of crotonobetaine into L-carnitine
approach to two case studies. by an overproducing E. coli strain in a cell recycle
Biochemical Systems Optimization Through Mathematical Programming 95 B
Biochemical Systems INPUT
Optimization Through glycerol X6
Mathematical crotonobetaine X7
Programming,
Fig. 2 Experimental set up
for the biotransformation of B
crotonobetaine into
L-()-carnitine by E. coli glucose X1
strains biomass X4
membrane OUTPUT
filtration
module volumetric
flux X5
X2 X3

E. coli cell glycerol X1


crotonobetaine X2
fermentor L-carnitine X3

bioreactor. The model includes the metabolic transfor- 0:0211 ¼  0:9162 Y1  0:5675 Y4 þ 0:5676 Y5
mation of crotonobetaine into g-butyrobetaine, þ Y6  0:5675 Y8
through the activity of a previously described
0:6433 ¼  0:1424 Y2 þ 0:3538 Y3  0:0249 Y4
crotonobetaine reductase. In the optimization proce-
dure the dependent variables are X1 to X4, while X5 to þ 0:0249 Y5 þ 0:0605 Y7  0:0245 Y9
X9 are the parameters. The original model set up was 0:6246 ¼ 0:1106 Y2  0:3926 Y3 þ 0:0257 Y4
translated to the S-system representation yielding the  0:0257 Y5 þ 0:0254 Y9
following S-system representation (Alvarez-Vasquez 2:706 ¼ 0:8524 Y1 þ Y8
et al. 2002):
where Yi is ln(Xi) for i ¼ 1,. . ., 9. At this point it is
dX1
¼ X5 X6  1:0213 X10:9162 X40:5675 X50:4324 X80:5675 important to define the extent to which changes in the
dt variables and parameters are allowed. Generally this
dX2 variation range will be between 0.5 and 1.5 times the
¼ 0:2438 X30:3538 X40:9394 X50:0605 X70:0605 X90:9292 baseline values. An exception here is X4, the biomass,
dt
which can be allowed to increase up to two times the
 0:464 X20:1424 X40:9643 X50:0356 X90:9537 baseline. Accordingly:

dX3
¼ 0:3844 X20:1106 X4 X90:989 3:0746 Y1 4:1732; 2:6909 Y2 3:7895;
dt 2:3277 Y3 3:4263; 1:8068 Y4 3:1931;
 0:2058 X30:3926 X40:9742 X50:0257 X90:9636 0:6931 Y5 0:4054; 3:9120 Y6 5:0106;
3:2188 Y7 4:3174; 1:1988 Y8 0:1002;
dX4 4:1214 Y9 6:424:
¼ 0:0053 X10:8524 X4 X8  0:08 X4 :
dt
With the defined linear program two optimization
Subsequently the (mono)objective function is tasks were carried out: with one or two decision
defined, namely, the net rate of L-carnitine biosynthe- (independent) variables. The obtained optimal profiles
sis, expressed as Vc ¼ X3·X5. In logarithmic were translated from the S-system to the original
coordinates this function becomes Y3 + Y5, where model (K-M) and then implemented in the bioreactor.
Yi is ln(Xi). The steady-state condition, once linearized Table 1 shows the experimental results obtained which
in logarithmic coordinates, obtained was: are in good agreement with the theoretical predictions.
B 96 Biochemical Systems Optimization Through Mathematical Programming

Biochemical Systems Optimization Through Mathematical Programming, Table 1 Comparison between predicted (K-M)
and actual experimental values (Exp.) of L-()-carnitine production rate by E. coli in a cell recycle crotonobetaine biotransformation
system. The optimized parameter profiles for changes in one (1; dilution rate, X5) and two parameters (2; dilution rate and
crotonobetaine concentration in the input, X7) are shown. Results are given as the values divided by the basal
1 2
Variables Basal K-M Exp. K-M Exp.
Dependent
Glycerol, X1 43.28 mM 1 0.96 1 0.95
Crotonobetaine, X2 29.49 mM 1 0.97 1.49 1.77
Carnitine, X3 20.51 mM 1 1.01 1.11 1.13
Biomass, X4 12.18 g/L 1.5 1.53 1.5 1.53
Independent
Dilution rate, X5 1 L/h1 1.5 1.5
Crotonobetaine input, X7 50 mM 1 1.33
Production rate, Vc 21.1 mM/h1 1.5 1.54 1.65 1.74

In both cases assayed, the experimental steady state dX4


was the same as that predicted by the model and the ¼ 0:2365 X0:5285
3 X0:0994
5 X9
dt
increase in the rate of L-()-carnitine production was
almost coincident with the predicted increase.  2:0892 X0:0075
3 X0:304
4 X0:0484
5 X10

dX5
Case Study 2. The Multiobjective Version of the ¼ 1:406 X0:2605
3 X0:152
4 X0:0739
5 X0:5
9 X10
0:5
dt
IOM: Application to the Ethanol Production by
 2:9437 X0:1962 X0:1791 X0:2354 X0:3514 X0:2925
Saccharomyces Cerevisiae 1 2 5 7 8

We have used a mathematical model of the ethanol X0:0589


11 13 :
X0:297
production by Saccharomyces cerevisiae presented by
Schlosser et al. (1994) as a case study to apply the Three biotechnologically relevant optimization
multiobjective version of the IOM procedure. objectives are considered (Vera et al. 2003):
As it can be seen in Fig. 3, glucose is converted in (1) the maximization of the ethanol production, Fprod;
ethanol, polysaccharides, and glycerol. The model (2) the minimization of the total internal metabolite
refers to the ethanol production under anaerobic, no concentrations, Fint; and (3) the minimization of the
growing conditions, with glucose as the sole carbon total enzyme activities, Fenz. The first objective
source and absence of nitrogen. The preliminary model relates to the enhancement of productivity, measured
by Schlosser et al. (1994) was converted into an in terms of ethanol production. The other two objec-
S-system model in Vera et al. (2003) with the tives refer to the minimization of the amount of glu-
following structure: cose transformed into other products different from
ethanol and to the cell viability by reducing the osmo-
dX1 larity stress (objective 2) and the metabolic burden
¼ 1:0006 X0:0492
2 X6 (objective 3).
dt
Being Y ¼ ln(X), they are defined as:
 1:6497 X0:5582
1 X0:0465
5 X7
 
min ZðYÞ ; ZðYÞ ¼ Fprod ðYÞ; Fint ðYÞ; Fenz ðYÞ
dX2
¼ 1:6497 X0:5582
1 X0:0465
5 X7
dt
The maximization objective becomes a minimiza-
 0:5793 X0:5097
2 X0:2218
5 X0:8322
8 X0:1678
11 tion objective by changing the sign of the objective
function. These objective functions take the following
dX3
¼ 0:4536 X0:4407
2 X0:2665
5 X8 form in the logarithmic transformed space:
dt
 0:2456 X0:4506
3 X0:0441
4 X0:092
5 X0:8547
9 X0:1453
12 Fprod ðYÞ ¼ 0:0075 Y3 þ 0:304 Y4 þ 0:0483 Y5 þ Y10 ;
Biochemical Systems Optimization Through Mathematical Programming 97 B
Glucoseext configure the properties of the solutions; their values
medium are dictated by the experience and are formulated for
X6 cytosol each variable. Experience dictates (Alvarez-Vasquez
et al. 2000) that a reasonable lower boundary is the
X1 − 50% of the original basal steady-state value and five B
X5 times 0 this basal 0value
 for most of the variables
ADP
X7 X5 ADP 0:5Xi < Xi < 5Xi for the upper boundary. The
exception
 to this rule were the ATPase activity
X2 Polisacc
X11
X5 < 1:5X50 , limited by additional physiological
X5 X8 constrains, and the side-products polysaccharides and
ADP glycerol,
 which were maintained
 at their basal values
X3 2 Glycerol X11 ¼ X110
; X13 ¼ X13
0
. A constrain, limiting the
X12 value of the flux ratio through the enzymes X9 and
2 ADP
X9 X10 in order to prevent dynamical instability in the
2 X5
+ solutions (X9/X10 1.75) was introduced. A last con-
2 X4 straint to ensure that any computed solution doubles at
X5
2 ADP least the ethanol production in the original unmodified
X13 microorganism, VPK ðXÞ 2 VPK ðX0 Þ, was also
2 X5 X10
imposed.
ADP After computing the multiobjective program by
2 Ethanol
using ADBASE, 22 efficient solutions were obtained.
Only the efficient vertexes of the problem were consid-
Biochemical Systems Optimization Through ered. The analysis of the solutions showed that solutions
Mathematical Programming, Fig. 3 Glucose fermentation with a high ethanol production are only possible when
pathway to ethanol, glycerol, and polysaccharides in Saccharo- the maximum of ATP (X5 ¼ 1.5) and ATPase (X13 ¼ 5)
myces cerevisiae. Five dependent concentrations and
eight fluxes are involved: Intracellular Glucose (X1); Glucose-
are provided to the system. This indicates the extreme
6-phosphate (X2); Fructose-1,6-bisphosphate (X3); Phosphoenol- importance of the ATP turnover to increase productiv-
pyruvate (X4); ATP (X5); Polysaccharides; Glycerol; Ethanol ity. To facilitate the analysis and classification of the
(X16), and ADP. Enzymes/pathway steps: Vin, Glucose uptake generated solutions, some auxiliary biologically rele-
(X6); VHK, Hexokinase (X7); VPFK, Phosphofructokinase
(X8); VGAPD, Glyceraldehyde 3-phosphate dehydrogenase (X9);
vant parameters were defined as follows:
VPK, Pyruvate kinase (X10); VPOL, Total disaccharide and
polysaccharide storage (X11); VGOL, Glycerol production P
P
5 13
(X12); VATPase (X13), Generalized rate for all ATP-utilizing Xi Xi
process, with the exception of VHK, VPFK, and VPOL i¼1 i¼6 VPK ðXÞ
sðXÞ ¼ ; yðXÞ ¼ ; rðXÞ ¼ :
P5 P13 VPK ðX0 Þ
Xi0 Xi0
i¼1 i¼6

X
5 X
13
Fint ðYÞ ¼ Yi and Fenz ðYÞ ¼ Yi : s is the ratio between the total intermediate concen-
i¼1 i¼6 tration of the current solution and the one in the orig-
inal basal solution of the system; y is the ratio between
In the optimization program three types of con- the total enzyme activities at the optimum solution and
straints are considered (Vera et al. 2003): (a) those the ones at the basal steady state, while r describes the
that guarantee that solutions correspond to a steady- same ratio for the ethanol production of the system. In
state solution of the system; (b) those setting the phys- order to support the biotechnologists making the
iologically feasible lower and upper boundaries for the proper decision, additional criterions were introduced.
model variables; and (c) additional constraints related We ruled out all solutions with the lowest admissible
to special features of the considered biosystem. The ethanol production (rmin ¼ 2:0) and ranked the
constraints that set up the boundaries for the model remaining ones according with their parametric
variables (metabolite or enzyme concentrations) distance to the so-called utopian point, a fictitious
B 98 Biochemical Systems Optimization Through Mathematical Programming

Biochemical Systems
Optimization Through
Mathematical
Programming,
Fig. 4 Efficient solutions for
5
the multicriteria optimization
of ethanol production by 4
S. cerevisiae. (●): Set I,
solutions with the largest 3
D(X,Xutp) value. (♦): Set II, ρn
solutions closer to the utopian 2
point (star). (■) Set III,
solutions with intermediate 1
values of D(X,Xutp).
0
(X) represents the anti-ideal 0
point
1 5
2 4
3 3
σn
2 θn
4
1
5 0

solution constructed with the optimal values of ethanol high productivity and enzyme levels but low metabo-
production (rutp), total intermediate concentration lite concentrations (0.8–1.85). Finally, Set III repre-
(sutp), and total enzyme concentration (yutp) from the sents those with intermediate values of D(X,Xutp), high
computation of separated monoobjective programs: productivity and enzyme levels but medium values of
total metabolites concentration. When the parametric
rutp ¼ 4:986 sutp ¼ 0:5 yutp ¼ 1:0 distance is considered the criterion to choose the best
solution two solutions were found (five-pointed star in
In a similar way, we defined the anti-utopian solu- Fig. 4: Table 2).
tion, extracting the worst value from every column in These solutions represent opposite alternatives
the payment matrix for the separated monoobjective from a biotechnological perspective. Solution
programs (Vera et al. 2003): I prefers the increase in the ethanol production at the
price of high cell resources consumption, while solu-
rfun ¼ 2:0 sfun ¼ 4:76 yfun ¼ 3:30 tion II establishes a biotechnological compromise
between satisfactory increasing in the ethanol produc-
With this information we defined a weighted para- tion and non-excessive cell resources consumption.
metric distance from every solution to the utopian It should be noted that the heuristic nature of the last
point, which is described by the following equation: step in the IOM method may provoke a loss of
efficiency in the solutions when the behavior of the
DðX; Xutp Þ original model departs from their S-system expansion.
vffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
#ffi
u " This may result in the violation of some of the restric-
u1 rutp  rðXÞ 2 s utp  sðXÞ 2 y utp
 yðXÞ
2
¼t þ þ tions imposed. However, experience shows that these
3 rutp  rfun sutp  sfun yutp  yfun
violations are often negligible and the steady-state
solutions usually stay in the vicinity of the real efficient
Figure 4 shows the solutions grouped in three sets solution. Moreover, the method incorporates tools to
according with their D(X,Xutp) value. Set I corresponds estimate solutions closer to the optimum in the
to those solutions with the largest D(X,Xutp) value. This nonlinear domain. This latter step consumes consider-
group is characterized by a high productivity (r), but able computational effort, but even more important,
also by high total intermediate concentration (s), an the solution obtained ceases to be optimal. The practi-
undesirable property. Set II includes the closer solu- cal application of the method indicates however that in
tions to the utopian point D(X,Xutp), characterized by most cases it remains a high-quality solution.
Biochemical Systems Optimization Through Mathematical Programming 99 B
Biochemical Systems Optimization Through corresponding values for the fluxes, respectively. The
Mathematical Programming, Table 2 Best two efficient values assigned to the weights lj and ll are propor-
solutions for the multicriteria optimization of ethanol production
by S. cerevisiae tional to the relative importance of each key metabolite
and flux. Every term in the equation is scaled by its
Sol r s y D(X,Xutp)
value in the healthy condition such that all contribu- B
I 4.987 1.812 3.301 0.604
tions have comparable weights.
II 2.791 0.596 2.798 0.613
Dysfunction description. A mathematical model
description of the functional origin of the disease,
which is obtained by imposing a characteristic value
Case Study 3. Combining Mathematical Modeling to the enzyme activities whose deregulation originates
and Optimization to Detect Potential Drug Targets the pathology:
in Human Diseases
The methodology discussed here can be used to detect Xj ¼ XjPS
potential drug targets in diseases that are originated by
dysfunctions of biochemical networks. The strategy is
Physiological constrains. Some additional equa-
to point out one or more enzymes in the biochemical
tions grating that the biochemical network configura-
network, whose modulation via specific drugs permits
tion obtained is physiologically acceptable. These
to redirect some biochemical fluxes in the network. In
equations can be steady-state conditions and upper
this manner, the critical fluxes and metabolites for the
and lower bounds in the concentrations of the metab-
system are restored to values similar to the ones found
olites, respectively:
in healthy subjects (Vera et al. 2007). The method
requires the derivation of a mathematical model
dXi
describing the dynamics of the metabolic network ¼ 0; i ¼ 1; . . . ; n:
dt
under analysis. It is also necessary to retrieve biomed-
ical knowledge required to select the critical fluxes and
metabolites unbalanced in the disease condition, as XiLB Xi XiUB
well as their values in healthy subjects. Model optimi-
zation is used to determine which biochemical The solutions generated by the method are sets of
processes in the network must be modulated, via computationally predicted values for metabolites and
drug-mediated inhibition or activation, in order to substrates, as well as the required modulation level of
move the system from the current values of critical the targeted enzyme. The selected solutions are those
metabolites and fluxes toward the healthy ones. The for which the computed values of critical metabolites
simulation-derived therapeutic treatments may and fluxes are sufficiently close to the ones in the
consider the modulation of one reaction in the network healthy condition without compromising the values
at a time, but also suggest strategies to develop multi- of other metabolites. In Vera et al. (2007) this meth-
factorial treatments. The optimization program odology was used to identify potential enzyme drug
suggested contains at least the following elements. targets for hyperuricemia, a disease associated with
Objective. An objective function, which translated a defect in phosphoribosyl pyrophosphate synthetase,
in mathematical terms the minimization of the distance an enzyme that regulates the de novo metabolic syn-
between the values of the critical metabolites and thesis of purines. The final pathological effect of this
fluxes in the current systems state with respect to deregulation is an abnormal level of uric acid, which
those in the desired health state: triggers arthritic pain and nephropathy. We used
a power-law mathematical model of purine metabo-
"    #
Xl Xj  XHS  X P Ji  JiHS  lism (Curto et al. 1997) and defined a mathematical
 j 
Min lj  þ li  HS 
 optimization program with the structure described
 X HS
j
 Ji
j¼1 i¼1 above. We defined as critical metabolite the uric acid
and computed solutions of the optimization program
where Xj and XjHS denote the current and the health consisting in the inhibition of enzymes and the
values of the metabolites and Ji and JiHS are the potential application of a diet with low levels of purine
B 100 Biochemical Systems Optimization Through Mathematical Programming

Biochemical Systems
Optimization Through
Mathematical
Programming,
Fig. 5 Sketch of the
mathematical model for purine
metabolism used in our
analysis (see Vera et al. 2007
for complete description). In
red we highlight the metabolic
fluxes whose inhibition in
combination with a diet low in
purine precursors reduces in
our simulations the levels of
uric acid. Precisely, those
solutions propose the
drug-mediated inhibition
of one of the following
enzymes: Adenine
phosphoribosyltransferase
(Vden), AMP deaminase
(Vampd), RNases to AMP and
GMP (Vrgna, Vrnaa), 50
(30 )-Nucleotidase (Vdgnuc),
Guanine hydrolase (Vgua), or
xanthine oxidase (Vxd, Vhxd).

precursors, as well as combinatorial treatments References


consisting in the parallel inhibition of two enzymes.
With the method we detected up to six potential single Alvarez-Vasquez F, González-Alcón CM, Torres NV
(2000) Metabolism of citric acid production by Aspergillus
enzyme targets, including an analogous of the conven-
niger. Model definition, steady state analysis, dynamic
tional clinical treatment using the drug allopurinol, but behavior and constrained optimization of citric acid produc-
also two other totally unexpected potential drug tion rate. Biotechnol Bioeng 70(1):82–108
targets. When considering potential multifactorial Alvarez-Vasquez F, Cánovas M, Iborra JL, Torres NV
(2002) Modeling, optimization and experimental assessment
treatments, numerous possible solutions were detected
of continuous L-()-carnitine production by Escherichia
(Fig. 5). coli cultures. Biotechnol Bioeng 80(7):794–805
Curto R, Voit EO, Sorribas A, Cascante M (1997) Validation
and steady-state analysis of a power-law model of
Cross-References purine metabolism in man. Biochem J 324(Pt 3):761–775
Hatzimanikatis V, Emmerling M, Sauer U, Bailey JE
(1998) Application of mathematical tools for metabolic
▶ Flux Balance Analysis design of microbial ethanol production. Biotechnol Bioeng
▶ Parameter Estimation, Metabolic Network 58(2–3):154–161 Review
Modeling Schlosser PM, Riedy G, Bailey J (1994) Ethanol
production in baker’s yeast: application of experimental
▶ Power-Law Functions
perturbation techniques for model development and resultant
▶ S-System changes in flux control analysis. Biotechnol Prog
▶ Synthetic Biology, Predictability and Reliability 10(2):141–154
BioCreative II.5 and the FEBS Letters Experiment on Structured Digital Abstracts 101 B
Torres NV, Voit EO (2002) Pathway analysis and optimization
in metabolic engineering. Cambridge University Press, BioCreative II.5 and the FEBS Letters
Cambridge
Torres NV, Rodrı́guez F, González-Alcón C, Voit EO (1997) An Experiment on Structured Digital
indirect optimization method for biochemical systems: Abstracts
description of the method and application to ethanol, glycerol
and carbohydrates production in Saccharomyces cerevisiae.
B
Florian Leitner, Martin Krallinger and Valencia
Biotechnol Bioeng 55(5):758–772
Vera J, De Atauri P, Cascante M, Torres NV (2003) Alfonso
Multicriteria optimization of biochemical systems by Structural Biology and BioComputing Programme,
linear programming. application to the ethanol production Spanish National Cancer Research Centre (CNIO),
by Saccharomyces Cerevisiae. Biotechnol Bioeng Madrid, Spain
83(3):335–343
Vera J, Curto R, Cascante M, Torres NV (2007) Detection of
potential enzyme targets by metabolic modelling and opti-
mization. Application to a simple enzymopathy. Bioinform Definition
23(17):2281–2289
Vera J, González-Alcón C, Marı́n-Sanguino A, Torres N (2010)
Optimization of biochemical systems through mathematical BioCreative is a community challenge to evaluate
programming: methods and applications. Comput Operat applied systems in biomedical text-mining. In the
Res 37(8):1427–1438 context of the third installment of this challenge,
Voit EO (1992) Optimization in integrated biochemical systems. BioCreative II.5, the feasibility of using automated text-
Biotechnol Bioeng 40:572–582
mining systems for protein–protein interaction (PPI)
database curation was evaluated. In parallel, FEBS Let-
ters asked manuscript authors to annotate their manu-
scripts with protein interaction information. The author
Biochemical Systems Theory (BST) information then was further utilized by MINT PPI cura-
tors to compare the impact of basing their work on the
Eberhard O. Voit author data against their performance when starting from
The Wallace H. Coulter Department of Biomedical scratch. The BioCreative organizers evaluated the per-
Engineering, Georgia Institute of Technology and formance of the curators, authors, and automated systems
Emory University, Atlanta, GA, USA individually. Then, curator annotations based on author
data, as well as combined annotations from all text
mining systems, and combining author and automated
Definition systems’ data was compared to those individual results.
Finally, the annotation overlap between curators,
BST is a fully dynamic modeling framework, based on authors, and the best performing automated system
ordinary differential equations, in which all processes was measured to establish to what extent each of the
are represented with products of power-law functions. three sources would be complementary to the others.
This representation leads directly to specific rules
and guidelines for the design, diagnostics, analysis, and
application of models in various fields of biology. Introduction
BST permits several variants, among which Generalized
Mass Action (GMA) systems (▶ Generalized Mass The Structured Digital Abstract (SDA) initiative (Ceol
Action System) and S-systems (▶ S-System) are most et al. 2008) is an ongoing effort by FEBS Journal and
important. Models within BST are called canonical FEBS Letters to add annotations describing protein–
(▶ Canonical Model) in reference to their strict structure protein interactions (PPIs) that are reported with exper-
and strong modeling guidelines. imental verification to publications. BioCreative
(Blaschke et al. 2003; Krallinger et al. 2008) is
a community evaluation of text-mining systems
Cross-References where the BioCreative organizers ask participants to
perform biologically relevant tasks and then evaluate
▶ Dynamic Metabolic Flux Analysis the submitted results on a blind test set. In the context
B 102 BioCreative II.5 and the FEBS Letters Experiment on Structured Digital Abstracts

of BioCreative II.5, the third installment of this chal- that could be used as common basis for the evaluation,
lenge, the organizers asked participants to automati- a so-called gold standard. To this end, article
cally generate parts of the SDAs from FEBS Letters annotations that had initially been made by authors
publications (Leitner 2010a). In total, 15 research were refined by three independent MINT database
groups worldwide followed this call and participated bio-curators, who continued refining the results until
in BioCreative II.5. The aim was to provide they reached a consensus annotation for those articles
a quantitative insight on how well automated systems that matches the MIMIx standard for annotating
can reproduce human annotations. Additionally, anno- PPIs (Orchard et al. 2007). In a second round
tations were provided by the authors of the papers of pruning – after the BioCreative participants had
themselves as well as by expert bio-curators submitted their results – the BioCreative organizers
of the MINT PPI database (Ceol et al. 2010; reevaluated the annotations with the results of the
Chatr-aryamontri et al. 2007) in the context of the automated systems and identified potentially errone-
associated FEBS Letters experiment. All these results ous annotations by examining results where the auto-
were evaluated then by the BioCreative organizers mated systems unanimously reported annotations that
(Leitner 2010b). Together with the author and curator were inconsistent with the curator data; Requesting
annotations it was possible to compare the results of the bio-curators counsel on the found inconsistencies
the automated systems to the manual annotations. and making corrections where necessary, this third
The SDA initiative and BioCreative II.5 are the first step produced the final “gold standard” annotations
quantitative approach to directly compare the impact for the evaluation.
of generating annotations for scientific manuscripts The applied performance measures are intended to
by various plausible approaches, namely, database reflect (a) the overall quality of the annotations made
curators (Howe et al. 2008), manuscript authors, and by any of the sources (i.e., systems, authors, and
automated text-mining systems. At current funding curators) and (b) the performance of the automated
levels, PPI databases can only keep up with a fraction systems when ranking their results by a probability
of the data that is published by the biosciences. score scheme, so-called “confidence values”
This leaves the larger part of generated scientific data (i.e., probability values in the (0,1] range) that partic-
“dormant” in written text: This data is neither trivial ipants were asked to report together with their annota-
to retrieve or locate by researchers looking for a partic- tions. This latter evaluation method was used as
ular information, nor is this format accessible for large- the primary evaluation target, as it measures the
scale data manipulation as would be required by Systems systems’ ability to produce result lists that can be
Biology or other data mining approaches in Computa- used by human annotators as initial dataset to
tional Biology. Therefore, BioCreative II.5 intended to base their own annotation effort on. Therefore, two
investigate the feasibility of pursuing new avenues to statistical measures were used:
increase the coverage of data deposited in structured (a) The harmonic F-measure (aka. F1-score) for the
repositories as a result of scientific publications, using overall quality of the result sets (both human and
PPIs as the exemplary data in this setting. For example, system annotations)
in the realm of PPIs, the united consortium of all major (b) The area under the interpolated precision/recall
PPI databases, IMEx, only manages to cover an esti- curve (aka. AUC iP/R) for the quality of automated
mated 10–20% of the existing interactions, limiting the results given the ranking
reconstruction of interactive graphs for protein interac- The competition itself was carried out online using
tion maps to a fraction of the existing data and making it an extended version of the BioCreative Meta-Server
difficult for biologists to determine all known, proven (BCMS), a special framework that interacts via web
interaction partners with their proteins of interest. services with the automated systems provided by the
challenge’ participants (Leitner et al. 2008) (also see
the entry ▶ BioCreative Meta-Server and Text-Mining
Methods Interoperability Standard in this encyclopedia).
Contrary to the initial version, the extended BCMS
The most crucial part of this effort was to produce allows participants to take direct control of and
a high-quality dataset with near-perfect annotations monitor the communication between their
BioCreative II.5 and the FEBS Letters Experiment on Structured Digital Abstracts 103 B
systems and the BCMS. This tool set the whole
challenge in an environment that simulated the applied
use case in which these annotations would be Full Text Selection
generated online and provided to a human annotator SDA
Structured

(see Fig. 1). Furthermore, this also made it possible Digital


Abstract
B
to time the systems and factor that variable into the
evaluation’s conclusions.
Query Users Results
Finally, in an effort to bring together all sides of
this effort, namely, the publishers, the curators,
and the scientists, a workshop was held in Madrid
(Spain), in 2009, to present the results and discuss
Request BCMS Response
their implications for both the efforts in text-mining
research, the feasibility of adding human annotation Distribution Collection

efforts with automated systems, and the outlook of


possible avenues to add annotations to publications
in general. IE / TM Annotation Annotation
Pipeline Servers Data

Results
BioCreative II.5 and the FEBS Letters Experiment on Struc-
Detailed results are found in the relevant tured Digital Abstracts, Fig. 1 The simulated online setting
of BioCreative II.5 reproduced by the BioCreative Meta-Server.
publications reported across three journals – Nature
The gray box “Users” represents the hypothetical human anno-
Biotechnolgy, FEBS Journal, and IEEE’s Transactions tator creating a SDA, while the “BCMS” and “Annotation
on Computational Biology and Bioinformatics Servers” form the technical framework created for the challenge
(TCBB); all (Leitner 2010a, b, c). They describe the
outcome and conclusions in meticulous detail and
also spurred research of independent organizations –
the challenge’s participants – resulting in nine The main comparison of the results of both the
additional research articles that formed part of the FEBS experiment and the BioCreative II.5 challenge
TCBB special issue that contains the main BC II.5 dealing with each source of annotations (authors,
article. curators, and text-mining systems) has been published
The texts this effort is based on, 122 PPI publica- in Nature Biotechnology; this is also the lead article of
tions, together with more than 1,000 negative example this joint effort. The comparison focused on the correct
articles (i.e., articles that explicitly do not contain PPI annotation of protein [database] identifiers and of the
descriptions with experimental evidence) from the [binary] interaction pairs found in the SDAs, which
same period and journal (FEBS Journal 2008 and were the main two tasks for the automated systems
2009), were provided by the FEBS Journal. With the in BioCreative II.5. The main conclusions from this
friendly permission of Elsevier, they are now accessi- work are:
ble as a free and open resource for further (a) At least the two FEBS magazines (FEBS Journal
research projects in text mining (Leitner 2010c). This and Letters), Cell (their graphical abstracts), and
resource is available through the BioCreative the ScienceDirect annotations (provided by NeXT
homepage (www.biocreative.org), as a so-called Bio) – but potentially other publishers, too – are
corpus. This alone presents an important achievement, very interested in adding annotations to their
as text resources with high-quality annotations are – manuscripts.
mostly due to copyright and other legal issues – scarce (b) Although authors and systems might perform
and very hard to come by; Furthermore, this resource is reasonably well, none of the two produced results
in XML format, which is simpler to read for that are sufficient for database standards.
machines as opposed to the usual PDF format scientific (c) The time requirement of systems is significantly
manuscripts are made available. lower than that of manual annotations
B 104 BioCreative II.5 and the FEBS Letters Experiment on Structured Digital Abstracts

BioCreative II.5 and the FEBS Letters Experiment on Structured Digital Abstracts, Table 1 Individual evaluation results of
the three sources and the combination of authors and curators (left) and results of individual text-mining systems, including their
performance when taking their ranking into account (AUC iP/R, right)
Precision Recall Precision Recall AUC
Task Class (%) (%) F-Score Task Class (%) (%) F-Score iP/R
Protein Systems 74 55 0.59 Protein Best F-score 74 55 0.59 0.53
identifiers Authors 84 66 0.71 identifiers (T42, S1)
Curators 96 89 0.91 Best AUC iP/R 14 73 0.21 0.57
Authors + 96 94 0.95 (T10, R5)
curators
Interaction Systems 53 34 0.37 Interaction Best F-score and 53 34 0.37 0.31
pairs Authors 72 57 0.59 pairs AUC iP/R
Curators 93 83 0.86 Incl. MINT data 64 61 0.58 0.58
Authors + 93 89 0.90 (T18, R1)
curators

(averaging at 2 min/articles, vs. almost 1 h for the author annotations would produce better results than
human annotations) and the quality of systems the two approaches by themselves did.
seems to be sufficient to at least aid human
annotators.
(d) Systems and authors, although producing lower Perspectives
quality results than curators, were able to identify
annotations the curators missed, and in the evalu- With the results of the BioCreative II.5 challenge and
ation it became apparent that when one source the insight created by evaluating it in comparison
based its annotations on another, the quality of with the FEBS Letters SDA experiment, it is clear
the resulting annotations was higher. For example, that neither curators (databases), nor authors, or text-
curators basing their annotations on author results mining systems can be tasked with creating the
performed better than if the curators started anno- needed annotations on their own. Our proposal is to
tating from scratch. A similar correlation is shown combine the approaches (as combining sources at
in annotations in the publications between systems least increased performance in terms of quality and
and authors. it is likely that adding automated systems will
Table 1 shows the evaluation results of the decrease annotation times) and distribute the effort
three sources as well as the performance of curators between all participants. A theoretical framework for
when basing their annotations on author data (left), combining human and machine annotations is
and the detailed results of the best submissions of presented in Leitner and Valencia (2008).
text-mining systems from the BioCreative II.5 partic- A potential outline of this architecture is shown in
ipants (right). Fig. 2 and explained in deep detail in the supplemen-
Last but not least, a detailed analysis of the tary material of the Nature Biotechnology publica-
core text-mining part, that is, BioCreative II.5 tion. Therefore, the further development of the
itself, that includes a comparison of the systems and BioCreative Meta-Server from a demonstration
applying various statistical approaches to the 134 server to a production-ready public service
result sets produced by the 15 participant teams is representing the main resource provided by the text-
part of the third publication in Transactions on Com- mining community is a major objective, and this also
putational Biology and Bioinformatics. This publica- encompasses the generation of an international stan-
tion documents the evaluation metrics (F1-Score, dard for annotating scientific texts, which will
AUC iP/R), the online setting of the challenge with increase the interoperability of text-mining systems
the BioCreative Meta-Server, and shows that combin- and make it easier to interface with tools humans use
ing the results of the text-mining systems with the to annotate the articles.
BioCreative II.5 and the FEBS Letters Experiment on Structured Digital Abstracts 105 B
Paper Biological
SDAs part of repository repository
submission and
review process

Authors
B Curators
B
Web- Publishers Databases
Review Curation
1 UI C 2
process Web- pipeline
Web-
services services
peer-reviewed
SDAs

A D
annotation annotation
assistance during assistance
authoring / reading during curation

Annotation Web- Web- Servers


BioNLP services UI
Reserachers
Text-mining 3 BCMS
platform

BioCreative II.5 and the FEBS Letters Experiment on Struc- extracting annotations from scientific manuscripts (see the
tured Digital Abstracts, Fig. 2 The target architecture to chapter ▶ BioCreative Meta-Server and Text-Mining Interoper-
integrate authors and text-mining systems in the process of ability Standard in this book, too)

Cross-References Howe D, Costanzo M, Fey P, Gojobori T, Hannick L, Hide W,


Hill DP, Kania R, Schaeffer M, St Pierre S, Twigger S,
White O, Rhee SY (2008) Big data: the future of biocuration.
▶ BioCreative Meta-Server and Text-Mining Nature 455:47–50
Interoperability Standard Krallinger M, Morgan A, Smith L, Leitner F, Tanabe L,
Wilbur J, Hirschman L, Valencia A (2008) Evaluation
of text-mining systems for biology: overview of the
References Second BioCreative community challenge. Genome Biol
9(suppl 2):S1
Blaschke C, Hirschman L, Yeh A, Valencia A (2003) Critical Leitner F, Valencia A (2008) A text-mining perspective on the
assessment of information extraction systems in biology. requirements for electronically annotated abstracts. FEBS
Comp Funct Genomics 4:674–677 Lett 582(8):1178–1181
Ceol A, Chatr-aryamontri A, Licata L, Cesareni G (2008) Leitner F et al (2008) Introducing meta-services for
Linking entries in protein interaction database to structured biomedical information extraction. Genome Biol
text: the FEBS Letters experiment. FEBS Lett 582:1171– 9(suppl 2):S6
1177 Leitner F, Mardis SA, Krallinger M, Cesareni G, Hirschman
Ceol A, Chatr-aryamontri A, Licata L, Peluso D, Briganti L, L, Valencia A (2010a) An overview of BioCreative
Perfetto L, Castagnoli L, Cesareni G (2010) MINT, the II.5. IEEE/ACM Trans Comput Biol Bioinform
Molecular INTeraction database: 2009 update. Nucleic 7(3):385–399
Acids Res 38:D532–D539 Leitner F, Chatr-aryamontri A, Mardis SA, Ceol A, Krallinger M,
Chatr-aryamontri A, Ceol A, Palazzi LM, Nardelli G, Licata L, Hirschman L, Cesareni G, Valencia A
Schneider MV, Castagnoli L, Cesareni G (2007) MINT: the (2010b) The FEBS Letters/BioCreative II.5 experiment:
Molecular INTeraction database. Nucleic Acids Res 35: making biological information accessible. Nat Biotechnol
D572–D574 28(9):897–899
B 106 BioCreative Meta-Server and Text-Mining Interoperability Standard

Leitner F, Krallinger M, Cesareni G, Valencia A (2010c) The developed (Krallinger et al. 2008a). Some of them
FEBS Letters SDA corpus: a collection of protein interaction have matured to industrial-standard quality sites
articles with high quality annotations for the BioCreative II.5
online challenge and the text mining community. FEBS Lett that are frequented by many users, such as iHOP
584(19):4129–4130 (Hoffmann and Valencia 2004) or Reflect (Pafilis
Orchard S et al (2007) The minimum information required et al. 2009). Others have developed powerful web
for reporting a molecular interaction experiment (MIMIx). service APIs (application programming interfaces),
Nat Biotechnol 25:894–898
as, for example, WhatIzIt (Rebholz-Schuhmann
et al. 2008), or the NaCTeM Service Systems
(Kano et al. 2010). Many more are available as indi-
BioCreative Meta-Server and vidual applications that can be downloaded and
Text-Mining Interoperability Standard installed. However, all this wealth of available soft-
ware, tools, services, and sites does not come without
Florian Leitner, Martin Krallinger and cost. Due to no real community agreement on annota-
Valencia Alfonso tion standards, document formats, and the semantics of
Structural Biology and BioComputing Programme, the generated data, it is painful to interface between
Spanish National Cancer Research Centre (CNIO), tools, especially if they are not from the same organi-
Madrid, Spain zation. Also, the power of even the best systems can be
still outstripped by a consensus result created from
many different systems that all provide the same kind
Definition of annotations (e.g., Smith et al. 2008). In addition, to
use one tool, it often is necessary to rely on results of
Over the past decade, text-mining systems have another. For example, before identifying gene names
matured to serve both bioinformaticians and biologists in an article, such a gene name “tagger” can signifi-
in a variety of ways. However, this also has the nega- cantly gain performance if its annotations are based on
tive impact that a plethora of protocols, sites, and tools the results of a pipeline that first tags the “part-of-
exist that are not compatible between each other. With speech” (PoS) of the words; PoS taggers annotate the
the BioCreative Meta-Server (BCMS) the first proto- grammatical sense of words, such as verb, noun, adjec-
type system to counterbalance this development was tive, etc.. However, while it might be easy to find high-
published. Because of uniting several text-mining sys- quality tools for one specific task, another area often
tems into one service using one protocol, both access can be lacking in terms of the number of systems that
and use of those resources could be significantly are available. Many tools only come with very basic
simplified. In addition, the BCMS provides command-line interfaces or are system libraries that
a straightforward means to combine results from mul- only computer experts know how to make use of. This
tiple text-mining systems, contributing an added ben- creates a very high entry barrier for many potential
efit that can lead to an even higher quality result than users. Even worse, in a few cases research groups
the individual annotations of the systems available decide to not make their tools publicly available; Yet,
through the BCMS. To push the capabilities of text- they might agree to make their pipeline available as
mining systems even further, a project was recently a web service, so that users can gain access to them but
launched to create a text-mining community consensus nobody can gain access to the source code the system
on an interoperability standard for the annotation of relies on.
texts. This project should facilitate the use of annota- All these issues led to a community understanding
tion systems and annotated texts for both text miners that the current status quo is ripe for improvement. In
and consumers of text-mining services. the area of sequence annotation and structure predic-
tion, these bioinformaticians have years ago already
begun to build distributed systems and meta-services.
Characteristics The idea is fairly simple: For distributed systems,
instead of having many different protocols and for-
Over the past decade, many text-mining and informa- mats, a community agrees to certain standards and
tion extraction systems for biomedical texts have been then builds their tools to those specifications
BioCreative Meta-Server and Text-Mining Interoperability Standard 107 B
(nowadays, this design principle is termed “service- scientific community. In essence, it is similar to
oriented architecture”). The earliest example of such as a structure prediction meta-server, but instead of
system is (Bio)DAS, a distributed sequence annotation adapting the interface to every possible text-mining
system (Dowell et al. 2001). On the other hand, as system, it defined a standard exchange format that
combining results leads to often superior consensus any system plugged into the BCMS had to follow. B
predictions, structural biology researchers at the same Thereby, it inherently has advantages similar to the
time begun to create so-called meta-servers (Bujnicki ones of the BioDAS system – namely, a uniform pro-
et al. 2001). In this case, the user contacts a central tocol and data model. The BCMS prototype requests
“service broker” or “meta-server” with the protein and manages annotations of four basic types and works
sequence for which he wishes to receive a structure on PubMed abstracts:
prediction. The meta-server, in turn, instead of calcu- 1. Classifying the abstract as to whether it describes
lating a protein structure on its own, forwards this protein–protein interactions
request to many different prediction servers via web 2. Identifying the mentions of genes and proteins in
services. Then the broker waits for the results to return the abstract
form those prediction servers, collecting each server’s 3. Listing of possible mappings to UniProt accessions
result. Once all results are in, the meta-server creates for the mentions
a superimposed consensus prediction from the individ- 4. Listing the most likely organisms the abstract is
ual the results, which usually tends to be a better esti- discussing
mate of the true protein structure than any single result. However, the current, publicly available version of
Then the user is informed that his result is ready and he the BCMS is a prototype. It only offers these annota-
can fetch the structure and accompanying information tions in a very limited scope: This particular service
directly from the meta-server. broker is limited to the approximately 22,000 PubMed
Both of these approaches would also solve plenty of abstracts used during the BioCreative II challenge.
the issues mentioned with current approaches in text- Therefore, only queries for that data set can be made
mining: The (Bio)DAS approach shows how to reduce and are distributed to the connected servers. In other
interoperability problems by defining a standard for the words, it is not possible to enter any new text to
data exchange. This makes it easy to chain the outputs annotate, request annotations other than the above
of one tool to another and to join or merge data from mentioned four base types, or even just query for any
several sources. The general web service approach other PubMed abstract.
allows organizations to make the fruits of their This prototype project, however, enabled the par-
labor available to the public without having to worry ticipants and developers of the servers to glean some
about loosing control over who has access to their very important insights into the endeavor of setting up
source code. By using the web, a user also does not a distributed and interoperable annotation system for
need to know anything about command-line interac- biomedical texts to and providing a public, automated
tion or programming and can run sequence annotation text-mining platform. These problems can be related to
tools and structure prediction services directly from his the three layers of interoperability (see Fig. 1): First,
browser in a graphical environment he is familiar with. there are low-layer syntactic issues that need appropri-
Last, meta-servers reap the fruits of consensus results, ate solutions, such as:
providing the user with possibly improved results. • Concurrency and thread management on all servers;
All this insight created the motivational basis to a fair queueing policy of requests on the broker/
develop the BioCreative Meta-Server (BCMS) meta-server side
(Leitner et al. 2008). This first prototype was directly • Fail-safe and nonrestrictive handling of the various
created after BioCreative II, a text-mining community possible representations of text in computers (so-
challenge where information extraction systems are called encoding schemes) on all participating
evaluated by an independent group of judges to mea- servers
sure the performance of those systems on some topic in • Compatibility issues with web service
the field of molecular biology (Krallinger et al. 2008b). implementations provided by different platforms
This version of the BCMS became the first attempt to and programming languages (“service
provide true “meta-services” for text-mining to the agnosticism”)
B 108 BioCreative Meta-Server and Text-Mining Interoperability Standard

Use-case analysis • The impact of copyright issues (e.g., only granting


Systemic IOP
Process definition access to a text to which the user has access, espe-
WHY? cially for publications, once the BCMS allows to
Text space annotate any text resource)
Semantic IOP Annotation types
WHAT? • Designing an interactive process with the users,
Service requirements
for example, by allowing the user to provide
initial annotations when sending a request,
Syntactic IOP Document formats
HOW? such as the organism(s) the text treats, which
Annotation formats
Communication protocols would significantly reduce the possible number of
API specifications protein IDs the systems need to concern themselves
when generating ID mappings (Leitner and
BioCreative Meta-Server and Text-Mining Interoperability Valencia 2008)
Standard, Fig. 1 The three layers of interoperability (IOP). From these insights into issues that have to be
The syntactic layer relates to technical problems such as speci- addressed for an “ideal” distributed text annotation
fications and implementation details. The semantic layer treats
issues related to data content and meaning, and interfaces. system, and by designing solutions for these problems,
Finally, the systemic layer determines the use-case and process the next stage of the BCMS is now under development
requirements and outlines the strategy for solving issues on the (Fig. 2). The goal will be to provide a framework that
two inner layers will allow users to request annotations for text in
many of the common document formats (e.g., plain-
These are just a few of the many technical details text, HTML, XML, Word, or PDF files), permitting
that need attention. Then there are concerns that are the use of any encoding schema to represent those
related to the content and meaning of the annotation texts, having a universal entity annotation schema
data, such as: that is not limited to a predefined set of types,
• What kind of attribute values are permitted; for avoiding a lock-in into one specific service protocol
example, that reporting a probability value annota- by providing the most common ones “out of the box”
tion of zero for an annotation is a violation or how to (i.e., JSON-RPC, XML-RPC, SOAP, and REST),
report the position of a mention inside a text: either providing the ability to query for and annotate any
by counting the number of characters or the number PubMed abstract, interactively allowing both human
of bytes up to that mention, which is additionally and machine annotations, etc. An important milestone
obfuscated by the variable multibyte character on this path is to establish a community agreement
nature of many encoding schemas, or via inline on a biomedical text-mining annotation interopera-
annotations such as XML tags that come with dis- bility standard (currently, termed “BAIS” and acces-
tinct and possibly larger set of disadvantages for sible via web at http://bais.bioinfo.cnio.es/) that
interoperability will provide the essential guidelines for any informa-
• Which databases to use for the mappings to and tion extraction tool, even for systems that do not
how (e.g., which databases can be used for a gene necessarily have to be coupled with the BCMS
annotation or a protein, and, e.g., for UniProt, platform.
whether to map to entry names or to the UniProt At the current state (October 2010), all the
accessions; or how to treat obsolete database development is by and large still in very early
identifiers; etc.) development stages. Hopefully, a fully functional
Are the most prominent issues to agree upon at the BCMS will replace the prototype version that is
semantic layer. Finally, there are high-level, syntactic currently found at http://bcms.bioinfo.cnio.es/,
problems that have to be considered, such as: running at the Spanish National Cancer Research
• Establishing a consensus annotation, runtime Center in Madrid in the near future. Also, the inter-
requirements of the servers (if they take too long, operability standard (BAIS) should have matured
users would get annoyed, but if servers tune the text by that time and currently resides at http://bais.
processing too much, annotation quality suffers) bioinfo.cnio.es/.
BioCreative Meta-Server and Text-Mining Interoperability Standard 109 B

1
JSON, CSV, Stand-off data,
Any online PC Some application
or inline XML inline XML, or
(annotations) CSV (anns.) B
HTML UI,
server mgmt, REST RPC or REST
system admin (PMIDs, doc) (PMIDs, docs)

REST REST
Website XML & JSON Request manager XML 2
PubMed
Query
retrieval of unknown or
system + entity outdated MEDLINE records
data names docs, jobs
MEDLINE user data & docs anns &
records & MEDLINE inline XML Persistent store
system data records
system data (users, servers),
Fair
MEDLINE records, entity names
queue
PL / Temporaty store 3
Persistent store Temporary store
SQL documents & annotations, logs
docs &
server data & anns validation of identifiers and
entity names docs jobs retrieval of unknown entity names
server data & anns
known entities
inline XML
XML
XML worker RPC Service broker REST 4

BioCreative meta-server RPC or REST

Stand-off data, Stand-off data,


inline XML, or CSV inline XML, or CSV

Text-mining
servers in
the WWW 5

Gene mention server Protein normalization server PPI relation server etc.

HTTP request Data posting


Arrow Legend
HTTP response Data retrieval

BioCreative Meta-Server and Text-Mining Interoperability manages communication with the text-mining servers, and val-
Standard, Fig. 2 The current design proposal for the final idates database identifiers, adding supplementary information
BCMS. Essentially, there are five units, from top to bottom: (1) (e.g., the protein or gene names) to the annotations; (5) And
Clients accessing the platform, either from a browser or pro- the text-mining servers, providing the actual annotations of any
grammatically via a web service; (2) The front end that provides type agreed upon by the community (i.e., BAIS) via web services
the web site and services, accepts documents, queues jobs, again. Blue arrows symbolize communication channels over the
returns annotations, and communicates with PubMed; (3) The Internet, black dashed arrows outline internal communication
storage layer that is responsible for the data management and pathways for the BCMS components
persistency; (4) The back end that processes inline annotations,
B 110 Bioinformatics

Cross-References Definition

▶ BioCreative II.5 and the FEBS Letters Experiment Biological activity is “the capacity of a specific molec-
on Structured Digital Abstracts ular entity to achieve a defined biological effect” on
a target (Jackson et al. 2007). It is measured in terms of
potency or the concentration of the molecular entity
References
needed to produce the effect (Pelikan 2004).
Bujnicki JM, Elofsson A, Fischer D, Rychlewski L (2001) Struc- A biological activity is determined by means of
ture prediction meta server. Bioinformatics 17(8):750–751 a biological assay.
Dowell RD, Jokerst RM, Day A, Eddy SR, Stein L (2001) The
distributed annotation system. BMC Bioinform 2:7
Hoffmann R, Valencia A (2004) A gene network for navigating
the literature. Nat Genet 36(7):664 Cross-References
Kano Y, Dobson P, Nakanishi M, Tsujii J, Ananiadou S (2010)
Text mining meets workflow: linking U-compare with ▶ Biological Assay
Taverna. Bioinformatics 26(19):2486–2487
▶ Drug Target
Krallinger M, Valencia A, Hirschman L (2008a) Linking genes to
literature: text mining, information extraction, and retrieval ▶ Natural Product Resources
applications for biology. Genome Biol 9(Suppl 2):S8
Krallinger M, Morgan A, Smith L, Leitner F, Tanabe L,
Wilbur W, Hirschman L, Valencia A (2008b) Evaluation
of text-mining systems for biology: overview of the
References
Second BioCreative community challenge. Genome Biol
9(Suppl 2):S1 Jackson MJ, Esnouf MP, Winzor D, Duewer D (2007) Defining
Leitner F, Valencia A (2008) A text-mining perspective on the and measuring biological activity: applying the principles of
requirements for electronically annotated abstracts. FEBS metrology. Accredit Qual Assur 12(6):283–294
Lett 582(8):1178–1181 Pelikan EW (2004) Glossary of terms and symbols used in
Leitner F et al (2008) Introducing meta-services for biomedical pharmacology. University School of Medicine, Department
information extraction. Genome Biol 9(Suppl 2):S6 of Pharmacology and Experimental Therapeutics, Boston.
Pafilis E, O’Donoghue SI, Jensen LJ (2009) Reflect: augmented http://www.bumc.bu.edu/busm-pm/resources/glossary
browsing for the life scientist. Nat Biotechnol 27(6):508–510
Rebholz-Schuhmann D, Arregui M, Gaudan S, Kirsch H, Jimeno A
(2008) Text processing through Web services: calling Whatizit.
Bioinformatics (Oxford, England) 24(2):296–298
Smith L et al (2008) Overview of BioCreative II gene mention Biological Applications of Network
recognition. Genome Biol 9(Suppl 2):S2
Modules

Minlu Zhang1 and Long Jason Lu2


1
Department of Computer Science, University of
Bioinformatics
Cincinnati, Cincinnati, OH, USA
2
Division of Biomedical Informatics, Cincinnati
▶ Disease System, Malaria
Children’s Hospital Research Foundation, Cincinnati,
OH, USA

Biological Activity
Definition
Riza Theresa Batista-Navarro
National Centre for Text Mining, Manchester A protein-protein interaction (PPI) network of a certain
Interdisciplinary Biocentre, Manchester, UK organism is a network that contains proteins and
physical interactions between protein pairs. In the
network, each node/vertex is a protein and each edge/
Synonyms link is a physical interaction.
An interactome refers to a complete PPI network
Bioactivity; Pharmacological activity that contains all protein physical interactions in
Biological Applications of Network Modules 111 B
a certain organism. An interactome is usually approx- Protein Function Prediction Based on Modular
imated by combining all known PPI data from large- Analysis
and small-scale experiments, expert curations, and One application of module identification is to predict
possibly computational predictions. functions for proteins in modules with unknown func-
A general greedy search is any algorithm or method tional annotations. Such protein function predictions B
that makes a locally optimal decision in order to retrieve are based on identified modules as well as known
or approximate a globally optimal solution. A greedy functional annotations, such as ▶ gene ontology or
search method often contains a candidate set, a selection Saccharomyces Genome Database annotations
criterion, a feasibility criterion, an objective function, (Ashburner et al. 2000; Cherry et al. 1998).
and a solution criterion. The candidate set contains all A straightforward method to predict protein func-
candidates that can be added to the solution. The selec- tions is to assign a function shared by the majority
tion criterion or function determines the best candidate of proteins/genes in a module to other unannotated
to add next into the solution. The feasibility criterion protein/gene in the same module, also considering
determines whether a candidate can be added to con- each of the assigned functions as a function of the
tribute for the solution. The objective function assigns whole module. In this way, proteins with unknown
and calculates a heuristic score for a solution or partial functions are assigned these shared functions within
solution. The solution criterion indicates whether the the same module. Alternatively, a hypergeometric test
final solution is achieved. (or one-tailed ▶ Fisher’s test) can be performed for
each function to calculate a p value indicating whether
or not a function is overrepresented by genes/proteins
Characteristics in a module:

Network modules identified by network clustering are j nj


widely suspected to correspond to ▶ functional mod- X
k1
i mi
p¼1 ;
ules and complexes. Therefore, module identification i¼0 n
is directly applied for functional module discovery. m
The evaluation of identified modules mainly includes
the comparison with known protein complexes and where n is the number of nodes in the whole network,
pathways, as well as ▶ functional enrichment analysis j is the number of nodes in the network annotated with
in modules. Protein complexes data of yeast can be the function, and m is the module size. The p value
retrieved from Munich Information Center for Protein indicates the probability of observing at least k proteins
Sequences (MIPS) database (Mewes et al. 1999), and from a module or cluster of size m by chance to have
Kyoto Encyclopedia of Genes and Genomes (▶ KEGG the tested function, and small p values imply possible
PATHWAY) is the most comprehensive database for enrichment of certain functions in modules. Given
pathway-related information (Kanehisa and Goto a threshold for the p value, the significantly overrepre-
2000). On one hand, identified modules are likely sented functions are then predicted for all proteins/
biologically meaningful if the proteins/genes in the genes in the module and assigned to proteins with
module overlap with certain protein complexes or unknown functions. For example, suppose a module
pathways. On the other hand, network modules with 20 nodes is identified from a curated human
are likely functional modules if many of identified ▶ interactome network from Human Protein Refer-
modules have overrepresented functions by their ence Database (HPRD; the 9th release) (Keshava
protein members. Prasad et al. 2009), where each node is a human protein
Further applications of identified network modules, and each edge indicates a physical interaction between
especially in ▶ protein-protein interaction (PPI) net- two proteins, and 18 out of the 20 proteins have
works, mainly include protein function prediction by ▶ gene ontology annotations. If 16 out of the 18
assigning overrepresented functions in a module to annotated proteins all have a specific function, for
protein members with unknown functions, as well as example GO:0006950 (response to stress), while
▶ network-based biomarker discovery for disease status among other 9,462 human proteins in the interactome,
or outcome (Chuang et al. 2007; Zhang et al. 2010). 1,924 proteins are annotated with this function, then
B 112 Biological Assay

this function is significantly overrepresented ▶ Network-based Biomarkers


by proteins in this module, with p value ¼ 1.7e-8 by ▶ Ontology
a ▶ Fisher’s exact test. Considering that the module ▶ Ontology Analysis of Biological Networks
is very likely to be involved in a function related to
response to stress, the other four proteins including the
two proteins with unknown functions can be predicted References
or assigned to have this function as well.
Ashburner M, Ball CA, Blake JA, Botstein D, Butler H et al
(2000) Gene ontology: tool for the unification of biology.
Biomarker Discovery Based on Modular Analysis
The gene ontology consortium. Nat Genet 25(1):25–29
Another application of modular analysis is the discov- Cherry JM, Adler C, Ball C, Chervitz SA, Dwight SS et al
ery of ▶ network-based modular biomarkers for (1998) SGD: Saccharomyces genome database. Nucleic
disease status or outcome. Such discovery is Acids Res 26(1):73–79
Chuang HY, Lee E, Liu YT, Lee D, Ideker T (2007) Network-
performed by combining network modular analysis
based classification of breast cancer metastasis. Mol Syst
with disease-related genomic data and often involves Biol 3:140
several steps of scoring and search. Specifically, over- Kanehisa M, Goto S (2000) KEGG: kyoto encyclopedia of genes
laying genes from gene expression data of two differ- and genomes. Nucleic Acids Res 28(1):27–30
Keshava Prasad TS, Goel R, Kandasamy K, Keerthikumar S,
ent phenotypes (e.g., disease versus normal control, or
Kumar S et al (2009) Human protein reference database–
cancer metastatic samples versus primary tumor sam- 2009 update. Nucleic Acids Res 37(Database issue):
ples) onto molecular networks, for example PPI net- D767–D772
works, will result in a set of connected components, or Mewes HW, Heumann K, Kaps A, Mayer K, Pfeiffer F et al
(1999) MIPS: a database for genomes and protein sequences.
subnetworks with disease-related genes. For each
Nucleic Acids Res 27(1):44–48
sample from one of the two phenotypes, an activity Zhang M, Deng J, Fang C, Zhang X, Lu LJ (2010) Biomolecular
score can be calculated for each subnetwork by aver- network analysis and applications. In: Alterovitz G, Ramoni
aging the normalized gene expression levels of all M (eds) Knowledge-based bioinformatics: from analysis to
interpretation. Wiley, Chichester, pp 253–288
genes in the subnetwork. Correlation between the
subnetwork gene activity score and the phenotype
can be assessed by the ▶ mutual information measure.
Such ▶ mutual information between genes in
a subnetwork and the resulting phenotype can be opti-
mized based on a greedy search method by Biological Assay
seeding with one gene and growing the subnetwork
by adding connected genes based on the PPI network. Riza Theresa Batista-Navarro
Finally, a set of subnetworks that can best National Centre for Text Mining, Manchester
classify the two phenotype outcomes are output as Interdisciplinary Biocentre, Manchester, UK
▶ network-based modular biomarkers for disease sta-
tus or outcome.
Synonyms

Cross-References Assay; Bioassay

▶ Biomarkers
▶ Fisher’s Test Definition
▶ Functional Enrichment Analysis
▶ Functional Modules and Complexes A biological assay is an experiment that determines
▶ Functional/Signature Network Module for Target a substance’s biological activity based on its effect on
Pathway/Gene Discovery a specific drug target, relative to that of a standard
▶ KEGG Pathway Database preparation (IUPAC 1997). It is also the process by
▶ Modules in Networks, Algorithms and Methods which the potency of an agent is measured in terms of
▶ Mutual Information the reactions of a specific drug target (Pelikan 2004).
Biological Disease Mechanism Networks 113 B
Cross-References edges (directed/undirected connections) and classified
into different subtypes.
▶ Biological Activity In other words, the complex biological processes
▶ Drug Target are presented through the precise interaction and reg-
▶ Natural Product Resources ulation of 1,000 of molecules. These biological B
networks are significantly different from any random
References networks and often exhibit ubiquitous properties in
terms of their structure and organization.
IUPAC (1997) Compendium of chemical terminology. In: Biological networks are developed and analyzed to
McNaught AC, Wilkinson A (eds) The “gold book”,
understand the pathophysiology of different human
2nd edn. Blackwell Scientific, Oxford
Pelikan EW (2004) Glossary of terms and symbols used in diseases. Such networks also help in identifying
pharmacology. University School of Medicine, Department novel biomarkers, candidate genes, pathway cross
of Pharmacology and Experimental Therapeutics. Boston. talk pattern, etc., leading to the progression/molecular
http://www.bumc.bu.edu/busm-pm/resources/glossary
characterization of complex diseases, i.e., type 2
Diabetes and Cancers.

Biological Clock
Characteristics
▶ Circadian Rhythm
Network Classification/Network Types
The high-throughput data-collection techniques
like microarrays, protein chips, or yeast two-hybrid
Biological Disease Mechanism Networks screens determine how and when these molecules
interact with each other. Various types of interaction
Shipra Agrawal1 and M. R. Satyanarayana Rao2 networks, including protein–protein interaction, and
1
BioCOS Life Sciences Pvt. Limited, Institute of metabolic, signaling, and transcription-regulatory net-
Bioinformatics and Applied Biotechnology, works emerge by integrating these data and interac-
Bangalore, Karnataka, India tions. Figure 1 describes interrelation across different
2
Chromatin Biology Laboratory, Molecular Biology types of biological networks, which are mostly
and Genetics Unit, Jawaharlal Nehru Center for constructed from high-throughput molecular data.
Advanced Scientific Research, Bangalore, Karnataka, The basic networks are required to study any biological
India phenomena or disease includes protein network, sig-
naling and metabolic network, co-expression and tran-
scriptional regulatory network, and protein-DNA
Synonyms interaction network (Fig. 1). All these networks are
analyzed to identify biologically relevant modular
Diseasome; Gene network; Intercatome; Protein net- structure of the networks (Please see the box for Mod-
work; Scale-free network; Transcriptional regulatory ular Network).
network The classifications of such biological networks are
mostly based on types of input data, information con-
nectivity, architecture topology, and overall functional
Definition interpretations.
Based on aforementioned criteria, some of the
The biological networks are constructed from any important networks are defined as follows:
type of molecular/biochemical and biological data 1. Protein–Protein Interaction Network
to interpret the overall functional and regulatory • It is a network of interacting proteins. This is
schema of any biological system (cell/tissue/organ undirected network, in which proteins are
and organism). These networks are represented represented by nodes and the interaction
through wiring diagram of nodes (genes/proteins) and between the proteins are represented by edges.
B 114 Biological Disease Mechanism Networks

Gx
D1
Gy
D2 Ga Disease-Phenotype
network
Gb
D3
Gc Phenomic data
Gd

Level E1

3 3A+B C
A 1
E2
of 1
1 1 Metabolic network
B B+C D C
1
D 1
Complexity 1 1
Metabolic data
D B+C

E3

Protein -Protein interaction


network

Proteomic data

Coexpression network Transcriptional regulatory


network
TF1 TF2 TF3

G1 G2 G3 G4 Genomic & transcriptomic


data

Biological Disease Mechanism Networks, which are finally merged to construct complex disease networks
Fig. 1 Biological/molecular networks and relative complexity (Please refer Network Complexity Box for details) (Note: D1,
It shows sequential increase in complexity in terms of both data D2, D3 exemplify related diseases and Gx, Gy, Ga, etc. represent
and information. The layers represent different network types reported candidate genes for corresponding diseases)
and cross-correlation of different types of molecular networks,

2. Signaling and Metabolic Networks enzymes. The nodes are either metabolites or
• Signaling network is molecular bridge of events enzymes or reactions. It is a directed and
between the cell and the outside environment. It weighted network. It is constructed from the
is directed and is composed of proteins or literature information on reactions, enzymes,
enzymes or factors that trigger or suppress the genes, substrate–enzyme concentration, product
expression of a number of genes. The signaling concentration, etc. (Ma and Zeng 2003;
event involves various proteins localizing to Albert 2005).
different compartments in the cell and finally 3. Co-expression Network and Transcriptional Regu-
controlling gene expression in the cell nucleus latory Network
(Hughey et al. 2009). • Co-expression networks are constructed from
• The signaling networks are mostly integrated the co-expression measures of genes across var-
with metabolic networks, which are biochemical ious tissue samples. Here, the nodes correspond
reactions along of substrates, metabolites, and to genes and edges represent the connection
Biological Disease Mechanism Networks 115 B
strength which is obtained from the promoter (in coming connectivity). The network is
co-expression similarity. It is also called as cor- undirected (Deplancke et al. 2006).
relation or association networks (Horvath and
Dong 2008).
Network Complexity
• The co-expression data is modeled to generate B
A rational integration of molecular networks
transcriptional regulatory networks, in
describes substantial understanding of the com-
which one gene may regulate transcription of
plexity of a biological phenomena/molecular
another gene either directly or indirectly.
mechanism. For example, a researcher con-
Here, the nodes are genes that are connected
structs biologically important networks sepa-
through directed edges. Arrows in the network
rately from genomics and proteomics data from
topology diagram indicate that they trigger
a diseased system. Then, these networks are cor-
the target gene activation, while bars in the
related and integrated to develop a single and
network indicate a repressive effect on the
meaningful disease model or network. The bio-
target gene (Keurentjes et al. 2007). This net-
logical significance and complexity of
work presents both physical and functional
a molecular network increases by integrating
interactions composed of genes controlled by
various types of molecular data/networks as
transcription factors. The network is directed
one base and holistic network.
and the node types are genes and transcription
factors. The transcriptional regulatory networks
are having several signature motifs including
feed-forward loop motif, bifan, and auto-
regulatory loops (Dobrin et al. 2004). Descrip- Modular Network
tions of these motifs are given in the following
paragraphs. M2
• Feed-forward loop motif – It consists of a pattern M1
of three genes which is composed of two tran-
scription factors and a target gene. One of the
M3
transcription factors regulates the other while
both together regulate the target gene.
• Autoregulatory loops – It consists of a regulator
gene that in turn binds to its own gene promoter,
thereby bringing autoregulation. M4

• Bifan motifs – Bifan consists of two source


nodes, which directly cross regulates two target M = Modules
nodes. Such motifs are common in mammalian M5
cell signaling and in transcriptional networks.
4. Protein–DNA interaction (PDI) network – It is
constructed from interactions between proteins/ Modular networks are created due to interactions
transcription factors and gene regulatory elements across smaller subnetwork modules. The biolog-
such as promoters, cis- and trans-regulatory ele- ical networks are functionally organized into
ments. Such interactions lead to temporal gene modules, which are group of genes forming
expression during development, and other physio- hubs. Interactions between the modules repre-
logical status. Here, the nodes represent “interactor sent the cross talk between them. Each module’s
proteins” and promoters as “DNA targets” while the function is determined by the module organizer
edges represent the interactions between them. The genes or proteins. This modular fashion of net-
complexity of PDI is dependent on number of pro- works reduces the complexity into a small num-
moter targets per interactor/protein (outgoing con- ber of connected structures and function (Rives
nectivity) and the number of interactors per and Galitski 2003).
B 116 Biological Disease Mechanism Networks

Cancer Driver Genes & Combining 2 methods


Signaling Altered modules
pathways
Sequence
Mutations
From Microarray
experimentally Protein- DNA copy
validated genes protein number
interaction Gene signatures
changes

GBM-
specific
network gene-expression
GBM

WGCNA
Gene sets
Network
Topologic
Reconstr
al
analysis uction Survival
Gene associated
Coexpression
classifier Genes
module
Novel cell-cycle detection
regulators
Driver GBM Glioblastoma
Molecular mutations
Ladha et al., 2010
targets (DM)
Signaling pathway Cerami et al., 2010
Cross talks
de Tayrac et al., 2009
For DM
Torkamani & Schork, 2009
Horvath et al., 2006
Zhang et al., 2009

Biological Disease Mechanism Networks, Fig. 2 Graphical different colors correspond to specific research methods and
representation of different approaches, which have been used to results available in the literature
construct systems biology models in GBM. The arrows in

Disease Mechanism Network


linked to a disorder/disease from the known dis-
High-throughput molecular, biochemical, and genetic
ease–gene relationship. The nodes correspond to
data are used to construct disease specific networks in
candidate genes or disease names while the
human. Such networks hold high relevance in terms of
edges represent the relation between them. This
elucidating disease mechanism, identification of impor-
is a type of undirected network (Goh et al. 2007).
tant candidate genes and pathways, biomarkers etc.
The phenotypic disease network displays
There are some landmark developments in the area,
a relation across multiple diseases through
which signify understanding on disease mechanism
overlapping candidate genes.
and description of human diseasome through the cor-
Scientists working on specific complex dis-
relation of candidate genes and corresponding pheno-
eases have used various approaches to build dis-
typic properties. The following network describes
ease networks to understand the disease
a diseasome.
mechanism at system wide scale. Studies on
complex diseases like type 2 diabetes, prostate
Phenotypic Disease Network (PDN) cancer, lung cancer, colon cancer, glioma,
It is the network of interactions between diseases malaria, and tuberculosis led to the identification
and the candidate genes. The candidate genes are of candidate genes, biomarkers, and novel cross
Biological Disease Mechanism Networks 117 B
References
talk and protein interaction points. The network
biology approach has led to the discovery of Albert R (2005) Scale-free networks in cell biology. J Cell Sci
causative pathways in several of these complex 118(Pt 21):4947–4957
diseases like androgen receptor (AR) pathway Cerami E, Demir E, Schultz N, Taylor BS, Sander C (2010)
for metastatic prostate cancer (Vellaichamy Automated network analysis identifies core pathways in B
glioblastoma. PLoS One 5(2):e8918.
et al. 2010), role of TGFB pathway in inducing Deplancke B, Mukhopadhyay A, Ao W, Elewa AM, Grove CA,
oxidative stress and MAPK pathways to finally Martinez NJ, Sequerra R, Doucette-Stamm L, Reece-Hoyes
facilitate vascular complications in type 2 dia- JS, Hope IA, Tissenbaum HA, Mango SE, Walhout AJ
betics (Sengupta et al. 2009). Network-based (2006) A gene-centered C. elegans protein-DNA interaction
network. Cell 125(6):1193–1205
analysis has also facilitated discovery of markers Dobrin R, Beg QK, Barabási AL, Oltvai ZN (2004) Aggregation
in brain cancer, i.e., identification of CSK21 and of topological motifs in the Escherichia coli transcriptional
PP1A as progression markers in the pathogenesis regulatory network. BMC Bioinformatics 5:10
of glioblastoma (GBM) (Ladha et al. 2010). Goh KI, Cusick ME, Valle D, Childs B, Vidal M, Barabási AL
(2007) The human disease network. Proc Natl Acad Sci USA
104(21):8685–8690
Horvath S, Zhang B, Carlson M, Lu KV, Zhu S, Felciano RM,
Glioblastoma Laurance MF, Zhao W, Qi S, Chen Z, Lee Y, Scheck AC,
GBM is the most common primary brain tumor occur- Liau LM, Wu H, Geschwind DH, Febbo PG, Kornblum HI,
Cloughesy TF, Nelson SF, Mischel PS (2006) Analysis of
ring in adult population. They are highly malignant and
oncogenic signaling networks in glioblastoma identifies
in turn show several subtypes. System biology appro- ASPM as a molecular target. Proc Natl Acad Sci USA. 103
aches have identified potential tumor suppressor genes, (46):17402–17407
prognostic genes, gene signatures, tumor subtypes, Horvath S, Dong J (2008) Geometric interpretation of gene
coexpression network analysis. PLoS Comput Biol 4(8):
survival-associated genes, and cell-cycle regulators.
e1000117
The functionally relevant weighted gene co- Hughey JJ, Lee TK, Covert MW (2009) Computational model-
expression network analysis (WGCNA) method has ing of mammalian signaling networks. Wiley Interdiscip Rev
identified some key molecular targets for glioblastoma Syst Biol Med 2(2):194–209
Keurentjes JJ, Fu J, Terpstra IR, Garcia JM, van den
(Horvath et al. 2006). A similar approach has detected
Ackerveken G, Snoek LB, Peeters AJ, Vreugdenhil D,
rare cancer driver mutations, which could drive late Koornneef M, Jansen RC (2007) Regulatory network con-
tumorigenesis (Torkamani and Schork 2009). struction in Arabidopsis by using genome-wide gene expres-
An integrated approach based on DNA copy sion quantitative trait loci. Proc Natl Acad Sci USA
104(5):1708–1713
number changes and gene expression changes was
Ladha J, Donakonda S, Agrawal S, Thota B, Srividya MR,
used to identify targeted gene signatures, tumor sub- Sridevi S, Arivazhagan A, Thennarasu K, Balasubramaniam
groups, and potential tumor suppressor genes for glio- A, Chandramouli BA, Hegde AS, Kondaiah P,
blastoma (de Tayrac et al. 2009). Systems approach Somasundaram K, Santosh V, Rao MR (2010) Glioblas-
toma-specific protein interaction network identifies PP1A
based on combined gene sets and protein interaction
and CSK21 as connecting molecules between cell cycle–
network has been used to develop a prognostic gene associated genes. Cancer Res 70(16):6437–6447
classifier that can predict survival-associated genes Ma H, Zeng AP (2003) Reconstruction of metabolic networks
(Zhang et al. 2009). from genome data and analysis of their global structure for
various organisms. Bioinformatics 19(2):270–277
Another approach combining DNA copy numbers
Rives AW, Galitski T (2003) Modular organization of cellular
and mutations, associated with sequences, along with networks. Proc Natl Acad Sci USA 100(3):1128–1133
human interaction networks has helped in identifying Sengupta U, Ukil S, Dimitrova N, Agrawal S (2009) Expression-
the cancer driver genes and potentially altered modules based network biology identifies alteration in key regulatory
pathways of type 2 diabetes and associated risk/
(Cerami et al. 2010).
complications. PLoS One 4(12):e8100
Our own investigation from experimentally vali- Torkamani A, Schork NJ (2009) Identification of rare cancer
dated upregulated genes and corresponding protein– driver mutations by network reconstruction. Genome Res 19
protein interaction information has led to identification (9):1570–1578
Vellaichamy A, Dezso Z, Lellean JeBailey AM,
of novel and important connecting proteins, CSK21
Chinnaiyan ASreekumar, Nesvizhskii A, Omenn GS,
and PP1A, which are implicated in cell-cycle regula- Bugrim A (2010) “Topological significance” analysis of
tion (Ladha et al. 2010) (Fig. 2). gene expression and proteomic profiles from prostate
B 118 Biological Marker

cancer cells reveals key mechanisms of androgen response.


PLoS One 5(6):e10936 Biomarker
Zhang J, Liu B, Jiang X, Zhao H, Fan M, Fan Z, Lee JJ, Jiang T,
Jiang T, Song SW (2009) A systems biology-based gene
expression classifier of glioblastoma predicts survival with ▶ Biomarker Discovery, Knowledge Base
solid tumors. PLoS One 4(7):e6274

Biomarker Discovery, Knowledge Base


Biological Marker
Donna L. Mendrick and Weida Tong
▶ Biomarkers Division of Systems Biology, National Center for
Toxicological Research, US Food and Drug
Administration, Jefferson, AR, USA

Biological Module Synonyms

▶ Organelle and Functional Module Resources Adverse events; Biomarker; Knowledge base; Systems
biology

Biological Network Model Definition

Melissa L. Kemp The current challenges as well as opportunities in


The Wallace H. Coulter Department of Biomedical biomarker discovery lie in the integration of in-house
Engineering, Georgia Institute of Technology and generated data of multiple types together with diverse
Emory University, Atlanta, GA, USA data in the public domain to assess disease and toxicity
at the systems level (systems biology). The knowledge
base approach (Fig. 1) takes advantage of current
Definition advances in computer technology and bioinformatics
to integrate diverse data sets into a content-centric
Biological network model is an abstract, graphical, or resource for evaluation of molecular biomarkers in
mathematical representation of a biological system determining efficacy or adverse events in animals and
using nodes and connecting edges to denote biomole- humans. Such a knowledge base will spawn hypothe-
cules and biomolecular interactions, respectively. ses to develop new studies to address the current gaps
and lead to further improvements in biomarker discov-
ery. New data and information generated from these
studies, in turn, will further enrich the knowledge base.
Biological System Model Knowledge bases are not only important in biomarker
discovery for biomedical research and drug develop-
Eberhard O. Voit ment, but also essential for the regulatory agencies for
The Wallace H. Coulter Department of Biomedical use as a first tier of information to review drug sub-
Engineering, Georgia Institute of Technology and missions supported by biomarkers for efficacy or
Emory University, Atlanta, GA, USA safety concerns.

Definition

A biological system model is a mathematical represen-


tation of a biological network, its regulation, and Disclaimer: The views presented in this article do not necessarily
dynamics. reflect those of the U.S. Food and Drug Administration.
Biomarker Discovery, Typical Process 119 B
Biomarker Discovery, • Various types of data
Knowledge Base, • A collection of knowledge and
Fig. 1 A knowledge base institutional memory
schema for biomarker
discovery Text Exp data Pub data
Standard methods to collect data; documents B
manually and / or computationally Curation

Structured database for


computationally organizing and Data
retrieving data repository

Computational tools for pattern Systems


recognition and data integration; biology
evolving as data and knowledge
grows

Knowledge base

Biomarker discovery

monitoring, drug-metabolism, drug-efficacy, and


Biomarker Discovery, Typical Process drug-toxicity biomarkers. The processes by which bio-
markers are discovered vary, depending on the type of
Emily A. Moon and Marquis P. Vawter biological materials and health or disease states being
Functional Genomics Laboratory, Department of investigated. Typically, the process begins when
Psychiatry and Human Behavior, University of researchers discover an association between a specific
California, Irvine, CA, USA indicator and a disorder, diagnosis, or drug response,
usually out of a large number of indicators that are
analyzed in the initial phase of the study. These signif-
Definition icant indicators are then verified through a range of
tests and analyses, depending upon the biomarker
Biomarker – a specific physical measure or indicator of being studied, and then validated in a larger sample
healthy biological processes, disease states, or phar- set that seeks to replicate the biomarker association
macologic responses to treatment. Can be a predictor across multiple populations. Regardless of the course
of disease severity, onset, or recovery; does not have to that researchers take in the biomarker discovery pro-
be related to pathophysiology, may be an indirect cess, however, this process is only the initial step in
marker of pathophysiology. a path toward clinical validation and commercializa-
tion of novel biomarkers.
Perhaps the best way to understand the biomarker
Characteristics discovery process is to look at a few examples within
the discovery of blood-derived biomarkers.
The discovery of novel biomarkers represents
a lucrative future of preventative, predictive, and RNA
personalized medicine (▶ Biomarkers, Clinical Rele- RNA biomarkers are typically RNA measurements
vance). A great subset of funding from the pharmaceu- that are indicators of normal biologic processes, dis-
tical and clinical diagnostic industries and government ease states, toxicological reaction, or therapeutic
sponsors has resulted in the detection and validation of response to treatment. RNA species can include
biomarkers relevant to disease state, point of care microRNA, message RNA, ribosomal RNA, and
B 120 Biomarker Discovery, Typical Process

transfer RNA levels, as well as RNA editing and splic- Amplification plot
1.000 E+1
ing of message RNA. Alterations in the sequences and
expression of the RNA molecules are commonly used
biomarkers. The RNA biomarker discovery process 1.000
begins with the recruitment of individuals for
a biomarker study and the isolation of RNA from
1.000 E–1
subjects’ whole blood or peripheral blood mononu-

ΔRn
clear cells (PBMCs), or generation of lymphoblastic
cell lines (LCLs). Researchers typically screen the 1.000 E–2
known set of genes with candidate genes generated
either from published studies or unpublished pilot 1.000 E–3
studies done by the researcher that repeatedly show
expression alterations in the state of interest, as com-
pared to normal expression. These putative biomarkers 1.000 E–4
are usually compiled via gene-focused microarray
0 5 10 15 20 25 30 35 40 45
analysis of whole blood, LCL, or PBMCs. Once Cycle
a gene, gene list, or network of genes of interest is
compiled, high quality RNA from subjects that meet Biomarker Discovery, Typical Process, Fig. 1 A qPCR
amplification plot is the plot of fluorescence signal versus cycle
the researcher’s inclusion criteria is transcribed into
number. Figure 1 is an amplification plot of the HLA-CD74 gene
cDNA and expression data is generated via quantita- in brain, showing the change in fluorescence of SYBR Green dye
tive PCR, normalized with an appropriate reference (Y axis) plotted versus cycle threshold (X axis). Lower Ct number
gene. The qPCR data can be generated with real-time represents greater expression of the HLA-CD74 gene
amplification plots (Fig. 1) and is analyzed to evaluate
the ability of individual and composite gene expression
markers to differentiate cases from controls based on DNA
transcript abundance as well as to validate the micro- Genomic biomarkers can be characterized by varia-
array results. The primary endpoint of the RNA bio- tions in DNA: single nucleotide polymorphisms
marker discovery process is the area under the curve (SNPs), haplotypes, insertions, copy number variation,
(AUC) of the Receiver Operating Characteristic inversions, deletions, or methylated cytosine residues.
(ROC) curve (▶ Area Under the ROC Curve, Relevant SNPs can be determined from a subject’s
▶ Receiver Operating Characteristic (ROC) Curve). isolated DNA by direct sequencing or SNP genotyping
The ROC is a plot that measures classification perfor- assays which employ fluorescence technology to
mance of a model across the entire range of thresholds assess genotypes. Figure 3 shows an allelic discrimi-
(Dwinnell 2010). The ROC curve is generated by nation plot generated from a TaqMan SNP genotyping
graphed data points of the true positive rate (sensitiv- assay from Applied Biosystems (Foster City, CA).
ity) and the false positive rate (100 minus specificity). ▶ Fluorescent markers are detected at the locus of
The more the ROC curve breaks toward the top-left of interest, depending on genotype of the sample. The
the graph, the better the model is at separating cases blue cluster is a software call of homozygous Allele
from controls, as opposed to a random prediction Y, marked with FAM fluorescence, and the red cluster
shown by the dashed line (Fig. 2). Thus, AUC of the is a call of homozygous Allele X, marked with VIC
ROC represents the predictive ability of the gene fluorescence. Dual fluorescence, an indicator of
expression markers, and a greater AUC correlates to heterozygosity, is seen as the green cluster centered
a stronger predictive marker. If an expression profile between the two fluorescent extremes. The results of
based on the ROC can be developed for cases versus multiple SNP genotyping assays or sequencing
controls, the biomarker can be used for early detection, studies can be analyzed together to find haplotype
monitoring, and treatment decisions with predictive biomarkers.
medicine for disorders, and with the added benefit of Copy number variation (CNV) assays consisting of
greater cost effectiveness. a primer/probe reagent mixture can be purchased from
Biomarker Discovery, Typical Process 121 B
100

6.0

80
5.0 B
60
Sensitivity

4.0

Allele Y
40 3.0

Biomarker 1
2.0
20 Biomarker 2
Biomarker 3
1.0
0
0 20 40 60 80 100 0.0
100-Specificity
0.2 0.7 1.2 1.7 2.2 2.7
Biomarker Discovery, Typical Process, Fig. 2 Three bio- Allele X
markers showing the degree of sensitivity and specificity for
predicting cases and controls; biomarker 1 and 3 are nearly Biomarker Discovery, Typical Process, Fig. 3 An allelic
equal, and both are better predictors than biomarker 2 discrimination plot of genotype calls at Neuregulin1 SNP
rs3924999, showing homozygotes (blue and red ) and heterozy-
gotes (green)
bioscience companies, or can be created in the labora-
tory using primer and probe and standard dilutions of
calibrated DNA references. Figure 4 shows a table
generated by CopyCaller after an Applied Biosystems 2
CNV assay run on an ABI 7900HT PCR system
(Applied Biosystems, Foster City, CA). The assay
Copy number

uses a multiplex reaction that includes a reference


assay and a target assay, and the cycle numbers of
the two assays are compared for each sample, and
1
a copy call is generated. CNV calls are analyzed to
determine their predictive power of the characteristic
of interest.
Methylation of cytosine deoxyribonucleotides in
DNA (▶ DNA Methylation) has been added to this
critical list of genomic biomarkers, and there has 0
been a flood of methylation research in all fields of
Biomarker Discovery, Typical Process, Fig. 4 Bar graph
medicine to discover valid epigenetic markers. output from CopyCaller, demonstrating 0, 1, and 2 copy number
DNA cytosine methylation is stable, and although variants in DNA. Each bar represents one sample
not permanent, could mark transition states from
health to disease as well as environmental alterations
to the epigenome. Biomarkers of methylated DNA PCR and nucleotide sequencing is then used to detect
are typically discovered via PCR techniques, as the converted base pair and confirm the presence of
methylated DNA is readily amplifiable and easily methylation at cytosine residues (Shiraishi and
detectable. Typically, DNA is treated with sodium- Hayatsu 2004). DNA hypermethylation usually occurs
bisulfite, which converts cytosine residues to uracil. near or within the promoter region of genes, although
B 122 Biomarker Discovery, Typical Process

microarray probes can measure methylation though rich in promise, is not always a particularly
changes across all regions of genomic DNA. Bio- fruitful endeavor. Several limiting factors to protein
markers of methylation are particularly useful, as biomarker discovery exist and include:
screening in at-risk populations can occur with • The relatively low abundance of some biomarkers
minimally invasive techniques. Multiple studies in the proteome that must be detected in a complex
show that DNA methylation of certain loci can be matrix and the lack of means to amplify these infre-
detected in blood, sputum, bronchoalveolar lavage, quent markers
and, potentially, exhaled breath-condensate (Anglim • The post-translational complexity of protein bio-
et al. 2008). markers in human blood and biological fluids mak-
ing detection somewhat ambiguous
Proteins • The variability in human population and
Protein biomarker discovery (▶ Biomarkers, Protein pathologies that make valid biomarkers very
Expression) can be defined as the process by which difficult to detect and apply to disease states (Rifai
the differential expression of proteins in diseased et al. 2006)
versus healthy or transitional states is first identified. Further, on a genome-wide basis, protein or multi-
Protein biomarker discovery can use model systems proteins levels are quite difficult to obtain and techni-
(i.e. mouse models, cell lines) or the analysis of cally not feasible at this time. However, this is not as
a variety of human biological fluids including blood, large of a problem for RNA diagnostics, which evalu-
urine, tissue lysates, and cerebrospinal fluid. Protein ate the expression of the genes in question. Because
biomarker discovery commonly employs two- gene expression can reflect both genetic and environ-
dimensional gel electrophoresis and mass spectrome- mental influences, it may be particularly useful for
try technology to separate and identify differential identifying risk factors for complex disorders which
expression of proteins in diseased and normal states, are thought to have a multi-factorial polygenic etiol-
via measurement of peptide abundance. ogy in which many genes and environmental factors
From this first pass separation and identification interact. In addition, multiple gene biomarkers can be
analysis, a list of candidate biomarker proteins is gen- used to identify disease risk and to predict or monitor
erated. These lists generally consist of hundreds of drug response. Lastly, novel RNA diagnostic tools can
candidate biomarkers, with a high rate of false posi- be employed with PBMCs, whole blood or LCLs,
tives. A biomarker candidate verification phase providing more options for researchers and
reduces the number of false positives and ensures that practitioners.
only the most promising putative biomarkers found in
discovery go on to the validation phase. Human plasma Other Biomarkers
(if not used in the discovery process) is typically ana- Besides the trilogy of RNA, DNA, and protein, the
lyzed during verification, as it is considered the most field of biomarkers routinely engages the use of
comprehensive illustration of the human proteome and other diverse measures such as metabolomics, lipids,
is in contact with all tissues and could be a sentinel of small molecules such as steroids, and toxic environ-
the disease processes (Anderson and Anderson 2002). mental residues in bodily tissues. Also to be consid-
Once a candidate biomarker has been verified, it can ered as biomarkers are the following imaging
move on to the clinical validation and commercializa- techniques: computed tomography, magnetic reso-
tion process. nance imaging, positron emission tomography, and
ultrasound. Electrophysiological measures such as
Caveats of Biomarker Discovery: DNA Versus RNA electroencephalogram, electrocardiogram, and elec-
Versus Protein tromyogram are considered informative biomarkers,
DNA diagnostics such as SNPs are useful but often do to accurately measure tissues of interest in disease
not reflect variants that have a biological role. More- or transitional states. Biomarker discovery in
over, DNA variants do not provide critical information these categories follows a process similar to that of
on the environmental factors that are important for blood-derived biomarkers, but with verification and
pathogenesis of these diseases except in cases of epi- validation stages and analyses that suit the biomarker
genetic biomarkers. Protein biomarker discovery, in study.
Biomarkers 123 B
Cross-References biologic processes, pathogenic processes, or pharma-
cologic responses to a therapeutic intervention
▶ Area Under the ROC Curve (Biomarkers Definitions Working Group 2001).
▶ Biomarkers, Clinical Relevance A biomarker can be used as an indicator of a change
▶ Biomarkers, Protein Expression or, if it meets the highest standard of proof, B
▶ DNA Methylation a surrogate endpoint that is expected to predict clin-
▶ Fluorescent Markers ical benefit (or harm, or lack of benefit or harm) based
▶ Receiver Operating Characteristic (ROC) Curve on epidemiologic, therapeutic, pathophysiologic, or
other scientific evidence.
Biomarkers are widely used in medicine, drug
References discovery and development, and safety assessment.
They serve as indicators for disease progression
Anderson NL, Anderson NG (2002) The human plasma (diagnosis and prognosis), therapeutic responses
proteome: history, character, and diagnostic prospects. Mol
(efficacy), and adverse side effects (safety) in organ-
Cell Proteomics 1(11):845–867
Anglim PP, Alonzo TA, Laird-Offringa IA (2008) DNA isms, organs, and/or cells. An ideal clinical biomarker
methylation-based biomarkers for early detection of should contain the following characteristics (Lesko
non-small cell lung cancer: an update. Mol Cancer 7:81 and Atkinson 1999): (1) clinical relevance, (2) sensi-
Dwinnell W (2010) ROC curves and AUC. http://
tivity and specificity to treatment effects, (3) reliabil-
matlabdatamining.blogspot.com/. Accessed 1 June 2010
Rifai N, Gillette MA, Carr SA (2006) Protein biomarker discov- ity, (4) practicality, and (5) simplicity.
ery and validation: the long and uncertain path to clinical Biomarkers need to be qualified to define
utility. Nat Biotechnol 24(8):971–983 their specific use and context (fit for purpose). How-
Shiraishi M, Hayatsu H (2004) High-speed conversion of cyto-
ever, most biomarkers in use today have not been
sine to uracil in bisulfite genomic sequencing analysis of
DNA methylation. DNA Res 11(6):409–415 evaluated in a comprehensive qualification process
(Goodsaid et al. 2008). This, unfortunately, means
that new biomarkers are being compared to older
ones for which a full understanding of their
context for use is not known. Currently, there is no
Biomarkers widely accepted framework for biomarker qualifica-
tion. The FDA has taken an initiative to develop
Donna L. Mendrick and Weida Tong a consistent and standardized qualification frame-
Division of Systems Biology, National Center for work for the acceptance of biomarkers for regulatory
Toxicological Research, US Food and Drug use. Such a regulatory driven effort will facilitate
Administration, Jefferson, AR, USA communication between regulatory agencies, phar-
maceutical companies, the research community, clin-
ical practice, and consumer participants for
Synonyms evaluation of the biomarker-surrogate-clinical end-
point relationship in different settings and
Biological marker; Molecular marker; Surrogate applications.
endpoint Current biomarker discovery increasingly is
relying on emerging molecular technologies that
aim to determine the causal and mechanistic relation-
Definition ships of molecular markers with clinical endpoints.
The representative technologies and associated bio-
A biomarker is a characteristic that is objectively marker types are summarized in Table 1, and most
measured and evaluated as an indicator of normal of them are high throughput or high content in
nature. These technologies can be applied indepen-
dently or in parallel; the latter is able to identify
Disclaimer: The views presented in this article do not necessarily multiple biomolecules at different level of biological
reflect those of the US Food and Drug Administration. complexity.
B 124 Biomarkers, Blood Derived

Biomarkers, Table 1 An emerging molecular technology landscape in biomarker discovery


Different Scientific disciplines and their representative molecular
biological levels technologies Data type Biomarkers
DNA Genetics/epigenetics: Genome Wide Association Study SNP variation Genetic
(GWAS), next generation sequencing markers
RNA Genomics: microarrays; next generation sequencing Gene expression Genomic
markers
Protein Proteomics: 1D/2D gel coupled with MS or MS/MS Protein profiling Protein markers
Metabolite Metabolomics: NMR and MS Metabolites Metabolomics
markers
Cell Drug screening: cell-based assays Multiple mechanistically Cellular
relevant parameters biomarkers

Cross-References biologic processes, disease states, toxicological reac-


tion, or therapeutic response to treatment. RNA species
▶ Biomarkers, Protein Expression can include microRNA, message RNA, ribosomal
▶ Biomarkers, Solid Tissue RNA, transfer RNA levels, as well as RNA editing
and splicing of message RNA. Alterations in the
References sequences and expression of the RNA molecules are
commonly used biomarkers.
Biomarkers Definitions Working Group (2001) Biomarkers and
surrogate endpoints: preferred definitions and conceptual
Genomic
framework. Clin Pharmacol Ther 69:89–95
Goodsaid FM, Frueh FW, Mattes W (2008) Strategic paths for Genomic biomarkers can be characterized by varia-
biomarker qualification. Toxicology 245:219–223 tions in DNA (single nucleotide polymorphisms,
Lesko LJ, Atkinson AJ Jr (1999) Use of biomarkers and surro- insertions, copy number variation, translations,
gate endpoints in drug development and regulatory decision
inversions, haplotype effects or deletions). A
making: criteria, validation, strategies. Annu Rev Pharmacol
Toxicol 41:347–366 recently discovered single nucleotide biomarker
predicting response to Hepatitis C treatment is
a relevant example of a DNA level biomarker of
therapeutic response to pegylated interferon-alpha-
Biomarkers, Blood Derived 2b. The CC genotype at rs12979860, located 3 kb
from the IL28B gene on chromosome 19, is associ-
Marquis P. Vawter and Emily A. Moon ated with a twofold greater rate of sustained viro-
Functional Genomics Laboratory, Department of logical response (the absence of detectable Hepatitis
Psychiatry and Human Behavior, University of C virus) at the end of pegylated interferon-alpha-2b
California, Irvine, CA, USA treatment, over the TT genotype (Ge et al. 2009).
Genomic screening also increasingly includes mito-
chondrial DNA as a biomarker for mitochondrial
Definition diseases, and there are databases developed that list
all mitochondrial DNA mutations associated with
Human blood provides a source for multiple types of disease such as MITOMAP (http://www.mitomap.
biomarkers. Within blood, biomarkers can be found in org/MITOMAP).
multiple components (plasma, serum, cellular) at the
following levels. Epigenomic
DNA methylation involves the addition of a methyl
Functional Genomic group to DNA and is essential for normal develop-
A functional genomic biomarker can be defined as an ment. Mammalian DNA methylation occurs mostly at
RNA characteristic that is an indicator of normal the number 5 carbon of the cytosine of a CpG
Biomarkers, Clinical Relevance 125 B
dinucleotide. Approximately 75% of all CpGs are References
methylated, and unmethylated CpGs are clustered in
“CpG islands,” generally located at the 50 end of Belinsky SA (2004) Gene promoter hypermethylation as
a biomarker in lung cancer. Nat Rev Cancer 4:707–717
a gene. In many disease processes, CpG islands undergo
Ge D, Fellay J, Thompson AJ, Simon JS, Shianna KV, Urban TJ,
abnormal hypermethylation. Hypermethylation has Heinzen EL, Qiu P, Bertelsen AH, Muir AJ, Sulkowski M, B
been found to be a powerful biomarker for prostate McHutchison JG, Goldstein DB (2009) Genetic variation in
cancer screening, detection, and diagnosis (Nakayama IL28B predicts hepatitis C treatment-induced viral clearance.
Nature 461:399–401
et al. 2004), as well as early detection and monitoring of
Nakayama M, Gonzalgo ML, Yegnasubramanian S, Lin X,
lung cancer (Belinsky 2004). DNA methylation may De Marzo AM, Nelson WG (2004) GSTP1 CpG island
also be useful for monitoring adverse environmental hypermethylation as a molecular biomarker for prostate
effects upon DNA. cancer. J Cell Biochem 91:540–552

Peptides and Proteins


Proteins and peptides are often studied in the bio-
marker discovery process, as proteomics gives
a much better dynamic understanding of an organism Biomarkers, Clinical Relevance
than genomics. Human plasma is thought to be the
most comprehensive representation of the proteome Marquis P. Vawter and Emily A. Moon
and of all body tissues and processes. Protein bio- Functional Genomics Laboratory, Department of
marker discovery faces many challenges given the Psychiatry and Human Behavior, University of
complexity of the proteome; the difficulties in studying California, Irvine, CA, USA
low abundance proteins, where many biomarkers are
thought to exist; and the variation within populations
and pathologies. Protein biomarker discovery will Definition
become a more fruitful endeavor as high throughput
proteome technologies and data mining continue to Biomarkers have significant clinical relevance as diag-
advance. nostic tools. Finding diagnostic biomarkers for dis-
eases that can only be identified by observation and
patient self-reports, like schizophrenia and
Metabolomics Alzheimer’s, can direct proper treatment and, for dis-
Metabolomics, the study of the small molecules eases where cure or improvement in prognosis is
(such as metabolites) that chemical processes leave related to early detection and intervention, potentially
behind, is a rapidly emerging field of study and extend the lives of affected patients.
rather promising source of novel biomarkers. Study- Biomarker discovery represents a plausible
ing these molecules, via mass spectrometry or capil- future of preventative, predictive, and personalized
lary zone electrophoresis, is particularly useful in medicine.
biomarker discovery, as metabolomics takes into
account the way lifestyle, diet, and environment Preventative
affect the health of an individual, in addition to An example of a preventative biomarker is blood pres-
genetics. Additionally, as compared to the genome, sure, used to assess risk for coronary heart disease
transcriptome, and proteome, the metabolome is (CHD) in a population. A meta-analysis of nine studies
very small, and with a high translatability across of over 400,000 subjects found that there was
species and eukaryotic and prokaryotic cells. a positive, continuous, log-linear relationship between
One caveat of metabolomic biomarker discovery, diastolic blood pressure (DBP) and stroke/CHD events
however, is the enormous amount of data that is (MacMahon et al. 1990). Blood pressure screening
generated from metabolomic mining studies. Analyt- represents a preventative biomarker that, when
ical technologies must advance in order to fully elu- indentified, can be used to direct treatment for lower-
cidate the potential of the information-rich field of ing DBP, either via drug therapies or changes in diet
metabolomics. and exercise.
B 126 Biomarkers, Independent Validation

Predictive and Personalized


Biomarkers can also be used to predict the efficiency of Biomarkers, Independent Validation
drug treatment in affected patients. One example of
a drug-response predictive biomarker is cytochrome Péter Horvatovich
P450 2D6 (CYP2D6), a gene predominantly expressed Department of Pharmacy, Analytical Biochemistry
in the liver and involved in the metabolism of many Research Group, University of Groningen, Groningen,
drugs, including antidepressants and tamoxifen, which The Netherlands
is biotransformed to the estrogen receptor blocker,
endoxifen, by the CYP2D6 enzyme. Among women
with estrogen-dependent breast cancer, tamoxifen is Synonyms
the most widely used drug therapy in tumor suppres-
sion. Studies on CYP2D6 polymorphisms suggest that Machine learning; Model cross-validation; Model
individuals carrying functional variants associated testing; Model validation
with deficient CYP2D6 metabolism receive less cura-
tive benefit from tamoxifen and are at a higher risk of
relapse than those without these variants (Goetz et al. Definition
2007). Screening breast cancer patients for these
variants can guide practitioners in developing a more Biomarker discovery is a complex procedure
personalized and effective treatment plan. In this consisting of discovery and validation phases. In
example, prediction of better survival rates for patients cases where there are no assumptions about the under-
with estrogen-dependent malignancy treated with lying molecular mechanism of a disease, the discovery
tamoxifen is conditioned on the genetic variant that phase consists of comprehensive profiling using the
increases tamoxifen bioavailability. biological fluid, which will be used for the final diag-
Biomarker discovery also plays a role in clinical nostic test. Analytical methods applied for comprehen-
economics; as health care moves more toward individ- sive profiling are time-consuming and have low
ualized care, the market for biomarker commercializa- sample throughput, while they provide quantitative
tion has grown substantially. The increased allocation information on several hundreds or thousands of pro-
of health-care resources to molecular diagnostics has teins or metabolites. The aim of statistical analysis is to
created a $23 billion dollar potential global market of select from these compounds those that may have an
high-cost, high-profit products (Wilson et al. 2007). As association with the biological states in question, such
an example, prescreening patients in a drug trial based as diseased vs. healthy. However, due to the low
upon a genetic variant can potentially limit the cost of sample size relative to the number of measured com-
achieving positive results by reducing the number of pounds, the outcome of statistical analysis may lead to
non drug responders enrolled. the selection of spurious biomarker candidates that do
not have a strong correlation with the biological state
that is being monitored. Supervised statistical classifi-
cation methods have different assumptions, which may
References
not always be true, such as even distribution of features
Goetz MP et al (2007) The impact of cytochrome P450 2D6
across the data set, similar variance in case control
metabolism in women receiving adjuvant tamoxifen. Breast sample groups. There are assumptions about the gen-
Cancer Res Treat 101:113–121 eralization of findings and that a real association exists
MacMahon S, Peto R, Cutler J, Collins R, Sorlie P, Neaton J, between the identified biomarkers and the investigated
Abbott R, Godwin J, Dyer A, Stamler J (1990) Blood pres-
sure, stroke, and coronary heart disease. Part 1, Prolonged
conditions. In addition, statistical methods in cases of
differences in blood pressure: prospective observational low sample size and large quantified compounds tend
studies corrected for the regression dilution bias. Lancet toward overfitting, especially in cases where the out-
335:765–774 come of classification is a combination of multiple
Wilson C, Schulz S, Waldman SA (2007) Biomarker develop-
ment, commercialization, and regulation: individualization
compounds. Latent parameters not taken into consid-
of medicine lost in translation. Clin Pharmacol Ther eration at sample selection, collection, storage, or
81:153–155 analysis may influence the outcome of biomarker
Biomarkers, Protein Expression 127 B
discovery and may result in poor classification perfor- and validation: a statistical perspective. Pharmacogenomics
mance to classify new sample sets using the model 5(6):709–719
Mischak H, Allmaier G, Apweiler R, Attwood T, Baumann M,
built with previously identified biomarkers (Mischak Benigni A, Bennett SE, Bischoff R, Bongcam-Rudloff E,
et al. 2010). Capasso G, Coon JJ, D’Haese P, Dominiczak AF,
To test the association strength between the selected Dakna M, Dihazi H, Ehrich JH, Fernandez-Llama P,
B
biomarkers and the studied biological state as well the Fliser D, Frokiaer J, Garin J, Girolami M, Hancock WS,
Haubitz M, Hochstrasser D, Holman RR, Ioannidis JP,
generalizability of the classification, validation with an Jankowski J, Julian BA, Klein JB, Kolch W, Luider T,
independent sample set is compulsory. The selection Massy Z, Mattes WB, Molina F, Monsarrat B, Novak J,
of a new sample set should be based on the same Peter K, Rossing P, Sánchez-Carbayo M, Schanstra JP,
clinical criteria that were used in the study to identify Semmes OJ, Spasovski G, Theodorescu D,
Thongboonkerd V, Vanholder R, Veenstra TD,
the biomarker panel, and these clinical criteria should Weissinger E, Yamamoto T, Vlahou A (2010) Recommen-
also be applied in the final diagnostic test. For exam- dations for biomarker identification and qualification in clin-
ple, if samples with a certain age range are used to ical proteomics. Sci Transl Med 2(46ps42):46
perform discovery and validation studies, the final
diagnostic test is only valid for that age range. In
order to minimize the influence of latent variables it
is advisable to perform the validation in a multi-center Biomarkers, Protein Expression
setting with blinded samples. Application of a faster
analytical method targeting only the selected com- Péter Horvatovich
pounds is recommended for the validation phase to Department of Pharmacy, Analytical Biochemistry
increase the statistical power. For example, when com- Research Group, University of Groningen, Groningen,
prehensive profiling using label-free LC-MS is applied The Netherlands
in the discovery phase, targeted Multiple-Reaction-
Monitoring (MRM) may be used to quantify the iden-
tified discriminating compounds in the independent Synonyms
data sets. MRM acquisition enables faster analysis,
resulting in higher sample throughput. Both the dis- Biomarkers; Independent validation; Protein expres-
covery and validation data sets should have a large sion; Solid tissue
enough sample size to provide meaningful statistical
outcome (Mischak et al. 2010; Feng et al. 2004). How-
ever, use of an independent validation data set with Definition
enough power is a necessary but not sufficient condi-
tion to identify a clinically successful biomarker panel, Biomarkers are characteristics that are objectively
but other parameters such as sample selection, collec- measured and evaluated as indicators of normal
tion, characterization and handling, and a clear state- biologic processes, pathogenic processes, or pharma-
ment of the clinical objectives are also important. cologic responses to a therapeutic intervention
Predictive value of the diagnostic test should be deter- (Biomarkers Definitions Working Group 2001). In
mined on the independent validation set as it provides most cases, biomarkers are related to compounds
unbiased results compared to the more optimistic pre- such as proteins or metabolites, whose concentrations
dictive value determined using only the data set used are specifically correlated to the biological state(s).
for discovery (Azuaje 2010). Biomarkers are mostly used to diagnose diseases and
to monitor disease progression and treatment effi-
ciency. Other physical measures such as MRI measure-
References ments, PET images, or imaging mass spectrometry
data may be used as biomarkers; however, in most
Azuaje F (2010) Bioinformatics and biomarker discovery: cases, these measures are strongly correlated with the
"Omic" data analysis for personalized medicine. Wiley,
molecular change due to disease. A surrogate marker
Chichester
Feng Z, Prentice R, Srivastava S (2004) Research issues and (Katz 2004) is a laboratory measurement or physical
strategies for genomic and proteomic biomarker discovery sign that is used in therapeutic trials as a substitute for a
B 128 Biomarkers, Protein Expression

cut-off value modifications, and modified proteins may perform dif-


ferent biological roles. This may lead to an enormous
controls explosion in the number of protein forms, and sophis-
ticated analytical methods are required to distinguish
number number

TN
between the different forms. The situation is further
FP complicated because the activity of proteins may be
FN [X] influenced by many factors, such as cofactors, forma-
tion of protein complexes, local pH conditions, and the
TP
presence of natural inhibitors, and often in is not the
diseased concentration but the activity of the protein that is in
strong relation with the disease state. This has lead to
the development of analytical methods providing
activity profiles of classes of proteins (Fig. 2).
Biomarkers, Protein Expression, Fig. 1 Concentration dis-
tribution of hypothetical protein biomarker in group of samples For diagnosis and treatment, follow-up biomarkers
obtained from healthy control group and diseased individuals. are mostly detected from easily accessible body fluids
The difference of the concentration distribution and the cut-off such as blood, urine, and saliva or from the cerebro-
value define the number of true positive (TN), false positive spinal fluid. However, in body fluids, the difference
(FP), true positive (TP) and the false negative (FN)
between the concentration of the least and most
abundant compounds is large (11–12 orders of magni-
tude in the case of human blood) (Schiess et al. 2009).
clinically meaningful endpoint, and it is a direct mea- Interesting biomarker candidates originating from the
sure of how a patient feels, functions, or survives and is diseased tissue(s) or organ(s) are mixed with com-
expected to predict the effect of the therapy. The main pounds from all other parts of the body, providing
difference between a surrogate marker and a biomarker a huge challenge for detection by analytical chemistry.
is that a biomarker is a meaningful “candidate” for Other biomarker discovery approaches first identify
a surrogate marker, and a surrogate marker is a test biomarker candidates directly in the diseased cells,
used as a measure of the effects of a specific treatment. tissues, or organs, followed by subsequent validation
Proteins are most suitable for biomarkers as their of the identified candidate compounds or their specific
expression is regulated in correlation with the general metabolites if they are present in sufficient amount in
state of living organisms and contributes considerably easy accessible body fluids, where the final diagnostic
to the display of the phenotype. In an experimental test is applied.
context, this means measuring protein concentration The most widely used techniques are antibody-
distribution in samples obtained at different biological based ELISA tests, protein arrays, and specific LC-
states. Figure 1 shows different concentration distribu- MS analysis. Research to identify new biomarkers is
tions of a hypothetical biomarker between two stages, composed from biomarker discovery process, where
corresponding to the healthy and diseased stages. a high number of proteins are identified in low number
Overlap between the two concentration distributions of samples using comprehensible quantitative analyti-
and the cut-off value define the Type I (number false cal techniques such as protein arrays, LC-MS, or
positive or FN) and Type II errors (number false neg- SELDI measurements. This is followed by identifica-
ative or FP) and specify the sensitivity and specificity tion of the most discriminating peaks between
of the test. The area under the receiving operation predefined class of samples (e.g., healthy and disease)
curve (AUROC) can be used to compare efficiency of and validation of the biomarkers on large number of
different diagnostic tests. samples using fast, targeted analytical methods such
Proteins have a complex life cycle from synthesis as MRM-based LC-MS or ELISA tests. The final val-
on the ribosome to the ubiquitin degradation idation comprises the exploration of the role of
mechanism. During the life span of a protein it may the biomarker compound(s) in the mechanism of the
undergo several chemical, isomerization, and sterical disease.
Biomarkers, Protein Expression 129 B
Albumin

1010
Transferrin
Alpha-2-macroglobulin
109 Alpha-1-acid glycoprotein 1 mg/ml
B
Apolipoprotein C-III
108 Clusterin
Apolipoprotein C-I
Complement component 7
107 Tetranectin
dynamic concentration range

EGF receptor Coagulation factor XIII B chain


Vitamin K dependent protein C
Kallikrein 11
106 μg/ml
Fibronectin 1
Vitronectin
105 Cytokeratin 8
Chromogranin A Neutrophil defensin 1
classical plasma proteins Alpha-fetoprotein Prolactin
104 Her2/neu
Inhibin
Kallikrein 6 Ep-CAM
soluble IL-2R alpha
103 Kallikrein 3 (PSA) Thyroglobulin ng/ml
CEA
Gastrin Kallikrein 10
tissue leakage products
VEGF Choriogonadotropin
102
Colony stimulating factor 1 Somatostatin
101 Corticotropin-lipotropin
Calcitonin

100 pg/ml
interleukins, cytokines

10−1
plasma proteins

Biomarkers, Protein Expression, Fig. 2 Plasma protein con- the HUPO plasma proteome initiative and yellow dots represent
centration showing three main categories (classical plasma pro- currently utilized biomarkers (Figure taken from Schiess et al.
teins, tissue leakage products, signaling compounds e.g., (2009))
cytokines). Red dots indicate proteins that were identified by

Affinity-based protein arrays or two-dimensional Methods without using any chemical modifications are
gel electrophoresis with mass spectrometric identifica- called label-free, and methods using chemical reaction
tion are able to quantify whole intact proteins. Due to or incorporation of amino acids with non-natural iso-
the difficulties to separate intact proteins with reverse- tope compositions form one other type of protein quan-
phase chromatography compatible with MS detection, tification method.
LC-MS methods use protein fragments obtained with
cleavage of protease enzymes such as trypsin. In this
approach, which is also called as shotgun proteomics,
the better chromatographic separation is at the expense
References
of obtaining higher sample complexity, which
Biomarkers Definitions Working Group (2001) Biomarkers and
results in difficulty to obtain exact quantification of surrogate endpoints: preferred definitions and conceptual
the intact protein in the presence of proteins with framework. Clin Pharmacol Ther 69(3):89–95
high homology having few peptides in common or Katz R (2004) Biomarkers and surrogate markers: an FDA
perspective. NeuroRx 1(2):189–195
proteins having partly different type of post-translation
Schiess R, Wollscheid B, Aebersold R (2009) Targeted proteo-
modifications. Quantification of compound using mass mic strategy for clinical biomarker discovery. Mol Oncol
spectrometry can be also performed in several ways. 3(1):33–44
B 130 Biomarkers, Ranking

the Top-K gene list (k << n). Biomedical reports


Biomarkers, Ranking usually report gene lists ranging from the Top-10 to
the Top-50 (Lockhart and Winzeler 2000; Parmigiani
Ronnie Alves et al. 2003).
Instituto de Informática, Universidade Federal do Rio
Grande do Sul, Porto Alegre, Brazil
References

Ewens WJ, Grant GR (2004) Statistical methods in bioinformat-


Synonyms ics: an introduction (statistics for biology and health),
2nd edn. Springer, New York
Gene ranking; Order statistics; Transcriptomic data Lockhart DJ, Winzeler EA (2000) Genomics, gene expression
and DNA arrays. Nature 405(6788):827–836
analysis Parmigiani G, Garett ES, Irizarry RA (2003) The analysis of
gene expression data. Springer, New York

Definition

Biomolecular markers (Biomarkers) include altered or Biomarkers, Solid Tissue


mutant genes, RNA, proteins, lipids, carbohydrates,
small metabolites molecules, and changed expression Péter Horvatovich
states of such markers that can be correlated with Department of Pharmacy, Analytical Biochemistry
biological behavior or a clinical outcome. Most of the Research Group, University of Groningen, Groningen,
discovered biomarkers are based on changes in gene The Netherlands
expression patterns from profiling (Transcriptomic)
studies (Ewens and Grant 2004).
In order to make a clear definition of the term “bio- Synonyms
markers” solely based on gene expression data, let us
define gene expression data as a pair GED ¼ (g,c), Biomarkers; Protein expression
where the m  n matrix g ¼ (gij)i ¼ 1,. . .,m; j ¼ 1,. . .,n
contains m observations of the random vector (G1,. . .,
Gn) (as an example, the expression levels of m genes), Definition
and c ¼ (c1,. . .,cm) stores either an experimental con-
dition C fixed by design or the response variable of Biomarker discovery approaches based on case-
interest for those m observations. Let us define control studies of solid tissues or organs are best placed
a ranking of the variables G1,. . .,Gn as a permutation to provide compounds with a close relation to the
r ¼ (rj)j ¼ 1,. . .,n of (1,. . .,n), where rj is the rank of the disease. However, diagnostic tests are generally
variable Gj with respect to its association between the applied to easy accessible body fluids such as blood,
considered gene and C, either positive or negative. urine, saliva, or cerebrospinal fluid, and, in most cases,
A ranking provides an ordered gene list sampling solid tissues and organs affected by
gl ¼ (glk)k ¼ 1,. . .,n defined by pathological alterations is too invasive for the patient.
Therefore, compounds strongly associated with the
Glk ¼ j , rj ¼ k for all j; k ¼ 1; . . . ; n: (1) disease indentified in solid tissues and organs need to
find their way directly or indirectly to the body fluid
Taking the differential expression of a gene G54, the sampled for the diagnostic test. A further complication
variables r54 ¼ 1 and gl1 ¼ 54 mean that G54 is pointed is that such compounds, after being transported to the
as the most differentially expressed gene. Since gl is an target body fluid, are surrounded by a large number of
ordered list, the k top genes in the list gl1,. . .,glk form other compounds having a large dynamic
Biomarkers, Solid Tissue 131 B
Biomarkers, Solid Tissue,
Fig. 1 Workflow for MALDI Tissue slide
imaging mass spectrometry
analysis. Schematic outline of
a typical workflow for fresh
frozen tissue samples. Sample Matrix application B
pre-treatment steps include Las
er
cutting and mounting the
tissue section on a conductive
target. Matrix is applied in an
ordered array across the tissue Laser ablation
section and mass spectra are
acquired at each x, Tandem MS MS
y coordinate. Single stage
mass spectra are used to found
discriminating between
diseased and healthy area with
statistical methods. Tandem
MS (MS/MS) spectra are used
for peptide and protein
identification. Further data
MS/MS spectrum
analysis steps include the
visualization of the
distribution of a single peptide Mass spectra for each x,y coordinate
or protein within the tissue or Single m/z Biocomputational
to visualize the image of values analysis
discriminating peaks. The Peptide fragments
scale represents the relative
intensity of the protein
(Figure taken from ref
(Schwamborn and Caprioli
2010)) Database search 0% 100% Class A Class B
Protein identification Protein images Classification images

concentration range (often 11–12 orders of magni- case-control sampling of the same tissue and therefore
tude). This presents an enormous challenge for analyt- excludes the majority of the biological variance from
ical chemistry and statistical methods to select the analysis. However, disease cells may influence the
compounds with a real association with the disease. surrounding healthy cells, e.g., with proteins secreted
Tissue- or organ-derived discriminating compounds by the disease-altered cells. Therefore, validation by
should therefore be validated so that they reach periph- comparing disease-altered cells, the surrounding
eral body fluids either in intact or modified forms. One healthy tissue in diseased tissue, and healthy cells in
promising approach is the measurement of the healthy samples is required.
secretome molecular profile of in vitro cultivation of MALDI imaging mass spectrometry (Schwamborn
removed tissue samples. Generally, samples obtained and Caprioli 2010) is one other promising technique
from solid tissues are pure, but inevitable contamina- for finding tissue-derived compounds closely related to
tion from blood may result in alteration of the the disease. It enables direct in situ or cross-sample
measured tissue secretome molecular profile. comparison of the molecular profile of diseased
The most widely used sampling method for solid and healthy cells in tissue sections (Schwamborn and
tissues is laser capture microdissection (Murray and Caprioli 2010). Figure 1 presents the outline of MALDI
Curran 2005; Liu 2010), which enables highly accurate imaging mass spectrometry analysis from tissue
cutting of diseased altered and healthy tissue pieces section preparation to the statistical analysis of the
using precise laser shots. This technique enables acquired data.
B 132 Biomedical Decision Support Systems

References
Biomedical Named Entity Recognition,
Liu A (2010) Laser capture microdissection in the tissue Whatizit
biorepository. J Biomol Tech 21(3):120–125
Murray GI, Curran S (2005) Laser capture microdissection:
methods and protocols, Methods in Molecular Biology Dietrich Rebholz-Schuhmann
series, vol 293. Humana Press, Totowa European Bioinformatics Institute, Hinxton, UK
Schwamborn K, Caprioli RM (2010) Molecular imaging
by mass spectrometry–looking beyond classical histology.
Nat Rev Cancer 10(9):639–646
Definition

Named entities in the biomedical research domain


comprise genes and proteins, diseases, species, chem-
Biomedical Decision Support Systems ical entities, anatomical components, and other seman-
tic types. A large number of biomedical named entities
Guy Tsafnat have to be included into text-mining solutions due to
Centre for Health Informatics, Australian Institute for the descriptive nature of biomedical research.
Health Innovation, University of New South Wales,
Sydney, NSW, Australia
Introduction

Synonyms Ready access to text-mining solutions requires the


integration of various resources and specialized
Clinical decision support systems technologies into an open access infrastructure. In the
past, solutions have been proposed that incorporate
these resources into standalone applications (Friedman
Definition et al. 2001; Kano et al. 2009). Unfortunately, such
solutions have an architecture that does not support
A biomedical decision support system (DSS) is any well integration of bioinformatics services similar to
computational system that aids decisions made open IT solutions such as Taverna (Hull et al. 2006).
by clinicians, medical researchers, and medical biolo- Other systems such as iHOP provide special interfaces
gists when interpreting large amounts of information for programmatic access, but iHOP does not allow to
relevant to a particular decision. Some DSS may process other documents than Medline abstracts with
support decision making by retrieving (only) the new means (Hoffmann et al. 2005).
information required to make the decision, while A Web service–based TM solution centralizes and
others also summarize the information and/or predict harmonizes crucial tasks and thus solves a number of
the outcomes of decisions. difficulties reducing maintenance for users. A server-
Examples of biomedical decision support based solution can incorporate large terminology sets
systems are: from biomedical data resources: updates to these
• Information retrieval systems that help clinicians resources are efficiently propagated through the server.
find clinical guidelines accurately and quickly The end user profits from a harmonized schema,
• Intelligent systems that suggest diagnoses based on including coupling of text processing services to bio-
patient symptoms informatics data resources.
• Simulation systems that calculate the outcomes of
complex biochemical pathway to predict treatment
outcomes Implementation
• Data analysis systems that summarize, interpret,
and visualize hundreds of microarray assays Whatizit is a modular infrastructure that delivers TM
identifying genes that are under- or over-expressed services to the public. Each module processes and
in cancer tissue annotates text, for example, identifies named entities
Biomedical Named Entity Recognition, Whatizit 133 B
and introduces links to database entries. Individual the demands of different extraction methods and with
modules can be composed of a number of internal the amount of literature processed over time.
modules. All terminologies are based on publicly
available resources (e.g., UniProtKb/Swiss-Prot, gene Acknowledgments This work is funded by the Network of
ontology, DrugBank, see below). Terms are matched Excellence “Semantic Mining” (NoE 507505). Medline
abstracts are provided from the National Library of Medicine B
to the text taking morphological variability into con-
(NLM, Bethesda, MD, USA). Sylvain Gaudan is supported by an
sideration (Kirsch et al. 2006). “E-STAR” fellowship (EC’s FP6 Marie Curie Host fellowship,
The user submits the name of a pipeline and the text MESTCT-2004-504640).
to be annotated, and Whatizit returns the text with all
the annotations contained. As an alternative, the user
can access Whatizit services without building a client Cross-References
application. On the Web interface, Whatizit provides
a text input area where user can submit any kind of ▶ Applied Text Mining
Unicode encoded text or retrieve Medline abstracts for ▶ BioCreative Meta-Server and Text-Mining
subsequent analysis. Interoperability Standard
Different types of modules are available through the ▶ Bio-Ontologies
Whatizit infrastructure. One set of modules annotates ▶ Information Extraction
named entities. WhatizitChemical searches for chemi- ▶ Named Entity Recognition
cal entities based on the terminology from ChEBI and ▶ Natural Language Processing
the identification of chemical terms by OSCAR3 ▶ Ontology Lookup Service for Controlled
(Corbett et al. 2006). WhatizitDisease identifies Vocabularies and Data Annotation
disease terms using a controlled vocabulary (CV)
extracted from MedlinePlus, whereas whatizitDi-
seaseUMLS allows access to MetaMap (Aronson
2001). For whatizitDrugs, the CV has been extracted References
from DrugBank (http://redpoll.pharmacy.ualberta.ca/
Aronson AR (2001) Effective mapping of biomedical text to the
drugbank/). WhatizitGO is a pipeline searches for
UMLS Metathesaurus: the MetaMap program. In: Proceed-
gene ontology terms using exact matching and consid- ings of the AMIA Symposium, vol 5. Hanley & Belfus,
ering morphological variability (Ashburner et al. Philadelphia, pp 17–21
2000). Last, whatizitOrganism identifies species Ashburner M et al (2000) Gene ontology: tool for the unification
of biology. The gene ontology consortium. Nat Genet
names extracted from the NCBI taxonomy (NLM).
25(1):25–29
Other annotation pipelines represent solutions Corbett P, Murray-Rust P (2006) High-throughput identification
that are more complex, i.e., they identify combinations of chemistry in life science texts. In: Proceedings of the
of semantic types: for example, whatizitSwissprotGo CompLife. Lecture notes in bioinformatics, vol 4216.
Cambridge, pp 107–118
for UniProtKb/Swiss-Prot in conjunction with GO
Friedman C et al (2001) GENIES: a natural-language processing
annotations and whatizitEbiMed for the annotation system for the extraction of molecular pathways from journal
pipeline from EbiMed (Rebholz-Schuhmann et al. articles. Bioinformatics 17(Suppl 1):S74–S82
2007). The retrieval engine for Medline abstracts is Gaizauskas R et al. (1996) GATE: an environment to support
research and development in natural language engineering.
accessible via the module whatizitQbmarsdf. For
In: Proceedings of the 8th IEEE international conference on
the retrieval, the user has to submit query terms or tools with artificial intelligence, Toulouse, pp 58–66, ISBN
PubMed IDs. 0-8186-7686-8
Hatcher E, Gospodnetic O (2004) Lucene in action. Manning,
Greenwich
Hirschman L et al (2005) Overview of BioCreAtIvE task 1B:
Conclusion normalized gene lists. BMC Bioinformatics 6(Suppl 1):S11
Hoffmann R, Valencia A (2005) Implementing the iHOP con-
Whatizit is a service that copes with large terminologi- cept for navigation of biomedical literature. Bioinformatics
21(Suppl 2):ii252–ii258
cal resources, is aligned with updates from the primary
Hull D et al (2006) Taverna: a tool for building and running
resource, and is available through a centralized service workflows of services. Nucleic Acids Res 34:W729–W732,
that scales with the amount of integrated resources, with Web server issue
B 134 Biomedical Natural Language Processing (BioNLP)

Kano Y, Baumgartner WA Jr, McCrohon L, Ananiadou S,


Cohen KB, Hunter L, Tsujii J (2009) U-Compare: share BioModels Database: A Repository of
and compare text mining tools with UIMA. Bioinformatics
25(15):1997–1998 Mathematical Models of
Kirsch H et al (2006) Distributed modules for text annotation and Biological Processes
IE applied to the biomedical domain. Int J Med Inform
75(6):496–500 Vijayalakshmi Chelliah1, Camille Laibe2 and
Rebholz-Schuhmann D et al (2006a) Annotation and Disambig-
uation of semantic types in biomedical text: a cascaded Nicolas Le Novère3
1
approach to named entity recognition. In: Workshop on EMBL European Bioinformatics Institute, Hinxton,
“Multi-dimensional markup in NLP”, EACL 2006. ACL, Cambridgeshire, UK
USA 2
EMBL-European Bioinformatics Institute, Wellcome
Rebholz-Schuhmann D et al (2006b) IeXML: towards an anno-
tation framework for biomedical semantic types enabling Trust Genome Campus, Hinxton, Cambridgeshire, UK
3
interoperability of text processing modules. SIG BioLink, EMBL European Bioinformatics Institute and
ISMB 2006, Fortaleza, Brasil Babraham Institute, Cambridge, UK
Rebholz-Schuhmann D et al (2007) EBIMed–text crunching to
gather facts for proteins from medline. Bioinformatics 23(2):
e237–e244
Schmid H (1994) Probabilistic part-of-speech tagging using Synonyms
decision trees. In: Proceedings of the international confer-
ence on new methods language processing, vol 12, BioModels; Model repository
Manchester

Definition

BioModels Database is a public online resource that


Biomedical Natural Language Processing allows storing and sharing of published, peer-reviewed
(BioNLP) quantitative, dynamic models of biological processes.
The model components and behavior are thoroughly
▶ Natural Language Processing checked to correspond the original publication and man-
ually curated to ensure reliability. Furthermore, the
model elements are annotated with terms from controlled
vocabularies as well as linked to relevant external data
resources. This greatly helps in model interpretation and
Biomedical Ontologies reuse. Models are stored in SBML format, accepted in
SBML and CellML formats, and are available for down-
▶ Bio-Ontologies load in various other common formats such as BioPAX,
Octave, SciLab, VCML, XPP and PDF, in addition to
SBML. The reaction network diagram of the models is
also available in several formats. BioModels Database
features a search engine which provides simple and more
Biomedical Web Communities advanced searches. Features such as online simulation
and creation of smaller models (submodels) from the
▶ Collaborative and Distributed Biomedical selected model elements of a larger one are provided.
Applications BioModels Database can be accessed both via a web
interface and programmatically via web services. New
models are available in BioModels Database at regular
releases, about every 4 months.

BioModels Characteristics

▶ BioModels Database: A Repository of Mathemati- BioModels Database (http://www.ebi.ac.uk/biomodels/)


cal Models of Biological Processes (Le Novère et al. 2006; Li et al. 2010a) hosts a collection
BioModels Database: A Repository of Mathematical Models of Biological Processes 135 B
BioModels Database:
A Repository of
Mathematical Models of
Biological Processes,
Fig. 1 Growth of BioModels
Database: The total number of B
models (red) and the total
number of reactions (green) in
all models are plotted here.
The number of relationships
includes SBML “reactions,”
“rate rules,” “assignment
rules,” and “events.” There has
been approximately a 20-fold
increase in the number of
models since the launch of the
resource in 2005, with an
average increase in
complexity of the models
(measured by the number of
mathematical relationships)
being increased five times in
the same period

BioModels Database:
A Repository of
Mathematical Models of
Biological Processes,
Fig. 2 Type of models:
Categorization of models in
the curated branch of
BioModels Database based on
the GO terms present in the
annotation of the models. This
chart was generated by
enumerating models in the
database, whose annotations
refer either the GO terms listed
here or the child of the GO
terms listed here

of mathematical models of biological processes, Diversity of Models Hosted


described in peer-reviewed scientific literature. It is BioModels Database covers a wide range of models
part of the BioModels.net initiative (http://biomodels. from several biological categories. It hosts models
net/) (Le Novère 2006), which aims to help researchers from simple biochemical reaction systems to larger
in the modeling field to exchange and build upon each and complex dynamic models, metabolic network
other’s work with greater ease and accuracy. models, and FBA models. Figure 2 represents the
It has significantly grown both in size and number categorization of models in the curated branch using
of the models since its origin in 2005 (Fig. 1) and terms from Gene Ontology (GO) present in the model
serves as an efficient model sharing platform. annotation.
B 136 BioModels Database: A Repository of Mathematical Models of Biological Processes

Models Provenance the reference publication. A representative figure or


Models are implemented from articles published in table reproduced by the model (which is present in
peer-reviewed scientific journals by a team of curators. the reference publication) together with a description
Also, there are an increasing number of models which of how it was obtained are available for each model.
are directly submitted by the modelers themselves and This ensures that the encoded form of the model that is
this demonstrates the popularity and recognition of provided corresponds to what was described in the
BioModels Database in the community. In the past, paper.
some models came from collaboration with other There are several reasons for models to be in the
repositories, such as the former SBML model reposi- non-curated branch. These are models that either do
tory (Caltech, USA), JWS Online, (http://jjj.biochem. not satisfy the full requirements for MIRIAM compli-
sun.ac.za/) the Database Of Quantitative Cellular Sig- ance or have not been curated yet due to limited time
naling (DOQCS), (http://doqcs.ncbs.res.in/) and the and resources. Many of these models are pathway
CellML Repository (http://models.cellml.org/cellml). maps or models for networks and Flux Balance Anal-
Several scientific journals recommend deposition of ysis (FBA), without sufficient quantitative results pro-
models to BioModels Database in their instructions vided for validation. Some others are only the subsets
for authors. These include journals that are published of the whole model described in the article, as some
by Nature Publishing Group (NPG), Public Library of parts cannot be encoded in SBML. And finally, a small
Science (PLoS), Royal Society of Chemistry (RSC), number of models could not be made to reproduce the
and BioMed Central (BMC). published results due to untraceable errors in the
implementation or typos in the publication (often
Model Submission, Curation, Annotation, and even after contacting the authors of the article).
Publication All model elements in the curated branch are fur-
Submission of models to BioModels Database is free thermore thoroughly annotated with cross-references
and is open to everyone. Models can be submitted via to other database records and ontology terms.
an online interface and are accepted in two formats, The annotations are included in the models using
SBML (Systems Biology Markup Language) (http:// MIRIAM URIs (Juty et al. 2012). As model elements
sbml.org/) and CellML. While BioModels Database are not always named precisely to relate directly to the
only distributes models that have been described in corresponding biological processes or physical entity,
the peer-reviewed scientific literature, models can be annotations are necessary to enhance interpretability
submitted prior to the publication of their associated by both users and software tools. To date, model ele-
paper. However, these models are made publicly avail- ments in BioModels Database are annotated using
able only after the publication of their corresponding around 40 different external data resources. Some of
paper. At the time of submission, each model is the predominantly used external resources for model
assigned a unique and perennial identifier which annotations are Gene Ontology, ChEBI ontology,
allows users to access and retrieve it. This identifier Brenda Tissue Ontology, Systems Biology Ontology
can be used by authors as a reference in their (SBO), Taxonomy, Reactome, KEGG, and UniProt.
publications. A collection of data resources and their URIs can be
The models that are submitted pass through several obtained from MIRIAM Registry (http://www.ebi.ac.
steps prior to get published. BioModels Database is uk/miriam/) (Laibe and Novère 2007).
composed of a curated and a non-curated branch. Once the curation and annotations are completed,
Models in both these branches are fully SBML com- the model is tagged as ready for publication, and
pliant. Depending on their curation status, models are becomes available online during the next release of
moved to one of the two branches. Models that satisfy BioModels Database. New releases happen two to
the MIRIAM (Minimum Information Required in the four times a year.
Annotation of Models) guidelines (Le Novère et al.
2005) progress to the curated branch. These models Model Browsing, Searching, and Retrieval
are thoroughly checked and corrected for accuracy as There are several features available through the
they must match and reproduce the results published in web interface which facilitate efficient usage of
BioModels Database: A Repository of Mathematical Models of Biological Processes 137 B
the models. They can be browsed by branches. The Action menu provides an additional feature for
They can also be located by using a tree-structured some model, called “Model of the Month” which is
browser based on the Gene Ontology (GO) terms a brief article that discusses the biological background,
used in the annotation of the models. BioModels significance, structure, and results of the model. The
Database incorporates a powerful search engine that “Model of the Month” is also accessible through BMC B
allows users to retrieve models of their interest. The Systems Biology Gateway.
search can either be a simple keyword search or an
advanced one. Web Services
For programmatic access, BioModels Database fea-
Model Display tures web services (http://www.ebi.ac.uk/biomodels-
Each model in the curated branch is presented in main/webservices) (Li et al. 2010b), allowing, for
a tabbed form, providing access to all the information example, direct retrieval of complex searches for
stored about the model. The model elements are models, and the creation of submodels. The available
hyperlinked between its entries in different tabs and services are described in a Web Services Description
in addition, the annotations are hyperlinked to their Language (WSDL) file that enables software to under-
detailed resource page. The detailed description stand available functions and their usage. The web
about the model is separated into six categories services use the Simple Object Access Protocol
namely: Model, Overview, Math, Physical entities, (SOAP) to encode requests and responses. The com-
Parameters, and Curation, which are all accessible plete list of available methods, as well as a Java library
via a dedicated tab. and the associated documentation, is provided on the
BioModels Database website.
Model Exports BioModels Database is developed under the GNU
For convenience, as certain simulation tools General Public License and the software is freely
support only specific Levels or Versions of SBML, available from its SourceForge repository (http://
the Download SBML menu allows users to sourceforge.net/projects/biomodels/).
download the model in various versions of SBML,
including the version that was checked by the Usage
curators. Models can be downloaded in various BioModels Database has significantly grown both in
other formats such as BioPAX, (http://www.biopax. size and number of the models since its origin in 2005
org/) the Virtual Cell Markup Language (VCML), and serves as an efficient model sharing platform.
(http://vcell.org/) XPPAUT, (http://www.math.pitt. BioModels Database announced its 21st release on
edu/~bard/xpp/xpp.html) SciLab, (http://www.scilab. 8 February 2012, with a total of 829 models provided.
org/) Octave (m-file), (http://www.gnu.org/software/ The submission of models by modelers/authors them-
octave/) and PDF (generated using the SBML2LaTeX selves is increasing rapidly, and this demonstrates the
tool) through the “Other formats (auto-generated)” popularity and recognition of BioModels Database in
menu. the community. The resource helps modelers to reuse
The reaction network of the model that follows the already existing models or model components, to mod-
Systems Biology Graphical Notation (SBGN) is avail- ify them by implementing their own theory, to publish
able in PNG and SVG formats via the Action menu. articles describing new models, and submit those new
The reaction networks are also provided in a dynamic models to BioModels Database. For example,
way via an interactive Java applet. BIOMD0000000176 and BIOMD0000000177 are
The Actions menu also provides access to the online derived from BIOMD0000000172 which in turn is
simulation tools. BioModels Database embeds SOSlib derived from BIOMD00000000064. BioModels
(Machné et al. 2006) to provide a basic online simula- Database is also used as a source of trusted models
tion tool. The simulation results are returned both in for benchmarking model simulation software
graphical and textual form. For many models, an addi- packages.
tional and more flexible simulation tool is available BioModels Database has the potential to serve as
using JWS Online. a comprehensive repository for computational systems
B 138 Biomolecular Interaction Networks

biology models, similar to the functionality of


GenBank and Protein Data Bank (PDB), the data BioNLP Shared Task
resources for genes and protein three-dimensional
structures. Jin-Dong Kim1 and Sampo Pyysalo2
1
Database Center for Life Science, Tokyo, Japan
2
Department of Computer Science, University of
Cross-References Tokyo, Tokyo, Japan

▶ MIRIAM Guidelines
▶ MIRIAM URI Definition
▶ SBGN
▶ Systems Biology Markup Language (SBML) BioNLP Shared Task (BioNLP-ST, hereafter) is
▶ Systems Biology Ontology a series of shared evaluations and workshops focused
on biomolecular event extraction from literature. The
first BioNLP-ST evaluation was organized in 2009 by
References the Tsujii Laboratory of the University of Tokyo, with
a workshop held under the auspices of Biomedical
Juty N, Le Novère N, Laibe C (2012) Identifiers.org and Natural Language Processing Special Interest Group
MIRIAM registry: community resources to provide persis-
tent identification. Nucleic Acids Res 40:D580–D586
(SIGBIOMED) of the Association for Computational
Laibe C, Novère NL (2007) MIRIAM resources: tools to gener- Linguistics (ACL). The task targeted the extraction of
ate and resolve robust cross-references in systems biology. biomolecular events (bio-events), defined as changes in
BMC Syst Biol 1:58 the states of biomolecular objects following the defini-
Le Novère N (2006) Model storage, exchange and integration.
tion and event representation of the GENIA project. The
BMC Neurosci 7(Suppl 1):S11
Le Novère N, Finney A, Hucka M, Bhalla US, Campagne F, task data was based on the GENIA corpus (Kim et al.
Collado-Vides J, Crampin EJ, Halstead M, Klipp E, 2008) human-curated annotations of bio-events in
Mendes P, Nielsen P, Sauro H, Shapiro B, Snoep JL, PubMed abstracts. BioNLP-ST 2009 was the first
Spence HD, Wanner BL (2005) Minimum information
community-wide effort for the automatic extraction of
requested in the annotation of biochemical models
(MIRIAM). Nat Biotechnol 23:1509–1515 bio-events from literature (Kim et al. 2009).
Le Novère N, Bornstein B, Broicher A, Courtot M, Donizelli M, The event extraction task was quite new to much
Dharuri H, Li L, Sauro H, Schilstra M, Shapiro B, Snoep JL, of the community, as most bio-text mining (bio-TM)
Hucka M (2006) BioModels Database: a free, centralized
targeted simple binary relationships between
database of curated, published, quantitative kinetic models
of biochemical and cellular systems. Nucleic Acids Res 34: bio-entities, such as protein–protein interactions (PPI)
D689–D691 (Bunescu et al. 2004) and disease-gene associations
Li C, Courtot M, Le Novère N, Laibe C (2010a) BioModels.net (DGA) (Chun et al. 2006). However, BioNLP-ST’09
Web Services, a free and integrated toolkit for computational
met community-wide participation, with 42 teams
modelling software. Brief Bioinform 11:270–277
Li C, Donizelli M, Rodriguez N, Dharuri H, Endler L, signing up for initial registration and 24 teams submit-
Chelliah V, Li L, He E, Henry A, Stefan MI, Snoep JL, ting final results.
Hucka M, Le Novère N, Laibe C (2010b) BioModels data- At the time of writing, BioNLP-ST 2011, the
base: an enhanced, curated and annotated resource for
second event of the series, is being organized as
published quantitative kinetic models. BMC Syst Biol 4:92
Machné R, Finney A, M€ uller S, Lu J, Widder S, Flamm C (2006) a joint effort of several groups. The evaluation seeks
The SBML ODE solver library: a native API for symbolic to provide a variety of tasks and annotations centered
and fast numerical analysis of reaction networks. Bioinfor- around event extraction.
matics 22:1406–1407

Characteristics

Biomolecular Interaction Networks The MUC (1987–1997) (Chinchor 1998), TREC


(1992–) (Voorhees 2007), and ACE (1999–) (Strassel
▶ Interaction Networks et al. 2008) shared evaluation initiatives have
BioNLP Shared Task 139 B
significantly motivated research in information 3. Detailed evaluations at each subtask were
retrieval (IR), information extaction (IE), and text provided.
mining (TM) for general domain text. There have 4. Human-curated corpus annotations corresponding
been similar efforts organized for bio-IR, -IE or -TM. to each subtask were prepared for system training,
The TREC Genomics track (2004–2007) (Hersh et al. tuning, and evaluation purposes. B
2007) was organized to address bio-IR, and the 5. The basic IE task of named entity recognition was
JNLPBA shared task (2004) (Kim et al. 2004) to excluded to encourage the participants to concen-
address named entity recognition (NER). (Organized trate on event extraction, the novel challenge of the
by the GENIA project using the same corpus, JNLPBA task. Accordingly, the gold standard annotations for
can be regarded a predecessor to the BioNLP-ST protein references were provided to the participants
series.) The LLL challenge (2005) (Nédellec 2005) also for test data.
focused on interactions of proteins or genes (IE), and 6. Publicly available natural language processing
BioCreative (2005–) (Hirschman et al. 2007) addressed (NLP) tools adapted for the BioNLP-ST data format
a variety of tasks, including named entity normalization were provided in readily available packages to
and protein–protein interactions. encourage the adoption of NLP tools and to further
While LLL ran a typical relation extraction task to allow participants to focus on event extraction.
find interacting pairs of proteins or genes, BioCreative
and BioNLP-ST have taken definitive steps forward, Task Definition
although in different directions: BioCreative toward Table 1 shows the event types targeted at BioNLP-ST
user-oriented task settings and extrinsic evaluation 2009. The event types are selected from the GENIA
and BioNLP-ST toward fine-grained IE. The differ- ontology, with consideration given to their importance
ence in direction is motivated in part by and the number of annotated instances in the GENIA
different applications envisioned as being supported corpus. The selected event types all concern
by the IE methods. For example, BioCreative aims protein biology, implying that they take proteins as
to support curation of PPI databases, such as MINT their theme. The first three types concern protein
(Chatr-aryamontri et al. 2007), for a long time one of metabolism, i.e., protein production and breakdown.
the primary tasks in the domain. BioNLP-ST aims to The fourth, Phosphorylation, is one type of pro-
support the development of more detailed and structured tein modification, selected because it appears fre-
databases, e.g., pathway (Bader et al. 2006) or Gene quently in the GENIA corpus. Localization and
Ontology Annotation (GOA) (Camon et al. 2004) Binding are representative fundamental molecular
databases, which are gaining increasing interest in bioin- events. Regulation (including its subtypes, Positive
formatics research in response to recent advances in and Negative_regulation) represents regulatory events
molecular biology. and general causal relations. The last five event types
are universal, but frequently occur on proteins. For the
biological interpretation of the event types, readers are
BioNLP-ST 2009 referred to Gene Ontology (GO) and the GENIA
ontology.
The first event of the series, BioNLP-ST 2009 (http:// As shown in Table 1, the theme or themes of
www-tsujii.is.s.u-tokyo.ac.jp/GENIA/SharedTask/), all events are considered primary arguments, that is,
had the following characteristic features which made arguments that are critical to identifying the event. For
it different from and complementary to other shared regulation events, the entity or event stated as the cause
evaluation efforts. of the regulation is also regarded as a primary argu-
1. Events were represented as typed n-ary associa- ment. For some event types, further arguments detail-
tions of proteins and other events in various ing the events are also defined (Secondary Arg. in
roles, allowing complex, structured associations Table 1). From a computational point of view, the
of multiple entities to be captured. event types represent different levels of complexity.
2. The event extraction task was divided into three When only primary arguments are considered, the first
subtasks: core event extraction, event enrichment, five event types require only a single argument, and the
and negation and speculation recognition. task can be cast as binary relation extraction between
B 140 BioNLP Shared Task

BioNLP Shared Task, Type Primary arguments Secondary Arg.


Table 1 The event types and
their arguments targeted at Gene_expression Theme(Protein)
BioNLP-ST 2009. Arguments Transcription Theme(Protein)
that may be filled more than Protein_catabolism Theme(Protein)
once per event are marked Phosphorylation Theme(Protein) Site
with “+.” Site means a site of Localization Theme(Protein) AtLoc, ToLoc
a theme entity, and CSite a site
Binding Theme(Protein)+ Site+
of a cause entity
Regulation Theme(Protein/Event), Cause(Protein/Event) Site, CSite
Positive_regulation Theme(Protein/Event), Cause(Protein/Event) Site, CSite
Negative_regulation Theme(Protein/Event), Cause(Protein/Event) Site, CSite

a predicate (event trigger) and an argument (Protein). The failure of p65 translocation to the nucleus . . .
The Binding type is more complex in requiring T1 (Protein, 15-18)
the detection of an arbitrary number of arguments. T2 (Localization, 19-32)
E1 (Type:T2, Theme:T1, ToLoc:T3)
Regulation events always take a Theme argument
T3 (Entity, 40-46)
and, when expressed, also a Cause argument. Further, M1 (Negation E1)
a Regulation event may take an event as its primary
argument (theme or cause), creating structures linking BioNLP Shared Task, Fig. 1 Example event annotation. The
multiple events that represent causal chains. Such protein annotation T1 is given as a starting point. The extraction
of annotation in bold is required for Task 1, the Entity type
connections between events are a unique feature of t-entity T3 and the secondary argument ToLoc:T3 for Task 2,
the BioNLP task compared to other event extraction and M1 for Task 3
tasks such as ACE.

Format and Example e.g., binding and regulation, was a lot more challeng-
In the BioNLP-ST data sets, annotations to text are ing, achieving 40% of performance level.
provided in a stand-off style, as shown in the example
in Fig. 1. The annotations are of two general types: ones Continuation
identifying text spans referring to bio-entities (e.g., T1 After BioNLP-ST 2009, all the resources from the
and T3) and event triggers (e.g., T2), and ones expressing event were released to public, to encourage continuous
events (e.g., E1) and their modifications (e.g., M1). The efforts for further advancement. Since then, several
former type of annotations are represented by pairs, improvements have been reported (Bj€orne et al.
(type-specification, span-specification), and the latter by 2010; Miwa et al. 2010a, b; Poon and Vanderwende
n-tuples resembling predicate–argument structures, 2010; Vlachos 2010). For example, Miwa et al.
(predicate, argument1, argument2, etc.). (2010b) reported a significant improvement with bind-
ing events, achieving 50% of performance level.

Results
BioNLP-ST 2011
The final results enabled to observe the state-of-the-art
performance of the community on the bio-event The second event of BioNLP-ST series is organized for
extraction task. It showed that the automatic extraction 2011 (https://sites.google.com/site/bionlpst/). While
of simple events – those with unary arguments, its predecessor, BioNLP-ST 2009, relied on the
e.g., gene expression, localization, phosphorylation – GENIA corpus which only contained PubMed
could be achieved at the performance level of 70% abstracts on transcription factors in human blood
in F-score, but extraction of complex events, cells, the main theme of BioNLP-ST 2011 is
BioNLP Shared Task 141 B
generalization, which was pursued to three directions: Camon E, Magrane M, Barrell D, Lee V, Dimmer E, Maslen J,
text types, event types, and subject domains. To Binns D, Harte N, Lopez R, Apweiler R (2004) The gene
ontology annotation (GOA) database: sharing knowledge in
achieve that, various tasks and annotations were col- uniprot with gene ontology. Nucl Acids Res 32(Suppl 1):
lected through contributions from several groups, and D262–266
five event extraction tasks were arranged in four tracks. Chatr-aryamontri A, Ceol A, Palazzi LM, Nardelli G,
B
• GENIA (GE) Schneider MV, Castagnoli L, Cesareni G (2007) MINT: the
molecular INTeraction database. Nucleic Acids Res
• Epigenetics and Post-translational Modifications 35(Suppl 1):D572–574
(EPI) Chinchor N (1998) Overview of MUC-7/MET-2. In: Message
• Infectious Diseases (ID) understanding conference (MUC-7) proceedings, Fairfax
• Bacteria Track Chun HW, Tsuruoka Y, Kim JD, Shiba R, Nagata N, Hishiki T,
Tsujii J (2006) Automatic recognition of topic-classified
– Bacteria Biotopes (BB) relations between prostate cancer and genes using
– Bacteria Interaction (BI) MEDLINE abstracts. BMC-Bioinformatics 7(Suppl 3):S4
The GE task remains similar to the task of BioNLP- Hersh W, Cohen A, Lynn R, Roberts P (2007) TREC 2007
ST 2009, to play the role of pivot for generalization and genomics track overview. In: Proceeding of the sixteenth
text retrieval conference, Gaithersburg
to measure the progress of community with the Hirschman L, Krallinger M, Valencia A (eds) (2007) Proceed-
previous task. However, to avoid overfitting to the ings of the second BioCreative challenge evaluation work-
evaluation data set, the GE task is planned to arrange shop. CNIO Centro Nacional de Investigaciones
additional text sets, most of them coming from full Oncológicas, Madrid
Kim JD, Ohta T, Tsuruoka Y, Tateisi Y, Collier N (2004) Intro-
paper articles, so that generalization from abstracts to duction to the bio-entity recognition task at JNLPBA. In:
full papers can be measured. The EPI task is arranged Proceedings of the international joint workshop on natural
to evaluate generalization of event types with focus on language processing in biomedicine and its applications
events relevant to epigenetics and post-translational (JNLPBA), Geneva, pp 70–75
Kim JD, Ohta T, Tsujii J (2008) Corpus annotation for mining
protein modification without further subdomain biomedical events from literature. BMC Bioinformatics
restruction. The ID and BB tasks involve generaliza- 9(1):10
tion to new domains, ID targeting events relevant to the Kim JD, Ohta T, Pyysalo S, Kano Y, Tsujii J (2009) Overview of
biomolecular mechanisms of infectious diseases and BioNLP’09 shared task on event extraction. In: Proceedings
of natural language processing in biomedicine (BioNLP)
BB localization events of bacteria. NAACL 2009 workshop, Colorado, pp 1–9
BioNLP-ST 2011 also includes three supporting Miwa M, Pyysalo S, Hara T, Tsujii J (2010) A comparative study
tasks: Coreference, Entity relation, and Gene renaming. of syntactic parsers for event extraction. In: Proceedings of
Although these are not event extraction tasks them- BioNLP’10, Uppsala, pp 37–45
Miwa M, Sætre R, Kim JD, Tsujii J (2010b) Event extraction
selves, according to the analysis on the results of with complex event classification using rich features.
BioNLP-ST 2009, coreference resolution and entity rela- J Bioinform Comput Biol (JBCB) 8(1):131–146
tion detection are expected to play an important role for Nédellec C (2005) Learning language in logic – genic interaction
making a breakthrough in improving event extraction extraction challenge. In: Cussens J, Nédellec C (eds) Pro-
ceedings of the 4th learning language in logic workshop
performance. Gene naming has similar features with (LLL05), Bonn, pp 31–37
coreference; thus it is included as a supporting task. Poon H, Vanderwende L (2010) Joint inference for knowledge
extraction from biomedical literature. In: Proceedings of
NAACL-HLT’10, Los Angeles, pp 813–821
References Strassel S, Przybocki M, Peterson K, Song Z, Maeda K (2008)
Linguistic resources and evaluation techniques for evaluation
of cross-document automatic content extraction. In: Proceed-
Bader GD, Cary MP, Sander C (2006) Pathguide: a pathway ings of the 6th international conference on language resources
resource list. Nucleic Acids Res 34(Suppl 1):D504–506 and evaluation (LREC 2008), Marrakesh
Bj€orne J, Ginter F, Pyysalo S, Tsujii J, Salakoski T (2010) Vlachos A (2010) Two strong baselines for the BioNLP 2009
Complex event extraction at PubMed scale. Bioinformatics event extraction task. In: Proceedings of BioNLP’10,
26(12):i382–390 Uppsala, pp 1–9
Bunescu R, Ge R, Kate RJ, Marcotte EM, Mooney RJ, Ramani Voorhees E (2007) Overview of TREC 2007. In: The sixteenth
AK, Wong YW (2004) Comparative experiments on learning text REtrieval conference (TREC 2007) proceedings,
information extractors for proteins and their interactions. Gaithersburg
Artif Intell Med 33(2):139–155
B 142 Bio-Ontologies

further research, which is thus meant to be as universal


Bio-Ontologies and basic as possible. Rhee et al. (2006, p. 345) point to
the quality of definitions in bio-ontologies as their most
Sabina Leonelli important characteristics: “data and knowledge need to
ESRC Centre for Genomics in Society, University of be described in explicit and unambiguous ways that
Exeter, Exeter, Devon, UK must be comprehensible to both human beings and
computer programs.” Indeed, bio-ontologies are
constructed to work across different research contexts,
Synonyms disciplines, and ▶ model organism communities. The
Gene Ontology, for instance, was developed to anno-
Biomedical ontologies; Ontologies for the life sciences tate gene products across community databases in
▶ model organism biology. To guarantee the interop-
erability of bio-ontologies across biological domains,
Definition the Open Biomedical Ontologies Consortium has pro-
posed a set of simple rules for the development of
Bio-ontologies have come to play a crucial role in the ontologies, such as, for instance, the rule of univocity:
dissemination of results across research contexts and in Terms should mean the same wherever they are used,
the extraction of inferences and testable hypotheses so the same term cannot be used to indicate two dif-
from available datasets (▶ Data-intensive Research). ferent processes (Smith et al. 2007). Also, bio-
They are essentially classification tools, whose imme- ontology terms and relations are selected and defined
diate function is to store, organize, and retrieve data via through consensus-seeking mechanisms implemented
databases accessible through the Internet. They consist on a global scale, such as consultations and workshops
of networks of terms that are linked to existing data- with prospective users (Leonelli 2008).
bases. Each term in the network is precisely defined as
describing specific biological entities or processes; it is
used as a keyword with which to identify and retrieve Characteristics
datasets that constitute relevant evidence toward the
investigation of the entities or processes described by Bio-ontologies are an achievement of the bioinfor-
its definition. The terms are related to each other matic effort toward an efficient organization and dis-
through simple structuring rules, such as X “is_a” tribution of data produced by genomic research. They
Y and X is “part_of” Y. provide a framework through which heterogeneous
Bio-ontologies classify datasets in terms of their sets of biological data can be classified, stored, and
potential usefulness to research, as documented by retrieved through freely available, online databases
previous empirical studies. This is different from (Rubin et al. 2008). For the purposes of this entry,
a classification based on information about the prove- I restrict my examination of bio-ontologies to the
nance of data (the place or group in which they were ones collected by the Open Biomedical Ontologies
produced), the model organism on which they were Consortium (http://www.obofoundry.org), an organi-
obtained, or the instrument with which they were cre- zation founded to facilitate communication and coher-
ated. Bio-ontologies do incorporate these other types ence among bio-ontologies with broadly similar
of information through ▶ metadata; yet the key prop- characteristics (Ashburner et al. 2003), and particu-
erty of datasets annotated through bio-ontologies – the larly to the Gene Ontology, which is widely regarded
official criterion for data retrieval in this system – as the most successful case of bio-ontology construc-
remains their association to terms defining their rele- tion to date and used as a template for several other
vance to understanding biological objects and prominent bio-ontologies (Ashburner et al. 2003;
processes. Brazma et al. 2006).
Because of their descriptive nature, bio-ontologies
are often referred to as “representations of biological Structure
knowledge” (e.g., Bard and Rhee 2004). They repre- Bio-ontologies have three defining features: the use of
sent the knowledge needed to share resources for precisely defined terms to refer to biological entities or
Bio-Ontologies 143 B
processes; the use of precisely defined relations among types of relations among terms: “is_a,” “part_of,” and
terms; and the association of each term with datasets. “regulates.” The first category denotes relations of
identity, as in “the nuclear membrane is a membrane”;
Terms the second category denotes mereological relations,
The crucial feature of bio-ontology terms is their sub- such as “the membrane is part of the cell.” The third B
stantive meaning. Bio-ontology terms are not purely category has been implemented in 2008 to signal reg-
conventional signs. They are intended to be descriptive ulatory roles. In other bio-ontologies, for instance, the
of existing biological phenomena – in other words, to ones employed to gather data about phenotypes, the
capture commonly agreed knowledge about the fea- categories of relations available are more numerous
tures and components of biological entities and pro- and complex: for instance, including relations signal-
cesses. The meaning of each term is fixed via a precise ing measurement (“measured_as”) or belonging
definition, in which curators specify the characteristics (“of_a”). To help with standardization and interopera-
of the phenomenon which the term is intended to bility, an ontology of relationship types (▶ Relation-
designate, sometimes including species-specific ship Type Ontology) is currently under development.
exceptions (Baclawski and Niu 2006, p. 35). Each
ontology represents knowledge from one or more spe- Data Association
cific domains. For instance, the Gene Ontology Terms in bio-ontologies function as keywords through
includes three classifications, each of which addresses which existing datasets can be ordered and retrieved.
different groups of phenomena (Ashburner et al. The criterion used for data classification is the eviden-
2003): a process ontology describing biological objec- tial significance of data, i.e., their role as evidence
tives to which the gene or gene product contributes, toward establishing the structure and function of
such as metabolism or signal transduction; a molecular a specific entity or process. In the Gene Ontology,
function ontology representing the biochemical activ- data on a specific gene product are categorized in
ities of gene products, such as the biological functions terms of its known functional significance toward the
of specific proteins; and a cellular component ontol- development of specific traits (e.g., gene FFO2 is asso-
ogy, referring to the places in the cell where a gene ciated with “meristem growth”). Datasets are extracted
product is active (see also entry on ▶ Disease Ontol- (▶ Data Mining) either from digital repositories
ogy). At the same time, to guarantee interoperability containing all available data of a specific type (e.g.,
among ontologies, curators strive to make sure that GenBank), or from scientific publications. Once cura-
terms used in one ontology are compatible with terms tors have selected a set of relevant data as well as a set of
used in other ontologies (Smith et al. 2007). bio-ontology terms to which data should be associated,
they label the data with a unique identifier, a machine-
Relations readable symbol that allows for the automatic analysis
Bio-ontology terms are related through a network of data in cross-reference to other datasets. This process
structure whose basic features are determined by the is called annotation (Hill et al. 2008). Unique identifiers
programming language through which bio-ontologies effectively enable bio-ontologies to function as tools for
are implemented. Within the eXtensible Markup data analysis. For instance, functional annotations made
Language (XML), the standard language used for within the Gene Ontology are used to analyze and
bio-ontologies, the basic relationship between objects correlate microarray data on gene expressions and thus
is called containment and involves a “parent” term and to ground statistical evaluations of clusters of co-
a “child” term. The child term is “contained” by the expressed genes (Rubin et al. 2008).
parent term when the child term represents a subclass
of the parent term. This relationship is fundamental to Curation
the organization of the bio-ontology network, as it Bio-ontology terms have the same tendency of other
supports a hierarchical ordering of the terms used. classificatory categories: that of stabilizing objects of
The criteria used for the hierarchical ordering of knowledge in ways that enable, but at the same time
terms are chosen in relation to the types of phenomena constrain, future research. At the same time, the knowl-
described by the terms in each bio-ontology. Each edge captured by bio-ontologies is bound to change with
subset of the Gene Ontology uses the same three further research, as well as manifesting themselves
B 144 Bio-Ontologies

differently in each research context. Resolving the ten- a machine-readable format, while other types of data
sion between stability and flexibility of classificatory (such as photographs) might need manipulation to be
categories is crucial to bio-ontologies’ success and is stored and retrieved through digital means. The extent
a core responsibility of curators, who engage in adapting to which curators need to manipulate data depends on
and updating bio-ontologies so that they mirror the the format in which data are produced and extracted,
research practices and knowledge of their users. The which is not always compatible with the software and
following activities are good examples (if not exhaus- format supported by any given bio-ontology.
tive) of steps in bio-ontology development that require
expert judgment and intervention by curators. Including Information About Data Provenance
Curators use metadata to classify information about
Data Mining from Publications data provenance. This procedure enables database
When extracting data from publications, curators have users to search through datasets on the basis of
to single out publications that they consider to be reli- context-independent criteria such as bio-ontology
able, updated, and representative for specific datasets. terms, while still allowing them to retrieve information
For instance, when gathering available data on a specific about the production of data when needed. Curators
gene, curators need to choose one or two publications have the responsibility of determining which informa-
that best represent data relevant to a given gene product tion about provenance are most useful to the classifi-
for the purposes of classification. They cannot compile cation of data for circulation and reuse.
data from each relevant publication, as it would be too
time consuming as well as generating inconsistent anno- Defining Terms
tations. Thus, curators choose what they see as the most Definitions that befit bio-ontology terms are not often
up-to-date and accurate publications on a specific gene found in biology textbooks or other publications, since
product, which as a consequence become “representa- those are usually steeped within specific research tra-
tive” publications for that entity. Further, once curators ditions or communities. Constructing a definition that
settle on a specific publication, they have to assess will be acceptable to all potential users of bio-
which data therein contained should be extracted and/ ontology, no matter which tradition they come from
or how the interpretation given within the paper matches and which organism they work on, constitutes a chal-
the terms and definitions already contained in the bio- lenging conceptual task for curators (Leonelli 2012).
ontology. Does the content of the paper warrant the Through activities such as the above, curators medi-
classification of given data under a new bio-ontology ate between the diverse assumptions and practices
term? Or can the contents of the publication be associ- characterizing the work of bio-ontology users and the
ated to one or more existing terms? These choices are need for bio-ontologies to conform to universal
unavoidable when extracting data from a publication requirements such as consistency, computability, ease
and they are impossible to regulate through fixed and of use, and wide intelligibility. Curators’ interventions
objective standards. The reasons why the process of data are crucial to the good functioning of bio-ontologies,
extraction requires manual curation are the same rea- and ideally need to be informed by a wide range of
sons why it cannot be divorced from subjective judg- expertise, including IT and programming skills, train-
ment: The choices involved are informed by a curator’s ing in more than one biological discipline (allowing
expertise and his or her ability to bridge between the them to bridge between different scientific contexts)
original context of publication and the context of bio- and familiarity with experimentation at the bench (so
ontology classification. that they understand observational statements made in
the context of specific experimental settings, as well as
Data Formatting anticipating the expectations of bio-ontologies users).
Without a minimal degree of homogeneity across for-
mats, there is no chance of making data searchable and
displaying them through the visualization tools used Cross-References
within databases. Formatting is also geared toward mak-
ing data computationally manageable: Data generated ▶ Data Mining
through high-throughput technologies are already in ▶ Data-intensive Research
Bio-PEPA 145 B
▶ Disease Ontology Definition
▶ Model Organism
▶ Ontology Lookup Service for Controlled Bio-PEPA (Ciocchetta and Hillston 2009) is an
Vocabularies and Data Annotation extension of the PEPA language (Hillston 1996) for
▶ Relationship Type Ontology dealing with biochemical signaling pathways. B
PEPA is a formal language for describing continuous
time Markov chain originally defined for the
References performance analysis of computer systems. PEPA
allows to quantitatively model and analyze
Ashburner M, Mungall CJ, Lewis SE (2003) Ontologies for large pathway systems and it is supported by a lot of
biologists: a community model for the annotation of genomic
software tools for analysis and stochastic simulations
data. Cold Spring Harb Symp Quant Biol 68:227–236
Baclawski K, Niu T (2006) Ontologies for bioinformatics. The are available. The combined use of PEPA and
MIT Press, Cambridge, MA the probabilistic model checker PRISM describe,
Bard JBL, Rhee S (2004) Ontologies in biology: simulate, and analyze biochemical signaling
design, applications and future challenges. Nat Rev Genet
pathways.
5:213–222
Brazma A, Krestyaninova M, Sarkans U (2006) Standards for A Bio-PEPA system is a formal, intermediate, and
systems biology. Nat Rev Genet 7:593–605 compositional representation of biochemical systems,
Hill DP, Smith B, McAndrews-Hill MS, Blake JA (2008) Gene on which different kinds of analysis can be carried out:
ontology annotations: what they mean and where they come
from. BMC Bioinformatics 9(5):S2
deterministic analysis of the ODE system, stochastic
Leonelli S (2008) Bio-ontologies as tools for integration in simulations (with the ▶ Stochastic Simulation Algo-
biology. Biol Theory 3(1):8–11 rithm), generation and analysis of the underlying
Leonelli S (2012) Classificatory theory in data-intensive science: continuous time Markov chain, and generation of the
the case of open biomedical ontologies. International Studies
input code for the PRISM model checker. Each of
in the Philosophy of Science 26(1):47–65
Rhee SY, Dickerson J, Xu D (2006) Bioinformatics and its the analysis carried on in the different derived formal-
applications in plant biology. Annu Rev Plant Biol isms can be of help for studying different aspects of
57:335–360 the biological model. Moreover, they can be used in
Rubin DL, Shah NH, Noy NF (2008) Biomedical ontologies:
conjunction in order to have a better understanding of
a functional perspective. Brief Bioinform 9(1):75–90
Smith B et al (2007) The OBO Foundry: coordinated evolution the system.
of ontologies to support biomedical data integration. Nat The Bio-PEPA extension modifies PEPA to deal
Biotechnol 25(11):1251–1255 with some features of biological models that are pecu-
liar of those kinds of systems (Calder and Hillston
2009):
• Functional rates: In contrast to PEPA,
individual processes are not able to define their
Bio-PEPA own rates for actions. Instead the rate associated
with an action is specified once, independently
Alida Palmisano1 and Corrado Priami2 of the processes in which the action occurs.
1
Department of Biological Sciences and Department The value of this rate can be specified to be
of Computer Science, Department of Biological a function that depends on the current state of the
Sciences Virginia Tech, Virginia Polytechnic Institute system.
and State University, Blacksburg, VA, USA • Stoichiometry: For each action, as well as its type,
2
Microsoft Research-University of Trento Centre for the stoichiometry or degree of involvement is also
Computational and Systems Biology and DISI, specified.
University of Trento, Povo, Trento, Italy • Parameterized processes: Bio-PEPA has
been designed to support the population-based
reagent-centric style of modeling and so a model
Synonyms consists of a number of sequential components
each representing a distinct species which evolve
PEPA quantitatively (increasing or decreasing amounts).
B 146 BioPortal

SBML-events that represent changes in the


A r1 B r3 D system due to some trigger conditions and to support
the definition of a hierarchy of compartments
that have a fixed structure, but a dynamic varying
r4 size.
C r5 In Fig. 1, an example pathway modeled with Bio-
E PEPA is shown.
r2
Bio-PEPA language has been used for modeling
many biological case studies (for a complete list see
def
http://biopepa.org/).
A = (r 1, 1)¯A + (r 2, 1)A
def
B = (r 1, 1)¯B + (r 2, 1)B + (r 3, 1)¯B
def
C = (r 2, 1)C
def Cross-References
D = (r 4, 1)¯D + (r 3, 1)D
def
E = (r 4, 1)¯E + (r 5, 1)E ▶ Cell Cycle Modeling, Process Algebra
def
System = A(A0) 
 B(B0) 
 C(C0) 
 D(D0) 
 E(E0)
∗ ∗ ∗ ∗

Bio-PEPA, Fig. 1 A small synthetic pathway of five reactions:


A + B ! C, C ! A + B, B ! D, D + E ! B, ! E. The
equations exhibit various combinations of increasing/ References
decreasing/preserved reagents between the left- and right-hand
sides. The meaning of the code in the lower part of the figure is Calder M, Hillston J (2009) Process algebra modelling styles for
the following: in each term in the form of “(r, k) op S”, r is biomolecular processes. LNCS 5750:1–25
an action name and can be viewed as the name or label of Ciocchetta F, Hillston J (2009) Bio-PEPA: a framework for the
a reaction, k is the stoichiometry coefficient of the species, modelling and analysis of biological systems. Theor Comput
the combinator “op” represents the role of the element in the Sci 410:3065–3084
reaction (specifically, down arrow denotes the role of reactant, Hillston J (1996) A compositional approach to performance
up arrow product), and S the state after the reaction is fired. modelling. Cambridge University Press, Cambridge
The operator + expresses the choice between possible actions.
The system is defined as the synchronization between
components A, B, C, D, and E on the whole set of common
action names (“ * ” symbol). S(S0) represent the initial quantities
of each species

BioPortal

Mark A. Musen
Thus in order to capture the state of a system Stanford Center for Biomedical Informatics Research,
each component is parameterized recording its Stanford University, Stanford, CA, USA
current level.
• Differentiated prefix: For each action (reaction)
that a component is involved in it records its role Definition
within that reaction, e.g., reactant, product, inhibi-
tor etc. This enables the appropriate values to be BioPortal is an online repository of biomedical termi-
used in the functional rate associated with this nologies and ontologies maintained by the National
reaction. Center for Biomedical Ontology. BioPortal contains
Recently, some extensions of Bio-PEPA have the world’s largest collection of biomedical
been defined in order to represent some specific ontologies, which it makes available using
features of some biochemical networks. Specifically, a standard Web interface and a standard API at
the language has been extended to support http://bioportal.bioontology.org.
Bipartite Graph 147 B
Cross-References The BioSPI project was the main inspiration for
another framework designed around the pi-calculus
▶ Protégé Ontology Editor language: SPiM (Phillips and Cardelli 2007). The sim-
ulation algorithm is based on the ▶ stochastic simula-
tion algorithm and the language features a simple B
graphical notation for modeling a range of biological
BioSPI systems.

Alida Palmisano1 and Corrado Priami2


1
Department of Biological Sciences and Department Cross-References
of Computer Science, Department of Biological
Sciences Virginia Tech, Virginia Polytechnic Institute ▶ Cell Cycle Modeling, Process Algebra
and State University, Blacksburg, VA, USA
2
Microsoft Research-University of Trento Centre for
Computational and Systems Biology and DISI, References
University of Trento, Povo, Trento, Italy
Barkai N, Leibler S (2000) Biological rhythms: circadian clocks
limited by noise. Nature 403:267–268
Phillips A, Cardelli L (2007) Efficient, correct simulation of
Synonyms biological processes in the stochastic pi-calculus. In:
CMSB. LNCS, vol 4695. Springer, p 184
Stochastic pi-calculus simulator Regev A, Silverman W, Shapiro E (2001) Representation
and simulation of biochemical processes using the
pi-calculus process algebra. In: Proceedings of the pacific
symposium of biocomputing 2001 (PSB2001), vol 6.
Definition pp 459–470

BioSPI is a computer application developed for


simulating the behavior of biochemical systems spec-
ified in the stochastic version of pi-calculus. It is Bipartite Graph
based on the Logix system, which implements flat
concurrent prolog (FCP). The use of FCP allows Marie Lisandra Zepeda-Mendoza and Osbaldo
both mobility and synchronized communication, two Resendis-Antonio
of the major features of the pi-calculus. This frame- Center for Genomics Sciences-UNAM,
work includes several debugging tools for tracing Universidad Nacional Autónoma de México,
stepwise execution of programs, including tree Cuernavaca, Morelos, Mexico
traces and step-by-step execution mode. In stochastic
simulations, the number of processes is monitored
and recorded, as the internal clock progresses, to Synonyms
produce a fully ordered trace of all events (Regev
et al. 2001). Bigraph
This framework has been used for modeling many
biological systems like circadian clock (Barkai and
Leibler 2000), cell cycle (see ▶ Cell Cycle Modeling, Definition
Process Algebra), metabolic pathways, signal trans-
duction networks, etc. A bipartite graph is one whose vertices, V, can be
All the software tools and examples related to BioSPI divided into two independent sets, V1 and V2, and
can be found at its official Web site (http://www. every edge of the graph connects one vertex in V1 to
wisdom.weizmann.ac.il/~biospi/index_main.html). one vertex in V2 (Skiena 1990). If every vertex of V1 is
B 148 Bipartite Network

connected to every vertex of V2 the graph is called Cross-References


a complete bipartite graph. If V1 and V2 have equal
cardinality, meaning they have same number of verti- ▶ Mitosis
ces, the graph is called a balanced bipartite graph.
Another way to view a bipartite graph is by coloring
the two vertices with different colors. Say all vertices
of set V1 will be colored green and all vertices of set V2
will be colored red, then each edge will connect Bistability
vertices of different colors.
This view helps to understand the fact that a graph Jinzhi Lei
that does not contain odd-length circles is a bipartite Zhou Pei-Yuan Center for Applied Mathematics,
graph, because if it had an odd number of vertices, one Tsinghua University of Beijing, Beijing, China
of the edges would have endpoints of the same color.

Synonyms
Cross-References
Bimodality
▶ Modules, Identification Methods and Biological
Function
Definition

References Bistability is a fundamental phenomenon in nature


by which a system can be resting in two states.
Skiena S (1990) Coloring bipartite graphs, }5.5.2. In: In biological systems, bistability is a situation in
Skiena S (ed) Implementing discrete mathematics:
combinatorics and graph theory with mathematica. Addison-
which two stable states coexist in a population of
Wesley, Reading, p 213 interest.
Bistability is key for understanding basic
phenomena of cellular functioning, such as decision-
making processes in cell cycle progression,
Bipartite Network cellular differentiation, and apoptosis. In a population
with bistability, we typically observe bimodal
▶ MicroRNA-mRNA Regulation Networks distribution.
From a physical point of view, the bistability of
a system comes from the fact that the free energy of
the whole system possesses two local minimums that
Bipolar Attachment are separated by a peak (maximum).
In genetic network, a ▶ positive feedback with
Rosella Visintin cooperative binding is a necessary condition to achieve
IEO, European Institute of Oncology, bistability. The positive feedback often appears as
Milan, Italy a double-▶ negative feedback.
Figure 1 shows the basic characteristics of a
bistable genetic system. In a bistable system, a key
Definition regulator protein X activates its own expression to
form a positive feedback. The feedback can be formed
Bipolar attachment is achieved when sister chromatids either directly, through a cooperative binding site of
bind to microtubules emanating from the opposite protein X to its own promoter, or indirectly, through
poles. a signal-transduction cascade via other regulators.
Bisulfite Conversion 149 B
b

a dX
= fp(X) - fd(X) fp(X) B
dt

Rates
fd(X)
fp(X) fd(X)
X
Production Degradation

Bistability, Fig. 1 Characteristics of bistable systems (Smits shows the change of X over time (dX/dt). (b) Plot of the functions
et al. 2006). (a) Schematic illustration of a bistable gene regula- fp(X) and fd(X) within the same graph. The intersection points
tion. The functions fp(X) and fd(X) represent the production and show the steady states
degradation rates of X, respectively. The differential equation

The production rate of X can be expressed as a Hill


function: Bistable Switch

f1 X n ▶ Toggle Switch, Switching Network


fp ðXÞ ¼ fo þ ; (1)
K n þ Xn

Here n is the Hill coefficient. The degradation


(or dilution) of the protein can be described by Bisulfite Conversion
a linear-type function. The change of X over time is
described by a differential equation combining Vani Brahmachari and Shruti Jain
the production and degradation rates. The interaction Dr. B. R. Ambedkar Center for Biomedical Research,
of these two functions gives the steady states of the University of Delhi, Delhi, India
system in which the change of X is equal to 0 (Fig. 1b).
When the positive feedback is cooperative, i.e., the
Hill coefficient is n > 1, there are three steady Synonyms
states, two of which are stable and separated by an
unstable state. Bisulfite sequencing: methylation status analysis

Cross-References Definition

▶ Cell Cycle Dynamics, Irreversibility Bisulfite conversion is a frequently used technique to


directly identify the methylated status of cytosine
residues in DNA, besides the method based on using
▶ methylation-sensitive restriction endonucleases
References (Frommer et al. 1992). On treatment of DNA with
Smits WK, Kuipers OP, Veening JW (2006) Phenotypic varia-
sodium bisulfite, unmethylated cytosine residues are
tion in bacteria: the role of feedback regulation. Nat Rev converted to uracil, and on replication of this DNA,
Microbiol 4:259–271 thymine is incorporated at these positions. On the other
B 150 Bisulfite Sequencing: Methylation Status Analysis

Bisulfite Conversion, a STEP 1


Fig. 1 Schematic NH2 NH2
Sulphonation
representation of bisulfite
sequencing. (a) Steps of N NaHSO3 +HN b
chemical conversion. (b) The
sequence obtained after H+ AT... mCG... GC... AACG...AT... mCG
O O SO3+ TA... GCm ...CG...TTGC...TA... GCm
conversion is shown; cytidine N N
(C) is replaced by thymidine H H
Bisulfite
(T) at positions where there is Cytosine Cytosine
conversion
no methylation (arrow) while sulphonate
methylated cytidine remains H2O STEP 2 AT... mCG... GU... AAUG...AT... mCG
unchanged (arrowhead). The NH4+ Hydrolytic TA... GCm ...UG...TTGU...TA... GCm
relevant dinucleotides are Deamination
O O PCR
shown in bold font
Amplification
HN NaOH HN
AT... CG... AT... AATA...AT... CG
OH−
O O SO3+ TA... GC... TA...TTAT...TA.... GC
N STEP 3 N
H H
Alkali
Uracil Uracil
Desulphonation
sulphonate

hand, methylated cytosine (5-methylcytosine)


remains unaffected by the bisulfite treatment (Fig. 1). BlenX
When the sequence of bisulfite converted DNA
is determined, unmethylated cytosine is read as Alida Palmisano1 and Corrado Priami2
1
thymine while 5-methyl cytosine remains as cytosine. Department of Biological Sciences and Department
On comparison with DNA sequence without bisulfite of Computer Science, Department of Biological
conversion, the methylation pattern of the DNA can Sciences Virginia Tech, Virginia Polytechnic Institute
be deciphered. and State University, Blacksburg, VA, USA
2
Microsoft Research-University of Trento Centre for
Computational and Systems Biology and DISI,
Cross-References University of Trento, Povo, Trento, Italy

▶ Epigenetics
▶ Methylation-sensitive Restriction Endonucleases Synonyms

Beta workbench; BetaWB; CoSBiLab


References

Frommer M, Mcdonald LE, Millar DS, Collis CM, Watt F, Definition


Grigg GW, Molloy PL, Paul CL (1992) A genomic sequenc-
ing protocol that yields a positive display of
5-methylcytosine residues in individual DNA strands. Proc BlenX (Dematté et al. 2008, 2010) is a stochastic lan-
Nat Acad Sci USA 89:1827–1831 guage implementing the b-binders process algebra
(Priami and Quaglia 2005) explicitly designed for
modeling biological systems. A BlenX program is
made up of a program file for the system structure, an
Bisulfite Sequencing: Methylation Status interfaces file for the quantitative information about the
Analysis system, and an optional declaration file for the user-
defined variables and functions. A BlenX program can
▶ Bisulfite Conversion be executed by the Beta Workbench software tool
BlenX 151 B
(which can be freely downloaded at http://www.cosbi. behavior (see Fig. 1). For example, in a box modeling
eu/index.php/research/prototypes together with the a protein, binders may represent sensing and effecting
CoSBiLab companion tools) that implements an effi- domains. Sensing domains are the places where the
cient variant of the ▶ stochastic simulation algorithm. protein receives signals, effecting domains are the places
The basic metaphor of BlenX is that a biological that a protein uses for propagating signals, and the inter- B
entity (i.e., a component that is able to interact with nal structure codifies for mechanism that transforms an
other components to accomplish some biological func- input signal into a protein conformational change, which
tions) is represented by a box. A box has interfaces (also can result in the activation or deactivation of another
called binders) and an internal structure that drives its domain. The exchanging of signals can happen between
boxes whose binders have a certain degree of affinity,
which codes the strength of their interaction.
site2
The internal program of a box is described by
S2_act a process built on top of the a set of primitives, derived
from the classical process algebra primitives con-
structs (i.e., input/output actions on communication
site1 S1_Ph P S3_act site3 channels, sequential/alternative/parallel composition
of actions) and some other primitives specific for the
BlenX language (i.e., changing the type/state of an
CycB interface and conditional statements).
In addition, BlenX allows the definition of events
BlenX, Fig. 1 Graphical notation of a BlenX box. The small which are statements, or verbs, that are executed with
squares on the border of the box are the binders; site1, site2, and a specified rate and/or when some conditions of the
site3 are the interfaces subjects (omitted when not necessary); system are satisfied. Events can split an entity into two
S1_Ph, S2_act, and S3_act are the interfaces types; the line
entities, join two entities into a single one, and add or
surrounding the interface with type S3_act indicates that the
interface is hidden; P is the internal program and CycB is the remove entities into or from the system. The final
name of the box peculiar characteristic of BlenX is the possibility of

BlenX, Fig. 2 Coding a basic enzymatic reaction in BlenX. is kb. If E and S are connected, they share a private channel
Upper part: boxes are pictured as different shapes where corners through which the synchronization of the output action x!() and
represent molecule interfaces (with their associate type). Lower input action y?() can happen at rate kc. If this happens the binder
part: The BlenX model of this reaction network is coded in three of S changes (at infinite rate) its type from DS to DP (see the
files: the program file (containing the definitions of the species), internal code of S), allowing the decomplexation of the two
the interfaces file (containing the affinities between the types), boxes (because the affinity between DE and DP is infinite for
and the declaration file (containing the definitions of the con- the decomplexation event, see the interfaces file). The final state
stants). The complexation between E and S happens because of of the system is the E species back in its initial state and a new
the affinity of DE and DS types at rate ka. This reaction is product species P
reversible, so the decomplexation rate between the same types
B 152 BLOcks SUbstitution Matrix (BLOSUM)

creating complex-on-the-fly (i.e., during the simula- BLOSUM-x matrix. The probability pi,j of a unique
tion): the complexation event is performed when two pair of amino acids at a site (Ai and Aj) is computed as
boxes have interfaces with compatible types and it is well as the probability pi of the unique amino acid to
p
used for modeling the creation of a private and perma- be Ai. Then the log odd ratio log pi;ji is computed and
nent (until the complementary decomplexation event) recorded in the (i,j) entry of the matrix. The BLOSUM
communication channel between two specific boxes. matrix comprises of information regarding the
Those two boxes will maintain their individual internal 20 amino acid properties similarity and yields delicate
behavior but they will be able to communicate on the evolutional and chemical association among the
shared channel: the only thing that the user needs to 20 amino acids.
define is a rate of complexation/decomplexation
between two binder types.
An example of a basic enzymatic reaction modeled Cross-References
in BlenX is shown in Fig. 2.
▶ TAP Translocator, In Silico Prediction
Cross-References

▶ Cell Cycle Modeling, Process Algebra


References

Henikoff S, Henikoff JG (1992) Amino acid substitution matrics


References from protein blocks. Proc Natl Acad Sci USA 89:10915–10919

Dematté L, Priami C, Romanel A (2008) The BlenX language:


a tutorial. In: SFM 2008. LNCS, vol. 5016, Springer,
pp 313–365
Dematté L, Larcher R, Palmisano A, Priami C, Romanel A
Bone Marrow-derived Cells
(2010) Programming biology in BlenX. In: Sangdun Choi
(ed) Systems biology for signaling networks, vol. 1. Springer, Mary Helen Barcellos-Hoff
pp 777–821 Department of Radiation Oncology and Cell Biology,
Priami C, Quaglia P (2005) Beta binders for biological interac-
New York University School of Medicine,
tions. In: Proceedings of the second international workshop
on computational methods in systems biology (CMSB04). New York, NY, USA
LNBI, vol. 3082. Springer, pp 21–34

Definition

BLOcks SUbstitution Matrix (BLOSUM) Bone marrow-derived cells (BMDC) are derived from
hematopoietic stem cells that give rise to diverse cell
Joo Chuan Tong types present in blood and lymphatic organs, some of
Data Mining Department, Institute for Infocomm which are recruited to other organs.
Research, Singapore, Singapore
Department of Biochemistry, Yong Loo Lin School of
Medicine, National University of Singapore, Characteristics
Singapore
The term, bone marrow-derived cells (BMDC),
encompasses a broad class of undifferentiated cells
Definition known to originate from the bone marrow (▶ Bone
Marrow-derived Cells) that are dispersed among tis-
BLOSUM matices are calculated from blocks of sues. The plasticity of these cells has been implicated
highly conserved amino acid sequences with a certain in diverse processes from development to pathology
degree of similarity between these sequences. Matrix but remains poorly understood. Bone marrow trans-
constructed from no more than x% similarity is called plantation and lineage-tracing experiments have
Bone Marrow-derived Cells 153 B
provided strong experimental evidence that BMDC the ability to rescue lethally irradiated mice.
can generate distinct cell types, including cardiac mus- The uncommitted, or pluripotent, gives rise to five
cle, liver cell types, neuronal and nonneuronal cell progenitor cells that give rise to seven distinct cell
types of the brain, as well as endothelial cells and types: erythrocyte, lymphocyte, neutrophil, eosino-
osteoblasts. These multiple cell types could have orig- phil, basophil, monocytes, and platelets. For many B
inated from either HSC or MSC within bone marrow or of these cells differentiation occurs within the
from other less well-defined precursors. Injury may bone marrow, but for lymphocytes and monocytes,
mobilize BMDC into circulation and recruitment and maturation occurs after release into circulation
differentiation at damage sites (Orlic et al. 2001). and residence in tissues. In the case of lymphocytes,
Inflammation can also mobilize BMDC, and chronic this occurs in the tissues of the immune system.
inflammation may drive BMDC to actively participate Monocytes circulate for 3 days before migrating
in pathological process, including cancer (Houghton into tissues where they become tissue-resident
et al. 2004). macrophages. Dendritic cells, an important antigen-
Bone marrow, a tissue that replenishes the circulat- presenting cell residing in tissues, can be either
ing cells of the blood and immune system, contains lymphoid or myeloid in origin.
many cell types, including stroma, vascular cells, adi- BMDC are mobilized in cancer-bearing animals
pocytes, osteoblasts, and osteoclasts. These cells form and in humans (Wels et al. 2008). Many studies were
the microenvironment, or niche, in which mesenchy- initiated following the observation that recruitment of
mal stem cells (MSC) and hematopoietic stem cells a subset of hematopoietic and vascular progenitors
(HSC) reside. Both stem cells originate within the bone were required to assemble new vessels in tumors
marrow, but HSC mostly function to regenerate (Lyden et al. 2001). The key concepts are that BMDC
hematopoietic cells within the marrow and the circu- have temporally and spatially restricted roles in
lation, while MSC mostly relocate to produce a variety supporting in specific types of tumors, that recruitment
of cell types found in diverse organs. of even small numbers of BMDC can play a crucial
MSC replicate as undifferentiated cells and have the role in catalyzing tumor progression and that BMDC
potential to differentiate to lineages of mesenchymal may precede or presage cancer, implicating them in the
tissues, including bone, cartilage, fat, tendon, muscle, very earliest manifestations of cancer (Kaplan et al.
and marrow ▶ stroma (Pittenger et al. 1999). A MSC 2005).
population resides in bone marrow, but cells with The variety, rarity, and plasticity of different
similar molecular, phenotypic, and functional proper- BMDC subsets complicate their analysis and respec-
ties, at least in vitro, have been identified in a wide tive importance and specific functions in response to
range of tissues. MSC populations in adult tissues and injury, cancer, and pathology at large. An important
organs contribute to tissue cell turnover and also tool has been implementation of bone marrow trans-
respond to tissue damage, for example, in wound plantation in conjunction with various genetic reporter
healing (▶ Wound Response). systems and immunologic markers. One idea is that
HSC give rise to three main lineages, erythroid, BMDC can be used to treat injured tissues and promote
myeloid and lymphoid, which respectively form red repair, and another is that blocking BMDC recruitment
blood cells, monocytes, and immune cells (Morrison and/or action can prevent disease, but there is still
et al. 1997). Extensive single-cell transplantation of much to be learned about the recruitment, behavior,
highly purified stem cells demonstrated that the clonal and stability of these cells across development, during
contribution to the different blood cell lineages varies homeostasis, and in disease.
significantly and can be stably maintained through
serial passaging, providing evidence that the pool of
HSCs comprises at least two and possibly more distinct Cross-References
clonal subtypes imbued with differential lineage and
self-renewal potential (Dykstra et al. 2007). A great ▶ Adaptive Immune System
many functional and characterization studies have ▶ Bone Marrow-derived Cells
established a functional hierarchy of HSC using assays ▶ Fibroblasts
for growth potential under restrictive conditions and ▶ Stroma
B 154 Bonferroni Correction

References When deciding whether to accept or reject an indi-


vidual null hypothesis, a probability threshold, a, is
Dykstra B, Kent D, Bowie M, McCaffrey L, Hamilton M, utilized to control the likelihood of false positives. In
Lyons K,Lee S-J, Brinkman R, Eaves C (2007) Long-term
multiple hypothesis testing, an increased number of
propagation of distinct hematopoietic differentiation pro-
grams in vivo. Cell Stem Cell 1(2):218–229 samples, n, in a given family increases the probability
Houghton J, Stoicov C, Nomura S, Rogers AB, Carlson J, Li H, that false positives will arise within that family at the
Cai X, Fox JG, Goldenring JR, Wang TC (2004) Gastric same probability threshold, a. Thus, the threshold, a,
cancer originating from bone marrow-derived cells. Science
should be lowered to control the total number of false
306(5701):1568–1571. doi:10.1126/science.1099513
Kaplan RN, Riba RD, Zacharoulis S, Bramley AH, Vincent L, positives.
Costa C, MacDonald DD, Jin DK, Shido K, Kerns SA, Zhu Z, The Bonferroni correction controls the number of
Hicklin D, Wu Y, Port JL, Altorki N, Port ER, Ruggero D, false positives arising in each family by using
Shmelkov SV, Jensen KK, Rafii S, Lyden D (2005)
a probability threshold of a/n for each observation
VEGFR1-positive haematopoietic bone marrow progenitors
initiate the pre-metastatic niche. Nature 438:820–827 within the family. By guaranteeing that the probability
Lyden D, Hattori K, Dias S, Costa C, Blaikie P, Butros L, of a test being accepted within a family is the same as
Chadburn A, Heissig B, Marks W, Witte L, Wu Y, or less than the probability of any individual test being
Hicklin D, Zhu Z, Hackett NR, Crystal RG, Moore MAS,
accepted, the Bonferroni correction is extremely con-
Hajjar KA, Manova K, Benezra R, Rafii S (2001) Impaired
recruitment of bone-marrow-derived endothelial and hema- servative. When the number of comparisons is large,
topoietic precursor cells blocks tumor angiogenesis and use of the Bonferroni correction should be limited and
growth. Nat Med 7(11):1194–1201 replaced by methods like the ▶ Benjamini-Hochberg
Morrison SJ, Wandycz AM, Hemmati HD, Wright DE,
Method.
Weissman IL (1997) Identification of a lineage of
multipotent hematopoietic progenitors. Development
124(10):1929–1939
Orlic D, Kajstura J, Chimenti S, Limana F, Jakoniuk I, Quaini F,
Nadal-Ginard B, Bodine DM, Leri A, Anversa P (2001)
References
Mobilized bone marrow cells repair the infarcted heart,
improving function and survival. Proc Natl Acad Sci Abdi H (2007) The Bonferroni and Sidak corrections for multi-
98(18):10344–10349. doi:10.1073/pnas.181177898 ple comparisons. In: Salkind NJ (ed) Encyclopedia of mea-
Pittenger MF, Mackay AM, Beck SC, Jaiswal RK, Douglas R, surement and statistics. Sage, Thousand Oaks
Mosca JD, Moorman MA, Simonetti DW, Craig S, Marshak Bonferroni CE (1936) Teoria statistica delle classi e calcolo
DR (1999) Multilineage potential of adult human mesenchy- delle probabilit ‘a. Pubblicazioni del R Istituto Superiore di
mal stem cells. Science 284(5411):143–147. doi:10.1126/ Scienze Economiche e Commerciali di Firenze 8:3–62
science.284.5411.143
Wels J, Kaplan RN, Rafii S, Lyden D (2008) Migratory neigh-
bors and distant invaders: tumor-associated niche cells.
Genes Dev 22(5):559–574
Boolean Function

Xiujun Zhang
Bonferroni Correction Institute of Systems Biology, Shanghai University,
Shanghai, China
Winston Haynes
Seattle Children’s Research Institute, Seatlle,
WA, USA Synonyms

Switching function
Definition

In ▶ Multiple Hypothesis Testing, the Bonferroni cor- Definition


rection is a conservative method for probability
thresholding to control the occurrence of false In mathematics, a Boolean variable B has the value
positives. 0 or 1. A Boolean function is a function with the form
Boolean Model 155 B
f: Bn ! B, where Bn ¼ B  B  . . .  B ¼ {0, 1} therefore we adopt it in our discussion. We note that
 {0, 1}  . . .  {0, 1}, the Cartesian product of B with a BN model is a deterministic model and the only
itself to n factors (Chatterjee 2005). randomness comes from its initial state. Considering
For example, for n ¼ 2, B2 ¼ {0, 1}  {0, 1} ¼ {00, the inherent deterministic directionality in BNs as well
01, 10, 11}. Thus, a Boolean function of two variables as only a finite number of possible states, it is easy B
is a function from B2 to B which assigns every ordered to see that the BN will eventually enter into a set of
pair of elements of B to a unique element of B. “AND” state(s) and stay there forever. These states are called
function and “OR” function are two important Boolean attractors and the states that lead to them comprise
functions of two variables (Koshy 2004). their basins of attraction. The number of transitions
needed to return to a given state in an attractor is called
cycle length.
References Since a BN is a deterministic model, to better model
a genetic regulatory network, one has to overcome the
Chatterjee D. Abstract algebra. New Delhi: Prentice Hall; 2005. deterministic rigidity of a BN. It is therefore natural to
Koshy T. Discrete mathematics with applications. Burlington:
extend BNs to a stochastic setting and this results in
Elsevier; 2004
a new class of mathematical models, namely, probabi-
listic Boolean networks (PBNs). The basic idea is
described as follows. In a PBN, each gene can have
more than one Boolean function with a certain selec-
Boolean Model tion probability assigned to it. The dynamics of a PBN
can be studied by using Markov chain theory, and the
Xi Chen, Wai-Ki Ching and Nam-Kiu Tsing network behavior is characterized by its transition
Advanced Modeling and Applied Computing probability matrix and its steady-state probability dis-
Laboratory, Department of Mathematics, University of tribution. One can understand a genetic regulatory
Hong Kong, Hong Kong, China network and identify the influence of different genes
via such a network. Then, therapeutic gene interven-
tion and gene control policies can be developed and
Synonyms studied. Similar to BN, there are two classes of PBNs :
synchronous PBNs and asynchronous PBNs. Synchro-
Boolean networks; Probabilistic boolean networks nous PBNs are more popular as they are easier to be
analyzed. Moreover, there are two types of PBNs:
instantaneously random PBNs and context-sensitive
Definition PBNs. The instantaneously random PBN is essentially
a collection of Boolean networks in which, at any
Boolean network (BN) is known as a popular mathe- discrete time point, the gene state vector transforms
matical model for modeling genetic regulatory net- according to the rules of one of the constituent net-
works. The BN model was first proposed by works. That is, the rule for updating each gene is
Kauffman (1969). In a BN model, the gene expression randomly chosen at each time step from several possi-
states are quantized into only two levels: on and off ble rules in accordance with a fixed probability distri-
(represented as 1 and 0). The target gene is determined bution. The context-sensitive PBN is an extension of
by several other genes called its input genes according PBN. It differs from instantaneously random PBNs for
to regulation rules (given as Boolean functions). A BN two reasons: (1) each gene is allowed to change its
is said to be well defined when all the input genes and activity with a small probability at each time instant,
Boolean functions are given (Kauffman (1993)). There regardless of which constituent BN is active at the
are two types of BN models: synchronous BNs and moment, and (2) switching between constituent BNs
asynchronous BNs, depending on whether or not the occurs with a small probability, such that a PBN may
states of nodes are updated synchronously. Synchro- remain in the same constituent BN for a longer time
nous model is more popular and easier to analyze and interval (Pal et al. 2005).
B 156 Boolean Model

Characteristics Boolean Model, Table 1 The truth table


State v1(t) v2(t) f (1) f (2)
A Boolean Network (BN) G(V, F) actually consists of 1 0 0 1 1
a set of n nodes (where each node corresponds to 2 0 1 1 0
a gene): 3 1 0 1 0
4 1 1 0 0
V ¼ fv1 ; v2 ; . . . ; vn g:

and a list of Boolean functions (which represent the


0 1
regulatory rules for nodes): 0 0 0 1
B0 0 0 0C
B1 ¼ B
@0
C (2)
F ¼ f f1 ; f2 ; . . . ; fn g: 1 1 0A
1 0 0 0
where fi : {0, 1}n ! {0, 1}. Define vi(t) to be the state
(0 or 1) of the node vi at time t. The rules of the The truth table provides the one-step transition
regulatory interactions among the genes are then probability (0 or 1 in the case of BN) between any
represented by two states. We observe that there are two cycles and
they are given as follows: (a) (0, 0) ↔ (1, 1), and (b)
vi ðt þ 1Þ ¼ fi ðvðtÞÞ; i ¼ 1; 2; . . . ; n: (1, 0) ↔ (1, 0). Thus, State 3 is an attractor cycle of
period (length) one and States 1 and 4 form an attractor
Here, we let v(t) ¼ (v1(t), v2(t),. . ., vn(t) )T, which is cycle of period two. We remark that there is a one-to-
called the Gene Activity Profile (GAP). The GAP can one relation between a BN and its corresponding BN
take any possible form (state) from the set matrix.
Since BN is a deterministic model, it may not be
n o able to capture the properties of a biological system
S ¼ ðv1 ; v2 ; . . . ; vn ÞT : vi 2 f0; 1g (1) which processes certain randomness. Moreover, the
microarray data sets used to infer the network structure
and thus totally there are 2n possible states in the are usually not accurate because of experimental noise
network. Hence, v(t + 1) is determined from v(t) in in the complex measurement process. To address this,
a BN. As mentioned before, given an initial state v(0), a stochastic model, namely, probabilistic Boolean
the BN will enter into a cycle called attractor. An network (PBN) was developed by Shmulevich,
attractor consisting of only one global state is called Dougherty, Kim, and Zhang (2002). In a PBN, each
a singleton attractor. Otherwise, it is called a cyclic gene instead of having only one Boolean function,
attractor with period p if it consists of p states. there are a number of Boolean functions (predictor
ðiÞ
The following is an example of a BN having two functions) fj ð j ¼ 1; 2; . . . ; lðiÞÞ to be chosen for
genes with the truth table given in Table 1. determining the state of gene vi. The probability of
ðiÞ ðiÞ
From the truth table, there are four states and they choosing fj as the predictor function is cj
are (0, 0), (0, 1), (1, 0), and (1, 1). Let us label them as (Shmulevich and Dougherty 2007),
1, 2, 3, and 4, respectively. We note that if the current
state of the BN is 3, the network will stay with State 3
ðiÞ
X
lðiÞ
ðiÞ
in the next step (with probability one). Suppose the 0 cj 1 and cj ¼ 1 for
current state is 2, the network will go to State 3 in the j¼1 (3)
next step (with probability one). Similarly if the cur-
j ¼ 1; 2; . . . ; lðiÞ; i ¼ 1; 2; . . . ; n:
rent state is 1, the network will go to State 4 in the next
step (with probability one). Finally, if the current net-
ðiÞ
work state is 4, the BN will then go to State 1 in next The probability cj can be estimated by using the
step (with probability one). The transition-probability statistical method Coefficient of Determination (COD)
matrix (Boolean network matrix) of the 2-gene BN is developed by Dougherty, Kim, and Chen (2000) with
then given by real gene expression data sets.
Boolean Networks 157 B
Cross-References
Boolean Networks
▶ Global State, Boolean Model
▶ Local State, Boolean Model Zhong-Yuan Zhang
▶ Probabilistic Boolean Network School of Statistics, Central University of Finance and B
▶ Regulation Function Economics, Beijing, China
▶ Synchronous Model

Synonyms
References
Kauffman network; N-K model
Dougherty E, Kim S, Chen Y (2000) Coefficient of determina-
tion in nonlinear signal processing. Signal Process
80:2219–2235
Kauffman S (1969) Metabolic stability and epigenesis in Definition
randomly constructed genetic nets. J Theor Biol
22:437–467 A Boolean network model is composed of two parts:
Kauffman S (1993) The origins of order: self-organization
(1) A directed and loops-allowed graph with N nodes
and selection in evolution. Oxford University Press,
New York representing the genes where each gene is connected
Pal R, Datta A, Bittner M, Dougherty E (2005) Intervention in with only K other genes (hence, Boolean network is
context-sensitive probabilistic boolean networks. Bioinfor- also called N-K model) (Jong 2002; Kauffman 1969;
matics 21:1211–1218
L€ahdesm€aki et al. 2003). Each gene only has two
Shmulevich I, Dougherty E, Kim S, Zhang W (2002)
From boolean to probabilistic boolean networks as states: 1 (on) means expressed and 0 (off) means silent.
models of genetic regulatory networks. Proc IEEE (2) Boolean functions and states of genes, where the
90:1778–1792 state of each gene i, i ¼ 1, 2, ···, N, are modeled as the
Shmulevich I, Dougherty E (2007) Genomic signal processing.
output of Boolean function fi with K (or at most K)
Princeton University Press, New Jersey
inputs specified by the graph. Once the network and the
Boolean functions are determined, the model’s state at
any discrete time step is uniquely determined by the
initial assignment of the states of the genes. Since the
Boolean Modeling of Cell Cycle Control number of the possible states is finite, the model will
inevitably fall into an attractor at last (point attractor or
▶ Cell Cycle Modeling Using Logical Rules cyclic attractor; see Fig. 1).

X T T+1 1 1 0
Z X
0 0
1 1 0 1 1
1 0 1
1 0 0
T T+1 T T+1
X Y X Y Z
Y Z
0 0 0 0 0
0 1 1 1 1 1
1 1
1 0 0
1 1 1

Boolean Networks, Fig. 1 Left: An example of Boolean net- fz(1, 0) ¼ 0; fz(0, 1) ¼ 0; fz(1, 1) ¼ 1. Right: The states trajectories
work. X, Y, and Z are three genes. The Boolean functions of them of two particular realizations of the Boolean network. The top
are given by the truth tables placed next to them. For example, trajectory falls into a cyclic attractor, and the bottom one falls
the Boolean function fz of gene Z at time T is fz(0, 0) ¼ 0; into a point attractor
B 158 Bootstrap Aggregating

To find the Boolean networks for real gene expres- Definition


sion dataset, many algorithms have been developed.
The bootstrap is a data resampling strategy (Efron
1983; Efron and Tibshirani 1997; Duda et al. 2001).
Cross-References This resampling provides an estimate for an unknown
population parameter y. Let a data set D be a sample of
▶ Boolean Model n data points (or cases) xi, i ¼ 1..n, from the population
▶ Identification of Gene Regulatory Networks, under study. The values of these cases are assumed to
Machine Learning be the outcomes of independent and identically distrib-
uted random variables with (unknown) probability
density function. Without specific assumptions or
References a particular model for this probability function, the
bootstrap is non-parametric, otherwise parametric.
Jong HD (2002) Modeling and simulation of genetic The parameter y is a function of the values in the
regulatory systems: a literature review. J Comput Biol
population, y ¼ T(x), for example, the population
9(1):67–103
Kauffman SA (1969) Metabolic stability and epigenesis mean (Dixon 2006). This function also determines
in randomly constructed genetic nets. J Theor Biol the sample statistic, ^y ¼ T(x), for example, the mean
22:437–467 of all values in D (i.e., the sample mean). To estimate
L€ahdesm€aki H, Shmulevich I, Yli-Harja O (2003) On learning
gene regulatory networks under the Boolean network model.
the sampling distribution Fy(x) of the function T,
Mach Learn 52:147–167 a bootstrap sample is generated by randomly and uni-
formly selecting n cases from D with replacement.
This sampling is repeated B times to generate B
bootstrap samples (Fig. 1). The statistic ^y is calculated
Bootstrap Aggregating for all B bootstrap samples individually. Then the
bootstrapped estimate, ^yboot , of the population param-
▶ Bagging eter is calculated based on all b ¼ 1..B individual
statistics ^yb .Thus, the empirical distribution of all ^yb
provides an estimate for the theoretical sampling
distribution Fy(x) of the function T (Dixon 2006). For
Bootstrap Resampling example, if this function is the mean, then we average
all estimates ^yb as shown in (1) to obtain the
▶ Bootstrapping bootstrapped estimate of the mean, ^yboot .

XB
^yboot ¼ 1 ^yb (1)
B b¼1
Bootstrapping

Daniel Berrar1 and Werner Dubitzky2 Characteristics


1
Interdisciplinary Graduate School of Science and
Engineering, Tokyo Institute of Technology, Bootstrapping is a general data resampling strategy for
Midori-ku, Yokohama, Japan estimating any population parameter. For example,
2
Biomedical Sciences Research Institute, University of from the empirical distribution of the B statistics ^yb ,
Ulster, Coleraine, UK we can calculate the bootstrapped estimates for bias
and variance of the sample statistic ^y as follows:

Synonyms
1X B
biasf^yg ¼ ð^yb  ^yÞ (2)
Bootstrap resampling B b¼1
Bootstrapping 159 B
Bootstrapping,
Fig. 1 Simplified schematic
of the standard bootstrap
process. From the data set D
comprising n points, B
bootstrap samples are B
generated by repeated uniform
sampling with replacement.
From each bootstrap sample,
an estimate of the parameter y
is calculated. The B estimates
are used to derive the
bootstrapped estimate ^yboot

PB ^ !2 estimation. Whereas the standard bootstrap assumes


1 X B
^b  b¼1 yb
variancef^
yg ¼ y (3) that the data points are independent and identically
B  1 b¼1 B distributed random variables, the moving blocks
bootstrap is applicable to correlated observations,
For B ! 1, the bootstrapped estimate approaches for example, time series data (K€unsch 1989;
the true parameter of interest. Therefore, by increasing Dixon 2006).
the number of bootstrap samples, we can improve the
accuracy of our estimate. The Bootstrap in Systems Biology
To create one bootstrap sample, the data set D The bootstrap and ▶ cross-validation are the two
with n cases is sampled n times. In the standard most widely used data resampling strategies in
bootstrap, this sampling is uniform; hence, each of systems biology. Specifically for evaluating classifica-
the n data points in D has the same probability 1/n tion performance in small sample scenarios such
of being selected for the bootstrap sample. Thus, as microarray data classification, the bootstrap is a
the probability that a point is not selected for the popular choice.
bootstrap sample is (1  1/n)n, which is approximately In ▶ classification tasks, the bootstrap can be used
e1  0:368 for large n. Therefore, the expected num- to estimate the prediction error e of a classifier C. Here,
ber of distinct points in a bootstrap sample is about the classifier is built B times using the data in the
0.632 times n. bootstrap samples, which serve as training sets. The
The standard non-parametric resampling is resulting B models are then applied to the original data
arguably the most widely used type of bootstrap. set D, which serves as test set.
Alternative methods include the balanced bootstrap, Let yi denote the true class label of a case xi. Cb is
which warrants that each case occurs exactly B times the classifier resulting from the application of
in B bootstrap samples (Dixon 2006). The smoothed a learning algorithm to the data of the bth bootstrap
bootstrap adds a small amount of noise to each set, and Cb(xi) is the prediction of this classifier for
sampled case (Silverman and Young 1987). Both case xi. L(yi, Cb(xi)) is the loss function of the classifi-
balanced and smoothed bootstrap stabilize the variance cation. For example, the 0–1 loss function gives
B 160 Bootstrapping

0 if yi ¼ Cb(xi) and 1 otherwise. Equation (4) defines classifier is trained on about 63% of the cases only.
the bootstrapped estimate of the prediction error As a result of the small effective training set, the leave-
(Hastie et al. 2008). Here, B is the number of bootstrap one-out bootstrapped estimate ^eloob tends
samples, and n is the total number of cases. to overestimate the true prediction error (i.e., ^eloob
is biased upward). The 0.632 bootstrap
(▶ Bootstrapping, 0.632 Bootstrap) addresses
11X B X n
^eboot ¼ Lðyi ; Cb ðxi ÞÞ (4) this bias by weighing the leave-one-out bootstrapped
B n b¼1 i¼1
estimate and the bootstrapped resubstitution error,
^eresub , which is defined in (6).
The expected number of cases that are identical in
the data set D and the bth bootstrap sample is 0.368
times n. As the training sets overlap with the test set, 1X n
1 X
^eresub ¼ Lðyi ; Cb ðxi ÞÞ; (6)
the bootstrapped estimate of the prediction error, ^eboot , n i¼1 jSþi j b2S
þi
is biased downward (i.e., optimistically biased). For
▶ overfitted classifiers, the bootstrapped estimate of
the prediction error would then be smaller than their with S+i being the set of indices of bootstrap
true prediction error e. samples that contain the case i, and |S+i| is the number
To alleviate the optimistic bias of (4), we of those samples. Like any resubstitution error esti-
can exclude those cases that are contained in both mate, ^eresub is biased downward, i.e., it is smaller
the bootstrap samples and the original data set. than the true prediction error. To correct the downward
The resulting estimate is called the leave-one-out bias of the 0.632 bootstrap (▶ Bootstrapping, 0.632
bootstrap estimate of the prediction error (Hastie Bootstrap), Efron and Tibshirani (1997) developed
et al. 2008). the 0.632 +bootstrap (▶ Bootstrapping, 0.632+
Bootstrap).
1X n
1 X Figure 2 illustrates the relation between the
^eloob ¼ Lðyi ; Cb ðxi ÞÞ; (5) leave-one-out bootstrapped estimate and the
n i¼1 jSi j b2S
i bootstrapped resubstitution error in a simplified
example. To calculate the contribution of case 1 to
with S–i being the set of indices of bootstrap samples the leave-one-out bootstrapped estimate, the
that do not contain the case i, and |S–i| is the number of prediction of only model C2 is relevant because it is
those samples. Because of the random sampling, it is derived from a bootstrap sample that does not contain
possible that all bootstrap samples contain the case i, case 1. In contrast, to calculate the contribution
leading to |S–i| ¼ 0. To avoid a division by zero, the of case 1 to the bootstrapped resubstitution error,
number of bootstrap samples needs to be sufficiently the predictions of only models C1 and C3 are used
large so that at least one bootstrap sample does because only the bootstrap samples #1 and #3 contain
not contain the case i. Alternatively, we could omit case 1.
those cases that are included in all bootstrap samples
(Hastie et al. 2008). Note that, although the estimate Advantages and Disadvantages of the Bootstrap
^eloob is called the leave-one-out bootstrapped estimate, The bootstrap is one of the most commonly
on average, 0.368 times n cases are left out per boot- used resampling strategies for assessing the reliability
strap sample. The leave-one-out bootstrap estimate of of performance measures in classification tasks, specif-
the prediction error is not to be confused with the out- ically when the data sets are relatively small. This is
of-bag estimate of the error rate resulting from bagged generally the case for genomic data sets where the
classifiers. number of features (e.g., probe sets on a microarray) is
For each bootstrap sample, the expected number of orders of magnitude smaller than the number of
distinct training cases is 0.632 times n. So each specimens.
Bootstrapping 161 B
are questionable and classic statistical procedures
thus problematic. However, it is also possible to
include distributional assumptions into the sampling,
for example, we may assume that the data follow
approximately a normal distribution. Or, on the B
basis of our prior beliefs, we may assign different
prior probabilities to the individual cases for being
selected for a bootstrap sample. The bootstrap is then
a parametric method. Thus, the bootstrap can be
performed either with or without a predefined proba-
bility model (Davison and Hinkley 1997).
A particular advantage of the bootstrap is that its
implementation is straightforward. The language
and environment ▶ R, for example, provides several
functions for parametric and non-parametric
bootstrapping (see, e.g., the R library boot). Specifi-
cally, R provides functions for computing a variety of
bootstrapped confidence intervals (see, e.g., the
R function boot.ci). For instance, using the percentile
bootstrap, we derive a confidence interval for our
statistic of interest as follows: (1) we sample several
hundred bootstrap samples with replacement; (2) we
calculate the statistic for each sample; and (3) we use
the a/2 and (1  a/2) percentiles of the distribution to
obtain an empirical (1  a/2)-level confidence inter-
val. For many real-world data sets, the assumptions
underlying conventional confidence intervals are vio-
lated. Specifically for small data sets, the assumption
of an asymptotic distribution may not hold.
Bootstrapped confidence intervals are then an inter-
esting alternative.
The bootstrap requires multiple samplings of the
data set. The computational costs may be considered
a disadvantage, specifically if all possible subsamples
are generated. The exhaustive subsampling, however,
is generally not necessary. In the context of perfor-
Bootstrapping, Fig. 2 Illustration of the leave-one-out mance assessment, bootstrapped estimates of error
bootstrapped estimate and the bootstrapped resubstitution error. rates tend to be optimistically biased, specifically for
The data set contains n ¼ 10 cases; here, only the case indices are the leave-one-out bootstrap and the 0.632 bootstrap
shown. From the original data set D, three bootstrap samples are (Molinaro et al. 2005). For small data sets, the 0.632+
drawn, and three classifiers (C1, C2, and C3) are built
bootstrap performs competitively with leave-one-out
cross-validation and ten-fold cross-validation
The non-parametric bootstrap method does not (Simon 2007). Although bootstrapping is often
rely on any assumptions about the data distribution. recommended for small sample scenarios, it is no
Therefore, bootstrapping can be applied to real-world panacea for a lack of data (Jiang and Simon 2007;
data sets for which specific distributional assumptions Isaksson et al. 2008).
B 162 Bootstrapping, 0.632 Bootstrap

Cross-References
Bootstrapping, 0.632 Bootstrap
▶ Bootstrapping, 0.632 Bootstrap
▶ Bootstrapping, 0.632+ Bootstrap Daniel Berrar1 and Werner Dubitzky2
▶ Classification 1
Interdisciplinary Graduate School of Science and
▶ Cross-Validation Engineering, Tokyo Institute of Technology,
▶ Learning, Attribute-Value Midori-ku, Yokohama, Japan
▶ Model Testing, Machine Learning 2
Biomedical Sciences Research Institute, University of
▶ Overfitting Ulster, Coleraine, UK
▶ R, Programming Language

Definition
References
The 0.632 bootstrap weighs the bootstrapped
Davison AC, Hinkley DV (1997) Bootstrap methods
resubstitution error (▶ Bootstrapping), ^eresub , and the
and their application. Cambridge University Press,
New York leave-one-out bootstrapped estimate of the prediction
Dixon PM (2006) Bootstrap resampling. In: El-Shaarawi AH, error (▶ Bootstrapping), ^eloob , as follows:
Piegorsch WW (eds) Encyclopedia of environmetrics.
Wiley, New York, pp 212–220
Duda RO, Hart PE, Stork DG (2001) Pattern classification, 2nd
edn. Wiley-Interscience, New York. ^e:632 ¼ 0:368 ^eresub þ 0:632 ^eloob (1)
Efron B (1983) Estimating the error rate of a prediction rule:
improvement on cross-validation. J Am Stat Assoc 78
(382):316–331.
Efron B, Tibshirani R (1997) Improvement on cross-
This weighted average corrects the upward bias of
validation: the 0.632+ bootstrap method. J Am Stat Assoc the leave-one-out bootstrapped estimate. However,
92:548–560. ^e:632 can now be biased downward (Hastie et al.
Efron B, Tibshirani R (1998) An introduction to the bootstrap. 2008). Consider the example of a classification prob-
Chapman & Hall, London
Hastie T, Tibshirani R, Friedman J (2008) The elements of
lem with two balanced classes where the class labels
statistical learning. Springer Series in Statistics, New York/ are independent of the data set attributes. For this data
Berlin/Heidelberg set, the true prediction error of any classifier
Isaksson A, Wallman M, Goeransson H, Gustafsson MG cannot be smaller than 0.5. However, consider the
(2008) Cross-validation and bootstrapping are unreliable
in small sample classification. Pattern Recogn Lett 29:
0.632 bootstrap estimate for the 1-nearest neighbor
1960–1965 classifier (Hastie et al. 2008). Its resubstitution
Jiang W, Simon R (2007) A comparison of bootstrap methods error is zero because 1-NN uses the class label of
and an adjusted bootstrap approach for estimating the the duplicated training case to predict the class label
prediction error in microarray classification. Stat Med
26:5320–5334
of the corresponding test case in D. The leave-one-out
K€unsch HR (1989) The jackknife and the bootstrap bootstrapped estimate is 0.5 because for those
for general stationary observations. Ann Stat 17:1217– cases that are not included in the bootstrapped sam-
1241 ples, 1-NN performs like a random guesser. Thus,
Molinaro AM, Simon R, Pfeiffer RM (2005) Prediction error
estimation: a comparison of resampling methods. Bioinfor-
the 0.632 bootstrapped estimate of the prediction
matics 21(15):3301–3307 error is ^e:632 ¼ 0 þ 0:632  0:5 ¼ 0:316, which
Silverman BW, Young GA (1987) The bootstrap: to smooth or underestimates the true error rate of 0.5 in this
not to smooth? Biometrika 74:469–479 example. To correct the downward bias of the 0.632
Simon R (2007) Resampling strategies for model assessment and
selection. In: Dubitzky W, Granzow M, Berrar D (eds)
bootstrap, Efron and Tibshirani (1997) developed
Fundamentals of data mining in genomics and proteomics. the 0.632+ bootstrap (▶ Bootstrapping, 0.632+
Springer, Heidelberg Bootstrap).
Bottom-Up Hierarchical Clustering 163 B
Cross-References degree of ▶ overfitting. Hence, the weights for the
0.632+ bootstrap are not fixed as in the 0.632
▶ Bootstrapping, 0.632+ Bootstrap bootstrap but determined based on the degree of
▶ Bootstrapping ▶ overfitting. Efron and Tibshirani (1997) developed
the 0.632+ bootstrap to address the downward bias B
of the 0.632 bootstrap (▶ Bootstrapping, 0.632
Bootstrap).
References

Efron B, Tibshirani R (1997) Improvement on cross-validation:


the 0.632+ bootstrap method. J Am Stat Assoc 92:548–560
Hastie T, Tibshirani R, Friedman J (2008) The elements of Cross-References
statistical learning. Springer Series in Statistics, New York/
Berlin/Heidelberg ▶ Bootstrapping
▶ Bootstrapping, 0.632 Bootstrap
▶ Learning, Attribute-Value
▶ Overfitting

Bootstrapping, 0.632+ Bootstrap

Daniel Berrar1 and Werner Dubitzky2 References


1
Interdisciplinary Graduate School of Science and
Efron B, Tibshirani R (1997) Improvement on cross-
Engineering, Tokyo Institute of Technology, validation: the 0.632+ bootstrap method. J Am Stat Assoc
Midori-ku, Yokohama, Japan 92:548–560
2
Biomedical Sciences Research Institute, University of Hastie T, Tibshirani R, Friedman J (2008) The elements of
Ulster, Coleraine, UK statistical learning. Springer Series in Statistics, New York/
Berlin/Heidelberg

Definition

The 0.632+ bootstrap adjusts the weights of the 0.632


Bottleneck Enzyme
bootstrap (▶ Bootstrapping, 0.632 Bootstrap) as fol-
lows (Hastie et al. 2008).
▶ Rate-limiting step

^e:632þ ¼ ð1  wÞ ^eresub þ w ^eloob ; with


0:632 ^eloob  ^eresub (1)
w¼ and R ¼ random
1  0:368R ^e  ^eresub Bottom-Up Determination

▶ Interlevel Causation
Here, ^eresub is the bootstrapped resubstitution error
(▶ Bootstrapping), and ^e loob is the leave-one-out
bootstrapped estimate of the prediction error
(▶ Bootstrapping). ^e random is the estimate of the pre-
diction error under the assumption that the class Bottom-Up Hierarchical Clustering
labels are independent of the data set attributes
(▶ Learning, Attribute-Value). R is an estimate of the ▶ Hierarchical Agglomerative Clustering
B 164 Bromodomain

A tandem repeat type bromodomain, which is des-


Bromodomain ignated as a double bromodomain, was first found in
the TAF1 (also called as CCG1 or TAF(II)250)
Toshiya Senda1 and Naruhiko Adachi2 subunit of TFIID. This double bromodomain interacts
1
Biomedicinal Information Research Centre (BIRC), with histone chaperone CIA/Asf1. The Rsc4 subunit
National Institute of Advanced Industrial Science and of the RSC complex, which is an ATP-dependent
Technology (AIST), Tokyo, Japan nucleosome-remodeling factor, also contains tandem
2
Structure-guided Drug Development Project, JBIC bromodomains. The physical interactions between
Research Institute, Japan Biological Informatics the bromodomain and factors for the nucleosome
Consortium, Tokyo, Japan structural change suggest that histone acetylation
affects nucleosome structural change.

Definition
Cross-References
Bromodomains, initially identified in the Drosophila
protein Brahma, is an acetylated histone recognition ▶ Histone Post-translational Modification to
domain and found in various chromatin factors. Nucleosome Structural Change
ATP-dependent nucleosome-remodeling factors and
histone acetyltransferases frequently contain
bromodomains (Allis et al. 2006). Typical References
bromodomains are composed of approximately 110
amino acid residues. Tertiary structure analyses showed Allis CD, Jenuwein T, Reinberg D, Caparros M-L (2006)
Epigenetics, 1st edn. Cold Spring Harbor Laboratory Press,
that the bromodomain adopts a four-a-helix-bundle
New York
structure and has a cavity at the top of the molecule to
accommodate an acetylated lysine residue (Fig. 1).

Brownian Ratchet

Nobuo Shimamoto
Faculty of Life Sciences, Kyoto Sangyo University,
Kyoto, Japan

Definition

Brownian Ratchet is a model mechanism for translo-


cation of RNA polymerases. In this model, the
transcript molecule, complexed with the template
DNA, is supposed to thermally move forward
and backward rapidly in the catalytic site of RNA
polymerase. Thus, an eventual big backward
movement makes the backtracked complex, with
the 30 -end of the transcript molecule protruding
from RNA polymerase. A big forward movement
Bromodomain, Fig. 1 Crystal structure of bromodomain in dissociated the transcript in abortive synthesis. In
complex with an N-tail peptide of histone H4 (PDB code: 1E6I).
Bromodomain and the N-tail peptide of histone H4 are shown in
normal elongation, a big backward movement is
cartoon (cyan) and stick models (carbon atoms in white), hampered by the ratchet of the collision with the
respectively bridge helix, and a big forward movement by the
Building Blocks of Gene Regulatory Network 165 B
reaction with a substrate nucleoside triphosphate Cross-References
(NTP), which is inserted into a space occasionally
formed between the helix and the 30 -end of the tran- ▶ Transcription in Bacteria
script. In this model, a transcription complex with the
transcript molecule at any given position is merely B
a conformation but no longer a chemical species.
Thus, there must be neither a chemical transition Building Blocks of Gene Regulatory
state nor an activation energy of translocation, and Network
the system should be described in dynamics rather
than kinetics. ▶ Network Motifs of Gene Regulatory Networks
C

3-Cycle be obtained by a combination of tracer experiments


(intracellular labeling information), measured
▶ Canonical Network Motifs extracellular fluxes and stoichiometric balancing.
Nuclear Magnetic Resonance Spectroscopy (NMR)
and, more recently, ▶ Mass Spectrometry (MS) have
emerged as interesting tools for measuring the level of
13 13
C Fluxomics C enrichment in metabolites (Wiechert 2001). It pro-
vides valuable insights into the cellular metabolism
▶ 13C Metabolic Flux Analysis and is one of the methodologies used in ▶ Metabolic
Pathway Analysis.

13
C Metabolic Flux Analysis Characteristics
13
Meghna Rajvanshi and Kareenhalli V. Venkatesh C metabolic flux analysis is a powerful technique
Department of Chemical Engineering, Indian Institute for modeling biological systems. 13C metabolic flux
of Technology Bombay, Powai, Mumbai, analysis is free of all the short comings of the stoichio-
Maharashtra, India metric ▶ Metabolic Flux Analysis (MFA), though it
needs more information in addition to extracellular
flux data to have the complete flux distribution.
Synonyms Stoichiometric MFA and ▶ Flux Balance Analysis
(FBA) are restricted in their ability to fully quantify
13
C Fluxomics the network because they cannot be applied in
cases involving the following: (1) metabolic pathways
where the energy metabolites, such as ATP, NADH,
Definition NADPH, are not balanced; (2) parallel reactions
where none of the competing branches are directly
13
C MFA is an experimental approach to compute coupled to measurable variable or metabolite;
intracellular flux pattern. It involves tracer studies (3) metabolic pathways containing cyclic reaction
where substrates are labeled with stable isotopes, chains not linked to a measurable flux; and (4) meta-
such as 13C, and the labeling pattern of metabolites is bolic cycles having bidirectional reactions (Wiechert
subsequently measured. Detailed flux distributions can et al. 1997).

W. Dubitzky et al. (eds.), Encyclopedia of Systems Biology, DOI 10.1007/978-1-4419-9863-7,


# Springer Science+Business Media LLC 2013
C 168 13
C Metabolic Flux Analysis

13
C Metabolic Flux Analysis, Fig. 1 Different labeling states of a molecule containing 3 C atoms. Labeled C is shown as black
circles. There are eight positional isotopomers and four mass isotopomers (specified as (a–d))

Certain assumptions and rules need to be applied The basic procedure followed for 13C metabolic
while performing 13C metabolic flux analysis as listed flux analysis (Wiechert 2001; Kr€omer et al. 2009) is
below (Marx et al. 1996): to measure the labeling patterns of metabolites.
1. It is based on an assumption of a pseudo steady Carbon labeling experiment consists of growing the
state. organism of interest on 13C substrate such as glucose
2. It assumes that there is no preference of enzymes for labeled at a specific location. This labeled substrate is
labeled and unlabeled substrates (i.e., no isotope then utilized or consumed across the various path-
effect). ways finally leading to enrichment of the metabolites
3. Bioreactor size should be small to minimize the cost intracellular pool. Cells are harvested and are hydro-
of the experiment as labeled substrates are quite lyzed to get metabolite precursors like amino acids.
costly but it is also necessary that the volume reduc- The pattern of enrichment in these metabolites is
tion should not be to the extent where stable identified or estimated by mass spectrometry (MS)
stationary condition cannot be achieved. or NMR. The output from MS or NMR consists of
4. It is necessary that all the intracellular metabolites a spectral pattern. The pattern and the level of enrich-
are in steady state both in terms of metabolic fluxes ment is a fingerprint for a particular kind of pathway
and isotopomer distribution. Typically it takes 2–4 involved. The level of enrichment of certain metabo-
bioreactor residence times to reach the isotopic lite is a reflection of the permutation and combina-
equilibrium state. Time required for intracellular tions a labeled molecule has undergone. MS is
intermediates to reach isotopic steady state is com- becoming the method of choice for 13C analysis due
paratively less than the time taken by the macro- to its comparatively simpler data interpretation ability
molecules of biomass (proteins, RNA, DNA, cell even for multimolecular species. NMR on the other
wall) to reach isotopic steady state. The term hand does not require complex sample processing as
isotopomer implies “one of the different labeling MS but its spectral data becomes extremely convo-
states in which a particular metabolite can luted for molecules with more than three atoms.
be found.” There can be 2n different labeling A few deconvolution algorithms are available but
states of a molecule. For example, for a metabolite are mathematically involved and this restricts its
with 3 C atoms, there can be eight different usage to only few users with time-developed
positional isotopomers while only four different skill and experience. Thus, this intracellular labeling
mass isotopomers (Fig. 1). Knowledge of complete information, in combination with extracellular
isotopomer distribution yields information regard- fluxes, is used to compute the intracellular fluxes.
ing the enrichment pattern of different intracellular Several algorithms are available to decipher the
metabolites. This fractional enrichment information meaningful information from such spectra. OpenFlux
reflects how atoms in the reacting species interact (Quek et al. 2009), FiatFlux (Zamboni et al. 2005),
13
with each other to form product molecules. Thus, C MFA, etc., are some of the examples of the
fractional enrichment data along with C atom tran- software available. The enrichment information in
sition for all the species involved in a set of reac- terms of mass isotopomer distribution acts as an
tions defining the intracellular metabolism will input to the algorithm. Algorithm uses this mass
yield intracellular fluxes. isotomomer data to obtain the positional isotopomer
13
C Metabolic Flux Analysis 169 C
13
C Metabolic Flux
Analysis, Fig. 2 Schematic Growth on 13C labeled Substrate
flowchart of 13C-based
metabolic flux analysis

Harvest Biomass Measurement of:


• Biomass Composition C
• Uptake Rates
• Production Rates
Intracellular Protein Bound
Metabolites Amino acids

Assumed Fluxes

Isotopic Enrichment
Measurement
By Software
NMR or GC/MS Simulation

Simulated
Mass Isotopomer Distribution Isotopomer
Distribution

DIFFERENCE

Model Based Parameter Fitting

data, which is further used to estimate the fluxes. Applications of 13C MFA
Software works by minimizing the error between
13
the simulated and experimental mass isotopomer C MFA has been widely used in analyzing microbial
distribution. One fundamental problem of CLE is metabolism and regulation involved in metabolic net-
that it is usually difficult to measure the labeling works. It is mainly used for analyzing central carbon
pattern of many important intracellular metabolites metabolism but recently has been applied for genome
due to their small pool size. To overcome this prob- scale networks as well. 13C MFA has good predictabil-
lem, the labeling patterns of amino acids are often ity for prokaryotic systems but application of 13C MFA
measured, since the amino acids reflect the labeling to eukaryotic systems, especially to plant and animal
states of a number of important intermediate metab- cells, is extremely difficult due to the intracellular
olites through their precursors. These labeling mea- compartmentalization. Information of compartmental-
surement data provide additional and yet independent ization has to be captured by model properly in order to
constraints on the intracellular fluxes and thus enable obtain proper fitting of predicted versus experimental
a more refined analysis of metabolic fluxes in the data (Wiechert 2001). Similarly application of 13C
complex metabolic network. An overall outline of MFA is obscure when multiple carbon substrates are
13
C-based metabolic flux analysis methodology is used. High cost of labeled substrates, need for special-
given in Fig. 2. ized instrumentations (NMR or MS) for determining
C 170 C Domain

isotopic labeling, and elaborate mathematical and sta- ▶ Metabolic Flux Analysis
tistical analysis for isotopomer modeling are some ▶ Spectroscopy and Spectromicroscopy
additional limitations of 13C MFA, which further
restrict its use (Tang et al. 2009). 13C MFA requires
excessive experimental data and is quite complicated References
as compared to other methods of flux analysis;
however, for certain cases, 13C MFA is the only Blank LM, Kuepfer L, Sauer U (2005) Large-scale 13C-flux
analysis reveals mechanistic principles of metabolic network
solution. For example as stated in the previous section,
13 robustness to null mutations in yeast. Genome Biol 6(6):R49
C MFA can resolve parallel pathways, cyclic meta- Iwatani S, Yamada Y, Usuda Y (2008) Metabolic flux analysis in
bolic pathways and bidirectional fluxes in the meta- biotechnology processes. Biotechnol Lett 30(5):791–799
bolic network. 13C MFA has been extensively used in Kr€
omer J, Quek L-E, Nielsen L (2009) 13C-Fluxomics: a tool for
measuring metabolic phenotypes. Australian Biochemist
studying metabolism of relevant organisms for bio-
40(3):17–20
technology processes, namely, production of amino Marx A, de Graaf AA, Wiechert W, Eggeling L, Sahm H (1996)
acids (lysine, glutamate, phenyl alanine, methionine, Determination of the fluxes in the central metabolism of
etc), vitamin (riboflavin), ethanol, glycerol, antibiotics Corynebacterium glutamicum by nuclear magnetic reso-
nance spectroscopy combined with metabolite balancing.
(penicillin G), and bioplastics (1,3-propanediol)
Biotechnol Bioeng 49:111–129
(Iwatani et al. 2008). It has also been used to test the Quek L-E, Wittmann C, Nielsen L, Kromer J (2009)
validity of the assumptions made in other methods of OpenFLUX: efficient modelling software for 13C-based
flux analysis and about the metabolic network. 13C metabolic flux analysis. Microb Cell Fact 8(1):25
Tang YJ, Martin HG, Myers S, Rodriguez S, Baidoo EEK,
MFA has been used in studying single gene disruption
Keasling JD (2009) Advances in analysis of microbial met-
mutant and establishing robustness in the metabolic abolic fluxes via 13C isotopic labeling. Mass Spectrom Rev
network (Blank et al. 2005). 13C MFA has proven to 28(2):362–375
be quite useful in cancer research to study unique Wiechert W (2001) 13C metabolic flux analysis. Metab Eng
3(3):195–206. doi:10.1006/mben.2001.0187
metabolic characteristics under diseased condition. It
Wiechert W, Siefke C, de Graaf AA, Marx A (1997) Bidirectional
is used to understand flux distribution through impor- reaction steps in metabolic networks: II. Flux estimation and
tant pathways for amino acids and fatty acids biosyn- statistical analysis. Biotechnol Bioeng 55(1):118–135
thesis in cancer cells (Yang et al. 2008). Other Yang C, Richardson A, Osterman A, Smith J (2008) Profiling of
central metabolism in human cancer cells by two-
important application of 13C MFA is in identification
dimensional NMR, GC-MS analysis, and isotopomer model-
of bottlenecks in metabolic network of the industrially ing. Metabolomics 4(1):13–29
important organisms. Identified target genes, upon Zamboni N, Fischer E, Sauer U (2005) FiatFlux–a software for
manipulation through genetic engineering techniques, metabolic flux analysis from 13C-glucose experiments.
BMC Bioinformatics 6:209
lead to enhanced productivity of the metabolite of
interest. 13C MFA has also been used in identifying
drug targets for various diseases. By studying flux
distribution in pathogens or diseased cells, it might be C Domain
possible to identify a pathway crucial for the survival
of the pathogen inside the host cells. Such pathways ▶ Constant (C) Domain
can be potential drug targets (Tang et al. 2009). Thus,
in spite of being a complex methodology for flux
analysis, 13C MFA has wide applicability in different
fields. It provides important insights about the charac- C Gene
teristics of a metabolic network, for which this tech-
nique is indispensable. ▶ Constant (C) Gene

Cross-References
C Type Domain
▶ Flux Balance Analysis
▶ Mass Spectrometry, Proteomics, and Metabolomics ▶ Constant (C) Domain
Canalization 171 C
Calibration Canalization

Clemens Kreutz1 and Jens Timmer2,3,4 Philippe Huneman


1
Institute for Physics, Freiburger Center for Data Institut d’Histoire et de Philosophie (IHPST), des
Analysis and Modeling (FDM), University of Sciences et des Techniques, Université Paris 1
Freiburg, Freiburg, Germany Panthéon-Sorbonne, Paris, France C
2
Institute for Physics, University of Freiburg, Freiburg,
Germany
3
BIOSS Centre for Biological Signalling Studies and Definition
Freiburg Institute for Advanced Studies (FRIAS),
Freiburg, Germany First recognized by Conrad Waddington, canalization
4
Department of Clinical and Experimental Medicine, means that many perturbations in the genes will
Link€ oping University, Link€ oping, Sweden not change the phenotypic output of the process.
Waddington in The Strategy of the Genes
(1957) used the metaphor of “epigenetic landscapes”
Definition (Fig. 1a), genes acting like the nails which hold the
ropes underpinning a valley (Fig. 1b) in such a way
Calibration is the process of finding the input-output that moving or displacing them does not change much
relationship of a system or device (Skoog 2007). Usu- the general landscape. Canalization occurs also with
ally, this term is used if an assay or measurement regard to the environment. Phenomena like genetic
technique is characterized. The calibration curve assimilation mean therefore that new channels have
relates the physical quantity of interest to the experi- been created. Canalization also implies that organ-
mentally accessible quantities, e.g., the relationship isms with various genotypes may have the same phe-
between the concentration of a compound like protein notype, which contributes to explain typicality in
or mRNA to intensities or electric current. various species.
Measurement devices are commonly designed to Canalization raises two main sets of questions.
have a linear calibration curve, e.g., g(x) ¼ a0 + a1x. First, developmental and molecular: what are the
Here, a0 denotes an offset or background parameter and mechanisms underlying it (Masel and Siegal 2009)?
a1 and the scaling parameter. The purpose of calibration Second, evolutionary: given that canalization means
is then to estimate the parameters of the calibration robustness in the face of either mutational or genetic
curve by systematically changing x and evaluating the change, one should ask which came first (Wagner
output y ¼ g(x) + noise. In addition, the range of x where 2005): has canalization evolved as an adaptation in
the linear relationship holds has to be detected. the face of environmental variations? Or has it been
Because a mathematical model of a biological sys- selected because of the buffering against mutational
tem also describes an input-output behavior, estima- perturbation (genetic noise)? Lehner (2010) argued
tion of model parameters is termed model calibration. that it is indeed an adaptation for environmental
variation, and robustness versus mutational robustness
is a by-product.
Cross-References Besides the evolutionary cause of canalization,
biologists also investigate its evolutionary effects. It
▶ Experimental Design has been argued that canalization, as a form of robust-
▶ Observable ness, indeed increases evolvability: even if it seems to
▶ Parameter Estimation decrease phenotypic variability and then the opportu-
nities for selection, it appears that canalized develop-
mental systems, because of their robustness across
References
a wide range of variations, allow more evolution,
Skoog D (2007) Principles of instrumental analysis. Thomson, since genotypic variation can accumulate without
Brooks/Cole hampering the production of a viable phenotype.
C 172 Cancer

Canalization,
Fig. 1 Canalization (after
Waddington). (a) Epigenetic
landscape, figuring the
robustness of developmental
fates. (b) Genetic
underpinning of canalization

Cross-References Cross-References

▶ Explanation, Developmental ▶ Cancer Resources


▶ Neoplasms

References
References
Lehner B (2010) Genes confer similar robustness to environ-
mental, stochastic, and genetic perturbations in yeast. PLoS World Health Organization (2011) Cancer. Fact sheet 297.
One 5(2):e9035 World Wide Web. http://www.who.int/mediacentre/
Masel J, Siegal M (2009) Robustness: mechanisms and factsheets/fs297/en/. Accessed 18 May 2011
consequences. Tr Gen 25(9):395–403
Wagner A (2005) Robustness and evolvability and living
systems. Princeton University Press, Princeton

Cancer and Environmental Influences

C. Vyvyan Howard and Gesa Staats


Cancer Centre for Molecular Biosciences, University of
Ulster, Coleraine, UK
Jingky Lozano-K€uhne
Department of Public Health, University of Oxford,
Oxford, UK Definition

Cancer is a multifactorial multistage disease. Environ-


Synonyms mental influences are implicated in the etiology of
cancer at all stages of the life cycle, though particularly
Malignant tumors; Neoplasms during the developmental period. The mechanisms
underlying the development of cancer are critical to
an understanding of the magnitude of environmental
Definition influences.

Cancer is a generic term for a class of diseases


characterized by the rapid uncontrolled growth of Characteristics
abnormal cells which can invade adjacent body parts
and spread to other organs. The process of spreading to Background
other organs or other locations of the body is known as The environment can be defined as the totality of the
metastasis and is a major cause of death from cancer biological and nonbiological influences that act upon
(WHO 2011). an organism, a population of organisms, or an
Cancer and Environmental Influences 173 C
ecological system and that can influence survival. Bio- By the year 2000, it was less than 1 in 3. The risk of
logical factors include the organisms, their food, and women developing breast cancer in the UK in 1960
their interactions. Nonbiological factors can be divided was 1 in 20. The latest figures indicate that 1 in
into physical and chemical and usually act in combi- 8 women in the UK will now develop breast cancer.
nations. Purely physical influences include all electro- These trends are described for a variety of cancers
magnetic radiations, temperature, and meteorology. across a large number of countries. Developed coun-
Purely chemical influences include soil constitution, tries have become used to high levels of cancer in their C
water and air quality, and chemical pollution. Organ- populations as “the norm.” It has been demonstrated
isms respond to environmental changes by evolution- that, for some cancers, the average age of onset within
ary adaptations. the UK population is decreasing (Newby et al. 2007).
The etiology of cancer remains unknown, in that This provides strong evidence that an environmental
a detailed mechanistic description of how cancers arise factor is operating.
is lacking. There are many correlations with factors However, it is a valid question to enquire whether
that are associated with an increased or decreased members of traditional preindustrial societies who
incidence of cancer. Environmental factors can influ- lived into old age, invariably demonstrate increasing
ence cancer incidence and such factors are normally cancer incidence? An examination of the observations
loosely classified into “lifestyle factors” and “external made on those societies, whose lifestyles had remained
factors.” Lifestyle factors are generally regarded to be virtually unchanged for hundreds of years, shows
under the control of the individual and include a remarkably consistent pattern: an almost total
smoking, drinking, diet, drug taking, exercise level, absence of cancer. To investigate this, it is necessary
sexual behavior, and occupation. External factors to go back quite a long way into the written record.
include background ionizing radiation, ambient chem- A key source of information on this topic is a book, by
ical pollution, and poor air quality. This entry will Vilhjamur Stefansson (1960), in which he cites
concentrate on the effects of chemical pollution on a variety of authors who reported little or no signs of
cancer incidence, emphasizing some recent advances cancer occurring in unindustrialized societies living
in knowledge on the effects of epigenetic influences traditional lifestyles. The correspondents were medi-
during development. That is not to ignore the widely cally qualified doctors visiting so-called primitive
studied and highly significant effects of radiation and societies in many and widespread communities in
lifestyle factors, such as tobacco smoking, on cancer Canada, Africa, India, and South America.
incidence. The main tool used to detect alterations in In 1915, a report entitled “The Mortality from
the distribution of disease in human populations is Cancer Throughout the World” was authored by Fred-
epidemiology. A number of seminal contributions in erick L Hoffman, the then chairman of the committee
the eighteenth century contributed to the early devel- on statistics of the American Society for the Control of
opment of cancer epidemiology. In 1713, Bernadino Cancer. It analyzed literally thousands of separate
Ramazzini (the “father of occupational medicine”) reports and all of the data available at that time.
reported a relatively high incidence of breast cancer A major conclusion was “the rarity of cancer among
in nuns, linked to a very low incidence of cervical native man suggests that the disease is primarily
cancer. He pondered whether this might be related to induced by the conditions and methods of living
a celibate lifestyle. In 1761, in his book “Cautions which typify our modern civilization.” The
Against the Immoderate use of Snuff,” John Hill author went on to explain that “. . . a large number of
made the first observations linking tobacco use with medical missionaries and other trained medical
the development of cancer. Then in 1775, Sir Percival observers living for years among native races through-
Pott described an increased incidence of carcinoma of out the world, would long ago have provided a more
the scrotum in chimney sweeps, an occupational substantial basis of fact regarding the frequency of
disease. occurrence of malignant disease among the so-called
The industrialized world has seen a steady increase uncivilised races, if cancer were met with among them
in the incidence of cancer throughout the past 150 to anything like the degree common to practically
years. In the UK immediately after the Second World all civilised countries”. . . “Quite to the contrary, the
War, the lifetime risk of developing cancer was 1 in 4. negative evidence is convincing that, in the opinion of
C 174 Cancer and Environmental Influences

qualified medical observers, cancer is exceptionally Caucasian inhabitants. The topic has been addressed
rare among the primitive peoples. . .” more recently by Howard and Newby (2004) and
It is easy to dismiss such evidence as “anecdotal.” Irigaray et al. (2007).
However, with the reliably observed continuation of
the trend to increased cancer incidence in developed Low-Dose Epigenetic Phenomena and Cancer
countries, combined with the absence of any evidence There is increasing emphasis on the influence of low-
at all about substantial cancer incidence rates in dose intrauterine exposure of the fetus to ubiquitous
preindustrial societies, it seems reasonable to conclude environmental pollutants on the subsequent risks for
that cancer as a disease is related primarily to industri- cancer in later life. This rather subtle form of environ-
alization. Other arguments can be brought into play to mental chemical exposure, associated with long
counter this assertion. For example, average life expec- latency periods, has been prompted by the demonstra-
tancy is higher in developed countries, and cancer tion in animal models that doses of certain chemicals,
incidence is a function of age (though if one is living several orders of magnitude below the concentration
in a “soup” of carcinogenic influences from conception required to produce measureable effects in adults,
to grave, then length of exposure would be expected to appear to be able to perturb normal development. The
be significant, though not necessarily causal in itself). explanation for this is likely to be found in considering
It should be noted though that average life expectancy the physiological concentrations at which cell signal-
itself, as a measure, can be deceptive. It has been used ing molecules, such as hormones, operate. This is
to assert that life in earlier times was brutal and short typically very low, in parts per trillion. Moreover,
with most adults dying in their third decade. This is dose-response curves tend to be nonlinear and with
a very simplistic explanation. In preindustrial socie- an inverted U shape. This indicates that there is an
ties, the death rate in infancy was high, but if adoles- optimal concentration at which such cell signaling
cence was reached then, according to the same molecules operate, that is, there can be too little and/
medically qualified observers mentioned above, the or too much, as is typical with receptor-mediated
chances of living a reasonable life span in good health events.
were high and unlikely to end in the development of In adult animals, cell signaling molecules are
cancer. These observations are supported by the rec- involved in maintaining homeostasis in an intact
ognition among most epidemiologists that the major organism in which most structural and functional ele-
contribution to the rise in average life expectancy in ments are relatively stable. In the fetus, on the other
developed countries has been the reduction in infant hand, cell signaling molecule expression is intimately
mortality. tied up with control of cell proliferation and apoptosis
during windows of developmental activity. Perturba-
Nature Versus Nurture tion of the tightly regulated levels of key cell signaling
Epidemiological studies of monozygous twins are molecules can lead to subtle changes in the phenotype
regarded as the golden yardstick for deciding whether which can be expressed anatomically as tissue dysgen-
a disease is predominantly determined either by esis or physiologically as functional deficits. Such
genetic or, on the other hand, environmental factors. mechanisms appear to be highly conserved between
Lichtenstein et al. (2000) reported a study of the con- species, making animal studies potentially more com-
cordance of similar cancers in 45,000 pairs of identical parable across species than for adult high-dose
twins. The average rate of concordance was rather low toxicology.
10–15%. The conclusion from the study was that Hitherto, classical toxicology has mainly been pred-
environmental factors far outweigh genetic influences icated on adult animal studies. Teratological studies
in the etiology of cancer. Czene et al. (2002) reported have mainly been restricted to the detection of naked
similar conclusions. The concept should not be eye malformations. Why is this of relevance to the
surprising, because Doll and Peto (1981) showed, in study of environmental causes of cancer? It is known
their study of cancer incidence in native Japanese that any child born with a malformation is marginally
compared to émigré Japanese in Hawaii, that breast more likely than average to develop cancer in their
cancer rates were four times higher in the emigrants, lifetime. There is a connection, as yet not fully under-
approaching the incidence prevailing in the indigenous stood, between the ways that tissues are assembled in
Cancer and Environmental Influences 175 C
the fetus and the subsequent development of cancers. exposed in the womb to the EDS bisphenol A between
This process is normally associated with long gestational day 9 and postnatal day 1, at concentrations
latencies. 200 times lower than the current regulatory tolerable
daily intake. The exposed rats developed such tumors
Cell Signaling Disruption in the Fetus and Cancer in early adulthood, while there were no such changes in
The most widely studied group of chemicals capable of the breasts of the untreated control group. Morpholog-
affecting the function of natural cell signaling mole- ical changes to breast architecture in mice were shown, C
cules are classed as endocrine-disrupting substances Markey et al. (2001), at even lower doses.
(EDSs). There is general agreement that high-dose The Endocrine Society Scientific Statement was
therapeutic exposure to synthetic hormones at critical less emphatic about the role of EDS in male genital
stages of development can cause cancer, for example, tract tumors. However, the testicular dysgenesis syn-
intrauterine exposure to diethylstilboestrol (DES) led, drome (TDS) is of relevance. TDS consists of four
after a 20-year latency period, to clear cell adenoma interlinked conditions: cryptorchidism, hypospadias,
of the vagina in female offspring. However, a recurring oligospermia, and testicular germ cell cancer
argument against the causal link between environmen- (TGCC). The incidence of all four of these conditions
tal exposure to such environmental pollutant EDS and has been rising sharply over the past decades in many
subsequent carcinogenesis, predicated on classical developed countries. TGCC is typically a cancer of
high-dose toxicology, is that pollutants are simply not younger men, the peak age range for incidence is
present in sufficient environmental quantities to lead to between 25 and 40. When a child has a cryptorchid
a human exposure level that could cause cancer. testis, then it is either brought down into the scrotum by
Another aspect is that many EDS are not genotoxic, orchidopexy or removed (orchidectomy). The reason
which under the somatic mutation theory (SMT) (see for this is that an intra-abdominal undescended testis is
entry ▶ Cancer Theories), Boveri (1929), would put dysgenic and carries a higher risk of developing
a question mark over their ability to cause cancer. Such TGCC.
conclusions, however, are usually based on the logic of Skakkebaek et al. (2001) examined bilateral biop-
cancer induction toxicology performed on adult labo- sies, one from each testis, in cases of unilateral TGCC
ratory animals. demonstrated carcinoma in situ in both testes. The
Human evidence of links between exposure to envi- conclusion drawn was that TGCC cancer is
ronmental pollutants, at background, and cancer is a developmental disease. They also demonstrated that
beginning to emerge, as illustrated in a paper by hypofertile males were more likely to develop TGCC
Hardell and Eriksson (2003) which showed a positive than average for the population.
correlation between higher current levels of PCBs, Epidemiological evidence for the hypothesis that
HCB, and chlordanes (long lived pollutants) in human exposure to EDCs causes TGCC remains
mothers with sons who have developed testicular can- scarce. There are some indications that exposure to
cer. However, the majority of our current knowledge PCBs and mono-butyl phthalate may be implicated
concerning low-dose fetal exposure to EDS and sub- but the evidence is not conclusive. Migration studies
sequent cancer development comes from animal have shown that first generation migrants retain the
studies. incidence level of TGCC that prevails in their coun-
The Endocrine Society has published a Scientific tries of birth, while second and subsequent generations
Statement on endocrine-disrupting chemicals had rates similar to their adoptive country. This gives
(Diamanti-Kandarakis et al. 2009). When considering strong evidence that TGCC is a developmental condi-
the role played by EDS in the etiology of breast cancer tion and that it is mediated via environmental factors.
the report concludes that “Collectively, these data sup-
port the notion that endocrine disruptors alter mam- Mechanisms of Cancer Induction
mary gland morphogenesis and that the resulting It is clearly beyond the scope of this entry to discuss all
dysgenic gland becomes more prone to neoplastic the evidence on EDSs and cancer induction. However,
development.” The study in question, Murray et al. the point of the previous section has been to empha-
(2007), was the demonstration of the development of size that a low-dose mechanism mediated through
carcinoma in situ of the breasts in 33% of fetal rats fetal exposure can and does lead to tissue dysgenesis.
C 176 Cancer and Environmental Influences

This is essentially a non-genotoxic process. How then congeners and enantiomers (mirror image isomers)
does it relate to the somatic mutation theory (see entry are taken into account (Vetter and Luckas 1995,
▶ Cancer Theories)? Well, maybe not very much! An 1998). Other groups of organic chemicals also have
alternative hypothesis has been proposed by high numbers of variants. We simply do not possess
Sonnenschein and Soto (1999), the tissue organization the tools in toxicology to analyze complex mixtures.
field theory (TOFT), which provides a more logical Even testing mixtures of two chemicals at a time is
explanation of low-dose fetal exposure mechanisms quite laborious (Axelrad et al. 2002). There are some
(see entry ▶ Cancer Theories). Under this hypothesis, bio-monitoring methods being developed for estimat-
it is proposed that, as in other life forms, all cells in the ing the total dioxin-like activity or estrogen-like activ-
body are in a “default” state of proliferation, rather ity of mixtures (Murk et al. 1996; Soto et al. 1997), but
than “quiescence,” as previously assumed. In these do not identify individual components of the
multicellular organisms, where cells must exist in an mixture. However, many of these components are
ordered “society of cells,” the desire to proliferate has acknowledged to be carcinogenic or cancer promoters.
become continuously suppressed by the evolution of Under such circumstances, the power of epidemiol-
inhibitory juxtacrine and autocrine (i.e., local hor- ogy to determine outcome against exposure in human
monal) influences. This local control is intimately populations is rather weak and requires enormous bio-
bound up with local architecture of the tissue at monitoring programs involving hundreds of thousands
a microscopic level. The balance of the stroma to the of participants. Some such programs are getting under
parenchyma and the relationship of one cell type to way but will not report on cancer outcomes for several
another in 3D space are probably of high significance decades. In the meantime, Sir Bradford Hill
(Kaufman and Arnold 1996). The TOFT proposes that (1960) criteria on causation are worth revisiting. He
carcinogens alter tissue interactions, such as those gave a list of criteria that can be considered when
occurring between the stroma and the epithelium, that addressing the causation of disease in occupational
in turn cause the architectural changes found in carci- medicine. Now it is realized that they have much
nomas, which, in turn, facilitates an increased local cell wider significance and applicability to other fields of
proliferation rate. The control of the micro- medicine. It is not essential that all the criteria be
architecture of organs by epigenetic factors (external fulfilled to establish causation. We discuss each
influences), throughout the developmental period, is Bradford Hill criteria with respect to the etiology of
a delicate process. Some of these factors are known cancer from environmental pollution, in order of
but most are not fully understood. Hormones, which decreasing strength, as previously considered by
are known to be involved in the control of organogen- Nicolopoulou-Stamati, Howard and Gaudet (2004).
esis, act in concentrations in the parts per trillion 1. Temporal sequence: The suspected causal agent
ranges. Xeno-chemicals, which may mimic hormones, must appear in the environment prior to the effect
are present in the populations of industrialized coun- it is assumed to be causing. Preindustrialization
tries at similar concentrations. From our current levels of cancer were low. There has been a steady
knowledge base, they are likely to be inducing tissue increase in cancer incidence following the start of
dysgenesis in the fetus. industrialization. The incidence of certain hormone
related cancers has increased rapidly following the
Complexity introduction of EDSs into the general environment,
Another feature of human exposure to potential car- and there is universal exposure of the population.
cinogens is the extreme complexity of the mixture of This provides strong evidence of causality related
xeno-chemicals to which populations are exposed, to industrial development as measured by this
which consists of hundreds of chemical groups. If criterion.
these groups are broken down into individual com- 2. Experimental evidence: There is strong experimen-
pounds, then there are tens of thousands of individual tal evidence that many of the chemical pollutants in
chemicals in the mixture. For example, the organo- the environment and the food chain are carcino-
chlorine pesticide toxaphene, which in many classifi- genic in animal models. To this, we must add radi-
cations would be regarded simply as one chemical, has ation exposure as a major influence, where the
over 62,000 theoretically possible variants if all experimental evidence is overwhelming.
Cancer and Environmental Influences 177 C
3. Biological plausibility: Exposure to environmental exposure and pleural mesothelioma and radiation
pollution commences pre-conceptually and con- and leukemia. However, apart from the undisputed
tinues throughout all stages of life. This exposure rise in cancer incidence and its temporal sequence
therefore covers vulnerable windows of develop- considered in association with the rise in environ-
ment. There is very high biological plausibility mental pollutant exposure, added to the small num-
that exposure to a mixture of known carcinogens ber of published reports discussed in the biological
would be causally related to the postindustrial gradients section, there is in general weak associa- C
increase in cancer incidence. Furthermore, cancer tion. However, this is largely attributable the lack of
incidence would be age related as this is a measure specificity of association with a complex mixture of
of the length of exposure to these influences. influences.
4. Reasoning by analogy: We can reason that pollut- 8. Specificity: This is the weakest of the associations
ants that are known to cause cancer in animal with the Bradford Hill criteria with respect to the
models are likely to cause cancer in humans. This environmental causation of cancer. The reasons for
is indeed the basis of animal testing for the licensing the lack of specificity are clear; humans are exposed
of pharmaceutical and agrochemical agents, and to diffuse and multiple influences, including radia-
therefore the analogy is widely accepted by tion, chemicals, lifestyle choices, and changes in
governments. dietary habits. Epidemiology, one of the main
5. Coherence with biological background and tools for making specific associations in human
previous knowledge: We know that there has been disease, is a relatively weak tool when a mixture is
environmental contamination with bioactive sub- the causal factor under examination, given also that
stances which, prior to their invention and produc- there is very little fetal exposure data. This is par-
tion, had never been part of the environment. ticularly true if the effect is to alter the frequency in
Animal detoxification systems are generally well the population of a common condition. The power
adapted to natural biochemicals that they have of epidemiological studies is further weakened by
coevolved with, which is often not the case with the fact that there are no true control groups, that is,
novel xeno-chemicals. Bioaccumulation is an everybody has some degree of contamination with
example of such an incompatibility. In addition, such pollutants. Therefore, although this criterion
the effects of low-dose endocrine-disrupting cannot be met, there are perfectly good reasons for
chemicals on development, a newly discovered understanding why it cannot be met, and they
mechanism, have to be considered in the context should not necessarily be used as a reason for
of non-genotoxic mechanisms of carcinogenesis. avoiding the use of precaution. However, opponents
There is considerable prior biological knowledge of the theory of the environmental etiology of can-
to support a coherent theory for such exposures to cer adopt this very lack of specificity as the main
be able to lead to cancer. The causation of cancer plank of their arguments.
by environmental pollutants is in keeping with this It appears therefore that five out of the eight
criterion. Bradford Hill criteria support the argument that
6. Biological gradient (dose-response relationship): environmental pollutants are causally linked to the
Dose-response relationships to chemical pollutants rise in human cancer incidence. Of the remaining
have been demonstrated in animal studies, though three criteria, one offers some evidence, and the
in the case of EDS they frequently are non- other two are weak. However, the reasons for this
monotonic. Data on humans is weak, with many weakness are perfectly understandable, and the
gaps in knowledge. There are many confounding inability to detect any effect does not preclude its
factors, such as the complexity of the mixture to presence but is more likely a commentary on the
which the population is exposed. Dose-response unsatisfactory nature of the investigatory tools that
relationships to radiation exposure are strong and we have at our disposal.
well documented.
7. Strength of association: This is generally a weak Conclusions
criterion for general chemical pollution. Exceptions • The incidence of cancer in preindustrial societies
are, for example, the relationship between asbestos was low.
C 178 Cancer and Environmental Influences

• Since the beginning of the industrial revolution, Doll R, Peto R (1981) The causes of cancer. Oxford University
there has been an inexorable rise in human cancer Press, Oxford
Hardell L, Eriksson M (2003) Is the decline of the increasing
incidence, which is still continuing. incidence of non-Hodgkin lymphoma in Sweden and other
• The fact that environmental rather than genetic countries a result of cancer preventive measures? Environ
influences predominate in the causation of cancer Health Perspect 111:1704–1706
has been shown through human twinning studies. Hoffman FL (1923) Cancer and civilization, speech to Belgian
National Cancer Congress at Brussels (cited by Stefansson)
• Low-dose exposure to cell signaling disrupting Howard CV, Newby JA (2004) Could the increase in cancer
chemicals during windows of vulnerability in the incidence be related to recent environmental changes?
fetal and infant stages of development, resulting in In: Nicolopoulou-Stamati P et al (eds), Cancer as an
tissue dysgenesis, appears to be an important influ- Environmental Disease. Kluwer Academic Publishers,
pp 171–199
ence in adult cancer formation. Irigaray P, Newby JA, Clapp R, Hardell L, Howard CV,
• Cancer is a disease associated with industrializa- Montagnier L, Epstein S, Belpomme D (2007) Lifestyle-
tion. It is likely that there is a predominantly epige- related factors and environmental agents causing cancer: an
netic (non-genotoxic mechanism) influence; this overview. Biomed Pharmacother 61:640–658
Kaufman DG, Arnold JT (1996) Stromal-epithelial interactions
means that the process should be reversible if the in the normal and neoplastic development. In: Sirica AE (ed)
causative agents can be removed from the Cell and molecular pathogenesis. Lippincott-Raven, Phila-
environment. delphia, pp 403–432
• The proportion of cancers caused by “lifestyle fac- Lichtenstein P, Holm NV, Vrkasalo PK (2000) Environmental
and heritable factors in the causation of cancer. N Engl J Med
tors” as opposed to “external factors” remains to be 342:78–85
determined. However, the current state of knowl- Markey CM, Luque EH, Munoz De Toro M, Sonnenschein C,
edge indicates that the statement by the late Sir Soto AM (2001) In utero exposure to bisphenol A alters the
Richard Doll that only 1–4% of cancers can be development and tissue organization of the mouse mammary
gland. Biol Reprod 65:1215–1223
attributed to environmental pollution is a gross Murk AJ, Legler J, Denison MS, Giesy JP, van de Guchte C,
underestimate. Even using Doll’s estimate, the Brouwer A (1996) Chemical-activated luciferase gene
basis of which was never adequately explained expression (CALUX): a novel in vitro bioassay for Ah recep-
would mean that many millions of cancers are tor active compounds in sediments and pore water. Fundam
Appl Toxicol 33(1):149–160
already attributable to environmental chemical pol- Murray TJ, Maffini MV, Ucci AA, Sonnenschein C, Soto AM
lution. The likelihood is that a much higher propor- (2007) Induction of mammary gland ductal hyperplasia and
tion of human cancers, maybe a majority, is carcinoma in situ following fetal bisphenol A exposure.
attributable to external factors. Reprod Toxicol 23:383–390
Newby JA, Busby CC, Howard CV, Platt MJ (2007) The cancer
incidence temporality index: an index to show the temporal
changes in the age of onset of overall and specific cancer
Cross-References (England and Wales, 1971–1999). Biomed Pharmacother
61:623–630
Nicolopoulou-Stamati N, Howard CV, Gaudet BAJ (2004)
▶ Neoplasia Re-evaluation of priorities in addressing the cancer issue:
consclusions, strategies, prospects. In: Nicolopoulou-Stamati
P et al (eds) Cancer as an Environmental Disease. Kluwer
Academic Publishers, pp 171–199
References Skakkebaek NE, Meyts ER-D, Main KM (2001) Testicular dys-
genesis syndrome: an increasingly common developmental
Axelrad JC, Howard CV, McLean WG (2002) Interactions disorder with environmental impacts. Hum Reprod
between pesticides and components of pesticide formulations 16:972–978
in an in vitro neurotoxicity test. Toxicology 173:259–268 Sonnenschein C, Soto AM (1999) The society of cells. Taylor &
Bradford-Hill A (1966) The environment and disease: Francis, London, UK
association or causation? Proc R Soc Med 58:295 Soto AM, Fernandez MF, Luizzi MF, Oles Karasko AS,
Czene K, Lichtenstein P, Hemminki K (2002) Environmental Sonnenschein C (1997) Developing a marker of exposure to
and heritable causes of cancer among 9.6 million individuals xenoestrogen mixtures in human serum. Environ Health
in the Swedish Family-Cancer Database. Int J Cancer Perspect 105:647–654
99:260–266 Stefansson V (1960) Cancer: disease of civilization? An anthro-
Diamanti-Kandarakis E et al (2009) Endocrine-disrupting pological and historical study. Hill and Wang, New York
chemicals: an endocrine society scientific statement. Endocr Vetter W, Luckas B (1995) Theoretical aspects of
Rev 30(4):293–342 polychlorinated bornanes and the composition of toxaphene
Cancer Drug Discovery 179 C
in technical mixtures and environmental samples. Sci Total disease whose pathogenesis involves genetic muta-
Environ 160–161:505–510 tions as well as epigenetic changes, leading to altered
Vetter W, Scherer G (1998) Variety, structures, GC properties,
and persistence of compounds of technical toxaphene gene expression, signaling, and metabolic activity. The
(CTTs). Chemosphere 37:2525–2543 enormous individual variation in these changes adds to
the challenge of developing effective targeted drugs
for even a single known cancer type. Currently, the
most common drug treatment for cancer is through the C
Cancer Cell Cycle use of low-specificity cytotoxic chemotherapy drugs
that affect different aspects of cellular function and
▶ Cell Cycle, Cancer Cell Cycle and Oncogene whose effects are particularly damaging to fast-
Addiction dividing cells. In many cases, tumor cells either are
or become resistant to such drugs, resulting in limited
efficacy. Another, more recent approach has been to
develop drugs that target abnormalities specific to
Cancer Cenes
tumor cells, such as mutations in specific receptors.
Several such drugs are in use that affect intracellular
▶ Cancer Networks
signaling pathways altered in cancer (see, e.g.,
(Alvarez 2010)). They take advantage of our increased
knowledge about the link between genetic lesions and
Cancer Drug Discovery the capabilities normal cells acquire as they undergo
the transition to become malignant cells.
Reinhard Laubenbacher Two relatively recent developments in biology
Virginia Bioinformatics Institute, Virginia Tech, research promise a paradigm change in cancer drug
Blacksburg, VA, USA development: systems biology and cheap high-
throughput sequencing. The fundamental relevance
of systems biology to the understanding and treatment
Definition of cancer is the insight that genes and proteins do not
act in isolation, but rather as nodes in complex inter-
Development of anticancer drugs provides a crucial active networks that include multiple feedback mech-
component of the repertoire of therapeutic interven- anisms and redundancies. The design of effective
tions, in addition to surgery and radiation therapy. drugs to battle cancer will depend on the understanding
While existing drugs have led to important treatment of these networks and of the specific network alter-
advances, much remains to be done to discover new ations present in an individual tumor (Goodarzi et al.
drugs that effectively target key molecular mechanisms 2009; Erler and Linding 2010; Klipp et al. 2010). This
contributing to cancer pathogenesis. This is only possi- will allow the discovery of tailor-made combination
ble through an understanding of the complex nonlinear therapies that target specific network perturbations.
molecular networks that mediate drug actions and are High-throughput sequencing has made possible the
responsible for the de novo or acquired resistance of identification of the genetic changes in an individual
tumor cells to many existing drugs. The field of systems patient, which provide crucial information about how
biology is focused on precisely such networks and to target the network. Thus, systems biology is affect-
therefore holds the promise of changing the paradigm ing a paradigm shift away from single proteins as the
of drug discovery from one focused on individual pro- target of drugs toward the network as a target (Baggs
tein targets to one that targets networks instead. et al. 2010). The result will be what has been called
systems pharmacology (Yang et al. 2010), a systems
biology approach to drug design. Its hallmark will be
Characteristics combination therapy, guided by an analysis of the
complex signaling networks involved in malignant
As our understanding of the molecular basis of cancer changes (Wu et al. 2010). Mathematical modeling
grows, a picture emerges of a complex multi-faceted and computer simulation are essential tools both for
C 180 Cancer Drug Discovery

the understanding of the nonlinear dynamic networks (EGFR) or 2 (HER2) the ErbB receptor is activated,
involved, and for the study of effective control mech- followed by a cascade of phosphorylation events that
anisms (Feala et al. 2010). regulate apoptosis and cellular proliferation pathways,
The two case studies below illustrate the systems mediated by cyclin-dependent kinases. The drug
biology approach to cancer drug design. The first, trastuzumab, approved by the U.S. Food and Drug
focused on breast cancer, shows how one can use Administration in 1998, is a recombinant humanized
systems biology to address the limited efficacy of monoclonal antibody that binds with high affinity to
existing drugs by identifying new additional targets the extracellular domain of the HER2 receptor, and is
that can be used to amplify their effect. The second, applied in patients that exhibit HER2 overexpression.
focused on melanoma, shows how an understanding of However, at least two thirds of patients are de novo
signaling networks can lead the way to appropriate resistant to the effects of this drug. One possible expla-
choices for combination therapy. nation for this resistance on the cellular level could be
that cancer cells are able to exploit the redundancy of
CASE STUDY 1: Breast cancer drugs the signaling network between EGFR and the proteins
There are currently three types of drug therapy avail- controlling cell cycle progression, primarily the reti-
able for breast cancer patients: conventional cytotoxic noblastoma protein pRB, to avoid cell cycle arrest.
chemotherapy, hormone-based therapy, and targeted Thus, additional targets within this network need to
therapy based on monoclonal antibodies (Alvarez be identified in order to overcome resistance. In other
2010) The first two drug types are systemic, in the words, instead of individual proteins, the network has
sense that their effects are not specifically restricted to become the target.
to cancer cells. For instance, cytotoxic chemotherapy In (Sahin et al. 2009) this is done through the use of
drugs inhibit cell division in some way or target DNA a dynamic mathematical model of the network.
repair mechanisms or interfere with metabolism, and A computer implementation of the model enables sim-
are particularly damaging to fast-dividing cells. While ulation studies that can explore the effect of targeting
this includes cancer cells, in particular small, fast- other proteins in the network for their therapeutic
growing tumors, it also includes normal cells in, e.g., potential. New potential targets were identified,
the bone marrow, intestinal tract, and hair follicles. through in silico experiments that were then confirmed
What fundamentally distinguishes targeted therapy in the laboratory or the literature, including cyclin D1
drugs from the others is that they are based on cellular and CDK4, as well as transcription factors c-MYC and
abnormalities specific to some cancer cells, and ER-a. These could be exploited either instead of or in
thereby represent an entirely new and more sophisti- combination with HER2 to attack the entire network
cated class of chemotherapeutics. For breast cancer structure.
cells, an important molecular target that has been iden-
tified is the human epidermal growth factor receptor 2 CASE STUDY 2: Melanoma drugs
(HER2). Systems biology can play a key role in iden- Melanoma is a very aggressive cancer with few effec-
tifying such targets and understanding the network of tive treatment options. However, a recent increased
interconnected pathways that propagate their signals understanding of the genetic events underlying mela-
and their redundancy that helps cancer cells to develop noma pathogenesis and the signaling pathways
resistance to drugs that interfere with their action. The affected by mutations common in melanoma has
example of the drug trastuzumab that targets the HER2 opened the door to a systems biology approach to the
pathway and its role in controlling the cell cycle is discovery of targeted therapeutics. The review (Ko and
instructive. Fisher 2010) surveys some of these developments and
Members of a family of receptor tyrosine kinases, makes clear the need for combination therapies that
the ErbB family, are frequently overexpressed in epi- target the entire signaling network rather than individ-
thelial tumors and promote tumor cell proliferation in ual pathways, similar to the case of breast cancer
several cancers, including breast cancer. Four homol- described above.
ogous members of the HER receptor family have been Several signaling cascades have been discovered as
identified to date, HER1-4 (also known as ErbB1-4). important in melanoma. One is the RAS-RAF-MAPK-
After ligand binding to ErbB family members 1 ERK growth factor pathway. In virtual all human
Cancer Networks 181 C
melanoma cases this signaling cascade has been altered ▶ c-Myc
in some way, in particular through NRAS or BRAF ▶ Interaction Networks
mutations. Several agents targeting proteins in this ▶ Network-Based Biomarkers
pathway are available, such as the BRAF inhibitor ▶ Signal Transduction Pathway
PLX4032, currently in clinical trials. Another pathway, ▶ Systems Medicine
also subject to RAS signaling, is the PI3K-AKT-mTOR ▶ Systems Pharmacology, Drug-Target Networks
pathway, whose activation has the effect of suppressing C
apoptosis. Several small molecule inhibitors are avail-
able that target mTOR signaling. It is thought that none References
of the actors in these pathways stand alone in the
pathogenesis of melanoma, and it is therefore likely Alvarez RH (2010) Present and future evolution of
advanced breast cancer therapy. Breast Cancer Res
that no single-agent approach to this disease will be
12(Suppl 2):S1
successful in affecting a cure. The signaling network Baggs JE, Hughes ME et al (2010) The network as the target.
including the proteins in the two pathways mentioned Wiley Interdiscip Rev Syst Biol Med 2(2):127–133
here is complex, with feedback loops and redundant Erler JT, Linding R (2010) Network-based drugs and
biomarkers. J Pathol 220(2):290–296
pathways, thus, combination therapies targeting several
Feala JD, Cortes J et al (2010) Systems approaches and
network nodes at the same time hold the most promise algorithms for discovery of combinatorial therapies. Wiley
for success. Here too, detailed computer models of the Interdiscip Rev Syst Biol Med 2(2):181–193
network are essential tools for the discovery of effec- Goodarzi H, Elemento O et al (2009) Revealing global
regulatory perturbations across human cancers. Mol Cell
tive drug combinations. Probably, the computer model
36(5):900–911
will ultimately have to be enlarged to also take into Klipp E, Wade RC et al (2010) Biochemical network-
account other apoptosis pathways as well as growth based drug-target prediction. Curr Opin Biotechnol
factor signaling. 21(4):511–516
Ko JM, Fisher DE (2010) A new era: melanoma genetics and
therapeutics. J Pathol 223(2):241–250
Sahin O, Frohlich H et al (2009) Modeling ERBB receptor-
Discussion regulated G1/S transition to find novel targets for de novo
trastuzumab resistance. BMC Syst Biol 3:1
Wu Z, Zhao XM et al (2010) A systems biology approach to
The experience with cancer drugs that target individual
identify effective cocktail drugs. BMC Syst Biol
proteins suggests that combination therapies have the 4(Suppl 2):S7
greatest promise of being effective. To identify the Yang R, Niepel M et al (2010) Dissecting variability in responses
right combination of targets for a particular tumor it to cancer chemotherapy through systems pharmacology. Clin
Pharmacol Ther 88(1):34–38
is essential to understand the signaling network that is
perturbed through mutations and that mediates the
drugs’ action. It is the aim of systems biology to
provide the link between an organism’s genomic
sequence and its phenotype through an understanding Cancer Networks
of the various dynamic networks that connect the two.
Thus, a systems biology approach to drug discovery Eric Werner
and design of combinatorial therapies is indispensable. Department of Physiology, Anatomy and Genetics,
University of Oxford, Oxford, UK
Department of Computer Science, University of
Cross-References Oxford, Oxford, UK
Oxford Advanced Research Foundation, Fort Myers,
▶ Adult T-Cell Leukemia FL, USA
▶ Apoptosis
▶ Cdk Inhibitors
▶ Cell Cycle Checkpoints Synonyms
▶ Cell Cycle, Cancer Cell Cycle and Oncogene
Addiction Cancer cenes; Cenes; Developmental cancer networks
C 182 Cancer Networks

Definition a1

A cancer network is a developmental network


(▶ Developmental Control Networks) or ▶ cene that
contains one or more self-regenerating loops resulting Φ1 → A1 B
in endless cell proliferation. The nodes in the network
are cell states. The branches denote cell division or
jump to a new cell state.
b

Cancer Networks, Fig. 1 A linear, first-order cancer stem


Characteristics cell network. It generates cells B if the condition F1 is
met (Werner 2011b). In cancer, the condition F1 may be
Cancer networks come in two basic forms: cancer locked into being satisfied, so that we have endless cell
proliferation.
▶ stem cell networks and exponential networks,
where cancer stem cell networks are further divided
into linear stem cell networks and more general
geometric networks: Stem Cell Hierarchy
• Cancer stem cell networks Stem cell networks form a natural hierarchy based
– Linear cancer networks on the number of loops in their networks. For
– Geometric cancer networks this reason, we distinguish between first-order,
• Exponential networks second-order, third-order, and further higher-order
Cancer networks like all ▶ developmental control ▶ stem cell networks.
networks can be further divided into deterministic and
stochastic networks. Networks may also involve cell Geometric Networks
signaling. Based on the topology of their ▶ develop- We call these networks geometric (control) networks
mental control networks, there exist a vast number of because their proliferative properties are related
possible cancer networks. The monograph to properties of geometric numbers (also called
(Werner 2011b) gives a kind of periodic table of all the figurative numbers). They are also related to the
possible cancer networks together with their salient coefficients of Pascal’s triangle (Werner 2011b;
proliferative and phenotypic properties. Fig. 3).
The general case of k loops in cancer ▶ stem cell
Linear Cancer Networks networks: A higher-order geometric network Gk
A linear cancer stem cell is a self-regenerating cancer containing k loops and one link to a terminal develop-
cell that generates other terminal progenitor cells. If mental network after n rounds of synchronous
the network allows stochastic dedifferentiation then divisions has an ideal proliferation rate that given by
these terminal cells may dedifferentiate to cancer the following formula, when n > 0:
stem cells. A linear cancer network G1 contains one
self-regenerating loop and one link to a terminal devel- nðn  1Þ
opmental network. They have many of the character- Cellsðn; kÞ ¼ 1 þ n þ
2
istics of linear ▶ stem cell networks (Werner 2011b;
nðn  1Þðn  2Þ
Fig. 1). þ þ ...
6
Cancer Meta-Stem Cells n!
þ (1)
A cancer meta-stem cell or second-order cancer stem k!ðn  kÞ!
cell, G2, is a cancer stem cell that generates other
cancer stem cells. They have many of the characteris- !
X
k
n
tics of normal meta-stem cell developmental networks Cellsðn; kÞ ¼ (2)
(Werner 2011b). See ▶ Stem Cell Networks (Fig. 2). i¼0 i
Cancer Networks 183 C
a2 a1

Φ2 → A2 Φ1 → A1 B C

a1 C
b

Cancer Networks, Fig. 2 A cancer meta-stem cell network is the conditions Fi may be locked into being satisfied, so that we
a second-order geometric network. It consists of a second-order have endless cell proliferation. In an ideal space with no resis-
loop at A2 linked to a first-order loop at A1 (Werner 2011b). tance, synchronous divisions, and when both conditions Fi are
A cell in state A2 is a cancer meta-stem cell that generates satisfied, this network generates a triangle
first-order linear cancer stem cells. In cancer, one or both of

ak a k−1 a1

Cancer Networks, Fig. 3 A


kth-order stem cell network Ak A k−1 A1 D
(Werner 2011b). The network
contains k loops at control
states Ak . . . A1 and end in
a terminal developmental
a k−1 d
network D

Limited Exponential Growth with Geometric tumor formed by a third-order stem cell, we will see
Networks second-order stem cells and first-order stem cells
When the number of rounds of division n is less than with a majority being terminal or terminal progenitor
the number of loops k, i.e., 0  n  k, then a geometric cells (Fig. 4).
stem network exhibits exponential growth since the First-order cancer stem cells are slow growing with
following holds: linear proliferation rates and can be relatively harmless
if there is no stochastic dedifferentiation. A second-
! ! ! ! !
X
n n n n n n order stem cell tumor grows more quickly (adding the
¼ þ þ þ ... þ triangular number) with an ideal proliferation rate
i 0 1 2 n
i¼0
given by the following formula, given n > 0:
¼2 n

(3)
nðn  1Þ
Cellsðn; 2Þ ¼ 1 þ n þ (4)
2
Metastatic Hierarchy
The stem cell network hierarchy generates a hierarchy A third-order stem cell tumor grows even more
of cancer metastases (Werner 2011b). A first-order quickly (adding the tetrahedral number) with an ideal
cancer stem cell generates terminal cells. proliferation rate as follows:
A second-order cancer stem cell generates first-order
stem cells which then generate terminal progenitor nðn  1Þ
Cellsðn; 3Þ ¼ 1 þ n þ
cells. A third-order stem cell generates second-order 2
stem cells which generate first-order stem cells which nðn  1Þðn  2Þ
þ (5)
generate terminal progenitor cells. Hence, in a given 6
C 184 Cancer Networks

a1

A B C

a2

Cancer Networks, Fig. 5 An exponential cancer network


where a cell in control state A divides into daughter cells that
self-loop and differentiate to control state A. For details, see
Werner 2011b

Subnetworks and Tangentially Activated


Networks
Any loop can include a path that is a subpath of another
Cancer Networks, Fig. 4 A metastatic hierarchy with one red
G3 stem cell, and the rest are orange G2, green G1, and blue
developmental network. This means the loop has
terminal cells growing in a background of light beige cells. For further developmental progeny generated by that
details, see Werner 2011b subpath. This ability to link to subnetworks and other
developmental networks is a basic characteristic of
▶ developmental networks (Werner 2011a, b).
These tangentially activated networks can generate
Tumor Properties of Stem Cell Networks the dominant phenotype of a tumor or cancer. For
The tumor has interesting properties. Initially, the example, with meta-stem cell cancer networks, the
growth rate is exponential for all cells whose division dominant phenotype may consist of terminal or
cycle is less than or equal to the number of loops in the terminal progenitor cells that are non-cancerous. Ter-
stem cell network. Then, the higher the number of atoma tumors also may result from a small set of
loops, the greater the percentage of cells in the tumor cancer cells with the vast majority being terminal
that are actively proliferating. For example, with cells of various tissue types.
a tumor generated by a stem cell controlled by
a third-order stem cell network, a high percentage of
the cells will be proliferating (a ratio that relates Conclusion
the volume of a tetrahedron to the area of the base of Cancer networks can exist in a vast space of ▶ devel-
that tetrahedron). opmental networks. The space of all possible cancer
networks is described in Werner 2011b.
Exponential Networks
An exponential network X contains two control loops
from two daughter cells that lead back to the parent cell
(Fig. 5). Cross-References
As with linear and geometric networks, exponential
networks can have a great variety of forms depending ▶ Cene
on the subnetworks they contain and the other net- ▶ Developmental Control Networks
works that they activate. ▶ Stem Cell Networks
Cancer Pathology 185 C
References derived from including neoplastic cells, supporting
connective tissue and mesenteries, blood vessels, lym-
Werner E (2011a) On programs and genomes, arXiv:1110.5265v1 phatics, nerves, and immune cells. Solid tumors are
[q-bio.OT]. http://arxiv.org/abs/1110.5265
subclassified generally based on the tissue and embry-
Werner E (2011b) Cancer networks: a general theoretical and
computational framework for understanding cancer, ological cell of origin into epithelial, mesenchymal,
arXiv:1110.5865v1 [q-bio.MN]. http://arxiv.org/abs/ or mixed tumors. Embryologically, epithelial tumors
1110.5865v1 are those derived from ectoderm and endoderm. Mes- C
enchymal tumors are derived from mesoderm or neural
crest cells. The distinction is important because the
Cancer Pathology biological behavior and the course of treatment vary
with cell type.
Barbara J. Davis The nomenclature of the tumor is then defined by
Section of Pathology, Tufts Cummings School of the origin of the cell and whether it is benign or malig-
Veterinary Medicine Biomedical Sciences, North nant (Table 1).
Grafton, MA, USA Benign epithelial tumors are called adenomas, pap-
illomas, polyps; malignant epithelial cancers are
called carcinomas. For further clarification, the tissue
Characteristics or cell of origin is also included in the name. For
example, a malignant cancer of the cells that form the
Cancer pathology is the study of structural (morphol- liver parenchyma, or the hepatocytes, is called
ogy), biochemical, molecular, or genotypic alterations a hepatocellular carcinoma, while a cancer of the
and the functional or clinical manifestations associated cells that form the bile transport system in the liver,
with disorders of growth of tissues. Tumors are defined or the bile ducts and cholangioles, is called
by disorganized tissue architecture and excessive cell a cholangiocellular carcinoma.
proliferation. A tumor, translated as “swelling,” is Benign mesenchymal tumors are named with refer-
a general term used in pathology to designate a mass ence to the tissue of origin. For example, a benign
associated with disorders of growth (Fig. 1). The suffix mesenchymal tumor of connective tissue is called
“oma” is used to designate a mass of tissue or tumor. a “fibroma.” A benign tumor of smooth muscle origin
Tumors may be “congenital” or develop after birth. is a leiomyoma. Malignant mesenchymal cancers are
Congenital tumors are classified as ▶ hamartomas called sarcomas. A malignant cancer of connective
or ▶ choristomas. tissue is called a fibrosarcoma. A malignant cancer of
Tumors that represent disturbances of growth after a smooth muscle cell is a leiomyosarcoma.
birth are classified as ▶ hyperplasia, ▶ dysplasia, or Mixed tumors are tumors in which the progenitor
▶ neoplasia (Fig. 1). cells develop features of both epithelial and mesenchy-
A neoplastic growth may be ▶ benign or mal components and are named according to tissue and
▶ malignant. behavior. For example, a tumor of benign mixed tumor
A malignant neoplasm (or may also be referred to as of a mammary (breast) gland may be called
a malignant tumor) is generally synonymous with the a fibroadenomas or benign mixed tumor; a malignant
term “cancer.” That is, the use of the term “cancer” in mixed tumor is called a carcinosarcoma.
medicine refers to a malignant neoplasm. The spread There is also special consideration given to tumors
of a malignant cancer is referred to as ▶ metastasis. derived from subtypes of cells within tissues such as
Tumors are further defined by their cell and tissue of those from neuroendocrine cells. These are more com-
origin (Fig. 2). mon in the gastrointestinal tract, but also found in
respiratory tract, and other sites. Neuroendocrine
Solid Tumors tumors are called “carcinoids” and may be considered
Solid Tumors are tissue-based masses and represent an benign, but most are considered malignant or with
imitation of the tissue and tissue components they are malignant potential.
C 186 Cancer Pathology

Cancer Pathology,
Fig. 1 Classification scheme
for identification of masses or Tumor (mass of tissue)
“tumors”

Congenital Growth Disturbance

Harmatoma Choristoma Hyperplasia Dysplasia


Neoplasia

Benign Malignant
“Cancer”

Neoplasia

Solid tissue Liquid blood

Cancer Pathology, Mixed


Epithelial Mesenchymal Lymphoma Leukemia
Fig. 2 Classification of
tumors based on origin

Neural crest cells give rise to pigmented melanin- germ cells may give rise to tumors of embryonic
producing cells called melanocytes. A benign tumor of origin such as teratomas, choriocarcinomas, or even
melanocytes is called a “nevus” or melanocytoma. yolk sac tumors.
A malignant melanocytic cancer is called a melanoma
or malignant melanoma. Hematopoietic tumors
Special consideration and nomenclature is also Hematopoietic tumors are also referred to as “liquid
used for tumors of specialized tissue such as the tumors” and are of blood-based origin. This category
nervous system and the gonads. In the brain, tumors includes cells that make up the immune white blood
may be classified as neuroepithelial and non- cell system such as the lymphocytes and granulocytes
neuroepithelial tumors and have special stains to dif- and red blood cell systems such as erythrocytes and
ferentiate their origin. In the gonads, the totipotent megakaryocytes. Terms such as hyperplasia,
Cancer Pathology 187 C
Cancer Pathology, Table 1 Examples of tumor nomenclature by tissue of origin and behavior
Benign Malignant
Epithelial Adenoma Carcinoma
Papilloma
Polyp
Of tissue Skin Papilloma Squamous cell carcinoma
Basal cell carcinoma C
Gastrointestinal tract Polyp or adenoma Adenocarcinoma
Bladder Polyp Transitional cell carcinoma
Kidney Oncocytoma Renal cell carcinoma
Clear cell variant
Papillary carcinoma
Juxtaglomerlar Tumor
Renal pelvis squamous cell carcinoma
Liver hepatocytes Hepatoma Hepatocellular carcinoma
Liver bile duct Cholangioma Cholangiocarcinoma
Mesenchymal “oma” Sarcoma
Of tissue Connective tissue Fibroma Fibrosarcoma
Smooth muscle Leiomyoma Leiomyosarcoma
Skeletal muscle Rhabdomyoma Rhadomyosarcoma
Endothelial cells Hemangioma/angioma Hemangiosarcoma or angiosarcoma
Cartilage Chondroma Chondrosarcoma
Bone Osteoma Osteosarcoma
CNS Neuroepithelial
Astrocytes Astrocyoma Anaplastic astrocytoma
Glial cells Glioblastoma multiforme
Oligodendrocytes Oligodendroglioma Anaplastic olgiodendroglioma
Ependymal cells Ependymoma Anaplastic epednymoma
Choriod plexus Papilloma Carcinoma
Embryonic origin Medulloepithelioma

dysplasia, and neoplasia are similarly used to define or “trimming” the mold with the tissue; and then plac-
disorders of altered architecture of these lineages as ing the tissue on a glass slide. The tissue will then be
well. Cancer of lymphoid organs is termed lym- stained for visual examination. Preservatives include
phoma. Cancer of circulating bone marrow cells is 10% buffered formalin, which is the most common
termed leukemia. routinely used tissue preservative. Other preservatives
that may be used are methanol or 80% alcohol, 2% or
Diagnosis 4% paraformaldehyde, glutaraldehyde, and “special”
The diagnosis of a mass is based on the gross appear- fixatives such as Bouin’s fixative, which includes
ance and behavior of the tumor and the histological picric acid. Tissues may also be frozen in a container
appearance (Fig. 3). in liquid nitrogen. The tissue is then stained –
For microscopic assessment, the tumor is sampled hemotoxylin and eosin (or H&E) staining is most com-
during an in-life surgical procedure called a biopsy or monly used on a routine basis.
after death as an autopsy. The preparation of the tumor The determination of whether the mass is hyper-
includes fixation of the mass in a preservative; plastic, dysplastic, or neoplastic, of epithelial, mes-
processing the tissue through a series of alcohol dehy- enchymal, or mixed (or “other”) origin, and whether
dration steps to remove water and replace it with it has features of a benign or malignant tumor is based
a transparent harder material which is usually wax on the microscopic pattern using criteria including
paraffin; embedding the tissue in a mold; sectioning the tissue of origin, level of organization, or
C 188 Cancer Pathology

Cancer Pathology, Fig. 3 (a) Photograph of liver with a tan irregularly nodular and cavitated mass. A section of mass is dissected,
fixed in a preservative (formalin), and processed to make a tissue section and stained for microscopic examination. (b) Photomicro-
graph of a tissue section of the mass in (a) stained with hemotoxylin and eosin (H&E). The pattern and location is consistent with
a diagnosis of cholangiocellular carcinoma, a neoplasm arising from the biliary epithelium

Cancer Pathology,
Fig. 4 Photographs of
examples of (a) normal
(arrow) and hyperplastic (*)
and (b) dysplastic squamous
epithelium of mucosa; (c)
Normal mucosa intestinal
epithelium compared to (c)
adenoma or polyp
characterized by disorganized
proliferation of epithelium
supported by a fibrovascular
stalk of connective tissue; and
(d). Adenocarcinoma
characterized by disorganized
proliferation of pleomorphic
epithelium cells invading into
the subjacent submucosa and
muscular wall of the intestine
(box) and (e) higher
magnification of invading
carcinoma

differentiation and level of containment (Fig. 4). The tissue, and whether the pattern most closely resem-
mass is identified and described as to whether it has bling the tissue or cell of origin. For example, neo-
a connective tissue capsule or not, whether it is plastic epithelial cells tend to arrange as glands of
compressing adjacent tissue or invading adjacent acini or tubules, nests or islands. Neoplastic
Cancer Pathology 189 C
Cancer Pathology,
Fig. 5 Photograph of
a section of H&E stained
breast tissue diagnosed as
carcinoma in situ (a) with
intraductular but not invasive
masses of disorganized
pleomorphic epithelial cells
that are confined to the C
basement membrane boxed
and shown in (b)

mesenchymal cells tend to arrange in interlacing or ▶ immunohistochemistry is also commonly used to


intersecting bundles, whorls, or palisades. In addition further define the tissue or cell of origins.
to the overall pattern, the morphology of the cells is For the diagnosis of solid tumors, IHC is used to
also important in the diagnosis. Neoplastic epithelial differentiate epithelial tumors and mesenchymal
cells are generally cuboidal, columnar, or polygonal; tumors based on the type of intermediate filaments.
neoplastic mesenchymal cells are generally elongated Intermediate filaments are structural proteins that
or “spindle-shaped.” Additional criteria include help determine the cell shape. Neoplastic epithelial
▶ anaplasia, cell ▶ pleomorphism, nuclear to cyto- cells retain cytokeratins and react with antibodies
plasmic ratios, the appearance of the nucleus, directed at these intermediate filaments (Fig. 6). Neo-
▶ anisocytosis, ▶ anisokaryosis, ▶ mitotic figures, plastic mesenchymal cells retain vimentin microfila-
and their appearance (normal alignment or bizarre). ments. Mixed tumors will show staining for both
Characteristics such as how fast the tumor grew, cytokeratins and vimentin. For hematopoietic tumors,
whether there were previous tumor diagnoses, or IHC is used routinely to differentiate lymphocytic
whether the mass represents a “re-growth” of tumors (lymphoma) derived from B-cells or T-cells
a previously excised tumor are also important in the (Fig. 7).
consideration of the diagnosis. IHC methods have also been developed as diagnos-
Benign tumors are considered well differentiated, tic and prognostic indicators such as in breast cancers
encapsuled with usually scant and normal mitotic fig- where the status of estrogen receptor, progesterone
ures, and usually normal mitotic figures. Malignant receptor, and HER2/neu status is determined prior to
tumors breech the basement membrane if epithelial in and in support of the type of therapy that will be used.
origin and show local invasion, lymphatic or vascular Evaluation of the ultrastructural features of cells via
invasion, and destruction of surrounding tissue. A few electron microscopy is also valuable in diagnostic
malignant tumors do not metastasize like basal cell evaluation of the origin of some samples that do not
carcinoma or gliomas, and sarcomas tend to be locally have characteristic features or for which enzyme or
invasive. ▶ Carcinoma in situ also has characteristics immunohistochemistry are not diagnostic.
of malignancy but does not invade the local tissue
(Fig. 5). Tumor metastasis may occur as seeding of Grading and Staging
peritoneal cavity or other spaces (ovarian carcinomas, A grade is the qualitative or semiqualitative determi-
mesothelioma); spread through lymphatics (most com- nation or assessment of the degree of differentiation of
mon); or invasion and spread through the vasculature. the tumor. It includes a number of mitosis, cellular
The liver and lungs are common sites for metastasis cytoplasmic, and nuclear features. Grading schemes
because of blood flow to these organs through first pass developed over time for each tumor type when appro-
and capillary beds. priate. Schemes may be in two categories – “high”
In addition, special enzyme-based stains can be grade or “low” grade or low, intermediate, or high
used to reveal cellular components. Staining tissues grade. Grading is subjective and has proven less valu-
with antibodies for a colorimetric process, called able clinically than staging.
C 190 Cancer Pathology

Cancer Pathology,
Fig. 6 Photographs of
biopsies from (a) H&E stained
section of a poorly
differentiated neoplastic
cancer confirmed to be
a carcinoma based on
identification of (b) positive
brown cytoplasmic staining in
a section stained for
cytokeratin intermediate
filaments using
immunohistochemistry; (c)
H&E stained section of
a intersecting bundles of
spindle-shaped cells
confirmed to be a sarcoma
based on identification of (d)
positive brown cytoplasmic
staining in a section stained for
vimentin intermediate
filaments using
immunohistochemistry

Cancer Pathology,
Fig. 7 Photograph of a H&E
stained biopsy from an
enlarged lymph node
consistent with lymphoma and
confirmation of subtype of
a B-cell origin using
immunohistochemistry with
antibodies to CD79a, which is
expressed on B-cells, or CD3,
which is expressed on T-cells
but not B-cells

The stage of tumor is based on the size of the Metastasis followed by a numerical assignment for
primary tumor and the extent and spread to each category. The designations are: T1-T4; N1-N3;
regional lymph nodes or blood-borne metastasis. and M1, 2.
The staging of tumor is designated by the abbrevia- For carcinoma in situ, the stage would be designated
tion “TNM” for Tumor size, regional Node, and as T0.
Cancer Resources 191 C
References Cesario and Marcus (2011) on cancer systems biology,
bioinformatics, and medicine. This book describes
Rosai J (2011) Rosai and Akerman’s surgical pathology, systems approaches to cancer research and spans
10th edn. Mosby Elsevier, Edinburgh
a variety of topics from laboratory to clinical aspects
Striker T, Kumar V (2009) Neoplasia. In: Aster J, Abbas A,
Kumar V, Fausto N (eds) Robbins and Cotran pathological of cancer. In some publications (e.g., Yan 2010),
basis of disease, 8th edn. Saunders Elsevier, Philadelphia, cancer systems biology is included as a major book
pp 259–330 chapter. C
World Health Organization (WHO) Classification of
Several journals (e.g., BMC Systems Biology,
Tumours. Published by Interational Agency for Research
on Cancer. Cancer Discovery) provide information on latest
results about systems biology and cancer research.
BMC Systems Biology (http://www.biomedcentral.
com/bmcsystbiol/) is an open access journal published
Cancer Resources by BioMed Central. It includes original peer-reviewed
research articles on the functions of biological sys-
Jingky Lozano-K€uhne tems. The Cancer Discovery journal (http://
Department of Public Health, University of Oxford, cancerdiscovery.aacrjournals.org/) is a new cancer
Oxford, UK information resource that also publishes review arti-
cles aside from articles on major advances in cancer
research and clinical trials. Monographs and published
articles about cancer research are also provided by the
Definition International Agency for Research on Cancer (IARC)
which is part of the World Health Organization. Pub-
▶ Cancer resources provide information about cancer, lications of the IARC can be freely downloaded from
researches and clinical trials, and tools in cancer http://www.iarc.fr/.
research and analysis. Other cancer resources that are One comprehensive online resource on cancer is the
intended for patients and caregivers provide informa- Website of the National Cancer Institute (NCI, http://
tion on cancer pain management, support groups, www.cancer.gov/) which provides information on the
financial assistance, and palliative care. The selected different types of cancer, clinical trials, and cancer
cancer resources described here focus on resources for statistics. Beyond information, NCI also provides
health professionals, researchers, and students in the funding and support to cancer researches and
field of systems biology. These resources are available also conducts its own laboratory and clinical studies.
in different formats. There are printed publications, Another useful Website is the Cancer Systems
online references, databases, and also software pack- Biology Resource of the University of Utah (https://
ages for cancer data management and research cancersystemsbiology.utah.edu/) which provides links
analysis. to topics such as basic gene and protein information,
▶ Pathways, tissue distribution and ▶ Gene Expres-
sion information among others.
Characteristics There are numerous printed and online publications
on cancer in addition to the resources described above.
Printed Publications and Online References Many of them share common references and links as
Basic information about cancer can be found in many the ones already mentioned. One can also consult the
medical textbooks and other references. As a reference ▶ Disease Databases and specific cancer (e.g., http://
specific on ▶ Cancer Systems Biology, the book of www.breastcancer.org/) sites for searching additional
Wang (2010) provides not only basic concepts on details about cancer.
cancer and systems biology but also applications,
information on software tools, data resources, and Cancer Databases
a significant collection of research updates. This is Cancer databases may contain collated information
the first book written specially for computational and about different types of cancers, cancer genes, or
experimental biologists. Another reference is that of other clinical data related to cancer. As data sources,
C 192 Cancer Resources

they are useful especially for conducting research on while the Omics World provides links to resources on
analysis methods. One commonly used cancer genomics, genetics, experimental protocols, reagents,
database is the Cancer Chromosomes Database which instruments, and also software.
will be soon part of the dbVar database, a database of There are more data sources available online for
genomic structural variation. Both databases are cancer research. An attempt to collate and interconnect
maintained by the United States National Center for all these data was done by the National Cancer Institute
Biotechnology Information (NCBI). The Cancer (NCI) through the cancer Biomedical Informatics Grid
Chromosome Database is integrated into the (caBIG®, https://cabig.nci.nih.gov/). It was launched
NCBI ▶ Entrez system and is actually composed of by the National Cancer Institute to create a virtual
three sub-databases containing cytogenetic network of interconnected cancer data and people
(▶ Cytogenetics), clinical, and/or reference working together in cancer research. The caBIG® com-
information. munity Website has tools and facility for data sharing
The National Cancer Institute (NCI) provides and also workspaces where members can conduct vir-
access to cancer data and also tools for its analysis tual activities, discussions, and exchange of ideas.
in their statistics’ web page http://www.cancer.gov/ Membership to the network is open to the public
statistics/tools. For studies on cancer incidence and for free.
population data associated by age, sex, race, year of
cancer diagnosis, and geographic areas, their Surveil- Software for Data Management and Analysis
lance, Epidemiology, and End Results (SEER, NCI Many of the cancer data providers previously
2005) data are commonly used. For data that are mentioned also provide data management and analysis
more specific to genes and pathways, the data portal tools. For example, the SEER Website offers analysis
of the International Cancer Genome Consortium software for the SEER data and other cancer-related
(ICGC, http://www.icgc.org/) would be a useful databases for studying the impact of cancer in the
source. The Cancer Genome Atlas (TCGA, http:// population. The ICGC Data Portal also runs a freely
cancergenome.nih.gov/) also provides a data portal available software called BioMart which is used for
for researchers on cancer. The data available at data mining. The caBIG® Website offers quite
TCGA contains clinical information, ▶ Histopathol- a comprehensive collection of tools for cancer research
ogy slide images, and molecular information of high- (NCI 2009). One of the commonly availed tools in
quality tumor samples. For pathway analysis, the caBIG® is the caArray which is used for microarray
▶ KEGG Pathway database is a useful resource. For (▶ DNA Microarrays) data management system. The
interests in mutation data and sequencing (▶ DNA caIntegrator is another tool that allows users to set up
Sequencing) data, the COSMIC database (Forbes web portals for integrative search. For integrated
et al. 2008) which can be downloaded from http:// genomics, the geWorkbench tools can be used for
www.sanger.ac.uk/genetics/CGP/cosmic/ is a good visualization and analysis of gene expression and
reference. The Website also provides other cancer- sequence data. One can also find tools in caBIG® for
related data and information. For other data on molecular analysis, cancer genome-wide association
mutations, one can also check the SH2base site scan (▶ Genome-wide Association Study), clinical
(http://bioinf.uta.fi/SH2base/). trials management, and many more. Currently, more
Omics data or data in the field of biology ending in than 40 tools are available in caBIG®.
“–omics” (e.g., ▶ Pharmacogenomics, ▶ Proteomics) Other software packages and tools for specialized
are also commonly used in the study of cancer and analysis are also available online. Vital to cancer
systems biology. Omics data enable the study of com- systems biology research are computational platforms
plex pathways in cancer etiology. Example of online such as software for simulation, analysis, and method-
resources that provide information on omics data are ologies for modeling. For visualization, mining, anal-
the Biodatabase (http://biodatabase.org/) and the ysis, and modeling of biological networks, an
Omics World Websites (http://www.omicsworld. integrated software platform called VisANT
com/). The Biodatabase contains links to cancer gene (Hu et al. 2009) can be accessed at http://visant.bu.
databases, protein, genomic databases, and others, edu/. Additional resources on computational platforms
Cancer Stem Cell Kinetics 193 C
are also accessible at the Website of the Systems Language (SBML): a medium for representation and
Biology Institute (SBI, http://www.sbi.jp/index.htm) exchange of biochemical network models. Bioinformatics
19:524–531
based in Japan. SBI is a nonprofit private research National Cancer Institute (NCI) (2005) SEER surveillance, epi-
institution that promotes systems biology research demiology and end results program. National Institutes of
and its application to medicine and global sustainabil- Health, Bethesda
ity. They have played a major part in the development National Cancer Institute (NCI) (2009) The cancer biomedical
informatics grid caBIG resource guide. National Institutes of
of the Systems Biology Markup Language (SBML, Health, Bethesda C
Hucka et al. 2003) which is used in software packages Wang E (ed) (2010) Cancer systems biology, CRC mathematical
for graphical editing and simulation of molecular and computational biology series. CRC Press, Boca Raton
networks. Yan Q (ed) (2010) Systems biology in drug discovery and
development: methods and protocols. Humana Press/
One software which uses SBML is the CellDesigner Springer, New York
(http://celldesigner.org/), a structured diagram
editor for drawing gene regulatory and biochemical
networks (▶ Gene Regulatory Networks).

Cancer Stem Cell Kinetics


Cross-References
Heiko Enderling
▶ Cancer Center of Cancer Systems Biology, St. Elizabeth’s
▶ Cancer Systems Biology Medical Center - CBR 115D, Tufts University School
▶ Cytogenetics of Medicine, Boston, MA, USA
▶ Disease Databases
▶ DNA Microarrays
▶ DNA Sequencing Synonyms
▶ Entrez
▶ Gene Expression Cancer-initiating cells; Tumor stem cells; Tumor-
▶ Gene Regulatory Networks initiating cells; Tumor-rescuing units
▶ Genome-wide Association Study
▶ Histopathology
▶ KEGG Pathway Database Definition
▶ Pathways
▶ Pharmacogenomics Cancer stem cells possess seemingly unlimited prolif-
▶ Proteomics eration capacity and the ability to give rise to
a heterogeneous population of cancer stem cells
and mortal non-stem daughters (Al-Hajj et al. 2003).
During mitosis, cancer stem cells may undergo either
References Asymmetric Division to simultaneously self-renew
Cesario A, Marcus F (eds) (2011) Cancer systems, biology,
and generate one new non-stem cancer cell, or
bioinformatics and medicine: research and clinical Symmetric Division to yield two identical cancer
applications. Springer, Dordrecht stem cells (Morrison and Kimble 2006; Dingli et al.
Forbes SA, Bhamra G, Bamford S, Dawson E, Kok C, Clements J, 2007). Over many generations, a heterogeneous
Menzies A, Teague JW, Futreal PA, Stratton MR (2008) The
population emerges, within which cancer stem cells
catalogue of somatic mutations in cancer (COSMIC). Curr
Protoc Hum Genet doi:10.1002/0471142905.hg1011s57 are uniquely able to initiate, sustain, and reinitiate
Hu Z, Hung J, Wang Y, Chang Y, Huang C, Huyck M, DeLisi C tumors after cytotoxic insults and/or at distal sites.
(2009) VisANT 3.5: multi-scale network visualization, anal- Cancer stem cell longevity and self-renewal is, in
ysis and inference based on the gene ontology. Nucleic Acids
Res 37:W115–W121
part, due to upregulation of telomerase that prevents
Hucka M, Finney A, Sauro HM, Bolouri H, Doyle JC, telomere erosion (Blackburn and Gall 1978; Bodnar
Kitano H et al (2003) The Systems Biology Markup et al. 1998).
C 194 Cancer Stem Cell Kinetics

Characteristics (only for non-stem cancer cell generations), and


a cell death term:
While cancer stem cells can be an infinitesimal sub-
population of the whole tumor (in theory, one cell), the dC
cancer stem cells C : ¼ pdC  aC
cancer stem cell hypothesis does not exclude cancer dt
stem cells being the dominant phenotype in a tumor or 1st generation non - stem cancer cells N:
even the trivial case of the cancer stem cell subpopu- dN1
¼ ð1  pÞdC  g1 N1  b1 N1
lation being equal to the total tumor population. Let us dt
denote the total tumor population with T and the pop- 2nd generation non - stem cancer cells N:
ulation of cancer stem cells with C. Then the cancer dN2
stem cell hypothesis can be expressed using set theory ¼ 2g1 N1  g2 N2  b2 N2
dt
as C 2 T. All sets of C shown in Fig. 1 are subsets () :
or true subsets () of T:
:
Cancer Stem Cell Hierarchy :
It is often argued that tumors follow a unidirectional n1st generation non - stem cancer cells N:
hierarchy (Fig. 2). Cancer stem cells sit on top of the dNn1
¼ 2gn2 Nn2  gn1 Nn1  bn1 Nn1
lineage tree and give rise to mortal non-stem cancer dt
cells with limited proliferation capacity. Let us denote nth generation non - stem cancer cells N:
the ith generation non-stem cancer cells with Ni. The dNn
kinetics of the cancer stem cell and non-stem cancer ¼ 2gn1 Nn1  bn Nn ;
dt
cell populations can be described by a term for prolif-
eration (only for cancer stem cells), gain from the where p is the probability of symmetric cancer stem
previous generation compartment and loss due to cell division, d, a and gi, bi are proliferation and apo-
advancement into the next generation compartment ptosis rates of cancer stem cells and ith generation of

Cancer Stem Cell Kinetics,


Fig. 1 Different cancer stem
cell (C) subsets of a tumor
population (T)

Cancer Stem Cell Kinetics,


Fig. 2 Tumor hierarchy
initiated by self-renewing
cancer stem cells (round red
cell, left)
Cancer Systems Biology 195 C
non-stem cancer cells, respectively. n is the number of Morrison SJ, Kimble J (2006) Asymmetric and symmetric stem-
divisions non-stem cancer cells can undergo before cell divisions in development and cancer. Nature
441(7097):1068–1074
cell death (proliferation capacity). The parameters d Morton CI, Hlatky L, Hahnfeldt P, Enderling H (2011) Non-stem
and gi can be replaced by a function of tumor size f(C + cancer cell kinetics modulate solid tumor progression. Theor
∑Ni) to account for kinetics, including linear, logistic Biol Med Model 8(1):48
or Gompertzian growth.
C
Parameter Estimation
It is evident that without cancer stem cells, the popu- Cancer Systems Biology
lation will inevitably die out. Furthermore, tumors can
only progress when cancer stem cell proliferation and Zhihui Wang and Thomas S. Deisboeck
symmetric division rate is higher than cell death rate, Harvard-MIT (HST) Athinoula A. Martinos Center for
i.e., pd > a, and any increase in cell death rates a and bi Biomedical Imaging, Massachusetts General Hospital,
yields a reduction in tumor size. For pd ¼ a and pd > 0, Charlestown, MA, USA
the tumor is mitotically active at the cellular level but
may exhibit macroscopic tumor dormancy. The fre-
quency of cancer stem cells in the population is depen- Definition
dent on the values of probability of symmetric cancer
stem cell division p, cancer stem cell death rate a, non- Tumors are heterogeneous cellular entities whose
stem cancer cell proliferation and death rates gi and bi, growth is dependent on dynamical interactions
as well as proliferation capacity of non-stem cancer between cancer cells themselves and the continually
cells n (Enderling et al. 2009; Morton et al. 2011). With varying microenvironment these cells live in (Bissell
adequately chosen parameters, the frequency of cancer and Radisky 2001). It is very difficult if not impossible
stem cells varies enormously (Fig. 1). Experimentation to investigate these dynamic interactions by using wet-
and empirical observations are essential to reliably lab experiments only because experimental complex-
parameterize the kinetics of all cells in the system. ity usually restricts observations to single or very lim-
ited spatial and/or temporal scales without allowing to
reproducibly and quantitatively analyze relationships
Cross-References between these scales. Hence, a more recent paradigm
shift within the cancer research community is to rec-
▶ Asymmetric Cell Division ognize and study cancer as a systems disease.
▶ Symmetric Cell division Cancer systems biology is an emerging multidis-
ciplinary field that focuses on the systemic understand-
ing of cancer initiation and progression by
References investigating how individual components interact and
collaborate to give rise to the function and behavior of
Al-Hajj M, Wicha MS, Benito-Hernandez A, Morrison SJ, the tumor system as a whole (Deisboeck et al. 2001). In
Clarke MF (2003) Prospective identification of tumorigenic addition to conventional biomedical experiments,
breast cancer cells. Proc Natl Acad Sci USA
a systems approach often involves computational
100(7):3983–3988
Blackburn EH, Gall JG (1978) A tandemly repeated sequence at modeling to help decipher the massive amount of
the termini of the extrachromosomal ribosomal RNA genes data generated today especially in molecular and cell
in Tetrahymena. J Mol Biol 120(1):33–53 biology. In silico cancer models are necessarily sim-
Bodnar AG, Ouellette M, Frolkis M, Holt SE, Chiu CP, Morin
GB, Harley CB et al (1998) Extension of life-span by intro-
plified, yet are beginning to adequately represent spe-
duction of telomerase into normal human cells. Sci (NY) cific cancer phenomena. In fact, it has been
279(5349):349–352 increasingly recognized that such a theoretical
Dingli D, Traulsen A, Michor F (2007) (A)symmetric stem cell approach may help to simulate, predict, and optimize
replication and cancer. PLoS Comput Biol 3(3):e53
Enderling H, Hlatky L, Hahnfeldt P (2009) Migration rules:
procedures, experiments, and therapies, to test and
tumours are conglomerates of self-metastases. Br J Cancer refine hypotheses. The development of a successful in
100(12):1917–1925 silico cancer model is expected to be iteratively
C 196 Cancer Theories

conducted, with available experimental data used to Characteristics


guide, support, and shape the model design, and to
verify and validate model results. John Cairns (1997) rightfully stated: “Biology
The discrepancy between the in vitro and the clin- and cancer research have developed together. Invari-
ical settings has significantly affected the ability of ably, at each stage, the characteristics of the
researchers to design and develop successful cancer cancer cell have been ascribed to some defect in
therapeutics (Khalil and Hill 2005). Hence, one of the whatever branch of biology happens at the time to
most important challenges facing cancer researchers be fashionable and exciting. . .” This statement pro-
currently is how to better translate in vitro discoveries nounced 15 years ago applies to the status of cancer
to clinical application. Data-driven, predictive cancer research today, and thus justifies a brief historical
systems models can help bridge this gap and expedite note to introduce the two currently competing
the development of effective cancer therapeutics. theories of carcinogenesis, namely, the somatic muta-
Although still at an early stage, cancer systems biology tion theory and the tissue organization theory of
has begun to provide useful insights into the disease carcinogenesis.
mechanisms involved by providing a systematic and
integrative framework for incorporating data, generating Historical Perspective
testable hypotheses, and guiding further experiments. The modern view of ▶ neoplasms can be traced
back to the second half of the nineteenth century,
following the advent of the cell theory and the
Cross-References development of modern pathology spearheaded
by Virchow, Thiersch and Waldeyer, in Germany.
▶ Multilevel Modeling, Cell Proliferation In the view of these German pathologists, the bases
of organismic order resulted from embryonic
development and were due to interactions of multiple
References mutually dependent systems within the organism.
Neoplasms were viewed as a disorder of these
Bissell MJ, Radisky D (2001) Putting tumours in context. Nat normally interdependent parts (Moss 2003;
Rev Cancer 1:46–54
Deisboeck TS, Berens ME, Kansal AR, Torquato S, Stemmer-
Sonnenschein and Soto 2008).
Rachamimov AO, Chiocca EA (2001) Pattern of self- All along the twentieth century, the full range of
organization in tumour systems: complex growth dynamics phenotypic potentials available to cells and tissues
in a novel brain tumour spheroid model. Cell Prolif interacting within the organism was gradually
34:115–134
replaced by a more reductionist conceptualization of
Khalil IG, Hill C (2005) Systems biology for cancer. Curr Opin
Oncol 17:44–48 vital processes (Moss 2003). This shift from the
organismic view to a new cell-based and eventually
gene-based view started in 1914 when Theodor
Boveri claimed in his book entitled The Origin of
Cancer Theories Malignant Tumors that “the problem of tumors is
a cell problem” and that cancer was due to “a certain
Ana M. Soto and Carlos Sonnenschein permanent change in the chromatin complex” which,
Department of Anatomy and Cellular Biology, Tufts “without necessitating an external stimulus, forces
University School of Medicine, Boston, MA, USA the cell, as soon as it is mature, to divide again”
(Boveri 1929). Ever since, cancer was considered as
a problem of ▶ control of cell proliferation due to
Definition permanent changes in the “chromatin”, which in
Boveri’s time was already known to contain the her-
Scientific theories serve the purpose of making the itable material. Boveri’s theory represents the precur-
world intelligible. They provide organizing principles sor of what became to be known later in the twentieth
and construct objectivity by framing observations and century as the ▶ somatic mutation theory of carcino-
experiments. genesis (SMT).
Cancer Theories 197 C
The Somatic Mutation Theory The Tissue Organization Field Theory
The premises adopted by the SMT are: (1) cancer is An alternative theory of carcinogenesis, namely, the
derived from a single somatic cell that, over time, tissue organization field theory (TOFT) is based on
accumulate multiple DNA mutations, (2) the default principles similar to those adopted by cancer researchers
state of cells in metazoa is quiescence, and (3) cancer is at the end of the nineteenth century. The TOFT directly
a disease of the ▶ control of cell proliferation caused challenges the core premises of the SMT by positing
by mutations in genes that affect steps within the cell that (1) carcinogenesis, like histogenesis (the formation C
cycle (Hahn and Weinberg 2003). During the last of tissues) and organogenesis (the formation of organs),
decades of the twentieth century until now, the SMT is a supracellular, emergent phenomenon occurring at
became the dominant view in carcinogenesis. the tissue level of biological organization. From this
From this perspective, cancer was perceived as being perspective, regardless of its many causes, cancer
a disease caused by mutations in the DNA of a single becomes “development gone awry.” An additional fun-
founder cell and, therefore, cancer has been considered damental premise of the TOFTs is that (2) as in unicel-
as a clonal disease. lular organisms and metaphyta, the default states of
In spite of the dominance of the SMT, until the metazoan cells, are proliferation and motility. By “the
advent of the ▶ oncogene hypothesis in the 1970s, default state of cells is proliferation” is meant that the
there were several competing theories to explain state that cells assume in the presence of abundant
cancer; they differed at the level of biological orga- building blocks (nutrients) needed to synthesize new
nization in which carcinogenesis occurred. Thus, dis- cells is proliferation. The TOFT stems from a body of
cussions centered on whether cancer was either literature showing that cells that once belonged to
a problem of ▶ control of cell proliferation and/or a neoplasm could be “normalized” upon recombination
of cell differentiation, or else, of tissue organization. with normal tissue. The SMT precludes this eventuality
From the 1950s to the 1970s, D.W. Smithers, C. (Sonnenschein and Soto 2008).
Dawe, J.W. Orr, B. Mintz, G.B. Pierce, and several
others offered compelling evidence for carcinogene- The Hybrid Theories of Carcinogenesis
sis to be considered as a tissue-based phenomenon During over 40 years of dominance by the original and
akin to development gone awry (Soto and the updated SMT (the oncogene hypothesis), hundreds
Sonnenschein 2012). Notwithstanding, around the of oncogenes and dozens of suppressor genes have
1960s and 1970s, the reductionist view became dom- been described (Hanahan and Weinberg 2011). Origi-
inant based on in vitro transformation assays nally, Hanahan and Weinberg, proposed what they
which replaced the animal models that were used called the Hallmarks of Cancer, i.e., “. . .the basic
until then. Philosophers, historians, and sociologists rules governing the neoplastic transformation of
of biology have advanced explanations about how normal human cells”. These originally six rules postu-
this view achieved dominance. One of them, Joan lated that the properties of cancer cells were “. . .to
Fujimura, proposed a sociological explanation that generate their own mitogenic signals, to resist
connected the interests of molecular biologists and exogenous growth-inhibitory signals, to evade apopto-
the rise of the genetic engineering biotechnology sis, to proliferate without limits (i.e., to undergo
industry into constructing “doable problems” such immortalization),to acquire vasculature (i.e., to
as “Are there molecular changes in the cellular undergo angiogenesis), and in more advanced cancers,
proto-oncogenes of tumor cells?” Michel Morange to invade and metastasize.” Later on, two extra rules
proposed, instead, an epistemic explanation whereby were added, namely: reprogramming of energy metab-
the oncogene hypothesis became attractive to olism and evading immune destruction (2011).
researchers because it suggested a straight-forward Central to their original and to their updated view
research program. In this context, the products of remains the idea that carcinogenesis is a cell-based
oncogenes participated in signal transduction problem due to mutations that cause the founder can-
conveying messages from the extracellular milieu cer cell and its progeny to “proliferate without limits”.
through membrane receptors to the nucleus and In his analysis of the metaphysical presuppositions
these messages regulated ▶ cell proliferation (Soto and scientific practices in cancer research, the philos-
and Sonnenschein 2012). opher James Marcum (2005) observed an ideological
C 198 Cancer-initiating Cells

and practical shift among the supporters of the SMT Enderling H, Hahnfeldt P (2011) Cancer stem cells in solid
who adopted concepts of the TOFT, shared by other tumors: is ‘evading apoptosis’ a hallmark of cancer? Prog
Biophys Mol Biol 106:391–399
researchers (Xu et al. 2009). For example, for those Hahn WC, Weinberg RA (2003) Rules for making human tumor
who are committed to explaining carcinogenesis cells. N Engl J Med 347:1593–1603
through a sub-cellular (mutational) strategy, the causal Hanahan D, Weinberg RA (2011) Hallmarks of cancer: the next
role of the ▶ stroma in carcinogenesis is transformed generation. Cell 144:646–674
Marcum JA (2005) Metaphysical presuppositions and scientific
into the problem of how mutated genes can affect the practices: reductionism and organicism in cancer research.
interactions of the cancer cells harboring these genes Int Stud Philos Sci 19:31–45
with otherwise normal neighboring cells. This Moss L (2003) What genes can’t do. MIT Press, Cambridge, MA
“hybrid” theory has been criticized on the grounds Sonnenschein C, Soto AM (2008) Semin Cancer
Biol 18:372–377
that the premises of the SMT and the TOFT contradict Soto AM, Sonnenschein C (2012) Is systems biology
each other (Soto and Sonnenschein 2012). a promising approach to resolve controversies in cancer
research? Cancer Cell Int 12:12
Systems Biology, Heuristics and Theory Testing Xu R, Boudreau A, Bissell MJ (2009) Tissue architecture and
function: dynamic reciprocity via extra- and intra-cellular
Systems Biology offers an opportunity to explore the matrices. Cancer Metastasis Rev 28:167–176
▶ heuristic value and usefulness of the two main the-
ories of carcinogenesis. Mathematical modeling and
computer simulation enables researchers the boldness
to choose premises and temporarily reject data sets Cancer-initiating Cells
without having to commit prematurely to a program
of expensive and time-consuming ‘wet’ experiments. ▶ Cancer Stem Cell Kinetics
This exploratory role of Systems Biology may become
central to breaking the habit of fixing lacks of fit with
ad hoc explanations instead of taking a bold and
critical look at the premises that were adopted Canonical Model
(Enderling and Hahnfeldt 2011; Soto and
Sonnenschein 2012). Eberhard O. Voit
In an ultimate analysis, the passage of time will be The Wallace H. Coulter Department of Biomedical
the final arbiter in deciding which of the two main Engineering, Georgia Institute of Technology and
theories of carcinogenesis has been most useful in Emory University, Atlanta, GA, USA
organizing principles and constructing objectivity by
framing observations and experiments.
Definition

Cross-References A canonical model is a mathematical representation of


a biological phenomenon that is constructed according
▶ Cell Proliferation to strict rules and therefore has a regularly structured
▶ Heuristic Optimization format. In contrast to ad hoc models, which are com-
▶ Neoplasms posed from a variety of known or alleged functions that
▶ Oncogene seem to be best suited for the specific modeling pur-
▶ Somatic Mutations pose, each canonical model is obtained from a single
▶ Stroma type of approximation for all processes within the
modeled system. Linear models are canonical, because
all processes are represented with linear functions.
References Nonlinear canonical models include S-systems,
Generalized Mass Action systems, Lotka-Volterra
Boveri T (1929) The origin of malignant tumors. Williams &
Wilkins, Baltimore
systems, and lin-log models. Canonical models are
Cairns J (1997) Matters of life and death. Princeton University particularly useful if specific mechanisms or process
Press, Princeton descriptions are not known.
Canonical Network Motifs 199 C
enumeration algorithms are described in detail in
Canonical Network Motifs Konagurthu and Lesk (2008a). A study analyzing the
preponderance of various subgraph patterns in large
Arun S. Konagurthu1 and Arthur M. Lesk2 natural networks can be found in Konagurthu and Lesk
1
Clayton School of Information Technology, (2008b).
Monash University, Clayton, VIC, Australia
2
Department of Biochemistry and Molecular Biology, Common Subgraph Patterns in Biological C
and Huck Institutes of Genomics, Proteomics, and Networks
Bioinformatics, Pennsylvania State University, These patterns represent some of the commonly enu-
University Park, PA, USA merated subgraph patterns in biological networks:
feed-forward loop (FFL), 3-Cycle (3-CYC), single-
input motif (SIM), multiple-input motif (MIM), and
Synonyms bifan. (See Fig. 1.)
The following statements formally define these sub-
3-Cycle; Bifan; FFL; MIM; Network building blocks; graph patterns. The definitions are, in a sense, “orthog-
SIM; Subgraph patterns onal.” That is, our definitions are chosen to avoid
ambiguity in classification of any subgraph of
a network. (Usage in the literature often allows, for
Definition example, MIMs to contain SIMs as subgraphs, leading
to difficulty in formulating the problem of motif
A system-wide view of complex interactions involved enumeration.)
in biological processes finds a natural description as Feed-forward loop: (Fig. 1a) A FFL is a set of three
networks or directed graphs. Systematic enumeration vertices (source, intermediate, and target) with one
of small subgraphs in various biological networks has direct path, and another indirect path through an inter-
identified preponderant patterns, which are often mediate vertex, from source to target.
described as motifs. Many authors accept the follow- 3-Cycle: (Fig. 1b) A 3-CYC is a three-vertex
ing as a canonical set of network motifs: single-input directed cyclic graph.
motif (SIM), multiple-input motif (MIM), and feed- Single-input motif: (Fig. 1c) A SIM in a large
forward loop (FFL). There is consensus that motifs directed graph G(V, E), with a vertex and directed
are “building blocks” of various biological networks, edge (arc) set V and E respectively, is a bipartite sub-
naturally selected to contribute to their functional graph G0 (P [ C, E0 ) containing two disjoint sets
architecture. In some networks, especially in those (layers) of vertices P and C (denoting parent and
describing genome-wide transcription regulation, spe- child vertex sets, respectively), subject to the following
cific functional roles have been suggested for motifs constraints:
(Shen-Orr et al. 2002; Zaslaver et al. 2004; Mangan 1. |P| (¼ the number of elements of P) and |C|  2.
et al. 2006). For example, SIMs are commonly asso- 2. There is an outgoing edge from the vertex in P to
ciated with temporal ordering of gene expression, every vertex in C. (E0 is the edge set containing these
MIMs with combinatorial gene regulation, and arcs.)
FFLs with filters that do not pass on transient signals. 3. There are no arcs between any two vertices in C.
These functions depend not only on the topology of 4. There are no incoming edges to any vertex in C from
the subgraph, but on the logic at nodes receiving any other vertex (outside the SIM) in the set of
multiple inputs. vertices V  P  C 2 G. (Note that this constraint
does not prohibit outgoing edges from any vertex in
C to a vertex outside of the SIM.)
Characteristics 5. For the singleton vertex in P, C is a maximal set
under the above constraints. (That is, C cannot be
There follows a formal description of commonly enu- extended by conforming to the above constraints.)
merated subgraph patterns in various biological net- Multiple-input motif: (Fig. 1d) A MIM in G ¼ (V, E)
works. The description of these patterns and their is a bipartite subgraph G0 ¼ (P [ C, E0 ) containing two
C 200 Canonical Network Motifs

a b c d e

Canonical Network Motifs, Fig. 1 (a) Feed-forward loop; (b) 3-Cycle; (c) single-input motif; (d) multiple-input motif; (e) Bifan

disjoint sets (layers) of vertices P and C subject to the Enumeration


following constraints: Enumerating Feed-forward loops: FFLs can be enu-
1. |P|  2 and |C|  2. merated in a number of ways. A simple procedure to
2. There are outgoing edges from every vertex pi 2 P enumerate all FFLs in a directed graph G(V, E) is as
to every vertex cj 2 C. (E0 is the edge set containing follows: At each vertex vi 2 V, for each pair of outgo-
these arcs.) ing edges from vi to distinct vertices vj and vk in G,
3. There are no arcs between any two vertices in P. count a FFL if there is an arc from either vj to vk or
4. There are no arcs between any two vertices in C. vice versa.
5. Both P and C are maximal sets under the above Enumerating 3-Cycles: Enumerating 3-CYCs is
constraints. similar to the procedure above. At every vertex vi, for
Bifans: (Fig. 1e) We note that MIMs arose as a gen- each pair of outgoing and incoming arcs to-vj and
eralization of bifans in the literature. To enforce from-vk respectively, count a 3-CYC if there is an arc
orthogonality of motif definition, consider a bifan from vj to vk. Note that, since a 3-Cycle has an auto-
only as a maximal 2  2 MIM, where |P| ¼ |C| ¼ 2. morphism number of 3 – that is, there are three
(See section below for more details.) different permutations of vertices giving an indistin-
guishable connectivity of 3-CYC – this procedure
Maximality and Orthogonality of Motif Definitions results in overcounting 3-CYCs by three times. This
We emphasize that the criterion of maximality is cru- can be avoided if the 3-CYC list is generated under
cial to maintain independence (orthogonality) in enu- some canonical ordering of vertex labels.
merating the above motifs. In case of SIM, the set C is Enumerating Single-input motifs: For each vertex vi
maximal, whereas with MIMs, both P and C sets are in the full graph G ¼ (V, E) find the set C(vi) Ci of all
maximal. Without this constraint, it is easy to overcount its children – the set of vertices that are targets of an arc
non-maximal subgraphs of the type MIMs and SIMs. from vi. Shrink the set Ci by eliminating vertices which
For example, a maximal MIM with p parents and c have incoming edges from vertices outside vi [ Ci. The
children contains [2p  (p + 1)]  [2c  (c + 1)]  1 reduced Ci, provided it contains at least two vertices,
easily enumerable non-maximal “sub-MIMs,” includ- forms the candidate set from which maximal indepen-
   
p c dent sets of vertices – that is, set of vertices without
ing  non-maximal bifans. Similarly,
2 2 links between them – are extracted.
a maximal SIM with one parent and c children contains Use Bron and Kerbosch algorithm (Bron and
2c  c  2 non-maximal “sub-SIMs.” Kerbosch 1973) for finding all cliques in graphs as
The definitions of the SIM exclude the counting of follows. (A clique is a maximal complete subgraph,
subgraphs of MIMs as SIMs. Without the in-degree a subgraph with an edge between every two vertices.)
constraint (4) on a SIM, every MIM containing p parents Consider the undirected complement of the graph
and c children would contain p  (2c  (c + 1)) induced by reduced Ci. This is created by deleting all
subgraphs that form either a SIM or a subset of a SIM edges originally present, and introducing an undirected
with two or more children (in set C). edge between any two vertices originally unconnected.
Canonical Pathway of microRNA Biogenesis 201 C
Then the condition that a subgraph of the reduced Ci Note on densely overlapping regulons: Shen-Orr
set has no edges becomes that the corresponding sub- et al. (2002) introduced densely overlapping regulon
graph of the inverted graph is completely connected. (DOR). DORs can be thought of as an incomplete
Each possible clique (of size 2 vertices) in the com- MIM, from which some of the arcs between P and C
plement graph gives us the disjoint set C of a SIM for are absent.
every vertex vi P, satisfying all the constraints in its The method for enumerating MIMs described
definition. above can be extended to count DORs. The sets P C
Enumerating Multiple-input motifs: For each vertex and C are first calculated such that there is an arc
vi 2 V find the set Ci of all its children. Let P(c) be the between every node in P and every node in C. Subse-
set of vertices (“parents” of vertex c) which have an quently, relaxing the constraint of having “complete”
incoming edge to c. For each pair c1, c2 2 Ci, find the sets of edges between the two sets, P and C can be
maximal subset of nodes that are parents of c1 and c2, iteratively extended such that they now include nodes
that is, the intersection of the parents of c1 and the that share at least a certain user-defined threshold on
parents of c2: P(c1) \ P(c2). number of edges between the sets. Once extended, the
By considering all pairs of elements from the child edges between P and C can be assumed to be complete,
sets derived from every node, we build up a list of child and DORs (free of parent-parent and child-child edges)
and corresponding parent sets such that all extracted using the procedure discussed earlier.
parents have arcs to all children. Initially, c1, c2 and
P(c1) \ P(c2) form a child and corresponding parent set
(provided that P(c1) \ P(c2) contains 2 nodes). If References
another pair c3, c4 gives us the same intersection of
parents – P(c1) \ P(c2) ¼ P(c3) \ P(c4) – we merge the Bron C, Kerbosch J (1973) Finding all cliques of an undirected
graph. Commun ACM 16:575–577
child sets, updating our list to contain: c1, c2, c3, c4 and
Konagurthu A, Lesk A (2008a) Single and multiple input mod-
P(c1) \ P(c2) (¼ P(c3) \ P(c4)). We build up maximal ules in regulatory networks. Proteins: Struct Funct Bioinf
child and corresponding parent sets. The process is 73:320–324
repeated for all pairs in the parent sets that share the Konagurthu A, Lesk A (2008b) On the origin of distribution
patterns of motifs in biological networks. BMC Syst Biol
same children.
2:73+
This step has built up candidate MIMs that are Mangan S, Itzkovitz S, Zaslaver A, Alon U (2006) The incoher-
complete as far as parent-child arcs are concerned. ent feed-forward loop accelerates the response-time of the
The next step is finding subsets of candidate MIMs gal system of Escherichia coli. J Mol Biol 356:1073–1081
Shen-Orr SS, Milo R, Mangan S, Alon U (2002) Network motifs
free of parent-parent and child-child edges.
in the transcriptional regulation network of Escherichia coli.
In the induced undirected version of the graph Nat Genet 31:64–68
containing parent and child sets of a candidate MIM, Zaslaver A, Mayo AE, Rosenberg R, Bashkin P, Sberro H,
consider the parent set and “invert” the edges between Tsalyuk M, Surette MG, Alon U (2004) Just-in-time
transcription program in metabolic pathways. Nat Genet
vertices in this set by deleting all originally present
36:486–491
edges, and introducing an edge between any two nodes
originally unconnected. Repeat the same procedure for
the vertices in the child set. Preserve the edges (ignor-
ing the direction) linking vertices in the parent set to Canonical Nonlinear Modeling
nodes in the child set – we know by construction that
there is an edge between every parent and every child ▶ S-System
node.
Then the problem of finding MIMs – maximal
subsets of parent and child graphs, that contain
no interparent edges and no interchild edges but Canonical Pathway of microRNA
all parent-child edges – is equivalent to finding Biogenesis
all cliques in this transformed graph of each candidate
MIM. ▶ MicroRNA Biogenesis, Regulation
C 202 Capacity

layers of epithelium but do not breach the basement


Capacity membrane. They may or may not progress to
malignancy.
▶ Disposition

Cross-References
Capacity to Evolve ▶ Cancer Pathology

▶ Evolvability, Generalized Biology

Case-based Reasoning
Capillary Electrophoresis Mass
Spectrometry–based Metabolomics Eyke H€ullermeier, Thomas Fober and Marco
Mernberger
▶ CE-MS-based Metabolomics Philipps-Universit€at, Marburg, Germany

Definition
Carbon Assimilation
Case-based reasoning (CBR) is a computational prob-
▶ Photosynthesis lem solving methodology offering a principled
approach to knowledge engineering and knowledge-
based systems design. Being rooted in cognitive psy-
chology, CBR essentially seeks to formalize a problem
Carboxyfluorescein Succinimidyl Ester solving paradigm upon the basis of a simple rule of
(CFSE) thumb, namely, that “similar problems tend to have
similar solutions” (Kolodner 1993). More specifically,
▶ Lymphocyte Labeling, Cell Division Investigation the idea of CBR is to store and exploit the experience
from similar problems in the past, and to adapt then
successful solutions to the current situation. Thus,
the core of every case-based problem solver is the
Carcinogens
case base or case library, which is a collection of
memorized “chunks of experience” called cases.
▶ Xenobiotics
Moreover, the concept of similarity plays a key role
in CBR systems. Inference in CBR can be considered
as a specific form of analogical reasoning.
Carcinoma In Situ

Barbara J. Davis Characteristics


Section of Pathology, Tufts Cummings School of
Veterinary Medicine Biomedical Sciences, CBR is a subfield of artificial intelligence (AI) (see
North Grafton, MA, USA ▶ Evolutionary Algorithms) with close connections to
machine learning, information retrieval (see ▶ Identi-
fication of Gene Regulatory Networks, Machine
Definition Learning), databases, semantic web, and ▶ knowledge
management. It combines methods from these and
A specific designation for neoplasia derived from related areas to tackle specific problem tasks, such as
epithelial dysplastic changes that occur in all the diagnosis, planning, product recommendation, and
Case-based Reasoning 203 C
experience management (Bergmann et al. 2003). From planning of crystallization (Jurisica and Glasgow
a knowledge-based systems perspective, CBR can be 2004), or more recently the analysis of microarray
seen as an alternative to the rule-based paradigm for data (Bangpeng and Shao 2010).
expert systems design. In this regard, it offers a number
of advantages, especially from a knowledge acquisi- The CBR Cycle
tion and maintenance point of view. In fact, knowledge Despite the diversity of concrete implementations, the
in the form of individual cases (often specified as essentials of the CBR methodology are captured by C
problem/solution tuples (x, y), indicating that y is the a surprisingly simple and uniform process model,
best or at least a sufficiently good solution to the referred to as the CBR cycle (Aamodt and Plaza
problem x) is in general more readily available than 1994). It consists of four sequential steps organized
generally valid rules, which are difficult to elicit from around the knowledge of the CBR system: retrieve,
human experts. Moreover, whereas rule-based infer- reuse, revise, and retain (see Fig. 1).
ence normally requires a knowledge base to be com- Problem solving starts when a new problem (the
plete and correct in a logical sense, analogical and query) must be solved. First, in the retrieve phase,
similarity-based inference are more tolerant toward one or several cases with the highest similarity to the
the incompleteness and inconsistency of knowledge query are selected from the case base, where similarity
(H€ullermeier 2007). is determined by an underlying similarity measure.
Applying CBR principles in a machine learning Since the efficiency of the retrieval step critically
context, for example, for ▶ classification, comes depends on the size of the case base, a branch of
down to realizing “lazy,” instance-based instead of CBR research deals with methods that improve
“eager,” model-based learning (like rule induction). retrieval efficiency through the use of specific index
The nearest neighbor method can be seen as the sim- structures.
plest form of case-based reasoning and is a typical In the subsequent reuse phase, the solutions of the
example of the so-called structural approach to CBR. retrieved cases are adapted according to the require-
In structural CBR, cases are represented according to ments of the query. In simple tasks like ▶ classifica-
a structured vocabulary, that is, an ▶ ontology. The tion, where the number of solutions (classes) is limited
describing features of a case can be organized as flat and typically much smaller than the number of cases,
attribute-value tables, in an object-oriented manner, as no adaptation is needed at all. On the other hand,
graph structures, or by sets of atomic formulas of for synthetic tasks such as configuration of technical
a logic language. Although the structural approach systems or planning, where the solution space is
is most widely used in practice and arguably most potentially infinite and exceeding the number of avail-
relevant for the biological sciences, it is worth men- able cases, solution adaptation becomes essential.
tioning the existence of other types of CBR, such Several techniques for adaptation in CBR, including
as textual and conversational CBR, which can differ transformational and generative adaptation, have been
significantly with regard to case representation and proposed so far (Lopez Mantaras et al. 2005).
reasoning. In the revise phase, the solution determined so far is
By reasoning on the basis of specific cases instead verified and possibly corrected or improved, for exam-
of first principles, CBR is of particular relevance for ple, through intervention by a domain expert. Feed-
scientific fields that are characterized by a lack of such back can be given in the form of a rating of the result or
principles, and which are more data- than theory- a manually corrected revised case.
driven. From this point of view, CBR is especially Finally, the retain phase adopts the feedback
interesting for the biological and life sciences, includ- from the revise phase and updates the knowledge. In
ing systems biology, in which data abounds, analogi- particular, by adding the revised case to the case
cal/similarity-based reasoning is omnipresent, and base and thus making the new problem solving expe-
theory formation is largely founded on individual rience available for future problem solving episodes,
case studies. CBR systems have already been devel- a simple form of learning can be realized. However,
oped for a number of problems in this field, such as the continuous growth of the case base causes
gene finding, knowledge discovery in sequence data- a utility problem as it may compromise retrieval
bases, protein-structure determination and the efficiency. Explicit competence models have therefore
C 204 Case-based Reasoning

Case-based Reasoning, new problem


Fig. 1 The CBR cycle
RETRIEVE
new retrieved
case cases

RETAIN REUSE
prior
cases

Case Base
confirmed repaired solved suggested
solution case case solution
REVISE

been developed to enable the selective retention of As a means for modeling similarity functions, the so-
cases (Smyth and McKenna 2001). called local–global principle is widely used: The
similarity function is decomposed according to the
Representing and Structuring Knowledge in CBR vocabulary, in such a way that local similarity func-
A unified view of the knowledge contained in tions pertaining to individual attributes are used to
a (structural) CBR application employs the metaphor model the preference according to the corresponding
of a knowledge container, typically referring to four attributes. These local similarities are then aggregated
such containers: the vocabulary, the case base, the into the global similarity by means of an appropriate
similarity measure, and the adaptation knowledge. aggregation function.
The vocabulary (often called ontology today) is the
basis of all knowledge and experience representation
in CBR. The vocabulary defines the information enti- Cross-References
ties and structures (e.g., classes, relations, attributes,
data types) that can be used to represent cases, simi- ▶ Classification
larity measures, and adaptation knowledge. The case ▶ Evolutionary Algorithms
base is the primary form of knowledge in CBR. Tradi- ▶ Identification of Gene Regulatory Networks,
tionally, a case is considered as an instance in the Machine Learning
representation space defined by the vocabulary, for ▶ Knowledge Management
example, a vector in an attribute-value representation. ▶ Model Testing, Machine Learning
From a knowledge representation point of view, ▶ Ontology
vocabulary and case base representations are standard
applications of AI and database technology.
Since cases are selected based on their similarity to
the current problem, the similarity measure used in a References
CBR system encodes important knowledge of the
domain. It is usually formalized in terms of a function Aamodt A, Plaza E (1994) Case-based reasoning: foundational
issues, methodological variations, and system approaches.
that maps a pair of problem descriptions to a real AI Communications 7(1):39–59
number, often restricted to the unit interval, with the Bangpeng Y, Shao L (2010) ANMM4CBR: a case-based rea-
convention that a high value indicates a high similarity. soning method for gene expression data classification. Algo-
The semantics of such similarity degrees can be rithm Mol Biol 5(1):1–11
Bergmann R, Althoff KD, Breen S, G€ oker M, Manago M,
defined via the notions of preference and utility: Traph€oner R, Wess S (2003) Developing industrial case-
A higher degree of similarity suggests that a case based reasoning applications: the INRECA methodology.
is more useful for solving the current problem. Springer, Berlin
Causal Bayesian Networks 205 C
H€ullermeier E (2007) Case-based approximate reasoning. Formally, if one intervenes in a given causal
Springer, Berlin Bayesian network, say by setting a variable Xl to
Jurisica I, Glasgow J (2004) Applications of case-based reason-
ing in molecular biology. Artif Intell Mag 25(1):85–97 a fixed value xl, then the joint distribution is changed to
Kolodner JL (1993) Case-based reasoning. Morgan Kaufmann,
San Mateo Y
n

Lopez Mantaras R, McSherry D, Bridge D, Leake D, Smyth B, pðx1 ; :::; xl1 ; xlþ1 ; :::; xn j^
xl Þ :¼ pðxi jPa½xi Þ
Craw S, Faltings B, Lou Maher M, Cox MT, Forbus K, Keane i¼1
i6¼l C
M, Aamodt A, Watson I (2005) Retrieval, reuse, revision
and retention in case-based reasoning. Knowl Eng Rev
20:215–240 where x^l denotes the intervention in the given variable
Smyth B, McKenna E (2001) Competence models and the main- and thus random variables Xi with Xl as parent use the
tenance problem. Comput Intell 17(2):235–249 fixed value xl instead of the parent variable Xl. The
definition extends to sets of intervention variables.

Category Homomorphism Characteristics

▶ Functor Consider, as example, the Causal Bayesian Network


(see also ▶ Probabilistic Graphical Model)

X 1 ! X2 ! X4
Category Map
& " %
▶ Functor X3 ! X5

Intervening in X2 by setting it to the value x2, essen-


Causal Bayesian Networks
tially changes the network to:
Daniel Polani
X1 x 2 ! X4
Adaptive Systems Research Group, School of
Computer Science, University of Hertfordshire, & " %
Hatfield, UK X3 ! X5

Synonyms whose probability is to be read as:

Causal probabilistic graphical models pðx1 ;x3 ;x4 ;x5 j^


x2 Þ :¼ pðx1 Þpðx3 jx1 Þpðx4 jx2 ;x3 Þpðx5 jx3 Þ

Note that this intervention in X2 is the same as


Definition modifying the original Causal Bayesian network to
remove the incoming connections to X2, to fix the
Causal Bayesian networks are an extension of
variable’s value to x2, and considering the resulting
▶ probabilistic graphical models by the concept of
joint distribution of the other variables.
(causal) intervention which states how the joint
probability changes if variables in the network are
externally coerced into a given value. This can be
References
seen as modeling the influence of experimental
modification of a system as opposed to observing an Pearl J (2000) Causality: models, reasoning and inference.
unperturbed system. Cambridge University Press, Cambridge, UK
C 206 Causal Probabilistic Graphical Models

Causal Probabilistic Graphical Models Causality

▶ Causal Bayesian Networks Max Kistler


IHPST, Université Paris 1 Panthéon-Sorbonne,
Paris, France

Causal Relationship Synonyms

Katsuhisa Horimoto Causation


Computational Biology Research Center, National
Institute of Advanced Industrial Science and
Technology, Koto-ku, Tokyo, Japan Definition

Causality is a philosophical concept that expresses


Synonyms a relation or a process linking two (or more) distinct
events or facts, which consists in one bringing about
Bayes rule; Bayesian network model; Conditional dis- the other. Ordinary language contains many expres-
tribution; Conditional Independence; Correlation sions denoting causation: x makes y happen, x pro-
relationship vokes y, x produces y, x brings y about, x gives rise to
y, x induces y, etc. Many verbs express particular types
of causation, both in common sense and science: break,
Definition bend, link, release, trigger, modify, influence, produce,
transfer, catalyze, etc. Many philosophical analyses of
A causal relationship is when one variable causes the concept of causation have been developed but none
a change in another variable. In other words, when has won universal acceptance.
an event (the cause) occurs, the second event
(the effect) is understood as a consequence of the
first. Note that a causal relationship cannot be Characteristics
defined from a joint distribution of observed variables
alone, while any correlation (associational) relation- Causality (or causation) has always been a focus of
ship can be defined in terms of the distribution philosophical controversy. One debate, which is today
(Pearl 2009). as lively as ever, is about the correct analysis of the
concept: What is a causal relation, and what are its
terms: objects, events, or facts? However, since the
Renaissance, this debate coexists with a more funda-
Cross-References mental debate on whether causality is a legitimate
notion in science at all. I start with the latter issue,
▶ Bayesian Network Model and then present the most influential and relevant
▶ Conditional Independence accounts of causation.
▶ Correlation Relationship Contemporary debates about the legitimacy of the
concept of causation in science have been structured
by Russell’s 1912 essay “On the notion of cause.”
Here are two of his reasons for doubting that science
References
looks for or analyzes causal relations or processes.
Pearl J (2009) Causal inference in statistics: an overview. 1. The “principle of causality” says that “the same
Statistics Surveys 3:96–146 causes have the same effects.” In other words,
Causality 207 C
there are strict lawful regularities where instances Second, laws relate certain properties of objects and
of a first type of event, A, are invariably followed by events, not particular events that have many prop-
instances of a second type of event, B. There are two erties, whereas causes and effects are particular
reasons for doubting that such laws exist in events.
advanced sciences. The first is that events recur It is controversial whether Russell was right (1) on
only insofar as they are conceived vaguely: Once fundamental physics and on (2) whether all other
an event is described in a quantitatively precise sciences, including biology, would eventually come C
manner, as required by advanced science, the prob- to resemble fundamental physics in abandoning causal
ability of its recurrence tends toward zero. concepts (Price and Corry 2007). Independently of the
However, this claim is more plausible for macro- issue of that debate, it has often been argued that
scopic events such as throwing bricks than for the concept of causation is required to make sense,
microscopic events, such as the oxidation of not only of many common sense judgments but also
a carbon atom yielding a carbon dioxide molecule. of experimental and applied science.
The second reason holds for both macro and micro Several analyses of the meaning of causal
events: A type of event can only recur if it is nar- statements, both in science and common sense, have
rowly construed: Abstracting away from the cir- been elaborated in philosophy. Until the mid-twentieth
cumstances, which will never exactly recur, the century, the deductive-nomological (DN) analysis
event must be circumscribed to a narrow space- (definition) was generally accepted as an adequate
time region. The problem is that the narrow locali- account of both scientific explanation and causation.
zation of the first event A prevents the regularity of However, it has now been widely abandoned because it
its being followed by a second event B from being abolishes the distinction between causal and noncausal
strict: The circumstances being unspecified, noth- scientific explanations. Constitutive explanations seem
ing precludes them from containing factors that to be noncausal: The conformation of a macromole-
interfere with the processes leading from A to B. cule can be explained in terms of its constituents and
The regularity that As are followed by Bs may be the laws applying to the constituents in virtue of their
only a “ceteris paribus law” (see definition). properties, such as the distribution of electric charges.
2. The second reason is that science looks, and should Mechanical explanations can also be seen to be
look, for laws rather than for causes: The laws noncausal, insofar as they do not provide the cause of
discovered in advanced sciences have the form of some phenomenon, event, or fact, but show how the
functional equations, in particular differential parts of a complex system are articulated to guarantee
equations. Russell gives two reasons for why such the functioning of the whole system, i.e., its evolution
laws cannot be considered to be causal: First, func- from an initial state to an end state.
tional determination is independent of the direction Here is a brief sketch of the guiding ideas of the
of time, whereas causality is essentially directed most influential recent accounts of causation. They are
from past to future. In mechanics, the same laws, grounded on the following concepts respectively:
together with the description of a system at time t, (1) counterfactual dependence, (2) process, (3) proba-
can be used to calculate future and past states of the bility raising, (4) intervention or manipulation.
system. The same is true for reversible biochemical 1. The counterfactual approach develops the intuition
reaction equations (these are exceptional, most bio- that causes make a difference to their effects.
logically relevant reactions being irreversible). Its leading hypothesis is that the meaning of the
Moreover, laws expressing the functional depen- statement that event c is the cause of event e can
dence of different quantities in the equilibrium be analyzed in terms of counterfactual conditionals
state of a system, such as the ideal gas law such as: “If c had not occurred, e would not have
pV ¼ nRT or the equation indicating the equilib- occurred.” David Lewis (1973) has elaborated
rium constant of a biochemical reaction, cannot be a semantic analysis of the truth conditions of such
causal because the dependence holds at a given counterfactuals. Assume that events c and e have
instant of time, whereas causality requires that the occurred in the actual world. Then “c causes e” is
cause determines an effect that takes place later. true if and only if the closest of all non-actual
C 208 Causality

possible worlds in which c does not occur is such A tracer allows to track causal influence along
that e does not occur in it either. Causation cannot a “causal line,” which is a world-line consisting of
be directly analyzed in terms of counterfactual a series of contiguous regions of space-time, which
dependence. Such an analysis fails, e.g., in are connected by the transmission of the tracer.
situations of redundant causation, which are fre- A scientist who undertakes uncovering the causal
quent in biology. Assume that a chemical reaction structure of a complex situation may follow two strat-
R can be catalyzed by each of two enzymes, E1 and egies: She may either search for statistical correlations
E2, and take a particular situation in which enzyme among the variables characterizing the situation, or
E1 was active but not E2. In this situation, there is experimentally manipulate these variables. Each of
no counterfactual dependence of R on E1, for even these research strategies – analysis of observations
if E1 had not catalyzed the reaction, E2 would have. and experimentation – provides the central idea of
This type of situation is also a case of “preemption”: a philosophical strategy for analyzing the concept
The influence of E2 on R is preempted by the of causation.
influence of E1. To say that E1 preempts E2 from 3. The main hypothesis of probabilistic analyses of
causing R means that E2 would have caused R in the causality (Eells 1991) is that a variable (or factor,
absence of E1, but that E1 has blocked E2’s influ- or “event” in the language of probability theory)
ence on R. The problem for the counterfactual anal- A causes a variable B if and only if P(B|A) >
ysis is that the presence of the preempted cause E2 P(B|:A). Such theories are usually intended as ana-
removes the counterfactual dependence of R on its lyses of generic causation (definition), as opposed
cause E1. This problem can be solved by allowing to singular causation (definition). However,
that the causal influence from E1 to R runs through P(B|A) > P(B|:A) is insufficient to justify the judg-
a chain of intermediate steps. To cope with other ment that A causes B. It must also be excluded that
problematic situations, the counterfactual analysis A and B are effects of some common cause C,
has been modified in several other ways. which is called a screening factor (definition). Prob-
Furthermore, probabilistic processes have been abilistic analyses can also be adapted to account for
taken into account: What is counterfactually situations in which a factor diminishes the
dependent (directly or indirectly) on the cause is probability of one of its effects: If factor F, which
the probability of the effect, rather than the effect. enhances the risk of a disease M, is positively
Finally, our intuitive concept of causation makes correlated to factor S, which diminishes the risk
a difference between causes and background con- of M, it is possible that P(M|F) ¼ P(M|:F) or even
ditions. This can be made explicit in counterfac- P(M|F) < P(M|:F), because the negative influence
tuals: “c rather than c* caused e rather than e*” of S on M cancels out (first case) or even over-
can be analyzed as “if c* had occurred instead of compensates (second case) the positive influence
c, then e* would have occurred instead of e.” of F on M.
2. Another intuition has it that causation requires the 4. The most important recent analysis of causation is
existence of a continuous process that stretches intended as a model of another fundamental
from cause to effect. Salmon (1984) has combined research strategy for the discovery of causes in
Russell’s (1948) analysis of causal processes in complex situations, as it is used in sciences such
terms of world-lines (as they have been introduced as economics, sociology, epidemiology, or systems
in special relativity by Minkowski diagrams) and biology: To find out whether one variable X has
Reichenbach’s (1991) idea that causal processes are a causal influence on another variable Y in the
characterized by their capacity to transmit marks. system, one holds fixed all variables Z that cause
The paradigm case of this analysis is the propaga- Y but are not caused by X, and then manipulates
tion of a radio signal, which is a local modification X by an intervention (definition) (Woodward 2003).
of a structure spreading along a continuous world- X is a cause of Y if and only if this change of the
line from a sending event to a receiving event. value of X also changes the value of Y. Ideally, this
The use of tracers in biomedical research makes strategy leads to the discovery of the complete
sense in the framework of such process accounts: network of influences among all variables
Causation, Generic and Singular 209 C
characterizing the system. It can be expressed either References
by structural equations, which represent the value of
each variable as a function of all variables that have Quoted in Text
a direct causal influence on it, or equivalently by Eells E (1991) Probabilistic causality. Cambridge University
Press, Cambridge
oriented graphs, in which variables are represented Lewis D (1973) Counterfactuals. Basil Blackwell, Oxford
by nodes and influences by edges connecting these Price H, Corry R (eds) (2007) Causation, physics and the con-
nodes. However, this analysis cannot be used to stitution of reality: russell’s republic revisited. Oxford Uni- C
discover causal influences from scratch: In order versity Press, Oxford
Reichenbach H (1991) The direction of time. University of
to find out whether X causally influences Y, one
California Press, Berkeley
must already presuppose knowledge about which Russell B (1948) Human knowledge: its scope and limits. Allen
other variables influence Y. and Unwin, London, 5th edition, 1966
By requiring that causes of Y which are not effects Salmon W (1984) Scientific explanation and the causal structure
of the world. Princeton University Press, Princeton
of X must be held fixed during the intervention on X,
Woodward J (2003) Making things happen. Oxford University
the problem of preemption (mentioned above) is Press, Oxford
avoided: In the situation sketched above, in which
reaction R is caused by enzyme E1 but could also Recent Books and Overview Articles
have been caused by “back-up” enzyme E2, the value Beebee H, Hitchcock C, Menzies P (eds) (2009) The Oxford
of E2 is set to 0, because E2 does not intervene in the handbook of causation. Oxford University Press, Oxford
Schaffer J (2007) The metaphysics of causation. In: Zalta EN
actual situation. If E2 is “blocked,” then manipulation
(ed) The Stanford encyclopedia of philosophy, http://plato.
of E1 shows that E1 causes R: The value of variable stanford.edu/entries/causation-metaphysics/
R is a function of E1. Sosa E, Tooley M (eds) (1993) Causation. Oxford University
The interventionist analysis also correctly ana- Press, New York
lyzes situations in which causes diminish the proba-
bility of their effect, if a distinction between “total
cause” and “contributing cause” is introduced. If
F enhances the probability of M but also the proba- Causation
bility of S, which in turn diminishes the probability of
M, F may not be a total cause of M, if the positive ▶ Causality
(direct) and negative (indirect, through S) effects of
F on M just cancel out. However, by holding fixed S,
it is possible to discover that F is nevertheless
a contributing cause of M. Causation, Generic and Singular
Initially developed to analyze generic causation
(among variables), the interventionist framework has Max Kistler
recently been adapted to the analysis of token IHPST, Université Paris 1 Panthéon-Sorbonne,
causation: This “actual causation” is modeled by Paris, France
representing influences among particular values of
the variables in the network.
In comparing and evaluating such accounts, one Definition
must bear in mind that philosophical analyses may
pursue different goals: The counterfactual analysis is, Generic causation is a relation between two “factors,”
e.g., intended to be an a priori analysis of our common “properties,” or “events” (in the language of probabil-
sense concept of causation. This implies that it is ity theory) represented by variables. The meaning of
intended to be applicable to physically impossible the statement that smoking (A) (generically) causes
situations, such as fictions describing magical lung cancer (B) can be analyzed according to several
interactions. Process accounts have no such ambition. proposals: In the probabilistic framework, it means
Their aim is to discover the “real essence” of causality that, within a given population, belonging to set
as it is in the real world. A raises the probability of belonging to set B. In the
C 210 Causation, Origination

interventionist framework, it means that an interven- • Efficient cause is the primary exogenous source of
tion that changes the value of variable A, for example, change in an entity or system: “An earthquake
by making some nonsmokers smokers, changes the caused the tsunami.”
value of variable B. It is not possible to draw any • Formal cause is the relational organization which is
inference from the existence of a relation of generic responsible for properties of a system: “The charge
causation between factors A and B in a given popula- distribution on a water molecule causes water to be
tion to causal processes at the level of individuals wet.”
possessing properties A and B (or belonging to sets • Final cause is the purpose or meaning which moti-
A and B), in other words to the existence of relations of vates an action: “Hunger caused me to hurry home.”
singular causation: In a given population, A (smoking) Dynamical systems theory describes causation in
may cause B (having lung cancer), John may smoke terms of 6systems of first-order differential equations
(John is A) and have lung cancer (John is B), and yet, x_ ¼ f ðx; tÞ, where x is a vector of state variables of
John’s smoking may not be the (or even a) cause of his a dynamical system S, t is time, and f is the set of
cancer. The cause of John’s having cancer may be the structural relations which determine the dynamic
fact that he has inhaled asbestos dust. behavior x_ of S. This suggests several kinds of formal
causation:
• Upward causation: Change Dxi in an individual
Cross-References state variable causes a change Dx_ ¼ f ðx þ Dxi Þ in
the collective macro-behavior of S.
▶ Causality • Downward causation: A collective change Dx in all
variables causes a change Dx_i ¼ f i ðx þ DxÞ in
a specific micro-behavior.
• Reciprocal causation: A change Dxijj in either of
Causation, Origination two variables  xi  or xj causes a change
Dx_ ijj ¼ f ijj x þ Dxjji in the other’s behavior.
Niall Palfreyman These distinctions are of fundamental importance
Biotechnology and Bioinformatics, Weihenstephan- for understanding the origination of phenotypic nov-
Triesdorf University of Applied Science, Freising, elty, environmental effects, and disease. For example,
Germany one approach to carcinogenesis seeks efficient, upward
causes in the accumulation of genetic mutations within
a single originator cell (somatic mutation theory),
Definition while another seeks formal, downward cause from
tissue-organization fields (tissue-organization field
Causation is the physical process assumed by theory).
▶ causality to link one occurrence, the cause, to Physicalism traditionally reduces explanation to
a second occurrence, the effect, which is understood material and efficient causes. This reductive program
to occur as a consequence of the cause. A central stems from the explanatory benefits of Descartes’ and
challenge for the study of causation is origination: Newton’s partitioning of the world into two domains:
How are events in the physical world originally the physical domain of material and efficient cause
caused? operating on inert matter, and the epistemic domain
of final causes operating through human agents. The
epistemic domain was thus bracketed out of physical
Characteristics science, and formal cause assumed a twilight role as
action-at-a-distance fields.
Aristotle distinguishes in Metaphysics V 2 between Physicalism achieved its crowning triumph in the
four types of cause: theory of relativity. The special theory is founded on
• Material cause is the intrinsic physical constitution efficient time-like interaction between observers,
of an entity: “The charge of an electron causes it to while in the general theory, geometry becomes the
react to an electric field.” material cause of dynamics. However, mass-energy
Causation, Origination 211 C
equivalence and nonlinearity of the gravitational field Kim (in Bedau and Humphreys 2008) interprets
equations elevated the relativistic field concept to downward causation in two ways: synchronic (BðtÞ
a new status: formal cause as the bearer of material conditions bðtÞ at the same instant t) and diachronic
properties. This inclusion of formal cause into the (BðtÞ conditions bðt þ DtÞ at a later instant). He argues
physicalist fold was consolidated by quantum field that synchronic downward causation is logically inco-
theory, which shifts emphasis from particle-bound herent, but acknowledges (footnote 37, p. 153) that
properties and interactions as material and efficient diachronic downward causation in nonlinear dynami- C
causes to the formal cause of nonlocal field cal systems may be feasible.
configurations. Pursuing this idea further, Soto et al. (2008) argue
This trend is paralleled in biology, where living for diachronic downward causation. They propose
systems were traditionally explained in terms of effi- a materialistic non-physicalist ontology in which sys-
cient cause: Mayr’s proximate developmental causes tems of natural events operate historically to generate
and ultimate evolutionary causes. Yet this scheme is ontological novelty (novel qualities and novel struc-
incomplete on two counts: tures). Thompson (2007) maintains that diachronic
• Developmental systems theory (DST) downward causation is coherent in cases where BðtÞ
(▶ Developmental Systems Theory) stresses the is non-separable into individual micro-behaviors by
formal reciprocity of gene-organism-environment reason of its dependence on interactions between
relations, which complicates views of gene or envi- these behaviors. This form of emergence is clearly
ronment as efficient developmental causes (Oyama irreducible to individual micro-behaviors, yet even
et al. 2001). Thompson (pp. 429–30) seems doubtful of its ade-
• Acknowledging gene-plus-environment as efficient quacy as a basis for ontological origination. The prob-
evolutionary cause still neglects the role of the organ- lem is that dynamic co-emergence cannot escape
ism as the formal confluence of both – particularly determination by its component micro-behaviors
with regard to the origination of phenotypic and because even in a nonlinear dynamical system, it still
genotypic novelty (M€ uller and Newman 2003). supervenes on the formal organization of the structural
These issues have shifted the focus of biological relationships f.
explanation toward formal cause. Systems biology in Seeking a locus of origination in formal cause
particular, in studying the influence of structural and descends philosophically from the later Wittgenstein,
dynamical organization on ontogeny, claims an unam- who sought to define meaning in terms of an irreduc-
biguous allegiance to formal cause. ible closed set of formal language-usage relations, or
The rehabilitation of formal cause has prompted language game. Yet as Putnam (1980) demonstrated,
a search for the origination of action in formal cause. a closed formal system, by abandoning dependence on
A system-level behavior B of a deterministic dynami- external context, loses all causal relevance to that
cal system S is emergent if it is sufficiently irreducible context. By definition, it is then difficult to see how
to micro-behaviors b of S’s components as to constitute such a system can originate from its context.
an originator of efficient cause. Attempts to formalize Appeal to external context suggests a way to eman-
this notion of irreducibility have led to introduction of cipate formal cause from internal organization by sim-
the term supervenience: B supervenes on b if b logi- ply opening S up to interaction with its context. The
cally determines B; B is then emergent if it supervenes weasel’s coat-change is conditioned in part by the
on no set of micro-behaviors. changing autumnal day-length, and so does not super-
A litmus test for coherence of the emergence con- vene on the weasel’s internal organization. Yet mere
cept is its relation to downward causation, which per- openness is an inadequate basis for origination.
mits an emergent behavior BðtÞ of S at time t to Inserting a coin into a coffee-machine is a contextual
influence a micro-behavior b of S’s own components. interaction which results in a cup of coffee, yet we
For example, a weasel’s cells generate a white winter consider the coin, not the machine, the efficient cause
coat which is a formal cause of its survival, and thus of this activity, because a small variation, or defect, in
also of the very cells which generated it. To claim that the coin prevents coffee-making from occurring.
coloration is non-supervenient on these cellular behav- By contrast, the weasel’s color-change is self-
iors seems to invoke a contradictory circular causality. organizing (▶ Self-Organization): It proceeds even
C 212 Causation, Origination

Causation, Origination, 3.5


Fig. 1 Self-organized Demand
buffering of blood-glucose Insulin
3
level against varying demand Glucagon
Glucose
2.5

Concentration
2

1.5

0.5

0
0 50 100 150 200 250 300 350 400 450 500
time

if clouds obscure the day-length. Self-organizing bifurcating into qualitatively distinct system behaviors
systems generate reliably similar outcomes BðtÞ in response to small quantitative changes in some
from dissimilar initial conditions; Fig. 1 illustrates critical parameter P. If these changes in P stem from
self-organized buffering of blood-glucose against S’s context, they constitute environmental cues which
variations in demand in the mammalian insulin- S amplifies into vigorous responses BðtÞ (see Fig. 2).
glucagon system. Swenson (2010) uses the term However, for Kant, meaning is also essentially per-
autocatakinetic (ACK) to describe systems which sonal: S must itself determine the dynamics of its own
self-organize using contextual resources. The weasel meaning-constructions.
is an ACK system which buffers its coat-change Since S’s behavior x_ is determined by its structure
against fluctuations in its context, so its winter f ðx; tÞ, we may ask who determines this structure. We
coat is neither supervenient on the weasel’s internal might assume the weasel’s coloration change to be
structure nor is it determined by its context. So is meaningful, relying as it does on an interpretation of
the weasel qua organism the causal origin of the day-length in terms of future snow. However, the
coloration change? structures generating this dynamic arise from propen-
This is the question of final cause, and goes to the sities of the weasel species-niche system, so the inter-
heart of physical-epistemic dualism. In his Critique pretation “snow” does not originate in this individual
of Judgment, Kant argued for a distinction between weasel. The status of organisms as final causes there-
physical and biological explanation, characterizing fore hangs on one question: Do individual organisms
physical systems as heteronomous (other-governed), determine their own structure?
and living organisms as the autonomous (self- To attribute this determination to efficient
governing) originators of final cause. Autonomous cause would be to accept Cartesian dualism; however,
systems are certainly ACK, and ACK systems have the ability of complex open systems to amplify min-
the capacity to assert their autonomy against the imal contextual variation makes their context
dictates of context; however, Kant’s definition of a rich source of internal dynamical variation. If this
autonomous systems (▶ Autonomy) as final causes variation is dynamically coupled to the very struc-
depends on their ability to construct meaning from tures which generate it, any consequent stabilization
environmental cues, enabling them to predict and of those structures would necessarily be compensated
respond to future events. by a corresponding increase in ▶ entropy production
ACK systems can potentially construct meaning, by the system. Swenson’s (2010) proposed Law of
since they necessarily exhibit ▶ complex behavior, Maximum Entropy Production (LMEP) stipulates
CD4+ T Cell Response 213 C
Causation, Origination,
Fig. 2 Complex response of
a protein-RNA switch to Protein,
varying protein influx 1 RNA
Protein influx
0.8

C
0.6

0.4

0.2

0
0 50 100 150 200 250 300 350 400
time

that such variation is subject to differential selection Cross-References


on its ability to maximize entropy production. LMEP
can therefore be expected to drive dynamic stabiliza- ▶ Autonomy
tion (Thompson 2007), the differential selection ▶ Causality
of behaviors on their ability to stabilize their ▶ Complex Behavior
structural base. ▶ Developmental Systems Theory
Structures which generate stable, ordered dynamics ▶ Entropy
are thus an expected feature of LMEP selection, raising ▶ Self-Organization
the question: Who determines this choice of structure?
Sober and Wilson (1998) analyze this question in terms
of multilevel targets of selection – groups of individ- References
uals (for example, cells or organisms) whose collective
stability depends on interaction with each other, but Bedau MA, Humphreys P (eds) (2008) Emergence. MIT Press,
Cambridge, MA
not with individuals outside the group. On this account,
M€uller GB, Newman SA (eds) (2003) Origination of organismal
autonomous systems may be understood as ACK sys- form. MIT Press, Cambridge, MA
tems which determine their own structure not effi- Oyama S, Griffiths PE, Gray RD (eds) (2001) Cycles of contin-
ciently, but by presenting a unitary target to dynamic gency. MIT Press, Cambridge, MA
Putnam H (1980) Models and reality. J Sym Logic 45:464–482
stabilization.
Sober E, Wilson DS (1998) Unto others. Harvard University
In summary, current scientific understanding Press, Cambridge, MA
appears compatible with an account of causation Soto AM, Sonnenschein C, Miquel P-A (2008) Acta Biotheor
which resolves the physical-biological dualism of 56:257–274
Swenson R (2010) Selection is entailed by self-organization and
Kant. In particular, concepts of final cause, emergence,
natural selection is a special case. Biol Theory 5(2):167
and origination are compatible with autonomous Thompson E (2007) Mind in life. Belknap, Cambridge, MA
(dynamically stabilizing, ACK) dynamical systems.
These owe their autonomy to their engagement as
a systemic unity in meaning-making: the dynamic sta-
bilization of meaningful responses through the praxis
of meaning-construction. LMEP suggests that autono- CD4+ T Cell Response
mous systems are an expected feature of the natural
world. ▶ Th1 Response
C 214 CD8+ Cytotoxic T Lymphocytes (CTL)

carried out by CDK inhibitors (CKIs). CKIs play


CD8+ Cytotoxic T Lymphocytes (CTL) a key role in G1 regulation (Sherr and Roberts 1999).
In late G1, at the restriction point in mammals and Start
Joo Chuan Tong in yeasts, cells must decide to progress into a new cell
Data Mining Department, Institute for Infocomm cycle or to enter a quiescent state. After this point, cells
Research, Singapore, Singapore become insensitive to mitogenic or antimitogenic sig-
Department of Biochemistry, Yong Loo Lin School of nals, and are committed to complete the new cell cycle
Medicine, National University of Singapore, (Morgan 2007).
Singapore

Characteristics
Definition
Two Families of CKIs Regulate Cell-Cycle
CD8+ cytotoxic T lymphocytes are a subgroup of Progression and Differentiation
T lymphocytes whose main function is to induce the In mammals, CKIs are included in one of two families
apoptosis of infected (e.g., viruses, bacteria, parasites, according to their structure and CDK targets:
or fungi), mutated (e.g., cancer), or foreign (e.g., trans- 1. The INK4 (INhibitors of CDK4) family of CKIs
plants) cells. includes p16INK4a, p15INK4b, p18INK4c, and
p19INK4d. They bind CDK4 and CDK6 and form
Cross-References binary complexes with these kinases, preventing
CDK interaction with cyclin D. The four INK4
▶ TAP Translocator, In Silico Prediction proteins share a similar structure composed of mul-
tiple ankyrin repeats (Cánepa et al. 2007).
References 2. The Cip/Kip (CDK-interacting protein/Kinase
inhibitor protein) family of CKIs comprises
Murphy K, Travers P, Walport M (2007) Janeway’s p21Cip1, p27Kip1, and p57Kip2. These proteins have
immunobiology, 7th edn. Garland Science, New York, both a positive and a negative role on CDK activity:
pp 364–368
they act as cofactors required for CDK4/6-cyclin
D complex assembly but inhibit CDK2-cyclin E,
CDK2-cyclin A, and CDK1-cyclin B activity.
CDK Inhibitors Cip/Kip inhibitors share a conserved N-terminal
domain, which binds to CDKs and cyclins, but
Livia Pérez-Hidalgo and Sergio Moreno differ in the rest of the sequence (Besson et al.
Instituto de Biologı́a Molecular y Celular del Cáncer, 2008; ▶ Cell Cycle of Mammalian Cells) (Fig. 1).
CSIC/Universidad de Salamanca, Salamanca, Spain In other organisms, proteins functionally related to
mammalian CKIs also play key roles in cell-cycle
regulation and in promoting cell-cycle exit during dif-
Definition ferentiation processes (See Table 1; De Clercq and
Inzé 2006).
CDK inhibitors are proteins that suppress CDK-cyclin In Drosophila melanogaster, there are two Cip/
protein kinase activity in the G1 phase of the cell cycle Kip-related CKIs, Dacapo and Roughex. Dacapo reg-
and promote G1 arrest in response to environmental or ulate the exit from the cell-cycle during embryonic
intracellular stimuli (▶ Cyclins and Cyclin-dependent development, and Roughex contributes to the estab-
Kinases). lishment of G1 through downregulation of CDK-cyclin
Oscillations in CDK-cyclin activity are accurately A complexes.
regulated in order to ensure the correct sequence of cell The Caenorhabditis elegans genome encodes two
cycle events (▶ Cell Cycle). This regulation occurs at CKIs, cki-1 and cki-2, belonging to the Cip/Kip family
several levels, including cyclin binding to CDKs, CDK of CDK inhibitors, and both are required for cell-cycle
phosphorylation, and CDK-cyclin activity inhibition exit into quiescence during development.
CDK Inhibitors 215 C
p16INK4a
p15INK4b
INK4 p21Cip1
p18INK4c
INK4d Cip/Kip p27Kip1
p19
p57Kip2

Cdk4/6

G0
CycD C
Cip1
Cdk2 p21
CycE Cip/Kip p27Kip1
p57Kip2
G1

Cdk1
CycB M S Cdk2
CycA

G2

p21Cip1
p27Kip1 Cip/Kip
p57Kip2 Cdk1
CycA

CDK Inhibitors, Fig. 1 Regulation of cell-cycle progression by CKIs in mammalian cells. Two families of CDK inhibitors in
metazoans regulate G1 progression: Cip/Kip proteins and INK4 proteins

In Xenopus, Cip/Kip-related protein p27Xic1 is primary function of Sic1 appears to be at this point of
involved in cell-cycle exit and differentiation in the cycle, Sic1 is also required at mitotic exit,
myogenesis and neurogenesis (▶ Cell Cycle of Early contributing with its accumulation, together with
Frog Embryos). Clb2 degradation, to the irreversibility of this transi-
Yeasts’ CKIs share no sequence homology with tion (▶ Cell Cycle Dynamics, Irreversibility). Far1 is
metazoan CKIs, but they are functionally and structur- induced in response to pheromones, leading to the G1
ally related. In the budding yeast Saccharomyces arrest that precedes mating (▶ Cell Cycle, Budding
cerevisiae, two CKIs regulate cell-cycle progression, Yeast).
Sic1 and Far1. Sic1 is required for the establishment of In the fission yeast Schizosaccharomyces
a G1 phase and to delay entry into S phase until pombe, a single Sic1-related inhibitor, Rum1,
▶ prereplicative complexes are built. In the absence carries out the functions in cell proliferation and
of Sic1, cells show defects in G1 progression, differentiation of budding yeast Sic1 and Far1,
and this impairs S phase dynamics. Cells start DNA respectively. Rum1 regulates progression through
replication from fewer origins, extending S phase the G1 phase of the cell cycle and is essential in
and entering mitosis with unreplicated chromosomes processes that require lengthening the G1 phase,
that inefficiently separate during anaphase. As such as the responses to pheromones and nutrient
a consequence, sic1-defective cells show increased starvation. Also, Rum1 is required to prevent CDK1
genomic instability. All these defects are rescued (Cdc2) activation at Start until the critical cell size is
by delaying CDK activation and S phase, indicating reached (Labib and Moreno 1996). (▶ Cell Cycle,
that the primary function of Sic1 is at the end of Cell Size Regulation; ▶ Cell Cycle, Fission Yeast)
G1 (Lengronne and Schwob 2002). Although the (Fig. 2).
C 216 CDK Inhibitors

CDK Inhibitors, Table 1 CDK inhibitors


Species Name Target Function
Human p16INK4a CDK4/6-cyclin D inhibition Prevention of G1/S transition. Induction of senescence.
p15INK4b CDK4/6-cyclin D inhibition Prevention of G1/S transition. Mediator of TGFb
cell-cycle inhibitory effects.
p18INK4c CDK4/6-cyclin D inhibition Prevention of G1/S transition. Exit from the cell cycle
and cell differentiation.
p19INK4d CDK4/6-cyclin D inhibition Prevention of G1/S transition. Exit from the cell cycle
and cell differentiation.
p21Cip1 CDK4/6-cyclin D assembly and DNA-damage-induced cell cycle arrest in G1 and G2.
CDK2-cyclin A/E inhibition
p27Kip1 CDK4/6-cyclin D assembly and Cell-cycle exit into quiescence.
CDK2-cyclin A/E inhibition
p57Kip2 CDK4/6-cyclin D assembly and G1 arrest in embryonic development. Maintenance of
CDK2-cyclin A/E inhibition endocycles.
Drosophila Roughex CDK1-cyclin A Exit from mitosis. Establishment of G1 phase.
Dacapo CDK2-cyclin E Cell-cycle exit during embryonic development.
Establishment of endocycles.
Xenopus p27Xic1 CDK2-cyclin E/A Establishment of G1 phase in somatic cell cycle.
Differentiation in muscle and neural cells.
C. elegans cki-1 CDK2-cye1(cyclin E) Developmental cell-cycle quiescence.
cki-2 CDK2-cye1(cyclin E) Developmental cell-cycle quiescence.
Fission yeast Rum1 Cdc2(CDK1)-Cig2/Cdc13 Regulation of G1 progression. G1 arrest in response to
S. pombe pheromones or nitrogen starvation.
Budding yeast Sic1 Cdc28(CDK1)-Clb5/6 Establishment of the G1 phase.
S. cerevisiae Far1 Cdc28(CDK1)-Cln1/2/3 G1 arrest and differentiation in response to mating
pheromones.

Role of Mammalian CKIs in Tumorigenesis and p57Kip2 is required for embryonic development, and
Phenotypes of CKI Knockouts p57Kip2-null mice have major developmental defects
Cip/Kip Inhibitors and mostly die perinatally. This inhibitor shows
According to its function limiting cell-cycle progres- a tissue-specific pattern of expression both in the
sion, p27Kip1-deficient mice display cell proliferation adult and during embryogenesis.
defects: increased body size, multiple organ hyperpla-
sia, and pituitary tumors. Unlike other tumor suppres- INK4 Inhibitors
sor genes, p27Kip1 is rarely mutated in cancer but Members of the INK4 family display different patterns
appears frequently downregulated by increased degra- of expression: p16INK4a and p15INK4b are not expressed
dation, decreased transcription or cytoplasmic during embryonic development but are induced in
mislocalization. Besides, the simultaneous inactiva- response to oncogenic stress, and p18INK4c and
tion of other ▶ tumor suppressor genes enhances the p19INK4d are expressed during embryogenesis and at
tumor-prone phenotype of p27Kip1 knockout (▶ Cell later stages in adults, carrying out their function during
Cycle, Cancer Cell Cycle and Oncogene Addiction). organismal development. Mice deficient in any of
In contrast to p27Kip1, p21Cip1 knockout mice do not these proteins are viable.
display a hyperproliferative defect. They show p16INK4a is an important tumor suppressor that is
increased predisposition to tumor development in frequently mutated or downregulated in many different
combination with mutations in other tumor suppressor forms of human cancer in contrast to the other three
genes. As with p27Kip1, mutations in p21Cip1 gene are INK4 genes, which are rarely mutated in cancer. It is
rare in tumors. In addition, cells lacking p21Cip1 fail to not generally expressed during fetal development but
arrest in response to DNA damage (see below). accumulates in adults during ▶ senescence. In the
CDK Inhibitors 217 C
Budding yeast
Start

G1 S G2 M

Clb5,6
Cdc28
C
α-factor

Sic1
Cln1,2
Far1 P
Cdc28 SCFCdc4 P Ub
Proteasome
Sic1 Sic1 Ub Sic1
Ub

Fission yeast
Start
G1 S G2 M

Cig2/Cdc13
Cdc2

Rum1
Cig1
Cdc2 P
SCFPop1,Pop2 P Ub
Proteasome
Rum1 Rum1 Ub Rum1
Ub

CDK Inhibitors, Fig. 2 Regulation of the cell cycle by CKIs in inhibiting Cdc28-Cln1,2 complexes. Bottom: In fission yeast,
budding and fission yeast. Top: In budding yeast, Sic1 delays Rum1 regulates progression through G1 and induces a cell-cycle
entry into S phase through binding to Cdc28-Clb5,6 complexes. arrest in the absence of nutrients and in response to pheromones
At G1/S transition, Sic1 is phosphorylated by Cdc28-CIn1,2, by inhibiting the activity of Cdc2-Cig2 and Cdc2-Cdc13
thereby inducing ubiquitylation of Sic1 by SCFCdc4 and complexes. Phosphorylation of Rum1 by Cdc2-Cig1 targets it
proteasome-dependent degradation. In response to pheromones, to ubiquitylation by SCFPop1,Pop2, prior to degradation by the
Far1 induces cell-cycle arrest in G1 prior to cell fusion by proteasome

absence of p16INK4a, p15INK4b has an essential function contribution to the DNA-damage response in human
as a tumor suppressor. cells regulating multiple processes, such as cell cycle
The loss of p18INK4c and p19INK4d leads to male progression, ▶ apoptosis, and transcription (Cazzalini
reproductive defects, and mice lacking p19INK4d also et al. 2010).
display progressive hearing loss. In addition, inactiva- Regulation of cell cycle progression occurs by
tion of p18INK4c predisposes to cancer development inhibition of CDK2-cyclin E and CDK2-cyclin
and is a haploinsufficient tumor suppressor in mice. A complexes, required for phosphorylation of ▶ reti-
By contrast, p19INK4d null mice do not display tumors noblastoma, but also by binding to PCNA (proliferat-
nor are more susceptible to cancer development in ing cell nuclear antigen), which interferes with PCNA
response to carcinogens. function during DNA replication.
Another part of the response of p21Cip1 to DNA
Function of p21Cip1 in the DNA-Damage Response damage is transcriptional regulation, where p21Cip1
p21Cip1 increases following DNA damage in a has both a positive and a negative role. This regulation
p53-dependent manner, and it makes an essential occurs through different mechanisms. Inhibition of
C 218 CDK Inhibitors

CDK activity prevents retinoblastoma phosphoryla- and the G1/S transition. Mitogenic stimulation also
tion, thus leading to inhibition of E2F-dependent inhibits, through PI3K and AKT, GSK3b phosphory-
transcription. Besides, p21Cip1 associates with the lation of cyclin Ds, thereby inhibiting their nuclear
DNA-binding proteins E2F1, STAT3, and MYC, export and enhancing the nuclear accumulation of
suppressing their transcriptional activity, and dere- active cyclin Ds-CDK complexes. Inhibition of
presses p300-CREBBP targets. GSK3b also stimulates expression of cyclin Ds
In contrast to the role of p53 in the induction of through regulation of the transcription factors AP-1
apoptosis in response to DNA damage and oxidative and Myc.
stress, p21Cip1 protects cells from apoptosis, although Transforming growth factor (TGFb) antimitogenic
in some cases it may show a pro-apoptotic role. Since signal induce a G1 block, causing synthesis of CKIs
an active cell cycle is required for apoptosis, p21Cip1 p15INK4b and p21Cip1. p15INK4b levels increase through
may prevent apoptosis through its role in suppressing the Smad complex. Smad proteins inhibit expression
cell-cycle progression, but also it may inhibit apoptosis of Myc, and, after Myc has been repressed, expression
directly, blocking pro-caspase-3 processing or of p15INK4b increases (Morgan 2007).
preventing stress-induced apoptosis by binding to
ASK1/MEKK5 or JNK1/SAPK. Degradation of CKIs at the G1/S Transition
Irreversible entry into the cell cycle at Start is con-
Role of CKIs in Endocycles trolled by positive ▶ feedback loop that generate irre-
CKIs play an essential role in the control of CDK versible CDK activation (▶ Cell Cycle Dynamics,
activity in endocycling cells, which undergo multiple Irreversibility). Proteolytic degradation of CKIs at
rounds of DNA replication without an intervening G1/S, triggered by a starter kinase, is essential for
mitosis (Ullah et al. 2009). Start in yeast and mammalian cells (Morgan 2007;
p57kip2 triggers genome ▶ endoreplication during Novak et al. 2007).
differentiation of trophoblast stem (TS) cells into tro- In human cells, different pathways lead to p27kip1
phoblast giant (TG) cells. It is expressed at the end of phosphorylation, ubiquitylation, and proteasome-
S phase, preventing entry into mitosis and inducing the mediated destruction at G1/S:
accumulation of cells in G2. As CDK activity is low, 1. In late G1, p27kip1 is phosphorylated at T187 by
▶ APCCdh1 is activated instead of APCCdc20. One of CDK2-cyclin E and CDK2-cyclin A complexes.
the targets of APCCdh1 is geminin, an inhibitor of the This phosphorylation targets p27kip1 for
licensing factor Cdt1. The destruction of geminin and ubiquitylation by SCFSkp2 (▶ Skp1/Cul1/F-Box-
other targets, such as cyclin A, creates a window in G2 containing Complex (SCF)) and subsequent degra-
with low CDK activity, which permits the formation of dation by the 26S proteasome. The interaction
prereplicative complexes. p57kip2 disappears after between p27kip1 and Skp2 is enhanced by additional
phosphorylation by CDK2-cyclin E and degradation factors, such as Cks1 (CDK subunit 1). Also, inter-
by the 26S ▶ proteasome, to allow the onset of S phase. action between CDK2-cyclin A and T187-
Oscillations of p57kip2 and cyclin E protein levels phosphorylated p27kip1 stimulates p27kip1
Skp2
maintain the endocycles. In Drosophila, endocycles ubiquitylation by SCF .
in nurse cells are maintained through oscillations in 2. Phosphorylation of p27kip1 at S10 by AKT pro-
the levels of Cyclin E and the CKI Dacapo. motes its nuclear export through interaction with
the nuclear export protein Crm1. The stability of
Regulation of CKIs by Mitogenic and cytoplasmic p27kip1 is regulated by the E3 ligase
Antimitogenic Signals KPC (Kip1 ubiquitylation-promoting complex),
In mammalian cells, in response to mitogenic signals which targets it to degradation by the proteasome
and activation of the Ras-MAP kinase pathway, cyclin (Fig. 3).
Ds are synthesized, and stable cyclin Ds-CDK com- In addition, G1/S transition in cycling cells and
plexes containing the p27kip1 inhibitor as a cofactor after DNA-damage repair requires degradation of
form, thus taking p27kip1 from CDK2-cyclin E/A com- p21Cip. p21Cip1 is degraded at the proteasome by
plexes. This leads to the activation of these complexes ubiquitin-dependent and ubiquitin-independent
CDK Inhibitors 219 C
P KPC P
Proteasome
Kip1 p27Kip1
p27 p27Kip1
Ub
Ub
Ub
Cytoplasm

C
Nucleus Crm1
G0 G1 AKT

p27Kip1
S10

p27Kip1

T187
P SCFSkp2 P
Proteasome
p27Kip1 p27Kip1 p27Kip1
Ub
Ub
Ub
CDK2-CycE
G1 Late G1

CDK Inhibitors, Fig. 3 Regulation of p27Kip1 by phosphoryla- degradation at the proteasome. Phosphorylation of p27Kip1 by
tion and proteasome degradation. CDK2-cyclin E phosphory- AKT promotes p27Kip1 translocation to the cytoplasm and
lates p27Kip1, promoting its ubiquitylation by SCFSkp2 and KPC-dependent ubiquitylation

mechanisms. Phosphorylation by CDK2-cyclin E at targets it for destruction by the proteasome, leading to


S130 promotes SCFSkp2-dependent degradation, CDK1-Clb5/6 activation and the onset of S phase.
which occurs in part after translocation of phosphory-
lated p21Cip1 to the cytoplasm. p21Cip1 reacummulates
again in G2, and it is degraded at the G2/M transition Cross-References
by the proteasome after ubiquitylation by the E3 ligase
APCCdc20. In addition, the CRL4Cdt2 E3 ubiquitin ▶ Anaphase-Promoting Complex (APC/C)
ligase promotes p21Cip1 degradation during S phase ▶ Apoptosis
and in response to damage (Lu and Hunter 2010). ▶ Cell Cycle
In yeasts, CKI degradation is initiated by phosphor- ▶ Cell Cycle of Early Frog Embryos
ylation of the CKI by CDK-cyclin complexes different ▶ Cell Cycle of Mammalian Cells
from those inhibited by the CKI: ▶ Cell Cycle Transition, Principles of Restriction
In fission yeast, phosphorylation of Rum1 at T58 Point
and T62 by CDK1-Cig1 is required for Rum1 ▶ Cell Cycle, Budding Yeast
ubiquitylation by the SCFPop1,Pop2 and degradation by ▶ Cell Cycle, Cancer Cell Cycle and Oncogene
the 26S proteasome, thus leading to CDK1-Cig2/ Addiction
Cdc13 activation. ▶ Cell Cycle, Cell Size Regulation
In budding yeast, G1/S Sic1 destruction by the ▶ Cell Cycle Dynamics, Irreversibility
proteasome depends on phosphorylation at multiple ▶ Cell Cycle, Fission Yeast
CDK consensus sites (S/T)PX(K/R) by CDK1 ▶ Cyclins and Cyclin-dependent Kinases
(Cdc28)-Cln1/2 complexes. Once phosphorylated, ▶ Endoreplication
Sic1 binds to SCFCdc4, which ubiquitylates Sic1 and ▶ Feedback Loops
C 220 cDNA Microarrays

▶ Prereplicative Complex
▶ Proteasome Cell Compartment
▶ Retinoblastoma
▶ Senescence ▶ Organelle and Functional Module Resources
▶ Skp1/Cul1/F-Box-containing Complex (SCF)
▶ Tumor Suppressor Gene

References Cell Cycle


Besson A, Dowdy SF, Roberts JM (2008) CDK inhibitors: cell Attila Csikász-Nagy
cycle regulators and beyond. Dev Cell 14:159–169
The Microsoft Research - University of Trento
Cánepa ET, Scassa ME, Ceruti JM, Marazita MC, Carcagno AL,
Sirkin PF, Ogara MF (2007) INK4 proteins, a family of Centre for Computational and Systems Biology,
mammalian CDK inhibitors with novel biological functions. Trento, Italy
IUBMB Life 59:419–426
Cazzalini O, Scovassi AI, Savio M, Stivala LA, Prosperi E
(2010) Multiple roles of the cell cycle inhibitor
p21(CDKN1A) in the DNA damage response. Mutat Res Introduction
704:12–20
De Clercq A, Inzé D (2006) Cyclin-dependent kinase inhibitors Cell cycle is the sequence of events whereby
in yeast, animals, and plants: a functional comparison. Crit
a growing cell duplicates all of its components and
Rev Biochem Mol Biol 41:293–313
Labib K, Moreno S (1996) rum1: a CDK inhibitor regulating G1 divides them between two daughter cells. This bio-
progression in fission yeast. Trends Cell Biol 6:62–66 logical process is essential for reproduction, thus, it is
Lengronne A, Schwob E (2002) The yeast CDK inhibitor Sic1 a basic principle of life. There is a long history of cell
prevents genomic instability by promoting replication origin
cycle research and the discoveries of the key regula-
licensing in late G1. Mol Cell 9:1067–1078
Lu Z, Hunter T (2010) Ubiquitylation and proteasomal degrada- tors of the underlying machinery have been recog-
tion of the p21(Cip1), p27(Kip1) and p57(Kip2) CDK inhib- nized by Nobel Prize awards. The cell cycle was also
itors. Cell Cycle 9:2342–2352 an early target of mathematical modeling and
Morgan DO (2007) The cell cycle: principles of control. New
pioneering work in systems biology. One of the first
Science, London
Novak B, Tyson JJ, Gyorffy B, Csikasz-Nagy A (2007) Irrevers- systems biology approaches realizing the experiment-
ible cell-cycle transitions are due to systems-level feedback. model-experiment loop was concerned with the cell
Nat Cell Biol 9:724–728 cycle regulation of budding yeast cells as well as
Sherr CJ, Roberts JM (1999) CDK inhibitor: positive and nega-
some of the earliest system-level measurements of
tive regulators of G1-phase progression. Genes Dev
13:1501–1512 this system. Further research on cell cycle regulation
Ullah Z, Lee CY, Depamphilis ML (2009) Cip/Kip cyclin- in various organisms included diverse systems biol-
dependent protein kinase inhibitors and the road to poly- ogy techniques. This work has made considerable
ploidy. Cell Div 4:10
contribution to our understanding of key principles
of cell cycle regulation. Nevertheless, there are still
many open questions that could be answered by
a combination of computational and experimental
cDNA Microarrays approaches.
In this essay, we cover the basics of cell cycle
▶ DNA Microarrays physiology, molecular biology and the modeling the
underlying processes and mechanisms. This discussion
serves as a context for all articles on the cell cycle.
Here, we cover historical discoveries both on the
CDR-IMGT experimental and theoretical sides, explaining what
led to a Nobel Prize in cell cycle research and how
▶ Complementarity Determining Region (CDR- mathematical models of the cell cycle contributed to
IMGT) the success of systems biology.
Cell Cycle 221 C
Biology of the Cell Cycle cell cycle machinery to halt progress as long as prob-
lems remain. Metabolic pathways inform the cell cycle
Adequate cell cycle regulation is needed to ensure that (▶ Cell Cycle Signaling, Metabolic Pathway) about
each daughter cell inherits accurate copies of the the presence or absence of nutrients in the environ-
mother’s DNA, thus cell cycle separates the duplica- ment. Stress pathways (▶ Modeling Approaches of
tion and the separation of chromosomes in time. Dur- Cell Cycle Regulation by Signaling Pathways for
ing S-phase of the cell cycle the ▶ DNA replication External Stress) monitor other external and internal C
takes place while later in M-phase, the chromosome conditions, such as osmotic pressure, oxidative state
segregating ▶ mitosis occurs. These two phases are of the cells (▶ Cell Cycle Signaling, Hypoxia), DNA
separated by two gap phases, leading to a full cell (▶ Cell Cycle Arrest After DNA Damage), and spindle
cycle with G1-S-G2-M phases in this order (Morgan damage (▶ Cell Cycle Signaling, Spindle Assembly
2006). At the end of M-phase, the cell nucleus divides, Checkpoint). All of these pathways control activities
which is followed by ▶ cytokinesis when the two of key proteins in the core cell cycle machinery.
daughter cells separate. The gap phases are also needed The central members of this core cell cycle machin-
for proper homeostasis of cells: the newborn daughter ery are the complexes of ▶ cyclins and cyclin-
cells should have a cell size more or less similar to the dependent kinases. These complexes can induce the
birth size of the mother cell to keep the cell size close phosphorylation of various target molecules that signal
to constant between generations. The G1 and G2 to downstream cell cycle processes. Such phosphory-
phases are used to extend the required time to pick up lation events control the initiation of DNA replication
this extra cell growth that is needed to duplicate cell and mitosis directly by the effect of cyclin-dependent
size before cell division. This balance between cell kinases or through the regulation of centrosomal and
growth and division in many cell types is established mitotic kinases. The cyclin-dependent kinase (Cdk)
by active cell size regulation (▶ Cell Cycle, Cell Size can phosphorylate substrates only if a cyclin molecule
Regulation): either in G1 or G2 phase the cells check if is bound to it. This cyclin is responsible for the sub-
their size is large enough to, so that the cell can proceed strate recognition of the complex (Bloom and Cross
to the next cell cycle stage (S or M-phase respectively). 2007). As the name suggests, most cyclins appear only
Proper cell cycle physiology (▶ Cell Cycle, periodically during the cell cycle, their production and
Physiology) also requires that the S and M phases degradation can be regulated by both external and
happen in the correct order. This is achieved through internal signals of the cell. The activity of Cdk/cyclin
▶ cell cycle checkpoints. The above mentioned G1 and complexes are regulated throughout the cell cycle by
G2 phase size checks occur together with a check on control of the synthesis or degradation of cyclins
the successful completion of the earlier mitosis (at the (Zachariae and Nasmyth 1999), by binding the com-
G1 to S checkpoint) or DNA replication (at the G2 to plex to stoichometric ▶ Cdk inhibitors (CKI), or by
M checkpoint). These checks ensure that the DNA is controlling the kinase activity of Cdk through phos-
replicated only after it is successfully segregated and phorylation of its regulatory sites (Lindqvist et al.
cells start mitosis only after DNA replication is com- 2009) (Fig. 1a). These Cdk regulators ensure that Cdk
pleted (Hartwell and Weinert 1989; Lukas et al. 2004). activity is low in G1 phase, it appears at the G1/S
A third checkpoint exists in M-phase, where the cell transition point (also called Start or Restriction point
ensures that chromosome segregation starts only when (▶ Cell Cycle Transition, Principles of Restriction
all chromosomes are properly bound to the separation Point)), it increases in G2 phase and reaches its max-
machinery (Ciliberto and Shah 2009). Several other ima at the G2/M transition (▶ Cell Cycle Transitions,
checkpoints exist in which cells monitor external and G2/M), and it sharply drops at mitotic exit (▶ Cell
internal conditions and proceed only if all checks are Cycle Transitions, Mitotic Exit) (Fig. 1b). Checkpoints
passed. An example is the budding yeast morphogen- freeze Cdk activity, thus, the subsequent cell cycle
esis checkpoint, where cells check if the bud formation transition cannot be initiated until the checkpoint sig-
is properly initiated, before mitosis starts (Sia nal is not removed: DNA damage does not allow the
et al. 1998). sharp increase in Cdk activity and the spindle check-
These checkpoints work through a cross-talking point inhibits the drop in Cdk activity, ensuring that the
network of signaling pathways that talk to the core upcoming cell cycle transitions are inhibited.
C 222 Cell Cycle

Cell Cycle, Fig. 1 Regulation of Cdk activity during the cell induces cyclin removal at the end of the cell cycle. (b) Temporal
cycle. (a) Possible ways Cdk activity is regulated: CKI – pattern in Cdk activity during the cell cycle and cell cycle
Stochiometric Cdk inhibitor acts in G1, where Cdh1/APC transitions (Rp – Restriction point, exit – mitotic exit). In
complex induces cyclin degradation through the proteasome. S phase, the DNA replicates, in G2 two copies are present and
Synthesis of cyclin could be also regulated by transcriptional Cdk activity is low but not zero. Sharp increase in Cdk activity
factors. Wee1 and Cdc25 are the most important kinase/ induces mitosis, where the nuclei divide and the drop in the
phosphatase pair that regulates Cdk in G2 phase and Cdc20/APC activity triggers cell division

Crucial functions of the core cell cycle machinery in G2-phase). Such size control (▶ Cell Cycle, Cell
eukaryotic cells are the following: Size Regulation) clearly works in yeast cells, but
• The alternation of DNA replication (S-phase) and its role in higher eukaryotes is still not fully
mitosis (M-phase) is essential in normally prolifer- understood.
ating somatic cells. This is achieved by changing The basic controls are conserved in all eukaryotic
Cdk activity in each phase, and ensuring that the organisms from yeast to human. Indeed, human Cdks
cycle proceeds only in one direction. Incidental (or can rescue Cdk mutants of yeast cells and many other
regulated) drop in Cdk activity in G2 phase can core molecules are similarly highly conserved.
induce ▶ endoreplication, when two rounds of Although the cell cycle regulatory molecules of pro-
S phase happen without mitosis and two nuclear karyotes (▶ Cell Cycle, Prokaryotes) are highly dis-
divisions occur in ▶ meiosis, when the DNA con- similar, their interaction network and the resulting cell
tent is halved. These processes normally occur only cycle dynamics are very similar to the yeast system.
during development and embryogenesis, they are
avoided in normal somatic cells by careful control
of Cdk activity. Historical Discoveries of Cell Cycle Research
• Checkpoint mechanisms stop the cell cycle, by
freezing Cdk activity, if some previous event has Systematic analysis of cell cycle mutants in the 1970s
not been properly completed and ensure that repair by Lee Hartwell and Paul Nurse led to the discovery of
mechanisms are initiated to “solve” the problem. In Cdk while cyclin was discovered by Tim Hunt. These
higher eukaryotes, similar pathways eventually researchers received the Nobel Prize in 2001 for their
induce the suicide of the cell if the repair fails to breakthroughs in understanding cell cycle regulation
avoid the spreading of the mutant, which could lead (Nasmyth 2001).
to tumor formation (▶ Cell Cycle, Cancer Cell The first results toward the identification of
Cycle and Oncogene Addiction). a biochemical component governing cell cycle pro-
• Growth in cellular mass could be rate-limiting, thus, gression came from investigations on cell extracts
the DNA replication-division cycle is “slowed from frog eggs (▶ Cell Cycle of Early Frog Embryos).
down” by inserting gap phases (G1- and Studies in the early 1970s described the existence of
Cell Cycle 223 C
a maturation promoting factor (MPF) whose function The system-level interactions depicted in Fig. 2 cannot
was related to induction of quiescent cells into division be easily understood by reductionistic thinking. Math-
(Masui and Markert 1971). At the same time, yeast ematical and computational models can help to under-
genetic approaches lead to the discovery of the most stand such wiring diagrams and test if the current
important genes of cell cycle regulators: Hartwell knowledge on molecular interactions is sufficient to
identified cell division cycle (Cdc) genes in the bud- describe experimental observations (Csikasz-Nagy
ding yeast Saccharomyces cerevisiae (▶ Cell Cycle, 2009). Figure 3 shows simulation results of the wiring C
Budding Yeast) through a genetic screen of diagram of Fig. 2. It explains how the waves of the
temperature-sensitive mutants (Hartwell et al. 1973). different cyclins overlap with the inactivation of Cdk/
Paul Nurse isolated temperature-sensitive mutants of cyclin inhibitors (CKI, Cdh1 and Wee1).
the rod shaped fission yeast, Schizosaccharomyces
pombe (▶ Cell Cycle, Fission Yeast), and found cell-
cycle-blocked long cells, but also noticed unusually Initial Steps in Mathematical Modeling of the
small cells that seemed to be able to proliferate nor- Cell Cycle
mally (Nurse 1975). He named this mutant “wee”
(meaning small in Scottish) and realized that the Mathematical modeling of the cell cycle has been
related genes must have an important role in control- started already when scientists were only guessing
ling proper cell division. Nurse also figured out that the mechanisms that could drive the processes of the
one of the genes with wee phenotype can be mutated in cell division cycle. Early efforts considered the phe-
another domain to produce cell-cycle-blocked elon- nomenon as a black box but could still discover some
gated cells, discovered that this gene (Cdc2) is homo- of the rules that determine the observed physiology of
logues to Hartwell’s most important budding yeast cells. From the 1960s, we can find mathematical
gene, and found the human version of it (▶ Cell models that explain some key aspects of cell cycle
Cycle of Mammalian Cells). Tim Hunt contributed to regulation from phenomenological observations on
the previous work of Hartwell and Nurse when he cell size and cell cycle time distributions (Koch and
studied the control of mRNA translation and found Schaechter 1962). The great interest in the newly dis-
proteins whose abundance oscillated (Evans et al. covered chemical oscillations in the 60s made consid-
1983). He referred to these periodically degraded mol- erable contributions to research on theoretical physical
ecules as “cyclins.” Later discoveries showed that the chemistry and later to mathematical biology as well.
complex of cyclin and Cdc2 (generally now called Researchers, such as Albert Goldbeter, John J. Tyson,
Cdk) is responsible for the MPF activity in frog Arthur Winfree, and others, realized that the theories
embryos. After these breakthrough discoveries, several and tools to analyze chemical oscillations might help
cell cycle regulators and their functions have been to understand biological oscillations. They started to
identified in various organisms. We learned that simi- investigate calcium oscillations, circadian clocks, and
lar molecules control the cell cycle of plants, worms, many other biological oscillators, among them the
and flies and began to understand the intricacies of oscillations that drive cell division. As the first exper-
their regulation in the different organisms. Indeed, imental results on cell cycle regulation appeared (see
not only cyclin and Cdk proteins are conserved above), theoreticians started to create models to predict
among eukaryotes, but most of the cell cycle regulator how Cdk controls the cell cycle. Some of these models
proteins as well as their interactions (Nurse 1990). led to a better understanding of the basic dynamical
Here, we present the generic features of the cell cycle properties of the cell cycle regulatory network
regulatory network (Csikasz-Nagy et al. 2006) (Csikasz-Nagy 2009).
together with the regulators in most important cell
cycle test organisms (Fig. 2 and Table 1).
These interactions can drive the periodic appear- Predictive Models of the Cell Cycle
ances of the various cyclins. CycD is present through-
out the entire cell cycle, CycE, and CycA appears at the In the early 1990s, several hypotheses emerged
G1/S transition restriction point and CycB activity concerning the origin of cell cycle oscillations. Theory
increases at G2/M transition and drops at mitotic exit. tells us that for oscillations the system needed to
C 224 Cell Cycle

Cell Cycle, Fig. 2 Generic cell cycle regulatory network. Solid end lines stand for inhibition effects. Gray squares behind
arrows depict molecular transitions, molecules “sitting” on cyclins (CycA-D) represent their constantly bound Cdk partners
arrows and dashed arrows represent activations; dashed blunted

contain a negative feedback loop (auto-inactivation), a critical value to induce an abrupt activation of
some nonlinearity, and either a delay or a positive Cdk/CycB.
feedback loop (auto-activation) (Novak and Tyson Nonlinear positive feedback loops can create
2008). For the cell cycle oscillations, Albert Goldbeter multistability (Ferrell 2002) when, in a given parameter
proposed a molecular cascade mechanism that pro- range, the system can exist in more than one stable states
duces a limit cycle originating from a pure negative and only the history of the system determines in which
feedback of Cdk and its degradation machinery of these states we will find it (this is the concept of
(Goldbeter 1991). John J. Tyson and Bela Novak hysteresis). Transitions from one stable state to another
predicted a more complex mechanism producing can happen very abruptly when one of the controlling
a relaxation oscillator that moves around a hysteresis parameters reaches a critical value. To set back the
loop as a result of a combination of positive and neg- system to the original state, the same parameter has to
ative feedbacks (Novak and Tyson 1993). The Novak- decrease to a much lower value. This can ensure that
Tyson model proposed that the positive feedback loop transitions between the two stable states are quite robust,
between Cdk/CycB and Cdc25 and the double negative for small fluctuation these transitions are irreversible.
(thus also positive) loop between Cdk/CycB and Wee 1 Now we understand the importance of hysteresis and
create a situation where cyclin levels needs to reach irreversible switches (▶ Cell Cycle Dynamics,
Cell Cycle 225 C
Cell Cycle, Table 1 List of key cell cycle regulators in the most important cell cycle research model organisms
Function Name in Fig. 2 Budding yeast Fission yeast Xenopus laevis Mammalian
Starter CDK/cyclin complex CycD Cdc28/Cln3 Cdc2/Puc1 Cdk4,6/CycD Cdk4,6/CycD1–3
(G1/S transition inducers) CycE Cdc28/Cln1,2 - Cdk2/CycE Cdk2/CycE1,2
S-phase promoting factor (SPF) CycA Cdc28/Clb5,6 Cdc2/Cig2 Cdk1,2/CycA Cdk1,2/CycA1,2
M-phase promoting factor (MPF) CycB Cdc28/Clb1,2 Cdc2/Cdc13 Cdc2/CycB Cdk1/CycB1,2
Stoichiometric kinase inhibitor CKI Sic1 Rum1 Xic1 p27Kip1 C
Mitotic cyclin degradation Cdh1 Cdh1 (Hct1) Ste9 (Srw1) Fzr Cdh1
regulator (with APC)
Cyclin B, Cyclin A degradation Cdc20 Cdc20 Slp1 Fizzy p55Cdc
regulator (with APC)
Retinoblastoma protein Rb Whi5 - RBL1,2 Rb1
Cyclin E, Cyclin A transcription factor TFE Swi4/Swi6 Cdc10/Res1 XE2F E2F1–3
Mbp1/Swi6
CDK/cyclin B inhibitor kinase Wee1 Swe1 Wee1 Xwee1 Wee1
CDK/cyclin B activator phosphatase Cdc25 Mih1 Cdc25 Xcdc25 Cdc25C
Phosphatase working against the CDKs Cdc14 Cdc14 Clp1/Flp1 Xcdc14 Cdc14

Irreversibility) in the cell cycle: the cell must complete physiology (viability-lethality, cell size, lengths of dif-
one state before starting another event, avoiding the ferent cell cycle phases) in wild-type and 50 mutant
same phase to repeat twice (Novak et al. 2007). cells. The authors proposed that G1 and S/G2/M
As a result of the positive feedback in the Novak- phases are alternative states and cells switch between
Tyson model, the removal of some CycB cannot inac- them during the G1/S transition, when CDK activity
tivate Cdk immediately – showing hysteresis and abruptly increases. The bistability of the system is
bistability in the cell cycle (▶ Cell Cycle Dynamics, given by the antagonism between CDK/cyclin com-
Bistability and Oscillations). This prediction was inde- plexes and their inhibitors CKI (Sic1 in budding yeast)
pendently experimentally verified by two groups and Cdh1 (Fig. 4). The authors also proposed an exper-
(Pomerening et al. 2003; Sha et al. 2003), and it was iment which could verify the existence of this
also discovered that by removing the negative effect of bistability. In 2002, Fredrick R. Cross and his group
Wee1 on Cdk (thus disrupting the positive feedback performed experiments following these suggestions
loop), the oscillations became less robust and their and confirmed the prediction (Cross et al. 2002). In
amplitude was reduced (Pomerening et al. 2005). this groundbreaking study, Cross went much further
These insights highlighted the importance of positive and carried extensive experimental tests of various
feedbacks in cell cycle regulation. properties of the model while also performing some
Hysteresis and multistability was also proposed by crucial measurements to help further model develop-
mathematical modeling and verified later experimen- ment. This milestone study was one of the first that was
tally for the cell cycle regulation of yeast cells. The two fully devoted to test a mathematical model, and could
landmark papers by Katherine C. Chen and colleagues therefore be viewed as one of the first true systems
from the Tyson lab dealt with the cell cycle of budding biology studies. Two years later, the measurements
yeast cells. The models that were created in this work and corrections by Cross were implemented into
are now the influential mathematical models of cell a new version of Chen’s model (Chen et al. 2004). It
cycle regulation (Chen et al. 2004; Chen et al. 2000). was also extended to cover detailed regulation of
These models are based on an extensive literature data mitotic exit. The resulting model was tested against
collection on more than 100 budding yeast cell cycle the behavior of 131 mutants and marginally failed only
mutants. The final model can simulate the findings of on 11 of them – highlighting the aspects of the model
almost all of these experiments. In their first paper, that lacked sufficient experimental evidence. This
Chen et al. (2000) examined the molecular events model also predicted the presence of a mitotic exit
regulating the START transition of the cell cycle. regulatory phosphatase that was later experimentally
The model accounts for many details of the cells’ identified (Queralt et al. 2006).
C 226 Cell Cycle

Cell Cycle, Fig. 3 Simulation of a “generic” cell cycle model. induce Cdh1 inactivation and drive the cell into S-phase. Wee1
Temporal fluctuations in molecular levels of the most important keeps Cdk/CycB inactive in G2. Later, as the autocatalytic loops
cell cycle regulators (notations as in Table 1 and Fig. 2). Plots of Cdk/CycB activation turn on, Wee1 gets inactivated and the
show that the level of CycD is not fluctuating during the cell increase in Cdk/CycB activity induces M-phase. At the end of
cycle, still its activity together with the increasing CycE depen- mitosis, when the spindles are properly aligned, Cdc20 activa-
dent Cdk activity can induce CKI degradation and activation of tion is not halted anymore and the active Cdc20/APC induces
further CycE and CycA production (after inactivation of Rb and CycB degradation and CKI and Cdh1 activation to finish the cell
activation of E2F, as noted on Fig. 2). Cdk/CycA and Cdk/CycE cycle. The drop in Cdk/CycB activity induces cell division

Cell Cycle Modeling Techniques model can be defined. ODEs have been used for
many years to create mathematical models of eukary-
The most common approach to describe a system com- otic cell cycle regulation, since this was the technique
posed of biochemical reactions is to translate the inter- that was used to investigate chemical and physical
actions of a molecular wiring diagram into ordinary oscillators. Thus, all the developed tools of dynamical
differential equations (ODEs) (▶ Cell Cycle systems theory could be used to understand ODE
Modeling, Differential Equation). These can be either models of cell cycle regulation. Maybe to most fre-
solved analytically or, in the case of larger models, quently used technique is bifurcation analysis (▶ Cell
numerically by computational means. The solutions of Cycle Model Analysis, Bifurcation Theory), which
ODEs can be compared with existing experimental follows the steady state solutions of the system of
data, thus some of the kinetic rate constants of the differential equations. These steady states can identify
Cell Cycle 227 C

Cell Cycle, Fig. 4 Bistability in the budding yeast cell cycle. abruptly activated at the critical cell mass what drives the cell
The signal-response curve (left) shows how the steady state Cdk/ into S-phase. Small fluctuations cannot drive the system back to
Cyc activity depends on cell mass (solid and dashed lines stand G1 phase. Unless the cell mass sensing pathway activity drops
for stable and unstable steady states respectively). In early G1 below the lower boundary of the bistabile zone (gray back-
phase only one steady state exists, with low Cdk/Cyc activity. As ground), the cell will stay in the high Cdk activity S/G2/M
the cell grows the second stable state appears as well, but the state. The bistability is a result of an antagonistic, double nega-
original G1 steady state still remains until the cell mass reaches tive (thus positive) feedback loop, where Cdk inhibits Cdh1 and
a critical value where this disappears and only the high Cdk CKI activity, while Cdh1 induces cyclin degradation and CKI
activity state remains. As the cell grows (dotted lines) Cdk gets inhibits Cdk/Cyc complexes (Fig. 1a)

the phases of the cell cycle and cell cycle transitions complexity of mathematical models is growing as
could be associated with bifurcations. Indeed Fig. 4 is well. Detailed analysis and parameterization of such
a bifurcation diagram where cell mass is a bifurcation large systems can be difficult when the number of
parameter and Cdk activity is the state variable. ODEs is large. Simplification by logical modeling
The major problem with most biological systems is has been proposed to overcome this problem (▶ Cell
the fact that very few rate constants can be measured Cycle Modeling Using Logical Rules). Logical model-
directly. Parameter sensitivity analysis (▶ Cell Cycle ing has a long tradition in biology and recently some
Models, Sensitivity Analysis) is used to identify the applications to cell cycle research also appeared. These
parameters that have the highest influence on model models are based on Boolean algebra, where the activ-
behavior. This method could help to reduce the number ity of each component is represented by two states: ON
of parameters that need to be measured or estimated. and OFF, providing a method which is computation-
Several techniques exist to estimate the parameters of ally less expensive. Behaviors that originate from the
a particular model, from manual parameter “twid- topology of the system can be investigated this way. In
dling” based on “educated guesses” to highly sophis- their pioneering study, Li et al. (2004) showed that the
ticated parameter inference using software tools network of the budding yeast cell cycle regulation is
(Panning et al. 2008; Schmidt and Jirstrand 2006). robustly designed, as 86% of all possible initial states
Almost all of these have been used in various cell lead the system to the same stable cell cycle path. This
cycle models. Metabolic control analysis is mostly result underlines that the biochemical network of cell
used to identify key reactions of metabolism, but cycle regulation can function in a parameter-
recent studies showed that similar technique could be insensitive way.
used to identify key reactions of the cell cycle as well Fluctuations in the average behavior of a cell popula-
(▶ Cell Cycle Signaling, Metabolic Pathway). tion can be described by deterministic ODE models or
Our knowledge about the regulatory network of the logical models, but the behavior of individual cells cannot
cell cycle and its interactions with other cellular path- be simulated that way. Stochastic modeling approaches
ways is constantly growing. As a consequence, the (▶ Cell Cycle Modeling, Stochastic Methods) are
C 228 Cell Cycle

becoming more and more popular as they provide Restriction Point). Classical flow cytometry was first
a means to analyze single cells and find relevant results used in 70s to determine DNA content and with that,
in cooperation with novel single-cell experimental the cell cycle stage of cell populations. Now, with the
techniques such as quantitative flow cytometry and help of fluorescence proteins, it could be used in
fluorescence microscopy. Some cell cycle models are a much more sophisticated way to determine molecule
already using stochastic simulation techniques, such number changes during the cell cycle (▶ Cell Cycle
as Langevin-type stochastic ODEs or the exact sto- Analysis, Flow Cytometry). Fluorescence proteins and
chastic simulation algorithm (Gillespie 2007). The advanced microscopy techniques also allow analysis
effect of noise in nonsymmetrical cell division on cell of single cells and even localization of single mole-
size distribution on the population level has been also cules during the cell cycle (▶ Cell Cycle Analysis,
investigated. Much more detailed stochastic models of Live-Cell Imaging). The periodic oscillations in tran-
cell cycle regulation are appearing. Such models take scription during the cell cycle was an early target of
into account measured molecular levels, protein, and transcriptomics research (▶ Cell Cycle Analysis,
mRNA half-lives and several other noisy processes. Expression Profiling) indeed, one of the first genome-
Most of these models are based on earlier ODE models scale microarray experiments was performed on the
that are “unpacked” into elementary reactions to be budding yeast cell cycle. Protein level fluctuations are
able to run stochastic simulations on them. classically followed by Western blots, but now large-
A successful approach along these lines uses Petri scale mass spectrometry analysis is widely used to
nets (▶ Cell Cycle Modeling, Petri Nets) which allow follow protein level fluctuations during the cell cycle
to move between deterministic and stochastic simula- and to indentify the phosphorylation states of Cdk-
tions and to follow molecule numbers instead of con- regulated proteins (Aebersold and Mann 2003).
centrations. Some other modeling concepts from Advanced ChIP–chip techniques help to reveal the
computer science are adopted and adapted to model transcriptional network of cell cycle regulation (Bahler
biological systems. Rule-based modeling and espe- 2009) and, essentially, all molecular and systems
cially various process algebras (▶ Cell Cycle Model- biology methods are used in cell cycle research.
ing, Process Algebra) are able to handle combinatorial The collected data is stored in various databases on
complexity caused by complex formation and various protein interactions, transcriptional interactions, post-
protein modifications. These allow an easier handling translational modifications, etc (▶ Cell Cycle Data
of multisite phosphorylation and multi-component Analysis). There are some cell cycle specific databases
complex formations, which are both relevant issues that contain information on periodically transcribed
for cell cycle regulation. genes during the cell cycle (Gauthier et al. 2010) and
that introduce related mathematical models as well
(Alfieri et al. 2008). Furthermore, literature mining
System Level Cell Cycle Experiments tools are helping us in searching for specific experi-
mental results on our favorite proteins. Large-scale
Many models are parameterized by fitting data on gene perturbation experiments are also being used to
physiological observations: cell sizes, cell cycle investigate the cell cycle: studies on single- or double
phase distributions, and viabilities of mutants – after gene-deletion strains (Hillenmeyer et al. 2008) and phe-
painstaking literature mining for these data. The notype analysis of protein overexpressing cells are all
advanced large-scale measurements and the related giving important results to cell cycle research (▶ Cell
databases provide modelers data to help them formu- Cycle Analysis, Systematic Gene Overexpression).
lating larger, more detailed, and more precise models All these resources are being used to support the
in the near future. development of computational models of cell cycle
Results collected over the last 40 years, including regulation. The phenotypes of the deletion and
recent system-level experiments have extended our overexpression mutant strains provide physiological
knowledge and now allow us to construct much more constrains for such models, while time-course mea-
detailed interaction maps of key cell cycle regulatory surements of mRNA and protein levels provide data
steps (▶ Cell Cycle Transition, Detailed Regulation of to be fitted by the simulation of the models.
Cell Cycle 229 C
Models Beyond the Cell Cycle describing the crosstalk between the various upstream
and downstream pathways and the core cell cycle
Cell cycle of individual cells in multicellular organ- machinery.
isms (such as humans) is controlled by the environ-
ment as well as it is influenced by its neighboring cells.
Cell–cell interactions make the modeling of cell cycle Summary
in a cellular tissue a multilevel problem (▶ Multilevel C
Modeling, Cell Proliferation). Not only the internal A full systems biology workflow of experiment !
molecular interactions of each individual cell, but database ! model ! experiment was applied to
their incoming and outgoing signals to their neighbors understand the role of positive feedback in the cell
need to be tracked. Cell cycle modules could be intro- cycle regulation. To realize the entire loop took more
duced into agent-based models to simulate how cell than 10 years in the case of the frog egg studies, but it
cycle is controlled in individual interacting cells. Fur- took only 4 years in the case of the budding yeast cell
thermore, the core cell cycle engine should properly cycle control. We are hoping that with the emergence
regulate important downstream processes, such as of systems biology, stimulating interactions between
DNA replication and cell division. To carry out experimentalists and theoreticians could lead to more
a proper function, it is interlocked with several other detailed, more precise and realistic models. For
pathways. Various environmental effects (stress, light, instance, we will need more sophisticated experimen-
temperature, etc.) can influence progression through tal techniques to measure protein activity fluctuations
cell division cycles, and several drugs, external signal- in time – desirably in individual cells. At the same
ing molecules, and metabolites could also affect cell time, better modeling and parameter estimation soft-
proliferation (Csikasz-Nagy 2009). Many of these ware is needed to deal with the increasing amounts of
upstream and downstream pathways of cell cycle reg- data and to accommodate new types of measurements
ulation have been investigated in great detail by sys- on individual cells. Even though we have learned
tems biology techniques. The yeast osmotic pressure a great deal about cell cycle regulation in the last
sensing pathway is one of the most investigated path- 40–50 years, there are still gaps in our knowledge
ways; several experiments and computational models that need bridged, and experimentalists and theoreti-
are dealing with it (Klipp et al. 2005). Computational cians need to work together to accomplish this.
models of the mammalian system are focusing on the
regulation of the cell cycle by the low oxygen induced
hypoxia pathway, on the Src and NFkB pathways, as Cross-References
well as on the interactions between DNA damage,
apoptosis, and the cell cycle. A recent discovery on ▶ CDK Inhibitors
the connection between the circadian clock and the cell ▶ Cell Cycle Analysis, Expression Profiling
cycle in mammalian cells (▶ Cell Cycle, Coupled with ▶ Cell Cycle Analysis, Flow Cytometry
Circadian Clock) also initiated some computational ▶ Cell Cycle Analysis, Live-Cell Imaging
modeling work along these lines. Likewise, the role ▶ Cell Cycle Analysis, Systematic Gene
of micro RNAs in relation to cell cycle regulation are Overexpression
also being investigated by computational modeling ▶ Cell Cycle Arrest After DNA Damage
(▶ Cell Cycle Regulation, microRNAs). Also, some ▶ Cell Cycle Checkpoints
of the downstream processes induced by the cell ▶ Cell Cycle Data Analysis
cycle machinery have been studied by mathematical ▶ Cell Cycle Dynamics, Bistability and Oscillations
modeling. The regulation of cell growth, cell division, ▶ Cell Cycle Dynamics, Irreversibility
DNA replication and several aspects of mitosis are all ▶ Cell Cycle Model Analysis, Bifurcation Theory
described in some detail in the literature (Lygeros et al. ▶ Cell Cycle Modeling Using Logical Rules
2008; Mogilner et al. 2006), but none of these are ▶ Cell Cycle Modeling, Differential Equation
connected to a model of the core cell cycle regulatory ▶ Cell Cycle Modeling, Petri Nets
network. Thus, we still need to wait to see the results ▶ Cell Cycle Modeling, Process Algebra
C 230 Cell Cycle

▶ Cell Cycle Modeling, Stochastic Methods Ciliberto A, Shah JV (2009) A quantitative systems view of the
▶ Cell Cycle Models, Sensitivity Analysis spindle assembly checkpoint. EMBO J 28:2162–2173
Cross FR, Archambault V, Miller M, Klovstad M (2002) Testing
▶ Cell Cycle of Early Frog Embryos a mathematical model for the yeast cell cycle. Mol Biol Cell
▶ Cell Cycle of Mammalian Cells 13:52–70
▶ Cell Cycle Regulation, microRNAs Csikasz-Nagy A (2009) Computational systems biology of the
▶ Cell Cycle Signaling, Hypoxia cell cycle. Brief Bioinform 10:424–434
Csikasz-Nagy A, Battogtokh D, Chen KC, Novak B, Tyson JJ
▶ Cell Cycle Signaling, Metabolic Pathway (2006) Analysis of a generic model of eukaryotic cell-cycle
▶ Cell Cycle Signaling, Spindle Assembly Checkpoint regulation. Biophys J 90:4361–4379
▶ Cell Cycle Transition, Detailed Regulation of Evans T, Rosenthal ET, Youngblom J, Distel D, Hunt T (1983)
Restriction Point Cyclin: a protein specified by maternal mRNA in sea urchin
eggs that is destroyed at each cleavage division. Cell
▶ Cell Cycle Transition, Principles of Restriction 33:389–396
Point Ferrell JE (2002) Self-perpetuating states in signal transduction:
▶ Cell Cycle Transitions, G2/M positive feedback, double-negative feedback and bistability.
▶ Cell Cycle Transitions, Mitotic Exit Curr Opin Cell Biol 14:140–148
Gauthier NP, Jensen LJ, Wernersson R, Brunak S, Jensen TS
▶ Cell Cycle, Budding Yeast (2010) Cyclebase.org: version 2.0, an updated comprehen-
▶ Cell Cycle, Cancer Cell Cycle and Oncogene sive, multi-species repository of cell cycle experiments
Addiction and derived analysis results. Nucleic Acids Res 38:D699–
▶ Cell Cycle, Cell Size Regulation D702
Gillespie DT (2007) Stochastic simulation of chemical kinetics.
▶ Cell Cycle, Coupled With Circadian Clock Annu Rev Phys Chem 58:35–55
▶ Cell Cycle, Fission Yeast Goldbeter A (1991) A minimal cascade model for the mitotic
▶ Cell Cycle, Physiology oscillator involving cyclin and cdc2 kinase. Proc Natl Acad
▶ Cell Cycle, Prokaryotes Sci USA 88:9107–9111
Hartwell LH, Weinert TA (1989) Checkpoints: controls that
▶ Cyclins and Cyclin-Dependent Kinases ensure the order of cell cycle events. Science 246:629–634
▶ Cytokinesis Hartwell LH, Mortimer RK, Culotti J, Culotti M (1973) Genetic
▶ DNA Replication control of the cell division cycle in yeast: V. Genetic analysis
▶ Endoreplication of cdc mutants. Genetics 74:267–286
Hillenmeyer ME, Fung E, Wildenhain J, Pierce SE, Hoon S,
▶ Lymphocyte Dynamics and Repertoires, Lee W, Proctor M, St Onge RP, Tyers M, Koller D,
Biological Methods Altman RB, Davis RW, Nislow C, Giaever G (2008) The
▶ Meiosis chemical genomic portrait of yeast: uncovering a phenotype
▶ Mitosis for all genes. Science 320:362–365
Klipp E, Nordlander B, Kruger R, Gennemark P, Hohmann S
▶ Modeling Approaches of Cell Cycle Regulation by (2005) Integrative model of the response of yeast to osmotic
Signaling Pathways for External Stress shock. Nat Biotechnol 23:975–982
▶ Multilevel Modeling, Cell Proliferation Koch AL, Schaechter M (1962) A model for statistics of the cell
division process. J Gen Microbiol 29:435–454
Li F, Long T, Lu Y, Ouyang Q, Tang C (2004) The yeast cell-
cycle network is robustly designed. Proc Natl Acad Sci USA
References 101:4781–4786
Lindqvist A, Rodriguez-Bravo V, Medema RH (2009) The deci-
Aebersold R, Mann M (2003) Mass spectrometry-based proteo- sion to enter mitosis: feedback and redundancy in the mitotic
mics. Nature 422:198–207 entry network. J Cell Biol 185:193–202
Alfieri R, Merelli I, Mosca E, Milanesi L (2008) The cell cycle Lukas J, Lukas C, Bartek J (2004) Mammalian cell cycle check-
DB: a systems biology approach to cell cycle analysis. points: signalling pathways and their organization in space
Nucleic Acids Res 36:D641–D645 and time. DNA Repair (Amst) 3:997–1007
Bahler J (2009) Global approaches to study gene regulation. Lygeros J, Koutroumpas K, Dimopoulos S, Legouras I,
Methods 48:217 Kouretas P, Heichinger C, Nurse P, Lygerou Z (2008) Sto-
Bloom J, Cross FR (2007) Multiple levels of cyclin specificity in chastic hybrid modeling of DNA replication across
cell-cycle control. Nat Rev Mol Cell Biol 8:149–160 a complete genome. Proc Natl Acad Sci USA
Chen KC, Csikasz-Nagy A, Gyorffy B, Val J, Novak B, Tyson JJ 105:12295–12300
(2000) Kinetic analysis of a molecular model of the budding Masui Y, Markert CL (1971) Cytoplasmic control of nuclear
yeast cell cycle. Mol Biol Cell 11:369–391 behavior during meiotic maturation of frog oocytes. J Exp
Chen KC, Calzone L, Csikasz-Nagy A, Cross FR, Novak B, Zool 177:129–145
Tyson JJ (2004) Integrative analysis of cell cycle control in Mogilner A, Wollman R, Civelekoglu-Scholey G, Scholey J
budding yeast. Mol Biol Cell 15:3841–3862 (2006) Modeling mitosis. Trends Cell Biol 16:88–96
Cell Cycle Analysis, Expression Profiling 231 C
Morgan DO (2006) The cell cycle: principles of control. New Definition
Science Press, London
Nasmyth K (2001) A prize for proliferation. Cell 107:689–701
Novak B, Tyson JJ (1993) Numerical analysis of Gene expression (or ▶ Transcriptome) profiling refers
a comprehensive model of M-phase control in Xenopus to the simultaneous measurement of the activities of
oocyte extracts and intact embryos. J Cell Sci most or all genes present in a cell, a tissue, or a whole
106(Pt 4):1153–1168 organism. The activity of a given gene is determined
Novak B, Tyson JJ (2008) Design principles of biochemical
oscillators. Nat Rev Mol Cell Biol 9:981–991 by the number of RNA transcripts (expression level) C
Novak B, Tyson JJ, Gyorffy B, Csikasz-Nagy A (2007) Irrevers- derived from this gene under a specific condition.
ible cell-cycle transitions are due to systems-level feedback. Using cells that are synchronized for the cell cycle
Nat Cell Biol 9:724–728 (▶ Cell Cycle, Synchronization), gene expression pro-
Nurse P (1975) Genetic control of cell size at cell division in
yeast. Nature 256:547–551 filing enables the large-scale identification of genes
Nurse P (1990) Universal control mechanism regulating onset of that are periodically expressed as a function of the
M-phase. Nature 344:503–508 cell cycle (▶ Cell Cycle-Regulated Gene Expression).
Panning T, Watson L, Allen N, Chen K, Shaffer C, Tyson J This approach thus provides global data of gene regu-
(2008) Deterministic parallel global parameter estimation for
a model of the budding yeast cell cycle. J Global Optim lation during the cell cycle, which supports a systems-
40:719–738 level understanding of cell-cycle control.
Pomerening JR, Sontag ED, Ferrell JE Jr (2003) Building a cell
cycle oscillator: hysteresis and bistability in the activation of
Cdc2. Nat Cell Biol 5:346–351
Pomerening JR, Kim SY, Ferrell JE Jr (2005) Systems-level Characteristics
dissection of the cell-cycle oscillator: bypassing positive
feedback produces damped oscillations. Cell 122:565–578 Methods for Gene Expression Profiling
Queralt E, Lehane C, Novak B, Uhlmann F (2006) Gene expression profiling has been enabled by the
Downregulation of PP2A(Cdc55) phosphatase by separase
initiates mitotic exit in budding yeast. Cell 125:719–732 development of large-scale technologies such as
Schmidt H, Jirstrand M (2006) Systems biology toolbox for ▶ DNA microarrays and, more recently, next-
MATLAB: a computational platform for research in systems generation sequencing-based approaches such as
biology. Bioinformatics 22:514–515 ▶ RNA-seq. These genome-wide approaches are
Sha W, Moore J, Chen K, Lassaletta AD, Yi CS, Tyson JJ,
Sible JC (2003) Hysteresis drives cell-cycle transitions in quite straightforward experimentally, and the biggest
Xenopus laevis egg extracts. Proc Natl Acad Sci USA challenge is often to extract biologically meaningful
100:975–980 information from the huge data sets generated. Com-
Sia RA, Bardes ES, Lew DJ (1998) Control of Swe1p degrada- putational data mining is therefore a central aspect of
tion by the morphogenesis checkpoint. EMBO J
17:6678–6688 gene expression profiling, including a range of special-
Zachariae W, Nasmyth K (1999) Whose end is destruction: cell ized approaches such as ▶ clustering.
division and the anaphase-promoting complex. Genes Dev Besides methods for gene expression profiling to
13:2039–2058 measure ▶ transcriptome levels during the cell cycle,
complementary large-scale technologies are available
for global insight into ▶ cell-cycle-regulated gene
expression. For example, ▶ chromatin immunoprecip-
Cell Cycle Analysis, Expression Profiling itation (ChIP) combined with ▶ DNA microarrays
(ChIP-chip) or sequencing (ChIP-seq) provide
J€
urg B€ahler and Samuel Marguerat a means to determine the genome-wide binding sites
Department of Genetics, Evolution and Environment of transcription factors controlling gene expression.
and UCL Cancer Institute, University College London, Moreover, recent developments in ▶ proteomics are
London, UK now allowing the global measurement of protein levels
and post-translational modifications as a function of
the cell cycle.
Synonyms
Gene Expression Profiling During the Cell Cycle
Expression profiling; Transcriptome profiling; Periodic control of transcription seems to be
Transcriptomics a universal feature of the cell-division cycle.
C 232 Cell Cycle Analysis, Expression Profiling

M S M S Mcm1p
3
Ace2p
2 Swi5p
Relative expression level

Cln3p
1 M
G1

0.5
Fkh2p G2
Mcm1p MBF
Ndd1p S SBF

0.2 Fkh1p
0 30 60 90 120 150 180 210 240 270
Minutes after synchronization

Cell Cycle Analysis, Expression Profiling, Fig. 1 Expression Cell Cycle Analysis, Expression Profiling, Fig. 2 Serial reg-
profiles of fission yeast transcripts peaking in levels before ulation of transcription factors during the budding yeast cell
M-phase (red) or S-phase (blue). Transcriptomes were analyzed cycle. Each group of transcription factors (specified in colored
by microarrays every 15 min after cell-cycle synchronization for frames) regulates transcription factors acting in the next cell-
two full cell cycles. Data from Rustici et al. (2004) cycle phase (shown in circle). Continuous and hatched lines
indicate transcriptional and post-translational regulation,
respectively
Gene expression profiling of proliferating cells after
▶ cell-cycle synchronization have identified hundreds
of periodically expressed genes, in organisms ranging in human, are not conserved at the sequence level but
from bacteria to humans (Breeden 2003). Periodically play analogous roles in controlling genes whose tran-
expressed genes often play specific roles in the cell script levels peak before S-phase.
cycle, and their expression levels typically peak just Cell-cycle-regulated gene expression often shows
before the phase during which they function. Particu- serial regulation, whereby transcription factors func-
larly rich genome-wide data are available for budding tioning during one cell-cycle phase up-regulate tran-
and fission yeasts, which provide complementary scripts that encode downstream transcription factors
models for cell-cycle-regulated ▶ gene expression functioning during the next phase (Fig. 2) (Futcher
(B€ahler 2005). About 10–20% of all yeast genes are 2002). Such a network of sequentially expressed tran-
periodically expressed during the cell cycle, many of scription factors may act as a cell-cycle oscillator that
them in three transcriptional waves that roughly coin- regulates the periodic gene expression program inde-
cide with major cell-cycle transitions: initiation of pendently of, and in addition to, cyclin-dependent
DNA replication, entry into mitosis, and exit from kinases (Simmons Kovacs et al. 2008).
mitosis (Fig. 1). Budding and fission yeasts show similarities as well
as intriguing differences in their ▶ transcriptional reg-
Control of Cell-Cycle-Regulated Gene Expression ulatory networks, revealing evolutionary plasticity of
Periodic gene expression is integrated with cell-cycle gene expression control (B€ahler 2005): Regulatory
progression by multiple feedback loops at both tran- circuits between conserved transcription factors have
scriptional and post-translational levels (B€ahler 2005; been rewired, and periodic transcription of many
Futcher 2002). Some cell-cycle transcription factors orthologous genes is not conserved, except for a core
are functionally conserved from yeast to human. For set of genes that may be universally regulated during
example, proteins of the forkhead family, such as Fkh2 the eukaryotic cell cycle and may play key roles in cell-
in yeast and FoxM in human, control genes whose cycle progression.
transcript levels peak before M-phase. Other transcrip- Although protein complexes involved in the cell
tional regulators, such as MBF/SBF in yeast and E2F cycle are largely conserved among eukaryotes, their
Cell Cycle Analysis, Flow Cytometry 233 C
Cross-References
P Protein
degradation ▶ Cell Cycle-Regulated Gene Expression
▶ Cell Cycle, Synchronization
▶ Chromatin Immunoprecipitation
Active complex
▶ Clustering
▶ DNA Microarrays C
Inactive complex ▶ Proteomics
▶ RNA-Seq
Periodically ▶ Transcriptional Regulatory Network
expressed subunit
▶ Transcriptome

Cell Cycle Analysis, Expression Profiling, Fig. 3 Just- References


in-time assembly of protein complex during the cell cycle,
controlled by a single cell-cycle-regulated subunit (green).
B€ahler J (2005) Cell-cycle control of gene expression in budding
Post-translational regulation of the periodically expressed
and fission yeast. Annu Rev Genet 39:69–94
subunit, such as phosphorylation (red) and protein degradation,
Breeden LL (2003) Periodic transcription: a cycle within a cycle.
often serve as extra layers of control
Curr Biol 13:R31–R38
Chen Z, McKnight SL (2007) A conserved DNA damage
regulation has changed considerably during evolution response pathway responsible for coupling the cell division
cycle to the circadian and metabolic cycles. Cell Cycle
(de Lichtenberg et al. 2007). Most protein complexes 6:2906–2912
contain both periodically and constitutively expressed de Lichtenberg U, Jensen TS, Brunak S, Bork P,
subunits, but the periodically expressed subunits differ Jensen LJ (2007) Evolution of cell cycle control: same
between organisms. Moreover, the periodically molecular machines, different regulation. Cell Cycle
6:1819–1825
expressed subunits, which may control complex Futcher B (2002) Transcriptional regulatory networks and the
activity by just-in-time assembly, tend to be coordi- yeast cell cycle. Curr Opin Cell Biol 14:676–683
nately controlled at transcriptional as well as Rustici G, Mata J, Kivinen K, Lió P, Penkett CJ, Burns G,
post-translational levels, highlighting that multiple Hayles J, Brazma A, Nurse P, B€ahler J (2004) Periodic
gene expression program of the fission yeast cell cycle. Nat
regulatory layers are often integrated to drive cell- Genet 36:809–817
cycle progression (Fig. 3). Simmons Kovacs LA, Orlando DA, Haase SB (2008) Transcrip-
tion networks and cyclin/CDKs: the yin and yang of cell
Metabolic Cycle cycle oscillators. Cell Cycle 7:2626–2629
An intriguing special case for periodic gene expression
is the metabolic cycle. When continuously grown
under nutrient-limiting conditions, budding yeast
exhibit highly periodic cycles of respiratory bursts,
and microarray studies revealed that over 50% of the Cell Cycle Analysis, Flow Cytometry
genes are expressed periodically during these
metabolic cycles. The cell-division cycle is gated by James W. Jacobberger
this metabolic cycle, with DNA replication being seg- Case Comprehensive Cancer Center, Case Western
regated away from the oxidative phase when cells are Reserve University, Cleveland, OH, USA
actively respiring. This temporal compartmentaliza-
tion ensures that DNA is replicated when oxidative
stress is lowest, in order to minimize genome damage. Synonyms
The coordination between metabolic and cell-division
cycles may be related to the coupling between circa- Cell cycle analysis, Phase fraction analysis;
dian and cell-division cycles in other organisms (Chen Cytometry, cytofluorimetry, microfluorimetry, analyt-
and McKnight 2007). ical cytology; Flow cytometry, flow microfluorimetry
C 234 Cell Cycle Analysis, Flow Cytometry

Definition research, is not required for cell cycle analysis.


Some major sources of commercial instrumentation
▶ Cell cycle analysis commonly denotes a group of are listed in Table 1.
analytical processes used to estimate the fraction of
cells that are resident in cell cycle phases, states, or Key Features of Cytometry
stages by cell-based measurement methods. Generally, There are two principle features of cytometry that
the measurements are performed with optical instru- distinguishes it from other approaches to cell biology.
ments connected to photosensitive components The first is that the data are a set of measurements with
(photomultiplier tubes, photo diodes, and charge each element a measurement on a single cell. The
coupled devices [CCD]) and associated electronics to value of that distinction is that for any given asynchro-
digitize light that is emitted (fluorescence, phosphores- nous population, under specific conditions, cells can be
cence) or scattered (not absorbed) by individual cells. observed at some frequency at all possible states for
The digitization process converts a uniform pulse of the system. For example, proliferating cells will be
light to a usable number that represents the quantity of measured at all cell cycle phases or states related to
some biochemical entity (usually, DNA), “reported” the specific cell type and environment. An element that
by the fluorescent label. is related to this is that in multivariate cytometry, all
measurements are correlated. This means that if mea-
surement a represents state 1 in the system, that state is
Characteristics also characterized with respect to measurement b. The
second feature of cytometry is that it is quantitative. In
Instrumentation the design and manufacture of the instruments, empha-
▶ Flow cytometers generally employ optical elements sis is on obtaining quantitative measurement with high
(lenses, fiber optics, filters, and mirrors) to direct precision – i.e., if two particles, one containing 1x and
focused beams of monochromatic or a restricted one containing 2x of fluorescent dye molecules,
band of light from either lasers or arc-lamps to emphasis is placed on minimizing error when 1x +
a spot within a flowing stream that contains cells error and 2x + 2error represents reality. In cell cycle
that move, single file, under forces of laminar flow. analysis, it is not uncommon to measure cells with one
Cells, in suspension, are introduced to flow genome (G0/G1) and cells with two genomes (G2/M)
cytometers from tubes and plate wells. Laser scan- at a G2/G1 ratio of 1.95 or better and coefficients of
ning cytometers employ moving stages, laser light variation (CV) of 1–4%.
sources, and modulated mirrors to scan cells that are
stationary with respect to an optically transparent Basis for Cell Cycle Analysis
substrate that is scanned by a moving beam spot in Figure 1a shows a simulated expression profile for
one direction while the microscope stage moves in the DNA or genome content for a cycling mammalian
orthogonal direction. The single existing commercial somatic cell with a cell cycle time (Tc) of 15.4 h. The
imaging flow cytometer uses the same principles as time spent with one genome is the length of the G1
ordinary flow cytometers but uses a CCD to capture phase of the cell cycle – in this case, 5.6 h. The time
several images of each cell moving through the inter- spent synthesizing a second genome is the length of
rogation region and integrates data on a per pixel S phase (6.8 h), and the length of time with two
basis. A variant of flow cytometry, “cell sorting,” genomes is 2.8 h (G2 + M). S phase is defined by
employs a flow cytometer with inline mechanisms genomic DNA synthesis. Figure 1b shows a simulated
for undulating and charging the fluid stream to create age distribution for a proliferating population on
charged droplets containing one or more cells as the a fractional cell cycle time (Tc) basis. This distribution
stream exits a nozzle. Charged plates then deflect the accounts for binary division at the end of the cycle. If
droplets in one of 2–4 directions based on computa- the expression profile (A) is transformed to account for
tion that occurred upstream at the interrogation point. overrepresentation in the population due to binary
This latter technology, while useful in cell cycle division; and normally distributed random numbers
Cell Cycle Analysis, Flow Cytometry 235 C
Cell Cycle Analysis, Flow Cytometry, Table 1 Commercial research cytometersa
Instrumentation Company City, state/country Web address
Flow cytometers/Cell sorters BD biosciences San Jose, CA bdbiosciences.com
Beckman coulter Miami, FL coulterflow.com
i-Cyt/Sony Champaign, IL i-cyt.com
Flow cytometers Partec company M€unster, Germany partec.com
Millipore Billerica, MA millipore.com C
Invitrogen Carlsbad, CA invitrogen.com
Accuri Ann Arbor, MI accuricytometers.com
Laser scanning cytometers Compucyte Westwood, MA compucyte.com
Imaging flow cytometers Amnis Seattle, WA amnis.com
a
This list is not comprehensive

a b
Relative frequency
2 2
Genomes

1 1

0 5 10 15 0.0 0.5 1.0


Time (hours) Fractional cell cycle time

c d

2
DNA content

Frequency

0.0 0.5 1.0 1 2


Population fraction DNA content

Cell Cycle Analysis, Flow Cytometry, Fig. 1 Relationship a function of the fractional cell cycle time. See (Walker 1954) for
between cell cycle-related DNA expression and cytometry of the logic. Panel (c) shows simulated DNA content measure-
DNA content. Panel (a) represents DNA expression over the life ments that would be measured for cells programmed to express
of a cell with the values G1 ¼ 5.8 h, S ¼ 6.8 h, and G2 + DNA as in A. To generate these data, the X axis of A is
M¼2.8 h, which are values that we have previously measured transformed as Xt ¼ (2*exp(-0.6931*Xi/Tc))*(X/Tc),
for NIH3T3 cells growing in 10% serum at a density of 87–121 Tc ¼ cell cycle time (15.4), Xt ¼ transformed X, Xi ¼ original
cells/cm2 (Disalvo et al. 1995). Panel (b) shows the age distri- X. Xts are unitless fractions of Tc. Panel (d) is a histogram of the
bution of a population of asynchronously growing cells, data in Panel (c)
accounting for the generation of two cells at birth, plotted as
C 236 Cell Cycle Analysis, Flow Cytometry

are generated along the adjusted “expression” curve to subsequent Gaussian standard deviation terms are cal-
simulate cytometric measurements, we get Fig. 1c. If culated from that using the coefficient of variation
we then create a histogram from the simulated data (standard deviation/mean) multiplied by the mean of
(Fig. 1d), one obtains a simulated DNA content distri- the subsequent Gaussian. The same logic is used to
bution for a proliferating cell population with an aver- “broaden” the ends of either a trapezoid series or
age cell cycle time of 15.6 h. This model does not a single polynomial function. Broadening either starts
account for any phase specific cell death. For further at the center of each peak or some fractional value
reading, see Walker (1954), Mitchison (1971), Watson toward the S phase side of each peak (see Fig. 3b).
(1991), Bagwell (1993). The purpose is to account for statistical overlap of each
component (G0/G1, S, and G2/M). Although the
Extension of the Logic Gaussian series is the most appealing from
The illustrated logic of Fig. 1 was first enunciated in a statistical principles point of view, in practice the
1954 (Walker 1954) and later developed within the approach is the least robust and is generally used for
context of data from flow cytometry by several groups. perturbed populations. The polynomial model works
The logic can be extended to multiple parameters. well for asynchronous, unperturbed populations, since
Figure 2 shows simulated plots for DNA and cyclin it robustly accounts for the theoretical, expected shape
A2 (Fig. 2f) compared to real data (Fig. 1a) from which of S phase. The trapezoid model is a good compromise
the fraction profiles (Fig. 2b, c) were derived and noise between the other two, able to robustly fit S phases of
added (Fig. 2d, e). By reversing the logic shown in almost any shape. Most commercial software contains
Fig. 1, the programmed cell cycle expression (Fig. 2g, h) some form of DNA content distribution fitting. Figure 3
can be derived for any parameter, provided a set of shows a typical analysis using a polynomial fit to
continuous measurements can be made covering the S phase.
cell cycle in a nonredundant manner. It is relatively
easy to see that the logic can be extended to an unlimited Limitations
number of parameters and that from primary cytometric Cell populations like liver parenchyme cells in vivo,
data, expression profiles for any measurable biochemi- most mammalian cell cultures, and mammalian tumor
cal feature can be derived. This has distinct implications cells often undergo endoreduplication (failure to
for mathematical models of cell cycle regulation (e.g., undergo mitosis and cytokinesis during a mitotic
Csikasz-Nagy et al. 2006), since the expression profiles cycle). Tumor cells and mammalian cell cultures also
are not different in a relative sense from model outputs fail to undergo binary division (cytokinesis) and thus
of state variable levels over time. create cycling cell populations with two or more nuclei
that cycle together. Additionally, as tumor cells and
DNA Content Analysis for Simple Proliferating tumorigenically transformed mammalian cell cultures
Populations evolve, multiple stable stem lines with fractional
Practical use of DNA content measurements is to cal- genomes (aneuploidy) are present in the populations.
culate the fractions of G0/G1, S, and G2/M cells. Most of these features can be dealt with using single
Because the error in the measurements are normally parameter DNA content analysis with sound logic,
distributed (given that cell fixation and staining, and a theoretical underpinning, and robustness obtained
instrument operation are within standard practice), by years of practice and widespread use. At some
a typical analysis is performed by fitting a complex point, these analyses become tortured. ModFit LT
function composed of summed normal distributions (Verity House, Topsham, ME) and MultiCycle AV
centered on the primary and secondary peaks (G0/G1 (Phoenix Flow Systems, San Diego, CA) are two soft-
and G2/M) and an S phase component that is either ware packages that support complex models.
(1) a series of Gaussian functions, (2) a series of trap-
ezoids with Gaussian “broadened” ends, or Further Reading
(3) a polynomial with “broadened” ends. Generally, There are many good reviews and chapters describing
the terms (mean and standard deviation) of the Gauss- DNA content analysis (Dean 1987; Gray et al. 1990;
ian functions are defined by the G0/G1 peak and all Watson 1991; Bagwell 1993; Rabinovitch 1993).
Cell Cycle Analysis, Flow Cytometry 237 C
a b c

Cyclin A2 content
DNA content
Cyclin A2

DNA Frequency Frequency


d e f
DNA content

Cyclin A2

Cyclin A2
Frequency Frequency DNA

g h
Cyclin A2 expression
DNA expression

Time (hours) Time (hours)

Cell Cycle Analysis, Flow Cytometry, Fig. 2 Relationship described in Frisa and Jacobberger (2009). A lengthy, more
between cell cycle-related expression and cytometry of two detailed description is described in Jacobberger et al. (2011b).
parameters. Panel (a) shows data from exponentially growing Panels (g) and (h) are the calculated cell cycle–related expres-
RKO cells that were stained for cyclins A2, B1, DNA content, sion of DNA and cyclin A2, calculated by reversing the logic in
and phospho-S10-histone H3 (Yan et al. 2004). The cell popu- Fig. 1. Panel (f) is a bivariate plot of the data in Panels (d) and
lation frequency profiles for DNA content and cyclin A2 calcu- (e). Note the similarity of the simulated data (f) and the
lated from the data shown in Panel (a) are shown in Panels (b) real data (a)
and (c). Calculation of these profiles was performed generally as

Two Parameter Cell Cycle Analysis Double-Stranded and Denatured DNA


Analysis of the cell cycle in more than one parameter is There are many nucleic acid-binding dyes that, in
almost always aimed at increasing the number of cell conjunction with RNase treatment, produce results
cycle compartments that can be monitored. The fol- like those shown in Fig. 3. In addition, there are dyes
lowing entries are a sampling of the major approaches like Hoechst 33,258 and DAPI that bind nucleic acids
for extending cell cycle analysis by adding another but only fluoresce significantly when bound to DNA.
parameter to DNA content. Metachromatic dyes like acridine orange fluoresce at
C 238 Cell Cycle Analysis, Flow Cytometry

b
Cell count

Cell count
DNA content DNA content

Cell Cycle Analysis, Flow Cytometry, Fig. 3 Common DNA function (black line), extending that line to some predefined
content analysis. The data are from a human T lymphoma cell point under the Gaussian distributed data in either direction
line (Molt4). (a): The model is composed of three components, (here, the means of each Gaussian component), then “broaden-
two Gaussian functions (red) and a “broadened” polynomial ing” or “tailing” the curve with partial Gaussian functions using
(blue, hatched) that are used to fit the areas under the data that the coefficients of variation from Gaussian components (red
can be attributed to cells with 2C and 4C DNA (G1 and G2 + M) components in a) and the amplitudes at the ends of the polyno-
and for S phase (G1< S<G2). (b): Illustrates the fitting of the mial. (ModFit LT was used to fit the distribution in a; Graphpad
central S phase data (circles) to a second-order polynomial Prism was used to generate b)

one wavelength when intercalated into double- quantify DNA and RNA. Later efforts rely on DNA-
stranded nucleic acids and at another wavelength specific dyes (DAPI, Hoechst 33,258) coupled with the
when bound to single-stranded nucleic acids. This RNA binding dye, Pyronin Y. These approaches can
characteristic can be exploited by treating fixed distinguish deep G0 (non-cycling) cells from cycling
cells with RNase to remove RNA and acid to partially cells in addition to the typical G1, S, and G2/M anal-
denature residual DNA. The level of denaturability ysis of DNA content. The acridine approach requires
of DNA changes within the cell cycle, and by measur- some skill and knowledge of chemistry. Ribosomal
ing fluorescence at the two wavelengths, then plotting RNA is essentially what is measured. For further read-
total fluorescence (adding the two measurements) ing, see Darzynkiewicz et al. (1980), Darzynkiewicz
versus the fluorescence for single-stranded DNA, and Traganos (1990).
one can identify five cell cycle compartments –
G1 can be subdivided in two, G1A or G1B (early DNA and Protein Content
and late), S, G2, and M. For further reading, see Using dyes that covalently couple with primary amines
Darzynkiewicz et al. (1980), Darzynkiewicz and or dyes that preferentially bind protein, protein content
Traganos (1990). measurements can be coupled with DNA content. In
general, these approaches are not often used. Protein
DNA and RNA Content content correlates with RNA content. In principal, the
Early protocols used metachromatic dyes like acridine approach should be useful for studying imbalanced
orange which fluoresces green when intercalated into growth (an imbalance between the mitotic cycle and
double-stranded nucleic acids and red when bound to the process of synthesizing approximately twice the
single-stranded nucleic acids or other polyanions. cell’s mass). For further information, see Crissman and
Because DNA is double stranded and cellular RNA is Steinkamp (1987), Darzynkiewicz and Traganos
largely single stranded, acridine orange can be used to (1990).
Cell Cycle Analysis, Flow Cytometry 239 C
DNA Content and BrdU pulse (the single cell pulse as viewed on an oscillo-
The fluorescence from the DNA binding dyes, Hoechst scope) provides an ability to eliminate aggregates from
33,342 or 33,258, is quenched when the dye is bound to the assay, and measurement of excitation light scatter
▶ 5-Bromo-2-deoxyuridine (BrdU) substituted DNA. provides a discriminator based on cell size and granu-
By coupling this dye with a dye that does not quench larity or surface geometric complexity. However, these
and fluoresces at different wavelengths, this feature are largely utilitarian. Conversely, measuring epitopes,
has been exploited elegantly by two laboratories to the expression profiles of proteins or protein modifica- C
measure the phase fractions of successive cell cycles. tions that oscillate within the cell cycle and that oscil-
For further reading, see Poot et al. (1994). late out of phase with each other provides an analytical
paradigm for creating cell states that can be used to
DNA Content and Immunofluorescence parse, in a very fine manner, the effects of various
Immunofluorescence coupled with DNA content treatments on the cell cycle, e.g., drug effects.
cytometry opens up a very broad ability to probe biol-
ogy related to the cell cycle. Immunofluorescence DNA, Cyclin A2, and Phospho-S10-Histone H3 (pH3)
probes can home in on a population of interest in Many epitopes that report the levels of proteins or
complex mixtures, e.g., markers for keratin can isolate other parent molecules can be coupled to provide addi-
the epithelial component in a solid tumor (Glogovac tional sub-compartmentalization of the cell cycle or
et al. 1996). Other cell processes can be queried with other information. The triad of DNA, cyclin A2, and
respect to the cell cycle, e.g., activation of DNA repair pH3 deserves special mention because their joining
pathways using the gH2Ax epitope (Tanaka et al. into one assay provides the ability to deal with
2007). Below, two common immunofluorescence endoreduplication/binucleate cycles (see Limitations,
approaches are described. above). Figures 4 and 5 show the principal features of
this analysis. Figure 5 shows the capture of the primary
DNA Content and BrdU BrdU-substituted DNA can (stem line) cycle (2C ! 4C). Figure 5a shows
be detected by antibodies. This is commonly used to a “region,” R2, set around mitotic cells. Figure 5b is
either prove that the S phase cells detected by DNA “NOT gated” on R2 and a region was set around the
content analysis were synthesizing DNA at the time of 2C ! 4C interphase cells (R3). The data of both 5A
sampling (other possibilities are that the cells were and 5B have been gated to include singlet cells and
dead, dying, or quiescent) or to determine the average exclude aggregates as in Fig. 4. Figure 5c shows all the
phase times by pulse-chase or continuous labeling events including singlet interphase cells (green), singlet
experiments. For further reading, see Gray et al. mitotic cells (blue), and debris/dead cells/outliers and
(1990), Gray et al. (1987), Dean (1987), Terry and aggregates (red), which are not used in the analysis.
White (2001). Figure 5e is a look at the discarded data. The labels
2X, 3X, and 4X point to G1 aggregates (containing 2, 3,
DNA and a Mitotic Marker Many proteins are spe- or 4 cells) together with endoreduplicated/multinucleate
cifically phosphorylated or phosphorylated more often cycling cells for which each event is a single cell.
when cells enter mitosis. Phospho-specific antibodies Finally, Fig. 5c shows cyclin A2 versus pH3 for the
can be raised to these epitopes. These antibodies cou- 2C ! 4C stem line. Regions are contiguously set
ple well with DNA content to provide G0/G1, S, G2, around clusters defined either by changes in cell fre-
and M analysis. Figure 4 shows typical data. For a short quency or by changes in the “direction” of the data
but comprehensive review, see Darzynkiewicz (2008). (arrows). The arrows show “movement” of an ideal
For methods, see Juan et al. (2001). cell through the data space. Div is the interface where
cell division occurs. Figure 5e is particularly relevant to
Multiparametric Cell Cycle Analysis Systems Biology, since it is easy to see that the expres-
Almost all cytometry assays of the cell cycle are sion pattern of pH3 and cyclin A2 form a unidirectional
multiparametric. Commonly, even for DNA content closed loop, and this renders the ability to extract
alone, measuring the pulse height and the integrated programmed expression profiles of these and other
C 240 Cell Cycle Analysis, Flow Cytometry

a 2,000 b
M
104
DAPI (Peak height) (× 100)

1,500
Single
cells

pH3-A488
1,000 103

500
102

0
0 500 1,000 1,500 2,000 2,500 0 500 1,000 1,500 2,000 2,500
DAPI (Integral) (× 100) DAPI (Integral) (× 100)

c d 3,000

1,500 2,500

2,000
Number

Interphase
Number

1,000
1,500

1,000
500
500
M
0 0

102 103 104 400 600 800 1,000 1,200


pH3-A488 DAPI (Integral) (× 100)

Cell Cycle Analysis, Flow Cytometry, Fig. 4 DNA content are colored blue. Panel (c) shows the distribution of pH3 for
analysis that differentiates mitotic cells. Exponentially growing the 2 C ! 4 C interphase (green line) and M cells (blue line).
HeLa cells were fixed and stained for DNA content, cyclin A2, Quantification of the fractions of 2 C ! 4 C interphase and
and histone H3 phosphorylated at serine 10 (pH3). Panel (a) is M cells can be done from this plot. Panel (d) shows the DNA
a plot of the DNA signal trace* peak height versus the integrated content for all of the 2 C ! 4 C cells (yellow line), the 2 C ! 4 C
area under the peak. This plot is used to reduce the fraction of interphase cells (green line), and the 2 C ! 4 C M cells (blue
aggregate cells in the analysis. See Rabinovitch (1993) for line). Both plots in panels (c) and (d) were obtained by using the
a discussion of the logic, theory, and practical considerations gate: [(Single cells AND R3) OR M]. (R3 is a region on cyclin
supporting this method. Singlet events, falling within the region A2 versus DNA, not shown in this figure, but shown in Fig. 5).
labeled “Single cells,” are colored green. Debris (events with Quantification of the fractions of 2 C ! 4 C G1, S, and G2 are
less than a G1 level of fluorescence) and aggregate events are performed on a plot of the green line in Panel (d), using the
colored red. Mitotic cells are selected in Panel (b) by selecting methods illustrated in Fig. 3.*the digitized electrical measure-
the cells within the cluster with 4 C DNA content and high ment as a function of time as a cell or particle moves through the
expression of pH3 (rectangle label “M”). The mitotic events laser beam

state variables. By calculating the mean levels of these Cell Cycle Analysis and Extension of the Logic.
or other parameters and the frequency of cells within For further reading, see Jacobberger et al. (2008), Soni
each region, the programmed expression profile can be et al. (2008), Avva et al. (2011), Jacobberger et al.
obtained by the principles outlined in sections Basis for (2011a, b).
Cell Cycle Analysis, Flow Cytometry 241 C
a b c
R3
R2 104
104 104

cyc A2-PE
pH3-A488

pH3-A488
Div
103 103 C
103

0
102
102
500 1,000 1,500 2,000 500 1,000 1,500 2,000 –103 0 103 104
DAPI (Integral) (× 100) DAPI (Integral) (× 100) cyc A2-PE
d e
105 105

104 104 Outliers


cyc A2-PE

cyc A2-PE

103 103 4X
Debris 3X
0 2X
0
Cell death ?
–103 –103
0 500 1,000 1,500 2,000 2,500 0 500 1,000 1,500 2,000 2,500
DAPI (Integral) (× 100) DAPI (Integral) (× 100)

Cell Cycle Analysis, Flow Cytometry, Fig. 5 Complex cell entire 2 C ! 4 C cycle starting from a state that is equivalent to
cycle analysis. The data file is the same as in Fig. 4. Panel (a) “birth/early G1” and one equivalent to the end of the cell cycle.
represents the same view as Panel (b) in Fig. 4, except that these These states are separated by “Div.” The arrows show the uni-
data have been gated on the “Single cells” region. Panel (b) directional movement of cells through the state space defined by
shows cyclin A2 versus DNA content, Boolean gated on “Single DNA content, cyclin A2, and pH3. Panels (d) and (e) are
cells” AND NOT “M,” which equals 2 C ! 4 C and 4 C ! 8 C identical plots of cyclin A2 versus DNA content, showing the
interphases. The gate, (“Single cells” AND R3) OR R2, 2 C ! 4 C interphase population (green dots) and mitotic cells
recombines interphase with mitosis, including mitotic cells that (blue dots) in Panel (d) and the events that are not used with these
are excluded in the singlet region (arrow, Fig. 4a). This gate has data removed from the plot (Panel e)
been applied to Panel (c), which provides a bivariate view of the

References Csikasz-Nagy A, Battogtokh D et al (2006) Analysis of a generic


model of eukaryotic cell-cycle regulation. Biophys J
Avva J, Weis MC, Soebiyanto RP, Jacobberger JW, Sreenath SN 90(12):4361–4379
(2011) CytoSys: a tool for extracting cell-cycle-related Darzynkiewicz Z (2008) There’s more than one way to skin
expression dynamics from static data. Methods Mol Biol a cat: yet another way to assess mitotic index by cytometry.
717:171–193 Cytom A 73(5):386–387
Bagwell CB (1993) Theoretical aspects of flow cytometry data Darzynkiewicz Z, Traganos F (1990) Multiparameter
analysis. In: Bauer KD, Duque RE, Shankey TV (eds) Clin- flow cytometry in studies of the cell cycle. In:
ical flow cytometry: principles and application. Williams & Melamed MR, Lindmo T, Mendelsohn ML (eds)
Wilkins, Baltimore, p 635 Flow cytometry and cell sorting. Wiley-Liss, New York,
Crissman HA, Steinkamp JA (1987) Multivariate cell analysis p 824
(Techniques for correlated measurements of DNA and other Darzynkiewicz Z, Traganos F et al (1980) New cell cycle com-
cellular constituents. In: Gray JW, Darzynkiewicz Z (eds) partments identified by multiparameter flow cytometry.
Techniques in cell cycle analysis. Humana Press, Clifton, p 407 Cytometry 1(2):98–108
C 242 Cell Cycle Analysis, Live-Cell Imaging

Dean PN (1987) Data analysis in cell kinetics research. In: Gray


JW, Darzynkiewicz Z (eds) Techniques in cell cycle analysis. Cell Cycle Analysis, Live-Cell Imaging
Humana Press, Clifton, p 407
DiSalvo CV, Zhang D et al (1995) Regulation of NIH-3 T3 cell
G1 phase transit by serum during exponential growth. Cell Andreas Doncic and Jan M. Skotheim
Prolif 28(9):511–524 Department of Biology, Stanford University School of
Frisa PS, Jacobberger JW (2009) Cell cycle-related cyclin b1 Medicine, Stanford, CA, USA
quantification. PLoS One 4:e7064
Glogovac JK, Porter PL et al (1996) Cytokeratin labeling of
breast cancer cells extracted from paraffin-embedded tissue
for bivariate flow cytometric analysis. Cytometry 24(3): Synonyms
260–267
Gray JW, Dolbeare F et al (1987) Flow cytokinetics. In: Gray
JW, Darzynkiewicz Z (eds) Techniques in cell cycle analysis. Single-cell time-lapse microscopy; Time-lapse
Humana Press, Clifton, p 407 microscopy
Gray JW, Dolbeare F et al (1990) Quantitative cell-cycle
analysis. In: Melamed MR, Lindmo T, Mendelsohn ML
(eds) Flow cytometry and cell sorting. Wiley-Liss, New
York, p 824 Definition
Jacobberger JW, Frisa PS et al (2008) A new biomarker for
mitotic cells. Cytom A 73(1):5–15 Time-lapse microscopy is a form of live-cell imaging
Jacobberger JW, Sramkoski RM et al (2011a) (▶ Cell Cycle Analysis, Live-Cell Imaging) where liv-
Multiparameter cell cycle analysis. Methods Mol Biol
699:229–249 ing cells are grown and monitored over time. Single-
Jacobberger JW, Sramkoski RM et al (2011b) Analysis of cell cell analysis of the cell cycle has been used to study the
cycle cytometry data genetic networks regulating ▶ cell cycle transitions.
Juan G, Traganos F, Darzynkiewicz Z (2001) Methods to iden- This technique is able to leverage variability in indi-
tify mitotic cells by flow cytometry. Methods Cell Biol
63:343–354 vidual cells to uncover regulatory features of the
Mitchison JM (1971) The biology of the cell cycle. Cambridge underlying molecular network. In addition, the long
University Press, Cambridge durations of time-lapse experiments can be used to
Poot M, Hoehn H et al (1994) Cell-cycle analysis using contin- minimize the influence of stress from sample prepara-
uous bromodeoxyuridine labeling and Hoechst 33358-
ethidium bromide bivariate flow cytometry. Methods Cell tion. However, care must be taken regarding illumina-
Biol 41:327–340 tion levels and environmental conditions so that
Rabinovitch PS (1993) Practical considerations for DNA photodamage or a variable environment does not per-
content and cell cycle analysis. In: Bauer KD, Duque RE, turb the cell cycle. Environmental conditions may be
Shankey TV (eds) Clinical flow cytometry: principles
and application. Williams & Wilkins, Baltimore, accurately controlled using microfluidic devices per-
pp 117–142 mitting imaging. In summary, the confluence of
Soni DV, Sramkoski RM et al (2008) Cyclin B1 is rate improved fluorescent proteins and ever-increasing
limiting but not essential for mitotic entry and computational speed and memory has led to new age
progression in mammalian somatic cells. Cell Cycle
7(9):1285–1300 in live-cell imaging.
Tanaka T, Huang X et al (2007) Cytometry of ATM activation
and histone H2AX phosphorylation to estimate extent of
DNA damage induced by exogenous agents. Cytom
A 71(9):648–661 Characteristics
Terry NH, White RA (2001) Cell cycle kinetics estimated by
analysis of bromodeoxyuridine incorporation. Methods Cell Requirements: Stable Conditions
Biol 63:355–374 Live-cell microscopy requires cells to be kept alive
Walker PMB (1954) The mitotic index and interphase processes.
J Exp Biol 31(1):8–15 during the course of the experiment. At a minimum,
Watson JV (1991) Introduction to flow cytometry. Cambridge this necessitates reliable temperature control and con-
University Press, Cambridge sistent nutrient supply. For metazoan cells, such as
Yan T, Desai AB et al (2004) CHK1 and CHK2 are differentially mammalian tissue culture cell lines frequently used
involved in mismatch repair-mediated 6-thioguanine-
induced cell cycle checkpoint responses. Mol Cancer Ther in biomedical research, care must also be taken to
3(9):1147–1157 control the CO2-concentration. In addition, growth
Cell Cycle Analysis, Live-Cell Imaging 243 C
and proliferation of mammalian cells depends on their Thus, either software or hardware-based auto-focusing
cell density so studies of cell cycle kinetics are neces- routines must be used to keep the sample in focus
sarily affected. Thus, cell cycle researchers using through the experiment. We recommend hardware-
mammalian systems should carefully control for cell based focusing (e.g., Zeiss Definite Focus; Nikon
density. Perfect Focus), which is rapid and accurate.

Requirements: Conditions for Optimal Imaging Requirements: Environmental Control C


It is helpful for cells to remain in place for imaging. The integration of live-cell imaging with microfluidic
Steady temperature control will remove a certain devices (flowcell) capable of accurately controlling
amount of drift in the z-direction. Mammalian cells and rapidly changing the extracellular environment
will adhere to the hard surfaces at the bottom of tissue will prove increasingly useful. Rapid change in the
culture dishes, some of which have glass slide bottoms extracellular media can be used to analyze responses
for imaging purposes. In addition, hard dishes or soft to environmental change (e.g., adding mating phero-
hydrogels can be coated with fibronectin or collagen to mone to budding yeast as in Fig. 2) and to exogenously
promote adherence and proliferation. For many mam- control expression of synthetic constructs. For exam-
malian cell applications, it is preferable to use rela- ple, in budding yeast, the MET3 promoter can be
tively low magnification (4, 10 or 20) because the placed ahead of any gene of interest so that expression
larger depth of focus of low magnification dry lenses occurs only in media lacking methionine. It takes
makes it easier to keep the cells in focus over long 5–10 min to activate gene expression upon methionine
times. Unless the dynamics of small cellular structures removal, while the addition of methionine to the
need to be resolved, it is recommended to use low medium shuts off expression in approximately 5 min
magnification, which can be easily used to quantify (Charvin et al. 2008).
fluorescence in either the whole cell or the nucleus. Even though a variety of microfluidic devices exist,
To constrain yeast cells, they are normally either handling different types of analysis (large/small-scale)
sandwiched between two surfaces in the imaging or different cell types (yeast, mammalian, etc.), the
chamber (agar/flowcell) or physically attached to the basic principle behind them remains the same
cover slip. Attaching yeast with the lectin protein con- (Fig. 1b): Cells are trapped in a chamber where the
canavalin A (conA) might work for some applications influx of media and temperature is precisely con-
but it is not recommended for long-time imaging since trolled. The chamber is located on top of the micro-
new buds either disappear or grow out of the imaging scope objective for imaging. The commercially
plane. If only one condition/medium is going to be available yeast flowcell from Cellasic is easy to use
used, yeast can be forced to grow in 2D by and functions well for low throughput experiments
sandwiching the cells between the cover glass and (www.cellasic.com). The high throughput system
a slab of ~1.5% low-melt agar mixed with the growth from the Hansen group is currently the state of the art
medium (Fig. 1a). For yeast imaging, we recommend for yeast work (Taylor et al. 2009). A similar platform
63 high NA oil-immersion lenses. These lenses bal- for mammalian cell imaging from the Quake lab is also
ance the need for magnification with the need to recommended (Gomez-Sjoberg et al. 2007).
collect as much light as possible and 100 lenses Regardless of the type of experimental setting it is
will produce substantially darker images, which is important not to overexpose the cells while using
a drawback for photosensitive yeasts. A drawback of ▶ fluorescence microscopy. Overexposure may lead
high magnification oil-immersion lenses is the thin to cell cycle arrest and/or cell death. It is recommended
focal plane, which requires careful focusing in the z- to control for overexposure by measuring cell cycle
direction through the course of the experiment. In kinetics (rate of cell division) with and without fluo-
particular, the focusing problem is exacerbated when rescence. Particular care must be taken when multiple
multiple x-y positions are used because moving the images of a z-section are desired. Budding and fission
lens will produce a high pressure in the lubricating yeasts are more sensitive to fluorescence imaging than
oil film tending to push the slide out of focus. human cell lines.
C 244 Cell Cycle Analysis, Live-Cell Imaging

a
1. Add liquid agar (~60 °C) 2. Add another (thinner) 3. Remove top cover
cover glass and wait for glass and add cells
the agar to cool

Cover glass

4. Cut agar 5. Flip the agar

To the microscope

b Controlled influx of media


Cell trapped in confined
volume (flowchamber) Waste

Pump

Imaging

Cell Cycle Analysis, Live-Cell Imaging, Fig. 1 Two ways to Cut out the region of agar containing the cells and flip it 180
prepare yeast cells for imaging: (a). Add 3–4 ml of 1.5% low- onto the imaging cover glass/dish used for microscope imaging.
melt agar, heated to 60 C, to a cover glass. Leave it to cool (b). Schematic of a microfluidic device. Cells are grown in
a few minutes and then sandwich the agar with a cover slip and a flowcell where media influx and temperature are controlled
wait until the agar cools completely. Once the agar is cool, externally
remove the top cover glass and add the cells on top of the agar.

Data Acquisition 2005). The best ▶ fluorescent markers used for cell
There are two general ways to monitor cells through cycle analysis have the following qualities:
time: Either track the individual cells through time or • Bright enough to yield a good signal at low intensity
sample large numbers of cells without replacement illumination that does not induce photodamage.
(population-based measurements) after cell cycle syn- Enhanced fast folding GFP, or Venus (yellow) are
chronization. On one hand, single-cell analysis does both excellent; mCherry (red) is much dimmer and
not require perturbative synchronization methods, has slower maturation kinetics.
reveals sharper transitions, and can be used to measure • Clearly mark one/several cell cycle events. In gen-
cell-to-cell variation due to stochastic fluctuations. eral, measurements of changes in protein localiza-
On the other hand, population-based methods tion or degradation are preferable to measurements
(e.g., flow cytometry, coulter counter) can be used to of changing concentration because even fast matur-
rapidly acquire large numbers of measurements, and ing GFP proteins fold in ~10–15 min.
can be used in conjunction with standard biochemical • Do not interfere with the natural function of the
techniques. fluorescently labeled protein.
Image acquisition is normally done using ▶ fluo- Commonly used markers for some cell cycle tran-
rescence microscopy in combination with an addi- sitions can be found in Table 1:
tional imaging technique (e.g., bright field, phase A novel technique for live-cell imaging developed
contrast, or differential interference contrast) to deter- by the Pines group relies on a FRET (fluorescence
mine the location of the cells. Depending on the cellu- resonance energy transfer) sensor to measure
lar process studied, a variety of ▶ fluorescent markers CDK-Cyclin B activity in single live cells (Gavet and
may be used. Recent technical advances by the Tsien Pines 2010) Similarly, the Kapoor lab used FRET
group have greatly expanded the quality and color probes to measure spatial and temporal gradient
spectrum of available fluorophores (Shaner et al. of aurora B activity during mitosis in live cells
Cell Cycle Analysis, Live-Cell Imaging 245 C
a Htb2-mCherry Htb2-mCherry Htb2-mCherry Htb2-mCherry Htb2-mCherry Htb2-mCherry
Whi5-GFP Whi5-GFP Whi5-GFP Whi5-GFP Whi5-GFP Whi5-GFP

t = 0 min t = 45 min t = 90 min t = 135 min t = 180 min t = 360 min


C
SCD SCD SCD SCD SCD SCD +αfactor

b c Cln2 induced
60 SCD 60 SCD
Nuclear Whi5 [A.U.]

Nuclear Whi5 [A.U.]


40 40

20 20

SCD +αfactor
0 0
0 60 120 180 240 300 –60 0 60 120 150 210
Time [min] Time [min]

Cell Cycle Analysis, Live-Cell Imaging, Fig. 2 (a). Compos- cell grown in SCD. Whi5-GFP enters the nucleus at anaphase
ite phase and fluorescence images of budding yeast cells grown and exits the nucleus at the point of commitment to cell division
in a flowcell with synthetic complete glucose medium (SCD). in mid-G1. (c). Example nuclear Whi5-GFP time course of a cell
This strain contains integrated Htb2-mCherry (a nuclear marker) arrested in a-factor (added at time ¼ 0) and then forced out
and Whi5-GFP fusion proteins (marking early G1, see table1). of arrest using the inducible MET3-promoter to express the
Note that the cells arrest after addition of a-factor (mating G1-cyclin CLN2 (see text)
pheromone). (b). Example nuclear Whi5-GFP time series of a

(Fuller et al. 2008). The ability to measure protein Although standard algorithms and commercial
activity, rather than protein concentration, in individ- products exist for image analysis, each application
ual cells is a great advantage because of the various tends to be unique enough to require customization.
inhibiting and activating post-translational modifica- As the image processing needs of the scientific com-
tions affecting function. Thus, we expect FRET-based munity increase in the coming decade, we expect com-
techniques to become increasingly important. mercial products to improve and be more widely
applied.
Analysis
Image Analysis An example of Cell Cycle Analysis Using Live-Cell
To obtain single-cell data it is necessary to segment Imaging
(finding the cell outline) and track individual cells over A striking example of the potential in single-cell cell
time. Segmentation is normally done on the entire cell or cycle analysis can be seen in the work by Di Talia et al.
on some subcellular compartment of special interest like to measure size control in budding yeast (Di Talia et al.
the nucleus (Sigal et al. 2006). In general, segmentation 2007). Yeast exhibit size homeostasis: Small cells
starts by manipulating the original image to enhance the must grow more than larger cells through their cell
difference between “cell” and “non-cell” regions. The cycle (▶ Cell Cycle, Cell Size Regulation). To exam-
exact nature of this step depends on the imaging and the ine this outstanding problem, Di Talia and colleagues
markers used. Next, a threshold is applied to distinguish used yeast labeled with a Myo1-GFP fusion protein
cells from non-cells. After the cells are identified, (used to divide the cell cycle into G1 and S/G2/M
parameters useful for cell cycle data analysis such as intervals) and ACT1pr-DsRed, a red fluorescent
size, average fluorescence, and timing of signal appear- protein under the control of the strong and constitu-
ance/disappearance can be extracted. tive ACT1-promoter (used to measure cell size).
C 246 Cell Cycle Analysis, Live-Cell Imaging

Cell Cycle Analysis, Live-Cell Imaging, Table 1 Some commonly used cell cycle markers
Organism/
Cell cycle event/phase Cellular process/localization Gene(s) cell type Ref
Cytokinesis/G1 Disappearance of the septin ring Cdc10, Myo1 Budding (Di Talia et al.
initiation yeast 2007)
Start transition Whi5 export from the nucleus Whi5 Budding (Di Talia et al.
(mid G1) yeast 2007)
S-phase initiation Reappearance of the septin ring Cdc10, Myo1 Budding (Di Talia et al.
yeast 2007)
Metaphase to anaphase Nuclear separation (also gives Htb2 or any other nuclear marker Budding Many
transition lineage information) yeast
Cell cycle transitions, Clb2 degradation/Cdc14 Clb2, Cdc14 Budding (Drapkin et al.
mitotic exit localization yeast 2009)
Mitosis Anaphase spindle status Tub1 Budding (Drapkin et al.
yeast 2009)
G1 & G2 Sensor nuclear Proliferating cell nuclear antigen Human cell (Hahn et al. 2009)
(PCNA)-based biosensor line
S Sensor nuclear at replication foci PCNA-based biosensor Human cell (Hahn et al. 2009)
line
M Sensor located throughout the cell PCNA-based biosensor Human cell (Hahn et al. 2009)
line
End of M-phase to Sensor nuclear Truncated human DNA helicase Human cell (Hahn et al. 2009)
G1/S B (HDHB)-based biosensor line
S & G2, cell cycle Sensor cytoplasmic HDHB-based biosensor Human cell (Hahn et al. 2009)
transitions, G2/M line
M Sensor located throughout the cell HDHB-based biosensor Human cell (Hahn et al. 2009)
line
G1 to S & G1 Licensing Fragment of Cdt1 bound to mKO Metazoan (Sakaue-Sawano
(red/orange) cells et al. 2008)
S, G2 & M Licensing inhibition Fragment of geminin bound to Metazoan (Sakaue-Sawano
mAG (green) cells et al. 2008)

Using these markers, Di Talia and colleagues were ▶ Fluorescent Markers


able to measure both cell sizes and the duration of ▶ Live Cell Imaging
G1-periods for single cells. Because individual cells ▶ Mitosis
are of various sizes at birth and budding, correlation
between cell size and interval duration is possible.
Thus, single-cell variability is leveraged to infer regu- References
latory properties. Di Talia et al. found that only smaller
Charvin G, Cross FR, Siggia ED (2008) A microfluidic device
daughter cells exhibited size control (an inverse corre-
for temporally controlled gene expression and long-term
lation between size at birth and G1 duration). fluorescent imaging in unperturbed dividing yeast cells.
PLoS ONE 3:e1468
Di Talia S, Skotheim JM, Bean JM, Siggia ED, Cross FR
(2007) The effects of molecular noise and size control
Cross-References on variability in the budding yeast cell cycle. Nature
448:947–951
▶ Cell Cycle Data Analysis Drapkin BJ, Lu Y, Procko AL, Timney BL, Cross FR
▶ Cell Cycle Transitions, G2/M (2009) Analysis of the mitotic exit control system using
locked levels of stable mitotic cyclin. Mol Syst Biol 5:328
▶ Cell Cycle Transitions, Mitotic Exit
Fuller BG, Lampson MA, Foley EA, Rosasco-Nitcher S, Le KV,
▶ Cell Cycle, Cell Size Regulation Tobelmann P, Brautigan DL, Stukenberg PT, Kapoor TM
▶ Fluorescence Microscopy (2008) Midzone activation of aurora B in anaphase
Cell Cycle Analysis, Systematic Gene Overexpression 247 C
produces an intracellular phosphorylation gradient. Nature (▶ Cell Cycle Analysis, Systematic Gene
453:1132–1136 Overexpression) systematically, novel cell cycle regu-
Gavet O, Pines J (2010) Progressive activation of CyclinB1-
Cdk1 coordinates entry to mitosis. Dev Cell 18:533–543 lators that could not be isolated by loss-of-function
Gomez-Sjoberg R, Leyrat AA, Pirone DM, Chen CS, Quake SR experiments can be identified. There is also an
(2007) Versatile, fully automated, microfluidic cell culture approach to analyze ▶ robustness of the cell cycle by
system. Anal Chem 79:8557–8563 measuring limits of gene overexpression.
Hahn AT, Jones JT, Meyer T (2009) Quantitative analysis of cell
cycle phase durations and PC12 differentiation using fluo- C
rescent biosensors. Cell Cycle 8:1044–1052
Sakaue-Sawano A, Kurokawa H, Morimura T, Hanyu A, Characteristics
Hama H, Osawa H, Kashiwagi S, Fukami K, Miyata T,
Miyoshi H, Imamura T, Ogawa M, Masai H, Miyawaki A
(2008) Visualizing spatiotemporal dynamics of multicellular Introduction
cell-cycle progression. Cell 132:487–498 Overexpression is artificially performed by ▶ pro-
Shaner NC, Steinbach PA, Tsien RY (2005) A guide to choosing moter-swapping or copy number increase of the target
fluorescent proteins. Nat Meth 2:905–909 gene. In model organisms in which these genetic
Sigal A, Milo R, Cohen A, Geva-Zatorsky N, Klein Y, Alaluf I,
Swerdlin N, Perzov N, Danon T, Liron Y, Raveh T, Carpen- manipulations are available, the cell cycle analyses
ter AE, Lahav G, Alon U (2006) Dynamic proteomics in by the systematic gene overexpression have been
individual human cells uncovers widespread cell-cycle performed. Genes produce cell cycle defects upon
dependence of nuclear proteins. Nat Meth 3:525–531 overexpression have been isolated by the phenotype
Taylor RJ, Falconnet D, Niemisto A, Ramsey SA, Prinz S,
Shmulevich I, Galitski T, Hansen CL (2009) Dynamic anal- of transformed cells or individual organisms with
ysis of MAPK signaling using a high-throughput cDNAs or genomic fragments expressed under the
microfluidic single-cell imaging platform. Proc Natl Acad control of strong inducible promoters. In the post-
Sci USA 106:3758–3763 genomic era, DNA libraries containing each of every
protein-coding sequence on the genome have been
constructed. Using those libraries, genuine whole
genome overexpression-based screenings have been
Cell Cycle Analysis, Phase Fraction performed. Currently the budding yeast is the only
Analysis model organism in which these analyses have been
done.
▶ Cell cycle analysis, flow cytometry In addition to the screening, there is an approach to
study the robustness of the cell cycle by the systematic
measurement of limits of overexpression of the cell
cycle–related genes. The limit data was also used to
Cell Cycle Analysis, Systematic Gene evaluate and refine an integrative cell cycle model. The
Overexpression concrete examples of the cell cycle analysis of individ-
ual model organisms using systematic gene
Hisao Moriya overexpression are introduced below.
Okayama University Research Core for
Interdisciplinary Sciences, Kita-ku, Okayama, Japan Budding Yeast (Saccharomyces cerevisiae)
Several extensive screenings to identify genes that
cause the cell cycle defect upon overexpression have
Definition been performed in the budding yeast (Stevenson et al.
2001; Sopko et al. 2006; Niu et al. 2008). In all these
The ▶ cell cycle is achieved by the regulators that studies, GAL1 promoter by which the target gene is
are expressed in appropriate amounts, in appropriate highly expressed in galactose medium was used, and
cell cycle periods. To elucidate the regulatory ▶ flow cytometry was used to detect the cell cycle
mechanism of the cell cycle, experiments in which defects. Through these analyses, about 250 genes
the regulators are artificially overexpressed and the were identified. Because the set of genes includes
effects are observed have often been performed. By cyclin genes that play central roles in the cell cycle,
performing this gene overexpression experiment the set should include the genes directly involved in the
C 248 Cell Cycle Analysis, Systematic Gene Overexpression

cell cycle regulation. On the other hand, many genes There are thus few examples of the systematic screen-
those are known to be involved in other processes than ings of the genes that cause the cell cycle defects upon
the cell cycle were also identified, though for all overexpression. In the fruit fly, there is a system (GAL4
genetic experiments it is difficult to circumvent driving systems) by which the target gene can be
obtaining indirect factors. overexpressed in organ specific manner. Using 2,300
Along with the extension of our understanding of the fly lines in which each target locus (gene) is
cell cycle mechanisms, an integrative mathematical overexpressed, 32 genes were isolated by the “small
model of the cell cycle has been constructed (Chen eye phenotype” that is an indicator of the cell cycle
et al. 2004). Robustness against parameter fluctuations defects (Tseng and Hariharan 2002). It is estimated
is useful to evaluate the plausibility of biological models that only less than 10% genes are studied in this analysis.
(Morohashi et al. 2002). By measuring limits of gene
overexpression systematically, promoter of the cell The Future of the Cell Cycle Analysis Using
cycle of real cell against (gene expression) parameter Systematic Gene Overexpression
fluctuations can be measured. genetic Tug-Of-War Genetic screening using overexpression enables us to
(gTOW) method is used for this purpose. In gTOW, isolate genes that do not show any cell cycle defect
instead of promoter swapping, the target gene with its upon gene disruption. For example, cyclin genes that
native promoter is cloned into a plasmid of which the make gene family in the budding yeast does not show
copy number increases in the leucine-limiting condition. significant phenotype upon disruption, but they show
If overexpression of the target gene causes the cell cycle strong cell cycle defect upon gene overexpression.
defect, the copy number of the plasmid (and the gene) However, while the gene knockout experiment gives
per cell must be less than the overexpression limit. the unified result because the target gene is completely
Using gTOW, limits of overexpression of about 30 cell inactivated, gene overexpression experiments usually
cycle regulator genes were measured, and the data was do not give unified results because there are many ways
used to evaluate and refine the integrative mathematical to overexpress the target gene (i.e., promoter swapping
model (Moriya et al. 2006; Kaizu et al. 2010). using different promoters, copy number increase),
which lead to the different expression levels. In fact,
Fission Yeast (Schizosaccharomyces pombe) there are surprisingly few overlaps among the gene sets
The fission yeast is another model organism in which isolated in the three budding yeast genome-wide
the regulatory system of the cell cycle has been exten- screening described above (Niu et al. 2008).
sively studied. In this yeast, the target gene can be To obtain the unified result in the overexpression of
artificially overexpressed using nmt1 promoter that is every gene, we need to systematically measure the
induced in thiamine-limiting condition. Although the overexpression limit that causes certain cell cycle
induction of nmt1 promoter is relatively slower defect. An experiment to continuously observe the
(18 h) than the budding yeast GAL1 promoter cell cycle defect upon gradual increase of the target
(1 h), defects of the fission yeast cell cycle can be gene expression will be a one suitable way. Such
easily observed with the elongated morphology (cdc experiment can be done in the model organisms such
phenotype) under the microscope. Until now, 21 genes as the budding yeast (e.g., Charvin et al. 2008). In
were isolated by the systematic screening with cdc higher eukaryotes, the similar technology needs to be
phenotype of the transformants of a cDNA library developed using cultured cell first.
(Tallada et al. 2002). Technically, genuine exhaustive
genomic screening and gTOW can be performed in the
fission yeast as well. Cross-References

Higher Eukaryotes ▶ Cell Cycle


In higher eukaryotes (multicellular organisms), in addi- ▶ Cell Cycle Analysis, Flow Cytometry
tion to the difficulty in genetics manipulations, it is ▶ Promoter
a matter how defects in the cell cycle are observed. ▶ Robustness
Cell Cycle Arrest After DNA Damage 249 C
References Definition

Charvin G, Cross FR, Siggia ED (2008) A microfluidic device Cell cycle arrest after DNA damage describes the
for temporally controlled gene expression and long-term
interconnection between two complex signaling pro-
fluorescent imaging in unperturbed dividing yeast cells.
PLoS ONE 3(1):e1468 cesses – DNA damage sensing and the cell cycle – by
Chen KC, Calzone L, Csikasz-Nagy A, Cross FR, Novak B, a variety of biochemical interactions. Damage may
Tyson JJ (2004) Integrative analysis of cell cycle control in arise from various sources, including radiation, chem- C
budding yeast. Mol Biol Cell 15(8):3841–3862
ical agents, or errors during DNA synthesis or cell
Kaizu K, Moriya H, Kitano H (2010) Fragilities caused by
dosage imbalance in regulation of the budding yeast cell division. The resulting damage is sensed by
cycle. PLoS Genet 6(4):e1000919 a signaling network that halts the cell cycle by modu-
Moriya H, Shimizu-Yoshida Y, Kitano H (2006) In vivo robust- lating cyclin/Cdk activity. Cell cycle arrest can be
ness analysis of cell division cycle genes in Saccharomyces
transient to allow repair of DNA damage, or can persist
cerevisiae. PLoS Genet 2(7):e111
Morohashi M, Winn AE, Borisuk MT, Bolouri H, Doyle J, indefinitely as a senescence-like state. This essay
Kitano H (2002) Robustness as a measure of plausibility describes mechanisms of DNA damage-induced cell
in models of biochemical networks. J Theor Biol cycle arrest, their dynamics, and their effect on even-
216(1):19–30
tual cell fate. It also discusses mathematical modeling
Niu W, Li Z, Zhan W, Iyer VR, Marcotte EM (2008) Mecha-
nisms of cell cycle control revealed by a systematic and approaches used to gain insight into these processes.
quantitative overexpression screen in S. cerevisiae. PLoS
Genet 4(7):e1000120
Sopko R, Huang D, Preston N, Chua G, Papp B,
Kafadar K, Snyder M, Oliver SG, Cyert M, Hughes TR,
Characteristics
Boone C, Andrews B (2006) Mapping pathways and
phenotypes by systematic gene overexpression. Mol Cell Arrest Mechanisms: Connecting DNA Damage
21(3):319–330 Signaling to the Cell Cycle
Stevenson LF, Kennedy BK, Harlow E (2001) A large-scale
Progression through the cell cycle can be viewed as
overexpression screen in Saccharomyces cerevisiae iden-
tifies previously uncharacterized cell cycle genes. Proc Natl having two fundamental goals: the duplication of
Acad Sci USA 98(7):3946–3951 genetic information during the synthesis phase
Tallada VA, Daga RR, Palomeque C, Garzón A, Jimenez J (S phase), and the division of this material among
(2002) Genome-wide search of Schizosaccharomyces
daughter cells during mitosis (M phase) (▶ Cell
pombe genes causing overexpression-mediated cell cycle
defects. Yeast 19(13):1139–1151 Cycle of Mammalian Cells). Damage to the cell’s
Tseng AS, Hariharan IK (2002) An overexpression screen DNA interferes with both goals, leading to the possi-
in Drosophila for genes that restrict growth or cell- bility of erroneous genetic material being replicated or
cycle progression in the developing eye. Genetics
partitioned among daughters. Such errors can drive
162(1):229–243
mutagenesis, and play a central role in cancer
progression.
The first line of defense against such errors is to stop
the cell cycle when damage is sensed, preventing the
Cell Cycle Arrest After DNA Damage initiation of DNA replication or cell division. This
process, known as cell cycle arrest, must satisfy three
Jared Toettcher criteria: it must be activated quickly after damage is
University of California, San Francisco, CA, USA delivered, to prevent cell cycle progression in the pres-
ence of damage; it must remain active as long as
damage is present; and finally, if cells resume cycling
Synonyms after arrest, they must reenter the cell cycle in the same
phase from which they exited. Violation of any of these
Cell cycle arrest mechanisms; DNA damage response; conditions could lead to mutagenic errors. A group of
Extrinsic DNA damage checkpoints; p53 and the cell molecular interactions that transmit the presence of
cycle damage to halt the cell cycle is known as a cell cycle
C 250 Cell Cycle Arrest After DNA Damage

b
a ATM

Plk1
Damage DSBs (e.g. IR) SSBs; other (e.g. UV) Chk2
Wip1
γH2AX RPA Cdc25
Wee1 14-3-3σ
Sensors MRN Ku80 TopBP1

ATM DNA-PK TAOs ATR CycB p53

Effector
kinases Mcm
Chk2 p38 Chk1 Mdm2
Cdh1

TFs p53
CycA p27
DNA damage sensing
Cell cycle arrest
CycE p21 Cell cycle
DNA repair
Arrest mechanisms
Apoptosis
E2F1 Rb

CycD

Cell Cycle Arrest After DNA Damage, Fig. 1 DNA damage (black boxes). (b) Multiple mechanisms of arrest connect DNA
signaling and its connection to the cell cycle. (a) DNA damage damage to the cell cycle. Pictured here are arrest mechanisms
signaling connects damage sensors to cell fate through multiple implicated in the response to DNA double-stranded breaks. Post-
levels of effectors. The presence of damage leads to the local translational catalytic steps (solid lines), stoichiometric inhibitor
activation of sensor proteins at the damage sites. These sensors binding (dotted lines) and transcriptional activation or repression
in turn activate kinase cascades leading to growth arrest (indi- (dashed lines) for each arrest mechanism are indicated
cated by red boxes), DNA repair (green boxes) or apoptosis

arrest mechanism. Many arrest mechanisms are (▶ Cell Cycle Checkpoints). DNA damage leading
known, and can act to halt the cell cycle in G1, S, primarily to double-stranded breaks (DSBs) is sensed
G2, or M phase. by the Mre11-Nbs1-Rad50 (MRN) protein complex
binding at the site of damage. The MRN complex
The DNA Damage Response: Multiple Branches activates an effector kinase cascade including the
and Timescales phosphoinositide 3-kinase (PI3K) ataxia telangiectasia
Damage to the cell’s DNA is sensed by proteins of mutated kinase (ATM) and checkpoint kinase 2
three different classes: sensors that interact directly (Chk2). A positive feedback loop whereby ATM phos-
with damaged DNA; effectors that undergo post- phorylates histone H2AX at the site of damage leading
translational modification, allowing them to mediate to additional ATM activation ensures a strong damage
a downstream signaling response; and in some cases response even to a single DSB. A second PI3K, DNA-
transcription factors that mediate longer-term tran- PK, is also activated at the site of DSBs.
scriptional responses (Fig. 1a). Other forms of damage, including DNA adducts
The first two classes – sensors and effectors – act and single-stranded breaks (SSBs), are sensed by
quickly to halt the cell cycle, and are often considered a homologous, parallel kinase cascade. Again, DNA
together as the mediators of extrinsic cell cycle binding proteins such as RPA localize at damage sites,
checkpoints (Reinhardt and Yaffe 2009). They are and proceed to activate kinases such as ATM-related
discussed in more detail in the corresponding Essay kinase (ATR) and thousand-and-one amino acid
Cell Cycle Arrest After DNA Damage 251 C
kinase (TAO). Subsequent kinase cascades activate cycle regulators (e.g., cyclins, Cdc25 phosphatases,
checkpoint kinase 1 (Chk1) and the mitogen-activated Plk1), and overexpressing or disrupting them individ-
protein kinase p38. This is by no means an exhaustive ually to induce or abrogate arrest. They act on both G1
list of components involved in sensing and transmit- and G2 cell cycle components by a variety of biochem-
ting damage signals, and many additional regulators ical mechanisms, including stoichiometric inhibition,
affect the activity of the components described here. In enzymatic modification, and transcriptional induction/
addition, specific programs of DNA repair induced by repression of cell cycle proteins. C
each damage type depend highly on the subset of Kinase, phosphatase and protease activity provide
damage sensors and transducers activated as well as the fastest mode of regulation on the cell cycle
the current cell cycle phase (▶ DNA Repair). (Fig. 1b). These connections proceed from DNA dam-
In contrast to the sensor-effector machinery, much age effector kinases such as ATM, Chk1, and Chk2 to
of which is broadly conserved from yeast to mammals, phosphorylate various cell cycle targets. For instance,
the slower transcriptional arm of DNA damage- to signal G1 arrest, ATM can phosphorylate cyclin D
induced cell cycle arrest appears to be restricted to to target it for degradation. ATM can also act on
multicellular organisms. It plays a similar role in G2 cells by phosphorylating Plk1, preventing cyclin
species ranging from Drosophila melanogaster to B/Cdk1 activation. The dual specificity phosphatases
humans. Its central player is the stress-induced tran- Cdc25A, Cdc25B, and Cdc25C are key substrates of
scription factor p53. In mammals, the DNA damage the DNA damage effector kinases. Chk1 phosphoryla-
kinases p38, ATM, ATR, DNA-PK, Chk1, and Chk2 tion of Cdc25A leads to its degradation and G1 arrest,
can all phosphorylate p53, protecting it from homeo- while phosphorylation by Chk1/2 of Cdc25B/C leads
static degradation by ubiquitin ligases such as Mdm2. to their inhibition by 14-3-3 proteins and G2 arrest.
A variety of other post-translational modifications can In addition, stoichiometric inhibitors can play
tune p53 stability and transcriptional activity (acetyla- a crucial role in cell cycle regulation. The transcrip-
tion, methylation, SUMOylation, etc.). Elevated levels tional induction of these stoichiometric cell cycle
of modified p53 lead to both the induction and repres- inhibitors, as well as direct transcriptional repression
sion of various genes involved in the cell cycle, apo- of cyclins, constitutes a slower but potentially longer-
ptosis, and DNA repair. lived response to DNA damage (Fig. 1b). For instance,
p53 activation is itself regulated by multiple posi- p53 can induce expression of the CDK inhibitor p21
tive and negative feedback loops that serve to integrate and the Cdc25C inhibitor 14-3-3s (▶ CDK Inhibitors).
information between cell survival and stress pathways p53 also plays a role in the transcriptional repression of
or maintain low levels of p53 under unstressed condi- cyclins A and B.
tions (Harris and Levine 2005). Moreover, this feed- It should be noted that not all of these mechanisms
back regulation has consequences for p53’s temporal described in this section have been shown to be active
activation. DSB induction across a broad range of in all cell types, or in response to all stimuli. We still
mammalian contexts – from cultured fibroblasts to lack a comprehensive picture of which combinations
live mice – leads to a series of pulses of p53 activation of arrest mechanisms control an individual cell’s
on a timescale of hours (Batchelor et al. 2009). It has response to damage. However, from those few studies
been shown that oscillation depends on two negative addressing combinations of arrest mechanisms in the
feedback connections: p53 transcriptionally induces its same cells, a key motif emerges: that of fast (post-
negative regulator Mdm2 as well as the phosphatase translational) p53-independent arrest initiation acting
Wip1, which can inactivate p38, ATM, and Chk2 in parallel with slower (transcriptional) p53-dependent
(Batchelor et al. 2008). arrest maintenance. For instance, in G1 arrest induced
by ionizing radiation, cyclin D is promptly
The Molecular Logic of DNA Damage Signaling to ubiquitinated in a p53-independent fashion, followed
the Cell Cycle by the slower induction of the p53 transcriptional tar-
Many connections between DNA damage signaling get p21 (Agami and Bernards 2000). Similarly, G2
and the cell cycle have been identified. These connec- arrest is induced quickly by Cdc25C phosphorylation
tions have largely been uncovered through experi- by Chk2, followed by p53-dependent downregulation
ments identifying interacting partners of key cell of cyclins A and B (Toettcher et al. 2009).
C 252 Cell Cycle Arrest After DNA Damage

Understanding Cell Cycle Arrest via Computation challenging to compute the sensitivity of the times at
Differential equation models of the cell cycle typically which cell cycle transitions occur. If the time of cell
undergo autonomous oscillation under normal cycle transition can be associated with a local maxi-
conditions (▶ Cell Cycle Modeling, Differential Equa- mum or minimum of the ith species concentration
tion). Cell cycle arrest arises naturally from such (e.g., cyclin E/Cdk2 to mark the G1/S transition, or
a model, when some parameter or modeled species cyclin A/Cdk1 to mark S/G2), the sensitivity of this
is modified such that oscillation ceases and time t to model parameters p can be represented as
a stable steady state is reached. In contrast to the
checkpoint-centered view in which the cell cycle dt Hx f i @x Hp f i
¼ 
“checks” the status of damage signaling before com- dp ðHx f i Þ f @p ðHx f i Þ f
mitting to S or M phase, this dynamical systems-
@x
centered view holds that damage signaling modifies where @p is the standard concentration sensitivity of
the cell cycle oscillator such that it settles to a stable, state variables x with respect to model parameters
arrested state. Upon removal of the arrest stimulus, (Rand et al. 2006). This quantitative approach enables
the arrested state may either persist or oscillation may the generation of models that match a variety of exper-
resume. Notably, there is no guarantee that cycling imental observations.
continues from the same phase at which it arrested. Second, the modeler must be able to generate exper-
Modeling approaches have the advantage that cell imentally relevant predictions. Many experimental
cycle arrest mechanisms can be tested individually – or studies are based on measurements from large numbers
in combination – for their ability to promptly elicit of cells. These might be either population-averaged
arrest, maintain arrest, and resume cycling after dam- techniques such as Western blots, or population distri-
age repair. An attempt to characterize all 24 possible bution measurements such as flow cytometry (▶ Cell
arrest mechanisms in a recent model of the eukaryotic Cycle Analysis, Flow Cytometry). To afford compar-
cell cycle (Csikász-Nagy et al. 2006) identified five ison to data, populations of models must be simulated
classes of arrest states corresponding to G1, S, G2, and representing collections of cells. However, there is
M phase arrest, as well as a G1-like state with G2 DNA a subtlety associated with properly sampling
content (Toettcher et al. 2009) (Table 1). Such models populations of dividing cells: since a freely cycling
can also be fit to experimental data to determine which cell population contains twice as many newborn cells
classes of arrest mechanisms govern the response as cells that are just about to divide, the distribution of
in vivo. These experimentally driven approaches are cell ages in the population is skewed toward younger
complementary to theoretical investigations of cells. Generally, the probability of finding a cell at age
models’ oscillation and steady state behavior using t since its last division can be represented by the
techniques such as bifurcation analysis (▶ Cell Cycle relation
Model Analysis, Bifurcation Theory).
The modeler faces two challenges in developing pðtÞ ¼ ð2 ln 2Þ2t=T
quantitative models of DNA damage and the cell
cycle, discussed in this and the following paragraph. where T is the cell cycle length (Mitchison 1971).
First, it is crucial to inform a model by fitting its Thus, the modeler must sample the model’s initial
parameters to account for a number of experimental states according to this distribution. A corollary to
observations. Typical data might include the amount this observation is that the fraction of freely
time spent by freely cycling cells in each cell cycle cycling cells observed in G1, S, and G2 (e.g., by
phase, whether certain cell cycle mutants are able to flow cytometry) is not equal to the amount of
cycle, and how protein levels change during arrest. time spent by a cell in those phases, but must be
Fitting this data accurately and efficiently requires corrected accordingly.
computing the sensitivity of protein levels and the
times of cell cycle transitions from a model (▶ Cell Consequences of Cell Cycle Arrest
Cycle Models, Sensitivity Analysis). While it is Cell cycle arrest is typically considered to be
straightforward to compute the sensitivity of protein a temporary and reversible state, persisting until dam-
concentrations at fixed timepoints, it is more age is repaired, after which cycling is resumed. This is
Cell Cycle Arrest After DNA Damage 253 C
Cell Cycle Arrest After DNA Damage, Table 1 Computational analysis of individual arrest mechanisms affecting eight key cell
cycle species by stoichiometric inhibition, enzymatic inactivation, or transcriptional repression. Entries are grouped first by the arrest
state induced by the each mechanism, and second by the species inhibited
Arrest type Inhibited species Stoichiometric Enzymatic Transcriptional
G1 Cyclin D ✓ ✓ ✓
Cyclin E ✓ ✓ ✓
S Cdc25 ✓ ✓ ✓
C
Cyclin B ✓ ✓
G2 E2F ✓ ✓ ✓
Cyclin A ✓
M APCCdh1 ✓ ✓ ✓
Cdc20 ✓ ✓ ✓
G1 cyclins; 8N DNA Cyclin A ✓ ✓
Cyclin B ✓
✓ Predicted to cause cell cycle arrest

indeed in many contexts, especially for low levels of ▶ Cell Cycle Modeling, Differential Equation
damage where repair is concluded quickly. However, ▶ Cell Cycle Models, Sensitivity Analysis
two alternatives to this course of action can be ▶ Cell Cycle of Mammalian Cells
observed: adaptation, in which cells resume cycling ▶ Cell Cycle, Cancer Cell Cycle and Oncogene
even in the presence of damage; and senescence, Addiction
where cells remain arrested permanently, even after ▶ DNA Repair
damage is repaired. Adaptation plays a role in bacteria ▶ Endoreplication
and yeast, but is largely detrimental to multicellular
organisms due to the risk of mutagenesis it carries.
In contrast, senescence is the safest course of action,
References
provided nearby undamaged cells can compensate for
the loss of the damaged cell’s replicative capacity. Agami R, Bernards R (2000) Distinct initiation and maintenance
A growing body of work suggests a connection mechanisms cooperate to induce G1 cell cycle arrest in
between senescence and DNA damage (Toettcher response to DNA damage. Cell 102:55–66
Bartkova J, Rezaei N, Liontos M, Karakaidos P, Kletsas D,
et al. 2009; Bartkova et al. 2006; Di Micco et al.
Issaeva N, Vassiliou LV, Kolettas E, Niforou K,
2006). A senescence-like arrest state can result from Zoumpourlis VC, Takaoka M, Nakagawa H, Tort F,
DSB-inducing irradiation (Toettcher et al. 2009), and Fugger K, Johansson F, Sehested M, Andersen CL,
the DSB response is required for oncogene-induced Dyrskjot L, Orntoft T, Lukas J, Kittas C, Helleday T,
Halazonetis TD, Bartek J, Gorgoulis VG (2006) Oncogene-
senescence (▶ Cell Cycle, Cancer Cell Cycle and
induced senescence is part of the tumorigenesis barrier
Oncogene Addiction) (Bartkova et al. 2006; Di imposed by DNA damage checkpoints. Nature 444:633–637
Micco et al. 2006). During this response, even G2- Batchelor E, Mock CS, Bhan I, Loewer A, Lahav G (2008)
arrested cells appear to enter a G1-like arrest state, Recurrent initiation: a mechanism for triggering p53 pulses
in response to DNA damage. Mol Cell 30:277–289
with high levels of cyclin E and p21, and low levels
Batchelor E, Loewer A, Lahav G (2009) The ups and downs of
of cyclins A and B. When such an arrest state is p53: understanding protein dynamics in single cells. Nat Rev
disrupted (e.g., by deletion of p21) cells can undergo Cancer 9:371–377
a second round of DNA replication, a phenomenon Csikász-Nagy A, Battogtokh D, Chen KC, Novák B, Tyson JJ
(2006) Analysis of a generic model of eukaryotic cell-cycle
known as endoreduplication (▶ Endoreplication).
regulation. Biophys J 90:4361–4379
Di Micco R, Fumagalli M, Cicalese A, Piccinin S, Gasparini P,
Luise C, Schurra C, Garre M, Nuciforo PG, Bensimon A,
Cross-References Maestro R, Pelicci PG, d’Adda di Fagagna F (2006) Onco-
gene-induced senescence is a DNA damage response trig-
gered by DNA hyperreplication. Nature 444:638–642
▶ Cell Cycle Checkpoints Harris SL, Levine AJ (2005) The p53 pathway: positive and
▶ Cell Cycle Model Analysis, Bifurcation Theory negative feedback loops. Oncogene 24:2899–2908
C 254 Cell Cycle Arrest Mechanisms

Loewer A, Batchelor E, Gaglia G, Lahav G (2010) Basal dynam- transduction cascades blocking or slowing down cell
ics of p53 reveal transcriptionally attenuated pulses in cycle progression at specific stages. Checkpoints are
cycling cells. Cell 142:89–100
Mitchison JM (1971) The biology of the cell cycle. Cambridge triggered by sensor proteins detecting, directly or indi-
University Press, Cambridge rectly, cell cycle perturbations and transmitting the
Rand DA, Shulgin BV, Salazar JD, Millar AJ (2006) Uncovering signal, through the action of protein kinases, to effector
the design principles of circadian clocks: mathematical anal- proteins that stop cell cycle progression until the signal
ysis of flexibility and evolutionary goals. J Theor Biol
238:616–635 activating the checkpoint has been turned off. These
Reinhardt HC, Yaffe MB (2009) Kinases that control the cell mechanisms have been highly conserved during evo-
cycle in response to DNA damage: Chk1, Chk2, and MK2. lution, and checkpoint defects result in genome insta-
Curr Opin Cell Biol 21:245–255 bility, which is frequently associated to tumor
Toettcher JE, Loewer A, Ostheimer GJ, Yaffe MB, Tidor B,
Lahav G (2009) Distinct mechanisms act in concert to medi- development. The checkpoint controls are elicited
ate cell cycle arrest. Proc Natl Acad Sci USA 106:785–790 through molecular events regulating their activation,
maintenance, and inactivation resulting, respectively,
in cell cycle arrest, maintenance of the arrest for
a certain time and recovery of cell cycle progression.
Cell Cycle Arrest Mechanisms These surveillance mechanisms can be divided into
intrinsic regulatory pathways, ensuring the orderly
▶ Cell Cycle Arrest After DNA Damage progression of cell cycle events under physiological
conditions, and extrinsic pathways that are activated in
response to specific clues, such as damage to DNA or
cellular structures. The intrinsic checkpoints act by
Cell Cycle Checkpoints controlling the activity of cell cycle dependent kinases
(CDKs) mainly at the G1/S boundary and at the meta-
Flavio Amara, Marco Muzi-Falconi and Paolo Plevani phase to anaphase transition in mitosis; such mecha-
Dipartimento di Scienze Biomolecolari nisms are described in other entries of the
e Biotecnologie, Università di Milano, Milan, Italy encyclopedia.

DNA Damage Checkpoints


Synonyms The DNA damage checkpoint is required for the effi-
cient response to genotoxic stress. The checkpoint is
Adaptation; Cell cycle; DNA damage checkpoint; activated when lesions in the DNA are detected and the
Recovery; Spindle checkpoint. mechanisms involved differ slightly at various cell
cycle phases. DNA damage during the G1 phase acti-
vates the G1/S checkpoint preventing entry into
Definition S phase. The presence of DNA lesions while cells rep-
licate their genome slows down the kinetics of DNA
In this entry we describe the molecular mechanisms of replication (intra-S checkpoint), and if the chromo-
the cell cycle checkpoints. We focused our attention on somes are damaged in G2, the activation of the G2/M
the DNA damage checkpoint and on the spindle assem- checkpoint avoids chromosome segregation before
bly and spindle positioning checkpoints, controlling repair.
chromosome segregation in mitosis. Precise and complete DNA replication in every cell
cycle and repair of DNA lesions are critical for the
maintenance of genetic stability; failures in these pro-
Characteristics cesses reduce cell survival and lead to cancer suscep-
tibility. Cell cycle arrest is not the only final outcome
Intrinsic and Extrinsic Cell Cycle Checkpoints of the DNA damage checkpoint response; indeed, it
The checkpoints are evolutionarily conserved surveil- has been demonstrated that checkpoint activation reg-
lance mechanisms controlling the order and timing of ulates the choice of recombination pathways, influ-
cell cycle transitions. They are organized as signal ences transcription of DNA repair genes, stabilizes
Cell Cycle Checkpoints 255 C
Cell Cycle Checkpoints, Table 1 Major orthologous checkpoint proteins in budding yeast and mammalian cells
S.cerevisiae Mammal Function
RFA 1, 2, 3 RPA 1, 2, 3 ssDNA binding protein
MEC1/DDC2 ATR/ATRIP Sensor (PI3K kinase)
TEL1 ATM Sensor (PI3K kinase)
DDC1-RAD17-MEC3 RAD9-RAD1-HUS1 Sensor (9-1-1 PCNA-like clamp)
DPB11 TOPBP1 Relevant for MEC1/ATR activation C
RAD9 BRCA1/53BP1 Adaptor
MRE11-RAD50-XRS2 (MRX) MRE11-RAD50-NBS1 (MRN) Lesion processing
SAE2 CtIP Lesion processing
SGS1 BLM, WRN Lesion processing
EXO1 hEXO1 Lesion processing
CHK1 CHK1 Effector kinase (S/T kinase)
RAD53 CHK2 Effector kinase(S/T kinase)

stalled replication forks and, in multicellular eukary- RAD53/CHK2 and CHK1 activation influence various
otes, it may promote apoptosis when the damage is DNA metabolic pathways, like homologous recombi-
irreparable. nation, origin firing during S phase, nuclease activity,
and gene expression.
MEC1/ATR DNA Damage Checkpoint Signaling Another essential player of checkpoint activation is
In Saccharomyces cerevisiae, the two main players of represented by the 9-1-1 complex. This heterotrimeric
the DNA damage checkpoint activated in response to complex show limited sequence homology to the
many DNA helix-distorting lesions are two kinases PCNA homotrimeric clamp and it is therefore often
encoded by the MEC1 and RAD53 genes. They are referred to as the checkpoint clamp (Table 1 and
the orthologues of ATR and CHK2 in human cells Fig. 1). The 9-1-1 complex is loaded onto DNA by
(Table 1). a checkpoint clamp loader, a form of replication factor-
In budding yeast, MEC1 acts as the apical kinase C (RFC), in which the RFC1 subunit is substituted by
in the checkpoint cascade and likely senses RAD24 (S. cerevisiae) or RAD17 (mammalian cells)
abnormal amounts of single-stranded (ss) DNA (Majka and Burgers 2003). The 9-1-1 clamp promotes
stretches covered by the replication protein A (RPA), checkpoint activation in vivo by influencing MEC1/
which are generated by processing of various DNA ATR recruitment and its substrate specificity; the 9-1-1
lesions (Fig. 1). clamp has a role in modulating chromatin binding of
RPA-coated ssDNA is thought to recruit the apical RAD9/53BP1 and subsequent RAD53/CHK2 phos-
checkpoint kinase (MEC1-DDC2 or TEL1, in budding phorylation events.
yeast; ATR-ATRIP or ATM in mammalian cells), and
the 9-1-1 complex (RAD17-MEC3-DDC1 in budding Checkpoint Signaling in Response to Double
yeast) triggering the signal transduction cascade (Zou Strand Breaks (DSBs)
and Elledge 2003). As a consequence of this activation, When the DNA integrity is challenged by discontinu-
MEC1 phosphorylates, directly or indirectly, a number ities in the helix backbone, as those generated by
of factors (e.g. DDC2, H2A, DDC1, RAD9, DPB11/ double strand breaks, the checkpoint human kinase
TOPB1, RAD53/CHK2, CHK1, etc.) and these subse- ATM is loaded near the DSBs and, together with
quent phosphorylation events are used to follow ATR, activate the checkpoint response. ATM activa-
the signal through the cascade (Fig. 1). Activated tion requires the presence of the MRN complex, which
RAD53/CHK2 amplifies the signal through has a DNA tethering capacity and possesses both endo-
autophosphorylation events and inhibits the cell cycle and exo-nucleolytic activities. Although the nuclease
machinery by phosphorylating various targets, leading activity of the MRX/MRN complex is not critical for
to cell cycle arrest (Harrison and Haber 2006). checkpoint activation, the presence of a physically
C 256 Cell Cycle Checkpoints

Damaged DNA:
DNA adducts Double Strand Break (DSB)
Helix distorting lesions
DNA damage

G1 S G2/M

NER

Damage
processing
and ssDNA
production
Gapped Stalled replication
DNA fork Processed DSB

9-1-1

H3
DDC2 H2A
DNA adduct MEC1 Checkpoint
Helicase activation
Ser129 Chromatin
RPA Lys79 DBP11 remodelling
Nuclease
RAD9
Phosphorylation
Methylation RAD53

Cell Cycle Checkpoints, Fig. 1 DNA damage checkpoint activation

assembled complex and the action of other nucleases manner in response to a variety of genotoxic treatments
and helicases (such as EXO1 and SGS1) are important. and the modified gH2A molecules localize near a DSB
In S. cerevisiae TEL1, the ATM orthologue, partic- (van Attikum and Gasser 2009); the same modification
ipates only marginally in the checkpoint response occurs on the Ser139 conserved residue of the mamma-
induced by a single DSB, but its role becomes relevant lian H2AX histone variant. It is assumed that RPA-
in the presence of multiple DSBs and/or when the coated ssDNA is required to activate the checkpoint,
initiation of DSB ends processing is defective. In this but the MEC1/ATR and TEL1/ATM-dependent phos-
organism, a single irreparable DSB triggers the MEC1 phorylation of the H2A/H2AX histones (Fig. 1) is rele-
pathway, which is activated by extensive resection of vant for checkpoint maintenance, since in its absence
the DSB DNA ends (Lazzaro et al. 2009). If the break the checkpoint signal is turned off prematurely. In yeast,
is not rapidly repaired, nucleolytic activities produce one of the major roles of H2A phosphorylation is the
long ssDNA regions that, via the ssDNA binding pro- recruitment of chromatin remodeling complexes near
tein RPA, recruit MEC1 activating the checkpoint the DNA lesions. Another histone modification impor-
cascade (Fig. 1). tant for the activation of the DNA damage checkpoint is
methylation of Lys79 of histone H3 (H3-Lys79Me) by
Chromatin Remodeling and Checkpoint the specific methyl transferase DOT1. In fact, the main
Maintenance mediator proteins in the checkpoint cascade (RAD9 in
Chromatin modifications contribute to checkpoint func- S. cerevisiae and 53BP1 in mammals) can be recruited
tion. In fact, histone proteins are also modified in near a chromatin damaged site through two parallel
response to DNA damage. In S. cerevisiae, the Ser129 pathways. The first one is dependent upon H3-Lys79
residue of H2A is phosphorylated in a MEC1-dependent methylation, while the second pathway acts through
Cell Cycle Checkpoints 257 C
the function of the DPB11/TOPB1 adaptor proteins bi-orientation are stochastic processes taking
(Fig. 1). a variable amount of time to complete. During that
time individual chromosomes may be detached from
Turning Off the DNA Damage Checkpoint Signal the microtubules or be connected only to one spindle
In cells with repairable DNA damage, the checkpoint pole. The spindle assembly checkpoint (SAC) delays
arrest is maintained until repair is completed. Then, the metaphase to anaphase transition until the sister
cells can resume the cell cycle turning off the check- chromatids are properly attached to the spindle in C
point signal in a process known as recovery. Instead, if a bipolar orientation. In budding yeast, cell cycle arrest
an irreparable lesion is present, cells eventually over- at the G2/M transition is mediated by inhibition of the
ride the cell cycle arrest in a process called adaptation. CDC20-anaphase promoting complex (APC) ubiquitin
In mammals, it has been proposed that adaptation to ligase, thus preventing proteolysis of the securin PDS1
damage is linked to cancerogenesis and, likely, this until complete bi-orientation is achieved (Fig. 2).
event is normally prevented by inducing the apoptotic Checkpoint proteins (MAD1, MAD2, BUB1,
pathway. Inactivation of the RAD53 kinase is required BUBR1, BUB3, and MPS1) all accumulate at unat-
to recover from checkpoint-mediated cell cycle arrest tached kinetochores and form various complexes,
in S. cerevisiae. Various protein phosphatases, includ- many of which can inhibit the APC (Fang et al. 1998)
ing PTC2, PTC3, PPH3, and GLC7, have been found (Fig. 2). APC is a multiprotein complex that targets
to be important for checkpoint inactivation. Interest- several proteins for degradation during mitosis through
ingly, their relative contribution to the recovery pro- the associated specificity factor CDC20. Securin and
cess seems to be influenced by the type of genotoxic cyclins are ubiquitylated by CDC20-APC; therefore, to
stress causing checkpoint activation (DSBs, alkylating delay anaphase onset in the presence of spindle
agents, hydroxyurea). The same phosphatases act on defects, the checkpoint must block CDC20-APC medi-
phosphorylated gH2A and, not surprisingly, homolo- ated PDS1 degradation. Experimental evidence sug-
gous phosphatases influence checkpoint inactivation gests that in response to spindle defects, MAD2
also in human cells. exchanges from a MAD1/MAD2 complex to
While it was somehow expected that reversal of a CDC20/MAD2 complex sequestering CDC20 away
checkpoint signaling would involve the action of phos- from the APC and blocking PDS1 degradation (Fig. 2).
phatases acting on the master effector kinases of the In S. cerevisiae, spindle misorientation is detected
cascade, it was interesting to learn that POLO-like by the spindle positioning checkpoint (SPOC) which
kinases (such as PLK1 and CDC5) also play an impor- prevents mitotic exit. The target of this control is the
tant role in switching off the checkpoint. It is likely that mitosis exit network (MEN), and more specifically the
recovery and adaptation, being two sides of the same activation of the TEM1 GTPase (Adames et al. 2001).
coin, are going to share common players and mecha- TEM1 cycles between GDP- and GTP-bound states,
nisms. However, genetic screenings in budding yeast regulated by the putative guanine nucleotide exchange
revealed that some mutations allow to distinguish factor (GEF) LTE1 and the two-component GTPase
between the two processes. For example, the cdc5-ad activating protein (GAP) BFA1/BUB2. The last one
allele and deletions of certain recombination genes recruits TEM1 to the bud-directed spindle pole, where
(such as KU70/80, TID1 and RAD51) prevent the TEM1 is kept inactive until the pole crosses the neck
adaptation process without affecting recovery. into the bud. GTP-TEM1 then binds to the protein
kinase CDC15, which phosphorylates and activates
S. cerevisiae Spindle Assembly and Spindle the protein kinase DBF2. MOB1 binds to DBF2 and,
Position Checkpoints in a poorly understood manner, MOB1/DBF2 stimu-
Chromosome segregation at mitosis is controlled by lates the release of the CDC14 phosphatase from the
two surveillance mechanisms: the spindle assembly nucleolus and contributes to cytokinesis. CDC14
and the spindle positioning checkpoints. Accurate seg- dephosphorylates CDH1, leading to the activation of
regation requires bipolar attachment of sister chroma- CDH1-APC complex, which triggers cyclin degrada-
tids to the mitotic spindle, which is mediated by tion and exit from mitosis. A possible crosstalk
a proper connection between kinetochores and spindle between the SAC and SPOC is likely, but not yet
microtubules. Kinetochore capture and microtubules fully demonstrated (Lew and Burke 2003).
C 258 Cell Cycle Checkpoints

BFA1
BUB2

TEM1 TEM1

MPS1 MAD2
CDC15
BUB3 BUB1 CDC20 LTE1
BUB3 BUBR1 BUB3 BUBR1
CDC15
MAD1 MAD2 TEM1

MOB1

CDC14 DBF2
APC CDH1

Phosphate
PDS1
CDC28 GDP
CLB2
Separase
GTP

Metaphase Anaphase Telophase Cytokinesis

Cell Cycle Checkpoints, Fig. 2 The spindle assembly and spindle positioning checkpoints

Summary axis of cell polarity is controlled by another surveil-


The maintenance of genome stability is an essential lance mechanism, called the spindle positioning
process, which needs a careful control. Indeed, all checkpoint.
eukaryotic cells evolved surveillance mechanisms,
called checkpoints, sensing the presence of DNA
damage in the genome or alterations in cellular Cross-References
structures controlling chromosome segregation in
mitosis. DNA integrity can be challenged by lesions ▶ DNA Damage
caused by a variety of chimico-physical agents, ▶ DNA Repair
or by replication stress caused by special DNA ▶ Histones
structures or by a limited supply of deoxyribonucle- ▶ Kinetochores
otides. The DNA integrity checkpoint is a signal ▶ Mitosis
transduction cascade conserved from yeast to man ▶ Protein Kinases
and the apical factors in the pathway are protein ▶ Phosphatases
kinases. The spindle assembly checkpoint controls
the status of kinetochore/microtubule attachment
delaying exit from mitosis until all kinetochores
References
have formed correct bipolar connections with the Adames NR, Oberle JR, Cooper JA (2001) The surveillance
spindle. In budding yeast, and likely in all eukary- mechanism of the spindle position checkpoint in yeast.
otes, the alignment of the mitotic spindle with the J Cell Biol 153(1):159–168
Cell Cycle Data Analysis 259 C
Fang G, Yu H, Kirschner MW (1998) Direct binding of CDC20 they enabled systematic, genome-wide studies of
protein family members activates the anaphase-promoting ▶ gene expression through the cell cycle of
complex in mitosis and G1. Mol Cell 2(2):163–171
Harrison JC, Haber JE (2006) Surviving the breakup: the DNA a ▶ model organism (▶ Cell Cycle Analysis, Expres-
damage checkpoint. Annu Rev Genet 40:209–235 sion Profiling). Today several such studies have been
Lazzaro F, Giannattasio M, Puddu F, Granata M, Pellicioli A, performed, measuring the expression of every gene at
Plevani P, Muzi-Falconi M (2009) Checkpoint mechanisms many time points during the cell cycles of budding
at the intersection between DNA damage and repair. DNA
Repair 8:1055–1067 yeast (▶ Cell Cycle, Budding Yeast), fission yeast C
Lew DJ, Burke DJ (2003) The spindle assembly and spindle (▶ Cell Cycle, Fission Yeast), and cell lines of
position checkpoints. Annu Rev Genet 37:251–282 human (▶ Cell Cycle of Mammalian Cells) or plant
Majka J, Burgers PM (2003) Yeast Rad17/Mec3/Ddc1: a sliding origin. Other large-scale studies include mass spec-
clamp for the DNA damage checkpoint. Proc Natl Acad Sci
USA 100:2249–2254 trometry data on protein levels and phosphorylation
van Attikum H, Gasser SM (2009) Crosstalk between histone states in different phases of the cell cycle and system-
modifications during the DNA damage response. Trends Cell atic screens for substrates of cyclin-dependent kinases
Biol 19:207–217 (▶ Cyclins and Cyclin-dependent Kinases).
Zou L, Elledge SJ (2003) Sensing DNA damage through ATRIP
recognition of RPA-ssDNA complexes. Science 300:1542– The sheer size of these types of datasets implies that
1548 computational methods are needed to analyze them.
Because most of the datasets are expression time
courses covering one more cell cycles, a central prob-
lem is to identify cell-cycle-regulated (cycling) genes
or proteins based on expression profiles. This is
Cell Cycle Data Analysis a nontrivial problem because the data may contain
considerable amounts of noise and because they may
Lars Juhl Jensen contain other biological signals than the one of interest
Novo Nordisk Foundation Center for Protein (e.g., stress response caused by the synchronization of
Research, University of Copenhagen, Copenhagen, cell cultures). Over the past few years numerous
Denmark methods for identification of cycling genes have been
developed; a brief description and a ▶ receiver oper-
ating characteristic (ROC) curve can be found for most
Definition of them in Gauthier et al. (2010). Key features of all the
best performing approaches are that they integrate all
Cell cycle data analysis is an umbrella term used to available data, that they search specifically for
describe a variety of related methods for analyzing a ▶ periodic oscillation consistent with the length of
large datasets related to the cell cycle. These data the cell cycle, and that they take into account the
typically come in the form of time courses a high- ▶ oscillation amplitude (as opposed to only the
throughput assay, for example, DNA microarrays, are period).
applied to samples from multiple time points during The availability of genome-wide expression data
one or more cell cycles. Key aspects of cell-cycle from multiple model organisms also allows for analyses
analysis include data quality assessment, identification of the evolution of cell-cycle regulation. Once a set of
of cell-cycle-regulated genes and proteins, integrative cycling genes or proteins has been identified for each of
analysis of different data types, and evolutionary com- several organisms, cross-species comparisons can be
parisons across species. performed (de Lichtenberg et al. 2008). Mapping the
expression data onto ▶ functional modules and com-
plexes can give a higher-level view of cell-cycle regu-
Characteristics lation that is not obvious from analyses performed at the
level of individual genes or proteins (de Lichtenberg
Like in many other parts of molecular biology, the et al. 2008). Finally, correlations can be discovered
advent of high-throughput assays has had a major between different layers of cell-cycle regulation –
impact on cell-cycle research. ▶ DNA microarray for example, transcriptional and post-translational
technologies have been particularly influential, as regulation – by using ▶ Fisher’s exact test to compare
C 260 Cell Cycle Database

the gene and protein lists obtained through analysis of to genes and proteins involved in human and yeast cell
different types of high-throughput data (de Lichtenberg cycle (see also ▶ Cell Cycle, Biology) processes, one
et al. 2008). of the most studied complex system in biology (see
also ▶ Complex System). The database, which is
accessible at the web site http://www.itb.cnr.it/
Cross-References cellcycle, has been developed in a systems biology
context, since it also stores the cell cycle mathematical
▶ Cell Cycle Analysis, Expression Profiling models (see also ▶ Mathematical Model, Model The-
▶ Cell Cycle of Mammalian Cells ory) published in the recent years, with the possibility
▶ Cell Cycle, Budding Yeast to simulate them directly from the web interface. Cell
▶ Cell Cycle, Fission Yeast cycle information from humans is primary considered
▶ Cyclins and Cyclin-dependent Kinases since this resource aims to support biomedical studies
▶ DNA Microarrays in the context of cancer research. Then the database
▶ Fisher’s Test content is extended toward the budding yeast cell cycle
▶ Functional Modules and Complexes because of the genomic similarity between human and
▶ Gene Expression budding yeast and the large number of models avail-
▶ Model Organism able for this organism. According to this choice, the
▶ Oscillation Amplitude data integration (see also ▶ Data Integration) concerns
▶ Periodic Oscillation all genes and proteins involved in the cell cycle models
▶ Receiver Operating Characteristic (ROC) Curve of both the budding yeast S. cerevisiae and the
H. sapiens. This information is taken from the most
recent literature and plays a crucial role to contextual-
References ize behavior of each cell cycle component.

de Lichtenberg U, Jensen TS, Brunak S, Bork P, Jensen LJ


(2008) Evolution of cell cycle control: same molecular
machines, different regulation. Cell Cycle 6:1819–1825
Characteristics
Gauthier NP, Jensen LJ, Wernersson R, Brunak S, Jensen TS
(2010) Cyclebase.org: version 2.0, an updated comprehen- Motivation
sive, multi-species repository of cell cycle experiments and The aim of our initiative is to give an exhaustive view
derived analysis results. Nucleic Acids Res 38:D699–D702
of the cell cycle process starting from its building
blocks, genes and proteins, arriving to the pathway
they create, represented by the models, as summarized
Cell Cycle Database in Fig. 1.
The principal goal here is the integration of cell
Roberta Alfieri and Luciano Milanesi cycle information, which can be useful for researchers
Institute for Biomedical Technologies – CNR in the context of systems biology studies. From the
(Consiglio Nazionale delle Ricerche), Segrate, user’s point of view, this initiative presents two impor-
Milan, Italy tant features: the first is a data integration system for
genes and proteins involved in yeast and human cell
cycle processes; the second is a section dedicated to
Synonyms cell cycle models and their mathematical simulation.
The significant information related to cell cycle
Cell division database genes and proteins is a useful annotation of the models’
components and facilitates the exploration of the
relevant features of the whole network. The structure
Definition of this resource allows the storage of new data deriving
from cell cycle models (▶ Cell Cycle Models, Sensi-
The Cell Cycle Database is a biological resource, tivity Analysis), due to its particular structure and the
which collects the most relevant information related pipeline for the automatic data updating: the database
Cell Cycle Database 261 C
Web Interfaces

Genes Proteins Models


• Models list
• Gene Description • Bibliography references
• Related Pathways • Description and Family
• Abstract C
• Gene FASTA Sequence • FASTA Sequence
• Wiring diagram
• SNPs • Domains
• Proteins list
• cDNAs • 3D Structure/Connolly Surface
• SBML components
• Promoter • Mathematical Models
• Formulas
• TFs • Protein-Protein Interactions
• Simulation
• Expression data • Complexes
• Plot/Download results

CellCycle Database
• Genes
Data Sources • Proteins
• Mathematical Models
• Entrez Gene
• PDB
• GenBank • Transpath
• GEO • InterPro Linked Resources SW Resources
• SCPD • Mint
• MGC • Bind • Ensembl Genome Browser • XPPAUT
• DBTSS • Biogrid • CYGD
• SBML Parser
• Transfac • Kegg • Unigene
• Gene Ontology • GNUPLOT
• QPPD • Omim
• BioModels • Intact • Java
• SGD
• PubMed • Reactome • MathML Player/Fonts
• Uniprot

Cell Cycle Database, Fig. 1 Architecture of the Cell Cycle resources (Linked Resources). The Cell Cycle Database com-
Database. The core of the resource is a data warehouse system, prises user-friendly interfaces which show information and, in
which collects data from the most relevant bioinformatics general, provide the functions of the resource, even relying on
resources (Data Sources) and provides links to external third party free software

administrator can update the database content by Furthermore, a repository of the most recent
gene or protein name through the web interface. published cell cycle models to allow the exploration
When a new entry is inserted in the core table, all of their mathematical structure through SBML (Sys-
the external tables will be updated in cascade, tems Biology Markup Language) (Hucka et al. 2003)
while when a new entry is inserted in one of the components and their mathematical simulation is
external table no inward updating occurs. As a result available.
all tables of the database are updated according to the
infrastructure, which is designed for automated data Functionality
integration. The database stores 26 models of the cell cycle inter-
This database is able to collect the most important action networks and it allows the mathematical simu-
information related to cell cycle genes and proteins, lation over time of the quantitative behavior of each
which are drawn from the analysis of the cell cycle model component. To accomplish this task, a web
information available in literature and the existing interface for browsing information related to cell
pathway databases. The database also contains pro- cycle genes, proteins, and mathematical models is
tein-protein interactions taken from several resources available.
making the information on the cell cycle interaction Models and proteins information are directly linked
network as complete as possible. in the resource: for each protein we provide
C 262 Cell Cycle Database

information on the models in which it is involved. In the beginning of the page an instruction calls a XSL
the protein report a list of the published models is stylesheet, which allows the formulas to be viewed
available with a direct link to the specific model report. correctly. This technology allows using the browser
In this framework a pipeline, which allows users functions to find strings inside mathematical expressions
to deal with the mathematical part of the models, in and to change their size, operations which are usually
order to solve, using different variables, the ordinary not possible using images to represents formulas.
differential equation systems that describe the biolog- The simulation section allows users to simulate
ical process, has been implemented. Up to now among a model using the software XPPAUT (Ermentrout
the 26 models stored in the database, 13 models have 2002) and to plot results on the fly in order to capture
the related SBML file and for 12 of them it is possible the dynamical properties of the biological process. In
to run simulations. order to enable the simulation, this section lists the
The flexibility of this database allows the addition of model species (state variables, typically protein con-
mathematical data, which are used for simulating the centrations), its parameters (such as kinetic constants),
behavior of the cell cycle components in the different its algebraic rules, and XPPAUT internal options,
models. The resource is useful both to retrieve informa- using default values. Users can change the initial
tion about cell cycle model components and to analyze values in order to analyze the different natures of the
their dynamical properties. The Cell Cycle Database system dynamics, performing sensitivity and bifurca-
can be used to find system-level properties, such as tions analysis. Once the computation is completed
stable steady states and oscillations, by coupling struc- users can download XPPAUT input and output files
ture and dynamical information about models. and plot results. The web interface allows users to plot
both time courses and phase diagrams for each model
Accessibility after the simulation step. Results are shown with
The web interface allows to retrieve model-related images exported by GNUPLOT (Williams and Kelley
information and a pipeline has been implemented to 1998), the popular portable command-line function
deal with the mathematical part of the models, in order plotting software.
to solve the ordinary differential equations systems
that describe the biological processes. This pipeline Model Annotation
essentially consists in the XHTML-MML-XSL style As the primary data source we consider the BioModels
sheet implementation, as it will be discussed below. In Database (Le Novere et al. 2006) from which we
fact, using this system it is possible to visualize the collect the mathematical model specifically developed
mathematical description of the model and to run sim- for yeast and mammalian cell cycles. We also integrate
ulations by introducing modifications of the initial other published models, which are not stored in the
conditions of state variables and parameters. BioModels Database, manually retrieving them from
Each model is presented in a report structured in literature or from the CellML repository (Cuellar et al.
three sections: the publication data, the SBML data 2003).
structure, and the numerical simulation part. The first Each model is presented in a report, which is struc-
section contains the detailed publication data, the dia- tured in three sections: the publication data, the SBML
gram of the model, the related XML file, and the list of data structure, and the numerical simulation part. The
all the proteins involved in the model, which are linked first section contains the detailed publication data
to the related Cell Cycle Database protein report. (such as the authors, PubMed ID, the abstract, and
In the SBML data structure section, users can explore journal information), the diagram of the model and
the SBML components of the selected model including the related XML file, if available, and the list of all
its mathematical expressions. Mathematical formulas the proteins involved in the model, which are linked to
within the SBML models are expressed using the Math- the related Cell Cycle Database protein report. The
ematical Markup Language (MML). To visualize the protein identifiers are chosen according to the UniProt
expressions on the web our resource relies Database annotation.
on a XHTML + MML page, following the w3c specifi- More in detail, the Cell Cycle Database contains the
cations. In this page MML is in-lined with HTML and at literature information related to each model, the input
Cell Cycle Dynamics, Bistability and Oscillations 263 C
for the simulation software, and the XML file coded References
with SBML specifications (Hucka et al. 2003). We
choose SBML since it is an internationally supported Ausbrooks R et al. W3C Recommendation. Mathematical
Markup Language (MathML) version 2.0, 2 edn. [http://
and widely used language for metabolic networks,
www.w3.org/Math/]. Accessed 21 October 2003
cell-signaling pathways, regulatory networks, and Cuellar AA, Lloyd CM, Nielsen PF, Bullivant DP, Nickerson
many other biological pathways. However, there DP, Hunter PJ (2003) An overview of cellML 1.1, a biolog-
are published cell cycle models not yet implemented ical model description language. SIMULATION: Trans Soc C
Model Simul Int 79(12):740–747
in SBML: for this reason, some SBML models
Ermentrout B (2002) Simulating, analyzing, and animating
included in the database are manually generated dynamical systems: a guide to xppaut for researchers and
using the JigCell Model Builder software (Vass et al. students, SIAM 2002, Philadelphia
2004), a model editor which allows the construction Hucka M et al (2003) The systems biology markup
language (SBML): a medium for representation and
of biochemical reaction networks in SBML
exchange of biochemical network models. Bioinformatics
format, and are validated using the Systems Biology 19(4):524–531
Workbench SBML validator. Mathematical formulas Le Novere N, Bornstein B, Broicher A, Courtot M, Donizelli M,
within the SBML models are expressed using Dharuri H, Li L, Sauro H, Schilstra M, Shapiro B, Snoep JL,
Hucka M (2006) BioModels database: a free, centralized
Mathematical Markup Language (MathML or MML)
database of curated, published, quantitative kinetic models
(▶ Systems Biology Markup Language (SBML); of biochemical and cellular systems. Nucleic Acids Res 34:
▶ MathML) (Ausbrooks et al. 2003). D689–D691
Vass M, Allen N, Shaffer CA, Ramakrishnan N, Watson LT,
Tyson JJ (2004) The jigcell model builder and run manager.
Model Curation
Bioinformatics 20(18):3680–3681
As discussed before, the Cell Cycle Database contains Williams T, Kelley C (1998) GNUPLOT: an interactive plotting
models from Biomodels, literature, and authors’ program, version 3.7 organized by: David Denholm
websites, where available. Concerning the models
from Biomodels, the model curation is ensured from
the database reliability, which we trusted in. To ensure
the model curation for the literature models, we tested
and simulated them in order to reproduce the results Cell Cycle Dynamics, Bistability and
shown in the related publication papers. Oscillations

New Model Additions John J. Tyson1 and Béla Novák2


1
The addition of new model in the Cell Cycle Department of Biological Sciences,
Database is manually curated. We periodically mon- Virginia Polytechnic Institute and State University,
itor both Biomodels and CellML repositories and Blacksburg, VA, USA
2
select the new cell cycle models. Moreover we peri- Department of Biochemistry, Oxford Centre for
odically analyze literature publications in order to Integrative Systems Biology, University of Oxford,
discover new mathematical models about the cell Oxford, UK
cycle which are not stored in the model repositories.
New models will undergo the annotation process
described before and they will be included in the Synonyms
database.
Multiple steady states; Toggle switch

Cross-References
Definition
▶ Cell Cycle, Biology
▶ Complex System A molecular regulatory system can maintain itself
▶ Data Integration indefinitely at a steady state where, for every time-
▶ Mathematical Model, Model Theory varying species, the total rate of formation of the
C 264 Cell Cycle Dynamics, Bistability and Oscillations

chemical is exactly balanced by its total rate of (X1 active, X2 inactive) and (X1 inactive, X2 active).
removal. In mathematical terms, we can write The intermediate steady state (X1 semi-active, X2
semi-active) is unstable.
dxi Multistability plays an important role in the theory
¼ fi ðx1 ; x2 ; :::; xn Þ  ri ðx1 ; x2 ; :::; xn Þ; of cell cycle progression. It is proposed that the
dt (1)
i ¼ 1; :::; n characteristic states of cell cycle arrest (G1-arrest,
G2-arrest, metaphase-arrest) correspond to alternative
stable steady states of the underlying chemical reaction
where xi is the concentration of species i, and fi(x1, network controlling the activities of cyclin-dependent
x2, . . ., xn) and ri(x1, x2, . . ., xn) are its rates of formation kinases (CDKs). In this view, cells progress through
and removal. At a steady state, dxi/dt ¼ 0 for all i, the cell cycle (G1 ! S/G2 ! M ! G1 ! . . .) by
irreversible transitions from the G1 steady state to the
fi ðx1 ; x2 ; :::; xn Þ ¼ ri ðx1 ; x2 ; :::; xn Þ; i ¼ 1; :::; n: (2) S/G2 steady state (the G1-S transition, also called
“Start” or the “Restriction Point”), from the S/G2
For any well-specified chemical reaction network, steady state to the M steady state (the G2-M transition),
there always exists at least  one steady state solution; and from the M steady state to the G1 steady state (the
call it x0 ¼ x01 ; x02 ; :::; x0n . Such a steady state can be metaphase-anaphase transition, also called “exit from
classified as either stable or unstable with respect mitosis”). In this view, cell cycle checkpoints work by
to small perturbations away from the steady state. If stabilizing one of these three steady states and
all small perturbations (in any direction in the preventing the transition to the next phase of the cell
n-dimensional state space of the reaction network) cycle.
eventually return to x0, then this steady state is said Bistability is intimately connected to the
to be (locally asymptotically) stable. If there exist existence of spontaneous limit cycle oscillations in
small perturbations (in some particular directions in regulatory networks with both positive feedback (to
state space) that depart from x0 and never return, then create alternative stable steady states) and negative
the steady state is said to be unstable. feedback (to induce spontaneous switching between
Since the rates of formation and removal of any the two stable steady states). This combination of
chemical species in a reaction network are algebraic positive and negative feedbacks is precisely the case
functions of species concentrations, Eq. 2 is a system in the regulatory system governing the eukaryotic
of n nonlinear algebraic equations, which in general cell cycle (Tyson and Novak 2008). In somatic
may have multiple solutions in the positive orthant: xi cells, checkpoint signals prevent spontaneous
 0 for all i. In this case, we can denote the different cycling, but in some circumstances these checkpoints
steady states as x0j, where j ¼ 1, . . ., m. A typical are removed, and the cell cycle proceeds as
situation is the case of three steady states (m ¼ 3), a spontaneous, unfettered, limit cycle oscillation.
where two of the steady states are stable and one is For example, during early embryogenesis, the fertil-
unstable. This case is called bistability. In general, ized egg undergoes a series of rapid cell divisions
there may be more than two stable steady states, in without growth, until it reaches the “mid-blastula
which case we refer to tristability or multistability. We transition,” when checkpoint proteins are expressed
may describe a system with m > 1 as having multiple and the cell cycle regains the characteristic G1-S, G2-
steady states when we are not sure of the total number M, and M-G1 transitions of somatic cells. It is also
of stable steady states. possible to create mutant yeast cells that lack check-
Bistability is a typical feature of reaction networks point controls; these cells divide faster than they
with positive feedback, although one must be aware grow, getting smaller and smaller each cycle until
that positive feedback is not always readily apparent in they die. These observations suggest that, under
reaction networks as they are conventionally drawn. most circumstances, periodic cell divisions are
Double-negative feedback is a particularly common governed not by spontaneous limit cycle oscillations
motif for bistability: X1 inhibits X2, and X2 inhibits but by multistability and irreversible transitions
X1. In this case, the two stable steady states are (Tyson and Novak 2008).
Cell Cycle Dynamics, Bistability and Oscillations 265 C
Cdc20 APC

WeelP
Cdc20-APC
PPase

Wee1
Cyclin
C
CDK-cyclin PCDK-cyclin CDK-cyclinUUU
(MPF) (preMPF)
CDK CDK
Cdc25P

PPase Checkpoint
signal
Cdc25

Cell Cycle Dynamics, Bistability and Oscillations, CDK-cyclin dimers (called MPF). The CDK subunit is phos-
Fig. 1 CDK regulatory network for frog egg cell cycles. In phorylated by Wee1 to form a less active form of MPF, called
this network diagram, a solid arrow represents a biochemical preMPF. The inhibitory phosphate group can be removed by
reaction, and a dashed arrow represents a catalytic effect on Cdc25. Wee1 and Cdc25 activities are also regulated by phos-
a reaction. A T-shaped arrow represents the association of two phorylation, catalyzed by MPF: Wee1P is the less active form,
proteins to form a complex. Black balls on the crossbar indicate and Cdc25P is the more active form. Cyclin B is labeled for
a reversible binding reaction. The small gray balls represent proteolysis by an E3 ubiquitin ligase, the APC, which works in
proteolytic fragments of a protein. Newly synthesized cyclin collaboration with a targeting subunit, Cdc20. The synthesis of
B combines with CDK subunits (in excess) to form active Cdc20 is activated by MPF

Characteristics a protein interaction network that drives MPF activity


up and down in waves of synthesis and degradation of
History cyclin proteins.
In an influential review of cell cycle regulation, Mur-
ray and Kirschner (1989) asked whether progression Systems Biology
through the cell cycle is more like “dominoes” These two views were brought together by Novak and
(a dependent sequence of events: one falling domino Tyson (1993) in an early example of “molecular sys-
pushing over the next) or a “clock” (an autonomous tems biology.” They used the wiring diagram in Fig. 1
oscillation, ticking along independent of the events to derive a mathematical model of spontaneous MPF
being timed). There is good experimental evidence oscillations in frog egg extracts and (in later papers) of
for both views. Early genetic analysis of the budding cell cycle mutants in fission yeast. According to their
yeast cell cycle provided evidence for two parallel theory, the regulation of CDK-cyclin activity by tyro-
dependent sequences (the budding sequence and the sine phosphorylation and dephosphorylation of the
DNA replication sequence) that diverged in G1 phase CDK subunit, by Wee1 (the kinase) and Cdc25 (the
(at the Start transition) and reconnected at the end of phosphatase), creates a bistable switch that governs
the cycle (exit from mitosis). On the other hand, bio- the transition from G2 phase into mitosis (a state of
chemical studies of frog egg extracts suggested an high CDK activity); see Fig. 2a. This switch has all the
autonomous oscillation of mitosis-promoting factor properties of a classic cell cycle checkpoint. To pass
(MPF) that drives periodic DNA replication and mito- the checkpoint, a cell must fully replicate its DNA and
sis, but generates periodic bursts of MPF activity quite grow to a sufficient size. Once past the checkpoint the
independently of chromosomes and nuclei. In the first transition is irreversible; the cell does not slip back into
view, cell cycle progression is orchestrated by a gene G2 phase and try to enter mitosis a second time.
regulatory network flipping genes on and off in a strict Rather, the cell must complete the stages of mitosis
sequence intimately connected to cell growth. In the and activate cyclin degradation at the metaphase-
second view, cell cycle progression is governed by anaphase transition. Cyclin degradation allows the
C 266 Cell Cycle Dynamics, Bistability and Oscillations

a b
MPF activity

MPF activity
0
qinact qact Total cyclin
Total cyclin

c
MPF activity

Total cyclin

Cell Cycle Dynamics, Bistability and Oscillations, Cdc20 is produced and cyclin is degraded faster than it is syn-
Fig. 2 Signal-response curves: MPF activity as a function of thesized. The bistable switch flips back to the low-activity state
total cyclin. (a) Bistable switch. If cyclin synthesis and degra- once [total cyclin] drops below the inactivation threshold. The
dation are blocked, then the total cyclin concentration ([MPF] + white circle represents an unstable steady state. (c) Damped
[preMPF]) in a frog egg extract can be manipulated experimen- sinusoidal oscillations. If the CDK phosphorylation site (a tyro-
tally. For [total cyclin] between the two thresholds, yinact and sine residue) is mutated to phenylalanine, then the positive
yact, the control system has two stable steady states separated by feedback loops involving Wee1 and Cdc25 are disengaged and
an unstable steady state. The dotted line shows how MPF activity bistability is lost. The system now undergoes damped oscilla-
will respond to slowly increasing [total cyclin] and then slowly tions to a stable steady state (black circle). If there is enough time
decreasing [total cyclin]. (b) Relaxation oscillations. If cyclin is delay in the negative feedback loop through Cdc20-APC, then
steadily synthesized, then [total cyclin] will increase when MPF the control system might exhibit sustained oscillations, but they
is in the low-activity state, because Cdc20 is unavailable, but are quite distinct from the relaxation oscillations in panel B
once the system flips to the state of high MPF activity, then

bistable switch to be reset to the state of low CDK generate spontaneous oscillations of MPF activity,
activity. Repeated cycles of DNA synthesis and mito- unconstrained by requirements of cell growth, DNA
sis correspond to repeated flipping of the switch from synthesis, DNA damage, or mitotic spindle functions.
a stable steady state of low CDK activity (interphase) The spontaneous oscillations are driven by periodic
to a stable steady state of high CDK activity (M phase) bursts of cyclin degradation, which are in turn gener-
and back again. (Physicists and engineers call this ated by the periodic activation of MPF by dephosphor-
behavior a “hysteresis loop”.) ylation of the CDK subunit. These spontaneous
Novak and Tyson showed, furthermore, that under oscillations result from an interplay between the
the special conditions in an early frog embryo or in bistable switch and a negative feedback loop (MPF
a frog egg extract, the CDK-cyclin control system can activates the cyclin degradation machinery which
Cell Cycle Dynamics, Bistability and Oscillations 267 C
destroys cyclin subunits, causing a loss of MPF activ- This notion became the basis of a successful kinetic
ity); see Fig. 2b. (Physicists and engineers call this model of the budding yeast cell cycle by Chen et al.
behavior a “relaxation oscillator.”) (2000). The model made many predictions that were
The mathematical model of Novak and Tyson not tested in Fred Cross’s laboratory. In particular, Cross
only gave a systematic account of a variety of experi- et al. (2002) tested the idea that, under “neutral” con-
mental observations on fission yeast, budding yeast, ditions, budding yeast cells could persist indefinitely in
and frog eggs, but also made a series of striking either the G1 state, with low CDK activity, or the C
predictions: S-G2-M state, with high CDK activity, depending on
1. The “cyclin threshold for MPF activation” that had the immediately prior history of how the cells are
been observed by Solomon et al. (1990) is only one- brought into the neutral conditions. This experiment is
half of a hysteresis loop. There should be a separate, explained in the article on “Cell cycle, budding yeast.”
distinctly lower cyclin threshold for MPF inactiva-
tion (Fig. 2a). Irreversibility
2. For cyclin levels well above threshold, the time lag The concept of bistability provides an immediately
for MPF activation should be roughly constant (as obvious and intuitively satisfying explanation of the
observed by Solomon et al. (1990)), but as cyclin irreversibility of progression through the cell cycle. If
level approaches the activation threshold from cell cycle transitions are a result of passing from one
above, the time lag for MPF activation should stable steady state to another, then it is clear that the
increase dramatically. reverse transition cannot follow the same path. Once
3. The cyclin level for MPF activation should be an the transition is made, then a completely different set
increasing function of unreplicated DNA in an of circumstances must be brought into play to accom-
extract. plish the reverse transition. For example, in Fig. 2a the
4. If the bistable switch is disabled by interfering with activation of MPF is accomplished by increasing total
the inhibitory phosphorylation of CDK, then MPF cyclin concentration above the threshold, yact, where
oscillations may still be possible, but they will be the unstable steady state merges with and annihilates
faster, more smooth (not abrupt bursts of MPF the steady state of low MPF activity and forces the
activity), and probably damped (Fig. 2c). system to switch to the steady state of high MPF
Predictions 1 and 2 are generic properties of activity. If subsequently the cyclin level is caused to
bistable systems with hysteresis. Prediction 3 was decrease, the control system does not jump back to the
a novel proposal for the mode of action of lower steady state at yact, which would be the case for
a checkpoint signal that delays entry into mitosis. a “reversible” transition. Rather, the cyclin level must
Prediction 4 relates to classic properties of oscillations decrease below a much smaller threshold, yinact, before
driven by a “simple negative feedback loop” in com- the down-jump occurs.
parison to relaxation oscillations in a “substrate- In the budding yeast case, the switch from the low-
depletion relaxation oscillator.” CDK state to the high-CDK state is driven by Cln-
dependent kinase activity, but the switch back is driven
Experimental Verification by a completely different mechanism, dependent on
The first three predictions of the Novak-Tyson model the activities of Cdc20-APC (degradation of B-type
were confirmed in frog egg extracts independently and cyclins) and Cdc14 (a CDK-counteracting phospha-
simultaneously by Sha et al. (2003) and Pomerening tase). In this case, the up-jump is a one-way switch: it
et al. (2003); see Fig. 3. Slightly later, Pomerening is induced by Cln-dependent kinase, but after the
et al. (2005) confirmed prediction 4 in a thorough and switch is made, the Cln-kinase activity can drop to
careful study of MPF oscillations in frog egg extracts; 0 and the CDK activity will remain high. Down-jump
see Fig. 4. occurs by means of a different one-way switch. In the
In 1996, Kim Nasymth (1996) proposed that, at its “neutral” condition (Cln ¼ Cdc20 ¼ Cdc14 ¼ 0), the
heart, the budding yeast cell cycle is a repetitive alter- control system can persist indefinitely in either the
nation between two “self-maintaining” states (i.e., sta- low-CDK or the high-CDK state. For more details,
ble steady states): a G1 state with low CDK activity see the articles on “Cell cycle, budding yeast” and
and an S-G2-M state with high CDK activity. “Cell cycle dynamics, irreversible transitions.”
C 268 Cell Cycle Dynamics, Bistability and Oscillations

a Entry into mitosis 1


Δ90 cyclin B (nM) : 0 16 24 32 40

0min

90min

b Exit from mitosis 1

Δ90 cyclin B (nM) : 0 16 24 32 40

60min

140min

M M M

Cell Cycle Dynamics, Bistability and Oscillations, is supplemented with variable amounts of nondegradable cyclin
Fig. 3 Experimental confirmation of bistability in frog egg at t ¼ 0, but cycloheximide is not added until t ¼ 60 min. By
extracts. From Sha et al. (2003), used by permission. (a) A frog t ¼ 60 min, the nuclei in each sample have been driven into
egg extract, containing all the proteins in Fig. 1 except for cyclin, mitosis 1 by a combination of the nondegradable cyclin added at
is prepared in the presence of sperm nuclei (indicators of MPF t ¼ 0 and the degradable cyclin subunits synthesized from the
activity) and cycloheximide (a drug that prevents the synthesis extract’s endogenous mRNA. As the extracts try to exit from
of cyclin protein from endogenous mRNA). In the absence of mitosis 1, the endogenous cyclin subunits are degraded by
cyclin subunits, the extract is blocked in interphase (t ¼ 0), as Cdc20-APC, but the exogenous D90 cyclin subunits resist deg-
indicated by the compact nucleus with intact nuclear membrane radation. Cycloheximide prevents any further cyclin synthesis in
and dispersed chromatin (stained blue). At t ¼ 0 the extract is the extracts. At t ¼ 140 min the extracts are assayed for the cell
supplemented with a measured amount of nondegradable (D90) cycle phase of the nuclei. Cyclin concentrations greater than
cyclin, and 90 min later the extract is observed to see if the nuclei 20 nM are sufficient to maintain the nuclei in a mitotic state.
are in interphase (dispersed chromatin, intact membrane, low [Total cyclin] ¼ 16 nM is below the threshold (blue down-
MPF activity) on in mitosis (condensed chromatin, breakdown triangle) for inactivation of MPF and exit from mitosis. Cyclin
of nuclear membrane, high MPF activity). Cyclin concentrations concentrations of 24 and 32 nM are clearly in the bistable region:
less than 35 nM are insufficient to induce entry into mitosis, the nuclei can persist stably in interphase or in mitosis,
but [total cyclin] ¼ 40 nM is above the threshold (blue up- depending on whether they are prepared initially in interphase
triangle) for mitotic entry. (b) In a second experiment, the extract or mitosis

Checkpoints exiting mitosis. If these transitions are governed by


The concept of bistability also provides a natural frame- saddle-node bifurcations of a bistable system (as in
work for understanding the mechanisms of checkpoint Fig. 2a), then the transition can be delayed or blocked
controls. The function of a checkpoint is to block or completely by raising the threshold for the transition
delay progression of a damaged cell into the next phase (yact). From the mathematical model of the transition it
of the cell cycle. For example, DNA damage should is immediately obvious which components control the
block cells from entering S phase, incomplete DNA location of the threshold. For example, in Fig. 1, it is the
ligation should block cells from entering M phase, and activity of the CDK-counteracting phosphatase (PPase)
misaligned chromosomes should block cells from that is the vulnerable point.
Cell Cycle Dynamics, Bistability and Oscillations 269 C
a M b
120 140

100 120 ~6.6-fold

MPF activity (% of maximum)


(% of maximum)
Relative levels

80
35S-cyclin 100
60
40 80 C
20 MPF activity 60
0
Time (min) 0 20 40 60 80 100 40

35S-cyclin 20
MPF 0
activity 0 20 40 60 80 100 120
35S-cyclin (% of maximum)

c 200 nM Cdc2wt 200 nM Cdc2AF


(relative units)
MPF activity

0 1.0 2.0 3.0 0 1.0 2.0 3.0 4.0


Relative time Relative time
(peak 2 = 2.0) (peak 2 = 2.0)

Cell Cycle Dynamics, Bistability and Oscillations, endogenous Cdc25wt (wild-type) protein, is added 200nM of
Fig. 4 Experimental confirmation of relaxation oscillations in either Cdc25wt protein or Cdc2AF protein, which cannot be
frog egg extracts. From Pomerening et al. (2005), used by phosphorylated and inhibited by Wee1. Hence, in the extract
permission. (a) In a “cycling” extract, cyclin subunits are con- on the right there are roughly equal amounts of Cdc2wt and
tinuously synthesized (blue curve), but MPF activity (red curve) Cdc2AF subunits, compared to the “control” extract on the left
is low until total cyclin exceeds the threshold for MPF activa- which contains 400 nM Cdc2wt subunits. The mutant kinase
tion. The abrupt activation of MPF drives nuclei into mitosis subunits compromise the positive feedback loops in the model
(M). As the extract exits from mitosis, cyclin is degraded by (Fig. 1) and change the properties of MPF oscillations. Com-
Cdc20-APC, and MPF activity falls as a result. (b) The data in pared to the blue curve, the MPF oscillations in the red curve are
panel A is projected onto state space (MPF activity versus total faster, more sinusoidal and noticeably damped, exactly as
cyclin level), as in panel A of Fig. 2bB. (c) Confirmation of predicted by the mathematical model
prediction 4. To a cycling extract, containing 200 nM

Cross-References References

▶ Cell Cycle Dynamics, Irreversibility Chen KC, Csikasz-Nagy A, Gyorffy B, Val J, Novak B,
Tyson JJ (2000) Kinetic analysis of a molecular
▶ Cell Cycle Model Analysis, Bifurcation Theory
model of the budding yeast cell cycle. Mol Biol Cell
▶ Cell Cycle Modeling, Differential Equation 11:369–391
▶ Cell Cycle of Early Frog Embryos Cross FR, Archambault V, Miller M, Klovstad M (2002) Testing
▶ Cell Cycle, Budding Yeast a mathematical model for the yeast cell cycle. Mol Biol Cell
13:52–70
▶ Cell Cycle, Fission Yeast
C 270 Cell Cycle Dynamics, Irreversibility

Murray AW, Kirschner MW (1989) Dominoes and clocks: the transitions (Novak et al. 2007). By “irreversibility” we
union of two views of the cell cycle. Science 246:614–621 understand that, once cells have committed to a new
Nasmyth K (1996) At the heart of the budding yeast cell cycle.
Trends Genet 12:405–412 round of DNA replication (at the G1/S transition), they
Novak B, Tyson JJ (1993) Numerical analysis of a comprehen- do not typically slip back into G1 phase and do
sive model of M-phase control in Xenopus oocyte extracts a second round of DNA replication. Similarly, once
and intact embryos. J Cell Sci 106:1153–1168 cells have committed to mitosis (at the G2/M transi-
Pomerening JR, Sontag ED, Ferrell JE Jr (2003) Building a cell
cycle oscillator: hysteresis and bistability in the activation of tion), they do not typically slip back into G2 phase and
Cdc2. Nat Cell Biol 5:346–351 try for a second mitotic division.
Pomerening JR, Kim SY, Ferrell JE Jr (2005) Systems-level Like most “rules” in biology, this one has important
dissection of the cell-cycle oscillator: bypassing positive exceptions (Morgan 2007). Some cells carry out
feedback produces damped oscillations. Cell 122:565–578
Sha W, Moore J et al (2003) Hysteresis drives cell-cycle transi- repeated rounds of DNA replication without interven-
tions in Xenopus laevis egg extracts. Proc Natl Acad Sci ing mitoses, becoming polyploid. During meiosis,
USA 100:975–980 a germ line cell undergoes two meiotic divisions with-
Solomon MJ, Glotzer M et al (1990) Cyclin activation of out an intervening S phase. But these exceptions only
p34cdc2. Cell 63:1013–1024
Tyson JJ, Novak B (2008) Temporal organization of the cell reinforce the centrality of the general rule of somatic
cycle. Curr Biol 18:R759–R768 cell cycles, namely, irreversible progression through
the cell cycle phases (G1, S, G2, M) in strict order.
The irreversibility of the three transitions is inti-
mately connected with cell cycle checkpoints, which
Cell Cycle Dynamics, Irreversibility halt further progression through the cell cycle when-
ever serious problems are detected (Murray 1992). For
John J. Tyson1 and Béla Novák2 example, DNA damage incurred during G1 phase will
1
Department of Biological Sciences, Virginia block the G1/S transition until the damage is repaired.
Polytechnic Institute and State University, Failure to fully replicate and ligate DNA molecules
Blacksburg, VA, USA during S phase will block the G2/M transition. Incom-
2
Department of Biochemistry, Oxford Centre for plete alignment of replicated chromosomes on the
Integrative Systems Biology, University of Oxford, mitotic spindle will block the M/G1 transition. When
Oxford, UK a checkpoint is lifted, the cell proceeds irreversibly to
the next phase of the cell cycle.
These irreversible transitions and checkpoints are
Synonyms controlled by complex molecular regulatory networks
with both positive and negative feedback. Positive
Bistability; Checkpoints; Hysteresis feedback creates a bistable switch, and negative
feedback allows the switch to be flipped from one
stable state to another (Tyson and Novak 2008).
Definition These systems-level properties of the regulatory net-
work are crucial to irreversible progression through the
The cell cycle is the sequence of events whereby a cell cell cycle, and when they are disturbed by mutation,
replicates its chromosomes and partitions the identical drugs, or disease, then cells make mistakes in DNA
sister chromatids to two separate nuclear compart- replication and partitioning, often with fatal conse-
ments (Morgan 2007). In eukaryotic cells, the phases quences for the cell or for the multicellular organism
of DNA synthesis (S phase) and mitosis (M phase) are harboring the rogue cells.
temporally distinct and separated by gap phases (G1 –
unreplicated chromosomes, and G2 – replicated chro-
mosomes). To maintain the proper ploidy of a cell Characteristics
lineage, generation after generation, it is essential
that S and M phases are strictly alternating. This alter- Physiology and Molecular Biology
nation is enforced by the irreversibility of three crucial Progression through the cell cycle is governed by a set
transitions in the cell cycle: the G1/S, G2/M, and M/G1 of cyclin-dependent kinases (CDKs) that initiate DNA
Cell Cycle Dynamics, Irreversibility 271 C
S-G2 phase M phase (see Fig. 1). During S and G2 phases, the cell
is producing M-phase cyclins, but CDK/cyclin-M
High CDKS activity
activity is low because of inhibitory phosphorylation
of the CDK subunit. At the G2/M transition, the
G2 / M M phase kinases that impose this inhibition must be inactivated,
TF↑
P-CDKM and the phosphatases that remove this inhibition must
ULE↓ CDKM
CKI↓ be activated, thereby allowing the cell to transit C
M / G1 abruptly into mitosis. At the M/G1 transition, these
Exit two switches must be reversed as well.
G1 / S
Start The scenario just described is a generic account of
TF↓ molecular events at the three basic transitions of the
ULE↑
cell cycle. In any specific type of cell (e.g., budding
CKI↑
yeast, fission yeast, mammalian cell) there may be
G1 phase variations on this general scheme, but most cell types
Low CDK activity
studied in depth show evidence of CDK-governing
toggle switches that are flipped on and off at alternat-
Cell Cycle Dynamics, Irreversibility, Fig. 1 Three check- ing transitions of the cell cycle.
points/irreversible transitions in the eukaryotic cell cycle. G1 The greatest exception to this rule is the newly
phase: low CDK activity. S-G2 phase: CDK-cyclinS activity is fertilized egg, a gigantic cell that undergoes rapid
high, CDK-cyclinM activity is low. M phase: CDK-cyclinM
activity is high. At the G1/S transition, cyclin synthesis is turned S-M cycles, lacking gap phases and checkpoints. It is
on, cyclin degradation is turned off, and CDK inhibitors are arguable whether these cell cycles are governed by
destroyed. At the M/G1 transition, these switches are reversed. irreversible toggle switches or a simple periodic oscil-
In S-G2 phase, CDK-cyclinM activity is kept low by phosphor- lator. Nonetheless, it is true that early embryonic cell
ylation of the CDK subunit. At the G2/M transition, this inhib-
itory phosphorylation is removed cycles often make serious mistakes (producing poly-
ploid or aneuploid cells) that lead ultimately to death of
the embryo. These errors seem to be a price that organ-
replication and mitosis and that inhibit the transition isms are willing to pay in order to hurry through the
from metaphase to anaphase-telophase-cytokinesis most vulnerable phase of their life cycle. In this con-
(“exit from mitosis”). G1 phase cells are uncommitted text, these errors reinforce the notion that toggle
to the cell cycle because they lack the necessary CDK switches and irreversible transitions are crucial to
activities. They are devoid of S- and M-phase cyclins maintaining genomic integrity of proliferating eukary-
because the transcription factors that would drive pro- otic cells.
duction of these cyclins are inactive, and the ubiquitin-
ligating enzymes (ULE) that label these cyclins for Bistability and Hysteresis
proteolysis are active. In addition, G1 phase cells From a theoretical perspective, irreversible transitions
contain generous amounts of CDK inhibitors (CKIs) are intimately related to bistability of molecular regu-
that bind to and inhibit CDK-cyclin dimers. latory networks. The article on “cell cycle dynamics,
At the G1/S transition (called “Start” in budding bistability and oscillations” discusses the molecular
yeast and the “Restriction Point” in mammalian cells), mechanisms underlying bistability and hysteresis at
a cell must do three things: (1) turn on the synthesis of the G2/M transition. Here we describe the molecular
S- and M-phase cyclins, (2) turn off their degradation, basis of irreversibility at Start and Exit. Figure 2 illus-
and (3) get rid of the G1-phase CKIs (See Fig. 1). At trates the generic interactions among CDK-cyclin, its
the M/G1 transition (“Exit”), these three switches must transcription factors, its ubiquitin-ligating enzymes,
be reversed. As cells proliferate, a lineage (mother- and its stoichiometric inhibitors. In each case, CDK
daughter-granddaughter) toggles back and forth is involved in a positive feedback interaction with
between states of low CDK activity (G1) and high a “friend” (TF) or a double-negative feedback interac-
CDK activity (S-G2-M). tion with an “enemy” (ULE or CKI). These interac-
Embedded inside this fundamental toggle switch is tions create the possibility of bistability: two stable
a secondary toggle switch governing the transition into steady states (“nodes”) separated by an unstable steady
C 272 Cell Cycle Dynamics, Irreversibility

SK SK

TF TFP ULE ULEP

EP EP
CDK cyclin CDK-cyclinUUU

CDK CDK
EP*
SK
CKI-CDK-cyclin

CKI PCKI PCKIUUU

EP

Cell Cycle Dynamics, Irreversibility, Fig. 2 CDK regulatory CKI and CDK-cyclin. These interactions create a bistable
network for a generic eukaryotic cell. Cyclin synthesis is regu- control system, with a stable steady state of low CDK activity
lated by a transcription factor (TF), and cyclin degradation is (G1 phase) and an alternative stable steady state of high CDK
regulated by a ubiquitin-ligating enzyme (ULE). CDK-cyclin activity (S-G2-M phase); see Figs. 1 and 3. Starter kinase activ-
dimers can also be inhibited by binding to an inhibitor protein ity (SK) tips the control system toward the high CDK state, and
(CKI). CKI stability is controlled by phosphorylation: PCKI is exit phosphatase activity (EP) tips the system toward the low
a good substrate for a different ubiquitin-ligating enzyme, which CDK state. SK and EP are related to CDK by negative feedback
labels CKI for degradation by proteasomes. Because TF is acti- loops. SK upregulates CDK-cyclin by flipping the switch to the
vated by phosphorylation by CDK-cyclin, these two components high CDK state, but high activity of CDK-cyclin downregulates
are involved in a positive feedback loop. Because ULE is the synthesis of SK. On the other hand, high activity of CDK-
inactivated by phosphorylation by CDK-cyclin, these two com- cyclin activates EP, which flips the switch to the low CDK state
ponents are involved in a double-negative feedback loop, as are and consequently EP is inactivated

state (a “saddle point”). The stable steady states are TF toward its “off” state, allowing the bistable switch
characterized by either low or high CDK activity to flip off. The M/G1 transition is irreversible because,
(Fig. 3). The CDK control system can persist in either even though EP activity disappears in early G1, the
of these stable steady states, corresponding to G1 or control system remains in the stable G1 steady state
S-G2-M phases of the cell cycle, respectively. (Tyson and Novak 2008).
For a G1 cell to commit to a new round of DNA
replication and division, it must be promoted from the Irreversibility and Proteolysis
G1 steady state to the S-G2-M steady state. This is the For a time it was fashionable among molecular biolo-
job, generically speaking, of a “starter kinase” (SK), gists to attribute irreversibility of cell cycle transitions
typically a special cyclin-CDK pair (Cln3-Cdc28 in to the proteolysis of key cell cycle regulators at each
budding yeast, cyclin D-Cdk4/6 in mammalian cells). transition. Not only are cyclins dramatically degraded
Transient activation of SK can induce a transition from at the M/G1 transition (Minshull et al. 1989), but also
G1 to S-G2-M. Even if SK activity disappears after the CKIs are degraded at the G1/S transition (Schwob et al.
transition, the control system will remain in the stable 1994) and CDK-inactivating kinases are degraded at
steady state of high CDK activity. In this sense, the the G2/M transition (Michael and Newport 1998). As
G1/S transition is irreversible. To exit mitosis and simple and appealing as this idea might be, it is clearly
return to G1, the cell must transit from the high- to insufficient to explain the irreversibility/directionality
the low CDK steady state. This is the job of generic of cell cycle progression.
“exit phosphatases” (EP) that are activated during ana- First of all, proteolysis is not irreversible in a kinetic
phase and telophase of the cell cycle. These phospha- sense; its opposing process is protein synthesis. In
tases push ULE and CKI toward their “on” states and general, the rates of protein synthesis and degradation
Cell Cycle Dynamics, Irreversibility 273 C
CDK In conclusion, the irreversibility of cell cycle tran-
S / G2 / M sitions, which is crucial to maintaining the genomic
integrity of proliferating cells, is a systems-level
property of the molecular networks that regulate
CDK activity in eukaryotic cells. Positive feedback
loops in the control system generate multiple
stable steady states (G1, S-G2 and M states), and C
abrupt, irreversible transitions between these stable
G1 steady states are induced by pro-proliferative signals
(growth factors, size increase, successful completion
EP SK of prior cell cycle events) and inhibited by checkpoint
signals (indicators of potentially fatal mistakes in
Cell Cycle Dynamics, Irreversibility, Fig. 3 Bistability, irre-
versibility, and hysteresis. The steady states of the control system the genome replication process). Irreversibility is
in Fig. 2 are related to SK and EP activities by this bifurcation not a property of any single gene, protein, or reaction
diagram. When SK ¼ EP ¼ 0, the control system has two stable but rather an emergent property of systems-level
steady states (black circles) separated by an unstable steady state
interactions.
(white circle). The cell can be in either G1 phase or S-G2-M
phase, depending on its recent history. A newborn cell, by
definition, is in G1 phase. For it to start a new round of DNA
synthesis and degradation, a starter kinase (SK) must be acti- Cross-References
vated. Activation of SK is controlled by checkpoint signals, such
as growth factors, DNA damage sensors, size sensors, etc. Once
SK flips the switch into the CDK-active state, it is no longer ▶ Cell Cycle Dynamics, Bistability and Oscillations
needed: SK activity can drop to 0, and the control system remains ▶ Cell Cycle Model Analysis, Bifurcation Theory
in the CDK-active state. For the cell to exit mitosis, undergo ▶ Cell Cycle Modeling, Differential Equation
cytokinesis, and return to G1 phase, an exit phosphatase (EP)
▶ Cell Cycle of Early Frog Embryos
must be activated. EP activation depends on other checkpoint
signals, such as complete replication of DNA and successful ▶ Cell Cycle, Budding Yeast
alignment of all replicated chromosomes on the mitotic spindle ▶ Cell Cycle, Fission Yeast

will balance each other at a unique, stable steady state


of protein abundance. To upset this balance, it is nec-
References
essary to switch the rates of protein synthesis and Michael WM, Newport J (1998) Coupling of mitosis to the
degradation between low and high values, and such completion of S phase through Cdc34-mediated degradation
switching behavior is a property of feedback interac- of Wee1. Science 282:1886–1889
tions in a regulatory network. Minshull J, Pines J, Golsteyn R, Standart N, Mackie S, Colman A,
Blow J, Ruderman JV, Wu M, Hunt T (1989) The role of
In addition, it is possible, by mutations and/or drug cyclin synthesis, modification and destruction in the control
treatments, to eliminate proteolysis at any one of the of cell division. J Cell Sci Suppl 12:77–97
three cell cycle transitions without compromising its Morgan DO (2007) The cell cycle: Principles of control. New
irreversibility. For example, ckiD mutants in budding Science, London
Murray AW (1992) Creative blocks: cell-cycle checkpoints and
yeast proceed irreversibly through the G1/S transition feedback controls. Nature 359:599–604
without proteolysis of any of the remaining regulatory Novak B, Tyson JJ, Gyorffy B, Csikasz-Nagy A (2007) Irrevers-
proteins (TF and ULE). Proteasome-inhibited mam- ible cell-cycle transitions are due to systems-level feedback.
malian cells, blocked in mitosis by nocodazole, can Nat Cell Biol 9:724–728
Potapova TA, Daum JR, Pittman BD, Hudson JR, Jones TN,
be induced to exit mitosis irreversibly by transient Satinover DL, Stukenberg PT, Gorbsky GJ (2006) The
inhibition of CDK activity by flavopiridol (Potapova reversibility of mitotic exit in vertebrate cells. Nature
et al. 2006). Frog egg extracts, without either cyclin 440:954–958
synthesis or degradation, can be induced to enter or Schwob E, Bohm T, Mendenhall MD, Nasmyth K (1994) The
B-type cyclin kinase inhibitor p40SIC1 controls the G1 to S
exit mitosis simply by manipulating the activities of transition in S. cerevisiae. Cell 79:233–244
the CDK-inhibiting kinase and/or the CDK-activating Tyson JJ, Novak B (2008) Temporal organization of the cell
phosphatase. cycle. Curr Biol 18:R759–R768
C 274 Cell Cycle Kinases

Characteristics
Cell Cycle Kinases
Historical Background
▶ Mitotic Kinases Since the days of Isaac Newton, ordinary differential
equations (ODEs) have been used throughout the
physical and life sciences to describe the temporal
development of dynamical systems: from the solar
Cell Cycle Model Analysis, Bifurcation system, to the clock radio, to the regulation of DNA
Theory replication and cell division. Initially the focus was on
ODEs that could be solved exactly in terms of
John J. Tyson “elementary” functions of high school algebra and
Department of Biological Sciences, Virginia trigonometry or “special” functions of mathematical
Polytechnic Institute and State University, physics. But in the 1890s Poincaré (1899) introduced
Blacksburg, VA, USA the “qualitative” theory of dynamical systems, i.e.,
systems of n nonlinear ODEs

Synonyms dx
¼ fðx; pÞ; xðtÞ ¼ fx1 ; :::; xn g ¼ Variables;
dt (1)
Nonlinear dynamical systems theory
p ¼ fp1 ; :::; pm g ¼ Parameters

Definition Poincare proposed to interpret these equations as


a vector field in n-dimensional state space, x, and to
Bifurcation theory provides a classification of the characterize this vector field by its invariant solutions,
expected ways in which the number and/or stability which can be either attractors or repellers. The crucial
of invariant solutions (“attractors” and “repellers”) of question for Poincare was not “what is the exact solu-
nonlinear ordinary differential equations may change tion of the ODE?” but “how do the qualitative features
as parameter values are changed. The most common of the attractors and repellers depend on the values of
qualitative changes are “saddle-node” bifurcations, the parameters?” This latter question is the subject of
“Hopf ” bifurcations, and “SNIPER” bifurcations. At bifurcation theory (Odell 1980; Strogatz 1994), which
a saddle-node bifurcation, a pair of steady states, was developed in the mid-twentieth century by
usually a stable node and an unstable saddle point, Andronov’s school of Russian physicists and engineers
coalesce and disappear. At a Hopf bifurcation, for n ¼ 2 (Andronov et al. 1966), and later by a host of
a stable focus changes to an unstable focus and mathematicians for the general case (Kuznetsov 2004).
makes way for a small amplitude periodic solution A one-parameter bifurcation diagram begins with
(“limit cycle”). At a SNIPER bifurcation, the coales- a plot of the steady-state value of a chosen dynamical
cence of a saddle point and a stable node creates an variable, xi, as a function of a chosen parameter, pj, the
infinite-period limit-cycle solution. These bifurca- “bifurcation parameter.” In Fig. 1a, we plot a typical
tions have clear physiological correlates in the regu- bifurcation diagram for a bistable system. Between the
lation of DNA replication, mitosis, and cell division. two thresholds, yinact < pj < yact, the system can persist
Saddle-node bifurcations are related to checkpoints in in either of two stable steady states (xi small or xi
the cell cycle: The establishment and removal of large). Precisely at the thresholds, pj ¼ yinact and
checkpoints correspond to the creation and annihila- pj ¼ yact, the dynamical system undergoes
tion of stable steady states at saddle-node bifurca- a bifurcation from one type of behavior (a single stable
tions. The repetitive nature of the cell cycle (G1-S- steady state) to a qualitatively different type of behav-
G2-M-G1-etc.) is related to limit-cycle solutions of ior (bistability). This type of bifurcation is called
the underlying kinetic equations: The ability to oscil- a “saddle-node” or “fold.” In Fig. 1b, we illustrate
late spontaneously arises at either a Hopf or a “Hopf” bifurcation, in which a stable steady state
a SNIPER bifurcation. loses stability and gives rise to stable limit-cycle
Cell Cycle Model Analysis, Bifurcation Theory 275 C
a b and (perhaps) for good reasons. First of all, morpho-
genesis is governed, we know, by the interactions of
SN genes and proteins (i.e., a biochemical interaction net-
xi xi Hopf work), which is not a gradient dynamical system.
SN Hence, the bifurcations of relevance to molecular cell
biologists are not the singularities of potential func-
tions (Thom’s case) but rather the generic bifurcations C
qinact qact
of nonlinear ODEs (the case of Andronov et al.). But
pj
pj Thom’s more fundamental idea (stripped of its unfor-
tunate alliance to Waddington’s energy landscape) –
Cell Cycle Model Analysis, Bifurcation Theory, Fig. 1 One- that qualitative changes in cell physiology should be
parameter bifurcation diagrams. (a) Saddle-node bifurcation.
Solid line: stable steady state; dashed line: unstable steady correlated with qualitative changes in the attractors
state. (b) Hopf bifurcation. Thin solid line: stable steady state; and repellers of a vector field (a system of nonlinear
thin dashed line: unstable steady state; thick solid line: maximum ODEs) – is absolutely correct. It is the basis of the
and minimum values attained by a stable limit-cycle oscillation application of bifurcation theory to problems in molec-
ular cell biology.
oscillations. The limit cycles are born with small
amplitude and grow in size as the parameter value One-Parameter Bifurcation Diagrams and Signal-
pulls away from the bifurcation point. We shall meet Response Curves
some other types of bifurcations shortly. The connection between bifurcation theory and cell
Of special interest to systems biologists are the physiology is the signal-response curve. In a typical
musings of Rene Thom (1989) on “structural stability experiment, a molecular cell biologist might challenge
and morphogenesis.” Thom was highly regarded cells with increasing amounts of an extracellular signal
among mathematicians for his study of gradient molecule and measure whether certain downstream
dynamical systems, genes are expressed or not. And a typical result is
that, for low signal levels there is no expression, but
dx for signal levels above a certain threshold there is
¼ =Uðx1 ; :::; xn Þ (2) strong expression of the gene (Fig. 2). In this circum-
dt
stance, it is natural to ask what happens if the signal
where U(x) is a scalar function of the variables (think level is steadily decreased in cells that are expressing
of it as the “potential energy” of the system) and protein R? Do they turn off at the same signal strength
= ¼ ð@=@x1 ; :::; @=@xn Þ is the gradient operator. The where they turned on? Or at a much lower signal
steady-state solutions of Eq. 2 are places where strength? Or not at all?
=U ¼ 0, i.e., “singularities” of the potential function. In the first case, the signal-response curve is
Thom’s great contribution was to classify the topolog- perfectly smooth and reversible; there are no qualita-
ically distinct types of singularities of potential func- tive changes in the behavior of the control system as
tions in n dimensions. Next, Thom took the unusual the signal varies up and down. In the second case,
step – unusual for a famous mathematician – to spec- there is a region of bistability between the two
ulate that the bifurcations he had characterized might thresholds, and the behavior of the control system is
underlie the “unfolding” of a fertilized egg into a larva. qualitatively different over three ranges of signal
Following Waddington’s hypothesis that embryonic strength: for S < yinact there is a single stable steady
development is the evolution of a dynamical system state with R small; for yinact < S < yact the control
on an “energy landscape,” Thom pointed out that his system can persist in either of two attractors (R small
complete classification of the qualitative changes of or R large) that are separated by an unstable steady
behavior that could be observed under this type of state; and for S > yact there is a single stable
gradient dynamic must provide the key to understand- steady state with R large. This is exactly the case of
ing morphogenetic transitions. a one-parameter bifurcation diagram with saddle-
Thom’s ideas were bitterly opposed by both theo- node bifurcations bounding a zone of bistability
retical and experimental biologists of his generation, (Fig. 1a).
C 276 Cell Cycle Model Analysis, Bifurcation Theory

Cell Cycle Model Analysis, Bifurcation Theory,


Table 1 Generic bifurcations of dynamical systems
(c) Name Characteristics Cell cycle correlate
Saddle-node Creation and Irreversible transitions;
(b) annihilation of pairs of checkpoints
steady states
Hopf, Birth of stable limit Spontaneous MPF
(a) supercritical cycles of small oscillations in embryos
R
amplitude and finite
frequency
Hopf, Birth of unstable limit Subcritical Hopf and
subcritical cycles of small cyclic fold bifurcations
amplitude and finite occur in pairs and may
frequency correlate with
Cyclic fold Creation and embryonic MPF
annihilation of pairs of oscillations
limit cycles
qinact qact
Saddle-loop Annihilation of a limit SL and SNIPER
S
cycle by a homoclinic bifurcations are closely
orbit at a saddle point; related; they are
Cell Cycle Model Analysis, Bifurcation Theory,
finite amplitude and involved in irreversible
Fig. 2 Signal-response curve. The experimentalist can vary
small frequency transitions in the yeast
the signal strength S (say, the concentration of an extracellular
ligand) and observe the response R (say, the expression level of SNIPER1 or Annihilation of a limit cell cycle (budding
a gene induced by S). As S is slowly increased, the expression of SNIC2 cycle by a homoclinic yeast and fission yeast)
R turns on abruptly; a typical threshold-type response. What (synonomous) orbit at a saddle-node;
happens as S is slowly decreased? There are three possibilities. finite amplitude and
(a) The gene expression turns on and off at the same threshold small frequency
signal strength: the signal-response curve is smooth and revers- SNIPER ¼ saddle-node infinite-period
1

ible; e.g., a Hill function. (b) The threshold for gene inactivation SNIC ¼ saddle-node invariant-circle
2

(yinact) is lower than the threshold for gene activation (yact): The
signal-response curve has a region of bistability and functions
like a toggle switch. (c) The gene cannot be inactivated by
lowering the signal strength even to zero: The control system
functions as a one-way switch These examples suggest that the signal-response
curves often measured by cell physiologists are none
other than one-parameter bifurcation diagrams in the
It is possible that yinact < 0, in which case the signal, parlance of applied mathematicians. If we may associ-
S ¼ [S] (a positive number), cannot be made small ate abrupt, qualitative changes in signal-response char-
enough to flip the switch off. In this case, the control acteristics of living cells with bifurcations in vector
system is said to be a one-way switch. By increasing S, fields of nonlinear dynamical systems, then it is natural
the switch can be turned on, but it cannot be turned off to ask how many different types of generic bifurcations
by decreasing S. are exhibited by dynamical systems and what do they
There are many convincing examples of toggle look like? Are there hundreds of different types of
switches in molecular and cell biology generally bifurcations to match the seemingly boundless variety
(Tyson et al. 2003) and in cell cycle regulation partic- of cellular behaviors? Or are all the peculiarities of
ularly (▶ Cell Cycle Dynamics, Bistability and cellular signal processing simply variations on a few
Oscillations). common themes? (Table1)
In another common physiological situation, The answer is the latter. In addition to the saddle-
a cellular process begins to oscillate when node and Hopf bifurcations illustrated in Fig. 1, there
a stimulating signal gets large enough (Goldbeter are only a few other common, generic, one-parameter
1996). In this case, the signal-response curve exhibits bifurcations: subcritical Hopf, cyclic fold, saddle-loop,
a Hopf bifurcation, as in Fig. 1b. and SNIPER bifurcations (Fig. 3).
Cell Cycle Model Analysis, Bifurcation Theory 277 C
a b Hopf
c Hopf

Subcritical
xi Hopf xi
CF SNIPER
xi
SN
C
SL

pj pj pj

Cell Cycle Model Analysis, Bifurcation Theory, Fig. 3 The other common types of bifurcation points. (a) Subcritical Hopf
bifurcation and cyclic fold (CF) bifurcation. (b) Saddle-loop (SL) bifurcation. (c) Saddle-node infinite-period (SNIPER) bifurcation

Of this we can be certain: Cell physiology is Relation to Cell Cycle Regulation


governed by underlying regulatory networks that con- Bifurcation theory has been used to study the molec-
sist of biochemical reactions among genes, RNAs, ular basis of cell cycle regulation (e.g., Borisuk and
and proteins. These networks are dynamical systems; Tyson 1998; Tyson et al. 2003; Csikász-Nagy et al.
their dynamics are governed by nonlinear ODEs 2006). The basic idea behind these papers is that
(biochemical kinetic equations). The solutions of a eukaryotic cell progresses through the DNA repli-
these equations determine the time-dependent behav- cation–division cycle by a series of transitions (G1/S,
ior of the cell, and the nature of these solutions are G2/M, M/G1) that correspond to bifurcations of
determined by the attractors and repellers of the the underlying molecular control system. Before
dynamical vector field in state space. Qualitative each transition, the cell is arrested in a stable steady
changes in the behavior of cells must be reflections state of the dynamical system that corresponds to
of qualitative changes in the nature of these attractors a particular physiological state: G1-arrest, G2-arrest,
and repellers, i.e., on the generic bifurcations of or metaphase-arrest. To pass to the next stage of
nonlinear vector fields. Hence, the six types of bifur- the cell cycle, the stable arrested state must be
cations we have introduced must be the basic building lifted, either by annihilation (at a SN or SNIPER
blocks of all cellular signal-response curves. It is this bifurcation) or by losing stability (at a Hopf bifurca-
connection between signal-response curves of living tion). To prevent a transition, e.g., if there is some
cells and one-parameter bifurcation diagrams of damage to the DNA or some problem in aligning
dynamical systems that is the heritage of Rene chromosomes on the metaphase plate, then a “check-
Thom’s proposal. point” mechanism stabilizes the arrested state
An important caveat to this interpretation of bifur- by moving the bifurcation point to some higher
cation theory is the fact that single cells are very small, value of the progression signal(s). For example,
with limited numbers of molecules (10s, 100s, 1,000s) a schematic diagram of the fission yeast cell cycle is
of each of the interacting species. Hence, continuous provided in Fig. 4.
ODEs are only a first approximation to the dynamics of During early embryogenesis, from fertilization to
intracellular molecular control systems. The effects of the mid-blastula transition, mitotic cycles proceed rap-
stochastic variations of small numbers of molecules idly and synchronously, without checkpoint controls.
can have significant effects on the qualitative features In this case, the DNA replication–division cycles seem
of dynamical systems. Stochastic effects must be given to be driven by spontaneous limit-cycle oscillations.
due consideration, but subsequent to a thorough study For more details, see the entry ▶ Cell Cycle Dynamics,
of the system by bifurcation theory. Bistability and Oscillations.
C 278 Cell Cycle Modeling Using Logical Rules

References

Andronov AA, Vitt AA, Khaikin SE (1966) Theory of oscilla-


tors. Pergamon, London (first published in Russian in1937)
Borisuk MT, Tyson JJ (1998) Bifurcation analysis of a model of
CDK activity

mitotic control in frog eggs. J Theor Biol 195:69–85


Csikász-Nagy A, Battogtokh D, Chen KC, Novák B, Tyson JJ
(2006) Analysis of a generic model of eukaryotic cell-cycle
regulation. Biophys J 90:4361–4379
Goldbeter A (1996) Biochemical oscillations and cellular
rhythms. Cambridge University Press, Cambridge
Kuznetsov YA (2004) Elements of applied bifurcation theory,
Thirdth edn. Springer, New York
Odell GM (1980) Qualitative theory of systems of ordinary
differential equations, including phase plane analysis and
1 2 the use of the Hopf bifurcation theorem. In: Segel LA (ed)
Mathematical models in molecular and cell biology.
Scrit Cell size
Cambridge University Press, Cambridge
Poincaré HJ (1899) Les méthodes nouvelles de la mécanique
Cell Cycle Model Analysis, Bifurcation Theory, céleste, vol 1–3. Gauthiers-Villars, Paris (English translation
Fig. 4 A schematic diagram of the fission yeast cell cycle. by D. Goroff, 1993, American Institute of Physics, New
The bifurcation diagram in Fig. 3c is interpreted here as a sig- York)
nal-response curve relating cyclin-dependent kinase (CDK) Strogatz SH (1994) Nonlinear dynamics and chaos. Addison-
activity to cell growth. CDK is a protein kinase that governs Wesley, Reading
progression through the cell cycle. In fission yeast, low CDK Thom R (1989) Structural stability and morphogenesis.
activity corresponds to a G1/S/G2 state and high activity to Addison-Wesley, Reading (first published in French in1972)
mitosis. Cell size can be thought of as a bifurcation parameter: Tyson JJ, Chen KC, Novak B (2003) Sniffers, buzzers, toggles
Cell size increases slowly as the cell grows, and the CDK control and blinkers: dynamics of regulatory and signaling pathways
adapts quickly to an attractor of the vector field at the current size in the cell. Curr Opin Cell Biol 15:221–231
of the cell. The red curve is a “cell cycle trajectory.” At the size
of a newborn cell (size ¼ 1), the only attractor is a stable steady
state of low CDK activity. After a brief G1 period, the cell
replicates its DNA and then pauses in G2 phase until it grows
large enough to surpass the SNIPER bifurcation. The bifurcation
point is the “critical size” for the G2/M transition in fission yeast. Cell Cycle Modeling Using Logical Rules
The dynamical system is attracted to a large amplitude limit
cycle, which carries the cell into mitosis (CDK increasing). Adrien Fauré1 and Denis Thieffry2
The cell exits mitosis when CDK activity is destroyed, and this 1
Graduate School of Science and Engineering,
is the signal for the cell to divide. Cell size is abruptly halved,
and the newborn cells (each of size ¼ 1) are attracted to the stable Yamaguchi Univesity, Yamaguchi, Japan
2
G1/S/G2 steady state. Notice that the cell cycle time (the time IBENS - UMR ENS - CNRS 8197 - INSERM 1024,
required to progress around the red loop) is identical to the mass- Paris, France
doubling time (the time necessary to grow from birth size ¼ 1 to
division size ¼ 2)

Synonyms

Cross-References Boolean modeling of cell cycle control; Logical


modeling of cell cycle control
▶ Cell Cycle Dynamics, Bistability and Oscillations
▶ Cell Cycle Dynamics, Irreversibility
▶ Cell Cycle Modeling, Differential Equation Definition
▶ Cell Cycle Models, Sensitivity Analysis
▶ Cell Cycle of Early Frog Embryos This essay provides a short overview of recent appli-
▶ Cell Cycle, Budding Yeast cations of Boolean and multilevel logical approaches
▶ Cell Cycle, Fission Yeast to cell cycle modeling.
Cell Cycle Modeling Using Logical Rules 279 C
Characteristics mutant phenotypes demonstrates that the dynamics of
the cell cycle can be captured independently of quan-
Introduction titative assumptions on kinetic parameters (Bornholdt
Extensively studied, the intricate regulatory circuits 2008; Fauré and Thieffry 2009).
controlling cell cycle (▶ Cell Cycle) in Eukaryotes In what follows, we start by illustrating the logical
constitute a daunting challenge and a recurrent refer- approach through its application to the definition and
ence in systems biology. Although quantitative data the analysis of a simplified model of the regulatory C
may be sometimes more easily obtained at the system network controlling the cell cycle in budding yeast.
level (as it is the case with cell cycle time length or This reference model is used to emphasise variations in
mass at division), cell cycle progression is essentially the definition of logical rules or yet regarding the use of
described as a succession of qualitatively different different updating assumptions. The following section
phases. The phases G1, S, G2, M, and G0 (and sub- introduces more recent, prospective studies devoted to
phases) can be characterized by the qualitative level of mammalian cell cycle control. The essay ends by
key markers, most particularly the different cyclins a brief outlook section.
(▶ Cyclins and Cyclin-Dependent Kinases). Further-
more, mutant phenotypes observations are often Boolean Modeling of the Molecular Network
limited to qualitative assessment of viability or arrest Controlling Budding Yeast Cell Cycle
in a specific phase. The focus of published studies ranges from molecular
Consequently, the logical formalism (▶ Logical regulatory networks to phenomenological models, the
Model) has been recurrently used to build qualitative different scales being sometimes integrated in the
dynamical models of the regulatory networks control- same network. For example, some models explicitly
ling cell cycle in different organisms. Notably, as bud- incorporate variables representing cell cycle phases
ding yeast (▶ Cell Cycle, Budding Yeast) and fission (Irons 2009), while others implicitly deduce them
yeast (▶ Cell Cycle, Fission Yeast) have served as from the level of key variables (Li et al. 2004; Fauré
model organisms for pioneering experimental studies et al. 2009).
of cell cycle control, they constitute reference systems Accordingly, as discussed by B€ahler and Svetina
for many modeling studies. (2005), there is no direct relationship between regula-
Phenomenological models can already bring inter- tory circuits and the molecular processes they repre-
esting insight into biological mechanisms. For exam- sent. The same type of regulatory arc may be used to
ple, the Boolean model published by B€ahler and represent different processes, such as transcriptional
Svetina (2005) focuses on the different growth modes activation of a gene, or post-translational modification
displayed by the fission yeast. Mutants are then defined of a protein. Conversely, similar processes may be
as deletions of specific arrows in the model and com- represented differently, in particular those involving
pared to known mutations that affect growth modes. the representation of mass flow, such as multiproteic
Such phenomenological understanding hints toward complex formation (Fauré and Thieffry 2009).
the underlying molecular mechanism. Regarding the definition of the logical rules asso-
In the recent years, several groups have focused on ciated with each regulatory component, two main
the development of predictive logical models for the approaches are used. A first approach consists in
molecular regulatory networks controlling cell cycle in simply summing the positive and negative influences
budding yeast and fission yeast (Li et al. 2004; on each node (sometimes considering different
Bornholdt 2008; Irons 2009; Fauré et al. 2009). Several weights for different edges of the network);
of these models have been used to assess novel simu- depending on whether this sum lies below, above, or
lation methods or model-building strategies, or yet to at a given threshold, the value of the variable will tend
address theoretical questions regarding modularity to decrease, increase, or remain unchanged, respec-
(Irons 2009; Fauré et al. 2009) or dynamical robustness tively. This formalism was used in the seminal Bool-
(Mangla et al. 2010). Globally, the success of logical ean model shown in Fig. 1, left. Using a generic
approaches at reproducing complex wild-type and summation rule with a threshold set to zero, the
C 280 Cell Cycle Modeling Using Logical Rules

a b
Cln3 Cln3

SBF MBF SBF MBF

Cln1_2 Sic1 Cln1_2 Sic1

Clb5_6 Clb5_6

Clb1_2 Clb1_2

Cdh1 Cdh1
Mcm1_SFF Mcm1_SFF

Cdc20_Cdc14 Swi5 Cdc20_Cdc14 Swi5

Cell Cycle Modeling Using Logical Rules, Fig. 1 Boolean regulators. Right: Boolean model adapted from the previous one
models of the Budding yeast cell cycle. Left: Original Boolean using specific logical formulae (see Table 1). Both graphs have
model presented in (Li et al. 2004). This model was simulated been drawn using GINsim (Naldi et al. 2009). Green normal
synchronously (see Fig. 2) using a generic summation rule. In arrows stand for activation, red T-arrows stand for inhibition,
this framework, self-inhibiting arrows had to be introduced to yellow T-arrows represent the “self-degradation” loops intro-
account for degradation of components that only have positive duced in (Li et al. 2004)

authors had to include “self-degradation” loops logical operators. In the absence of Cln3, this model
(in yellow in Fig. 1, left), which do not represent has a unique stable state corresponding to the G0
true auto-inhibitory processes. Using a synchronous phase. Starting with initial conditions enabling the
(deterministic) updating strategy, they identified initiation of the cell cycle (with Cln3 ON), this sim-
seven stable states as the sole attractors of the system plified model yields a cyclic attractor, which recapit-
and further established that the large majority of ulates the sequence of events along the cell cycle (see
Boolean states lead to the same stable state, Fig. 2, right).
which corresponds to the G1 phase. Furthermore, Using this second approach, several groups have
starting with initial conditions matching the entry designed more sophisticate models encompassing
into cell cycle, as shown in Fig. 2 (left), the system additional regulatory components (e.g., checkpoints
follows a trajectory consistent with the succession (▶ Cell Cycle Checkpoints)) and recapitulating
of molecular events and phases along budding yeast the behavior of the systems following various
cell cycle. kinds of perturbations (e.g., single or multiple loss-
A second approach consists in defining of-function mutations) (Irons 2009; Fauré et al.
a specific logical rule for each node in the network. 2009). Interestingly, these studies show that various
Fig. 1 (right) shows a transposition of the model by Li kinetic assumptions lead to similar consistent
et al. (2004) into a logical regulatory graph, while dynamical behaviors, indicating that the control net-
Table 1 lists the logical rules associated with each work is sufficiently robust to cope with stochastic
component of the network. Notice that the network is fluctuations of synthesis/degradation rates. The
somewhat simplified (elimination of the auto- multi-level model presented by Fauré et al. is argu-
inhibitory loops), while we have now different types ably the most comprehensive logical cell cycle
of regulatory rules combining NOT, AND, and OR model to date.
Cell Cycle Modeling Using Logical Rules 281 C
Cell Cycle Modeling Using

Cdc20_Cdc14

Cdc20_Cdc14
Logical Rules, Fig. 2 Left:

Mcm1_SFF

Mcm1_SFF
simulation of the original
model by Li et al. (2004) (cf.

Cln1_2

Clb5_6

Clb1_2

Cln1_2

Clb5_6

Clb1_2
Phase

Phase
Cdh1

Cdh1
Figure 1, left). Starting from an

Swi5

Swi5
MBF

MBF
Step

Step
Cln3

Cln3
SBF

Sic1

SBF

Sic1
initial state corresponding to
G1, simulation yields
a sequence of states that 1 G1
follows the normal course of
1 START C
the cell cycle through the 2 G1
2 G1
different phases G1, S, G2, and
M, until reaching a stable G1 3 S
state. The initial state 3 G1
corresponds to an “excited”
4 G1 4 G2
G1 stable state, with the Cln3
variable set to 1. Right:
simulation of the revised 5 S 5 M
model, with the input Cln3
always ON (cf. Figure 1, 6 G2 6 M
right). Black cells denote
activity of the corresponding 7 M 7 G1
components, white cells
denote inactivity 1 G1
8 M

9 M

10 M

11 M

12 M

13 Stationary G1

Cell Cycle Modeling Using Logical Rules, Table 1 Specific Toward the Modeling of Mammalian Cell Cycle
logical rules associated with the revised version of the model by Networks
Li et al. (2004) presented in Figure 1 (right). “&”, “|”, and “!” Proper modeling of the regulatory network controlling
stand for the Boolean operators AND, OR, and NOT,
respectively mammalian cell cycle remains a daunting challenge.
Most published models concentrate on the control of
Components Logical rules
the G1-S transition (see Fauré et al. 2006, for a generic
MBF Cln3 & !Clb1_2
Boolean model). Various groups are currently devel-
SBF Cln3 & !Clb1_2
oping models of human cell cycle in connection with
Cln1_2 SBF
Cdh1 !(Cln1_2 | Clb5_6 | Clb1_2) | Cdc20_Cdc14
cancer studies, with the aim of identifying intervention
Swi5 ((Mcm1_SFF | Cdc20_Cdc14) & !Clb1_2) | points to block uncontrolled proliferation. One diffi-
(Mcm1_SFF & Cdc20_Cdc14 & Clb1_2) culty is that the actors to consider depend to some
Cdc20_Cdc14 Clb1_2 | Mcm1_SFF extent on the cellular context (cell type, environmental
Clb5_6 (MBF | !Sic1) & !Cdc20_Cdc14 signals, presence of mutations).
Sic1 (Swi5 & Cdc20_Cdc14) | For example, Sahin et al. developed a Boolean
!(Cln1_2 | Clb5_6 | Clb1_2) model of the control of the G1-S transition by the
Clb1_2 !(Cdh1 | Sic1 | Cdc20_Cdc14) | EGF pathway to predict potential targets for the treat-
(Clb5_6 & Mcm1_SFF)
ment of trastuzumab-resistant breast cancer (Sahin
Mcm1_SFF Clb1_2 | Clb5_6
et al. 2009). Predictions yielded by loss-of-function
C 282 Cell Cycle Modeling, Differential Equation

simulations were assessed using RNA interference and ▶ Logical Model


proteomic (RPPA) essays. A refined version of the ▶ Monte Carlo Simulation
model was then derived to cope with observed
inconsistencies.
In order to account for the spatial aspects of tumor References
formation, logical models of regulatory networks can
be combined with other formal frameworks to generate Abou-Jaoudé W, Ouattara DA, Kaufman M (2009) From struc-
ture to dynamics: frequency tuning in the p53-mdm2 net-
multiscale models. For example, a model for avascular
work. I. logical approach. J Theor Biol 258(4):561–577
tumor growth has been proposed (Jiang et al. 2005), B€ahler J, Svetina S (2005) A logical circuit for the regulation of
where the upper, extracellular layer consists of fission yeast growth modes. J Theor Biol 237(2):210–218
reaction-diffusion equations for growth and inhibitory Bornholdt S (2008) Boolean network models of cellular regula-
tion: prospects and limitations. J R Soc 5(Suppl 1):S85–S94
factors; the middle, cellular layer is a lattice Monte
Fauré A, Thieffry D (2009) Logical modelling of cell cycle
Carlo model (▶ Monte Carlo Simulation); and the control in eukaryotes: a comparative study. Mol Biosyst
lower layer is a simplified Boolean model of the regu- 5:1569–1581
latory network that controls the G1-S transition simu- Fauré A, Naldi A, Chaouiya C, Thieffry D (2006) Dynamical
analysis of a generic Boolean model for the control of the
lated synchronously. To each updating corresponds
mammalian cell cycle. Bioinformatics 22(14):e124–e131
a fixed amount of time, during which simulation of Fauré A, Naldi A, Lopez F, Chaouiya C, Ciliberto C, Thieffry D
the upper levels proceeds proportionately. This (2009) Modular logical modelling of the budding yeast cell
integration of a simple logical model of the molecular cycle. Mol Biosyst 5:1787–1796
Irons DJ (2009) Logical analysis of the budding yeast cell cycle.
cell cycle network with the other two layers proved
J Theor Biol 257(4):543–559
accurate enough to match observed upper-level tumor Jiang Y, Pjesivac-Grbovic J, Cantrell C, Freyer JP
properties. Furthermore, based on this model, the (2005) A multiscale model for avascular tumor growth.
authors where able to infer that tumor cell quiescence Biophys J 89(6):3884–3894
Li F, Long T, Lu Y, Ouyang Q, Tang C (2004) The yeast cell-
is more likely due to failure to enter the S phase
cycle network is robustly designed. Proc Natl Acad Sci USA
rather than to mechanical constraints exerted by the 101(14):4781–4786
surrounding cells. Mangla K, Dill DL, Horowitz MA (2010) Timing robustness in the
budding and fission yeast cell cycles. PLoS ONE 5(2):e8906
Naldi A, Berenguier D, Fauré A, Lopez F, Thieffry D, Chaouiya
Outlook
C (2009) Logical modelling of regulatory networks with
Arguably, logical approaches are well suited to cope GINsim 2.3. BioSyst 97(2):134–139
with large regulatory networks, in particular when Sahin O, Fr€ ohlich H, L€
obke C, Korf U, Burmester S, Majety M,
most available data are qualitative or semi- Mattern J, Schupp I, Chaouiya C, Thieffry D, Poustka A,
Wiemann S, Beissbarth T, Arlt D (2009) Modeling erbb
quantitative. Furthermore, consistent logical models
receptor-regulated g1/s transition to find novel targets for
can serve as templates to define more quantitative de novo trastuzumab resistance. BMC Syst Biol 3:1
models using more differential or stochastic formal-
isms, as exemplified by the recent modeling study
devoted to DNA repair mechanism by Abou-Jaoudé
et al. (2009). Although logical models have yielded Cell Cycle Modeling, Differential
only limited impact onto experimental studies, this Equation
situation should change in the future as collaborations
between modelers and experimentalists intensify. John J. Tyson
Department of Biological Sciences, Virginia
Polytechnic Institute and State University,
Cross-References Blacksburg, VA, USA

▶ Cell Cycle
▶ Cell Cycle Checkpoints Synonyms
▶ Cell Cycle, Budding Yeast
▶ Cell Cycle, Fission Yeast Kinetic equations; Rate equations; Reaction-diffusion
▶ Cyclins and Cyclin-dependent Kinases equations
Cell Cycle Modeling, Differential Equation 283 C
Definition
F
The physiological properties of cells – their growth, X
division, movement, signaling, metabolism, differentia-
tion, and death – are all controlled by gene-protein G H
regulatory networks of considerable complexity.
F
Thanks to the revolutionary advances of molecular XP C
genetics in the latter part of the twentieth century,
much is known now about the genes and proteins that Cell Cycle Modeling, Differential Equation, Fig. 1 A typical
constitute these networks and about their interactions, as molecular regulatory network. In this diagram, letters denote
chemical species (proteins) and solid arrows represent chemical
well as the meso-scale topology of particular regulatory reactions transforming substrate(s) into product(s). A letter
networks and the global topology of genome-wide sur- sitting next to an arrow denotes the enzyme catalyzing the
veys of gene, mRNA and protein interactions. The anal- reaction. Dashed arrows represent “influences” of one protein
ysis of genome-wide (“omics”) data is still very much on another. Barbed arrows denote “activation” and blunt arrows
denote “inhibition.” The dynamics of this reaction network is
dominated by statistical methods, but at the local and represented by the system of five ODEs in Eq. 2
meso-scale of network complexity it is possible to build
detailed, accurate, and predictive models of the dynam-
ics of network behavior by using differential equations. each of these reactions (e.g., the total concentration of
The applicability of differential equations to model- the enzyme that catalyzes reaction j), and kj ¼ the rate
ing network dynamics in general, and cell cycle regu- constant(s) needed to express the material flux through
lation in particular, is based on the following logic reaction j. Similarly, Ril ¼ rate of the l-th reaction that
(Tyson et al. 2001). A molecular regulatory network removes species i, etc.
can be described, depending on the level of detail Regulatory networks like Fig. 1 are sometimes
available from experimental investigations, by called “wiring” diagrams, in analogy to the schematic
(1) a system of biochemical reactions or (2) an “influ- diagrams of electrical devices. Just like the dynamical
ence” diagram or (3) a hybrid of the two types. These behavior of an electrical device can be predicted (in
types of descriptions are illustrated in Fig. 1. Realistic practice) from Kirchoff’s Laws (ODEs for the voltages
networks are, of course, much more complex than the at various points in the circuit), so the dynamics of
example given. Whether the network is described by a molecular regulatory network can be predicted (in
chemical reactions or influences or a hybrid of the two, principle) from ODEs (Eq. 1). Unfortunately, the anal-
the network diagram is trying to tell us how one bio- ogy to electrical engineering goes no further. For an
chemical species is affecting the rates of production or electrical device, we can obtain a schematic wiring
removal of another species. As such, the network dia- diagram from the manufacturer, as well as
gram can be converted into a set of ordinary differen- a specification of the numerical values of the parame-
tial equations (ODEs), one ODE for each time-varying ters that characterize each component (resistances,
biochemical concentration: capacitances, etc.). For the living cell, we must guess
the wiring diagram, and we must estimate the rate
d½Xi X   constants from the very experiments we are trying to
¼ Pij ½X... ; ½M... ; kj explain. In essence, we must “reverse engineer” the
dt j
X cell’s circuitry by performing well-designed experi-
 Ril ð½X... ; ½M... ; kl Þ (1) ments to probe the input-output (signal-response) char-
l acteristics of the cell under normal and contrived
conditions (including mutations which scramble the
In this equation, [Xi] ¼ concentration of species i, wiring diagram in controlled ways).
Pij ¼ rate of the j-th reaction that produces species i, For the many ways that differential equations have
[X. . .] ¼ concentrations of the time-varying biochem- been used in molecular and cell biology, see the classic
ical species that participate in each of these reactions, books by Edelstein-Keshet (1988), Murray (1989),
[M. . .] ¼ constant concentrations of the time-invariant Goldbeter (1996), Fall et al. (2002), and Keener and
biochemical species (“modifiers”) that participate in Sneyd (2009).
C 284 Cell Cycle Modeling, Differential Equation

Characteristics response to the dynamical variable X. In return, the


dynamical variables X and XP are changing in response
Wiring Diagrams and Rate Equations to F, G, and H according to Eq. 2a, b. It would be
Figure 1, which will serve as our example of modeling impossible to keep track of the implications of all these
by ODEs, can be interpreted as a model of MPF changes in one’s head; it is the job of the ODEs to track
dynamics in a fertilized egg (▶ Cell Cycle Dynamics, the variables for us.
Bistability and Oscillations.) In this example, X ¼
MPF ¼ dimer of Cdk1 and cyclin B, Numerical Simulation of ODEs: Parameter Values
XP ¼ preMPF ¼ phosphorylated (inactive) form of and Initial Conditions
MPF, G ¼ Wee1 ¼ kinase that inactivates MPF, H ¼ Before we can solve the ODEs (Eq. 2) we must specify
Cdc25 ¼ phosphatase that converts preMPF into active numerical values for all the “parameters” (rate con-
MPF, F ¼ APC/Cdc20 ¼ ubiquitin-ligase that labels stants, Michaelis constants, o’s and s’s); see Table 1.
cyclin B for proteolysis. The production and removal Usually, these parameters must be estimated from
of X and XP are described by chemical reactions experimental data, but in this example we assign
(synthesis, degradation, phosphorylation, dephosphor- values to illustrate some interesting and physiologi-
ylation), and kinetic equations for the rates of change cally relevant solutions of the ODEs. In addition to
of X and XP can be written by standard principles of parameter assignments, we must also specify “initial
biochemical kinetics: conditions” (values at t ¼ 0) for the five time-varying
species: X(0), XP (0), F(0), G(0), H(0); see Table 2.
With this information, we can now instruct a com-
dX kg GX kh HXp
¼ ksx  kdx FX  þ puter, using the ODEs (Eq. 2), to take small time steps,
dt Kmg þ X Kmh þ Xp
(2a, b) dt, and update the values of the dynamical variables:
dXp kg GX kh HXp
¼ kdx FXp þ þ
dt Kmg þ X Kmh þ Xp
Xðt þ dtÞ ¼ XðtÞ
kg GXðtÞ kh HXP ðtÞ
In each case, the rate of a reaction is given either by þ ksx  kdx XðtÞ  þ dt
Kmg þ XðtÞ Kmh þ XP ðtÞ
the law of mass action (for synthesis and degradation
reactions) or by the Michaelis-Menten rate law (for the XP ðt þ dtÞ ¼ XP ðtÞ
phosphorylation and dephosphorylation reactions). kg GXðtÞ kh HXP ðtÞ
þ kdx XP ðtÞ þ  dt
(Which rate law we use for an enzyme-catalyzed reac- Kmg þ XðtÞ Kmh þ XP ðtÞ
tion depends on whether the enzyme tends to work in etc:
its “linear” regime or in its “saturated” regime.) Reg-
(3a, b, c, d, e)
ulation of the enzymes, F, G, and H in Fig. 1, is only
specified as “influences”: X “activates” F and H, and
X “inhibits” G. We choose to describe these influences The computer starts at t ¼ 0, with the given initial
by generic ODEs: conditions, computes the instantaneous rates of change
(the functions in [. . .] above), and then uses Eq. 3 to
dF compute the values of the five dynamical variables at
¼ lf ½Cðsf fof0 þ of1 XgÞ  F t ¼ 0 + dt. The computer then repeats the process to get
dt
dG   the values of the dynamical variables at t ¼ 2dt, 3dt,
¼ lg Cðsg fog0 þ og1 XgÞ  G (2c, d, e) etc. For dt small enough, this procedure gives an accu-
dt
dH rate numerical solution of the ODEs. Of course, there
¼ lh ½Cðsh foh0 þ oh1 XgÞ  H are more sophisticated and efficient algorithms for
dt
  solving nonlinear ODEs, but they are all based on the
where CðxÞ ¼ 1 1 þ ex is a “soft Heaviside” func- fundamental procedure just described.
tion; C(x) varies smoothly from 0 for x < < 1 to 0.5 In Fig. 2, we present numerical simulations of
for x ¼ 0 to +1 for x >> 1. According to Eq. 2c, d, e, F, ODEs (Eq. 2) for the parameter values in Table 1,
G, and H are continually changing to keep up with the with modifications given in the figure legend. For the
soft Heaviside functions, which are changing in case in Fig. 2a, the ODEs have a single stable steady
Cell Cycle Modeling, Differential Equation 285 C
Cell Cycle Modeling, Differential Equation, Table 1 Parameter values for the simulations in Fig. 2
kg ¼ kh ¼ 10 Kmg ¼ Kmh ¼ 0.05 lf ¼ lg ¼ lh ¼ 1 of1 ¼ 1
sg ¼ 3 og0 ¼ 0.2 og1 ¼ 0.7 sh ¼ 3 oh0 ¼ 0.2 oh1 ¼ 0.8
Fig. 2a Fig. 2b Fig. 2c Fig. 2a Fig. 2b Fig. 2c
ksx ¼ 0.04 ksx ¼ 0.1 ksx ¼ 0.1 kdx ¼ 1 kdx ¼ 1 kdx ¼ 0.5
sf ¼ 20 sf ¼ 20 sf ¼ 5 of0 ¼ 0.3 of0 ¼ 0.3 of0 ¼ 0.4
C
Cell Cycle Modeling, Differential Equation, Table 2 Initial fitting simulations to experimental data can be a very
conditions for the simulations in Fig. 2 difficult task. It requires careful choice of experimental
X XP F G H conditions and sophisticated methods of searching the
Fig. 2a 0.1149 1.5459 0.0241 0.5887 0.4197 parameter space. A lead-in to this extensive literature
Fig. 2b 0.1036 1.0602 0.0181 0.5961 0.4111 is provided by Apgar et al. (2010).
Fig. 2c(low) 0.1058 0.9649 0.1868 0.5933 0.4143
Fig. 2c(med) 0.1946 0.526 0.318 0.544 0.471 Alternative Modeling Strategies
Fig. 2c(high) 0.3327 0.1472 0.4167 0.4753 0.5495 In this chapter, we have discussed modeling by
nonlinear ODEs, assuming that the cell is a well-
mixed chemical reactor. Surprisingly, in some cases
state solution. In Fig. 2b the steady state solution is this is not a bad approximation. For example, the time
unstable and the system of ODEs exhibits spontaneous it takes for a typical protein to diffuse across a yeast cell
oscillations of all the variables. In Fig. 2c, the system (diameter 5 mm) is only about 10 s, which is very short
exhibits a phenomenon called “bistability,” i.e., two compared to the interdivision time (at least 90 min) of
stable steady states separated by an unstable steady a yeast cell. Hence, for models of the yeast cell cycle,
state. the cytoplasm is essentially well-mixed. Of course, one
might want to distinguish between nuclear and cytoplas-
Analysis of Nonlinear Ordinary Differential mic compartments, but this situation can be handled
Equations with nonlinear ODEs by including fluxes of components
Why does the system of nonlinear ODEs in Eq. 2 show into and out of the nucleus.
the many different sorts of behavior illustrated in In other situations, where the time scale is shorter
Fig. 2? Might the system show other, qualitatively and/or the space scale is larger, one must take into
different sorts of behavior? For what values of the account the coupling of local chemical reactions with
parameters are each of the types of solutions expected? molecular diffusion (and possibly vectorial transport
The answers to these sorts of questions are provided by processes, e.g., along microtubules). In these cases, the
bifurcation theory, which is described in the article correct modeling approach might be partial differential
“cell cycle model analysis, bifurcation theory.” equations.
In some cases, when very little is known about the
Parameter Estimation underlying biochemistry of a control system, systems
If we have experimental measurements of some of the biologists prefer to model the system with a Boolean
dynamical variables at a sequence of time points, under network, an approach described in the article on “cell
a variety of experimental conditions, both natural and cycle modeling, logical rules.”
contrived (e.g., in mutant cells), then it is sometimes
possible to estimate the parameters in a dynamical Software for Dynamic Modeling
model (and to test the adequacy of the wiring diagram) There are several convenient software packages for
by least-squares fitting of numerical simulations of the modeling molecular regulatory networks with differ-
model to the experimental data. For instance, the ential equations:
curves in Fig. 2b look very much like the measure- Copasi www.copasi.org
ments of Pomerening et al. (2005; their Fig. 1V). How- XPP http://www.math.pitt.edu/~bard/xpp/xpp.html
ever, even for a modest network such as our example, Madonna http://www.berkeleymadonna.com/
with five dynamical variables and 18 parameters, download.html
C 286 Cell Cycle Modeling, Differential Equation

1 2
a 1.8 b
0.8 1.6 XT(t)
1.4
0.6 1.2
X(t)

1
0.4 0.8
0.6
0.2 0.4 X(t)
SSS 0.2
0 0
0 1 2 3 4 5 0 5 10 15 20 25 30 35 40 45 50
t t
0.6
c
0.5

0.4
SSS
X(t)

0.3

0.2 USS

0.1 SSS

0
0 2 4 6 8 10
t

Cell Cycle Modeling, Differential Equation, Fig. 2 Repre- returning to the steady state. This behavior is called “excitability.”
sentative simulations of ODEs (Eq. 2). The parameter values and (b) A stable limit cycle oscillation. In addition to X(t) ¼ [MPF],
initial conditions for these simulations are given in Tables 1 and 2. we also plot XT(t) ¼ X(t) + XP(t) ¼ [total cyclin]. The period of
(a) A unique stable steady state (sss). Small perturbations away oscillation is 29 min. Compare to Fig. 1V in Pomerening et al.
from the steady state return immediately. A larger perturbation, (2005). (c) Two coexisting stable steady states separated by an
X(0) ¼ 0.5, exhibits a transient pulse of MPF activity before unstable steady state (uss)

Reaction-diffusion modeling with partial differen- References


tial equations is best done by
Virtual Cell http://www.nrcam.uchc.edu/ Apgar JF, Witmer DK, White FM, Tidor B (2010) Sloppy
models, parameter uncertainty, and the role of experimental
design. Mol Biosyst 6:890–900
Edelstein-Keshet L (1988) Mathematical models in biology.
Random House, New York
Cross-References Fall CP, Marland ES, Wagner JM, Tyson JJ (2002) Computa-
tional cell biology. Springer-Verlag, Berlin
▶ Cell Cycle Dynamics, Bistability and Oscillations Goldbeter A (1996) Biochemical oscillations and cellular
▶ Cell Cycle Dynamics, Irreversibility rhythms. Cambridge University Press, Cambridge
Keener J, Sneyd J (2009) Mathematical physiology, 2nd edn.
▶ Cell Cycle Model Analysis, Bifurcation Theory
Springer-Verlag, Berlin
▶ Cell Cycle Modeling Using Logical Rules Murray JD (1989) Mathematical biology. Springer-Verlag,
▶ Cell Cycle Modeling, Stochastic Methods Berlin
▶ Cell Cycle Models, Sensitivity Analysis Pomerening JR, Kim SY, Ferrell JE Jr (2005) Systems-level
dissection of the cell-cycle oscillator: bypassing positive
▶ Cell Cycle of Early Frog Embryos
feedback produces damped oscillations. Cell 122:565–578
▶ Cell Cycle, Budding Yeast Tyson JJ, Chen K, Novak B (2001) Network dynamics and cell
▶ Cell Cycle, Fission Yeast physiology. Nat Rev Mol Cell Biol 2:908–916
Cell Cycle Modeling, Petri Nets 287 C
molecule Cdh1 and the CycB-Cdk1 dimer. The
Cell Cycle Modeling, Petri Nets modeled reactions are shown in Eqs. 1–5, and include
two possible reversible bindings of Cdh1 with the
Ivan Mura cyclin dimer (Eqs. 1 and 2), two possible degradation
The Microsoft Research - University of Trento Centre ways of the Cdh1 molecule (Eqs. 3 and 4), and the
for Computational and Systems Biology, Trento, Italy CycB-Cdk1 catalysis reaction operated by Cdh1
(Eq. 5): C
Synonyms Cdh1 þ CycB  Cdk1 ! Cdh1  CycB  Cdk1 (1)

Place/transition net CycB  Cdk1 þ Cdh1 ! CycB  Cdk1  Cdh1 (2)

Definition Cdh1  CycB  Cdk1 ! CycB  Cdk1 (3)

Petri Nets are a graphical modeling formalism with


numerous applications to the modeling and analysis CycB  Cdk1  Cdh1 ! CycB  Cdk1 (4)
of systems composed by multiple concurrent pro-
cesses, including the molecular kinetics of cell cycle
Cdh1  CycB  Cdk1 ! Cdh1 (5)
regulation networks. They are named after Carl Adam
Petri, who originally defined them in his dissertation
thesis in 1962 (Petri 1962). Tokens are not represented in the Petri Net model in
Petri Net models consist of four types of elements: Fig. 1, neither are arc cardinalities, which are all
• Places, depicted as hollow circles, represent vari- unitary.
ables of the model. A very good reference textbook to the applications
• Tokens, depicted as black dots, are contained into of Petri Nets for the modeling of living systems is
places and provide the numerical value associated (Koch et al. 2010). We concisely describe in the fol-
to the variable. lowing several features of Petri Nets, providing in
• Transitions, depicted as bars, represent events particular details and references to their applications
affecting the variables. The occurrence of the to the modeling of cell cycle regulation networks.
event associated to a transition is referred as the
firing of the transition.
• Arcs, linking transitions to places and places to Characteristics
transitions (but not places to places nor transitions
to transitions), carry multiplicities defining the Semantics of Transitions Firing
changes on variables as a result of transitions fir- The marking of a Petri Net is the amount of tokens
ings. Incoming arcs to a place add tokens to the contained in the places. It evolves as a consequence of
place, whereas outgoing arcs remove tokens. transition firings, i.e., the occurrence of events
The intuitive graphical formalisms of Petri Nets modeled in a Petri Net. A transition of the model is
coupled with the considerable amount of theoretical said to be enabled if and only if each place connected
results supporting their analysis (see for instance the to the transition by an input arc contains a number of
work of Murata 1989) have favored the spread of this tokens greater or equal to the multiplicity of the arc. An
modeling formalism in various scientific and technical enabled transition may fire. Upon firing, the marking of
fields, among them being Systems Biology. There is each input place is updated by subtracting a number of
quite an immediate mapping between reaction- tokens equal to the arc multiplicity. Simultaneously,
oriented descriptions of biological systems and places, the marking of each place connected to the firing
arcs, and transitions. We show in Fig. 1 a Petri Net transition by an output arc is updated by adding to it
model composed by four places, seven transitions, and a number of tokens equal to the multiplicity of the
18 arcs, encoding the interactions between the APC connecting arc.
C 288 Cell Cycle Modeling, Petri Nets

CycB-Cdk1

td1 tb1 tb2 td2

CycB-Cdkl-Cdhl tu1 Cdh1 tu2 Cdhl-CycB-Cdk1 tcat

Cell Cycle Modeling, Petri Nets, Fig. 1 Petri net encoding of the interactions between Cdh1 and the CycB-Cdk1 dimer

Starting from the initial marking, the net can reach responsible for cell cycle arrest in G1 phase, is studied
new markings, which represent the evolution of the through the techniques of p-invariant and t-invariant
state of the modeled variables. In our context, the firing analysis (Murata 1989).
of transitions usually corresponds to the occurrence of
reactions and the changes in the marking of the places Petri Nets with Stochastic Firing Times
caused by the firing corresponds to the updates of the The quantitative information about the speed of reac-
abundance of reactants and products of reactions. tions is introduced into Petri Net models by various
It is quite immediate to realize that several concepts extensions of the firing semantics. The first one we
need a more precise definition to reach a complete consider here proposes a stochastic formulation of
characterization of the firing semantics. Consider for reaction occurrence times (Marsan 1990), where the
instance the case when multiple transitions are competition among concurrently enabled transitions is
enabled: obviously, rules are needed for defining the broken down through a random choice modulated by
order of firings, as different orders may result in dif- the speed of the competing transitions. More precisely,
ferent final markings of the net. each transition firing occurs with a time that follows
a probability distribution law, and in a set of simulta-
Qualitative Modeling with Petri Nets neously enabled transitions the smallest random time
The earliest definition of Petri Nets introduced a non- determines which transition will fire first. This firing
deterministic choice of the firing, which is the same as semantics is called race policy. In the case of contin-
to say that each order is possible, compatibly with the uous-time distributions of the firing times, no simulta-
enabling rules of the firing. Basically, a non- neous firings of transitions are thus allowed. Petri Nets
deterministic firing policy assigns an equal occurrence adhering to this semantics of firing are commonly
priority to each transition, irrespective of the time called Stochastic Petri Nets.
dimension of net evolution, as determined by the rela- Of particular interest in Systems Biology are the
tive speed with which reactions take place in the Stochastic Petri Nets whose firing times are distributed
modeled system. This type of firing semantics is useful as negative exponential ▶ random variables, in which
when no quantitative information is available about the case the marking evolution process over time is equiv-
speed of reactions, and can nevertheless provide a view alent to a discrete-space continuous-time Markov pro-
of the possible markings a given Petri Net model can cess (▶ Cell Cycle Modeling, Stochastic Methods;
reach. Thus, it can help in showing that some features ▶ Stochastic Processes, Fokker-Planck Equation).
of the model exist no matter which speed the real This class of models lend themselves to naturally rep-
kinetics of reactions possesses. An example of this resent the stochastic molecular kinetics as described by
Petri Net qualitative modeling can be found in Gillespie (1977), and their evolution can be studied
(Sackmann et al. 2006), where the mating pheromone through ▶ Gillespie stochastic simulation algorithm
response pathway of budding yeast, which is (Gillespie 1977).
Cell Cycle Modeling, Petri Nets 289 C
0.25
Cell mass distribution (A.U.)
0.2
Wild type sic1Δ
0.15

0.1 C

0.05

0.0
0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70 0.75

Cell Cycle Modeling, Petri Nets, Fig. 2 Cell mass spread in wild type and sic1D mutant of budding yeast

Among the modeling works that resort to stochastic degradation times of the key regulator Cdh1, and not
methods (▶ Cell Cycle Modeling, Stochastic only their average values, are critical in determining
Methods), we found in the literature a few applications the overall variability of cell cycle statistics. This
of Stochastic Petri Nets to the modeling and analysis of result provides an important argument for supporting
yeast cell cycle. The work in (Mura and Csikász-Nagy the validity of pursuing a stochastic modeling of cell
2008) provides a model of budding yeast (▶ Cell cycle regulation networks.
Cycle, Budding Yeast) cell cycle regulation network.
The model validation is performed with a comparison Hybrid Petri Nets
of simulated results against a set of experimental Hybrid Functional Petri Nets are another popular
results about wild type as well as budding yeast mutant timed extension of Petri Nets that has been used to
cells. The authors demonstrate how the Stochastic model cell cycle regulation. This modeling formalism
Petri Net model can be used to obtain statistics of the allows the coexistence of both continuous and dis-
key cell cycle parameters such as duration and cellular crete processes. Both the set of places and that of
mass, beyond the average one. This information pro- transitions is divided in a discrete and a continuous
vides insights on cell cycle characteristics. For part. Enabled discrete transitions fire after a deter-
instance, Fig. 2 compares the simulated cell mass dis- mined delay, consuming and producing discrete num-
tribution of wild type budding yeast with that of the bers of tokens as in regular Petri Nets. Enabled
sic1D mutant, where the Cdk inhibitor of cyclins is continuous transitions fire continuously at a given
removed. The chart clearly shows that the spread of the rate. This latter feature provides a convenient abstrac-
cell mass distribution is quite different in the two tion mechanism for directly representing into the
modeled organisms. The mutant mass distribution model processes that are difficult to model with
exhibits a higher variability, in agreement with exper- a discrete number of tokens, such as very high molec-
imental observations. Thus, the stochastic model ular concentrations, as well as external driving vari-
reveals a characteristic of the phenotype of sic1D ables such as cell mass.
cells that could not be anticipated by deterministic The cell cycle regulatory network of fission
models. yeast (▶ Cell Cycle, Fission Yeast) has been modeled
In (Csikász-Nagy and Mura 2010), Stochastic Petri and analyzed by Hybrid Functional Petri Nets in
Nets are used to explore the effects of intrinsic noise (Fujita et al. 2004). This study demonstrate the versa-
propagation on the variability of cell cycle duration tility of the Hybrid Functional Petri Net modeling
and cell mass at division time in fission yeast (▶ Cell formalism in representing a broad range of molecular
Cycle, Fission Yeast). In particular, the authors show regulation mechanisms and the intuitiveness of
that the variance of RNA synthesis and RNA the modeling process. A comprehensive model of the
C 290 Cell Cycle Modeling, Process Algebra

cell division process of Xenopus frog embryo


(▶ Cell Cycle of Early Frog Embryos), which includes Cell Cycle Modeling, Process Algebra
the molecular dynamics of cyclins and of the S-phase
and M-phase promoting factors as well as the DNA Alida Palmisano1 and Corrado Priami2
1
damage (▶ Cell Cycle Arrest After DNA Damage) Department of Biological Sciences and Department
checkpoint, has been proposed in (Matsui et al. of Computer Science, Department of Biological
2004). This latter model is able to reproduce the Sciences Virginia Tech, Virginia Polytechnic Institute
changes in the cell division cycles from synchronous and State University, Blacksburg, VA, USA
2
to asynchronous. Microsoft Research-University of Trento Centre for
Computational and Systems Biology and DISI,
University of Trento, Povo, Trento, Italy
Cross-References

▶ Cell Cycle Arrest After DNA Damage Synonyms


▶ Cell Cycle Modeling, Stochastic Methods
▶ Cell Cycle of Early Frog Embryos Algorithmic modeling languages; Computational
▶ Cell Cycle, Budding Yeast modeling languages; Rule-based languages
▶ Cell Cycle, Fission Yeast
▶ Gillespie Stochastic Simulation
▶ Random Variable Definition
▶ Stochastic Processes, Fokker-Planck Equation
Process algebras are formal languages originally con-
ceived for modeling concurrent systems and are also
a conceptual tool for the high-level description of
References interactions, communications, and synchronizations
between a collection of independent processes.
Csikász-Nagy A, Mura I (2010) Role of mRNA gestation and
Concurrent computing systems do not allow assump-
senescence in noise reduction during the cell cycle. In Silico
Biol 10:81–88. doi:10.3233/ISB-2010-0416 tions on the relative speed of their components, and
Fujita S, Matsui M, Matsuno H, Miyano S (2004) Modeling and therefore, a certain degree of non-determinism
simulation of fission yeast cell cycle on Hybrid Functional emerges in systems behavior.
Petri Net. IEICE Trans Fundam 87:2919–2928
Several approaches have been developed and
Gillespie DT (1977) Exact stochastic simulation of coupled
chemical reactions. J Phys Chem 81:2340–2361 used to model and study complex interaction mecha-
Koch I, Reisig W, Schreiber F (eds) (2010) Modeling in systems nisms in biological systems, and different process
biology, vol 16, Computational biology. Springer, London. algebra–derived languages have been used, in particu-
doi:10.1007/978-1-84996-474-6_7
lar, to build models of cell cycle at different levels of
Marsan MA (1990) Stochastic Petri nets: an elementary intro-
duction. In: Advances in Petri nets 1989. Springer-Verlag, abstraction.
New York, pp 1–29 In seminal work, Regev and Shapiro (2001)
Matsui M, Fujita S, Suzuki S, Matsuno H, Miyano S (2004) suggested an abstraction of cells-as-computation and
Simulated cell division processes of the Xenopus cell cycle
pathway by Genomic Object Net. J Integr Bioinform 1:3.
proposed that process algebras formerly used in the
doi:10.2390/biecoll-jib-2004-3 study of interacting computational entities could be
Mura I, Csikász-Nagy A (2008) Stochastic Petri Net extension of usefully employed and extended to model biological
a yeast cell cycle model. J Theor Biol. doi:10.1016/j. processes (Table 1).
jtbi.2008.07.019
The strategy used by process algebras diverges from
Murata T (1989) Petri Nets: properties, analysis and applica-
tions. IEEE Proc 77:541–580 classical mathematical modeling because it can
Petri CA (1962) Kommunikation mit automaten. Ph. D. Thesis, describe the flow of control between species and reac-
University of Bonn tions, i.e., not only the time, but also the causality
Sackmann A, Heiner M, Koch I (2006) Application of
relation among the events that constitute the very
Petri net based analysis techniques to signal transduction
pathways. BMC Bioinform 7:482. doi:10.1186/1471-2105- essence of the mechanistic steps describing the
7-482 emergent behavior of the modeled systems.
Cell Cycle Modeling, Process Algebra 291 C
Cell Cycle Modeling, Process Algebra, Table 1 A concise The basic steps of process algebras are encoded by
picture of the mapping between biology and process algebras. actions. In the simplest case, actions are input/output
In the process algebras interpretation, a biological entity
(e.g., a protein) is seen as a computation unit (i.e., a process) operations on communication channels. Actions can be
with interaction capabilities abstracted as communication chan- composed sequentially, meaning that a process can
nel names. Similarly to biological entities, which interact/react perform an action after the other one (“a.Q”). Pro-
through complementary capabilities, processes synchronize/ cesses can also be composed in parallel: (“P1 | P2”)
communicate on communication channels complementary
send/receive actions. The modifications/evolutions of molecules meaning that P1 and P2 run simultaneously, although C
after reactions are represented by state changes following they can synchronize and communicate if they share
communications a communication channel. Moreover, most of the alge-
Biology Process algebras bras are equipped with some sort of restriction operator
Entity Process which defines the scope of actions. For instance,
Interaction capability Communication channel name assuming “a” to be an action, and using “v” to denote
Interaction Synchronization/communication restrictions, “P1 | va.P2” means that action a is private
Modification/evolution State change to process P2.
The interaction policy assumed by each algebra is
one of its main distinguishing features, and it drives the
This interpretation is very similar to programming the design of its primitive operators. This is a list of the
behavior of a system rather than describing only its most used ones:
outcome with respect to time (Priami 2009). • Two-way synchronization (see, e.g., [Milner
A number of process algebras have been 1999]). Each action has a complementary action,
adapted or newly developed for building biological typically called co-action. Actions and co-actions
models and performing stochastic simulations (i.e., can synchronize with each other. For instance,
▶ Stochastic pi-calculus, ▶ BioSPI, ▶ κ-Calculus, assuming a and a to be complementary actions,
▶ Bio-PEPA, ▶ BlenX). For a comprehensive picture the following interaction would be possible:
of the state of the art of process algebras abstraction a:P1ja:P2 ! P1jP2 where “.” is the sequential
for biology, see (Guerriero et al. 2009). It is important operator.
to notice that each modeling language requires • Multi-way synchronization (see, e.g., [Hoare
a different modeling style because it maps molecular 1985]). The parallel composition operator is
components to interactions, process communication, parametric with respect to a set of actions (some-
and process composition in a quite different way. times called cooperation set) on which all the par-
So by choosing a formalism, a modeler is often allel processes are obliged to synchronize. In
making implicit choices (we refer the reader to this case, there is no need to resort to complemen-
(Calder and Hillston 2009) for details about the tary actions. For example, a:P1 fag a:P2 fag
abstractions used by many of the aforementioned a:P3 ! P1 fag P2 fag P3::::.
process algebras). • Name-passing (see, e.g., [Milner 1999]). In this
case, actions have either the form ahbi or the form
aðcÞ. The first one stands for “output the channel
Characteristics name b along the channel named a,” and the second
one stands for “input any name from channel a and
Process Algebras, the Approach then use it instead of the parameter name c.” The
A system is represented by a collection of entities name-passing interaction policy is a specific
(processes) that define the possible behaviors of the instance of the two-way communication paradigm:
components of the system. Algebras are equipped with The actions ahbi and aðcÞ are complementary to
an operational semantics, which consists in an induc- each other, and they can be involved in an interac-
tive system of logical rules that implicitly define a state tion. In this case, more than simple synchronization
transition function (Plotkin 1981) and allow users to occurs: Channel names flow from senders to
automatically derive the possible future states of the receivers, thus dynamically changing the intercom-
system. The fact that process P evolves into process munication topology of processes. For instance,
Q is generally written as P ! Q. ahbi:P1jaðcÞ:P2 ! P1jP2fb=cg, where fb=cg
C 292 Cell Cycle Modeling, Process Algebra

denotes the substitution of the free occurrences of c Cell Cycle Modeling, Process Algebra, Table 2 Summary on
by the actually received name b. tools and case studies of process algebras used for modeling
biological systems
Since names can be transmitted in interactions, the
restriction operator plays a special role in name- Simulation Case studies
passing algebras. Restricted names cannot be used as Biochemical BioSPI, SPiM Autoreactive lymphocyte
pi-calculus recruitment, gene regulation,
transmission media; they can, however, be used as cell cycle, Rho GTP-binding
transmitted data and, once transmitted, they become in phagocytosis, FGF
private resources shared by the sender and the receiver. pathway
The mentioned operators are those common to var- Bio-PEPA PEPA ERK signaling pathway, cell
ious process algebras; in addition to these, each calcu- workbench, cycle, epidemiological
eclipse plug-in models, genetic networks,
lus adopts a few specific operators (see specific entries NF-kB
▶ Stochastic pi-calculus, ▶ κ-Calculus, ▶ Bio-PEPA, BlenX CoSBiLab Cell cycle, circadian clock,
▶ BlenX for more details) (BetaWorkbench) actin polymerization, NF-kB,
MAPK, EGFR
Process Algebras, Comparing Different Languages k-calculus Kappa factory Cellular signals
Table 2 schematically reports a comparison between
different process algebras with the availability of sim-
ulation platforms (usually implementing some variants
of the ▶ Stochastic simulation algorithm) and the and make them active; then the dimers CYCLIN/CDK,
investigation of some complex case studies. the activator CDC14, and the CDH1 are involved in
a negative feedback loop: CYCLIN/CDK turns on
Process Algebras, Applications in Biology (Cell CDC14, which activates CDH1, which inhibits the
Cycle) CYCLIN/CDK activity, destroying the cyclin sub-
As the best of our knowledge, the only three process units. The model includes also another possibility of
algebras used to study different aspects of the cell inhibition of CYCLIN/CDK: the stoichiometric bind-
cycle are ▶ BioSpi, ▶ Bio-PEPA, and ▶ BlenX. ing with CKI.
Here we just sketch the main characteristic of the So the main events that the code simulates are: the
chosen model with respect to the peculiar feature of dimers CYCLIN/CDK formation (by the usage of pri-
those languages just to give to the reader a flavor of vate channels between the two processes), phosphory-
how different process algebras handle various aspect lation, and dephosphorylation of CDH1 by CDC14
of the cell cycle, usually studied in the deterministic (by changing of the current state of the process) and
framework. We refer the reader to the original publi- the protein degradation (by changing the current state
cations for more detailed discussions. of the process to the constant empty process). All the
events occur on global channels at different suitable
A Cell Cycle Model in BioSpi rates (with the exception of the dimer formation that
Lecca and Priami (2003) chose to implement the con- occurs on a private channel between the two pro-
trol mechanism of the cell cycle (modeled by Novak in cesses). The different reactions in which the compo-
1999) that is accounting for the antagonistic interac- nents of the system are involved are implemented as
tion between cyclin-dependent kinase dimers and the a multiple non-deterministic choice, that is then turned
anaphase-promoting complex: The APC extinguishes into a probabilistic one by the BioSpi simulator
cdk activity by destroying its cyclin partners, whereas (See ▶ stochastic simulation algorithm for more
cyclin/cdk dimers inhibit APC activity by phosphory- details about the logic behind the simulator engine).
lating Cdh1 (the interaction is also mediated by
a cyclin-dependent kinase inhibitor). A Cell Cycle Model in Bio-PEPA
The system is composed by six concurrent pro- Ciocchetta and Hillston (2009) chose the model
cesses, corresponding to the main five species of pro- presented by Goldbeter in 1991 and was later extended
teins, which regulate the cell cycle: CYCLIN, CDK, by Gardner et al. in 1998. Broadly speaking, the model
CDH1, CKI, CDC14 plus an auxiliary process describes the negative feedback loop obtained by
CLOCK. First, cyclin subunits bind to CDK monomers the fact that the cyclins promote the activation of
Cell Cycle Modeling, Process Algebra 293 C
a cdk (cdc2) which in turn activates a cyclin protease appropriate values at each simulation step and put
which in turn promotes cyclin degradation. In order to them in the classical race condition with the ele-
represent a control mechanism for the cell division mentary reaction rate, following the ▶ stochastic
cycle, Gardner and colleagues introduced a protein simulation algorithm strategy.
that initiates and concludes the cell division phase • Continuous variables can be defined in the input file
and modulates the frequency of oscillations. and their value is updated by the simulator so that
The Bio-PEPA model of the system has been built the user can define functional rates that refer to C
following this simple sequence of steps: them and that are updated by the simulator when-
• Definition of the initial and the maximum concen- ever the continuous variable is recalculated.
trations (information derived from the paper) and The BlenX model has been used for performing
choice of the appropriate level/step size that define stochastic simulations of wild type and mutants cells,
the level of granularity at which the user wants to showing how the effect of the noise is important to
study the system. characterize partially viable mutants at the border of
• Definition of functional rates and parameters of the life and death.
model. Concerning the kinetic laws, some of the
reactions in the input model follow mass-action
kinetics, whereas all the others have Michaelis– Cross-References
Menten kinetics.
• Definition of species components (i.e., the reagents/ ▶ Bio-PEPA
products of the different reactions) and of the model ▶ BioSPI
component (i.e., the composition of the species ▶ BlenX
components). ▶ κ-Calculus
The Bio-PEPA model has then been analyzed, and ▶ Stochastic Pi-calculus
some interesting properties about different parameter ▶ Stochastic Simulation Algorithm
sets and their effect on the underlying Continuous
Time Markov Chain have been investigated.
References
A Cell Cycle Model in BlenX
Palmisano et al. (2009) illustrates an easy translation to Calder M, Hillston J (2009) Process algebra modeling styles
BlenX of one of the most popular deterministic model for biomolecular processes. Lect Notes Comput Sci
5750:1–25
of the budding yeast cell cycle developed by Novak Ciocchetta F, Hillston J (2009) Bio-PEPA: a Framework for the
and Tyson in 2003. This model contains species Modeling and Analysis of Biological Systems. Theor
involved in reactions at very different levels of abstrac- Comput Sci 410(33–34):3065–3084
tion: There are elementary reactions following mass- Guerriero ML, Prandi D, Priami C, Quaglia P (2009) Process
calculi abstractions for biology, process calculi abstractions
action kinetics and more complex interaction mecha- for biology in algorithmic bioprocesses. Springer, Heidelberg
nisms abstracted as cooperative reactions with Hoare CAR (1985) Communicating sequential processes, Prentice
Michaelis–Menten and Hill kinetics. Moreover, con- hall international series in computer science. Prentice Hall, UK
tinuous variables (e.g., the mass of the cell) are intro- Lecca P, Priami C (2003) Cell cycle control in eukaryotes:
a BioSpi model. Proceedings of BioConcur 03. Electron
duced and they influence the rate of many other Notes Theor Comput Sci 180(3):51–63
reactions. Milner R (1999) Communicating and mobile systems: the pi-
In BlenX, it is easy to code all the features listed calculus. Cambridge University Press, Cambridge
above considering the following ideas: Palmisano A, Mura I, Priami C (2009) From odes to language-
based, executable models of biological systems. Proceedings
• Species can be modeled as simple boxes, with inter- of the pacific symposium on biocomputing (PSB 2009)
action capabilities defined only between the species Plotkin GD (1981) A structural approach to operational seman-
involved in pure mass-action reactions. tics. Tech Rep DAIMI FN-19, Computer Science Depart-
• Complex mathematical formula used as kinetic ment, Aarhus University, Aarhus, Denmark
Priami C (2009) Algorithmic systems biology. An opportunity
laws can be simply copied in the input file of for computer science. Commun ACM 52:80–88
the model and assigned as rate of BlenX events. Regev A, Shapiro E (2001) Cellular abstractions: cells as
The simulator takes care of calculating their computation. Nature 419:343
C 294 Cell Cycle Modeling, Stochastic Methods

of cell cycle phases can be modeled as continuous


Cell Cycle Modeling, Stochastic Methods time random variables.
• Stochasticity as molecular noise. The molecular
Ivan Mura dynamics of the proteins that participate to the cell
The Microsoft Research - University of Trento Centre cycle regulation network can be modeled by adding
for Computational and Systems Biology, Trento, Italy to a pure deterministic process that represents the
average change, a stochastic process that represents
molecular noise. In line with this approach we find
Synonyms probabilistic versions of Boolean networks, where
the deterministic causality dictated by the structure
Probabilistic methods of the model is relaxed through the introduction
of probabilities of state change. Another way to
represents stochastic changes in the molecular
Definition concentrations is through a differential ▶ Langevin
equation with multiplicative noise, as shown
Stochastic methods of modeling include randomness in Eq. 1:
as a way to represent the occurrence of events that,
because of their very nature or due to practical d pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
xðtÞ ¼ f ðxðtÞÞ þ Nð0; 1Þ 2 D xðtÞ (1)
impossibilities, can only be predicted in probabilistic dt
terms. Thus, the diffusion of molecules through
a membrane, the dynamic instability of microtubules, where xðtÞ is the concentration of species x at time t,
and the spontaneous switch from the non-lytic to the f ð Þ is a deterministic function, Nð0; 1Þ is a standard
lytic behavior of the l-phage can all be modeled as Gaussian noise (null expectation and unitary
stochastic processes. variance), and D is the noise amplitude.
Various modeling studies of cell cycle regulation • Stochastic kinetics. The randomness is directly
networks have been proposed in the literature, which introduced at the level of molecular interactions,
include stochastic parts of behavior. The abstraction according to the formulation of stochastic kinetics
level at which stochasticity is included in the model used in ▶ Gillespie stochastic simulation. Under a set
can provide a convenient classification criterion. of hypotheses about the homogeneity and the
Going down to the highest to the lowest level we thermal equilibrium of the system, the times to
distinguish the following four classes: the occurrence of reactions in the system can be
• Stochasticity at cell division time. The elements of faithfully modeled by negative exponential ▶ ran-
randomness considered are those related to the dom variables. This allows describing the evolution
process of cell division, which is the asymmetric of the system over time through a simple system
division of cells with unequal partitioning of cyto- of first order linear differential equations, known as
plasmic and/or nuclear material between the mother the Chemical ▶ Master equation, or equivalently, as
and daughter cells. The weights determining the rel- a discrete space continuous-time Markov process
ative partitioning of the cellular content are charac- (▶ Markov Chain). The cell cycle regulation net-
terized by discrete or continuous random variables. work is thus modeled by assigning to each reaction,
• Stochasticity at cell cycle phases duration. for each possible state of the system, a rate of occur-
The duration of the various cell cycle phases rence that is interpreted as the rate of a negative
depends on a number of factors, including exponential distribution. Various high-level lan-
genotypic differences, intrinsic molecular noise, guages for specifying stochastic kinetics models
and external factors such as the abundance of exist, we cite here the stochastic Petri net (▶ Cell
nutrients and DNA damage inducing events. Cycle Modeling, Petri Nets) and stochastic process
Therefore, cell cycle phase duration is not constant algebra (▶ Cell Cycle Modeling, Process Algebra)
in a population of cells, neither is across subsequent formalisms as they have been recently applied to the
rounds of duplication of the same cell. The duration modeling of cell cycle regulation networks.
Cell Cycle Modeling, Stochastic Methods 295 C
Characteristics Kar et al. (2009) with a Gillespie simulation scheme.
The model shows that both intrinsic and extrinsic
The four stochastic modeling approaches presented noises contribute significantly to the variability
above have been applied, sometimes combined with of cell cycle progression. Nonetheless, intrinsic
each other and mixed with deterministic parts of molecular fluctuations in the control system are
modeling, to define in-silico models of cell cycle considerably noisier, mostly due to the low numbers
regulation networks. The vast majority of modeling of mRNA molecules reported for yeast cells. C
studies that are of interest here deal with the cell
cycle of yeast. The corpus of deterministic models of Budding Yeast
yeast cell cycle, based on Ordinary Differential The cell cycle of budding yeast (▶ Cell Cycle, Bud-
Equations (▶ Ordinary Differential Equation (ODE)), ding Yeast) has also been the subject of modeling
has been extended in various ways with the stochastic studies employing stochastic methods. A modeling
methods to represent extrinsic and intrinsic molecular work based on a ▶ Boolean network by Zhang et al.
variability. A few stochastic models of higher organ- (2006) encodes the activation/repression relationships
isms cell cycle regulation are also found in the among the key molecular species, and a perturbing
literature. probabilistic noise is considered when evaluating the
state update rules. The authors show that the model is
Fission Yeast quite robust and that the main cycling behavior is
In a pursuit to determine the origin of the observed conserved with high probability even for very noisy
variability in cell cycle duration, Sveiczer et al. (2001) models. Similar conclusions were also drawn by
modified a pure deterministic model of fission yeast Sabouri-Ghomi et al. (2008), in a study that explored
(▶ Cell Cycle, Fission Yeast) cell cycle by intro- the effects of molecular intrinsic noise in cell cycle
ducing into it the asymmetric results of ▶ cytokinesis. progression of budding yeast by a stochastic kinetic
The cell size at birth is modeled as the product of model a la Gillespie. They demonstrated that the
a Gaussian random variable – with average molecular switches controlling the Start (entry into
value ¼ 0.5 and standard deviation ¼ 0.016 – and the S phase) and the Finish (exit from M phase) transitions
size of the mother cell, whereas the initial nuclear are robust even for low molecular counts of the key
volume is assumed to be Gaussian, with average regulating molecules. The works in (Mura and
value ¼ 1.0 (in arbitrary units) and standard Csikász-Nagy 2008) and in (Palmisano et al. 2009)
deviation ¼ 0.07 (determined from the fit with exper- introduce, through the application on the budding
imental data, due to the unknown distribution of real yeast cell cycle regulation networks, a procedure to
nucleus size). The inclusion of this stochasticity translate deterministic models based on ordinary
allowed the model to reproduce the experimentally differential equations into stochastic kinetic models
observed levels of variability for both wild type and for Gillespie simulation specified as stochastic Petri
wee1- mutant fission yeast cell population. An exten- net and stochastic process algebra models,
sion of a deterministic model of fission yeast cell cycle respectively. The authors demonstrate how stochastic
molecular machinery through Langevin equation noise models can be used to obtain statistics of key cell cycle
terms has been proposed in (Steuer 2004). The author parameters, such as duration and cellular mass, beyond
rewrites a model composed by ordinary differential the average one, information useful to revealing
equations by adding stochastic noise to each of characteristics of the phenotypes that could not be
them. The simulation results demonstrate that the anticipated by deterministic models. Finally,
introduction of noise makes the model able to account a stochastic kinetics model of the core of budding
for emerging properties, such as existence of quantized yeast cell cycle regulation network was studied
cycle times, thus showing that fluctuations cannot through a combination of stochastic simulation and
always be treated as small perturbations to the model checking techniques in (Ballarini et al. 2009)
deterministic behavior. The effects of both intrinsic to characterize the elements of the network determin-
(stochastic molecular kinetics) and extrinsic ing the irreversibility (▶ Cell Cycle Dynamics,
(asymmetrical split at division time) are studied by irreversibility) of phase transitions.
C 296 Cell Cycle Models, Sensitivity Analysis

Higher Organisms References


A hybrid Petri net model of the cell division process
of Xenopus frog embryo (▶ Cell Cycle of Early Frog Ballarini P, Mazza T, Palmisano A, Csikász-Nagy A (2009)
Studying irreversible transitions in a model of cell cycle
Embryos) has been proposed in (Matsui et al. 2004).
regulation. Electron Notes Theor Comput Sci 232:39–53.
This model output is able to reproduce the changes doi:10.1016/j.entcs.2009.02.049
in the cell division cycles from synchronous to Kar S, Baumann WT, Paul MR, Tyson JJ (2009) Exploring the
asynchronous. The cell cycle desynchronization in roles of noise in the eukaryotic cell cycle. Proc Natl Acad Sci
USA 106:6471–6476. doi:10.1073/pnas.0810034106
a culture of proliferating mammalian cells
Matsui M, Fujita S, Suzuki S, Matsuno H, Miyano S (2004)
(▶ Cell Cycle of Mammalian Cells) is the subject of Simulated cell division processes of the Xenopus cell cycle
Olofsson and McDonald (2010), where the stochastic pathway by Genomic Object Net. J Integr Bioinform 1:3.
characterization of phase duration is taken as an doi:10.2390/biecoll-jib-2004-3
Mura I, Csikász-Nagy A (2008) Stochastic Petri Net extension of
input to a branching process model of replicating
a yeast cell cycle model. J Theor Biol 254:859–860.
cells, able to match the experimentally observed per- doi:10.1016/j.jtbi.2008.07.019
centages of cells in the various phases of the cycle. Olofsson P, McDonald TO (2010) A stochastic model of cell
We finally mention the modeling study of Zamborsky cycle desynchronization. Math Biosci 223:97–104.
doi:10.1016/j.mbs.2009.11.003
et al. (2007), which deals with the cross-talking upon
Palmisano A, Mura I, Priami C (2009) From ODEs to
the Wee1 kinase between circadian rhythm and language-based, executable models of biological systems.
cell cycle regulation networks in mammalian cells. In: Proceedings of the pacific symposium on biocomputing
A stochastic model of cell cycle regulation network 14. Kohala Coast, Hawaii, USA, pp 239–250.
Sabouri-Ghomi M, Ciliberto A, Kar S, Novak B, Tyson JJ
expressed through Langevin equation is coupled with
(2008) Antagonism and bistability in protein interaction
circadian clock (▶ Cell Cycle, Coupled with Circa- networks. J Theor Biol 250:209–218. doi:10.1016/j.
dian Clock) deterministic model. The model suggests jtbi.2007.09.001
that the circadian clock may contribute to enforce Steuer R (2004) Effects of stochasticity in models of the
cell cycle: from quantized cycle times to noise-induced
a cell size control on the cell cycle when the mass
oscillations. J Theor Biol 228:293–301. doi:10.1016/j.
doubling time is quite different from the circadian jtbi.2004.01.012
period. Sveiczer A, Tyson JJ, Novak B (2001) A stochastic, molecular
model of the fission yeast cell cycle: role of the nucleocy-
toplasmic ratio in cycle time regulation. Biophys Chem
92:1–15. doi:10.1016/S0301-4622(01)00183-1
Cross-References Zámborszky J, Hong CI, Csikász-Nagy A (2007) Computational
analysis of mammalian cell division gated by a circadian
▶ Boolean Networks clock: quantized cell cycles and cell size control. J Biol
Rhythms 22:542–553. doi:10.1177/0748730407307225
▶ Cell Cycle Dynamics, Irreversibility
Zhang Y, Qian M, Ouyang Q, Deng M, Li F, Tanga C (2006)
▶ Cell Cycle Modeling, Petri Nets Stochastic model of yeast cell-cycle network. Phys
▶ Cell Cycle Modeling, Process Algebra D 219:35–39. doi:10.1016/j.physd.2006.05.009
▶ Cell Cycle Modeling, Stochastic Methods
▶ Cell Cycle of Early Frog Embryos
▶ Cell Cycle of Mammalian Cells
▶ Cell Cycle, Budding Yeast Cell Cycle Models, Sensitivity Analysis
▶ Cell Cycle, Coupled with Circadian Clock
▶ Cell Cycle, Fission Yeast Tamás Turányi
▶ Cytokinesis Institute of Chemistry, E€otv€os University (ELTE),
▶ Gillespie Stochastic Simulation Budapest, Hungary
▶ Langevin Equation
▶ Master Equation
▶ Ordinary Differential Equation (ODE) Synonyms
▶ Random Variable
▶ Stochastic Processes, Fokker-Planck Equation Sensitivity analysis; Uncertainty analysis
Cell Cycle Models, Sensitivity Analysis 297 C
Definition

Yi concentration
Sensitivity analysis is the name of a family of mathe-
matical methods that investigate the relation between ΔYi
the values of the parameters and the output of models.
Local sensitivity analysis explores the effect of small
parameter changes, while the methods of global sensi- C
tivity analysis may explore large parameter domains.

Characteristics

Local Sensitivity Analysis 0 t1 t2 time


A dynamical model can be characterized by the fol-
lowing initial value problem Cell Cycle Models, Sensitivity Analysis, Fig. 1 Concentration
– time solution of a model (solid line) and the modified solution
when one of the parameters is changed at time t1 (dashed line)
dY=d t ¼ fðY;pÞ Yð0Þ ¼ Y0 (1)

where t is time, Y is the n-vector of variables, p is the solving the following initial value problem (Tomović
m-vector of parameters, Y0 is the vector of the initial and Vukobratović 1972):
values of the variables, and f is the right-hand-side of
the differential equations. S_ ¼ JS þ F Sðt1 Þ ¼ 0 (3)
Local sensitivity analysis
 means the calculation of

partial derivative @Yi @pj , which is called the first where S(t) ¼ {sij(t)} ¼ {@Yi @pj } is the time-
order local sensitivity coefficient. The meaning of dependent local sensitivity matrix, matrix J is the
a local sensitivity coefficient is visualized in Fig. 1. Jacobian (J ¼ {@fi =@Yk }), and matrix F contains the
The concentration–time curve, as obtained by the solu- derivatives of the right-hand-side of the ODE with
tion of the original model, is plotted with solid line. If respect to the parameters (F ¼ {@fi @pj }). Matrices J
parameter j is changed by Dpj at time t1, then the and F are functions of time t2. The sensitivity
solution of the modified model will deviate (see
coefficients
   are usually used in normalized  
the dashed line) from the original solution. Denote pj =Yi @Yi @pj or semi-normalized pj @Yi @pj
DYi the change in the calculated solution at time t2. form. The normalized sensitivity coefficients are
The finite difference approximation can be used for the dimensionless and show the percentage change of
calculation of the local sensitivity coefficient: model solution i as a result of +1% change of the
value of parameter j.
@Yi DYi
 (2)
@pj Dpj Global Sensitivity Analysis
 Textbooks about the various methods of global sensi-
Local sensitivity coefficient @Yi @pj depends on the tivity analysis and their applications in science have
time of parameter perturbation t1 and the time of been published by Saltelli et al. (2004, 2008). The most
observation t2. If the time of perturbation t1 is fixed, frequently applied methods of global sensitivity anal-
then the local sensitivity coefficient is a function of ysis are the Monte Carlo method with Latin hypercube
time t ¼ t2t1. sampling, application of pseudo random numbers (e.g.,
The error of the finite difference approximation a Sobol’ sequence), Fourier Amplitude Sensitivity Test
cannot be controlled effectively, therefore the local (FAST), and High Dimensional Model Representation
sensitivity function sik(t) is usually calculated by (HDMR). These methods are able to determine the
C 298 Cell Cycle Models, Sensitivity Analysis

uncertainty of each model output (like the concentra- the sensitivity matrix. However, in several chemical
tion at a given time or the period time) knowing the kinetic systems the following relations have been
uncertainty of each parameter. The ultimate informa- observed (see Rabitz 1989; Zsély et al. 2003):
tion (usually obtained by the combined application of 1. Local similarity: Value
several methods) is the calculation of the joint proba-
bility density function (pdf) of the model outputs and sik ðtÞ
lij ðtÞ ¼ (6)
the contribution of each parameter to it knowing the sjk ðtÞ
joint pdf of the parameters.
depends on time t and the model results Yi and Yj
Raw and Cleaned Local Sensitivity Functions selected, but is independent of the parameter pk
The concentration–time curves of a cell cycle model perturbed.
are always nearly periodic and in most models the 2. Scaling relation: Equation
concentration–time curves are truly periodic having
elapsed a transition time. In this case one period time ðdY =dtÞ s ðtÞ
t later the concentrations are identical:  i  ¼ ik (7)
dYj =dt sjk ðtÞ

YðtÞ ¼ Yðt þ tÞ (4) is valid for any parameter pk. Existence of scaling
relation includes the presence of local similarity.
3. Global similarity: Value
It is possible to investigate how period time t
depends on the values of the parameters. Using
a local sensitivity approach, this information is sik ðtÞ
 mikm ¼ (8)
provided by the period time sensitivities @t @pj . sim ðtÞ
One might assume that the sensitivity functions
of a model having periodic solution are also is independent of t (within an interval).
periodic. However, it is true only for the local sensi- Zsély et al. (2003) have shown that the existence of
tivity functions belonging to parameter pk, if the low-dimensional manifolds in variable space may
corresponding period time sensitivity is zero cause local similarity. Global similarity emerges if
(∂t/∂pk ¼ 0). If ∂t/∂pk is not zero, then the maxima local similarity is present and the sensitivity differen-
of the sensitivity–time curves are continuously tial equations are pseudo-homogeneous. The latter is
increasing. It has been shown by several authors related to the existence of excitation periods, like the
(see (Tomović and Vukobratović 1972), (Edelson and autocatalytic runaways of concentrations.
Thomas 1981), (Larter 1983), (Zak et al. 2005))
  that In cell cycle models low-dimensional manifolds
the original (“raw”) sensitivity functions @Yi @pj and excitation periods may occur (Lovrics et al.
can be separated
 to truly
 periodic (“cleaned”) sensitiv- 2006), and accordingly all the three types of relations
ity functions @Yi @pj t and a secular term: between the sensitivity functions were found (Lovrics
et al. 2008). If the maxima of the sensitivity functions
  are continuously
@Yi @Yi t @t @Yi  increasing,
  then the cleaned sensitiv-
¼  (5) ity functions @Yi @pj t should be investigated.
@pj @pj t t @pj @t
Detection of the global similarity of sensitivity
functions has a special importance. If sensitivity
These authors suggested several methods for the functions @Yi @pj and @Yi =@pk are globally similar,
accurate
 determination of the period time sensitivity then changing parameter j causes exactly the same
@t @pj and the cleaned sensitivity functions from the effect on concentration–time curve Yi ðtÞ during the
raw sensitivity functions (Fig. 2). whole time period than a (different extent) change of
parameter k. This means that the effect of the change
The Similarity of the Local Sensitivity Functions of parameter j can be compensated by an appropriate
In the case of a general mathematical model, no rela- modification of the value of parameter k. As
tion is expected among the rows and/or the columns of a consequence, an infinite number of parameter sets
Cell Cycle Models, Sensitivity Analysis 299 C
Cell Cycle Models, 1.2
Sensitivity Analysis,
Fig. 2 Raw (a) and cleaned
(b) local sensitivity functions 0.8
of a cell cycle model
0.4

raw sensitivity function


0.0 C

−0.4

−0.8

−1.2

−1.6
0 50 100 150 200 250 300 350 400
time / min
4

1
cleaned sensitivity function

−1

−2

−3

−4

−5

−6
0 50 100 150 200 250 300 350 400
time / min

may result in the same simulation results. Also, the References


effect of a physical change of a parameter in
a biological system (e.g., due to the change of the Edelson D, Thomas VM (1981) Sensitivity analysis of oscillat-
ing reactions. J Phys Chem 85:1555–1558
environment) can be fully compensated by the appro-
Larter R (1983) Sensitivity analysis of autonomous oscillators.
priate change of another parameter, which is a form of Separation of secular terms and determination of structural
▶ robustness in biological systems. stability. J Phys Chem 87:3114–3121
Lovrics A, Csikász-Nagy A, Zsély IGy, Zádor J, Turányi T,
Novák B (2006) Time scale and dimension analysis of
a budding yeast cell cycle model. BMC Bioinformatics 7:494
Lovrics A, Zsély IGy, Csikász-Nagy A, Zádor J, Turányi T,
Cross-References
Novák B (2008) Analysis of a budding yeast cell cycle
model using the shapes of local sensitivity function. Int
▶ Robustness J Chem Kinet 40:710–720
C 300 Cell Cycle of Early Frog Embryos

Rabitz H (1989) Systems analysis at the molecular scale. Science


246:221–226
Saltelli A, Tarantola S, Campolongo F, Ratto M (2004) Sensi-
tivity analysis in practice. A guide to assessing scientific
models. Wiley, Chichester
Saltelli A, Ratto M, Andres T, Campolongo F, Cariboni J,
Gatelli D, Saisana M, Tarantola S (2008) Global sensitivity
analysis: the primer. Wiley, Chichester
Tomović R, Vukobratović M (1972) General sensitivity theory.
Elsevier, New York
Zak DE, Stelling J, Doyle FJ III (2005) Sensitivity analysis of
oscillatory (bio)chemical systems. Comput Chem Eng
29:663–673
Zsély IGy, Zádor J, Turányi T (2003) Similarity of sensitivity
functions of reaction kinetic models. J Phys Chem Cell Cycle of Early Frog Embryos, Fig. 1 Early Xenopus
A 107:2216–2238 laevis embryos after their first cell cleavage

anaphase-promoting complex (APC) – is known to


Cell Cycle of Early Frog Embryos behave as an autonomous, clock-like cell cycle oscil-
lator. Following the first 12 cleavages, at a stage known
Joseph R. Pomerening as the mid-blastula transition (MBT), the frog embryo
Department of Biology, Indiana University, cell cycle is modified to include zygotic transcription,
Bloomington, IN, USA and slows due to the incorporation of growth phases,
lengthened S-phases, and the ability to induce cell
cycle arrest due to the activation of checkpoints.
Synonyms

Cyclin-dependent kinase 1 (CDK1)/Anaphase- Characteristics


promoting complex (APC) oscillator; Early embryonic
oscillator; Xenopus laevis embryonic cell cycle No Gap or Growth Phases, and Zygotic
Transcription is Inhibited
Frog eggs and early embryos have been a long-utilized
Definition experimental system due to the ease of care of parental
populations, their large size lending to the ease of
Early frog embryos – including those of the well-studied manipulation and observation, and the ability to easily
genus Xenopus – exhibit a cell cycle with unique collect an abundance of eggs suitable for use in bio-
characteristics. These include a lack of growth phases, chemical studies – primarily in the pseudo-tetraploid
as well as an inability to arrest despite chemical or Xenopus laevis – or for genetic manipulation in the
physical insult, including DNA-damaging or cytoskele- diploid Xenopus tropicalis. Cell cycles in the early
ton-disrupting agents, as well as nuclear ablation. embryos of these frogs – from the period of fertiliza-
Alternation of genome replication (S-phase) and daugh- tion up until MBT, when the zygotic genetic program
ter chromosome segregation followed by cell cleavage is initiated – exhibit unique characteristics that distin-
(M-phase) occurs quickly in frog embryos (Fig. 1). guish it from the cycles that occur in somatic cells later
Egg-borne maternal stores of mRNA and protein in development, as well as in their mammalian embry-
drive these oscillations, with the duration of cycles onic counterparts (Murray and Hunt 1993). The frog
2–12 being roughly 30 min, following an initial early embryonic cell cycle, however, does share simi-
75–90 min cycle. Due to the rapid and independent larities to the early cycles of other amphibians, fish,
nature of these frog early embryonic cycles, the and insects. One similarity is that the earliest cleavages
enzymatic tandem that drives it – the stimulatory that occur following fertilization do not accommodate
phospho-transferase, cyclin-dependent kinase 1 periods for growth, while rapid exchanges between
(CDK1), and the inhibitory ubiquitin ligase, the periods of DNA replication and mitosis occur.
Cell Cycle of Early Frog Embryos 301 C
Xenopus laevis eggs – and developing one-cell

(CDK1 activity, relative units)


histone H1 phosphorylation
1000
blastomeres – are roughly 1 mm in diameter, and
remain this size until their cell cycle is modified to 800
include growth phases after the onset of MBT 600
(Bernardini et al. 1999). The maternal contribution of
400
protein and mRNAs to the frog egg is critical to these
200
early cell cleavages, because its genome remains tran- C
scriptionally inactive until the embryo reaches MBT.
0 60 120 180 240
Experimental evidence points to this transition corre- time (min)
lating strongly with the nuclear-to-cytoplasmic vol-
ume (N-to-C) ratio in cells, but what triggers the cell phospho-H1
cycle to behave more deliberately with its inclusion of
growth phases and checkpoints remains unknown Cell Cycle of Early Frog Embryos, Fig. 2 Oscillations of
(Newport and Kirschner 1982). CDK1 activity in a Xenopus laevis cycling egg extract

No Checkpoints in Response to Unreplicated DNA, laevis eggs (Lohka et al. 1988). This activity was
DNA Damage, or Mitotic Defects ascribed to a heterodimeric complex of proteins com-
After fertilization, frog early embryos continue their prising the cyclin-dependent kinase 1 (CDK1), and its
rapid cell cycles unimpeded, even if exposed to agents activating counterpart, cyclin B. In 1995, the enzyme
that block the processes required for DNA replication responsible for targeting cyclin B for proteolysis by the
and cell division (Morgan 2006). Treatment with DNA proteasome – called the anaphase-promoting complex
synthesis inhibitors prevents S-phase completion, but (APC) – was also purified from Xenopus eggs and
cell cleavages will continue. In fact, the presence of clams (King et al. 1995; Sudakin et al. 1995).
a nucleus is not a requirement: enucleation of fertilized Together, the M-phase stimulatory CDK1 protein
eggs has been shown inconsequential to oscillations of kinase and the CDK1-inhibiting APC ubiquitin ligase
the biochemical clock that drives the frog early embry- were found to be the system that drives the clock-like
onic cell cycle. Poisons including nocodazole and taxol oscillations during the early embryogenesis of frogs, as
inhibit the assembly and disassembly of the mitotic well as other amphibians, marine invertebrates (such
spindle, respectively, but even these cell division inhib- as clams and starfish), and insects. Cycling extracts
itors fail to block the activities of the oscillating derived from parthenogenetically activated Xenopus
enzymes that drive the cycles. Frog early embryos eggs exhibit rapid and sustained oscillations of CDK1
treated with these drugs do not undergo cleavages, but activity, and provide a useful tool to study the dynam-
do exhibit waves of cortical/cell surface contractions ical behaviors of this enzyme and others within the
that mirror the timing when mitosis would be expected frog early embryonic oscillatory system (Murray
to occur. DNA-damaging agents – including UV irradi- 1991). CDK1 activity is assayed in these cytoplasmic
ation – are also insufficient to halt progression through extracts by measuring its phosphorylation of
the cell cycle in these early embryos. Altogether, these a preferred substrate, histone H1 (Fig. 2).
traits exemplify the fact that the cell cycle and its control
system in frog early embryos is a clock-like, autono- Positive Feedback Generates Bistability in the
mous oscillator that is not susceptible to perturbations Response of CDK1 to Cyclin, and Confers
that affect the genome, or the subcellular structures that a Relaxation Oscillator Design in the Frog Early
are necessary for cell division. Embryo
The activation of CDK1 is under the control of inhibi-
Frog Early Embryonic Cell Cycles Are Driven by tory kinases and activating phosphatases (Morgan
Alternating Oscillations of CDK1 and APC 2006). Upon binding of cyclin, phosphorylation of
Activities CDK1 on residues of its ATP-binding site (Thr14/
During the late 1980s, extensive physiological and Tyr15) by the kinase Wee1 (as well as another, called
biochemical studies led to the purification of the Myt1) inhibits its activity. Activation of the phosphatase
M-phase-promoting factor from unfertilized Xenopus Cdc25 counteracts this inhibition, but also initiates two
C 302 Cell Cycle of Early Frog Embryos

critical positive-feedback loops. CDK1 phosphorylates (when leaving the off-state), than the threshold level of
and inhibits the Wee1 and Myt1 kinases – causing cyclin that must be surpassed to switch CDK1 from on
a double-negative (positive)-feedback loop – while to off (Pomerening et al. 2003; Sha et al. 2003). This
also further activating additional Cdc25, and inducing steady-state behavior is known as hysteresis, and this
a second positive-feedback loop (Fig. 3). means that there is bistability in response of CDK1
This activation, in turn, stimulates the APC- to cyclin stimulus: depending on where the system
mediated negative-feedback loop, and induces mitotic starts – either in the off or on state – it can exist in
exit and interphase by resetting CDK1 to an inactive either of those two states for a range of cyclin concen-
state through targeting the cyclin subunit for proteoly- trations (Fig. 4).
sis. Together, these positive- and negative-feedback In other words, within the hysteretic range of cyclin
subcircuits provide the basis for CDK1 activation and stimulus, if CDK1 starts in the on state (mitosis) and
inactivation, respectively, during the cell cycle. the stimulus decreases, CDK1 will remain active
Systems-level studies into the dynamical nature of (though its activity would be decreased, since there
CDK1 activation – that included a fusion of both would be less cyclin present) and the cell would remain
computation and experiment – revealed that the in mitosis. If the system starts in the off state and the
steady-state threshold for CDK1 activation is higher stimulus increases to the same level as mentioned
previously, CDK1 would remain off and the cell
would not leave interphase. Combining positive feed-
back and bistability with the APC-driven negative-
feedback loop, the CDK1-APC oscillator in frog
early embryos behaves as a relaxation oscillator –
producing rapid and spike-like bursts of CDK1
activity – followed by its rapid APC-mediated inacti-
vation. Lessening of this positive feedback through the
addition of a non-phosphorylatable (Wee1-insensitive)
mutant of CDK1 was found to cause damped CDK1
activity oscillations (Pomerening et al. 2005). This
caused defects in cell cycle progression – namely, the
Cell Cycle of Early Frog Embryos, Fig. 3 Positive-feedback premature inhibition of DNA synthesis – by rendering
loops in CDK1 activation egg extracts incapable of sustaining periods of low

Cell Cycle of Early Frog


Embryos, Fig. 4 Hysteresis
and bistability in CDK1
activation
Cell Cycle of Mammalian Cells 303 C
CDK1 activity. Therefore, on the systems-level, posi- Definition
tive feedback and bistability in this oscillator help to
ensure sustained, pulsatile oscillations of CDK1 activ- The cell cycle is the regulated, ordered progression of
ity to properly drive cell cycle progression during frog steps by which a cell commits to proliferation, replicates
early embryogenesis. its DNA, and divides into two daughter cells. The mam-
malian cell cycle is coupled to normal and aberrant
biological processes unique to multicellular organisms C
References including development, differentiation, tissue forma-
tion, wound healing, and cancer. Numerous signal trans-
Bernardini G, Prati M, Bonetti E, Scari G (1999) Atlas of duction pathways link these processes to mammalian
Xenopus development. Springer, New York
cell cycle entry and progression. Cancer is the disease
King RW, Peters JM, Tugendreich S, Rolfe M, Hieter P,
Kirschner MW (1995) A 20S complex containing CDC27 state that results from excessive cellular proliferation.
and CDC16 catalyzes the mitosis-specific conjugation of Cancer results when mechanisms for regulating the cell
ubiquitin to cyclin B. Cell 81(2):279–288 cycle are lost, as such the complexities of the mamma-
Lohka M, Hayes MK, Maller JL (1988) Purification of matura-
lian cell cycle have received significant attention from
tion-promoting factor, an intracellular regulator of early
mitotic events. Proc Natl Acad Sci USA 85:3009–3013 the research community. Mammalian cells have signal
Morgan DO (2006) The cell cycle: principles of control. New transduction pathways that monitor aberrant prolifera-
Science Press, London tion and in response commit the cell to either senescence
Murray AW (1991) Cell cycle extracts. Methods Cell Biol
36:581–605
or apoptosis as a barrier to tumorigenesis. The complex,
Murray AW, Hunt T (1993) The cell cycle: an introduction. multivariate relationship between the cell cycle and
Oxford University Press, New York the context of the cell necessitates a systems biology
Newport J, Kirschner M (1982) A major developmental transi- approach to understanding how the cell cycle is regu-
tion in early Xenopus embryos: I. Characterization and
lated and importantly for understanding how this
timing of cellular changes at the midblastula stage. Cell
30(3):675–686 regulation fails in cancer cells.
Pomerening JR, Sontag ED, Ferrell JE Jr (2003) Building a cell
cycle oscillator: hysteresis and bistability in the activation of
Cdc2. Nat Cell Biol 5(4):346–351
Pomerening JR, Kim SY, Ferrell JE Jr (2005) Systems-level
Characteristics
dissection of the cell-cycle oscillator: bypassing positive
feedback produces damped oscillations. Cell 122(4): Fundamentals
565–578 Mammals are complex, multicellular organisms
Sha W, Moore J, Chen K, Lassaletta AD, Yi CS, Tyson JJ, Sible
JC (2003) Hysteresis drives cell-cycle transitions in Xenopus
possessing diverse tissues and organs containing
laevis egg extracts. Proc Natl Acad Sci USA 100(3):975–980 a plethora of cell types. All of these diverse cell types
Sudakin V, Ganoth D, Dahan A, Heller H, Hershko J, Luca FC, are generated during development and arise from the
Ruderman JV, Hershko A (1995) The cyclosome, a large totipotent fertilized egg. The process that generates
complex containing cyclin-selective ubiquitin ligase activity,
distinct cell types is known as cellular differentiation.
targets cyclins for destruction at the end of mitosis. Mol Biol
Cell 6(2):185–197 The overwhelming majority of cells in a mature mam-
mal are somatic cells, which are terminally differenti-
ated and no longer proliferate. Mammals maintain
pluripotent stem cells that drive tissue regeneration
and wound healing.
Cell Cycle of Mammalian Cells The cell cycle of a proliferating mammalian cell
consists of four phases G1, S, G2, and M. Mitosis (M)
Gerard J. Ostheimer is the ordered series of events that result in the replicated
Department of Agriculture, Washington, DC, USA chromosomes being physically separated, i.e., segre-
gated, and cytokinesis in which two daughter cells
with equal amounts of genetic material are produced.
Synonyms Replication occurs during the synthesis (S) phase. G1 is
the gap period between M and S phase and G2 is the gap
Cell division; Cell growth; Proliferation period between S and M phase. Actively proliferating
C 304 Cell Cycle of Mammalian Cells

mammalian cells, such as cancer cells in tissue culture, oncogene addiction (link to article by Frick et al.).
move through the cell cycle in 12–24 h. Cells with the Nutrient availability, a prerequisite for growth and
potential to proliferate but which are not proliferating proliferation, is communicated via the mTor pathway
are in the G0 phase of the cell cycle and are described (Hay and Sonenberg 2004). The EGFR signaling path-
as quiescent. As in yeast, progression through the way has received considerable attention from systems
cell cycle is driven and regulated by cyclin– and computational biologists and highly detailed mass-
cyclin-dependent kinase (CDK) pairs responsible for action kinetic models have been published (Chen et al.
promoting and regulating each of the four transitions 2009). These models are being used to generate molec-
G0-G1, G1-S, S-G2, G2-M. Mammalian cells have a ular targets for anticancer therapeutics (Schoeberl et al.
larger and more diverse set of cyclins than yeast and 2009).
these cyclins allow for tissue and context-specific
regulation of the cell cycle (see article by Malumbres). Cell Cycle Quality Control
For proliferation to succeed both daughter cells must
Cell Cycle Entry receive accurate and complete copies of the genome.
It is extremely difficult to monitor cell growth and Replicating the genome by unwinding, transcribing,
proliferation in vivo. As such, much of our understand- and accurately segregating chromosomes (S phase)
ing of the mammalian cell cycle is from experiments causes numerous sites of DNA damage such as single
performed in tissue culture where it is possible to and double strand breaks. In addition, there are numer-
control cellular exposure to many of the variables ous additional sources of DNA damage including free
that determine whether cells proliferate including radicals from the cell’s own metabolism and environ-
exposure to growth factors and nutrient availability. mental sources of damaging agents such as back-
Removal of nutrients and growth factors, aka serum ground radiation. DNA damage is detected and
starvation, drives many immortal cell lines out of the responded to by an extensive molecular machinery of
cell cycle and into quiescence (G0). Quiescence is proteins that recognize DNA damage and signal its
characterized by hypophosphorylated Rb that binds presence and extent to the cell and DNA repair
E2F1 thereby blocking transcription of genes essential machinery. In mammalian cells the protein p53
or cell cycle entry. Addition of nutrients and growth functions as a signal integration node coupling DNA
factors activates signal transduction pathways that damage to cell cycle arrest, senescence, and apoptosis.
upregulates gene expression that results in phosphory- In addition, the signaling kinases of the DNA damage
lation of Rb and causes Rb to no longer bind E2F1. response, Chk1 and Chk2, directly couple the DNA
Released E2F1 upregulates the expression of Cyclin D damage response to the cell cycle machinery. For more
and Cdk1, and this cyclin-CDK pair regulates the details on how DNA damage is coupled to the cell
transition from G0 to G1. Once cells have passed the cycle, see Cell cycle signaling, DNA damage by
restriction point (link to article by You et al.) then they Toettcher in this volume.
move into S and are committed to moving through the Cell fate after DNA damages is a function of the
entire cell cycle even if serum starved again. amount of damage, the dynamics of the damage, and
A complex set of signal transduction pathways interplay with extracellular cues. Low levels of tran-
communicates extracellular cues to the cell cycle sient DNA damage induce a temporary cell cycle arrest
machinery. A canonical example is the extracellular that permits repair. In the G2 phase of the cell cycle the
growth factor (EGF) pathway. EGF binds its receptor, cell possesses two copies of each chromosome, which
EGFR, activating the mitogen-activated protein kinase allows the cell to correct damage generated in S phase
(MAPK aka ERK) pathway, which in turn activates through coupled homologous recombination and
transcriptional programs (via c-Myc and CREB) that repair. High and/or persistent levels of DNA damage
induce cell cycle entry. The Akt pathway also trans- signal through p53 to induce either senescence or apo-
duces extracellular growth factor signal into cell cycle ptosis. Differentiated mammalian cells do not express
entry. Many components of the MAPK and Akt path- telomerase and as such each round of replication trun-
ways are oncogenes. Mutations in these proteins can cates their telomeres. Telomeres completely eroded by
drive oncogenesis, and cancer cells often exhibit excessive rounds of replication are recognized by the
Cell Cycle Ontology (CCO) 305 C
DNA damage machinery which induces a persistent of ErbB signaling pathways as revealed by a mass action
DNA damage response that drives the cell into a state model trained against dynamic data. Mol Syst Biol 5:239
Hay N, Sonenberg N (2004) Upstream and downstream of
of permanent cell cycle referred to as replicative senes- mTOR. Genes Dev 18:1926–45
cence. These cellular responses are unique departures Schoeberl B, Pace EA, Fitzgerald JB, Harms BD, Xu L, Nie L,
from the mammalian cell cycle and function as barriers Linggi B, Kalra A, Paragas V, Bukhalid R, Grantcharova V,
to tumorigenesis (Bartkova et al. 2006). Interestingly, Kohli N, West KA, Leszczyniecka M, Feldhaus MJ,
Kudla AJ, Nielsen UB (2009) Therapeutically targeting
cell fate determination after DNA damage is strongly ErbB3: a key node in ligand-induced activation of the ErbB C
influenced by extracellular cues, such as cytokines and receptor-PI3K axis. Sci Signal 2:ra31
growth factors, implying that a multivariate systems
approach is necessary for understanding how cells
couple extracellular signal transduction to the DNA
damage response and the cell cycle machinery to
commit to these cell fates. Cell Cycle Ontology (CCO)

Erick Antezana, Vladimir Mironov and Martin Kuiper


Department of Biology, Systems Biology Group,
Cross-References
Norwegian University of Science and Technology,
Trondheim, Norway
▶ Apoptosis
▶ Cdk Inhibitors
▶ Cell Cycle
Synonyms
▶ Cell Cycle Checkpoints
▶ Cell Cycle Model Analysis, Bifurcation Theory
Application ontology; Domain ontology
▶ Cell Cycle Modeling, Differential Equation
▶ Cell Cycle Models, Sensitivity Analysis
▶ Cell Cycle Transitions, G2/M
Definition
▶ Cell Cycle Transitions, Mitotic Exit
▶ Cell Cycle, Budding Yeast
Application ontologies are engineered for a specific
▶ Cell Cycle, Cancer Cell Cycle and Oncogene
research area or domain. They often use canonical
Addiction
ontologies to construct ontological classes and rela-
▶ Cell Cycle, Fission Yeast
tionships between classes. Ontologies (▶ Ontology)
▶ Cytokines
are a means by which knowledge is captured according
▶ DNA Repair
to a common conceptualization of a domain, and they
▶ DNA Replication
are commonly used within the molecular biology
▶ Endoreplication
subdiscipline of the life sciences (Stevens and Lord
▶ Mitosis
2008). The cell cycle ontology (CCO, Antezana et al.
▶ Mitotic Kinases
2009) is an application ontology, developed to inte-
▶ Senescence
grate knowledge of the eukaryotic cell cycle process
▶ Signal Transduction Pathway
(▶ Cell Cycle). It facilitates scientific discovery in the
▶ Transcription
area of cell cycle research.

References Characteristics

Bartkova J et al (2006) Oncogene-induced senescence is part of Requirements of CCO


the tumorigenesis barrier imposed by DNA damage
checkpoints. Nature 444:633–637
CCO brings together a variety of knowledge from the
Chen WW, Schoeberl B, Jasper PJ, Niepel M, Nielsen UB, cell cycle domain, following a standard ontology-
Lauffenburger DA, Sorger PK (2009) Input-output behavior based integration paradigm. The knowledge in this
C 306 Cell Cycle Ontology (CCO)

domain is constantly evolving, and covers a variety of 7. Biological species relationships as described by the
organisms. This poses a range of requirements: NCBI taxonomy (▶ NCBI BioProject Genome
1. CCO integrates cell cycle knowledge across Resources)
several model organisms: H. sapiens, S. cerevisiae, 8. Protein-protein interactions involving cell
S. pombe, and A. thaliana. cycle proteins from the IntACT database
2. It presents a common conceptual view of this (▶ Protein–Protein Interaction Databases)
knowledge, permitting sophisticated querying in
a variety of forms, across cell cycle knowledge. Core Concepts of CCO
3. CCO accommodates frequent changes in data, by Protein, Modified protein, Gene, Molecular interac-
frequent rebuilds through an automated pipeline. tion, Molecular function, Biological process, Cellular
4. CCO performs automatic mapping of biological component, and Organism. All these entities in the
entities. CCO (proteins – including their modified forms,
genes, interactions, etc.) are modeled as classes since
Contents of CCO they gather shared commonalities that are present in all
The cell cycle ontology covers the following topics: the individuals (instances) they represent (Fig. 1).
1. The cell cycle proteins
2. The genes coding for cell cycle proteins Upper Level Ontology
3. The molecular function of cell cycle proteins An upper level ontology (ULO) connects the inte-
4. The cellular localization of cell cycle proteins grated resources. A ULO is an ontology that structures
5. Cellular processes associated with cell cycle general types of concepts (such as a process) in generic
proteins as well as specific domains to provide an integration
6. Post-translational modifications of cell cycle scaffold for including other ontologies. The CCO-
proteins ULO is based on the basic formal ontology (BFO,
7. Protein-protein interactions among cell cycle Grenon et al. 2004) to ensure interoperability of the
proteins CCO with ontologies from OBO. The CCO-ULO has
8. Phylogenetic information for the supported organ- been customized for the CCO by inclusion of a few
isms through orthology relationships among cell high-level concepts, such as “cell cycle protein” (see
cycle proteins Fig. 2).
The cell cycle ontology integrates the following
resources: Construction of CCO
1. Genes, proteins, and their post-translational modi- The cell cycle ontology is constructed in several steps:
fications from the UniProt database 1. The process starts by selecting portions from vari-
2. The cell cycle, cell division, cell proliferation, DNA ous OBO source ontologies. This “pre-cell cycle
replication branches of the gene ontology biological ontology” constitutes a backbone for the complete
process branch (▶ Gene Ontology) CCO and the four species-specific sub-ontologies
3. The complete molecular function branch from the thereof.
gene ontology 2. The OBO relations ontology is fully incorporated,
4. The complete cellular component branch from the together with the “interaction type” branch from the
gene ontology molecular interaction ontology from OBO.
5. The relation ontology provided by the Open Bio- 3. A specific taxonomy is built on the basis of
medical Ontologies Consortium (OBO) (Smith the NCBI taxonomy for H. sapiens, S. cerevisiae,
et al. 2007) S. pombe, and A. thaliana.
6. The gene ontology annotations (Camon et al. 2004) 4. Protein “hubs” that connect their relevant annota-
of UniProt proteins for the cell cycle (these are the tion data (such as the molecular functions in
attachments of GO Cellular Component concepts to which they participate) are generated. All proteins
protein to describe their location); GO biological described with cell cycle concepts in gene
process and molecular function concepts to ontology annotation (GOA) files (Camon et al.
describe their functionality 2004) are named “core cell cycle proteins.”
Cell Cycle Ontology (CCO) 307 C
core cell G1/S
Saccharomyces transition of
cycle SWI4_yeast
cerevisiae mitotic cell
nucleus protein CCO:G0002318
organism CCO:C0000252 cycle
CCO:B0000000
CCO:T0000016 CCO:P0000012

participates_in
derives_from Orthology
located_in encoded_by
cluster1517 C
is_a protein
swi4-ssa1 CO:O0001289
physical
interaction
CCO:I0002887 is_a
SWI4_YEAST-
participates_in Phosphoserine
SWI4_YEAST transforms_into 159
CCO:B0000111 CCO:B0009551

participates_in transforms_into
swi6-mpg1
physical SWI4_YEAST-
interaction Phosphoserine
CCO:I0003305 806
CCO:B0009552
transforms_into
participates_in transforms_into

participates_in
ho-491 SWI4_YEAST-
physical Phosphoserine
interaction 1003
CCO:I0005128 CCO:B0009553
swi4-2 SWI4_YEAST-
physical Phosphoserine
interaction 1007
CCO:I0005527 CCO:B0009554

Cell Cycle Ontology (CCO), Fig. 1 Protein-centric knowledge in CCO. The local neighborhood of the cell cycle protein wee1
(human) is shown. Entities (color coded) and relationships (arrows) illustrate the types of knowledge in CCO

These proteins are added to the CCO as the chil- (Li et al. 2003). This tool is used to infer evolution-
dren of the concept core cell cycle protein (CCO: ary relationships between classes of proteins. The
B0000000) and used as seeds for the data integra- proteins from the clusters containing at least one
tion process. core cell cycle protein are added to the CCO as
5. The “core cell cycle” protein section of the CCO is subclasses of the class cell cycle protein. All the
expanded to include the proteins known to interact imported entries from the sources are cross-
with the core cell cycle proteins, as documented in referenced so that the data can be traced back to
IntACT (Kerrien et al. 2007). These new protein its source.
classes are included as subclasses of the class cell 7. The four organism-specific ontologies and the com-
cycle protein. Then, protein information (such as posite cell cycle ontology are checked and made
synonym names, encoding genes, and cross- available producing the official release of the
references) is fetched from the UniProt knowledge system.
base. In addition, post-translational modification 8. Now that the CCO is available with all the knowl-
data, when available in UniProt, are also added by edge in place, semantic enrichment can occur. We
creating new concepts defined by their specific use the Ontology PreProcessor Language (OPPL,
modification. http://oppl2.sourceforge.net/) to further transform
6. Clusters of putative orthologs of the four species the CCO with application of ontology design
are generated with the OrthoMCL clustering utility patterns.
C

Data from
308

cell cycle
Data from GO UniProt
(GO)

Data from
biological OrthoMCL
perdurant
protein
Data from MI is_a
is_a modified
is_a protein
is_a is_a cell cycle
biological modified
entity gene product is_a protein
cell cycle
protein

is_a is_a is_a


cell cycle
continuant
is_a is_a
biological cell cycle
is_a gene is_a
endurant gene

is_a
cellular S. cerevisiae
is_a Data from GO
component

S. pombe
Data from
Organism
NCBI
A. thaliana

H. sapiens

Cell Cycle Ontology (CCO), Fig. 2 Upper level ontology (ULO) for the CCO. The ULO is a hierarchical scaffold including generic concepts (e.g., cell cycle gene) which connects
the integrated resources. The dashed rectangles represent the type of data residing below the parental concepts: the node labeled “data from GO” shows where the concept from
GO’s cellular component ontology goes under the concept cellular component (e.g., nucleus), the node “data from UniProt” under the concept protein shows where protein data from
UniProt resides, and so forth
Cell Cycle Ontology (CCO)
Cell Cycle Progression in Bacteria 309 C
Cell Cycle Ontology (CCO), Table 1 Numbers of classes, ▶ NCBI BioProject Genome Resources
breadth, and depth of the CCO hierarchy ▶ Ontology
Structure and cohesion of CCO ▶ Protein–Protein Interaction Databases
Metric Asserted CCO
No. of classes 89,532
No. of leaf classes 85,235
Max. depth 33 References C
Mean depth 15.35
Mean number of subclasses per 4.61 Antezana E, Egaña M, Blondé W, Illarramendi A, Bilbao I,
class De Baets B, Stevens R, Mironov V, Kuiper M (2009) The
cell cycle ontology: an application ontology for the represen-
Max. number of subclasses 24,021
tation and integrated analysis of the cell cycle process.
No. of properties 52 Genome Biol 10:R58
Property with maximum usage has_source used 65 110 times Bug WJ, Ascoli GA, Grethe JS, Gupta A, Fennema-Notestine C,
Mean usage per property 6,673.69 Laird AR, Larson SD, Rubin D, Shepherd GM, Turner JA,
Mean property usage per class 3.87 Martone ME (2008) The NIFSTD and BIRNLex vocabular-
ies: building comprehensive ontologies for neuroscience.
Neuroinformatics 6:175–194
Camon E, Magrane M, Barrell D, Lee V, Dimmer E, Maslen J,
Binns D, Harte N, Lopez R, Apweiler R (2004) The gene
9. Finally, validation and verification processes are ontology annotation (GOA) database: sharing knowledge in
carried out while building the CCO to ensure its uniprot with gene ontology. Nucleic Acids Res 32(Database
issue):D262–266
soundness. The ontologies generated in the previ-
Grenon P, Smith B, Goldberg L (2004) Biodynamic ontology:
ous phase are manually and automatically checked applying BFO in the biomedical domain. Stud Health
by ontology editors, validators, and reasoners. In Technol Inform 102:20–38
addition, the pipeline log execution files are Kerrien S, Alam-Faruque Y, Aranda B, Bancarz I, Bridge A,
Derow C, Dimmer E, Feuermann M, Friedrichsen A, Huntley
inspected in detail. These files are sufficiently
R, Kohler C, Khadake J, Leroy C, Liban A, Lieftink C,
detailed to point out any possible problem. This Montecchi-Palazzi L, Orchard S, Risse J, Robbe K,
has allowed us to assemble a fully automated pipe- Roechert B, Thorneycroft D, Zhang Y, Apweiler R,
line that uploads the ontologies and their exports in Hermjakob H (2007) IntAct - open source resource for
molecular interaction data. Nucleic Acids Res 35(Database
the different formats to the CCO website and to all
issue):D561–565
related and supporting repositories (Table1). Li L, Stoeckert CJ Jr, Roos DS (2003) OrthoMCL: identification
of ortholog groups for eukaryotic genomes. Genome Res
Examples of Other Application Ontologies 13:2178–2189
Malone J, Holloway E, Adamusiak T, Kapushesky M, Zheng J,
The experimental factor ontology (EFO, Malone et al.
Kolesnikov N, Zhukova A, Brazma A, Parkinson H (2010)
2010), developed by the European Bioinformatics Modeling sample variables with an experimental factor
Institute (EBI), represents sample variables from ontology. Bioinformatics 26:1112–1118
gene expression experimental data. The NIFSTD Smith B, Ashburner M, Rosse C, Bard J, Bug W, Ceusters W,
Goldberg LJ, Eilbeck K, Ireland A, Mungall CJ, OBI Con-
ontology, developed by the NeuroInformatics Frame-
sortium, Leontis N, Rocca-Serra P, Ruttenberg A, Sansone
work (NIF, Bug et al. 2008), has produced an inventory SA, Scheuermann RH, Shah N, Whetzel PL, Lewis S (2007)
of Web-based neuroscience resources, integrating an The OBO foundry: coordinated evolution of ontologies to
ontology with separate modules covering major support biomedical data integration. Nat Biotechnol
25:1251–1255
domains of neuroscience: anatomy, cell, subcellular,
Stevens R, Lord P (2008) Application of ontologies in bioinfor-
molecule, function and dysfunction, and concepts for matics. In: Staab S, Studer R (eds) Handbook on ontologies
describing experimental techniques, instruments, and in information systems. Springer, Berlin
other resources.

Cross-References
Cell Cycle Progression in Bacteria
▶ Cell Cycle
▶ Gene Ontology ▶ Cell Cycle, Prokaryotes
C 310 Cell Cycle Regulation, microRNAs

Characteristics
Cell Cycle Regulation, microRNAs
Cell Cycle Genes Directly Regulated by microRNAs
Baltazar D. Aguda1 and Gregory Riddick2 Shown in Fig. 1 are some of the experimentally
1
Mathematical Biosciences Institute, Ohio State validated microRNAs targeting key cell cycle genes
University, Columbus, OH, USA involved in the early phases (G1 and S) of the mam-
2
Neuro-Oncology Branch, National Cancer Institute, malian cell cycle (Chivukula and Mendell 2008;
Bethesda, MD, USA Yu et al. 2010). Myc and E2F are transcription
factors that induce expression of cell cycle genes
such as cyclins (e.g., cyclin E) and cyclin-dependent
Synonyms kinases (e.g., CDK4). Note that Myc and E2F
promote each other’s expression. As shown in
Post-transcriptional regulation of gene expression Fig. 1, these transcription factors also induce the
expression of miR-17/20 (which is the abbreviated
name for two microRNAs, namely, miR-17-5p and
Definition miR-20a). In return, miR-17/20 target and inhibit the
expression of Myc and E2F, thus forming negative
MicroRNAs are short (~21 nucleotides) non-coding
RNAs that are endogenous in the genomes of ani-
mals and plants, and have been shown to induce p16 24-1
cleavage or suppression of translation of their
target mRNAs. Thus, microRNAs are generally let-7 15a/16
17/20
let-7 Myc
known as post-translational negative regulators of 34a cycD CDK4,6 34a
let-7
gene expression, although there are a few recent 17 / 20 15a / 16
reports that some microRNAs act as positive regu-
lators. The mRNA target is recognized through
binding of a seed sequence in the microRNA to 17/20 Rb 17/20
E2F
a complementary sequence in the mRNA. Using 34a

this sequence complementarity, several computer


programs have been created to predict microRNA
targets. Some examples of the more popular pro-
grams publicly available on the Internet are the
following:
TargetScan (http://www.targetscan.org/) p27Kip1 CDK2 cycE cdc25A

Microcosm (http://www.ebi.ac.uk/enright-srv/micro- 221 / 222 34a 15a/16 let-7 15a/16


cosm/htdocs/targets)
mirWalk (http://www.ma.uni-heidelberg.de/apps/zmf/ 17 p21Cip1 93
mirwalk/).
It has been estimated that the expression of at least Cell Cycle Regulation, microRNAs, Fig. 1 MicroRNAs
a third of human genes is post-transcriptionally regu- targeting key cell cycle genes in G1 and S phases. Cell cycle
lated by microRNAs. To date, there are more than 700 genes are shown in boxes. Gene abbreviations: cycD (cyclin D),
cycE (cyclin E), Myc (c-myc), Rb (Retinoblastoma protein), E2F
microRNAs identified in the human genome. The com- (E2F1, E2F2, E2F3). Small black squares indicate microRNAs
plexity of the regulation of gene expression by targeting the genes beside them. Labels beside black squares
microRNAs stems from the possibility of multiple give the shortened names of the microRNAs (e.g., 17/20 means
targets per microRNA (in some cases, a single miR-17 and miR-20). Dashed arrows mean transcriptional
induction of gene expression (e.g., cycD, Myc, and E2F induce
microRNA is predicted to target up to several hundred the expressions of miR-17, and miR-20). Qualitative interactions
mRNAs) and the possibility of multiple microRNAs (edges between genes) are symbolized by hammerheads (inhibi-
targeting per gene. tion) and arrows (activation)
Cell Cycle Signaling, Hypoxia 311 C
a

Myc E2F1 E2F3


p

E2F2 1 C

2
m

17-5p 17-3p 18a 19a 20a 19b 92-1

Cell Cycle Regulation, microRNAs, Fig. 2 Modeling the neg- the row of seven microRNAs at the bottom of the figure. Two
ative feedback loop between miR-17/20 and Myc/E2F. (a) members of the cluster, miR-17-5p and miR-20a, target and
Shown inside the gray box are the proteins Myc, E2F1, E2F2, inhibit Myc and E2F1-3. (b) A reduced model involving an
and E2F3 inducing each other’s expression. All of these proteins autocatalytic protein module, p, and a microRNA module, m
also induce the expression of the miR-17-92 cluster pictured by

feedback loops. A third negative feedback shown References


in Fig. 1 is between miR-17/20 and cyclin D.
Aguda BD, Kim Y, Piper-Hunter MG, Friedman A, Marsh CB
(2008) MicroRNA regulation of a cancer network: conse-
Mathematical Modeling of the Negative Feedback
quences of the feedback loops involving miR-17-92, E2F,
Loops Involving miR-17/20 and Myc/E2F and Myc. Proc Natl Acad Sci USA 105(50):19678–19683
The microRNA cluster called miR-17-92 is composed Chivukula RR, Mendell JT (2008) Circular reasoning:
of seven microRNAs as shown in the bottom of Fig. 2a. microRNAs and cell-cycle control. Trends Biochem Sci
33(10):474–481
Members of this cluster, particularly miR-17-5p and
Yu Z, Baserga R, Chen L, Wang C, Lisanti MP, Pestell RG
miR-20a, have been implicated in various human (2010) microRNA, cell cycle, and human breast cancer.
cancers – that is, their specific deletions in the genome Am J Pathol 176(3):1058–1064
or overexpressions are associated with cancer initia-
tion (Chivukula and Mendell 2008). The negative
feedback loop between miR-17/20 and Myc/E2F was
modeled by Aguda et al. (2008) to explore how Cell Cycle Signaling, Hypoxia
changes in miR-17/20 expression affect Myc/E2F acti-
vation (and thereby the rate of G1-to-S transition in the Bernhard Schmierer
cell cycle). The detailed network shown in Fig. 2a was Department of Biochemistry, Oxford Centre for
abstracted to the two-variable model shown in Fig. 2b. Integrative Systems Biology (OCISB), University of
This simple model predicts that there is an all-or-none Oxford, Oxford, UK
switching behavior of Myc/E2F activity as miR-17/20
expression is increased. This bistable switch is gener-
ated by the autocatalytic nature of the protein module, Definition
p, as indicated by step 1 in Fig. 2b. The model further
predicts that oscillations in the activities of p and m are Hypoxia refers to a state of inadequate oxygen supply in
possible due to the presence of the negative feedback cells, tissues, or organisms. Among many other cellular
loop between p and m. processes, hypoxia affects cell cycle progression.
C 312 Cell Cycle Signaling, Hypoxia

Characteristics PHD-dependent hydroxylation reaction, HIFa escapes


degradation and accumulates. Binding of HIFa to
Oxygen is the terminal electron acceptor of the mito- HIFb, which is constitutively present and identical to
chondrial respiratory chain, and an adequate oxygen the aryl hydrocarbon receptor nuclear translocator
supply is a fundamental requirement for the survival (ARNT) then yields the transcriptionally active,
and proper function of all metazoan cells (Semenza heterodimeric transcription factor. This mechanism is
2007). Higher animals have evolved sophisticated evolutionarily conserved in all metazoans and has been
organ systems such as the cardiovascular and respiratory put forward as the primary cellular oxygen-sensing
systems to supply all body tissues with sufficient mechanism (Kaelin and Ratcliffe 2008).
amounts of oxygen at all times. Disruption of adequate In most primary and also immortalized cell lines,
oxygen supply leads to tissue hypoxia, a condition which one of the functional consequences of the HIF-induced
features prominently in pathophysiological contexts transcriptional program is the inhibition of cell prolif-
such as anemia, ▶ ischemic disease, and tumor forma- eration, caused by a cell cycle arrest. The point in the
tion. However, hypoxic episodes may also be encoun- cell cycle at which the arrest occurs seems to depend
tered in physiological contexts such as embryonic on the severity of the hypoxic stress. Broadly speaking,
development, exercise, or altitude adaptation. At the moderate hypoxia leads to an arrest in the G1 phase of
cellular level, the intracellular oxygen tension is contin- the cell cycle, which is mostly reversible upon
uously monitored. Many cell types are able to adapt to ▶ reoxygenation. Severe hypoxia or anoxia can addi-
hypoxic conditions by systemic strategies such as meta- tionally arrest cells in S-phase as a consequence of the
bolic reprogramming and energy preservation. The for- hypoxic activation of the DNA damage checkpoint.
mer strategy is exemplified by the upregulation of the Interestingly, this occurs in the absence of any detectable
glycolytic pathway and by the shutdown of mitochon- DNA damage. Finally, reoxygenation after stringent hyp-
drial respiration; the latter strategy includes the inhibi- oxia can cause DNA single- and double-strand breaks via
tion of protein translation, cell growth, and cell the formation of reactive oxygen species, leading to a G2
proliferation. By secreting pro-angiogenic growth fac- arrest downstream of the DNA damage checkpoint. To
tors, cells signal their hypoxic status to the surrounding date, insight into mechanistic aspects of the link between
tissue to induce blood vessel formation and to restore the hypoxia and reduced cell cycle progression is limited, as
oxygen supply to the hypoxic area. In cases where these is the understanding of its functional consequences. The
adaptive strategies fail, prolonged hypoxia can lead to following section briefly reviews the available informa-
either cell necrosis or programmed cell death. tion and attempts to connect isolated observations into
A substantial proportion of the cellular adaptation a more coherent picture (Fig. 1).
to hypoxia in animals is regulated by the The best characterized effect of hypoxia on the
ab-heterodimeric transcription factor hypoxia- regulatory network controlling cell cycle progression
inducible factor (HIF), which is thought to be is the HIF-dependent transcriptional induction of the
a master regulator of the hypoxic response (Semenza cyclin-dependent kinase inhibitors (CDKIs) (▶ CDK
1999). HIF is activated by low oxygen tensions. Inhibitors) p21 (also known as CDKN1A) and p27
Briefly, in normoxic conditions, the HIFa-subunit is (also known as CDKN1B). In the G1 phase of the
highly unstable owing to its post-translational hydrox- cell cycle, these CDKIs inhibit CDK2/Cyclin E, and
ylation, which is catalyzed by the prolyl-hydroxylase prevent the phosphorylation of the Retinoblastoma
domain (PHD or EGLN) family of oxygen-dependent protein and thus the release of the E2F-transcription
dioxygenases (Schofield and Ratcliffe 2004). Hydrox- factors, arresting the cells at the restriction point
ylation of two proline residues in HIFa by these (▶ Cell Cycle Transition, Principles of Restriction
enzymes very substantially increases the interaction Point; ▶ Cell Cycle Transition, Detailed Regulation
of HIFa with the tumor suppressor protein von Hippel of Restriction Point) in the G1 phase of the cell cycle
Lindau (▶ PVHL), the targeting component of an (Morgan 2007). The restriction point is the point in G1
E3-ubiquitin ligase complex that triggers rapid after which mammalian cells are committed to com-
proteasomal degradation of HIFa. In hypoxic condi- pleting their division cycle, even in the absence of
tions, when oxygen becomes limiting for the growth factors. Mechanistically, HIF is thought to
Cell Cycle Signaling, Hypoxia 313 C
Hypoxia Anoxia cycle proceeds through the restriction point. In addi-
tion to this direct regulation of the cell cycle machinery
p21 by HIF, a number of other HIF-dependent, but less
HIF p27 (1) Stalled
Replication direct, effects likely contribute to this G1 arrest. Exam-
ples are a reduced protein translation, slowed cellular
Cdc25A CDK2 / CycE
mass gain, and extensive metabolic reprogramming,
ATR
processes which, at least in part, have been attributed C
pRb E2F (2) to the HIF/Myc antagonism.
Chk1 In addition to the reversible G1 arrest observed in
response to moderate hypoxia, very stringent hypoxia
G1
START
Cdc25A or ▶ anoxia additionally activates components of the
DNA damage checkpoint. This occurs in the absence
of any detectable DNA damage and is possibly a DNA
Replication
Origin Firing damage-independent process, most likely triggered by
a hypoxia-induced replication arrest. Among other
M S possible mechanisms, stringent hypoxia has the poten-
tial to cause a DNA replication arrest by HIF-
dependent repression of the gene CAD, which codes
G2
CDK1 / CycB Reoxygenation for multiple enzymatic activities that are required for
pyrimidine biosynthesis (Gordan et al. 2007),
(3)
DNA a metabolic pathway essential for DNA replication.
Cdc25C Chk2 ATM Damage ROS Interestingly, another enzyme required for pyrimidine
Cell Cycle Signaling, Hypoxia, Fig. 1 Cartoon representation biosynthesis is strictly oxygen dependent (Loffler
of the signaling pathways causing cell cycle arrest at different 1989). Thus, repression of this biosynthetic pathway
points in the cell cycle (red circles) in response to hypoxia and has the potential to contribute to a replication arrest in
reoxygenation. (1) Moderate hypoxia arrests cells at the restric- response to severe hypoxia in both a HIF-dependent
tion point (START) of the G1-phase in a HIF-dependent manner.
pRB indicates the phosphorylated Retinoblastoma protein. and a HIF-independent manner. Stalled replication
(2) Severe hypoxia or anoxic conditions additionally cause an forks then trigger the activation of the kinase ATR,
arrest in S-phase. This is most likely caused by an inhibition of which causes an S-phase or replication arrest in
replication origin firing downstream of a signaling cascade trig- response to stringent hypoxia. This arrest is accom-
gered by stalled replication forks, which themselves may be
a consequence of insufficient metabolite concentrations or an plished by the prevention of replication origin firing by
unfavorable energy charge. (3) Finally, reoxygenation after CHK1-dependent inhibition of CDC25A and the con-
severe hypoxia produces significant levels of reactive oxygen sequent high activity of cyclin/CDK2 complexes,
species (ROS), which can lead to DNA damage and activation of which is required for the initiation of replication
the DNA damage checkpoint. This prevents entry into mitosis
and arrests cells in the G2-phase. For details, see text forks (Morgan 2007). Signaling downstream of stalled
replication can also induce apoptosis by activating p53
(Hammond et al. 2007). The evolutionary advantage of
upregulate both the CDKN1A and CDKN1B genes by an activation of the DNA damage checkpoint under
counteracting ▶ c-Myc, which normally represses hypoxic conditions is disputed; however, it has been
these genes (Dang et al. 2008). Conversely, the phos- speculated that this occurs in anticipation of the DNA
phatase CDC25A, an activator of CDK/cyclin com- damage caused by the production of reactive oxygen
plexes and thus a positive regulator of cell cycle species (ROS) during reoxygenation. Such
progression, is normally activated by Myc. In hypoxic reoxygenation-induced DNA damage triggers ATM-
conditions, HIF represses CDC25A by counteracting dependent CHK2 activation, and phosphorylation and
Myc (Huang 2008). All these regulatory events con- inactivation of the phosphatase CDC25C, which nor-
tribute to a reversible G1 arrest in hypoxia: Upon mally allows mitotic entry by activating CDK1/CycB
restoration of sufficiently high oxygen levels, HIFa is complexes. In this way, ROS ultimately arrest cells in
rapidly degraded, its effects are abolished, and the cell the G2 phase of the cell cycle (Hammond et al. 2007).
C 314 Cell Cycle Signaling, Metabolic Pathway

In summary, insufficient oxygen levels block cell


cycle progression in distinct phases of the cycle by var- Cell Cycle Signaling, Metabolic Pathway
ious mechanisms. The consequences on cell fate depend
on the cell type and the degree and duration of the Fabian Rudolf and Joerg Stelling
hypoxic stress, ranging from a perfectly reversible, tran- Department of Biosystems Science and Engineering,
sient arrest to the induction of apoptotic cell death. ETH Zurich, Basel, Switzerland

Cross-References Definition

▶ Anoxia Cell proliferation requires growth and division.


▶ Cdk Inhibitors Growth is connected to cell cycle progression on
▶ Cell Cycle Arrest After DNA Damage multiple levels, such as the metabolic state, nutrient
▶ Cell Cycle Checkpoints signaling programs, and cell homeostasis. An illustra-
▶ Cell Cycle of Mammalian Cells tion of this linkage is the Warburg effect. Cancer cells
▶ Cell Cycle Signaling, Metabolic Pathway propagate by exclusively using an anaerobic metabolic
▶ Cell Cycle Transition, Principles of Restriction program. Forcing them to an aerobic program slows
Point down the cell cycle and thereby inhibits cancer
▶ Cell Cycle, Cancer Cell Cycle and Oncogene progression (Hsu and Sabatini 2008). This effect is
Addiction not only used for diagnostic purposes, but also
▶ c-Myc as a cancer treatment.
▶ Ischemic Disease Cells seem to segregate their metabolic activity to
▶ PVHL different phases in the cell cycle. S. cerevisiae cells
▶ Reoxygenation grown under carbon limitation in a chemostat can
show robust oscillations in their oxygen consumption
as they alternate between glycolytic and respiratory
References metabolism, and their cell cycle is synchronized to
these oscillations (Tu et al. 2005). The metabolic cycle
Dang CV, Kim JW, Gao P, Yustein J (2008) The interplay can be divided into three major phases: oxidative,
between MYC and HIF in cancer. Nat Rev Cancer reductive/building, and reductive/charging, where
8(1):51–56
S phase of the cell cycle is restricted to the reductive
Gordan JD, Thompson CB, Simon MC (2007) HIF and c-Myc:
sibling rivals for control of cancer cell metabolism and phases. Interestingly, individual S. cerevisiae cells
proliferation. Cancer Cell 12(2):108–113 grown under similar conditions show the same coupling
Hammond EM, Kaufmann MR, Giaccia AJ (2007) Oxygen sens- (Silverman et al. 2010). This synchrony provides
ing and the DNA-damage response. Curr Opin Cell Biol
a unique experimental approach for further studying
19(6):680–684
Huang LE (2008) Carrot and stick: HIF-alpha engages c-Myc in the coupling of metabolism and cell cycle.
hypoxic adaptation. Cell Death Differ 15(4):672–677
Kaelin WG Jr, Ratcliffe PJ (2008) Oxygen sensing by meta-
zoans: the central role of the HIF hydroxylase pathway.
Mol Cell 30(4):393–402
Characteristics
Loffler M (1989) The biosynthetic pathway of pyrimidine
(deoxy)nucleotides: a sensor of oxygen tension necessary Nutrient Availability and Cell Cycle Progression
for maintaining cell proliferation? Exp Cell Res Metabolism can regulate the cell cycle by different
182(2):673–680
means, mostly by controlling the availability of
Morgan DO (2007) The cell cycle. Principles of control. New
Science Press Ltd., London crucial molecules. For example, the cycle arrests or
Schofield CJ, Ratcliffe PJ (2004) Oxygen sensing by HIF slows down in S phase upon depletion of any
hydroxylases. Nat Rev Mol Cell Biol 5(5):343–354 nucleotide via drugs or removal of nucleotide
Semenza GL (1999) Regulation of mammalian O2 homeostasis
biosynthesis pathways (Brauer et al. 2008). Similar
by hypoxia-inducible factor 1. Annu Rev Cell Dev Biol
15:551–578 effects are observed during M phase upon reduced
Semenza GL (2007) Life with oxygen. Science 318(5847):62–64 lipid biosynthesis. However, we have to consider
Cell Cycle Signaling, Metabolic Pathway 315 C
possible laboratory artifacts since S. cerevisiae systems. In S. cerevisiae, ribosome biogenesis,
mutants in nucleotide biosynthesis show a defective translational efficiency, and carbohydrate storage
metabolic regulation similar to the Warburg appear to be sensed. This is consistent with the
effect when the corresponding nucleotide is limiting. observation that cell cycle progression depends on
Interestingly, S. cerevisiae cells limited for any translational regulation of key cell cycle proteins.
nutrient in its basic form (P, N, S ¼ inorganic salts, In addition, S. pombe employs a length-sensing
C ¼ carbohydrates) correctly regulate metabolism and network by limiting key cell cycle inhibitors to the C
therefore sensing mechanisms link metabolism and growing tips (Moseley et al. 2009).
cell cycle progression (Brauer et al. 2008).
Nutrient signaling influences the cell cycle state at Nutrient-Controlled Signaling Pathways
multiple points. Unicellular organisms (and presum- Metabolism and the available nutrients generate
ably stem cells) measure their size before they commit signals that are received by the cell cycle, with the
to a new round of division (Fantes et al. 1975). cAMP/PKA, AMPK/Snf, and TOR pathways being
While the nature of the size signal remains unclear, the most prominent signaling pathways. It is assumed
cells are able to sense key metabolites; in bacteria, that the first two pathways are controlled by
genetic evidence suggests that cells measure their carbon sources (Santangelo 2006) while the TOR
nucleotide content (Weart et al. 2007) while simple pathway is downstream of different nitrogen sources
eukaryotes such as S. cerevisiae measure stored (Wullschleger et al. 2006). For S, the signal seems to
carbohydrates in the form of trehalose (Paalman et al. be directly linked to a misregulation of cystein and
2003). As for higher eukaryotes not much is known. methionin biosynthesis. Regulation of intracellular
Interestingly, transcriptional regulation of metabolic polyphosphate levels or direct control of cyclin
pathways seems to be coupled with the cell cycle. translation conveys P availability.
In S. cerevisiae, the P, N, and S pathways are AMP-kinase is activated by high AMP levels,
upregulated around “Start” (the equivalent of the which occur upon energy shortage. AMPK and cell
restriction point in mammalian cells) in G1 (Tu et al. cycle are linked around “Start” and overexpression of
2005). It is not clear whether this is directly required hyperactivated AMPK leads to a G1/S arrest, while
for progression, and if and how metabolite sensing loss of AMPK function results in small cells and
mechanisms link to the cell-size checkpoint. ultimately cell death (Baena-González et al. 2007).
Moreover, cells adjust the timing of the cell cycle to In higher eukaryotes, AMPK controls p53 and thereby
nutrient availability (Fantes and Nurse 1977). Often, delays the G1/S transition. In S. cerevisiae, the
the phase where the nutrient limitation is sensed is evidence is less clear, but AMPK seems to directly
affected and extended. For example, S. cerevisiae cell control the transcription factor (TF) necessary for
cycle length differs on respiratory compared to fer- B type cyclin synthesis and therefore S phase entry. It
mentative carbon sources, and the timing correlates is unclear whether this direct transcriptional effect
with the energy gained from these sources. In general, exists in higher eukaryotes.
slower mass accumulation under limiting C, S, High cAMP levels activate PKA through direct
or P sources predominantly either expands G1 binding to its regulatory domain. In contrast to
(S. cerevisiae and cultured cells from multicellular AMPK, energy shortage inactivates PKA through
organisms) or G2 (S. pombe), but effects on other a drop in cAMP. S. cerevisiae cells with defective
phases cannot be ruled out (especially in S. pombe). cAMP metabolism arrest at “Start,” and addition of
In contrast, N limitation affects all phases and the cell exogenous cAMP rescues this defect (Tamaki 2007).
cycle arrest point in S. pombe is less well defined than PKA influences the cell cycle by multiple processes:
for the other nutrients (Sveiczer et al. 2004). regulation of cyclin transcription, of translation
All those nutrients are sensed within the cell efficiency, and of protein degradation. Interpreting
cycle at the cell-size checkpoint called “Sizer.” the PKA effects is not trivial because the phenotypes
In S. cerevisiae, D. melanogaster, and human cells, are often pleiotropic and not necessarily related to cell
size control is effective at “Start,” whereas in cycle defects. Moreover, in S. cerevisiae, PKA can be
S. pombe it is in G2 before entering M phase. bypassed by deleting TFs not involved in cell cycle
The molecular details of “Sizer” are unknown in all progression. Interestingly, carbon-limited S. cerevisiae
C 316 Cell Cycle Signaling, Metabolic Pathway

cells activate PKA predominately in G1 as cAMP ▶ Model Organism


increases shortly before “Start.” This seems to be ▶ Single Cell Experiments
necessary for cell cycle progression and it is controlled ▶ Signal Transduction Pathway
by the trehalose pathway.
TOR regulates protein synthesis. While high doses
of the TOR inhibitor rapamycin lead to a phase-
References
independent cell cycle arrest, lower concentrations
and conditional TOR loss-of-function alleles arrest Baena-González E, Rolland F, Thevelein JM, Sheen J (2007)
the cycle at the respective nutrient restriction point. A central integrator of transcription networks in plant stress
Similarly, cells with low TOR activity cannot commit and energy signalling. Nature 448:938–942
Brauer MJ, Huttenhower C, Airoldi EM, Rosenstein R, Matese
to a new cycle but they still grow. While the exact
JC, Gresham D, Boer VM, Troyanskaya OG, Botstein D
molecular mechanism is unclear, data from (2008) Coordination of growth rate, cell cycle, stress
S. cerevisiae suggest that translation efficiency is response, and metabolic activity in yeast. Mol Biol Cell
sensed because overexpression of key cell cycle and 19:352–367
Csikász-Nagy A, Novák B, Tyson JJ (2008) Reverse engineering
translational regulators can compensate for deficient
models of cell cycle regulation. Adv Exp Med Biol
TOR signaling (Danaie et al. 1999). 641:88–97
Danaie P, Altmann M, Hall MN, Trachsel H, Helliwell SB
Systems Biology Approaches (1999) CLN3 expression is sufficient to restore G1-to-S-
phase progression in Saccharomyces cerevisiae mutants
Many genome-wide studies of the cell cycle were
defective in translation initiation factor eIF4E. Biochem
performed. Most of them do not specifically address J 340(Pt 1):135–141
the connection between metabolism and cell cycle, Fantes P, Nurse P (1977) Control of cell size at division in fission
except the above studies on the metabolic cycle of yeast by a growth-modulated size control over nuclear
division. Exp Cell Res 107:377–386
budding yeast that combine transcriptome and
Fantes PA, Grant WD, Pritchard RH, Sudbery PE, Wheals AE
metabolome data. (1975) The regulation of cell size and the control of mitosis.
Most computational studies of the cell cycle either J Theor Biol 50:213–244
assume a direct control of the cell cycle by key metab- Hsu PP, Sabatini DM (2008) Cancer cell metabolism: Warburg
and beyond. Cell 134:703–707
olites such as nucleotides or energy equivalents, or
Moseley JB, Mayeux A, Paoletti A, Nurse P (2009) A spatial
they include a growth-dependent effect on general gradient coordinates cell size and mitotic entry in fission
translational (Csikász-Nagy et al. 2008). While these yeast. Nature 459:857–860
approaches capture the cellular behavior, they await Paalman JWG, Verwaal R, Slofstra SH, Verkleij AJ, Boonstra J,
Verrips CT (2003) Trehalose and glycogen accumulation is
further detailing of the metabolic input. Therefore,
related to the duration of the G1 phase of Saccharomyces
key challenges are to mechanistically capture the cerevisiae. FEMS Yeast Res 3:261–268
connection between metabolic state and cell cycle as Santangelo G (2006) Glucose signaling in Saccharomyces
well as to unravel the cell-size checkpoint. cerevisiae. Microbiol Mol Biol Rev 70:253
Silverman SJ, Petti AA, Slavov N, Parsons L, Briehof R,
Thiberge SY, Zenklusen D, Gandhi SJ, Larson DR, Singer
RH et al (2010) Metabolic cycling in single yeast cells from
Cross-References unsynchronized steady-state populations limited on glucose
or phosphate. Proc Natl Acad Sci USA 107:6946–6951
Sveiczer A, Novak B, Mitchison JM (2004) Size control in
▶ Bifurcation
growing yeast and mammalian cells. Theor Biol Med
▶ Cell Cycle Model 1:12
▶ Cell Cycle Checkpoints Tamaki H (2007) Glucose-stimulated cAMP-protein kinase
▶ Cell Cycle Modeling, Differential Equation a pathway in yeast Saccharomyces cerevisiae. J Biosci
Bioeng 104:245–250
▶ Cell Cycle Transition, Detailed Regulation of
Tu BP, Kudlicki A, Rowicka M, McKnight SL (2005) Logic of
Restriction Point the yeast metabolic cycle: temporal compartmentalization of
▶ Cell Cycle, Budding Yeast cellular processes. Science 310:1152–1158
▶ Cell Cycle, Cell Size Regulation Weart RB, Lee AH, Chien AC, Haeusser DP, Hill NS, Levin PA
(2007) A metabolic sensor governing cell size in bacteria.
▶ Cell Cycle, Fission Yeast
Cell 130:335–347
▶ Disease Ontology Wullschleger S, Loewith R, Hall MN (2006) TOR signaling in
▶ DNA Replication growth and metabolism. Cell 124:471–484
Cell Cycle Signaling, Spindle Assembly Checkpoint 317 C
anaphase” signal originates at the unattached kineto-
Cell Cycle Signaling, Spindle Assembly chores. The target of the pathway is the Anaphase
Checkpoint Promoting Complex or Cyclosome (APC/C), an E3
ubiquitin ligase which ubiquitinates, thus tagging
Andrea Ciliberto1 and Jagesh Shah2 for degradation, two fundamental players of the
1
IFOM – The FIRC Institute of Molecular Oncology, metaphase-to-anaphase transition: Cyclin B and
Milan, Italy securin. The first is the mitotic cyclin, which in C
2
Department of Systems Biology, Harvard Medical complex with Cyclin Dependent Kinase 1 (CDK1)
School and Renal Division, Brigham and Women’s drives the cell cycle into mitosis. The latter is
Hospital, Boston, MA, USA a stoichiometric inhibitor of separase, a protease that
cleaves cohesin, the molecular glue holding duplicated
chromatids together, and in so doing allows them to
Definition be segregated. The degradation of the two proteins
following their ubiquitination is carried out by the
When a cell divides, each of the daughter cells receives proteasome, and is required for the exit from mitosis
the same number of chromosomes from the mother. (Cyclin B degradation) and the segregation of sister
When the process of chromosome segregation, or chromatids toward the opposite poles (securin
anaphase, goes awry daughter cells receive an unequal degradation).
number of chromosomes, a condition known as The SAC inhibits APC/C activity, and it does so
aneuploidy. Although unbalanced gene dosage is not indirectly. APC/C recognizes Cyclin B and securin
necessarily lethal, aneuploidy can seriously affect the only when it is bound to its co-activator Cdc20.
viability of cells. Moreover, it is a typical trait in cancer As a result of SAC activation, Cdc20 cannot activate
cells where aneuploidy is thought to contribute to the the APC/C, and Cyclin B and securin are stabilized.
progression to malignancy. Not surprisingly, cells From a molecular standpoint, the inhibition requires
have developed a signaling pathway that ensures sister the formation of a multiprotein complex, the Mitotic
chromatids are properly segregated to daughter cells. Checkpoint Complex or MCC, which comprises
The pathway, known as the Spindle Assembly Cdc20 and several SAC components (Mad2, Bub3,
Checkpoint (SAC), delays sister chromatid separation BubR1). Another critical complex is the one formed
until all kinetochores are attached to microtubules by Mad1 and Mad2 that localizes at the unattached
coming from the opposite spindle poles (a condition kinetochore where it catalyzes the formation of the
known as metaphase). When all kinetochores are MCC. Finally, several kinases have been implicated
attached in a bipolar manner, the checkpoint-mediated in SAC functioning: Aurora B, Mps1, and Cyclin
inhibition is lifted and cells undergo the transition from B/CDK1 itself. Thus, the current model of the check-
metaphase to anaphase. point (Fig. 1a) includes a sensor (Mad1:Mad2) which
While the SAC pathway is not essential in budding by inducing the structural rearrangement of Mad2
yeast, in mammals the checkpoint is activated during favors the formation of the Cdc20:Mad2 complex
every cell cycle, ensuring that mitosis lasts long (Musacchio and Salmon 2007), the first step to the
enough to allow proper microtubule/kinetochore formation of the MCC. It is widely assumed that the
attachment. However, it can be artificially induced MCC binds to APC/C with higher affinity than Cdc20
by drugs that either depolymerize (e.g., nocodazole) alone, and thus as long as MCC is formed it keeps APC/
or stabilize (e.g., Taxol) microtubules of the mitotic C inhibited in the APC/C:MCC complex. After the last
spindle. Genes coding for proteins of this pathway kinetochore has attached, Mad1:Mad2 is displaced from
were originally identified in budding yeast by treating the last kinetochore preventing further MCC produc-
cells with microtubule-depolymerizing drugs (Li and tion, and energy-dependent reactions drive the dissoci-
Murray 1991; Hoyt et al. 1991). Components of the ation of the MCC and the ensuing activation of APC/C
SAC were identified when mutant cells did not arrest in (Fig. 1b). Observations of SAC dynamics implicate the
response to microtubule depolymerization. presence of feedback loops, in addition to the reactions-
Consistent with the notion that the SAC monitors- described above, that give the checkpoint interesting
microtubule/kinetochore attachment, the “wait systems level properties.
C 318 Cell Cycle Signaling, Spindle Assembly Checkpoint

Cell Cycle Signaling, a


Spindle Assembly
Cdc20
Checkpoint, Fig. 1 (a)
Spindle assembly checkpoint
maintenance. Mad1:Mad2 Cohesin
localize at unattached
kinetochores (layered Mad2
structures) where they Mad2*
catalyze the conversion of Cohesin
Mad2 into an active Cdc20 Cyclin
conformation, capable of B
binding Cdc20. Cdc20:Mad2
Cohesin Mad1/
binds to the APC/C, inhibiting
Mad2 APC/C
its activity. As a consequence,
Securin
Cyclin B and securin are
stable, cohesin is not Separase
degraded, and cells are Cohesin
arrested in prometaphase.
(b) When kinetochores attach,
to microtubules Mad1:Mad2 Mad2*
are displaced, Mad2 is not Cohesin Mad2*
converted into a form capable
of fast binding to Cdc20 which Cdc20 APC/C
can now bind and activate Cdc20
APC/C. As a consequence,
Cyclin B and securin are Cdc20
degraded, and (not shown)
cohesin can be proteolysed by
separase driving sister
chromatids separation

Separase
b

Cohesin Mad2
Securin

Separase

Cohesin
Cyclin
B
Cdc20
Cohesin

APC/C
Cohesin

Cohesin
Mad1/ APC/C
Mad2
Cdc20
Cell Cycle Signaling, Spindle Assembly Checkpoint 319 C
Cell Cycle Signaling, a b
Spindle Assembly
Checkpoint, Mad2
Fig. 2 Proposed positive Mad2
feedback loops in the SAC
*
Cdc20
network. (a) The Mad2- Cdc20
template model (De Antoni Cdc20
et al. 2005) proposes that Mad2
Cdc20:Mad2 catalyzes the APC/C C
Cdc20 APC/C
formation of more Cdc20:
Mad2 complexes similarly to Mad2
* Mad2
Mad1:Mad2. (b) Activated
APC/C:Cdc20 has been
proposed to induce the
formation of more APC/C: c
Cdc20 complexes, possibly
via the ubiquitination of Cohesin
Cdc20. (c) A double-negative
feedback loop involving the Mad2
SAC and APC/C:Cdc20 has
also been suggested. The Cohesin
double negative behaves like
a positive feedback loop and
could be important for SAC Cdc20
maintenance and Cohesin Mad1/
disengagement Mad2
APC/C

Cohesin

Mad2
Cohesin *

Characteristics in checkpoint maintenance and then those proposed


to govern checkpoint inactivation (Fig. 2).
Systems Level Properties and Models Checkpoint maintenance: The fact that one
The checkpoint is characterized by three key systems unattached kinetochore can fully arrest cell cycle
level properties: (1) as long as there is one single progression for hours (systems level property (1))
unattached kinetochore, the checkpoint is active and naturally raises the question whether this is the out-
can arrest cells for several hours (Rieder et al. 1995); come of one kinetochore only, or whether it is due to
(2) as soon as the last kinetochore is attached the the additional contribution of circuits away from the
checkpoint is disengaged in a matter of few minutes, kinetochore. Given that only a few thousand SAC
and cells transit into anaphase (Clute and Pines 1999); molecules localize at the unattached kinetochore
and (3) after it is disengaged, the checkpoint can be (Howell et al. 2000), simplified models (Doncic et al.
re-activated, and its key substrates (Cyclin B and 2005) of the checkpoint came to the conclusion that
securin) stabilized when cells are treated with micro- amplification away from kinetochores is required,
tubule stabilizing drugs (Clute and Pines 1999). To a result further reinforced when the fluxes of check-
explain these properties, several positive feedback point components at the unattached kinetochores are
loops have been proposed. Although no existing model taken into account (Sear and Howard 2006). However,
has been conclusively proven, it is worth going through the nature of the amplification process still requires
the suggested loops, addressing first those involved much clarification. Based on structural data, it was
C 320 Cell Cycle Signaling, Spindle Assembly Checkpoint

proposed that the MCC itself (or better, one of its Cell Cycle Signaling, Spindle Assembly Checkpoint,
constituents, the Mad2:Cdc20 complex) is capable of Table 1 Concentrations of molecular species belonging to the
SAC pathway in HeLa cells
propagating the formation of new MCC complexes,
the so-called Mad2-template model (Fig. 2a and De Species Value Reference
Antoni et al. 2005). This idea, never confirmed in vivo, APC/C 80 nM Tang et al. (2001)
was initially criticized on the basis that such a loop 65 nM Sudakin et al. (2001)
Bub1 100 nM Howell et al. (2004)
could make the checkpoint independent from kineto-
BubR1 90 nM Tang et al. (2001)
chores and thus potentially impossible to be switched
52 nM Sudakin et al. (2001)
off, but it should be noticed that the positive feedback
127 nM Fang (2002)
loop does not introduce energy in the system, and thus
Cdc20 100 nM Tang et al. (2001)
is conceivable there exists regimes of autocatalysis 285 nM Fang (2002)
where the non-kinetochore-mediated MCC production Mad1 20 nM Shah et al. (2004)
is still dependent on the kinetochore-mediated cataly- 1/4 of Mad2 Luo et al. (2004)
sis (Simonetta et al. 2009). Mad2 120 nM Tang et al. (2001)
Checkpoint inactivation: Positive feedback loops 400 nM Sudakin et al. (2001)
were also proposed to explain the rapid inactivation 100–300 nM Luo et al. (2004)
of the SAC (systems level property (2)). First, it was 200 nM Shah et al. (2004)
suggested that active APC/C:Cdc20 could help the 100 nM Howell et al. (2000)
transformation of APC/C:MCC complexes into active 230 nM Fang (2002)
APC/C:Cdc20 via the ubiquitination of Cdc20 (Fig. 2b
and Reddy et al. 2007). This positive feedback loop
could accelerate the release of APC/C inhibition once specifically Mad2 molecules, at an unattached kineto-
MCC production is shut off. Later, it was proposed that chore (1,500 molecules/unattached kinetochore)
a double-negative feedback loop comprising APC/C: (Howell et al. 2000). Subsequent to this work,
Cdc20 and the checkpoint could explain the a number of groups have measured Mad2 turnover at
rapid checkpoint inactivation after the last kinetochore the unattached kinetochore (t1/2  10–25 s) (Howell
has attached (Fig. 2c and He et al. 2011; Zeng et al. et al. 2000; Shah et al. 2004) providing an upper bound
2010). While the SAC inactivating APC/C has been to the MCC generation rate (35–65 molecules/
well documented, the mechanism whereby APC/C kinetochore/s).
inactivates the SAC has not been so thoroughly exam- This generation rate must be matched or be greater
ined. It should be noted that SAC components are than the dissociation rate of the MCC:APC/C complex
substrates of APC/C (e.g., Mps1), and Cyclin B itself to permit robust inhibition. The dissociation rate, at
is widely recognized as required for SAC activation, this time, has only been estimated from measurements
and it is of course a substrate of APC/C:Cdc20. Impor- of mitotic timing. Observations of live cells have
tantly, the presence of feedback loops involved in SAC provided detailed measurements of anaphase onset
inactivation needs to be reconciled with its reported after the establishment of complete bipolar attachment.
reversibility (system level property (3)). While the numbers vary, the consensus is that in
mammals from last kinetochore attachment anaphase
Quantitative Measurements ensues in  25 min, and from the loss of Mad2 at the
The circuits depicted above provide only a qualitative last unattached kinetochore anaphase begins  10 min
description of the SAC pathway. In this section, we later. From these measurements and known concentra-
report the most relevant experimental measurements tions of SAC and APC/C proteins (see Table 1), we can
that are needed to understand how the SAC might estimate that the dissociation rate of the MCC:APC/C
actually work. Given the basic structure of SAC is near 0.0013/s. For 100,000 APC/C molecules
signaling, the most relevant parameters would seem in the cell, this is a molecular dissociation rate of
to be the balance of MCC production and MCC:APC/C 130 molecules/s. This does not match the MCC
dissociation. Currently only the former has been production rate from the kinetochore, a fact pointing
measured quantitatively. This work stems from the to the little knowledge we have about the ways cells
measurement of the number of SAC molecules, inactivate the SAC. Although different mechanisms
Cell Cycle Transition, Detailed Regulation of Restriction Point 321 C
have been proposed to explain SAC switching off (the Shah JV et al. (2004) Dynamics of centromere and kinetochore
stripping Mad1:Mad2 away from unattached kineto- proteins; implications for checkpoint signaling and silencing.
Curr Biol 14(11):942
chores, the degradation of Cdc20, the phosphorylation Simonetta M, Manzoni R, Mosca R, Mapelli M, Massimiliano L,
of Mad2, the ubiquitination of Cdc20), no consensus Vink M, Novak B, Musacchio A, Ciliberto A (2009) The
has been reached yet, making SAC silencing a major influence of catalysis on mad2 activation dynamics. PLoS Biol
field of investigation. 7(1):e10
Sudakin V, Chan GK, Yen TJ (2001) Checkpoint inhibition of
the APC/C in HeLa cells is mediated by a complex of C
BUBR1, BUB3, CDC20, and MAD2. J Cell Biol 154(5):925
Tang Z, Bharadwaj R, Li B, Yu H (2001) Mad2-Independent
References inhibition of APCCdc20 by the mitotic checkpoint protein
BubR1. Dev Cell 1(2):227
Clute P, Pines J (1999) Temporal and spatial control of cyclin B1 Zeng X, Sigoillot F, Gaur S, Choi S, Pfaff KL, Oh DC,
destruction in metaphase. Nat Cell Biol 1:82–87 Hathaway N, Dimova N, Cuny GD, King RW (2010) Phar-
De Antoni A, Pearson CG, Cimini D, Canman JC, Sala V, macologic inhibition of the anaphase-promoting complex
Nezi L, Mapelli M, Sironi L, Faretta M, Salmon ED, induces a spindle checkpoint-dependent mitotic arrest in
Musacchio A (2005) The mad1/mad2 complex as the absence of spindle damage. Cancer Cell 18:382–395
a template for mad2 activation in the spindle assembly
checkpoint. Curr Biol 15:214–225
Doncic A, Ben-Jacob E, Barkai N (2005) Evaluating putative
mechanisms of the mitotic spindle checkpoint. Proc Natl
Acad Sci USA 102(18):6332–6337
Fang G (2002) Checkpoint protein BubR1 acts synergistically
Cell Cycle Transition, Detailed
with Mad2 to inhibit anaphase-promoting complex. Mol Biol Regulation of Restriction Point
Cell 13(3):755
He E, Kapuy O, Oliveria RA, Uhlmann F, Tyson JJ, Novák B Laurence Calzone
(2011) System-level feedbacks make the anaphase switch
irreversible. Proc Natl Acad Sci USA 108(24):10016–10021
Institut Curie – INSERM U900 – Mines ParisTech,
Howell BJ, Hoffman DB, Fang G, Murray AW, Salmon ED Paris, France
(2000) Visualization of Mad2 dynamics at kinetochores,
along spindle fibers, and at spindle poles in living cells.
J Cell Biol 150:1233–1250
Howell BJ et al. (2000) Visualization of Mad2 dynamics at
Synonyms
kinetochores, along spindle fibers, and at spindle poles in
living cells. J Cell Biol 150(6):1233 G1 checkpoint; R-point
Howell BJ et al. (2004) Spindle checkpoint protein dynamics at
kinetochores in living cells. Curr Biol 14(11):953
Hoyt MA, Totis L, Roberts BTS (1991) Cerevisiae genes
required for cell cycle arrest in response to loss of microtu- Definition
bule function. Cell 66:507–517
Li R, Murray A (1991) Feedback control of mitosis in budding The restriction point is an irreversible transition occur-
yeast. Cell 66:519–531
Luo X et al. (2004) The Mad2 spindle checkpoint protein has two
ring in late G1 phase after which the cell commits to
distinct natively folded states. Nat Struct Mol Biol 11(4):338 DNA replication and cell cycle progression. Using
Musacchio A, Salmon ED (2007) The spindle-assembly check- a systems biology approach, detailed molecular maps
point in space and time. Nat Rev Mol Cell Biol 8:379–393 are built and reveal a highly complex architecture
Reddy SK, Rape M, Margansky WA, Kirschner MW
(2007) Ubiquitination by the anaphase-promoting complex
governing the regulation of this transition.
drives spindle checkpoint inactivation. Nature 446:921–925
Rieder CL, Cole RW, Khodjakov A, Sluder G (1995) The
checkpoint delaying anaphase in response to chromosome Characteristics
monoorientation is mediated by an inhibitory signal produced
by unattached kinetochores. J Cell Biol 130:941–948
Sear RP, Howard M (2006) Modeling dual pathways for the Basic Concepts of the Restriction Point
me-tazoan spindle assembly checkpoint. Proc Natl Acad Progression and coordination of the cell cycle are
Sci USA 103:16758–16763 governed by complex molecular processes (▶ Cell
Shah JV, Botvinick E, Bonday Z, Furnari F, Berns M,
Cleveland DW (2004) Dynamics of centromere and kineto-
Cycle). Throughout the cycle, different safety mecha-
chore proteins; implications for checkpoint signaling and nisms verify that proper conditions are met for cell
silencing. Curr Biol 14:942–952 cycle events such as DNA replication, chromosome
C 322 Cell Cycle Transition, Detailed Regulation of Restriction Point

alignment, etc. (▶ Cell Cycle Checkpoints). The kinases CDK4 or CDK6 to trigger the G1/S transition.
restriction point (▶ Cell Cycle Transition, Principles On the other hand, it suppresses the activity of ▶ Cdk
of Restriction Point) is one of these checkpoints. It inhibitors (or CKI, such as p16INK4a or p15INK4b),
blocks entry into S phase upon growth-limiting which prohibit the association of the CyclinD1/
constraints. CDK4,6 complexes. However, the transition is not
In the absence of mitogens, the cell enters a G0 (or governed by the sole activation of these complexes.
quiescent) state. In the presence of mitogens, growth is As long as proper intracellular conditions are not met,
stimulated and the cell advances through G1 phase. a checkpoint prevents entry into S phase until the cell
During the time preceding the restriction point transi- is ready.
tion, the cell is responsive to both positive (e.g., mito-
genic growth factors) and negative (e.g., transforming RB, the Guardian of the Restriction Point Gate
growth factor b) external factors. However, when it The tumor suppressor gene retinoblastoma (RB) plays
passes through the restriction point in late G1, the cell the role of gatekeeper. It is the main target of
commits to cell cycle events and proceeds to DNA CyclinD1/CDK4,6 complexes. During the G1 phase,
synthesis. After that point, even if the mitogens are RB sequesters a family of key ▶ transcription factors,
removed, the cell completes its cycle before stopping the E2Fs, that mediates the transcription of many genes
at the following round (▶ Cell Cycle Transition, involved in cell cycle events and in the apoptotic
Principles of Restriction Point). pathway (DeGregori and Johnson 2006). Upon mitotic
What is the molecular machinery behind the cell stimulation, CyclinD1/CDK4,6 complexes start phos-
decision to either halt and enter a quiescent state or phorylating RB, which releases and activates E2Fs.
replicate its DNA and progress through M phase? E2Fs activation terminates G1 phase and leads to
entry into S phase. CyclinE1, one of E2F target
Biochemical Interactions Involved in the genes, binds to CDK2 and collaborates with
Restriction Point Transition CyclinD1/CDK4,6 to maintain the hyperpho-
To recapitulate all the key players and understand the sphorylated state of RB, thereby ensuring the irrevers-
intricate processes governing this decision, a thorough ibility of the transition. The processes described above
description of the molecular machinery controlling the are illustrated in Fig. 1.
restriction point is proposed. The study of RB in the context of the restriction
point is critical for several reasons. First, RB inactiva-
Processing the Information tion and the passage through the restriction point occur
A specific family of kinases has first been identified as concomitantly (Bartek et al. 1996). Furthermore, RB is
cell growth sensors (▶ Cyclins and Cyclin-dependent an anti-oncogene and its activity is deregulated in
Kinases). They link external stimuli to the intracellular virtually all types of cancers (▶ Cell Cycle, Cancer
machinery. These sensors are composed of a kinase Cell Cycle and Oncogene Addiction) (Weinberg
subunit, Cdk, and a cyclin partner, which determines 2006), including familial and sporadic forms (osteosar-
the localization and the function of the complex. If comas, breast carcinomas, small cell lung carcinomas,
these complexes can sense growth signals, how exactly bladder carcinomas, melanomas, etc.).
is the signal transmitted to them? The deficiency of the RB pathway is observed in
The sequence of events leading to the activation of cancers when (1) the external stimulation is constant,
these kinases can be characterized as follows: The for example, when the growth factors are always pre-
binding of growth factors to their receptors lead to sent or when their receptors are mutated and as a result
the activation of cytoplasmic proteins such as Ras, the signal is always considered to be on, (2) the cell is
which passes on the signal to a cascade that ultimately insensitive to antimitogenic factors (e.g., TGFb), or
turns on the transcription of cell cycle genes (▶ Tran- (3) when processes that control the passage through
scription). The growth factor-mediated activation of the restriction point are perturbed by mutations or
these genes operates through two distinct paths. On diverse dysfunctions (Sherr 1996). More precisely, in
one hand, it leads to the accumulation of the proto- cancer samples, dysfunctions of RB itself have been
typical G1 cyclin, CyclinD1, in the nucleus and pro- found to arise from the deletion of the gene, from
motes its association with the cyclin-dependent epigenetic mutations, from the presence of viral
Cell Cycle Transition, Detailed Regulation of Restriction Point 323 C
Mitogens p130 (RBL2) is present in quiescent cells; p130 has
(Ras) 22 sites of phosphorylation.
• E2F transcription factors
G1 phase Activators of transcription: E2F1, E2F2, E2F3a.
CyclinD1 CDK4,6 INK4 Inhibitors of transcription: E2F3b, E2F4, E2F5,
E2F6, E2F7a, E2F7b, E2F8.
The E2F transcription factors are active as dimers. C
R-point RB E2F1 to E2F6 dimerize with DP1 or DP2.
E2F7 and E2F8 are active as homodimers.
• Association between RB family members and the
E2Fs
E2F
E2F1, E2F2 and E2F3 bind RB.
S phase
E2F4 binds RB, p107 or p130.
CyclinE1 CDK2
E2F5 binds p130 (and to some extent p107).
E2F6 binds proteins from the polycomb group.
Cell Cycle Transition, Detailed Regulation of Restriction E2F7 and E2F8 form homodimers and do not bind
Point, Fig. 1 Simple representation of the interactions involved to other known proteins.
in the restriction point transition

The Comprehensive Map


oncoproteins, or as a result of a deregulation of the Network representations can be used to visualize what
kinases – and the kinase inhibitors – that control its is known about a biological process. It offers the pos-
activity. sibility to integrate heterogeneous data from sparse
sources of information from both individual and high
Comprehensive Maps of the Restriction Point throughput studies and summarize them into
Transition a synthetic picture. It also provides a tool to replace
The real events behind the restriction point transition a mechanism or a gene in its context, identify contra-
are, of course, much more complex than the one dictions from the literature, and anticipate the effects
depicted in Fig. 1. In this representation, a precise of a local perturbation on the global system.
and explicit molecular characterization of the protein Networks illustrating any biological mechanisms
interactions and transformations is missing. A simple need to be (1) biologically accurate, coherent, and
inhibition in Fig. 1 can be interpreted as an inhibitory unambiguous, (2) understandable and human-
sequestration, a degradation process, or a (de)phos- readable, (3) easily extensible when new information
phorylation. Such network representation may lead to is added, (4) computable, i.e., ready for modeling,
ambiguous conclusions. and machine-readable, (5) accessible in standard
The complexity increases exponentially as more format (SBGN, Le Novère et al. 2009) and (6) hierar-
details are included in the description. For instance, chical, with different levels of description (Kitano
let us consider only the interactions between RB and et al. 2005).
E2F: RB, along with its family members, binds to some There exist different kinds of representations in
but not all of the eight E2F genes (E2F1-8), some of numerous databases recapitulating heterogeneous
them being activators and others inhibitors of cell types of information and for which the nodes and
cycle advance. edges have different meanings (Pathway Interaction
Database from Nature: http://pid.nci.nih.gov/;
Complex Interactions Between RB, E2F and Their Family Biocarta: http://www.biocarta.com/; KEGG: http://
Members www.genome.jp/kegg/; Reactome: http://www.
• RB protein family reactome.org, etc.).
RB (RB1) is present in quiescent and proliferating Considerable efforts have been made to produce
cells; RB has 16 sites of phosphorylation. detailed and accurate maps of the mammalian cell
p107 (RBL1) is present in proliferating cells; p107 cycle molecular machinery (Kohn 1999 and in various
has 10 sites of phosphorylation. databases cited above) and more specifically that of the
C 324 Cell Cycle Transition, Detailed Regulation of Restriction Point

E2F3_targets E2F1_targets
E2F8_targets E2F7_targets E2F6_targets E2F5_targets E2F2_targets
E2F4_targets

E2F1–3 Apoptosis_entry

E2F4–5

E2F6–8
RB
CycC:CDK3

p16/p15 CycD1:CDK4/6
CycE1:CDK2

CycA2:CDK2

p27/p21
CDC25C
CycH:CDK7

CycB1:CDC2

APC

Weel

Cell Cycle Transition, Detailed Regulation of Restriction modifications of the major proteins involved in RB regulation.
Point, Fig. 2 Detailed map of the retinoblastoma (RB) network Blue rectangles refer to tumor suppressor genes and red rectan-
emphasizing the passage through the restriction point. The map gles to proto-oncogenes. Lower panel: mRNA and gene targets
is constructed using standard SBGN language in CellDesigner of the eight E2F transcription factors. A readable and clickable
(http://www.celldesigner.org). It comprises 80 proteins, 176 version of the map can be found at http://bioinfo-out.curie.fr/
genes, 165 interactions, and established from circa 350 publica- projects/rbpathway/interactive/rb_network.html
tions. Upper panel: biochemical reactions regulating the

restriction point (Calzone et al. 2008). In the latter transformation distinguished. Some links can be bro-
map, the focus is put on the biochemical reactions ken and alternative – or back-up – pathways, etc.,
and transformations (edges) of species (nodes) that proposed.
play an important role in the regulatory process
(Fig. 2). Modular View of the Comprehensive Map
To improve the readability of the map, one can sim-
Use of the Map plify it without losing information about regulations.
Beyond the role of integrating knowledge in an unam- For that purpose, from the initial comprehensive map,
biguous form, these molecular maps can be used to groups of components can be identified from data
interpret biological data or reason on unexpected ▶ clustering, when the data is available, or from
behaviors of the system in specific conditions. The graph theory techniques (▶ Module Network). Here,
map can be viewed as an electronic circuit and inter- the reaction network of Fig. 2 is transformed into an
rogated based on its sole topology. It can be compared influence network (Fig. 3) in which each node corre-
to a map of the New York subway which allows to find, sponds to a set of connected species in the comprehen-
for example, the simplest way from one station to sive map. The sets of proteins are grouped into
another or suggest an alternative route if one station modules. The nodes are connected either by positive
is closed. All the important partners or catalyzers for or negative arrows based on the available biological
a particular activation can be identified. All or one information derived from the complete map. The
path(s) between two species can be isolated and the resulting graph is an abstraction of the initial complex
necessary partners or intermediaries of a and detailed map.
Cell Cycle Transition, Detailed Regulation of Restriction Point 325 C

Cell Cycle Transition, Detailed Regulation of Restriction connected by influence arrows, either positive or negative,
Point, Fig. 3 Modular view of RB pathway. The nodes of this deduced from the initial network and based on biological knowl-
graph represent modules – or groups of interacting proteins – edge of the nature of the interactions
derived from the comprehensive map (Fig. 2). The nodes are

The different versions of the RB network proposed modeling can be applied to influence network (▶ Cell
here (Figs. 1–3) can then be translated into mathemat- Cycle Modeling Using Logical Rules), etc.
ical formulas. Ordinary differential equations are often More generally, the construction of the map of
used for a reaction-based network (▶ Cell Cycle a particular process and the gathering of biological
Modeling, Differential Equation), whereas discrete knowledge from heterogeneous sources is the first
C 326 Cell Cycle Transition, Principles of Restriction Point

step toward the elaboration of a mathematical model Matsuoka Y, Villéger A, Boyd SE, Calzone L, Courtot M,
(Novák and Tyson 2004; Alfieri et al. 2009). Mathe- Dogrusoz U, Freeman TC, Funahashi A, Ghosh S, Jouraku A,
Kim S, Kolpakov F, Luna A, Sahle S, Schmidt E, Watterson
matical modeling provides a formal tool to answer S, Wu G, Goryanin I, Kell DB, Sander C, Sauro H, Snoep JL,
specific biological questions. More importantly, it Kohn K, Kitano H (2009) The systems biology graphical
allows to test and predict, in silico, different perturba- notation. Nat Biotechnol 27(8):735–741
tions of a biological process that would be too complex Novák B, Tyson JJ (2004) A model for restriction point control
of the mammalian cell cycle. J Theor Biol 230(4):563–579
to understand by simple intuitive reasoning. Sherr CJ (1996) Cancer cell cycles. Science 274(5293):
1672–1677
Weinberg R (ed) (2006) The biology of cancer. Garland Science,
Cross-References New York, pp 255–305

▶ CDK Inhibitors
▶ Cell Cycle
▶ Cell Cycle Checkpoints Cell Cycle Transition, Principles of
▶ Cell Cycle Modeling Using Logical Rules Restriction Point
▶ Cell Cycle Modeling, Differential Equation
▶ Cell Cycle Transition, Principles of Restriction Tae Lee1, Guang Yao2 and Lingchong You3
1
Point School of Biology and School of Physics, Georgia
▶ Cell Cycle, Cancer Cell Cycle and Oncogene Institute of Technology, Atlanta, GA, USA
2
Addiction Department of Molecular and Cellular Biology,
▶ Clustering University of Arizona, Tucson, AZ, USA
▶ Cyclins and Cyclin-dependent Kinases 3
Biomedical Engineering Department, Duke
▶ Module Network University, Durham, NC, USA
▶ Pathway
▶ Transcription
▶ Transcription Factor Synonyms

G1 phase checkpoint; Start


References

Alfieri R, Barberis M, Chiaradonna F, Gaglio D, Milanesi L,


Definition
Vanoni M, Klipp E, Alberghina L (2009) Towards a systems
biology approach to mammalian cell cycle: modeling the The restriction point (R-point), similar to the “Start”
entrance into S phase of quiescent fibroblasts after serum point in yeast (▶ Cell Cycle, Budding Yeast), is a G1
stimulation. BMC Bioinformatics 10(Suppl 12):S16
Bartek J, Bartkova J, Lukas J (1996) The retinoblastoma protein
phase checkpoint (▶ Checkpoints) at which mamma-
pathway and the restriction point. Curr Opin Cell Biol lian cells commit to proliferation (▶ Mitosis; ▶ Cell
8(6):805–814 Cycle of Mammalian Cells) and become independent
Calzone L, Gelay A, Zinovyev A, Radvanyi F, Barillot E (2008) of growth signals for the completion of the cell cycle
A comprehensive modular map of molecular interactions in
RB/E2F pathway. Mol Syst Biol 4:173
(Fig.1).
DeGregori J, Johnson DG (2006) Distinct and overlapping roles
for E2F family members in transcription, proliferation and
apoptosis. Curr Mol Med 6(7):739–748 Characteristics
Kitano H, Funahashi A, Matsuoka Y, Oda K (2005) Using
process diagrams for the graphical representation of biolog-
ical networks. Nat Biotechnol 23(8):961–966 Discovery of the R-point
Kohn K (1999) Molecular interaction map of the mammalian The key concept of the R-point is the irreversible
cell cycle control and DNA repair systems. Mol Biol Cell commitment of a cell to proliferation (▶ Cell Cycle
10(8):2703–2734
Le Novère N, Hucka M, Mi H, Moodie S, Schreiber F, Sorokin
Dynamics, Irreversibility). The R-point was first pro-
A, Demir E, Wegner K, Aladjem MI, Wimalaratne SM, posed by Pardee in 1974 (Pardee 1974), when he
Bergman FT, Gauges R, Ghazal P, Kawaji H, Li L, observed a single, unique arrest point in the G1 phase
Cell Cycle Transition, Principles of Restriction Point 327 C
a
G0
f

Rate
G1 d
Restriction point
M
C
[X]
b

Rate
G2 S
d

[X ]
Cell Cycle Transition, Principles of Restriction Point, Cell Cycle Transition, Principles of Restriction Point,
Fig. 1 The cell cycle (Cell cycle, phases of the) and the R-point Fig. 2 Rates of synthesis (f(X)) and degradation (d(X)) deter-
mine the dynamics of the system described in Equation 1. Sup-
pose d(X) is always first order in terms of X and is sufficiently
of the cell cycle associated with different starvation small. (a) If f has ultrasensitive dependence on [X], the system is
conditions. That is, cells arrested at quiescence by bistable with two stable steady states (filled circles) separated by
different starvation regimens resumed DNA synthesis an unstable one (open circle). (b) Otherwise, the system is
at apparently the same time point. With sequential monostable with one unstable steady state and a stable one. If
d(X) is too large, the system is monostable for both cases, with
applications of different starvation regimens, cells did a single trivial steady state ([X] ¼ 0)
not escape from any previous arrest condition (indicat-
ing overlapping arrest points). Pardee and colleagues
further demonstrated that the R-point is located several entry (Thron 1997; Aguda and Tang 1999; Novak
hours before the S-phase. Passing this critical time and Tyson 2004). While these studies differed in
point, cells commit to proliferation and continue to molecular details, their consensus is that the R-point
transit the cell cycle even when the exogenous growth is regulated by a bistable switch (▶ Bistability), which
condition was removed (Yen and Pardee 1978). The can generate all-or-none and hysteretic (history-
transition between proliferative and quiescent states dependent) response.
was further analyzed by Zetterberg and colleague A bistable switch can result from one or more pos-
(Zetterberg and Larsson 1985), who measured the itive feedback loops with sufficient nonlinearity
timing of the irreversible commitment to proliferation (Ferrell 1996). For example, consider that X regulates
in individual cells by time-lapse cinematograph, and its own synthesis via a positive feedback loop,
found the R-point occurs at a remarkably constant time
after mitosis in exponentially growing cells. dX
¼ f ðXÞ  dðXÞ; (1)
dt
R-point Governed by a Bistable Switch
The R-point has two main characteristics: first, an where f and d denote the synthesis and degradation
initial high threshold on growth-stimulating signals, rates of X, respectively. When the synthesis of X is
which ensures a tight control of cell growth; second, ultrasensitive (e.g., due to cooperativity, multistep,
a low-maintenance mechanism, which ensures that zero-order, or inhibitor ultrasensitivity (Ferrell
once committed, cell proliferation will be completed 1996)), the f(X) curve and the d(X) curve can cross at
even in the absence of sustained growth signals. dt ¼ 0, Fig. 2a). This will generate
three steady states ( dx
Theoretical studies have suggested potential mecha- two stable steady states (and thus, a bistable system)
nisms underlying the R-point control of cell cycle separated by an unstable one. When the system at the
C 328 Cell Cycle Transition, Principles of Restriction Point

Serum

Maintenance-threshold
Myc 10–1

E2F
10–2
CycD / CDK4,6
Activation threshold

10–3
0 0.5 1
Rb Rb-P Serum

Cell Cycle Transition, Principles of Restriction Point,


Fig. 4 Hysteresis in E2F activation. E2F follows two different
trajectories when switching from OFF to ON and ON to OFF.
The “activation threshold” is where E2F switches from OFF to
ON, and the “maintenance threshold” is where E2F switches
from ON to OFF. The region between the two thresholds is
E2F CycE / CDK2 a “bistable region” where E2F can be either at ON or OFF
state, depending on the history of the system

Cell Cycle Transition, Principles of Restriction Point, E2F2, and E2F3a). E2F is a transcriptional factor that
Fig. 3 A simplified Rb-E2F network. It contains two positive can activate a large battery of genes encoding activities
feedback loops: (1) auto-catalysis of E2F, and (2) mutual- involved in DNA replication and cell cycle progres-
inhibition feedback involving E2F, CycE/Cdk2, and Rb sion. In quiescent cells, E2F protein is bound and
repressed by tumor suppressor Rb, and transcription
from the E2F promoter is also inhibited by Rb. Upon
unstable steady state is subjected to minor perturba- sufficient growth stimulation, Myc and cyclin D
tions, the system will move away from the unstable (CycD) are induced. CycD associates with and acti-
steady state towards the stable ones. Without ultrasen- vates Cyclin dependent kinases (Cdk4,6), which phos-
sitivity in the synthesis of X, the system is always phorylate Rb and release E2F. Myc directly binds to
monostable. For example, it can have an unstable the E2F promoter, facilitating E2F binding to nearby
steady state and a stable one (Fig. 2b). sites. E2F then activates its own transcription, forming
a positive feedback loop. In addition, E2F also acti-
Molecular Interactions at the G1/S Transition: The vates Cyclin E, which forms a complex with Cdk2 to
Rb-E2F Switch further phosphorylate Rb and relieve its repression of
The molecular events leading to the traverse of the E2F, constituting another positive feedback loop. The-
R-point involve a sophisticated network of molecules oretical studies suggested that the positive feedback
and interactions (defined as the Rb-E2F network, mediated by E2F and the ultrasensitivity created by the
▶ Cell Cycle Transition, Detailed Regulation of Rb repression of E2F could potentially generate
Restriction Point) (Calzone et al. 2008). Here we pre- a bistable switch (Fig. 4, ▶ Cell Cycle Model Analysis,
sent a highly simplified model that focuses on the cell Bifurcation Theory) (Thron 1997).
cycle entry events (Yao et al. 2008), as shown in Fig. 3. Recent experimental analysis has demonstrated that
In this simplified model, the node Rb represents the the Rb-E2F network indeed functions as a bistable
entire pocket protein family (RB, p107, and p130), and switch and generates bimodal and history-dependent
the node E2F represents all E2F activators (E2F1, E2F activities (Yao et al. 2008). Mammalian cells in
Cell Cycle Transitions, G2/M 329 C
quiescence remain in the OFF state of E2F when Yen A, Pardee AB (1978) Exponential 3t3-cells escape in mid-
deprived of growth stimulation. Following sufficient G1 from their high serum requirement. Exp Cell Res
116:103–113
growth stimulation, these cells eventually activate Zetterberg A, Larsson O (1985) Kinetic-analysis of regulatory
E2F and commit to proliferation. Afterward, E2F events in G1 leading to proliferation or quiescence of swiss
stays at the ON state to drive the completion of the 3t3 cells. Proc Natl Acad Sci USA 82:5365–5369
cell cycle, even after the growth stimuli are removed.
For a given serum concentration, the transition C
timing has been found to be highly variable, which
can be described with a stochastic model (Lee et al. Cell Cycle Transitions, G2/M
2010). A similar bistable switch mechanism was
identified in yeast for the control of the Start point Judit Zámborszky
(Charvin et al. 2010). The Microsoft Research – University of Trento Centre
for Computational and Systems Biology, CoSBi,
Trento, Italy
Cross-References

▶ Bistability Synonyms
▶ Cell Cycle Checkpoints
▶ Cell Cycle Dynamics, Irreversibility Entry into mitosis; G2/M transition; Mitotic entry
▶ Cell Cycle Model Analysis, Bifurcation Theory
▶ Cell Cycle of Mammalian Cells
▶ Cell Cycle Transition, Detailed Regulation of Definition
Restriction Point
▶ Cell Cycle, Budding Yeast G2/M transition (or mitotic entry) is the progression
▶ Mitosis from G2-phase to M-phase (also referred to as
“mitosis”) of the mitotic cell division cycle.

References
Characteristics
Aguda BD, Tang Y (1999) The kinetic origins of the
restriction point in the mammalian cell cycle. Cell Prolif
Eukaryotic cells govern a repetitive series of events
32:321–335
Calzone L, Gelay A, Zinovyev A, Radvanyi F, Barillot E (2008) that result in self-reproduction (Morgan 2006). The
A comprehensive modular map of molecular interactions in cell cycle (▶ Cell Cycle) consists of four different
RB/E2F pathway. Mol Syst Biol 4:173 stages: G1-, S-, G2-, and M-phase by which
Charvin G, Oikonomou C, Siggia ED, Cross FR (2010) Origin of
a growing cell replicates all its components and divides
irreversibility of cell cycle start in budding yeast. PLoS Biol
8(1):e1000284 into two nearly identical daughter cells (▶ Cell Cycle,
Ferrell JE (1996) Tripping the switch fantastic: how a protein Physiology). The alteration between cell cycle phases
kinase cascade can convert graded inputs into switch-like is sophisticatedly controlled, ensuring appropriate
outputs. Trends Biochem Sci 21:460–466
timing and order of sequential action (G1-phase,
Lee T, Yao G, Bennett D, Nevins J, You L (2010) Stochastic E2F
activation and reconciliation of phenomenological cell-cycle G1/S transition, DNA replication, G2-phase, G2/M
models. PLoS Biol 8:9 transition, mitosis, and mitotic exit (▶ Cell Cycle
Novak M, Tyson JJ (2004) A model for restriction point control Transitions, Mitotic Exit)). Transitions are tightly
of the mammalian cell cycle. J Theor Biol 230:563–579
monitored by “surveillance mechanisms” (checkpoints
Pardee AB (1974) Restriction point for control of normal
animal-cell proliferation. Proc Natl Acad Sci USA (▶ Cell Cycle Checkpoints)) (Kastan and Bartek
71:1286–1290 2004). Checkpoints are control mechanisms governing
Thron CD (1997) Bistable biochemical switching and the control crucial irreversible transitions of the cell division cycle
of the events of the cell cycle. Oncogene 15:317–325
(▶ Cell Cycle Dynamics, Irreversibility). Monitoring
Yao G, Lee TJ, Mori S, Nevins JR, You LC (2008) A bistable
Rb-E2F switch underlies the restriction point. Nat Cell Biol the environmental conditions (nutrient, hormones,
10:476–U255 etc.) and the stage of the DNA (if there are lesions or
C 330 Cell Cycle Transitions, G2/M

chromosome aberrations) or check of the size of the P


cell (if it is large enough for the process of the cycle) Wee1 Wee1
are important regulations for the maintenance of geno-
Inactive form Active form
mic stability. Chromosome breaks or rearrangements
are known to decrease survival of an individual cell,
thus proper checkpoint control is crucial for cancer P P P
prevention in multicellular organisms (Kastan and CDK CDK P
Bartek 2004) (▶ Cell Cycle, Cancer Cell Cycle and Cyclin B Cyclin B
Oncogene Addiction). Active form Inactive form
The cell cycle transitions are controlled through cell
cycle proteins’ activities (▶ Cyclins and Cyclin-
dependent Kinases). Cyclin-dependent kinases P
Cdc25 Cdc25
(CDKs) are responsible for most basic cell cycle pro-
cesses and their role is crucial in cell cycle progression Inactive form Active form
(Morgan 1995). CDKs’ activity depends on (1) their
regulatory cyclin subunits (those are specific to the Cell Cycle Transitions, G2/M, Fig. 1 Regulatory mechanism
of the G2/M transition. The active forms of the enzymes (kinases
different cell cycle phases), (2) their phosphorylation and a phosphatase) are able to phosphorylate their target (dashed
state, and (3) the presence of proteins forming inhibi- arrows). The active CDK/Cyclin B phosphorylates (thus inhibits
tory complexes with CDKs. and activates) Wee1 and Cdc25. Active Wee1 (and Cdc25)
The G2/M transition is triggered by the increased modify the CDK/Cyclin B complex by adding (and removing)
phosphate groups to (from) the active (and inactive) form
activity of CDK in combination with B-type cyclins;
and the active CDK/Cyclin B complex has several
substrates that are responsible for mitotic events G2/M transition as the active CDK/Cyclin B becomes
(▶ Mitosis) afterward. CDK activity in G2-phase able to beat the inhibitory effect of Wee1.
depends on phosphorylation or dephosphorylation Regulation of CDK is more complex: CDK feeds
reactions (▶ Post-translational Modifications) of spe- back (▶ Feedback Regulation) by phosphorylating
cific tyrosine and threonine residues. both Wee1 and Cdc25 (Solomon et al.1990, Hoffmann
First of all, phosphorylation of CDK on a conserved et al. 1993) (Fig. 1). The phosphorylation of Wee1 by
threonine residue (T161 in mammals) by cdk- CDK inactivates Wee1, creating a mutual inhibitory
activating kinase (CAK) is crucial for its kinase activ- link between the two kinases. One should think about
ity (Kaldis et al. 1996) (Activator). the relation between CDK and Wee1 as they are ene-
Second, phosphorylation of CDK (Cdc2 in mam- mies inhibiting (phosphorylating) each other, provid-
mals) within the active site of the kinase (at tyrosine 15 ing a double-negative feedback regulation (also called
(Y15) and threonine 14 (T14) in mammals) prevents as antagonism (▶ Negative Feedback)). This molecu-
CDK activity and mitosis (▶ Cdk Inhibitors). This lar network structure has been shown to be able to form
modification is carried out by inhibitory protein a bistable system (▶ Bistability, ▶ Cell Cycle Dynam-
kinases of the Wee1-class (Wee1, Mik1, Myt1 in mam- ics, Bistability and Oscillations) with either a state of
mals; Swe1 in budding yeast; all referred as “Wee1” in low CDK and high Wee1 activity (in G2-phase) or
the latter text) (Coleman and Dunphy 1994). Thus, a high CDK and low Wee1 activity (in mitosis). The
during the G2-phase, the presence of active Wee1 latter stage is appropriate for triggering mitotic entry as
inactivates the CDK/Cyclin B complex and blocks the active CDK dimer is required for the control of
the cell cycle entry into M-phase while cells are grow- several crucial molecular mechanisms of M-phase
ing and preparing for mitosis. (mitosis (▶ Mitosis)).
However, when cells are ready to divide, The phosphorylation of Cdc25 phosphatase by
a phosphatase of the Cdc25-class removes the inhibi- CDK presents a mutual activation as both enzymes
tory phosphate groups from the inactive CDK that “help” each other: CDK activates its phosphatase
results in activation of the complex (Coleman and Cdc25 which in return removes the inhibitory phos-
Dunphy 1994). This modification facilitates the phate groups from CDK. This is remarkably a different
Cell Cycle Transitions, G2/M 331 C
DNA
Wee1 G1 replication
S

Active form

Double-negative Wee1 Cdc25


feedback loop C

P
CDK / Cyclin B
CDK

Cyclin B M G2
Active form Mitosis

Cell Cycle Transitions, G2/M, Fig. 3 Simplified diagram of


Positive
the cell cycle and the regulation of the G2/M transition. Dashed
feedback loop
arrows represent activations and dashed lines with an end -| assist
for inhibitory reactions
P

Cdc25
(▶ Cell Cycle Dynamics, Irreversibility, ▶ Cell Cycle
Active form Dynamics, Bistability and Oscillations, ▶ Cell Cycle
Cell Cycle Transitions, G2/M, Fig. 2 Representation of two
Model Analysis, Bifurcation Theory). Bistability (and
interconnected positive feedback loops regulating entry into hysteretic transitions) of the Wee1-CDK-Cdc25 con-
mitosis during cell cycle. Dashed arrows symbolize activations trol system in frog egg cells has been experimentally
and dashed lines with an end -| assist for inhibitory reactions tested in Jill Sible’s lab (Sha et al. 2003) and in Jim
Ferrell’s lab (Pomerening et al. 2003).
type of positive feedback regulation than the CDK- The switches of cell cycle phases are triggered by
Wee1 link (Fig. 2). transient signals that emerge when the checkpoint con-
In sum (Fig. 3), during the G2-phase of the cell ditions are satisfied. The irreversibility of these transi-
cycle, CDK activity is kept low with inhibitory phos- tions ensures that when the generated signal
phorylation events carried out by the Wee1 kinase. As disappears, cells do not step back into their previous
the activity of CDK/Cyclin B increases in time, it stage resulting in aberration of cells (▶ Cell Cycle
activates its helper phosphatase Cdc25 which in return Dynamics, Irreversibility). For instance, if cells escape
removes the inhibitory phosphate groups from CDK. the checkpoint in G2-phase, they might end up with
The sudden increase of CDK activity triggers inacti- nonidentical daughter cells with extra or less DNAs
vation of Wee1 kinase by its phosphorylation. The than the mother cell, rolling over the stability of the
active CDK generates mitotic events (e.g., chromo- genome that causes fatal errors (Kastan and Bartek
some condensation, nuclear envelope breakdown, 2004).
chromatid segregation, assembly of mitotic spindle Upon DNA damage, the checkpoint proteins are
(▶ Mitosis, ▶ Cell Cycle, Physiology)) and cells often recruited to DNA lesions by repair complexes
enter the M-phase. that generate signals to activate the checkpoint
The two positive feedback loops (▶ Feedback response pathway (▶ Cell Cycle Arrest After DNA
Regulation, ▶ Positive Feedback) described previ- Damage). Transducers transmit and amplify the signal
ously act synergistically and provide an abrupt to downstream targets switching on the DNA-repair
increase in the activity of CDK making the G2/M apparatus and blocking the cell cycle machinery
transition an irreversible, switch-like event (Kastan and Bartek 2004). In case of activated
C 332 Cell Cycle Transitions, G2/M

Cell Cycle Transitions, G2/M, Table 1 Different names used for functional homologues of the regulatory elements in the G2/M
transition network
Generic name Function Mammals Budding yeast Fission yeast Xenopus laevis
CDK Cyclin-dependent kinase Cdc2 Cdc28 Cdc2 Cdc2
Cyclin B Regulatory subunit of CDKs in mitosis CycB Clb1,2 Cdc13 CycB
Wee1 CDK/Cyclin B inhibitory kinase Wee1 Swe1 Wee1, Mik1 Wee1, Myt1
Cdc25 CDK/Cyclin B activatory phosphatase Cdc25 Mih1 Cdc25 Cdc25
CAK Cyclin-dependent kinase activating kinase CAK Cak1 Csk1 MO15

checkpoint response in G2-phase, Cdc25 activation is Cross-References


downregulated resulting in G2 arrest as CDK is not
able to overcome and eliminate its “enemy,” the Wee1 ▶ Activator
kinase. However, when DNA repair is complete, ▶ Bistability
among several actions, Wee1 is degraded which leads ▶ Cdk Inhibitors
to activation of CDK that allows G2/M transition. ▶ Cell Cycle
There are two important exceptions from the com- ▶ Cell Cycle Arrest After DNA Damage
mon regulatory way of G2/M transition. During oogen- ▶ Cell Cycle Checkpoints
esis, the egg cell grows without dividing as its G2 ▶ Cell Cycle Dynamics, Bistability and Oscillations
checkpoint is blocked by Wee1 activation or Cdc25 ▶ Cell Cycle Dynamics, Irreversibility
inhibition. At the same time, the fertilized egg ▶ Cell Cycle Model Analysis, Bifurcation Theory
undergoes a series of rapid mitotic cycles without ▶ Cell Cycle of Early Frog Embryos
growth due to its removed checkpoints (▶ Cell Cycle ▶ Cell Cycle Transitions, Mitotic Exit
of Early Frog Embryos). ▶ Cell Cycle, Cancer Cell Cycle and Oncogene
Research of the mechanisms underlying G2/M tran- Addiction
sition is crucial as cells with defective checkpoints ▶ Cell Cycle, Physiology
have progeny with damaged or lost chromosomes, ▶ Cyclins and Cyclin-Dependent Kinases
which can result in loss of viability or abnormal ▶ Feedback Regulation
growth. Checkpoint defects cause genetic instability ▶ Mitosis
that can result in tumors of multicellular organisms. ▶ Negative Feedback
The basic concept of G2/M transition has been discov- ▶ Positive Feedback
ered with the active help of theoretical work of systems ▶ Post-translational Modifications
biology (▶ Cell Cycle Modeling, Differential
Equation, ▶ Cell Cycle Model Analysis, Bifurcation
Theory). Among several researchers, the contribution References
of the groups of Béla Novák and John J Tyson always
had a great importance to our knowledge (Tyson et al. Coleman TR, Dunphy WG (1994) Cdc2 regulatory factors. Curr
Opin Cell Biol 6(6):877–882
2001). With the assistance of biological modeling and Hoffmann I, Clarke PR, Marcote MJ, Karsenti E, Draetta G
the usage of dynamical systems theory in cell cycle (1993) Phosphorylation and activation of human cdc25-C
research, we became able to understand the underlying by cdc2-cyclin B and its involvement in the self-
logic of interconnected feedback loops of the cell cycle amplification of MPF at mitosis. EMBO J 12:53–63
Kaldis P, Sutton A, Solomon MJ (1996) The Cdk-activating
providing crucial phenomena, such as bistability, hys- kinase (CAK) from budding yeast. Cell 86(4):553–564
teresis, or irreversible transition. The differences in the Kastan MB, Bartek J (2004) Cell-cycle checkpoint and cancer.
network between organisms and the variety of molec- Nature 432(7015):316–323
ular elements playing role in the G2/M transition raise Morgan DO (1995) Principles of CDK regulation. Nature
374:131–134
further questions to be answered in the future Morgan DO (2006) The cell cycle: principles of control. New
(Table 1). Science, London
Cell Cycle Transitions, Mitotic Exit 333 C
Pomerening JR, Sontag ED, Ferrell JE Jr (2003) Building a cell are cleaved by a specific protease (separase) and sister-
cycle oscillator: hysteresis and bistability in the activation of chromatids are pulled apart by the elongating spindle.
Cdc2. Nat Cell Biol 5:346–351
Sha W, Moore J, Chen K, Lassaletta AD, Yi CS, Tyson JJ, Sible Anaphase requires the activation of the Anaphase
JC (2003) Hysteresis drives cell-cycle transitions in Xenopus Promoting Complex (or Cyclosome, APC/C) by its co-
laevis egg extracts. Proc Natl Acad Sci USA 100:975–980 activator Cdc20 (also called Fizzy in some organisms)
Solomon MJ, Glotzer M, Lee TH, Philippe M, Kirschner MW (Morgan 1999). APC/CCdc20 is an ubiquitin-ligase that
(1990) Cyclin activation of p34cdc2. Cell 63:1013–1024
Tyson JJ, Chen K, Novák B (2001) Network dynamics and cell targets the separase inhibitor, securin and the Cdk1 C
physiology. Nat Rev Mol Cell Biol 2:908–916 activator, Cyclin-B (CycB) for proteasome-dependent
degradation. Securin degradation relieves the inhibi-
tion of separase and triggers anaphase. In addition,
CycB degradation inactivates Cdk1 and thereby
Cell Cycle Transitions, Mitotic Exit reduces the rate of protein phosphorylation, which
helps in mitotic exit. Since returning to interphase
P. K. Vinod and Béla Novák depends on dephosphorylation of the mitotic phospho-
Department of Biochemistry, Oxford Centre for proteins, CycB degradation alone is insufficient for
Integrative Systems Biology, University of Oxford, mitotic exit which requires the action of a protein-
Oxford, UK phosphatase (Queralt and Uhlmann 2008). In princi-
ple, the phosphatase could have a constant activity and
can dephosphorylate mitotic phospho-proteins once its
Definition activity overcomes the falling protein-kinase activity.
However in budding yeast, where the mitotic exit
Mitotic exit refers to the completion of mitosis with process is known best, the mitotic exit phosphatase
dephosphorylation of cyclin-dependent kinase (Cdk) (Cdc14) is regulated and it gets activated at the end
substrates triggered in response to cyclins destruction of mitosis. In the following, we describe the details of
and phosphatase(s) regulation. the budding yeast mitotic exit process.
In budding yeast, APC/CCdc20-mediated degrada-
tion of mitotic CycBs (called Clbs) is incomplete and
Characteristics Cdk1 downregulation requires the activation of either
another APC/C activator (Cdh1) or a stoichiometric
Cell cycle events are tightly regulated such that DNA Cdk inhibitor (Sic1). APC/CCdh1 targets CycB to
replication (S phase) is always followed by mitosis proteasome-dependent degradation similar to APC/
(M phase). In eukaryotic cells, such a strict alteration CCdc20, while Sic1 binds to and inhibits the Cdk1/
of S and M phases is largely dependent on the periodic CycB complexes. However, Cdk1/CycB complexes
activation of cyclin-dependent protein-kinases (Cdks) downregulate both Cdh1 and Sic1 thereby creating an
activated by their specific cyclins partners (Morgan antagonistic relationship between Cdk1/CycB and
2007). The underlying molecular machinery compris- these Cdk1 regulators (Fig. 1). Cdk1/CycB phosphor-
ing of systems-level feedbacks ensures periodic rise ylated Cdh1 cannot bind to APC/C; therefore, it is
and drop in Cdk activities that drive the cell cycle inactive while Cdk1/CycB complex inhibits the syn-
progression. thesis and promotes the degradation of Sic1: The Swi5
In eukaryotic cells, mitosis is brought about by transcription factor of Sic1 is inactivated by Cdk1/
intense phosphorylation of cellular proteins initiated CycB and phosphorylated form of Sic1 gets rapidly
by the activation of Cdk1/Cyclin-B complex. During ubiquitinated and degraded.
early stages of mitosis (metaphase), replicated chro- Antagonism between Cdk1/CycB and its negative
mosomes are still held together by cohesin complexes, regulator (Cdh1 and Sic1) forms the underlying prin-
and they align (biorient) on the bipolar mitotic spindle ciple of cell cycle regulation in budding yeast, which
with the sister chromatids attaching to the opposite makes the control system bistable, with alternative
poles. During the following phase of mitosis (ana- steady states of high and low Cdk1 activities (Chen
phase), cohesins holding sister-chromatids together et al. 2004). At mitotic exit, APC/CCdh1 is directly
C 334 Cell Cycle Transitions, Mitotic Exit

Clb5 Cdh1 therefore loss of its function arrest cells in telophase


Clb2 with relatively high Cdk1 activity. Phosphorylated
Cdh1 and Swi5 are both concentrated in the cytoplasm
Sic1 Clb2 Cdc5 MEN Esp1 and hence their dephosphorylation and activation
requires MEN-dependent cytoplasmic release of Cdc14.
Cdk1 and polo-kinase (Cdc5) are responsible for
Cdc20 Net1 Net1 phosphorylation in the FEAR network, and the
FEAR also includes separase (Esp1), Slk19, and Spo12
Swi5 Pds1 Clb5 PP2A Cdc14 (D’Amours and Amon 2004). Phosphorylated Net1
gets dephosphorylated by PP2ACdc55 and by released
Cdc14 itself, the later mechanism creates a negative
Esp1 feedback in the network. Separase (Esp1) promotes
Net1 phosphorylation and thereby Cdc14 release
Cell Cycle Transitions, Mitotic Exit, Fig. 1 Interaction net- by inhibiting PP2ACdc55 but this action does not
work for the mitotic exit in budding yeast. Cdc20 initiates the require the proteolytic function of separase (Queralt
network by degradation of Clb2, Clb5, and securin (Pds1). This et al. 2006). Active PP2ACdc55 keeps Net1
trigger the downregulation of Cdk1 activities and upregulation
of a phosphatase, Cdc14 leading to the activation of Cdh1 and dephosphorylated and Cdc14 inactive until the degra-
Sic1. Cdh1/Sic1 exists in antagonist relationship with Cdk1 and dation of securin (Pds1 in budding yeast) by APC/
helps in the mitotic exit by further reducing the Cdk1 activity. CCdc20 at anaphase, which activates separase (Esp1 in
Cdc14 release from the nucleolus is crucial for the mitotic exit, budding yeast). However, FEAR-dependent Cdc14
which is inhibited by Net1 until anaphase. Cdc14 inhibition is
relieved through Net1 phosphorylation controlled by Clb2, release is also limited by APC/CCdc20-dependent
Cdc5, MEN (mitotic exit network), and separase (Esp1). Esp1 CycB degradation, which partially inactivates Cdk1.
activates Cdc14 release through its non-proteolytic (PP2A Since this makes Cdc14 release transient, the MEN is
downregulation) and proteolytic (MEN activation) functions required to sustain Net1 phosphorylation and Cdc14
release at later stages of mitosis. In addition to that, the
MEN also helps in the cytoplasmic accumulation of
activated by Cdc14-mediated dephosphorylation, Cdc14 by restricting its nuclear import.
whereas the level of Sic1 gets upregulated by Cdc14- Activation of MEN is complex and it is linked to the
dependent activation of its transcription factor, Swi5 entry of a spindle-pole-body (SPB) into the bud. This
and turning off Sic1 degradation. In addition, Cdc14 control allows the cells to exit mitosis only after proper
also dephosphorylates other Cdk1 substrates at the end positioning of anaphase spindle (spindle orientation
of mitosis and thereby promotes mitotic exit directly checkpoint) (Bardin and Amon 2001; Stegmeier and
as well. Amon 2004). Movement of SPB into bud depends on
Cdc14 is kept inactive in the nucleolus during most the cleavage of sister-chromatid cohesins which is
of the cell cycle by its inhibitor, Net1. As cells enter catalyzed by the proteolytic activity of separase. The
into anaphase, Cdc14 is activated and released from MEN component, Tem1 is a small G-protein localized
nucleolus by two regulatory networks, Cdc-Fourteen at the SPB and it is negatively regulated by GTPase
Early Anaphase Release (FEAR) network and Mitotic activating protein (GAP) complex, Bfa1/Bub2. Tem1
Exit Network (MEN) (D’Amours and Amon 2004; activation depends on a putative GDP:GTP exchange
Stegmeier and Amon 2004). Both of these networks factor, Lte1, present at the bud cortex. In this way,
bring about phosphorylation of Net1 which allows Tem1 activation depends on sister-chromatid separa-
Cdc14 release. The FEAR network–mediated Cdc14 tion and anaphase spindle elongation which brings
release is important for proper anaphase progression Tem1 and its activator, Lte1 in proximity. Tem1 trig-
but it is dispensable for mitotic exit. On other hand, gers the activation of two MEN kinases, Cdc15 and
MEN is required to sustain the released Cdc14 in Mob1- Dbf2, which are localized at the daughter SPB.
nucleus and to disperse it into cytoplasm during later Both Cdc15 and Mob1 are inactivated by Cdk1/CycB-
stages of mitosis. MEN is essential for mitotic exit, mediated phosphorylation, and they undergo
Cell Cycle Transitions, Mitotic Exit 335 C
dephosphorylation in Cdc14-dependent manner during (through Cdh1/Swi5 dephosphorylation) besides to
mitotic exit. Therefore, MEN activation and Cdc14 dephosphorylate several other Cdk1 substrates. How-
release comprise a positive feedback loop by which ever, in other organisms the Cdk1 inactivation is
they activate each other. The early release of Cdc14 by achieved by APC/CCdc20-dependent degradation of
FEAR network provides an initial activation for MEN CycB only, thereby exit from mitosis can be driven
which helps to turn on the positive feedback loop by a constitutive phosphatase. In fission yeast, Clp1
(Queralt et al. 2006). This coordination ensures that (Cdc14 homologue) phosphatase is not crucial for C
the Cdc14-MEN positive feedback depends on the mitotic exit but it regulates G2/M transition. Clp1
activation of separase and thereby on the execution of and Cdk1 exist in antagonistic relationship, which
anaphase. In the absence of FEAR network function, helps to upregulate the phosphatase with Cdk1
activation of the Cdc14-MEN positive feedback is downregulation by APC/Ccdc20 and provide switch
delayed and hence cells exit from mitosis only after like behavior to mitotic exit. Similar to MEN, fission
a time delay. yeast has septation initiation network (SIN), but it only
It can be seen that during mitotic exit budding yeast has a role in cytokinesis and not in mitotic exit (Bardin
cells control the Cdk1/Cdc14 activity ratio to flip the and Amon 2001).
Cdk1-Cdh1/Sic1 bistable switch from high to low Cdk1 Hence, the mitotic exit control in the cell cycle
activity. APC/CCdc20 functions as a critical node by regulation depends on how the Cdk downregulation is
controlling both the kinase (Cdk1) and phosphatase coordinated with phosphatase(s) regulation. There is true
(Cdc14) branches of the network required for mitotic lack of knowledge about the phosphatases and their
exit (Fig. 1). APC/CCdc20 activation decreases the Cdk1/ regulation with regard to mitotic exit in several other
Cdc14 ratio by decreasing the Cdk1 and increasing the organisms. As seen in budding yeast, Cdc14 upregulation
Cdc14 activities. APC/CCdc20-dependent degradation of helps to flip the bistable switch Cdk1-Cdh1 toward lower
its substrates (the major mitotic and S phase CycBs, Cdk1 activity. A positive feedback loop on the MEN
Clb2 and Clb5 respectively as well as securin) is the helps to accelerate the Cdc14 release and exit. It has to
trigger required to flip the bistable switch (Shirayama be seen whether such an upregulated phosphatase control
et al. 1999). is required to drive mitotic exit in other eukaryotes.
Once budding cells exited from mitosis in a Cdc14-
dependent manner, Cdc14 activity is not required any-
more as it can interfere with the entry into next cell Cross-References
cycle. Therefore, the MEN is inactivated and Cdc14
gets re-sequestered into the nucleolus. This is mainly ▶ Cell Cycle, Budding Yeast
achieved by APC/CCdh1-dependent degradation of ▶ Cell Cycle Checkpoints
Cdc5, a key component of MEN. This creates the ▶ Cell Cycle Dynamics, Irreversibility
following negative feedback: Cdc5 ! Cdc14 ! ▶ Cell Cycle, Fission Yeast
Cdh1 —| Cdc5 in the regulatory network which can ▶ Cell Cycle of Mammalian Cells
produce free running oscillations (so-called Cdc14 ▶ Cell Cycle Transitions, G2/M
endocycles) in the presence of nondegradable Clb2 ▶ Cyclins and Cyclin-dependent Kinases
(Lu and Cross 2010). But Clb2 degradation entrains ▶ Endoreplication
these free running Cdc14 endocycles with the periodic ▶ Mitosis
activation and inactivation of Cdk1. Thus, Cdk1 inac-
tivation during mitotic exit is required for restricting
endocycles besides initiating cytokinesis and promot- References
ing entry into G1 phase of the next cycle.
Although Cdc14 is essential in budding yeast, it is Bardin AJ, Amon A (2001) Men and sin: what’s the difference?
Nat Rev Mol Cell Biol 2(11):815–826
dispensable for mitotic exit in several other eukary- Chen KC, Calzone L, Csikasz-Nagy A, Cross FR, Novak B,
otes. This could be because in budding yeast Cdc14 Tyson JJ (2004) Integrative analysis of cell cycle control in
activation is required for Cdk1 downregulation budding yeast. Mol Biol Cell 15(8):3841–3862
C 336 Cell Cycle, Archaea

D’Amours D, Amon A (2004) At the interface between signaling


and executing anaphase-Cdc14 and the FEAR network. Cell Cycle, Biology
Genes Dev 18(21):2581–2595
Lu Y, Cross FR (2010) Periodic cyclin-Cdk activity entrains an
autonomous Cdc14 release oscillator. Cell 141(2):268–279 Roberta Alfieri and Luciano Milanesi
Morgan DO (1999) Regulation of the APC and the exit from Institute for Biomedical Technologies – CNR
mitosis. Nat Cell Biol 1(2):E47–E53 (Consiglio Nazionale delle Ricerche), Segrate,
Morgan DO (2007) The cell cycle. Principles of control. New
Science Press Ltd, London Milan, Italy
Queralt E, Uhlmann F (2008) Cdk-counteracting phosphatases
unlock mitotic exit. Curr Opin Cell Biol 20(6):661–668
Queralt E, Lehane C, Novak B, Uhlmann F (2006) Synonyms
Downregulation of PP2A (Cdc-95) phosphatase by separase
initiates mitotic exit in budding yeast. Cell 125(4):719–732
Shirayama M, Toth A, Galova M, Nasmyth K (1999) APC Cell division cycle; Cell proliferation process
(Cdc20) promotes exit from mitosis by destroying the ana-
phase inhibitor Pds1 and cyclin Clb5. Nature
402(6758):203–207
Stegmeier F, Amon A (2004) Closing mitosis: the functions of Definition
the Cdc14 phosphatase and its regulation. Annu Rev Genet
38:203–232 The cell cycle is a complex and crucial event for the
life of every organism (Mitchison 1971; Nurse 2000).
It consists of a series of ordered events, which
regulate the cellular growth and the division
Cell Cycle, Archaea depending on external stimuli. The cell cycle can
also be defined as the space which elapses from the
Chris J. Norbury formation of a cell as result of mitosis and its com-
Sir William Dunn School of Pathology, plete division into two daughter cells. Cell division
University of Oxford, Oxford, UK can occur in two different ways, depending on the
nature of the cells themselves: A somatic cell
undergoes a cellular division named mitosis while
Definition a germ cell undergoes a cellular division named mei-
osis (Jorgensen and Tyers 2004).
The archaea, the “third domain of life” (as distinct Cell division consists of two consecutive processes,
from bacteria and eukaryotes), have some of the gen- mainly characterized by DNA replication and segrega-
eral features of bacteria, notably single, circular chro- tion of replicated chromosomes into two separate cells.
mosomes, but the biochemistry of the archaeal cell Originally, cell division was divided into two stages:
cycle is much more closely related to that of eukary- mitosis (M), i.e., the process of nuclear division; and
otes than it is to that of bacteria. interphase, the interlude between two M phases. Stages
of mitosis include prophase, metaphase, anaphase, and
telophase. The interphase cells simply grow in size, but
Cross-References different techniques revealed that the interphase
includes G1, S, and G2 phases (Norbury and
▶ Cell Cycle, Physiology Nurse 1992; Blow and Tanaka 2005). Replication of
DNA occurs in a specific part of the interphase
called S phase.
References

Barry ER, Bell SD (2006) DNA replication in the archaea.


Cross-References
Microbiol Mol Biol Rev 70:876–887
Bernander R (2007) The cell cycle of Sulfolobus. Mol Microbiol
66:557–562 ▶ Cell Cycle Database
Cell Cycle, Budding Yeast 337 C
References models of the budding yeast cell cycle have been built
and tested against a wide range of experimental obser-
Blow JJ, Tanaka TU (2005) The chromosome cycle: coordinat- vations. (6) The models make interesting predictions
ing replication and segregation. EMBO Rep 6(11):
that have been confirmed, in large part, by experimen-
1028–1034
Jorgensen P, Tyers M (2004) How cells coordinate growth and tal tests.
division. Curr Biol 14:1014–1027
Mitchison JM (1971) The biology of the cell cycle. Cambridge C
University Press, Cambridge
Norbury C, Nurse P (1992) Animal cell cycles and their control.
Characteristics
Annu Rev Biochem 61:441–470
Nurse P (2000) A long twentieth century of the cell cycle and Physiology
beyond. Cell 100(1):71–78 Budding yeast gets its name from its unusual style of
asymmetric division into a large mother cell and
a small daughter cell (Pringle and Hartwell 1981).
Cell Cycle, Budding Yeast After a G1 period, the budding yeast cell initiates
a new bud at about the same time that it enters
John J. Tyson1, Katherine C. Chen1 and Béla Novák2 S phase (DNA synthesis). Also, at this time, the
1
Department of Biological Sciences, Virginia yeast cell replicates its spindle pole bodies and begins
Polytechnic Institute and State University, Blacksburg, preparations for mitosis. These cotemporaneous
VA, USA events of the budding yeast cell cycle are referred to
2
Department of Biochemistry, Oxford Centre for as START. The bud first emerges from the cell in
Integrative Systems Biology, University of Oxford, a burst of polarized growth but quickly switches
Oxford, UK to isotropic growth, to form an expanding spherical
protrusion. Most of the net cell growth after this
time goes into the bud. After DNA synthesis is fin-
Synonyms ished, an intranuclear mitotic spindle is built and the
replicated chromosomes are aligned at the metaphase
Saccharomyces cerevisiae plate. Simultaneously, the G2/M nucleus migrates
to the neck between the mother and bud compart-
ments and orients itself with one pole of the mitotic
Definition spindle in the mother cell and the other pole in the
bud. During anaphase, the replicated chromosomes
The cell cycle of budding yeast has become a hallmark are partitioned into two groups: one group is pushed
problem of molecular systems biology for a number of into the mother cell and the other into the bud.
reasons. (1) The cycle of DNA replication, mitosis, and The stretched nucleus divides in two, and the cell
cell division is crucial to all aspects of biological separates at the bud neck to produce mother and
growth, development, and reproduction. (2) The daughter cells. The daughter cell generally has a con-
genes and pathways for cell cycle control seem to be siderably longer G1 period than the mother cell. It
very similar in all eukaryotes, including mammals. must grow to a certain threshold size before it can
(3) Budding yeast is exceptionally good for genetic initiate a new round of budding, DNA replication, and
analysis of cell cycle controls because it can proliferate division.
as haploid cells, and its genetic makeup can be easily Progression through the budding yeast cell cycle is
altered by standard tools of molecular genetics. For governed by a series of checkpoints. For instance, we
this reason, a great deal is known about the molecular have already mentioned that daughter cells must
machinery regulating the events of the budding yeast grow to a critical size before they can execute START.
cell cycle. (4) The machinery is very complicated, and Also, DNA damage can delay execution of START
its properties cannot be worked out reliably by intui- until the damage is repaired. These cells will not
tive reasoning alone. (5) Comprehensive mathematical execute the metaphase/anaphase transition until all
C 338 Cell Cycle, Budding Yeast

chromosomes are fully replicated and properly controller for the progression of the cell cycle) is
aligned on the mitotic spindle (the “spindle assembly a dimer of a protein kinase, encoded by the CDC28/
checkpoint” or SAC). Budding yeast cells have a cdc2+/CDK1 gene, and a regulatory subunit called
“morphogenetic checkpoint” which blocks mitosis if cyclin B which had been identified years earlier by
the bud is not properly formed, and a spindle align- Tim Hunt. Because the cyclin subunit is required for
ment checkpoint that blocks cell division if the activity of the protein kinase, the kinase is called
daughter cell nucleus is not properly pushed into the cyclin-dependent kinase, or CDK for short. The dis-
bud compartment. covery of the composition of MPF was the beginning
of the modern era of molecular biology of cell cycle
Molecular Cell Biology control in eukaryotic cells. In 2001 the Nobel Prize in
Genetic analysis of the molecular machinery control- Physiology or Medicine was shared by Hartwell,
ling the major events of the cell cycle was begun in Nurse, and Hunt.
budding yeast by Leland Hartwell, who identified In the 1990s a host of cell cycle genes were identi-
dozens of genes that induced the “cell division cycle” fied in budding yeast, including:
phenotype. By Hartwell’s definition, a CDC gene is CLN1,2: encoding G1 cyclins important for START and
a gene that, when mutated, arrests all cells in a specific bud emergence
stage of the cell cycle. For example, cdc7 mutant cells CLB1-6: encoding B-type cyclins essential for DNA
do not replicate their DNA or divide, but bud emer- synthesis and mitosis
gence and growth continue as normal. Mutant cdc CDC20: encoding a ubiquitin-ligase component essen-
genes are isolated as temperature-sensitive alleles tial for sister chromatid separation and cyclin deg-
that on replicate plates form colonies at 25 C but not radation in anaphase
at the restrictive temperature (36 C). Presumptive CDC14: encoding a phosphatase essential for mitotic
mutant cells on the high temperature plate must be exit
examined microscopically to determine that the cells CDH1, SIC1: encoding proteins that stabilize G1 phase
have uniformly arrested at a particular morphological of the cell cycle
stage of the cell cycle. Hartwell’s first set of CDC
genes defined two parallel pathways (Hartwell et al. The Heart of the Budding Yeast Cell Cycle
1974): one pathway relating to bud emergence, isotro- Kim Nasmyth (1996) proposed to think of the budding
pic growth, and nuclear migration, and the other relat- yeast cell cycle as two alternative, self-maintaining
ing to DNA replication and mitosis. These two states: G1 (unreplicated chromosomes, growing but
pathways diverge at START and come together finally not committed to proliferate), and S/G2/M (committed
at mitotic exit. The gene with the earliest effect, whose to chromosome replication and mitosis). START is the
mutation blocked cells before the START transition, was transition from G1 to S/G2/M, and mitotic exit is the
CDC28. transition from S/G2/M back to G1. The molecular
Hartwell’s work set off a landslide of activity correlates of these two states are G1: high activity of
identifying CDC genes in a variety of organisms, Cdh1 and Sic1, no kinase activity associated with
finding other genes that interacted with CDC genes, Clb1-6; and S/G2/M: low activity of Cdh1 and Sic1,
and later cloning these genes and characterizing their high Clb-dependent kinase activity. Finally, Nasmyth
protein products and the interactions among these recognized that a possible reason for these alternative,
proteins (for review, see Morgan 2007). Worthy of self-maintaining states was the antagonism between
note was Paul Nurse’s discovery of a fission yeast Cdh1 and Sic1 on one side (destroying the activity of
gene, cdc2+, which seemed to encode the same pro- the Clb-dependent kinase complexes) and the B-type
tein as CDC28 in budding yeast. Nurse went on to cyclins, Clb1-6, on the other side (inactivating Cdh1
show, quite unexpectedly, that a human gene, later and Sic1).
called CDK1, encodes a protein that can rescue cdc2 Nasmyth stopped short of identifying his idea with
mutant fission yeast cells. In 1989 a consortium of a bistable control system flipping back and forth
geneticists and biochemists demonstrated that MPF between two stable steady states as cell-cycle-
(mitosis promoting factor in frog eggs, a master progression signals pushed the control system past
Cell Cycle, Budding Yeast 339 C
SK EP

CDK
Growth Chromosome
damage alignment
problems problems
Enemies

C
CDK
G2 M A

T
G1 G1

SK EP

SK EP
Enemies CDK

G1 S G2 M A T G1

Cell Cycle, Budding Yeast, Fig. 1 The heart of the budding G1-like steady state merges with and is annihilated by the
yeast cell cycle. Adapted from Tyson and Novak (2008). (Top unstable steady state at a saddle-node bifurcation. Exit from
center) Cyclin-dependent kinase (CDK) and its enemies (Cdh1 mitosis (middle right) is induced by EP (Cdc14 phosphatase),
and Sic1) are involved in an antagonistic (double negative) which activates the proteins that destroy CDK activity. When the
feedback loop, which can persist in either of two stable steady upper steady state merges with the unstable steady state, the cell
states: a G1-like state with low CDK activity and prominent returns to the G1 state. Because EP activity depends on CDK,
enemy forces, and an M-like steady state with high CDK activity after CDK falls, EP activity also disappears, but now the enemies
and the enemies in retreat. These two stable steady states are have the upper hand. The two blue arrows together form
represented by black circles along the CDK axis (middle center). a “hysteresis loop” that switches the CDK-control system from
The pairs of circles side-by-side are meant to represent the same OFF to ON and back again during each cell cycle. (Bottom) Pro-
steady state under conditions when both the starter kinase (SK) gression around the hysteresis loop corresponds to periodic
and exit phosphatase (EP) are close to zero. The two stable changes in the activities of CDK, enemies, SK, and EP. At
steady states are separated by an unstable steady state (open START, a burst of SK activity flips the CDK switch ON, and at
circle). A newborn cell in G1 (middle left) can be induced to EXIT, a burst of EP activity flips the CDK switch OFF. (Top left
enter S/G2/M by activation of SK (Cln-dependent kinase), and right) Checkpoints function by inhibiting either SK or EP.
which phosphorylates and weakens Cdh1 and Sic1, allowing Growth requirements and damage repair processes typically halt
CDK activity to rise and trigger S phase (follow the blue arrow the cell cycle in G1 phase by preventing up-regulation of SK
for the cell’s passage through the cell cycle). Among other activity. Misalignment of chromosomes on the spindle blocks
duties, CDK downregulates SK, but the bistable switch remains exit from mitosis by preventing activation of EP
in the upper state. Notice that, as SK activity increases, the stable
C 340 Cell Cycle, Budding Yeast

23⬚C R = raffinose saddle-node bifurcations. But this connection became


R+G G = galactose the basis of a series of mathematical models of the
G R 0 0.5 1.0 hr
budding yeast cell cycle control system developed by
Novak, Tyson, and coworkers (Chen et al. 2004; Tyson
Clb2p and Novak 2008) and by Alberghina et al. (2009).
* Standard for protein loading A brief overview of these models is presented in
45 88 88 91 46 % Unbudded Fig. 1, and the interested reader is referred to the
original literature for details.
+ 2.5 hr
Glucose Experimental Tests
37⬚C Cross and collaborators (Cross et al. 2002) have con-
Clb2p
*
firmed experimentally a number of predictions of the
Chen model, including the idea of two alternative
85 86 26 % Unbudded stable steady states (see Fig. 2).
Cell Cycle, Budding Yeast, Fig. 2 Experimental confirmation
of alternative stable states with low or high Clb2-dependent
kinase activity. Adapted from Cross et al. (2002). To test the Cross-References
prediction that a stable G1 state of low Clb-dependent kinase
activity coexists with a stable S/G2/M state of high Clb-
dependent kinase activity when SK ¼ EP ¼ 0, Fred Cross and ▶ Cell Cycle Dynamics, Bistability and Oscillations
colleagues measured Clb2 abundance and cell-cycle phase in ▶ Cell Cycle Dynamics, Irreversibility
a mutant strain of budding yeast, cln1D cln2D cln3D GAL-CLN3 ▶ Cell Cycle Model Analysis, Bifurcation Theory
cdc14ts. When grown in raffinose or glucose, the GAL promoter ▶ Cell Cycle Modeling, Differential Equation
is repressed, and the cells are unable to make any Cln proteins, so
they have no Cln-dependent kinase activity (SK ¼ 0 in Fig. 1). ▶ Cell Cycle of Early Frog Embryos
When grown at 37 C, this strain has no Cdc14 activity (EP ¼ 0 in ▶ Cell Cycle, Fission Yeast
Fig. 1). In raffinose or glucose at 37 C, this strain has neither
Cln3 nor Cdc14 activity, i.e., it is in the “neutral” position in
Fig. 1. In galactose-rich medium (GAL promoter active) at 23 C,
the cells have both SK and EP, so they are able to progress Web Resources
around the cell cycle (the blue hysteresis loop in Fig. 1). Under
these conditions, the cells proliferate asynchronously, as indi- Model webpage: http://mpf.biol.vt.edu/research/
cated (upper left) by the facts that 45% of the cells are unbudded budding_yeast_model/pp/
(in G1) and the remainder of the cells are in S/G2/M with
significant amounts of Clb2 protein. When this asynchronous
population of cells is shifted to raffinose medium at 23 C for 6 h
(i.e., deprived of SK), the cells arrest primarily in G1 phase (88%
unbuddded) with undetectable levels of Clb2 (lane R). This is the References
G1 steady state in Fig. 1. Next, the culture is shifted to galactose
+ raffinose medium at 23 C to induce production of Cln3. At the Alberghina L, Coccetti P, Orlandi I (2009) Systems biology of
end of the indicated periods of time, two aliquots of the culture the cell cycle of Saccharomyces cerevisiae: from network
are removed. One aliquot is analyzed immediately for bud index mining to system-level properties. Biotechnol Adv 27:960–
and Clb2 content (lanes 3, 4, 5 of upper gel); the other aliquot is 978
shifted to glucose medium at 37 C for 2.5 h and then analyzed as Chen KC, Calzone L, Csikasz-Nagy A, Cross FR, Novak B,
before (lower gel). During the variable period of exposure to Tyson JJ (2004) Integrative analysis of cell cycle control in
galactose, the cells produce increasing amounts of Cln3 protein budding yeast. Mol Biol Cell 15:3841–3862
(i.e., increasing pulses of SK in Fig. 1). Cells exposed to a small Cross FR, Archambault V, Miller M, Klovstad M (2002) Testing
pulse of Cln3 (0 or 0.5 h in galactose) remained in the stable G1 a mathematical model for the yeast cell cycle. Mol Biol Cell
state, as indicated by lack of buds and little or no Clb2 protein 13:52–70
(lanes 1 and 2 of lower gel). Cells that got a larger pulse of Cln3 Hartwell LH, Culotti J, Pringle JR, Reid BJ (1974) Genetic
(1 h in galactose) switched to the stable S/G2/M state (mostly control of the cell division cycle in yeast. Science 183:46–51
budded cells with abundant Clb2 – lane 3 of lower gel) and Morgan DO (2007) The cell cycle: principles of control. New
stayed there because EP was disabled at the higher temperature. Science, London
The cells in all three lanes of the lower gel are exposed to the Nasmyth K (1996) At the heart of the budding yeast cell cycle.
same conditions (SK ¼ EP ¼ 0), but the cells in lanes 1 and 2 are Trends Genet 12:405–412
in the G1 state, whereas the cells in lane 3 are in the S/G2/M Pringle JR, Hartwell LH (1981) The Saccharomyces cerevisiae
state; i.e., the system is bistable cell cycle. In: Strathern JN, Jones EW, Broach JR (eds)
Cell Cycle, Cancer Cell Cycle and Oncogene Addiction 341 C
The molecular biology of the yeast Saccharomyces.
Cold Spring Harbor Laboratory, Cold Spring Harbor,
Evading Insensitivity to
pp 97–142 apoptosis anti-growth signals
Tyson JJ, Novak B (2008) Temporal organization of the cell
cycle. Curr Biol 18:R759–R768

C
Cell Cycle, Cancer Cell Cycle and
Oncogene Addiction

Vito Quaranta1, Darren Tyson2 and Peter Frick2


1
The Vanderbilt-Ingram Cancer Center, Nashville,
TN, USA
2 Sustained Tissue invasion
Vanderbilt University, Nashville, TN, USA
angiogenesis & metastasis

Synonyms Limitless replicative


potential

Cancer cell cycle; Oncogene addiction


Cell Cycle, Cancer Cell Cycle and Oncogene Addiction,
Fig. 1 A schematic of the classic six hallmarks of cancer
taken from the original article (Hanahan and Weinberg 2000),
Definition courtesy of Elsevier Limited

Cancer Cell Cycle


Cancer is a disease of cell proliferation, whereby bypass these regulatory mechanisms. Any cancer
cancer cells progressively and inexorably lose normal therapy must therefore address the uncontrolled cell
cell cycle control. As a consequence of tumor growth, proliferation driven by the dysregulated cell cycle
the burden on patients increases and tumors eventu- (Fig. 1).
ally spread and metastasize. Cancers evolve through
a multistep progression of accumulating somatic
genetic changes (Vogelstein and Kinzler 1993) Characteristics
accompanied with epigenetic abnormalities (Jones
and Baylin 2002). These alterations lead to an Oncogene Addiction
increasing departure from regular cellular function While a minimalistic phenotypic description can
and confer progressively tumorigenic phenotypes. describe all cancers, cancer types are heterogeneous
Cancers arise out of normal tissues, and it is generally with respect to diverse molecular and clinical descrip-
thought that they originate from a single cell. This cell tions. A variety of molecular alterations can synergis-
loses the intrinsic cellular processes that prevent tically act to dysregulate the cell cycle. In the past
abnormal cell division and acquires a conserved set decade, it has become apparent that subset of cancers
of traits – hallmarks – characteristic of all cancers displays a profound dependence on one or a few onco-
(Hanahan and Weinberg 2000). Of these hallmarks, genes – so-called oncogene-addicted cancers
three relate to the ▶ cell cycle: self-sufficiency in (Weinstein 2002). While cancers typically incur con-
proliferative signals, insensitivity to anti- siderable genetic damage, these oncogene-addicted
proliferative signals, and limitless replicative poten- cancers rely heavily on the activity of few oncogene
tial. Normal cells require environmental cues to products. The induced signaling from these genes sup-
divide, may receive signals to stop dividing, and ports the tumor phenotype and is necessary for its
have a limited number of replicative cycles; the can- maintenance. Inhibition or deletion of these proteins
cer cell cycle, by definition, contains alterations that leads to a reversal of the tumorigenic phenotype.
C 342 Cell Cycle, Cancer Cell Cycle and Oncogene Addiction

Recognition of the oncogene-addiction phenomenon Ligand (EGF, TGFα)


has opened the way to an extremely promising thera-
EGFR
peutic opportunity: dramatic clinical responses and
minimal patient toxicity (Sawyers 2003). Transgenic
mouse models and cell lines validate this prospect, but Autophosphorylation
ultimately, patient tumor regressions (such as Chronic
Myeloid Leukemia patients treated with a BCR-ABL TK TK P TK TK P
inhibitor, Imatinib) constitute the most convincing Eriotinib

proofs. In the context of the cancer cell cycle, the


Activation of signal-transduction
deactivation of a single oncogene in these addicted cascades (for example, MAPK)
cancers may overcome the ability of a cancer cell to
bypass normal cellular regulation. However, it should
be noted that the molecular basis for oncogene addic- Cell
proliferation
Apoptosis
Invasion and
metastasis
Angiogenesis
tion is not fully understood.
Cell Cycle, Cancer Cell Cycle and Oncogene Addiction,
Oncogene Addiction and the Cell Cycle Fig. 2 EGFR and erlotinib in cancer signaling. Ligand binding
The relationship between oncogene addiction and the induces dimerization (left), subsequent phosphorylation and ini-
cell cycle is perhaps best illustrated by an example, tiation of downstream signaling (right). In cancer, EGFR signal-
ing can drive the tumorigenic phenotype. Clinically used
activating mutations in the epidermal growth factor inhibitors such as Erlotinib prevent the phosphorylation of the
receptor (EGFR) driving cell cycle progression. The intracellular substrate and can prevent EGFR signaling
EGFR is a transmembrane receptor tyrosine kinase (Figure taken from (Minna and Dowell 2005), courtesy of Nature
that, upon ligand binding, propagates downstream Publishing Group)
signaling to elicit primarily pro-survival and prolifer-
ative responses. A subset of lung cancer patients has
a conserved set of EGFR mutations causing ligand- Limitations
independent signaling; these mutations predict favor- Oncogene addiction, however, comprises only
able patient responses to specific EGFR inhibitors, a fraction of all cancers. Most cancers also have dereg-
e.g., erlotinib, gefitinib (Pao and Chmielecki 2010). ulation of other pathways along with altered molecular
Studies in cultured cancer cell lines with addictive feedback mechanisms, thus reducing the opportunity
EGFR mutations suggest that EGFR inhibition leads for single-agent therapy. Even in oncogene-addicted
to quiescence and apoptosis. Molecular antagonism cancers, tumor heterogeneity and genetic instability
of EGFR in these cases shows how therapeutics prevent lasting treatment responses in patients. Still,
may effectively target the altered cancer cell cycle. the pathway inhibition concept suggests that proper
More accurately, it is the inhibition of biochemical inhibition of tumor cell signaling may effectively over-
signaling downstream of EGFR that mediates come deregulation of the cancer cell cycle. Indeed, the
cellular responses to the drug, i.e., pathway inactiva- late I. Bernard Weinstein highlighted the need for
tion. Drug-induced alternative receptor signaling, rational drug combinations and the promise of systems
e.g., MET amplification, illustrates this point, biology approaches to understand complex cancer sig-
because it can rescue drug-mediated pathway inhibi- naling (Weinstein and Joe 2008). At the cellular level,
tion in these EGFR-reliant cells even in the presence ▶ cancer systems biology may also aid cancer thera-
of successful EGFR inhibition. Therefore, our peutics by lending insight into the dynamics underly-
example shows an important paradigm: Cell cycle ing cancer treatment (Abbott and Michor 2006).
pathway regulation correlates with phenotypic cellu-
lar changes. Notably, many proteins implicated in
oncogene addiction have established roles in cell Cross-References
cycle regulation – EGFR, HER2, MYC, BCR-ABL,
Cyclin D, RAS, WNT, etc. (Fig. 2) (Weinstein and ▶ Cancer Systems Biology
Joe 2008). ▶ Cell Cycle
Cell Cycle, Cell Size Regulation 343 C
References Cell cycle

Abbott LH, Michor F (2006) Mathematical models of targeted Growth cycle Chromosome cycle
cancer therapy. Br J Cancer 95(9):1136–1141
Hanahan D, Weinberg RA (2000) The hallmarks of cancer. Cell The two cycles are not coordinated ® Cell is not viable
100(1):57–70 Cell division
Jones PA, Baylin SB (2002) The fundamental role of epigenetic
G1
events in cancer. Nat Rev Genet 3(6):415–428
Cell M S C
Minna JD, Dowell J (2005) Erlotinib hydrochloride. Nat Rev
growth G2
Drug Discov Suppl 4:S14–S15
Pao W, Chmielecki J (2010) Rational, biologically based treat-
ment of EGFR-mutant non-small-cell lung cancer. Nat Rev The two cycles are coordinated ® Cell is viable
Cancer 10(11):760–774 Cell division
Sawyers CL (2003) Opportunities and challenges in the devel-
opment of kinase inhibitor therapy for cancer. Genes Dev G1
17(24):2998–3010 Cell
M S
Vogelstein B, Kinzler KW (1993) The multistep nature of can- growth -
G2
cer. Trends Genet 9(4):138–141
Weinstein IB (2002) Cancer. Addiction to oncogenes – the
Achilles heal of cancer. Science 297(5578):63–64
Cell Cycle, Cell Size Regulation, Fig. 1 Checkpoints ensure
size homeostasis via negative signal(s) from the growth cycle to
the chromosome cycle

Cell Cycle, Cell Size Regulation size control mechanisms operate, which ensure that at
specific checkpoints the cell cycle progression is
Ákos Sveiczer1 and Anna Rácz-Mónus2 blocked, unless a critical size has been reached. Such
1
Department of Applied Biotechnology and Food ▶ cell cycle checkpoints exist both in G1 and G2
Science, Budapest University of Technology and phases, which regulate the onset of S and M phases,
Economics, Budapest, Hungary respectively (Alberts et al. 2009). However, in
a specific cell type, both size checkpoints do not nec-
essarily work efficiently: one might dominate, while
Synonyms the other becomes cryptic. In the absence of both size
controls, cells would be smaller and smaller from cycle
Size checkpoint; Size control to cycle and finally probably lose viability (Fantes and
Nurse 1981; Sveiczer et al. 1996). The reason is that
growth (cytoplasmic) cycle was normally longer than
Definition chromosome (nuclear) cycle; therefore the cell would
not double its components and its size before cytoki-
Cell size regulation during the cell cycle is nesis (Fig. 1). By contrast, size control extends the
a mechanism coordinating growth and division. It duration of either G1 or G2 phase by blocking the
ensures that cell size distribution of a population next phase transition, ensuring a compensation mech-
remains constant in consecutive generations. anism, that is, the larger the cell at birth is, the shorter
its cycle will be. With other words, size homeostasis is
achieved via negative signal(s), which are generated
Characteristics by the growth cycle and affect progression of the
chromosome cycle. This entry summarizes how size
Size homeostasis is a general characteristic feature of checkpoints act in the most important model organisms
populations of unicellular organisms, at least in steady- studied so far, namely, budding yeast, fission yeast,
state conditions. It means that an "average" cell must early embryonic cells, and finally mammalian cells. In
double its size and all its components between two every case, we will briefly discuss the methods used
successive divisions. At the individual cellular level, in size control studies and the conclusions drawn.
C 344 Cell Cycle, Cell Size Regulation

At the end, we will give a generalized view of size


regulation in eukaryotes from aspects of systems biol-
ogy. Another important side of size control is its mod-
Strong size control
ulation by changes in the environment (shift-up and Slope » –1

Length extension (mm)


shift-down experiments); however, due to space limi-
tations this point will not be discussed here.
No size control
Slope » 0
Cell Size Regulation in Budding Yeast
Unicellular models like the budding yeast Saccharo-
myces cerevisiae are practically able to grow and
divide infinitely, if the required nutrients (C-source,
N-source, minerals, etc.) are available in the environ-
ment and the physical factors (pH, temperature, water Birth length (μm)
activity, etc.) also support proliferation. Although
Cell Cycle, Cell Size Regulation, Fig. 2 The existence of
some cells age and even die, the whole population
a size checkpoint in fission yeast is indicated by a strong nega-
may increase exponentially for many generations if tive correlation between total length extension and birth length.
the milieu does not change. Coupling growth and divi- In extremely oversized cells, size control is abolished
sion happens in late G1, controlling the G1/S transi-
tion, and the control point is called the Start event in
this organism (Morgan 2007). The mechanism of the yeast Schizosaccharomyces pombe an attractive
intracellular size measurement is mainly obscure. An model in size control studies. Cell length, which is
indirect indicator of cell size is probably the protein proportional to volume and mass, is an easily measur-
synthesis rate, which is proportional to the number of able variable, if cells have been grown under
ribosomes and also to cell mass. Budding yeast has a thermostated photomicroscope on an agar surface.
only one type of cyclin-dependent kinase (Cdk), called After measuring many cells from time-lapse films, and
Cdc28; however several different cyclins (Clns and plotting the total length extension versus birth length
Clbs) are activated at different cell cycle stages, more- (Fig. 2), a strong negative correlation indicates the
over, the different cyclin-Cdc28 complexes have dif- existence of a size checkpoint (Fantes 1977). The
ferent substrate specificities, leading to an irreversible closer the slope of the regression line to 1 is,
progression through the whole cycle (▶ Cell Cycle, the stronger the size control is. In abnormally over-
Budding Yeast). Cln3 is the first cyclin emerging in sized temperature-sensitive cdc mutants generated by
G1, and this protein is highly unstable. Therefore, Cln3 a block and release experiment, however, size control
is a good candidate to be a size-sensing protein, whose is abolished, as indicated by the lack of negative cor-
concentration reflects translation rate in the cell relation. This size checkpoint, in contrast to budding
(Morgan 2007). This hypothesis is also supported by yeast, regulates the onset of ▶ mitosis, acting some-
the fact that cells overproducing Cln3 are abnormally where in the middle of G2 phase (Sveiczer et al. 1996).
small. Probably the most relevant effect of the In small wee1 mutants, this checkpoint no more
Cln3-Cdc28 kinase complex is that it directly phos- operates; however, a strong G1/S size regulation is
phorylates a nuclear Start inhibitor, the Whi5 protein. turned on here, which mechanism is cryptic in nor-
Phosphorylated Whi5 is exported from the nucleus, mal-sized wild type cells. The main cyclin-dependent
allowing a specific transcription program to start, lead- kinase complex in fission yeast is the Cdc13-Cdc2
ing (among many other proteins) to Cln1 and Cln2 dimer, also known as the M phase promoting factor
production, and finally to initiation of ▶ DNA replica- (MPF). Its activity in G2 is mainly regulated by
tion (Di Talia et al. 2007). a kinase-phosphatase pair, Wee1 and Cdc25, phos-
phorylating and dephosphorylating the inhibitory
Cell Size Regulation in Fission Yeast Tyr-15 site of Cdc2, respectively (▶ Cell Cycle, Fis-
Its regular cylindrical shape, tip growth with constant sion Yeast). Similarly to budding yeast, a translational
diameter, and symmetric division make the fission control seems to be involved in size sensing, in the case
Cell Cycle, Cell Size Regulation 345 C
of fission yeast both on the levels of Cdc13 and Cdc25 nor divide in this stage (Morgan 2007). The effect of
(Morgan 2007). Analyzing time-lapse films of several growth factors results in passing the G0/G1 transition,
cell cycle mutants of S. pombe, however, indicates that when the cell starts to proliferate (▶ Cell Cycle of
its mitotic size control acts mainly through the Wee1 Mammalian Cells). Their most important checkpoint
protein, while the cryptic Start size checkpoint is the Start control in late G1, similarly to that of
operates via the cyclin-dependent kinase inhibitor, budding yeast; however, it is rather called the restric-
Rum1 (Sveiczer et al. 1996). tion point (▶ Cell Cycle Transition, Detailed Regula- C
tion of Restriction Point) in animal cells. The main
Cell Size Regulation in Early Embryos molecular regulators of the restriction point are E2F
In early embryos of multicellular animals, cell growth transcription factors required for G1/S progression, an
and division are uncoupled, that is, size homeostasis is inhibitor of E2Fs (the retinoblastoma protein or RB),
not maintained by any regulatory mechanisms. The and different ▶ cyclins and cyclin-dependent kinases.
most frequently studied models are embryos of the To get rid of problems of tissue control, in experiments
fly Drosophila and of the frog Xenopus (▶ Cell Cycle cultured animal cells are generally involved, supplied
of Early Frog Embryos). First, giant eggs are produced artificially with growth factors. A further problem is
by the females, in which the cell cycle is blocked in G2, the irregular shape of cells, causing uncertainties in
but growth is continued, stockpiling huge amounts of size determination. One possibility is to measure total
maternal proteins and mRNA, required later for protein content or protein synthesis rate in the cells, or
embryogenesis (Morgan 2007). After maturation and the other one is to measure cell volume by an electronic
fertilization, rapid and synchronous cycles occur prac- cell counter. A very long debate has been continued on
tically without any growth, consisting of only S and to decide whether cell size affects or not the restriction
M phases, but neither G1 nor G2. Size checkpoint is point in animal cells (Morgan 2007). The answer to
lost here, or more precisely, the large size makes it this question may alter in different cell types; however,
cryptic. During about a dozen of these size- a seminal paper suggests us that probably size control
independent cycles, cell size dramatically decreases. mechanisms generally operate in higher eukaryotes
At around the midblastula transition, however, when (Dolznig et al. 2004), which conclusion has recently
cell size reaches a critical level from above, size control been confirmed (Tzur et al. 2009).
starts to operate. Cycle time increases as G1 and G2
phases are involved, and cell divisions are no more Systems Biology of Size Control
synchronous during the following developmental How a cell measures its size during the cell cycle is not
stages. In experiments with Xenopus, plotting cycle clear, however, any size-sensing mechanism must
time as a function of cell radius at birth, similar charac- work via the biochemical cell cycle engine, driving
teristics were found to those ones discussed above for any cell from birth to division. Therefore, cyclins,
fission yeast. In cells before the midblastula transition, Cdks, their inhibitors, and other regulators are gener-
cycle time was found to be totally independent of birth ally thought to be involved in size regulation. As
size; by contrast, after this stage, a strong negative discussed above, translational control on highly unsta-
correlation arose between them (Wang et al. 2000). ble proteins is hypothesized to have a general role here.
The main cellular parameter controlling these cycles So, in mathematical models of the cell cycle consisting
was found to be the nucleocytoplasmic ratio, that is, of ordinary differential equations on the biochemical
the DNA content of the cell divided by its cell mass. reactions of the most important proteins, cyclins are
generally assumed to accumulate in the nucleus pro-
Cell Size Regulation in Mammalian Cells portional to cell mass. Such models were shown to
Mammalian cells, similarly to yeasts, require nutrients describe correctly the size checkpoints in different
for proliferation, but they also need growth factors cell types; for example in the case of fission yeast,
generated normally by other cells of the same organ- even a stochastic version of a former deterministic
ism. With other words, division of these cells is under model was successfully developed (Sveiczer et al.
strict tissue control; in the absence of growth factors 2001). However, a novel way of size sensing was
the cells are in a phase called G0, and they neither grow recently discovered in rod-shaped cells, like in fission
C 346 Cell Cycle, Coupled with Circadian Clock

▶ Cyclins and Cyclin-dependent Kinases


- ▶ DNA Replication
▶ Mitosis

References

- Alberts B, Bray D, Hopkin K, Johnson A, Lewis J, Raff M,


Roberts K, Walter P (2009) Essential cell biology, 3rd edn.
Garland Science, New York and London
Di Talia S, Skotheim JM, Bean JM, Siggia ED, Cross FR
(2007) The effects of molecular noise and size control
on variability in the budding yeast cell cycle. Nature 448:
Cell Cycle, Cell Size Regulation, Fig. 3 Rod-shaped cells 947–951
may measure their size by generating a spatial gradient of Dolznig H, Grebien F, Sauer T, Beug H, M€ ullner EW (2004) Evi-
a cell cycle inhibitor protein at the cellular cortex. In small dence for a size-sensing mechanism in animal cells. Nat Cell
cells (top), the inhibitory effect from the tips is large in the Biol 6:899–905
middle, which blocks the nuclear cycle. In large cells (bottom), Fantes PA (1977) Control of cell size and cycle time in
the inhibitory effect from the tips is little in the middle, which Schizosaccharomyces pombe. J Cell Sci 24:51–67
releases the block Fantes PA, Nurse P (1981) Division timing: controls, models and
mechanisms. In: John PCL (ed) The cell cycle. Cambridge
University Press, Cambridge, pp 11–33
Morgan DO (2007) The cell cycle. Principles of control. New
yeast and even in some bacteria (Bacillus subtilis). In Science Press, London
these cells, a spatial gradient of some proteins is Moseley JB, Nurse P (2010) Cell division intersects with cell
geometry. Cell 142:189–193
formed at the cortex, having a maximal local concen-
Sveiczer A, Novak B, Mitchison JM (1996) The size control of
tration at the cell tips, and a continuous decrease in the fission yeast revisited. J Cell Sci 109:2947–2957
direction of the middle (Fig. 3). These proteins inhibit Sveiczer A, Tyson JJ, Novak B (2001) A stochastic molecular
cell cycle progression in the middle of the cell, when model of the fission yeast cell cycle: role of the nucleocy-
toplasmic ratio in cycle time regulation. Biophys Chem
the cell is small. As the cell becomes larger and larger,
92:1–15
the protein in the middle dilutes, and at a critical cell Tzur A, Kafri R, LeBleu VS, Lahav G, Kirschner MW
length, it loses its efficient inhibitory effect (Moseley (2009) Cell growth and size homeostasis in proliferating
and Nurse 2010). How far this mechanism is really animal cells. Science 325:167–171
Wang P, Hayden S, Masui Y (2000) Transition of the blastomere
involved in size regulation and how universal this cell cycle from cell size-independent to size-dependent con-
way might be are interesting questions for the future trol at the midblastula stage in Xenopus laevis. J Exp Zool
to be studied. 287:128–144

Acknowledgment Research in our group is supported by the


Hungarian Scientific Research Fund (OTKA K-76229) and by
Cell Cycle, Coupled with Circadian Clock
the New Széchenyi Plan (Project ID: TÁMOP-4.2.1/B-09/1/
KMR-2010-0002). Christian I. Hong
Department of Molecular and Cellular Physiology,
University of Cincinnati College of Medicine,
Cross-References Cincinnati, OH, USA

▶ Cell Cycle Checkpoints


▶ Cell Cycle of Early Frog Embryos Definition
▶ Cell Cycle of Mammalian Cells
▶ Cell Cycle Transition, Principles of Restriction Cell cycle and ▶ circadian rhythms carry out specific
Point functions to regulate cell divisions and provide
▶ Cell Cycle, Budding Yeast temporal controls, respectively. These two distinct
▶ Cell Cycle, Fission Yeast mechanisms interact with each other via molecular
Cell Cycle, Coupled with Circadian Clock 347 C
coupling factors such as WEE1 and CHK2. WEE1 is two cell cycle components: (1) an inhibitory kinase,
directly regulated by a heterodimeric circadian WEE1, that inactivates the activity of Cdc2 and
transcription factor, BMAL1/CLOCK. On the other determines the entry into the mitosis, and (2) a cell
hand, a cell cycle checkpoint regulator, CHK2, cycle checkpoint regulator, CHK2, that is induced
phosphorylates one of the core clock components, upon DNA damage to relay proper signals for DNA
mPER1, and leads to its subsequent degradation upon damage responses.
DNA damage. This bidirectional coupling between the In mammals, circadian rhythms are present in C
cell cycle and the circadian clock is conserved from various cell types including liver, fibroblasts,
filamentous fungi, Neurospora crassa, to mammals. etc. The circadian clock is robust and stable at
a single cell level. In fibroblasts, however, cells do
not communicate with each other, and they are
Characteristics asynchronous as a population. The master clock
resides in the suprachiasmatic nucleus (SCN), which
In most eukaryotic organisms, networks of cell cycle is situated in the hypothalamus. These neuronal cells
and circadian rhythms coexist and work in harmony to do communicate and synchronize with each other,
create optimum conditions for cells to grow and receive input signals, and relay output signals to other
adapt to surrounding environment. The underlying cells. Both SCN neurons and peripheral tissues have
molecular mechanisms of cell cycle are conserved identical molecular components that constitute a
from yeast to mammals (Nurse 2000). This robust 24-h oscillator. A conserved PAS domain containing
control system is equipped with multiple checkpoints heterodimeric transcription factors, BMAL1/CLOCK
for controlled growth and cell divisions. Cell cycle or BMAL1/NPAS2, activates transcriptions of core
produces an oscillatory phenomenon of repeated clock components such as mPers (mPer1 and mPer2)
growth and division. The period of this oscillation, and mCrys (mCry1 and mCry2) (Ukai and Ueda 2010).
however, varies with external conditions such as Translated products from these transcripts, mPERs and
nutrient and temperature. Cell cycle system is mCRYs, translocate into the nucleus and negatively
optimized for growth and division, but not for time regulate their own transcription factors creating a
keeping. Circadian rhythms keep track of time and time-delayed negative feedback loop. In a simplified
provide temporal regulations in most eukaryotic picture, this feedback circuit generates a stable
organisms with a period of about 24 h (Dunlap 1999). oscillator. The circadian clock system, however, is
In contrast to the period of cell division cycle, the much more complex with other interlocked
period of circadian clock is relatively insensitive to feedback loops with components such as RevErb-a.
external conditions such as nutrients or temperature. Furthermore, transcriptional, post-transcriptional,
These two molecular mechanisms for distinct post-translational, and nuclear import/export
functions are connected by molecular coupling regulations play important roles in the circadian
components, which are addressed below. clock dynamics adding multiple layers of controls.
Molecular mechanisms of cell cycle progress and Cell cycle and circadian rhythms are coupled
their dynamical properties are identical in all eukary- together despite their discrete functions. The circadian
otes, and extensively studied in various ways including clock–gated cell division cycles are observed in
mathematical modeling (Csikasz-Nagy et al. 2006). various organisms from cyanobacteria to mammals
Key regulatory components are conserved across (Yang et al. 2010; Sahar and Sassone-Corsi 2009).
different species. In mammals, four different From 1950s, scientists observed cell divisions that
complexes of Cyclin-dependent-kinases (CDKs) and only occur during particular times of the circadian
their regulatory Cyclin partners (Cdc2/CycB, Cdk2/ cycle (Sweeney and Hastings 1958). However, it was
CycA, Cdk2/CycE, and Cdk4/CycD) orchestrate cell only recently that scientists identified molecular
cycle progressions. Their appearances, activities, and coupling factors between the cell cycle and the
disappearances are regulated by multiple components circadian clock (Sahar and Sassone-Corsi 2009).
such as inhibitors (Rb, p27Kip1), transcription factors A circadian transcription factor, BMAL1/CLOCK,
(c-Myc, E2F, Mcm), and degradation factors (p55Cdc/ directly binds to Wee1’s E-box elements and activates
APC, Cdh1/APC). For this entry, it is important to note the transcription of Wee1. Both the abundance and the
C 348 Cell Cycle, Coupled with Circadian Clock

activity of WEE1 oscillate and are high during mPER1, which leads to its subsequent degradations
the evening. These, in turn, determine the duration of (Sahar and Sassone-Corsi 2009). In other words,
G2-phase. Cells divide in late evening when WEE1 CHK2-induced phosphorylation triggers premature
level decreases in order to allow cells to enter into the degradation of mPER1, which creates phase shift in
M-phase. the circadian clock. The DNA damage–induced phase
WEE1 undergoes complex regulation, because cell shifts are unique because DNA damage predominantly
cycle–controlled WEE1 is in concert with another creates phase advances, but not much of delays.
regulation that is periodically imposed by the circadian This unique phase response was investigated with
clock. The period of cell division cycle varies as tem- computational modeling and revealed possible
perature and/or nutrient conditions change while the molecular criteria for such phenotype: (1) An autocat-
period of circadian clock remains relatively constant. alytic positive feedback mechanism is required
This fact sets up a stage for in silico studies for the cell in addition to the time-delayed negative feedback
cycle and the circadian clock as coupled oscillators. loop and (2) CHK2 phosphorylates and triggers
Computational modeling and analysis of this particular subsequent degradations of mPERs that are not
coupling via WEE1 revealed: (1) phase-lock and bound to their transcription factors, BMAL1/CLOCK
synchronization of cell division cycle with circadian (Hong et al. 2009). This in silico study provides
rhythms when the mass doubling time (MDT, or the testable hypotheses for the detailed mechanism of
period of cell cycle) is close to 24 h, (2) quantized cell DNA damage–induced phase shifts of circadian
division cycles when the MDT is different than 24 h, rhythms, which is important in understanding why
and (3) circadian influenced cell size control DNA damage signaling cascade influences the
(Zamborszky et al. 2007). If the MDT is close to circadian clock. The implications of such an interac-
24 h, the circadian-regulated WEE1 synchronizes cell tion are hypothesized that cell cycle machinery utilizes
cycle to the phase of circadian rhythms regardless of circadian rhythms to activate WEE1 by prematurely
initial phase of cell cycle. However, this coupling degrading mPERs, which subsequently releases nega-
results in multi-modal cell cycle distributions or quan- tive feedback on BMAL1/CLOCK. The BMAL1/
tized cell cycle lengths if the MDT deviates from 24 h, CLOCK-activated WEE1 will delay cell cycle
because the circadian clock influences different phases progress upon DNA damage, and provide more time
of cell cycle in each generation due to their period for DNA repair.
differences. The circadian clock may lengthen There are more clues of coupling between the cell
the MDT via WEE1 depending on its influences on cycle and the circadian clock. The transcript of c-Myc
particular cell cycle phase. This periodic break on cell oscillates with a circadian period, and BMAL1/NPAS2
cycle by the circadian clock enforces cell size control suppresses its transcription (Sahar and Sassone-Corsi
by regulating the duration of growth at the G2 phase. 2009). Its molecular details, however, remain
These multi-modal distributions are recently reported unexplored. Additional cell cycle components such as
in cyanobacteria cell cycle system under the influence Cyclin D1, Gadd45a, and Mdm-2 show rhythmic tran-
of the circadian clock with single cell measurements scripts, but their detailed regulations are unknown
(Yang et al. 2010). (Sahar and Sassone-Corsi 2009). The rhythmic Cyclin
The above seemingly unidirectional coupling was D1 transcripts suggest a role of the circadian clock at
recently shown to be conditionally bidirectional. G1-to-S transition in addition to G2-to-M transition via
Circadian rhythms show unique phase shifts upon WEE1. Furthermore, rhythmic Gadd45a and Mdm-2
DNA damage, which involves cell cycle checkpoint transcripts suggest potential roles of the circadian
regulator, CHK2. DNA damage induces DNA damage clock in DNA damage responses. It is possible that
responses that slow down cell cycle, arrest cell cycle, there are other molecular coupling components.
or lead to programmed cell death, apoptosis. These may be clock-controlled genes (CCGs) like
In mammalian system, ionizing radiation causes Wee1, conditionally affecting the circadian clock like
double-strand breaks that relays signal transduction CHK2, or novel parts of the circadian clock. Bidirec-
cascades via ATM and CHK2. Both ATM and CHK2 tional communications of these two oscillators require
interact with mPER1, and CHK2 phosphorylates further investigations.
Cell Cycle, Fission Yeast 349 C
In contrast to previous data, a recent effort to study
this coupling indicated that this connection might Cell Cycle, Fission Yeast
depend on cell types (Yeom et al. 2010). Yeom and
colleagues reported the circadian-independent cell Ákos Sveiczer and Anna Horváth
divisions in rat-1 fibroblasts. They showed that Department of Applied Biotechnology and
about 13-h cell division cycles at 37 C had no phase Food Science, Budapest University of Technology
correlations with the 24-h circadian rhythms when and Economics, Budapest, Hungary C
they followed 3–4 generations of cell divisions with
bioluminescence assay. Their data, however, also
indicate variability in cell cycle lengths, which may Synonyms
reflect quantized cell cycles as shown in
cyanobacteria studies (Yang et al. 2010). For accurate Cell division cycle; Mitotic cycle; Schizosac-
conclusions, larger data set of cell cycle lengths is charomyces pombe
necessary for careful study of coupling between the
cell cycle and the circadian clock.
Definition

Cross-References Fission yeast is an attractive unicellular model organ-


ism in eukaryotic cell cycle research from several
▶ Cell Cycle aspects (morphogenesis, size regulation, checkpoints,
▶ Cell Cycle Transitions, G2/M phase transitions, chromosome structure and dynam-
▶ Circadian Rhythm ics, meiosis, mathematical modeling, etc.). Fission
yeasts form four species in the genus Schizosac-
charomyces (Ascomycota, Archiascomycetes),
References among which the best known one is S. pombe.

Csikasz-Nagy A, Battogtokh D, Chen KC, Novak B, Tyson JJ


(2006) Analysis of a generic model of eukaryotic cell-cycle Characteristics
regulation. Biophys J 90:4361–4379
Dunlap JC (1999) Molecular bases for circadian clocks. Cell
Cell Biology and Physiology of the Fission Yeast
96:271–290
Hong CI, Zamborszky J, Csikasz-Nagy A (2009) Minimum Cell Cycle
criteria for DNA damage-induced phase advances in The fission yeast Schizosaccharomyces pombe has
circadian rhythms. PLoS Comput Biol 5(5):e1000384 very frequently been applied since the 1950s as
Nurse P (2000) A long twentieth century of the cell cycle and
a model organism in cell biology, microbial physiol-
beyond. Cell 100:71–78
Sahar S, Sassone-Corsi P (2009) Metabolism and cancer: ogy (▶ Cell Cycle, Physiology), and genetics, and later
the circadian clock connection. Nat Rev Cancer also in molecular biology, genomics, and systems biol-
9(12):886–896 ogy (Mitchison 1990; MacNeill 2002; Sveiczer et al.
Sweeney BM, Hastings JW (1958) Rhythmic cell division in
2004). As a species, S. pombe belongs to class
populations of Gonyaulax polyedra. J Protozool 5:217–224
Ukai H, Ueda HR (2010) Systems biology of mammalian Schizosaccharomycetes, subphylum Archiasco-
circadian clock. Annu Rev Physiol 72:579–603 mycotina, and phylum Ascomycota. The genus
Yang Q, Pando BF, Dong G, Golden SS, van Oudenaarden A Schizosaccharomyces consists of four species, namely
(2010) Circadian gating of the cell cycle revealed in single
cyanobacterial cells. Science 327(5972):1522–1526
S. pombe, S. japonicus, S. octosporus, and
Yeom M, Pendergast JS, Ohmiya Y, Yamazaki S (2010) S. cryophilus. All are considered to be fission yeasts;
Circadian-independent cell mitosis in immortalized fibro- however, S. pombe was the first one to be discovered
blasts. Proc Natl Acad Sci USA 107(21):9665–9670 and it is also the most known and important. Fission
Zamborszky J, Hong CI, Csikasz-Nagy A (2007) Computational
yeast is a haploid unicellular eukaryote with rod-
analysis of mammalian cell division gated by a circadian
clock: quantized cell cycles and cell size control. J Biol shaped cells of almost constant diameter (3.5 mm).
Rhythms 22(6):542–553 Cell length at birth is 7–8 mm, which extends to
C 350 Cell Cycle, Fission Yeast

13–14 mm till the end of the mitotic cell cycle by tip G1/S

growth (MacNeill and Fantes 1995). Although starva- checkpoint
• Cryptic size
tion blocks cell proliferation and induces either sexual Metaphase/ Cell division
anaphase control
differentiation (mating, meiosis, and sporulation) or checkpoint
dormancy, all these processes are not discussed in
S
this essay. G1
Fission yeast divides symmetrically, producing two
nearly identical progenies. As the sister cells behave − M

similarly in the next cycle, artificially generated syn-


chronicity in a population remains sufficient for sev-
eral generations. Moreover, cell length correlates with − G2

age; and all these characteristics make S. pombe an


attractive model organism in cell cycle research. The G2/M
generation time of its populations is generally 2–4 h, checkpoint
• Operating size
depending on temperature and the media used for control
growth. The fission yeast cell cycle (Fig. 1) contains
short G1, S, and M phases, while G2 is very long Cell Cycle, Fission Yeast, Fig. 1 The general features of the
(70% of total cycle time), because size control fission yeast cell cycle (wild type)
operates in G2 in wild-type cells (Mitchison 1989).
After ▶ mitosis, the cells cannot divide immediately,
because septation requires 30–40 min. During this Metaphase/
anaphase Cell division
septation period the cells have two nuclei, which are checkpoint
mainly in their (next) G1 phase. However, as G1 is
very short, the cells are actually replicating their DNA -
G1
(▶ DNA Replication) parallel with ▶ cytokinesis.
The first cell division cycle (cdc) mutants of fission
yeast were isolated in the 1970s by classic genetic M S
methods. Loss-of-function point mutants were gener- -
G2
ated, which could grow normally at the permissive
temperature (25 C), but stopped proliferation at the G1/S
- checkpoint
restrictive temperature (35 C). The terminal pheno-
G2/M • Operating size control
type was usually a highly elongated cell, blocked at checkpoint
a specific event in the cell cycle, called the transition • Abolished size control
point of the mutated gene. The position of the transi-
tion point discriminated the 25 identified cdc genes Cell Cycle, Fission Yeast, Fig. 2 The general features of the
fission yeast cell cycle (wee1– mutant)
as G1/S, S, G2, G2/M, and septation genes. The
temperature-sensitive mutants also enabled a good
synchronization technique; a simple temperature
block-and-release experiment produced induction gain-of-function point mutants of the cdc2 gene (the
synchrony in the population (MacNeill and Fantes wee2 and cdc2 genes were identical, and named cdc2
1995). Among these cdc mutants, some grew normally, afterward). Analyzing the cell cycle of wee1 mutants
but their size were about half that of wild type cells. revealed that G1 phase was much extended at the
Isolated in Edinburgh (Scotland), they were soon expense of G2 (Fig. 2) (Mitchison 1989). A general
renamed wee mutants, since wee means small in conclusion was drawn that fission yeast has two size
ancient Scottish language. Later it became clear that control mechanisms during the mitotic cycle (▶ Cell
mutations in only two genes (wee1 and wee2) could Cycle, Cell Size Regulation). In wild type cells the G2
cause wee phenotype; however, wee1– mutants were size checkpoint operates, while the G1 one is cryptic
found to be loss-of-function point mutants of the because of large cell size. By contrast, in wee mutants
wee1 gene, while wee2– mutants were found to be the G2 size control is abolished, cell length decreases,
Cell Cycle, Fission Yeast 351 C
control system, which ensured that discrete events of
Cell length

the cycle followed each other in the correct order.


Moreover, this control network was found to be evo-
lutionarily conserved (Morgan 2007). Fission yeast
was a crucial model organism in discovering the
engine’s mechanism, because its regulatory network
was probably the simplest among the known recent C
eukaryotes, and even simple mutations easily led to
RCP
characteristic phenotypes in this haploid species. For
example, deleting the mitotic cyclin gene cdc13 caused
▶ endoreplication cycles (repeated S phases without
Cell Time
division Cell division intervening M phases) (Hayles et al. 1994); meanwhile
deleting the replication license factor gene cdc18 led to
S G2 M G1 S a mitotic catastrophe (a lethal execution of mitotic and
cytokinetic events from the haploid G1 state)
Cell Cycle, Fission Yeast, Fig. 3 Length growth profile during
the cell cycle of fission yeast (Nishitani and Nurse 1995). Furthermore, deleting the
mitotic activator phosphatase gene cdc25 in the tem-
perature sensitive wee1–50 background generated
and the G1 mechanism maintains size homeostasis at quantized cycles at the restrictive temperature
a reduced level (Fig. 1, Fig. 2). (Sveiczer et al. 1996). By the end of the second
The time profile of cellular growth during the fis- millennium, a “wiring diagram” for the fission yeast
sion yeast cell cycle has also been an extensively cell cycle engine was made (Fig. 4), which was based
studied question for several decades. Although debates on the above mentioned experiments and could be
still arise, a more or less generally accepted view is that studied by methods of mathematical modeling
length growth starts at birth and lasts until about (▶ Cell Cycle Modeling, Differential Equation)
mitotic onset, when the cell starts to prepare for cyto- (Sveiczer et al. 2004). Since sequencing of the whole
kinesis. So, the last 25% of the cycle without visible S. pombe genome has been finished (MacNeill 2002),
growth is called the constant volume phase, meanwhile these models probably contained the major players of
the first 75% is the growth phase. The latter one is the network; however, novelties in the regulatory
made up of two linear segments, separated by a rate mechanisms still arise. Below we briefly summarize
change point (RCP) (Fig. 3), i.e., length growth in the conception on how this wiring diagram drives
fission yeast is not exponential (Mitchison and Nurse fission yeast cells through the cycle, from birth to the
1985). The reason of this RCP is also controversial. In next division.
wild type cells, tip growth is unipolar at the beginning In the center of Fig. 4 is the Cdc2/Cdc13 dimer
of the cycle and it is bipolar later, therefore changing protein complex (Sveiczer et al. 2004). Cdc2 is the
the growth manner in mid G2 phase was thought to regulatory kinase (cyclin-dependent kinase or Cdk)
cause a change in growth rate. However, such a general subunit, which phosphorylates its specific substrate
conclusion has been challenged when different cell proteins at specific Ser or Thr residues (▶ Cyclins
cycle mutants were also studied. and Cyclin-dependent Kinases). Cdc13 is the regula-
tory (cyclin) subunit of the dimer, having no enzymatic
Molecular Biology of the Fission Yeast Cell Cycle activity by itself, but its presence is essential for the
In the 1980s and 1990s novel sophisticated methods Cdc2 kinase activity. In S and early G2 phases, the
made possible that geneticists could generate several dimer forms even at small cell size, however, it is still
novel types of fission yeast mutants, for example, gene inactive, because the Tyr-15 residue of the Cdc2
deletion, gene over-expressing, and multiple mutants. subunit is phosphorylated by the Wee1 kinase,
At that time, molecular biologists also discovered the a mitotic inhibitor. In late G2 when the cell is large
biochemical functions of the most important genes and enough, by contrast, this inhibitory phosphate group is
proteins. It became clear that these proteins formed removed by the Cdc25 phosphoprotein phosphatase,
a regulatory network, named the cell cycle engine or a mitotic activator (MacNeill and Fantes 1995).
C 352 Cell Cycle, Fission Yeast

Cell Cycle, Fission Yeast, G1


Fig. 4 The cell cycle control
system of fission yeast (based
on Sveiczer et al. 2004)
St
ar
n Cdc2 t
isio +
ll div Cig2
ce +

Rum1
S
− − (DNA replication)

Ste9
C − −
AP
M
Exit o
f Slp1 − Cdc2 −
C
AP Cdc13
Wee1
+

Cdc25
+

M G2/M G2
(Mitosis)

The Cdc2/Cdc13 dimer gets fully activated, and its albeit with the help of the remaining MPF activity.
functions help condense chromosomes and remodel (Note that Cdc13 is the only known essential cyclin
microtubules in fission yeast cells. As a consequence, in fission yeast, i.e., the onset of DNA replication does
the cells pass the G2/M transition (▶ Cell Cycle Tran- not require Cig2.) This transition is under size control
sitions, G2/M), therefore this Cdk/cyclin is also known in small wee mutants, but it is cryptic in wild type cells
as the M-phase promoting factor, or MPF. In anaphase (see above). The length of G1 influences when cell
the MPF activity abruptly starts to decrease, as the division occurs (Mitchison 1989): either in G1 (wee
anaphase promoting complex (APC), with the help of mutant) or rather in S phase (wild type) (for simplicity,
its auxiliary subunit Slp1, ubiquitinates Cdc13 cyclin, Fig. 4 shows a case, where cytokinesis comes about in
marking it for degradation by the proteasome (Morgan G1). As far as DNA replication starts, the cell returns to
2007). Losing Cdk activity leads to exit of M phase, the very point from which our cell cycle journey
i.e., the cell gets into the next G1 phase in a binucleated started in the previous generation.
state (▶ Cell Cycle Transitions, Mitotic Exit). During
early G1, MPF activity is very low for two reasons. Acknowledgment Research in our group is supported by the
First, APC (with the help of another auxiliary protein Hungarian Scientific Research Fund (OTKA K-76229) and by
the New Széchenyi Plan (Project ID: TÁMOP-4.2.1/B-09/1/
Ste9) continuously ubiquitinates newly synthesized
KMR-2010-0002).
Cdc13 proteins. Second, Cdc2/Cdc13 dimers are
bound to a stoichiometric inhibitor of Cdk, the Rum1
protein (▶ Cdk Inhibitors). In late G1, another cyclin Cross-References
(Cig2) is formed, also making dimers with Cdc2. This
complex is called the starter kinase, because it helps ▶ CDK Inhibitors
the cell pass through the G1/S (or Start) transition, ▶ Cell Cycle Modeling, Differential Equation
Cell Cycle, Physiology 353 C
▶ Cell Cycle Transitions, G2/M normal physiological (as opposed to pathological) con-
▶ Cell Cycle Transitions, Mitotic Exit ditions. Chromosomal replication (▶ DNA Replica-
▶ Cell Cycle, Cell Size Regulation tion) and ordered segregation of the replicated
▶ Cell Cycle, Physiology chromosomes (in eukaryotes by ▶ mitosis or ▶ meio-
▶ Cyclins and Cyclin-dependent Kinases sis) are fundamental aspects of cell cycle physiology
▶ Cytokinesis that are common to all proliferating cells. Other events
▶ DNA Replication such as cellular growth and division (▶ Cytokinesis) C
▶ Endoreplication are features of many cell cycles, but are not universal.
▶ Mitosis Each of these processes can be regulated by physio-
logical cues from within and outside the cell. Elucida-
tion of cell cycle physiology at the systems level
References therefore requires both an understanding of the univer-
sal chromosomal cycle and consideration of the ways
Hayles J, Fisher D, Woolard A, Nurse P (1994) Temporal in which other processes are coordinated with these
order of S-phase and mitosis in fission yeast is determined
chromosomal events.
by the state of the p34cdc2/mitotic B cyclin complex. Cell
78:813–822
MacNeill SA (2002) Genome sequencing: and then there were
six. Curr Biol 12:R294–R296 Characteristics
MacNeill SA, Fantes PA (1995) Controlling entry into mitosis in
fission yeast. In: Hutchison C, Glover DM (eds) Cell cycle
control. Oxford University Press, Oxford, pp 63–105 Coordination of cell cycle events with other biological
Mitchison JM (1989) Cell cycle growth and periodicities. In: processes such as macromolecular synthesis, cell
Nasim A, Young P, Johnson BF (eds) Molecular biology of growth, and differentiation is a key aspect of the phys-
the fission yeast. Academic, New York, pp 205–242
iology of all organisms (Alberts et al. 2010; Morgan
Mitchison JM (1990) The fission yeast, Schizosaccharomyces
pombe. Bioessays 12:189–191 2007). This is particularly evident in multicellular
Mitchison JM, Nurse P (1985) Growth in cell length in the fission organisms, where the development and maintenance
yeast Schizosaccharomyces pombe. J Cell Sci 75:357–376 of complex structures, tissues, and organs are only
Morgan DO (2007) The cell cycle. Principles of control. New
possible through timely and highly regulated cell pro-
Science, London
Nishitani H, Nurse P (1995) p65cdc18 plays a major role control- liferation, differentiation, and, in many cases, ▶ apo-
ling the initiation of DNA replication in fission yeast. Cell ptosis. Understanding biological systems of this
83:397–405 complexity constitutes a monumental challenge.
Sveiczer A, Novak B, Mitchison JM (1996) The size control of
There is therefore much to be gained by studying
fission yeast revisited. J Cell Sci 109:2947–2957
Sveiczer A, Tyson JJ, Novak B (2004) Modelling the fission comparatively simple and genetically well-defined
yeast cell cycle. Brief Funct Genom Proteom 2:298–307 multicellular organisms such as Drosophila
melanogaster (▶ Model Organism), by studying the
cell cycles of early cleaving embryos, where there is
no appreciable cell growth or differentiation (▶ Cell
Cell Cycle, Physiology Cycle of Early Frog Embryos), and by studying cell
cycle regulation in unicellular models (▶ Cell Cycle,
Chris J. Norbury Archaea; ▶ Cell Cycle, Budding Yeast; ▶ Cell Cycle,
Sir William Dunn School of Pathology, Fission Yeast; ▶ Cell Cycle, Prokaryotes), including
University of Oxford, Oxford, UK cell lines derived from multicellular organisms (▶ Cell
Cycle of Mammalian Cells).
Consideration of what constitutes “normal” physi-
Definition ological conditions can be an important aspect of cell
cycle studies. While the conditions experienced by
The physiology of the ▶ cell cycle describes the pro- a cell within a multicellular organism can often be
cesses that underlie and regulate the cycle under assumed to be physiologically relevant, it is usually
C 354 Cell Cycle, Prokaryotes

difficult to reproduce an authentic “niche” for any ▶ Cytokinesis


given cell type grown in cell culture. For example, ▶ DNA Replication
proliferation and maintenance of mammalian stem ▶ Meiosis
cell cultures depends on engagement of specific com- ▶ Mitogens
binations of cell surface receptor proteins that, in the ▶ Mitosis
authentic stem cell niche, would interact with compo- ▶ Model Organism
nents of the extracellular matrix, soluble ▶ mitogens
and/or neighboring cells. Cell culture studies using References
defined medium and purified mitogens have allowed
the dissection of pathways linking these signaling pro- Alberts B, Bray D, Hopkin K, Johnson A, Lewis J, Raff M,
Roberts K, Walter P (2010) Essential cell biology. Garland
cesses to cell cycle progression, most notably for mam-
Science, New York
malian fibroblasts (▶ Cell Cycle Transition, Detailed Morgan DO (2007) The cell cycle. Oxford University Press,
Regulation of Restriction Point). This type of study has Oxford
given rise to the widely held view that non-
proliferating cells are physiologically arrested in
a protracted G1-like state (often referred to as G0, Cell Cycle, Prokaryotes
though the latter may be identical to G1 at the bio-
chemical level). It is nonetheless important to bear in Emanuele G. Biondi
mind that, under normal physiological conditions, CR1-CNRS-Laboratory of Biochemistry and
numerous non-proliferating cell types are arrested Integrated Structural Biology, Institut de Recherche
after ▶ DNA replication, in a G2-like state. Interdisciplinaire - IRI CNRS USR3078, Villeneuve
Another important aspect of mammalian cell cul- d’Ascq, France
ture is that standard growth media are routinely
supplemented with serum mitogens, often at high
levels that would not be encountered under physiolog- Synonyms
ical conditions in the tissue of origin. For many
microbes, unlike mammalian cells, proliferation in Bacterial cell cycle; Cell cycle progression in bacteria
culture is limited only by nutrient availability. Similar
considerations apply here, as the nutrient-rich environ-
ment experienced in the laboratory may bear little Definition
relation to that in which the microbe grows in the
wild. Cell kinetic, biochemical and molecular biolog- The bacterial cell cycle is defined as a series of events
ical parameters measured under non-physiological that ensure the coordination of three processes in order to
conditions may well give a misleading picture of cell duplicate a cell: DNA replication, cell growth, and cell
cycle regulation at the systems level. division. The cycle is composed by an initial G1 phase
followed by the initiation of DNA replication (S phase),
which is usually paralleled by cell growth; then, when
Cross-References replication is completed, chromosomes are positioned to
opposite compartments of the predivisional cell (G2
▶ Apoptosis phase); eventually cell division takes place by the poly-
▶ Cell Cycle merization of specific proteins that create a septum,
▶ Cell Cycle of Early Frog Embryos followed by daughter cell separation. In several bacterial
▶ Cell Cycle of Mammalian Cells species, the two cells can be functionally and morpho-
▶ Cell Cycle, Archaea logically different, such as a spore of Bacillus subtilis
▶ Cell Cycle, Budding Yeast during the sporulation process or the swarmer cell for-
▶ Cell Cycle, Fission Yeast mation during the cell cycle in Caulobacter crescentus
▶ Cell Cycle, Prokaryotes (see the section “Characteristics” below).
Cell Cycle, Prokaryotes 355 C
Cell Cycle, Prokaryotes, B period C period D period
Fig. 1 Morphological G1 S G2
transitions of cells during the
cell cycle in E. coli and
B. subtilis (upper part) and
C. crescentus (lower part). E. coli &
B. subtilis
Chromosomes are symbolized
as yellow and green circles.
Cell division machinery is the C
red circle in the midcell of C. crescentus
both models

Characteristics The initiation of DNA replication depends on the


activity of the AAA+ family of ATPases member,
Bacteria have evolved different systems for the regu- DnaA, which binds at a specific site of oriC, causing
lation of cell cycle, probably due to different ecologi- an AT-rich region to open and allowing the replisome
cal and evolutionary constraints. Differently from to start its DNA polymerization activity (Zakrzewska-
eukaryotes, bacteria have no nucleus and usually pos- Czerwińska et al. 2007).
sess a single circular chromosome which replicates Several mechanisms prevent new rounds of repli-
starting from a single origin of replication. Initiation cation right after the replisome has started: in E. coli,
of DNA replication and cell division are processes during DNA elongation, Hda converts the active
conserved in all bacteria as illustrated in the next DnaA-ATP to the less active DnaA-ADP, while in
section. B. subtilis YabA blocks DnaA activity by chaining
DnaA to DnaN (the beta clamp, which ensures DNA
Cell Cycle in Bacteria: Escherichia coli and B. subtilis polymerase processivity).
In E. coli and B. subtilis, cell cycle can be divided (as When DNA replication is terminated and the con-
illustrated in Fig. 1) in three phases (Wang and Levin densed chromosomes (nucleoids) are segregated in
2009): (1) the B Period (Birth) between a cell division opposite compartments of the predivisional cell, bac-
and the initiation of DNA replication (G1); (2) the terial cells need to position the cell division apparatus
C phase, when the chromosome is replicated (S); and in order to generate cells possessing a sufficient mass
(3) the D period, between the end of replication of and a complete chromosomal content. Those parame-
DNA and cell division (G2). ters are monitored by different mechanisms that can
The doubling time of the cell in these species vary across bacteria and a clear model has not been
(defined as the time required to double the mass of elucidated yet.
the cell) depends on temperature and nutrient avail- The initial step of cell division is coordinated
ability; more nutrients and optimal growth temperature by FtsZ, a tubulin-like GTPase, which is able to
speed up the doubling time. However, the periods polymerase, using GTP. FtsZ is able to form a ring-
C and D are always constant. In particular conditions like structure (Z-ring) at midcell and this circular
of high nutrient concentration, the growth rate is faster structure serves as scaffold for the formation of
than doubling time. In a “slow” cell cycle, replication a septum, able to constrict at the end of cell cycle.
of DNA takes place from a single origin of replication, The position of the FtsZ polymers and timing of
named oriC. From this single origin, DNA replication septum formation depends on different parameters
proceeds in two directions, ending at the opposite such as glucose availability in B. subtilis (by the
region. In the “fast” replication rate, a new round of activity of the UgtP effector protein that senses the
replication from oriC starts before the previous one has nutrient status and is able to inhibit FtsZ polymeriza-
terminated and this special case is called “multifork” tion) but largely on DNA replication and chromo-
replication. some segregation.
C 356 Cell Cycle, Prokaryotes

In fact, cell division depends on the position of the ends up in the swarmer compartments and it is absent
two oriCs which are able to inhibit septum formation; in the stalked cell that can immediately starts a new
several proteins such as MinC and MinD, which are round of replication.
localized at the poles, together with the DNA- All cell cycle regulation factors are schematized in
associated proteins SlmA in E. coli and Noc in Fig. 2 where we illustrate the multilevel regulation of
B. subtilis are ensuring that cell division can take the Caulobacter cell cycle. Two main oscillators are
place only in the midcell (Margolin 2006). working during cell cycle progression: (1) the tran-
scriptional and epigenetic circuit (CtrA-DnaA-GcrA-
Cell Cycle Regulation in C. crescentus CcrM); (2) the phosphorylation/proteolysis and
C. crescentus, a gram negative bacterium of the transcription circuit (CckA-CtrA-DivK). The latter
alpha subdivision of proteobacteria, has recently also involves coordination of CtrA proteolysis through
become a model organism for bacterial cell cycle the regulation of DivK activity.
studies (Ryan and Shapiro 2003). In this organism Transcription of ctrA is controlled by circuit com-
each cell division is asymmetric producing always posed by DnaA and GcrA, and the DNA
two functionally and morphologically different methyltransferase CcrM. DnaA, which is also
cells, the replicating “stalked” cell type and the veg- involved in initiation of DNA replication, as
etative “swarmer” type (Fig. 1). After each initiation discussed for E. coli and B. subtilis, is a key element
of DNA replication, the replication fork is kept in cell cycle regulation in C. crescentus and it also
blocked so that the Caulobacter cell cycle can follow controls the transcription of about 40 genes involved
a pattern of once-and-only-once replication per divi- in nucleotide biogenesis, cell division, and polar mor-
sion (G1, S, and G2 phases are temporally phogenesis. DnaA also activates the transcription of
distinguished). the gcrA gene. GcrA controls the de novo transcrip-
Many factors are known to regulate cell cycle tion of ctrA after proteolysis and genes involved in
progression and most of them are members of the DNA metabolism and chromosome segregation,
family of two-component system proteins, comprised including those encoding DNA gyrase, DNA
of histidine kinases, histidine phosphotransferases, helicase, DNA primase, and DNA polymerase III.
and their response regulator substrates. Among The transcriptional loop of ctrA is closed by CcrM.
those proteins CtrA is the master regulator of the In fact, CtrA activates the transcription of ccrM,
Caulobacter cell cycle; it is an essential response which encodes for a DNA methyltransferase whose
regulator whose activity as a transcription factor abundance is cell cycle dependent. CcrM is able to
varies as a function of the cell cycle. CtrA controls activate the dnaA promoter region through methyla-
various functions during cell cycle progression by tion, closing the positive feedback composed by
activating or repressing expression of genes involved CtrA, DnaA, and GcrA.
in cell division (ftsZ), flagellum biogenesis genes, CtrA must be phosphorylated to bind DNA and its
stalk biogenesis regulatory genes, pili biogenesis phosphorylation depends on cell cycle progression.
genes, and chemotaxis genes. CtrA also blocks the An essential phosphorelay, composed by the hybrid
initiation of DNA replication through binding of the histidine kinase CckA and the histidine
replication origin, oriC. phosphotransferase ChpT, is responsible for CtrA
CtrA activity and stability varies during the cell phosphorylation (Biondi et al. 2006). The membrane
cycle. Oscillation of CtrA levels, peaking at the hybrid histidine kinase CckA is dynamically local-
predivisional stage before cell division, is achieved ized during cell cycle progression acting as a kinase
by different mechanisms: transcription, proteolysis, for CtrA when it is localized at the poles, while it acts
and phosphorylation control. The result of this oscilla- as a phosphatase in the delocalized form. DivK,
tion is that CtrA is present and phosphorylated in which is a single domain response regulator that con-
swarmer cells, then is degraded at the G1/S transition trols CckA localization, plays an essential role as
and then transcribed again and phosphorylated in the a positive regulator of cell cycle progression because,
predivisional stage. Eventually, at cell division, CtrA when phosphorylated, it indirectly delocalizes CckA
Cell Cycle, Prokaryotes 357 C
DivK
divK
Gene expression
Transcription, DNA-binding
DivJ~P PleC + Pi
Post-translational/protein interactions
Phosphotransfer events DivJ PleC
Methylation DivK~P C

CckA-HK~P CckA-HK

DnaA CckA-RD CckA-RD~P


dnaA

ChpT~P ChpT
gcrA GcrA
P1 P2 CtrA
ctrA CpdR CpdR~P

CtrA~P Proteolysis
(ClpX)

ClpP
clpP
CcrM Origin of replication
ccrM
RcdA
rcdA
Pili biogenesis

In bold, the essential genes in C. crescentus Chemotaxis Flagellum biogenesi

Cell division Stalk biogenenesis

Cell Cycle, Prokaryotes, Fig. 2 The circuit controlling cell cycle progression in C. crescentus. Colors of arrows have different
colors depending on their interaction type (as indicated in the legend). See text for details

repressing CtrA and thus promotes DNA replication. Sporulation in B. subtilis


Two membrane histidine kinases are known to inter- B. subtilis, as discussed before, has a regulation
act with DivK: PleC and DivJ. PleC and DivJ are of cell cycle progression that shares many features
considered, respectively, the principal phosphatase with E. coli. However, B. subtilis is able in
and kinase of DivK and they are in opposite locations response to starvation to form resistance cell types
before cell division. called spores. This process, called sporulation
ChpT also transfers the phosphate to a second (Fig. 1), represents a special case of asymmetric
response regulator named CpdR, which, together with cell division and it is discussed separately in this
RcdA, is a factor required for CtrA proteolysis medi- section.
ated by the ClpP-ClpX protease. CtrA is degraded at Cells of B. subtilis can divide centrally in nutrient
the stalked pole at the G1/S transition when the origin rich media leading to identical progeny but, in a specific
of replication needs to be cleared and also in the cell density and nutritional signals conditions, cells can
stalked compartment, where initiation of DNA repli- sporulate. A complex phosphorelay system composed
cation occurs immediately after cell division. by two-component systems proteins is controlling the
C 358 Cell Cycle, Prokaryotes

Cell Cycle, Prokaryotes, KinA~P KinA


Fig. 3 The sporulation in DNA demage (-) KinB~P KinB
B. subtilis. In the upper part, Energy potential (+) KinC
KinC~P
the phosphorelay controlling Redox state (+)
KinD~P KinD
the activation of Spo0A Cell density (+)
together with known signals KinE~P KinE
activating (+) or repressing
() the phosphorylation are Spo0F Spo0F~P
shown. After Spo0A
activation, stages of spore
formation are indicated on the Spo0B~P Spo0B
left bottom part of the figure;
on the right the Transcription
compartmentalized Spo0A Spo0A~P
Multi step mixed control
cross-activation of Sigma
factors is indicated. Arrows
Phosphotransfer events σH
have different colors
depending on their interaction
type (as indicated in the
legend). See text for details 1 σH Spo0A
σF

2 σF σE
σE

3 σF σE
σG

4 σG σK σK
Forespore Mother cell

phosphorylation of the master regulator of sporulation, Spo0A, which integrates several environmental sig-
Spo0A. Activated Spo0A leads to a polar septation that nals using its kinases sensing apparatus, is then able to
initially produce a larger mother cell, which eventually start a series of steps leading to the formation of a spore
lised, and a smaller forespore surrounded by special and the decay of the mother cell. This multistep pro-
envelops (Piggot and Hilbert 2004). cess is carried on by different sigma factors acting in
In B. subtilis, five histidine kinases (KinA, the prin- the mother cell or in the forespore. Each sigma factor is
cipal kinase, to E) are controlling the phosphorylation indeed able to activate via many other sporulation
state of Spo0A, through the phosphorylation of Spo0F, factors the following sigma factor in a different
which then transfers to the histidine compartment.
phosphotransferase Spo0B and eventually to the Activated Spo0A and Sigma H are present
response regulator Spo0A as illustrated in Fig. 3. and functional in the step right before the asymmetric
Cell Cycle, Synchronization 359 C
septation and they together activate many Definition
genes including sigma F and sigma E precursors,
respectively, acting in the forespore and in the This term refers to experimental methods to obtain
mother cell compartment. In order to have a fully a population of cells that are in the same cell-cycle
functional Sigma E, Sigma F is required. At this stage (Davis et al. 2001). Cell-cycle synchronization
step, Sigma E activates other factors in order to allows the analysis of biological processes that are
have an active Sigma G in the forespore compart- specific to certain cell-cycle stages and that would C
ment, which is eventually required to activate otherwise be confounded or averaged out in asynchro-
Sigma K in the mother cell. Those last two nous populations. Cell-cycle synchronization can be
factors are required for full maturation of the achieved in two general ways (Fig. 1). (1) Using “block
spore, which is characterized by a special and release” approaches where all cells are first
envelope that is responsible for its special resistance arrested in a defined cell-cycle stage (“block”), and
to stresses. then allowed to cycle again in a synchronous manner
(“release”). This approach involves adding/removing
specific molecules to/from a culture or by using cells
References with conditional mutant alleles of cell-cycle regula-
tors. (2) Isolating cells that are in a given stage of the
Biondi EG, Reisinger SJ, Skerker JM, Arif M, Perchuk BS,
Ryan KR, Laub MT (2006) Regulation of the bacterial
cell cycle based on a phenotypic characteristic such as
cell cycle by an integrated genetic circuit. Nature size (centrifugal elutriation) or age (“baby machine”).
444(7121):899–904
Margolin W (2006) Bacterial division: another way to box in the
ring. Curr Biol 16(20):R881–R884
Piggot PJ, Hilbert DW (2004) Sporulation of Bacillus subtilis. Division
Curr Opin Microbiol 7(6):579–586 Selective ‘Block / release’
Ryan KR, Shapiro L (2003) Temporal and spatial regulation in method method
prokaryotic cell cycle progression and development. Annu Growth
Rev Biochem 72:367–394
Wang JD, Levin PA (2009) Metabolism, cell growth

Asynchronous
and the bacterial cell cycle. Nat Rev Microbiol
7(11):822–827

culture
Zakrzewska-Czerwińska J, Jakimowicz D, Zawilak-Pawlik A,
Messer W (2007) Regulation of the initiation of chromo-
somal replication in bacteria. FEMS Microbiol Rev
31(4):378–387

Select cell which Block cells at


are in stage stage Synchronous

Cell Cycle, Synchronization


culture

J€
urg B€ahler and Samuel Marguerat
Department of Genetics, Evolution and Environment
and UCL Cancer Institute, University College London,
London, UK

Cell Cycle, Synchronization, Fig. 1 Schematic representation


of the two general classes of cell-cycle synchronization methods
Synonyms applied to an example cell type. The blue inset at the upper right
corner describes the morphological changes of these cells during
Cell synchronization the cell cycle
C 360 Cell Cycle-regulated Gene Expression

Cross-References Cross-References

▶ Cell Cycle Analysis, Expression Profiling ▶ Cell Cycle Analysis, Expression Profiling

References

Davis PK, Ho A, Dowdy SF (2001) Biological methods for cell- Cell Cycle-regulated Kinase Complexes
cycle synchronization of mammalian cells. Biotechniques
30:1322–1326, 1328, 1330–1331
▶ Cyclins and Cyclin-dependent Kinases

Cell Cycle-regulated Gene Expression

J€
urg B€ahler and Samuel Marguerat
Cell Cycle–regulated Transcription
Department of Genetics, Evolution and Environment
▶ Cell Cycle-regulated Gene Expression
and UCL Cancer Institute, University College London,
London, UK

Synonyms Cell Death

▶ Lymphocyte Dynamics and Repertoires, Biological


Cell cycle–regulated transcription; Periodic gene
Methods
expression

Definition
Cell Division
This term describes a specific mode of gene regulation
▶ Cell Cycle of Mammalian Cells
where transcript levels peak at a specific cell cycle
phase, creating periodic profiles of gene expression
across several cycles (Fig. 1). Periodic gene expression
is not only found among cell cycle–regulated genes but
also among genes regulated during other periodic pro-
Cell Division Cycle
cesses such as the circadian cycle.
▶ Cell Cycle, Biology
▶ Cell Cycle, Fission Yeast
G1 S G2 M G1 S G2 M
Level of gene expression

Cell Division Database

▶ Cell Cycle Database

Time

Cell Cycle-regulated Gene Expression, Fig. 1 Expression


Cell Growth
profile of a hypothetical gene with a peak in expression level in
the early S-phase ▶ Cell Cycle of Mammalian Cells
Cell Membrane Proteins 361 C
where the hydrophobic parts are shielded from water
Cell Labeling between the two outer sheets of hydrophilic head
groups. Like lipids, also some proteins have
▶ Lymphocyte Dynamics and Repertoires, Biological a hydrophobic nature arising from either the presence
Methods of a hydrophobic amino acid sequence along the pro-
tein amino acid sequence, or an attached lipid mole-
cule. The need to protect hydrophobic parts from an C
aqueous environment is the major driving force for the
Cell Marker membrane attachment of a protein, which is fulfilled
by immersing these parts into the hydrophobic lipid
▶ Fluorescent Markers bilayer (Fig. 1c).
The most common three-dimensional protein struc-
ture traversing the membrane is a ahelix (Fig. 1d):
Formed by of a single peptide sequence with amino
Cell Membrane Proteins acids rotating along a common center line, the hydro-
phobic nonpolar side chains point outside and face the
Mika O. Ruonala inner core of the lipid bilayer (Alberts 1994). A less
NeuroToponomics Group, Center for Membrane frequent membrane-traversing 3D structure is
Proteomics Campus Riedberg, Goethe University of a b-strand, assembled of two or more peptide
Frankfurt, Frankfurt, Germany sequences. The hydrophobic side chains of amino
acids in a b-strand interact with equally hydrophobic
head groups of other b-strand structures to form
Synonyms a larger membrane-traversing b-sheet structure.

Lipid-protein interactions; Transmembrane proteins Structure-based Classification of Membrane


Proteins (Jennings 1989)
Basic classification of membrane proteins to integral
Definition and peripheral membrane proteins (Fig. 1d, e, respec-
tively) follows the nature of membrane association. As
Cellular membranes have a thickness of 7–20 nm and the name implies, integral membrane proteins are an
efficiently restrict free passage of nonpolar molecules integral component of the host membrane and thus
and cross-membrane communication. To overcome cannot be removed without simultaneously destroying
this problem, cells have established a massive the lipid membrane, for example, by using detergents.
membrane protein machinery that regulates most In contrast, peripheral membrane proteins are not an
cellular activities ranging from signal transduction to integral part of a host membrane, associate with it
energy balance, respiration, reproduction, and cellular weakly, and can be separated from the membranes
morphology. Thus, the sole existence of multicellular with relative ease.
organisms and cells rely on fully functional cell Integral membrane proteins are categorized to
membrane protein system. either bitopic or polytopic integral membrane
according to the number of membrane-traversing
domains. A bitopic integral membrane protein
Characteristics contains a single membrane-spanning domain
(Fig. 1d) whereas a polytopic membrane protein con-
A cell and its individual internal compartments are tains two or more membrane-embedded regions
surrounded by a membrane made of lipids. Lipids (Fig. 1f). Polytopic membrane proteins are finally
have a hydrophilic (water loving) and a hydrophobic classified either as type I or as type II according to the
(water repelling) fatty acid tail (a and b in Fig. 1, relative orientation of the first and last amino acids of
respectively). In an aqueous environment, lipids spon- the protein sequence (amino and carboxyl termini,
taneously self-organize to form lipid bilayers (Fig. 1c) respectively). In contrast to type I membrane protein,
C 362 Cell Membrane Proteins

Cell Membrane Proteins, Fig. 1 Scheme of prominent membrane domains, and both ends of the protein reside within
features of a biological membrane: Hydrophilic polar head the same volume separated by a membrane. Another peripheral
group (a) and a hydrophobic oily fatty acid group (b) are oriented membrane protein associated with the membrane via a partially
to form a lipid bilayer (c) that seals the fatty acid chains between hydrophobic polypeptide chain is shown in (g). A peripheral
two layers of polar head groups. A bitopic, type I integral mem- membrane protein (e) with attached lipid modification (h) is
brane protein (d) has parts on both sides of the membrane. present only on the inner side of the membrane. A lipidated
A polytopic, type II membrane protein (f) has more than one bitopic integral membrane protein is shown in (i)

where amino and carboxyl terminals are located at the acylation forms. While myristoylation is an event
opposing sides of the separating membrane (Fig. 1d), where a myristoyl fatty acid group (Fig. 1h) is perma-
the peptide chain of type II starts and ends both on the nently attached exclusively to a glycine amino acid at
same side of the membrane (Fig. 1f). The membrane the beginning of the newly formed polypeptide
incorporation of an integral membrane protein takes sequence protein (Fig. 1e), the attachment of a palmi-
place already during protein biosynthesis in the lumen tate fatty acid group during palmitoylation to any
of the endoplasmic reticulum (ER) following the cysteine residue of a finished protein is a reversible
detection of a signal peptide by the translation post-translational modification. Isoprenylation is yet
machinery. another form of post-translational lipid modification
Rather than by a membrane-traversing peptide where an isoprenyl lipid group, either a farnesyl or
sequence, peripheral membrane proteins associate geranylgeranyl lipid, is attached to a cysteine amino
with the membranes through hydrophobic interactions acid resides at the end of the protein. In general,
between the membrane lipids and a hydrophobic a lipidated membrane protein can attach itself either
polypeptide sequence (Fig. 1g) or via a lipid molecule on the outer membrane of a cytosolic organelle, or on
(Fig. 1e). The attachment of a lipid group to a soluble the intracellular leaflet of the plasma membrane, and
protein can provide sufficient hydrophobicity to facil- thereby executes its function in the cytosolic volume of
itate membrane attachment, and the reversible nature a cell. Yet another family of lipid-modified proteins
of some lipid modifications can act as a regulatory is formed by glycosylphosphatidylinositol (GPI)-
switch to concentrate a protein from soluble, cytosolic anchored proteins synthesized directly on a GPI-lipid
form to a membrane-bound form. A signal sequence, residing on the ER membrane. GPI-anchored mem-
which is recognized by lipid transfer enzymes (such as brane proteins are transported to the outer surface of
palmitoyl-, myristoyl-, and prenyl transferases), leads the plasma membrane and since their peptide sequence
to the covalent attachment of a palmitoyl or a myristoyl is exposed toward the extracellular space, they execute
fatty acid, an isoprenyl fatty acid lipid group, or their function toward the extracellular matrix, i.e.,
a glycosylphosphatidylinositol-(GPI) anchor attached outside the cells.
to a cysteine or glycine amino acid. Timing of Provided that the required targeting signal is present
acylation and the position of the attached fatty acid within the peptide sequence, an integral membrane
group are the major differences between the various protein can additionally be lipidated in a regulated
Cell Membrane Proteins 363 C
manner, for example, with a palmitate fatty acid. Such of epidermal growth factor (EGF) to the extracellular
a lipid molecule provides a membrane protein affinity binding pocket of EGFR induces a conformational
toward specialized membrane regions, such as change to the cytoplasmic tail of the receptor, which
caveolae or other cholesterol- and sphingolipid-rich then leads to the assembly of signaling complex and
lipid microdomains (also termed as lipid rafts, Ikonen subsequent downstream events, such as remodelling of
and Simons 1997). Thus, reversible lipidation may act the cellular cytoskeleton. Yet another functional mode
as a switch in signaling efficiency by moving an inac- for membrane proteins is the selective transport of C
tive protein from the bulk membrane inside molecules across the membrane via a channel protein
a specialized signaling platform. An example of such or protein complex. As the hydrophobic lipid bilayer is
switchable integral membrane protein is the Neuronal not readily passable for nonpolar solutes, their trans-
Cell Adhesion Molecule (NCAM) that can not only be port is mediated by a physical pore or a channel formed
palmitoylated to intracellular cysteine residues by by one polytopic membrane protein, or as a complex of
palmitoyl transferase, but also de-palmitoylated upon several bi- and/or polytopic membrane proteins.
need. The location and function of NCAM critically Within such a complex, the hydrophobic amino acid
depends on its lipidation status and subsequent resi- side chains orient themselves toward the lipid environ-
dence in cholesterol-rich membrane regions. ment, and the hydrophobic side chains together form
the passage. Traffic through the passage can be pas-
Mixed Membrane Attachment Types sive, i.e., the solutes flow from higher concentration
Hydrophobic amino acids can be used to build poly- toward lower concentration across the membrane; or
peptide chains with one-sided hydrophobicity. Upon active, where the transfer against a chemical gradient is
orientation of the hydrophobic side of the protein coupled to the use of energy, such as ATP. So-called
toward a membrane, it can be protected from aqueous cotransporters couple the chemical concentration gra-
environment. This facilitates membrane association dient to the transport of molecules against a gradient.
and results in a conformation where a part of the
protein rather lays “flat” on the plane of the membrane Membrane Proteins Move Rapidly in a Regulated
(Fig. 1g) while other parts of the protein can freely Manner
move around and possibly also obtain a lipid modifi- Experimental work has shown that membrane proteins
cation (below). Presence of such a amphiphatic are in constant diffusional movement along the mem-
domain in a soluble protein can provide sufficient brane plane and around their own axis. An integral
affinity for membrane attachment; alternatively, via membrane protein can not only freely rotate around
such a domain an integral membrane protein can inter- itself at high speed but is also able to diffuse across
act with certain lipid groups on the membrane, and a distance, corresponding to the periphery of a cell at
possibly use it for targeting along the membrane several times per second. Equally, the proteinaceous
plane to regions with specific properties. part of a lipid-modified peripheral membrane protein
can freely rotate around its lipid anchor and also along
Operation Modes of Membrane Proteins the membrane plane at high speed. However, in cellu-
Basically, two action modes exist for membrane pro- lar plasma membranes, the lateral movement of inte-
teins, and the type of membrane attachment is indica- gral and peripheral membrane proteins is restricted by
tive for the functional mode. The activity of specialized domains below and upon the membranes.
a monotopic membrane protein is restricted to one An example of restricted movement of cell membrane
side of a membrane (cis-mode). For example, proteins can be found from epithelial cell layer lining
a lipidated kinase bound on the cytoplasmic side of the digestive system. Here, the surrounding membrane
the lipid bilayer only phosphorylates other cytoplasmic of the epithelial cells is divided by a tight junction to
proteins (soluble or membrane bound). In contrast, basolateral and apical sides both having a unique mem-
a membrane-traversing bi- or polytopic membrane brane protein composition that does not mix with the
protein is likely to operate on both sides of the mem- other side.
brane in trans-mode. One concrete example are the Recent data shows that cellular cytoskeleton just
family of receptor tyrosine kinases, e.g., the receptor beneath the cell plasma membrane forms “picket
for epidermal growth factor (EGFR). Here, the binding fences,” or “pockets”. Where the poles are membrane
C 364 Cell Membrane Toponomics

proteins interacting with cytoskeletal components, enormous number of distinct proteins in order to orga-
such as actin fibers (Kusumi et al. 2011). The latter, nize the myriad of different membrane functionalities
in turn, form the boundaries of the pockets. Once inside (▶ Cell Membrane Proteins). These proteins serve
such a pocket, a protein is free to move around in a variety of essential functions, e.g., establishing and
lateral direction; however, due to the restricting cyto- controlling cell adhesion, inducing intracellular signal-
skeleton, energy is required for an integral membrane ing (by behaving as receptors), maintaining the ion
protein to jump from one pocket to another one during homeostasis (by acting as ion channels), as well as
a so-called hop-diffusion. Thus, the lateral movement controlling environmental adaptation. A fundamental
along the membrane plane is reduced and controlled by biological question is how the approx. 8,000 distinct
the cellular cytoskeleton. cell surface proteins are spatially organized as supramo-
Another cellular system provides restriction lecular functional units (toponome) in one and the same
toward lateral movement. Clusters of cholesterol and cell membrane, and how these units are altered in
sphingolipids form lipid microdomains with specific chronic diseases, such as cancer and Alzheimer’s
properties, which are significantly different from the disease. This field of research is referred to as
bulk of the membrane. Such “lipid rafts” attract both membrane toponomics. It is a subdiscipline in
integral and lipid–modified peripheral proteins to form ▶ toponomics related to the functional detection of pro-
specialized signaling platforms. Interestingly, raft asso- tein networks of cell membranes in morphologically
ciation can be stimulated and modulated upon post- intact cells or tissue sections by using TIS (▶ TIS
translational attachment of specific lipid molecules, as Robot) microscopy. It has been shown that quantitative
described above for the NCAM cell adhesion molecule. information on the subcellular topology and differential
assembly of distinct proteins and their spatial relation-
ships on the cell surface are straightforward to the
References detection of lead proteins hierarchically controlling
cell function: For instance, the systematic mapping of
Alberts B (1994) Molecular biology of the cell, 3rd edn. Garland cell surface proteins in rhabdomyosarcoma cells
Publishing, New York
(Schubert 2010) led to understanding on how tumor
Ikonen E, Simons K (1997) Functional rafts in cell membranes.
Nature 387:569 cells establish multi-protein cluster networks composed
Jennings ML (1989) Topography of membrane proteins. Annu of different classes of CD proteins required for cell
Rev Biochem 58:999 polarization/migration and tumor morphogenesis
Kusumi A et al (2011) Hierarchical mesoscale domain organi-
(Fig. 1). The CD system is a collection of a variety of
zation of the plasma membrane. Trends Biochem Sci 36:604
distinct molecular classes present on the surface of
leucocytes and, in part, also on the surface of many
other cell types (so-called Cluster of Differentiation
Cell Membrane Toponomics (CD) antigens; Zola 2001; Barclay et al. 1995). CD
antigens are held to belong to the best characterized
Anne Gieseler molecules of the cell surface membrane. An example
Molecular Pattern Recognition Research (MPRR) of co-mapping of many CD proteins has revealed a
Group, Otto-von-Guericke-University Magdeburg tumor cell specific surface toponome (Fig. 1): Underly-
Medical Faculty, Magdeburg, Germany ing experiments have shown that blocking the lead pro-
tein CD 13 (present in all multi-protein clusters, or,
▶ CMPs) leads to disassembly of the corresponding
Definition protein network and loss of function of the cell to polar-
ize. This underscores the relevance of cell surface
Any cell is a collection of highly coordinated subcellu- toponomics in understanding chronic disease and finding
lar compartments formed by unit membranes. Unit novel drug-target candidates. Analyzing the toponome
membranes are highly organized as lipid bilayers with of cell surface proteins will provide access to the mech-
a hydrophilic external and internal part and anisms underlying self-organization and cell control in
a hydrophobic intermediate part (Lodish et al. 2000). health and disease (Schubert et al. 2011; ▶ Clinical
This basic structure allows the cell to integrate an Aspects of the Toponome Imaging System (TIS)).
Cell Membrane Toponomics 365 C

Cell Membrane Toponomics, Fig. 1 Illustration showing of the supramolecular structure of the single molecular compo-
a scheme of the molecular protein species combined in various nents within these clusters is shown from (c–g). The colors
clusters along the cell surface of cell extensions (a and b) of “red,” “dark blue,” “yellow” “light blue,” “green” and corre-
a rhabdomyosarcoma tumor cell. The sequential arrangement spond to the colors in the toponome maps in (a) and (b). Scale
and geometry of the corresponding multi-protein clusters is bar: 2 mm (Modified after Schubert 2010)
highly similar in both these extensions. Note that the scheme
C 366 Cell Proliferation

References (▶ Flow Cytometry http://en.wikipedia.org/wiki/


Flow_ cytometry), it can make a physical separation of
Barclay AN, Brown MH, Law SKA, McKnight AJ, interested cells or particles from a heterogeneous popu-
Tomlinson MG, van der Merwe PA (1995) The leucocyte
lation through measure and select user-defined cell
antigen factsbook, 1st edn. Academic, San Diego
Lodish H, Berk A, Zipursky SL et al (2000) Molecular cell types by illuminating individual cells with a laser and
biology, 4th edn. W. H. Freeman, New York detecting emitted light which can be spectrally sepa-
Schubert W (2010) On the origin of cell functions encoded in the rated to show the individual character of the interested
toponome. J Biotechnol. 15; 149(4):252–259, Review doi:
cells. After the interested cells are recognized the FACS
org/10.1016/j.jbiotec.2010.03.009
Schubert W, Gieseler A, Krusche A, Serocka P, Hillert R (2011) will produce the fluid into single cell containing droplet
Next-generation biomarkers based on 100-parameter func- and make it electrically charged, then when it passes
tional super-resolution microscopy TIS. N Biotechnol. through an electric field between two high voltage
doi:10.1016/j.nbt.2011.12.004
deflection plates of opposite polarities, the droplets con-
Zola H (2001) Human leukocyte differentiation antigens as
therapeutic targets: the CD molecules and CD antibodies. tain single cell will be deflected into different containers
Expert Opin Biol Ther 3:375–383, Review according to the commander of the user-defined
program.

Characteristics
Cell Proliferation
1. Highly efficient: flow cytometer can examine hun-
▶ Modeling, Cell Division and Proliferation dreds of cells in a second and decide which cell or
particle is interested and charge it to be separated;
this is almost the most high-speed sorter now days.
2. High purity: If only the program has been set by the
Cell Proliferation Process user, FACS can get a pure group of the interested
cells since the technique is based on the reaction of
▶ Cell Cycle, Biology antigen and antibody.
3. Accurate number control of sorted cells or particles:
FACS can even put a single cell or particle into
committed container, so it is easy to control the
interested cells number according to the user.
Cell Selection
Basic Instruments for Cell Sorting
▶ Lymphocyte Dynamics and Repertoires, Biological Cell sorting of FACS needs the help of flow cytometry
Methods which includes the following: the fluidics system, the
optics and detection, the signal processing, and elec-
trostatic cell sorting.

Cell Sorting Fluidics System


In order to be interrogated by the machine’s detection
Xiaojun Liu system, the cells or the particles injected into the
Internal Medicine, The Second hospital of Hebei machine must be ordered into a steam of single parti-
Medical University, Shijiazhuang, Hebei, China cle, the fluidics system is designed for this aim. It
consists of a central channel/core through which the
sample is injected, enclosed by an outer sheath that
Definition contains faster flowing fluid. Both of the sample and
sheath are driven by an air pump and the air pressure
Fluorescence-activated cell sorting (FACS): FACS for them will be adjusted by their own pressure regu-
is a special technique based on flow cytometry lator, respectively, for the sheath at 4.5 psig and for the
Cell Sorting 367 C
sample at 4.6, 4.8, or 5.0 psig according to need of Electronic System
sample speed. As the sheath fluid moves driven by the The electronic system is aimed to convert the optical
air pump, it creates a massive drag effect on the signals to proportional electronic signals (voltage
narrowing central chamber. This alters the velocity of pulses), then after analysis of the voltage pulse height
the central fluid whose flow front becomes parabolic it will convert the signals to digital signals, and at last
with greatest velocity at its center and zero velocity at the computer will analyze the signals and display so
the wall. The effect creates a single file of particles and that we can get the data we want. After the photons are C
is called hydrodynamic focusing. Then the particles collected by the photon diode it will turn this signals
will become a single stream after they pass this area into electronic signals, while for the FSC signals they
which is ready for the machine to detect. were first passed on by a pre amp (E00, E01, E02, E-1).
Then they were passed on to the amplifier. The ampli-
Optics and Detection fier has two style LIN and LOG, the Lin amplifier will
The excitation optics is consisted of a laser, lenses, and amplify at levels of 1.00–9.99. FSC always choose this
prisms. The laser is the origin of light while the lenses style. While for the SSC, FL1, FL2, and FL3 they have
and prisms are aimed to shape and focus the laser different pathways for the amplification course, at the
beam. After the light beam passes the lenses and beginning of photon diode there is a PMT power
prisms it reached a size of 64  20 mm light spot. supply (levels 150–999 V) connects with it, and after
After hydrodynamic focusing the particles will meet the current output by the photon diode the detector will
the light beam of the laser or the arc lamp in the flow not preamplify the signals, but directly pass it on to the
cytometry, then light scattering or fluorescence emis- amplifier. The signals from SSC FL1 FL2 and FL3
sion (if the particles is labeled with a fluorochrome) normally are amplified by the LOG style. When the
will provide information about the particles properties. cells or the particles pass through the laser area, signals
The information will be detected by the detections. of light will be turned into electronic signals and then
There are three kinds of detections in the flow will create a pulse.
cytometry: the forward scatter channel (FSC), the
side scatter channel, and fluorescence channel. FSC Sorter
collects the light that is scattered in the forward direc- Sorter is the final part of the FACS. It receives instruc-
tion, typically up to 20 offset from the laser beam’s tions from the flow cytometer set by the user and
axis. The intensity of the FSC will show the roughly divides the cells into interested ones and waste. Differ-
relative size of the particles. It can be used to distin- ent type of sorter may have different manners of
guish between the cellular debris and living cells. SSC harvesting cells, for example, some harvest cells
collects the part of light that is scattered by the particles through controlling a catcher; when the flow detects
at a 90 angle to the excitation line. It gives the infor- an interested cell it will direct the catcher to move to
mation of the content within the particles. The fluores- the fluid stream to catch the cell and then quickly shift
cence (FL) channel collects the fluorescence emited aside when it is finished in order to let the waste pass
from the fluorescence materials conjugated on the anti- through (Fig. 1). While some other type of sorter will
body that are special to the cell marker, different FL work in a different way they use electric field force to
detect special wavelength light, this is controlled by separate the cells through making the cell-contained
the optical filter, there different kinds of filter: droplet electronic charged. When the cell-contained
long pass filter, short pass filter, and band pass filter. droplets pass through the high voltage plate area, the
They selectively pass certain wavelength light while electric field force will drive the interested cells
block the left. Another kind of filter called dichroic contained and waste droplets into different containers,
mirror it works in a different way, it can pass certain so that the cells interested can be divided (Fig. 2).
kind of light but block left by deflect it. FL1 normally When the cell passes the laser area the signals of the
accepts light with wavelength of 515–545 nm. FL2 cell will be detected by the detector and then the
564–606 while equal and above 650 nm light will be electronic system of the cytometer will wait for
collected by FL3. Other FLs may be designed to collect a fixed period of time to allow the interested cell to
special wavelength light according to different reach the catcher tube, then the catcher tube will
manufacture. swing into the sample stream to capture the interested
C 368 Cell Sorting

a b
catcher catcher

waste waste
harvest tube harvest tube

laser detector laser detector

sheath sheath

sample sample

Cell Sorting, Fig. 1 Cell sorter that sorting cells by catcher shifting

by the machine and is compared to the sort criteria


set on the computer. If the particle character matches
nozzle
the selection criteria, the particle will be charged
before the nozzle of fluidics system break it down
form the fluid stream which is controlled by the
laser detecter
vibration of the nozzle. After charged and broken
down from the fluid stream the droplet liquid
containing the interested particle will pass through
a strong electrostatic field where the droplets will
– + + be deflected to the harvest tube or the waste tube
plates (Fig 2).
+ –

Channels
+ –
Different fluorescence-activated cell sorting machine
harvest tube harvest tube from different manufactory may have different chan-
for interested for interested
cells
nels for fluorescence detecting. Normally the more
cells
laser source the machine has the more channels detec-
waste tube tors will be available on the machine. The followings
are some of the channels that contained by the different
Cell Sorting, Fig. 2 Cell sorter that sorting cells by electro- machine.
static field
FACSCalibur benchtop flow cytometer:
FACSCalibur benchtop flow cytometer is a two
cell (Fig. 1a). When the cell is judged to be the non- laser source machine (488 nm blue laser and
interested cell, the cytometer will switch the capture 635 nm red diode laser). It has six detector channels
tube out of the sample steam, and then the cells will FSC which receive forward scatter light that pass
pass on to the waste (Fig. 1b). a filter of 488/10 nm (a wavelength between 488
The FACSVantage SE, a stream-in-air flow 10 nm, which means wavelengths of light that are
cytometer, works in a different way. First, the between 515 and 545 nm). Others are SSC 488/
sample will be hydrodynamically focused so that the 10 nm, FL1 530/30 nm, FL2 585/42 nm, FL3 670LP
particles in the stream will pass the beam of light (long pass which means wavelength over 670 nm can
one after another. Then the character of the particle, pass the filter and detected by the detector), FL4 661/
like the FSC, SSC, fluorescence, etc., will be detected 16 nm.
Cell Sorting 369 C
BD LSR benchtop flow cytometer: BD LSR bench- membrane potential, permeability changes, caspase
top flow cytometer is a kind of cytometer that has three activity), and membrane fluidity change
lasers, they are 325, 488, and 633 nm laser respec- 8. Different combinations of the detectable markers
tively. So in this kind of machine, the excitation wave- such as DNA/surface antigen, etc.
length band is relatively wider than the FACSCalibur
which has two lasers, and more detector channels are Application
set in this kind of machine enable the machine to detect The aim of sorting is to separate the interested cell C
more surface marker at the same time. The following from the sample for further research or clinical use,
are the available detector channels in this kind of and if only the special character that is different from
machine: FSC 488 nm, SSC 488/25 nm, FL1 530/ other cells can be detected by the flow cytometer the
28 nm, FL2 575/25 nm, FL3 670LP, FL4 510/20 nm, sorter will be able to pick it out. Cell sorting cannot
FL5 380LP, FL6 660/13 nm. only give you a pure cell population but also can
With the modern techniques development more control the population’s concise cell number. This
laser and detectors can be used at the same time, enable character is very important and can enable us to do
the machine to detect more and more parameters a lot of things we need. The following are some exam-
simultaneously so that the machine can get more ples for cell sorting applications:
details of the particles being detected, for example, 1. To get pure population of the interested cells for
BD influx system has enabled seven laser paths and special research or clinical use.
supports 24 parameters simultaneously. (a) Eliminating dead cells for accurate internal
staining of cytokines: Cells are incubated with
Detectable Parameters EMA, produced by Molecular Probes and then
Fluorescence-activated cell sorting is based on the EMA enters dead cells (membranes not intact)
technique of flow cytometry so the detectable param- and intercalates into the DNA. Exposure to light
eters are the same with flow cytometer and with the from a fluorescent lamp causes covalent linkage
development of flow cytometer, the detectable of EMA to DNA. EMA fluorescence remains
parameters will be enriched gradually. The following associated with the dead cells then EMA-
are some of the detectable parameters that can be stained cells can be gated out before analysis
detected: of the FACS data.
1. Size and amount of content of cells through FSC (b) Sex control of birth: The yield of flow
and SSC cytometric sorted X- and Y-chromosome-
2. Different component of the cells such as: cell pig- bearing sperm in a given time period is an
ments such as chlorophyll, total DNA content, total important factor in the strategies used for fertil-
RNA content, DNA copy number variation (by ization and the production of sex-preselected
Flow-FISH), Glutathione, pH, intracellular ionized offspring. With the help of the flow cytometer
calcium, magnesium, membrane potential, etc. sperm can be divided into X-and Y-
3. Original function of cells such as: protein modifi- chromosome-bearing group, then they can be
cations, phospho-proteins, protein expression, and used for fertilizer to control the sex of offspring.
localization 2. To control the concise number of the interested cells
4. Extra component like transgenic products in vivo, for special research or clinical use. For example: the
particularly the Green fluorescent protein or related sorter can easily recognize and drop a single inter-
fluorescent markers) ested cell to a well or tube so that single cell
5. Antgiens on the cell surface, intracellular, nuclear research can be done.
such as cluster of differentiation (CD), various cyto- 3. Get the rare interested cells from vast back ground
kines, secondary mediators, etc. cells.
6. Function detection such as enzymatic activity, char- (a) Tumor infiltrating lymphocytes (TILs)
acterizing multidrug resistance (MDR) in cancer (b) Residual tumor cells
cells (c) Neonatal blood
7. Cell viability, apoptosis (quantification, measure- (d) Rare event analysis
ment of DNA degradation, mitochondrial (e) Fetal cells in maternal blood
C 370 Cell Synchronization

Cross-References CSO can represent various types of biological pathway


within a unified framework. As an exchange format,
▶ Flow Cytometry CSO is implemented in a formal ontology (▶ Web
▶ Fluorescence Ontology Language (OWL)).
▶ Forward-scattered Light (FSC)
▶ Side-scattered Light (SSC)
Characteristics

References CSO is based on a mathematical model called the


hybrid functional Petri nets with extension (Nagasaki
Herzenberg LA, Parks D, Sahaf B (2002) The history and future et al. 2004). Petri nets have graph-like structures
of the fluorescence activated cell sorter and flow cytometry:
consisting of three elements including place, transi-
a view from stanford. Clin Chem 48(10):1819–1827
http://en.wikipedia.org/wiki/Flow_cytometry#Fluorescence- tion, and arc. In CSO, the Petri net elements are
activated_cell_sorting renamed as more intuitive terms: entity, process, and
http://www.abdserotec.com/support/electrostatic_cell_sorting- connector, respectively, to bridge the gap between
711.html
computer science and biology researchers.
http://www.cytonome.com/ci/176/Existing_Cell_Sorting_Methods/
http://www.dddmag.com/product-bd-influx-51310.aspx
Kanz L, Bross KJ, Mielke R (1986) Fluorescence-activated Main Capabilities
sorting of individual cells onto poly-L-lysine-coated slide CSO is a general framework for understanding the
areas. Cytometry 7:491–494
behavior of cell systems in an integrated way. The
Manual Part Number:11-11-32-01, BD Biosciences, San Jose
Rens W, Welch GR, Johnson LA (1999) Improved flow features of CSO are as follows: (1) manipulation of
cytometric sorting of X- and Y-chromosome bearing different levels of granularity and abstraction of path-
sperm: Substantial increase in yield of sexed semen. Mol ways, for example, metabolic pathways, regulatory
Reprod Dev 52(1):50–56
pathways, signal transduction pathways, and cell-cell
11-10784-02 Key operator training manual BD
immunocytometry systems 2350 Qume drive San Jose CA interactions; (2) capture of both quantitative and qual-
95131–1807 itative aspects of biological function by using Petri net;
and (3) encoding of biological pathway data related to
visualization and simulation, as well as modeling.

Cell Synchronization Structure


CSO consists of approximately 200 classes for
▶ Cell Cycle, Synchronization a comprehensive representation of cell system. There
are several classes to represent cell system as
a dynamic system.
• CSMLBase is the root class for all classes in CSO.
Cell System Ontology • All data in CSO is structured around Project which
has slots to represent the comprehensive environ-
Euna Jeong, Masao Nagasaki and Satoru Miyano ment of a pathway model.
Institute of Medical Science, University of Tokyo, • Each project is required to have only one Model,
Tokyo, Japan which describes pathways via a set of processes.
• Submodel can be defined as a subset of a given
model. Each submodel contains some selected ele-
Definition ments of a model, which may be grouped to convey
any meaning.
The cell system ontology (CSO) is a system A model comprises a set of processes connected to
dynamics–centered language for modeling, visualiz- entities via connectors, and facts to provide more
ing, and simulating biological pathways (Jeong et al. information related to biological processes. The
2007a). CSO is a formal ontology that represents com- ElementBase class contains fundamental concepts
putational models as well as visualization of models. for the model, which has four subclasses: Entity,
Cell System Ontology 371 C
Process, and Connector are Petri net elements and are pathways (Nagasaki et al. 2010). Biological path-
all related to dynamic simulation, whereas Fact repre- ways in CSO can be created, loaded, edited, saved,
sents other properties that cannot be described with and simulated via CIO. The Cell System Markup
Petri net. Language (CSML) is a primary language of CIO
• Entity describes biological entities (e.g., protein, based on XML and is fully compatible with CSO.
complex, DNA, and small molecule), cellular com- • Data conversion is possible from several formats,
partment, and biological environments (e.g., UV, such as CSML to CSO, CSO to CSML, ▶ SBML C
temperature, and pH). (http://www.sbml.org/) to CSML, CellML
• Process defines interaction among entities, for (▶ CellML) to CSML, Transpath to CSML (Naga-
example, degradation, translation, and saki et al. 2008), BioPAX (http://www.biopax.org/)
phosphorylation. to CSO (Jeong et al. 2007b) for pathway databases
• Connector links entities and processes. such as KEGG (http://www.kegg.org/), Reactome
• Fact is designed for understanding the pathway (http://www.reactome.org/), CellML repository
functionalities and evaluating the status of dynamic (http://www.cellml.org/), and BioModels (http://
simulation such as concentrations that should sat- www.biomodels.net/). CIO can read and convert
isfy among simulation steps, view elements that do these formats into CSML and CSO.
not effect among simulation steps, and biological • As a data repository, the curated executable macro-
elements that do not effect among simulation steps. phage database MACPAK (http://macpak.csml.
An entity reflects the concentration of the sub- org) (Nagasaki et al. 2011) in CSML can be
stance. A process is linked to entities by connectors accessed from a web browser. TRANSPATH
that are incoming from an entity and outgoing to an (Krull et al. 2006) at BIOBASE, focused on signal
entity. A process has a speed that depends on the transduction pathways, is embedded in CIO and can
concentration of the incoming entity. Each connector be searched and edited on CIO.
has its own threshold value. During simulation, the • CSO validator (Jeong et al. 2011a, b) is
evaluated result of the threshold value decides whether a semiautomatic validation tool for CSO data,
the linked process is activated by consuming concen- which reduces time spent on checking annotation
tration of the input entity. mistakes and correcting problematic parts in terms
Several classes are also defined in CSO to represent of biological meaning and system dynamics.
various properties.
• BiologicalBase contains biological terms for anno-
tating biological properties. Controlled vocabular- Cross-References
ies are defined to investigate the reuse of existing
structured information from other sources, such as ▶ CellML
biological event, cell component, cell type, and ▶ Systems Biology Markup Language (SBML)
evidence code. ▶ Web Ontology Language (OWL)
• SimulationBase describes simulation properties
specific to each Petri net elements, such as kinetics,
thresholds, and variables. References
• ViewBase is for visualization of model compo-
nents, including geometric position, graphical Jeong E, Nagasaki M, Saito A, Miyano S (2007a) Cell system
shape, image file–related properties that link each ontology: representation for modeling, visualizing, and sim-
element to graphical representation. ulating biological pathways. In Silico Biol 7(55):623–638
Jeong E, Nagasaki M, Miyano S (2007b) Conversion from
• ChartBase is for saving the results of model simu- BioPAX to CSO for system dynamics and visualization of
lation as 2D plots. biological pathway. Genome Inform 18:225–236
Jeong E, Nagasaki M, Ikeda E, Sekiya Y, Saito A, Miyano S
Softwares (2011a) CSO validator: improving manual curation workflow
for biological pathways. Bioinformatics 27:2471–2472
• Cell Illustrator Online (CIO) is a software platform Jeong E, Nagasaki M, Ueno K, Miyano S (2011b) Ontology-
for systems biology that uses the concept of Petri based instance data validation for high-quality curated bio-
net for modeling and simulating biological logical pathways. BMC Bioinformatics 12(Suppl 1):S8
C 372 CellML Model Curation

Krull M, Pistor S, Voss N, Kel A, Reuter I, Kronenberg D, Characteristics


Michael H, Schwarzer K, Potapov A, Choi C,
Kel-Margoulis O, Wingender E (2006) TRANSPATH: an
information resource for storing and visualizing signaling Background
pathways and their pathological aberrations. Nucleic Acids Mathematical models play an essential role in
Res 1(34):D546–D551 interpreting complex biological processes. However,
Nagasaki M, Doi A, Matsuno H, Miyano S (2004) A versatile as these mathematical models themselves become
petri net based architecture for modeling and simulation
of complex biological processes. Genome Inform 15(1): increasingly complex, a need has arisen for defined
180–197 standards to facilitate model exchange and reuse.
Nagasaki M, Saito A, Li C, Jeong E, Miyano S (2008) System- ▶ CellML (Lloyd et al. 2004), and The Systems Biol-
atic reconstruction of TRANSPATH data into Cell System ogy Markup Language (SBML) (Hucka et al. 2003) are
Markup Language. BMC Syst Biol 2:53
Nagasaki M, Saito A, Jeong E, Li C, Kojima K, Ikeda E, two XML-based markup languages that have been
Miyano S (2010) Cell illustrator 4.0: a computational plat- developed in response to such a need. Once a model
form for systems biology. In Silico Biol 10(2):5–26 has been described in either CellML or SBML, it can
Nagasaki M, Saito A, Fujita A, Tremmel G, Ueno K, Ikeda E, be uploaded into a centralized, online database, such as
Jeong E, Miyano S (2011) Systems biology model repository
for macrophage pathway simulation. Bioinformatics 27: the CellML model repository (Lloyd et al. 2008, http://
1591–1593 models.cellml.org/) or the BioModels Database
(Le Novere et al. 2006, http://www.ebi.ac.uk/
biomodels-main/). From there it is freely available to
the public for download, simulation, and reuse.
Unless a modeler chooses to develop their mathe-
CellML Model Curation matical model in CellML from the outset, the creation
of a CellML model is a multistep translation process. It
Catherine M. Lloyd and Geoff Nunns begins with a model being developed in a particular
Auckland Bioengineering Institute, University of computational language, subsequent code conversion
Auckland, Auckland, New Zealand into text and rendered equations for publication, and
finally translation of the text and equations into
CellML. Once it is written, a CellML model has to be
Synonyms curated because each step in the translation process is
potentially a source for errors.
Curation; Validation Of the 550 models in the CellML model reposi-
tory (August 2012), over half have been curated.
▶ Curation can be carried out by the modeler at the
Definition time of CellML model development and/or translation,
or by the team of CellML model curators after the
▶ CellML model ▶ curation refers to the process of model has been uploaded into the CellML model
validating and annotating a CellML model. In most repository.
cases, the creation of a CellML model is a multistep
translation process: from computational code to text Curation Workflow
and rendered mathematical equations and then back to Vaild XML and CellML
code. Once created, a CellML model generally needs The first step in curating a CellML model is to load and
to be curated, since it is rare for a model to reproduce validate it with a CellML-compliant tool such as
the published results at the first iteration. Typographi- ▶ OpenCell (http://www.cellml.org/tools/opencell) or
cal errors in the publication, missing parameters, Cellular Open Resource, (COR http://cor.physiol.ox.
equations, or unit definitions, mean the CellML ac.uk/). These modeling environments provide
model is often incomplete, contains errors, or unit a validation service which highlights general errors
inconsistencies. Model ▶ curation is an ongoing pro- and inconsistencies in the CellML model. CellML
cess for which we employ a variety of simulation and model validation is carried out by the software in
validation tools and often seek help from the original a hierarchical manner. The base XML is checked
model author. first, and the tool ensures that it is well defined and
CellML Model Curation 373 C
adheres to the XML standard, for example, that each Model Simulation
opened XML tag is also closed. The next step tests for Once the model is known to be valid CellML, with
syntactic errors as each CellML file must follow gen- a complete set of equations and parameters and bal-
eral syntax rules: variables have to be defined within anced units, the model is run and the simulation
components, equations inside MathML elements, and output is compared with the results of the original
so on. published model. In most cases, this involves setting
These two types of code validation are automati- the parameter values and simulation conditions to C
cally carried out when the CellML file is opened in reflect those in the publication, and then comparing
either ▶ OpenCell or COR, and these errors have to be the simulation output with a figure or data in the
fixed before the model can be further edited as these article (see Fig. 1).
types of errors render the model invalid CellML, and
the modeling software is unable to work with such Obtaining the Original Model Code
files. Unfortunately, due to the error-prone nature of the
The second stage of software validation is multistep model translation process, it is not uncom-
a semantic check, which is of main interest for mon to have a valid CellML file which does not
▶ curation. A common source of semantic errors are reproduce the published results. In such a situation,
connections as any two components are only allowed the first step is to compare the list of equations and
a single mapping, and the variables mapped must be parameters present in the journal article with those in
defined in each component and feature appropriate the CellML model to eliminate errors introduced in
interfaces (“out” from the component in which the the creation of a CellML model. This “by eye” check
variable is defined and “in” into the component in is made easier by the verbose ▶ MathML equations
which the variable is simply used). Another form of being rendered into condensed, human readable equa-
semantic errors concerns the mathematical formula- tions (Fig. 2).
tion of the model. A model can be underconstrained If, after this check, the CellML model still will not
if there is insufficient information to run a simulation – run to reproduce the published results, it is likely that
for example, if a variable is missing a defining equation errors were introduced during conversion of the orig-
or an initial condition. Conversely, a model can be inal model code into the equations and text present in
overconstrained, for example, if a variable is both the published paper, and if possible a copy of the
defined by an equation and is assigned an initial con- original model code is obtained. This primary source
dition (and the equation is not a differential equation). has not undergone any translation and is a true repre-
Furthermore, a model can contain circular dependen- sentation of the intended form of the published
cies of variables, which in most cases can be detected model.
by the software. As the research community has become increas-
ingly aware of the CellML project, our chances of
Units Consistency obtaining a copy of the original model code from the
One of the requirements of a CellML model is that all model authors have improved. However, for some
variables and numbers are assigned a unit – even if older models we will have to accept that we are
that unit declaration is “dimensionless.” Further, as unlikely to get a copy of the original code. Further,
part of the validation service in the modeling envi- due to differences in the capabilities of different model
ronments, ▶ OpenCell and COR check for unit con- description languages, there may be certain functions
sistency. Unit consistency has two parts: across in the original model which cannot be translated into
a model and across an equation. An example of the CellML.
former is the variable “time”; if it is first defined
with the unit of “second,” “time” should not appear Curation Standards
later in the model associated with the unit of “minute” All the models in the CellML model repository are
or “hour.” For the latter, the tools report unit in- assigned a ▶ curation rating in the form of a number
balances in each individual equation. For example, of “stars” (ranging from zero to three) and a more
it is incorrect to add together two variables with verbose model status comment, which provides
different units. model-specific information such as whether the
C 374 CellML Model Curation

CellML Model Curation, Fig. 1 (a) Published figure showing the results of the original model (Goldbeter (2005) Proc Biol Sci) and
(b) the simulation output from the CellML version of the model

CellML Model Curation, Fig. 2 A comparison of the verbose raw MathML equation and the same equation rendered in COR
CellML Model Curation 375 C
CellML model was translated from the published paper a more descriptive set of ▶ curation flags. Initially,
or from the original model code. these will be based on the ▶ MIRIAM standard: the
There are four defined ▶ curation levels: Minimum Information Requested in the Annotation
• Level 0: Not curated. of Biochemical Models (Le Novere et al. 2005). This
• Level 1: The CellML model description is consis- is an agreed set of rules for curating quantitative
tent with the mathematics in the original published models of biological systems which ensure
paper. a consistent standard of curation between models in C
• Level 2: The CellML model has been checked for: distinct databases.
– Typographical errors
– Consistency of units Conclusions
– Completeness, such that all parameters and ini- We encourage model developers to publish their com-
tial conditions have been defined putational code in a common format, such as CellML,
– Overconstraints, such that the model does not concurrent with the manuscript of their paper. Indeed,
contain equations or initial values which are certain journals are already encouraging modelers to
either redundant or inconsistent submit their model code in SBML. However, until
– Consistency of simulation output with the code submission becomes an obligatory step getting
published results, such that running the model a model reviewed and published, the processes of
in an appropriate simulation environment repro- model translation and ▶ curation will remain
duces the results in the original paper essential.
• Level 3: The model has been checked to the extent
that it is known to satisfy physical constraints such
as conservation of mass, momentum, charge,
etc. This level of ▶ curation needs to be conducted References
by a specialized domain expert, who is not the
Hucka M, Finney A, Sauro HM, Bolouri H, Doyle JC, Kitano H,
original model author.
Arkin AP, Bornstein BJ, Bray D, Cornish-Bowden A,
However, flaws have been identified in this Cuellar AA, Dronov S, Gilles ED, Ginkel M, Gor V,
▶ curation rating system. The system is not hierarchi- Goryanin II, Hedley WJ, Hodgman TC, Hofmeyr JH,
cal – that is, a model assigned level 2 does not Hunter PJ, Juty NS, Kasberger JL, Kremling A,
Kummer U, Le Novere N, Loew LM, Lucio D, Mendes P,
necessarily meet the criteria for level 1. In fact,
Minch E, Mjolsness ED, Nakayama Y, Nelson MR,
the two levels are often mutually exclusive because Nielsen PF, Sakurada T, Schaff JC, Shapiro BE,
in order to get the CellML model to replicate the Shimizu TS, Spence HD, Stelling J, Takahashi K,
published results (level 2) we have to adjust Tomita M, Wagner J, Wang J (2003) The systems biology
markup language (SBML): a medium for representation and
the published model description (level 1). Further,
exchange of biochemical network models. Bioinformatics
we have observed that the current “star system” can 19:524–531
lead to uncertainty as to how to interpret the Le Novere N, Bornstein B, Broicher A, Courtot M, Donizelli M,
▶ curation status of a model. For example, it is Dharuri H, Li L, Sauro H, Schilstra M, Shapiro B, Snoep JL,
Hucka M (2006) BioModels database: a free, centralized
often a misconception that unless a model has been
database of curated, published, quantitative kinetic models
assigned three stars, it is not working. The reality of biochemical and cellular systems. Nucleic Acids Res 34:
is that domain experts are usually very busy individ- D689–D691
uals who do not have the time to spend on Le Novere N, Finney A, Hucka M, Bhalla US, Campagne F,
Collado-Vides J, Crampin EJ, Halstead M, Klipp E,
checking other people’s models. Also, since most
Mendes P, Nielsen P, Sauro H, Shapiro B, Snoep JL,
of the CellML models have been derived from Spence HD, Wanner BL (2005) Minimum information
published papers, we assume that the model has requested in the annotation of biochemical models
already undergone peer review and the reviewers (MIRIAM). Nat Biotechnol 23:1509–1515
Lloyd CM, Halstead MD, Nielsen PF (2004) CellML: its future,
were satisfied. present and past. Prog Biophys Mol Biol 85:433–450
To solve this problem of misinterpretation, we are Lloyd CM, Lawson JR, Hunter PJ, Nielsen PF (2008) The
proposing to replace the current star system with cellML model repository. Bioinformatics 24:2122–2123
C 376 CellML Model Repository

contains models describing a wide range of biological


CellML Model Repository processes, including signal transduction and metabolic
pathways, gene regulation, the cell cycle, electrophys-
Catherine M. Lloyd and Tommy Yu iology, hormone secretion, immunology, and muscle
Auckland Bioengineering Institute, University of contraction. In addition, it also supports collaborative
Auckland, Auckland, New Zealand model development, merging and alteration, and
models are not restricted to those which have been
published in journals.
Synonyms
History of the CellML Model Repository
PMR2; Physiome model repository 2 When it was created in 2000 the repository served as
a place to store model test cases while the descriptive
capabilities of the ▶ CellML language were explored.
Definition Today (June 2012), there are over 500 models in the
repository and it acts as a valuable resource for
The CellML model repository is an online database for researchers.
storing mathematical models of physiological and bio- From the onset of the project, several objectives for
logical processes. There are currently over 500 models the CellML model repository were identified. It was
in the repository, which are freely available for users to decided that the repository should be designed to:
download, simulate, modify, and upload. The reposi- • Promote the sharing of models
tory is powered by the ▶ Physiome Model Repository • Be publicly accessible
2 (PMR2) software, which provides a web-based inter- • Provide a user-friendly interface to access, query,
face to the repository and defines the user workflows, and store models
enabling both human and machine interaction with the • Provide a consistent workflow whereby models can
repository content. be submitted, reviewed, validated, and published
• Facilitate feedback on models
• Provide a mechanism for advertising new models or
Characteristics changes to existing models
There have been three different versions of the
The Purpose of the CellML Model Repository CellML model repository. The original repository
The CellML model repository (Lloyd et al. 2008, achieved few of the goals outlined above; although
http://models.cellml.org/) is an online database for models were freely accessible to the public, it had no
storing mathematical models encoded in ▶ CellML. mechanism to promote the sharing of models, facilitate
There exist similar efforts, such as the BioModels feedback, or track the change history of a model.
Database (Le Novere et al. 2006) which focuses Similarly the second version of the repository, the
on biochemical pathway models, described in ▶ Physiome Model Repository 1 (PMR1), failed to
peer-reviewed publications, and expressed in the address many of the objectives. Finally, in July 2009,
Systems Biology Markup Language (SBML) (Hucka the ▶ Physiome Model Repository 2 (PMR2) was
et al. 2003), JWS Online (Olivier and Snoep 2004) created, and the design objectives were met.
which combines a repository of kinetic models with
extensive online simulation and analysis facilities, The Underlying Software
and ModelDB (Hines et al. 2004), which stores With the introduction of the PMR2 software in July
published models in the field of computational 2009, the CellML model repository became more than
neuroscience. just a place to store models. Some of the key features of
In contrast with these other databases, which tend to the current model repository are:
focus on specific areas, such as systems biology path- • Facilitated model exchange directly between mod-
way models or computational neuroscience and exclu- elers, in possibly distant locations, without reliance
sively contain models which have already been peer on a central repository
reviewed and published, the CellML model repository • A detailed change history record for each model
CellML Model Repository 377 C
• User access workflows to control privacy when bodies or at conferences. They require a web-based
required intuitive interface through which they can easily
• Embedded ▶ workspaces to enable model reuse showcase their group’s research.
• The ability to include extra information, such as the
experimental data on which the model is based, to Navigating the CellML Model Repository
enhance the model descriptions (metadata) There are two main methods to locate a particular
The content of the repository is managed by using model in the CellML model repository: C
Mercurial (http://mercurial.selenic.com/), a ▶ Distrib- • Browsing by category: Every model stored in the
uted Version Control System (DVCS). Such a system repository is tagged with a least one category type.
allows modelers, who may be located on opposite sides If a user is interested in metabolism, for example,
of the world, to work together on the same model they can click on this category heading on the
simultaneously. This system tracks every set of repository home page, and a list of all the metabolic
changes made to the model file, by each individual pathways models in the repository will be
author, thus safeguarding the data from being displayed.
unintentionally overwritten. As it is also a distributed • Keyword search: If a user is interested in all the
system, modelers are able to work independent of the models developed by a particular author, or if they
CellML model repository, and can share their changes want to view everything related to a particular topic,
directly with each other until they decide the model is such as HIV dynamics, they can carry out a specific
ready to be uploaded into the repository. search for the author or topic through the repository
The combination of these above features, together search facility.
with the CellML language and associated metadata
specifications, provides a collaborative model devel- Downloading a Model
opment environment which is capable of enhancing There are two principal methods to download a model
communication throughout the modeling community. from the CellML model repository. The chosen
method depends on the intentions of the user.
Repository Users A casual user may just want to test the capabilities
Several specific classes of CellML model repository of a working, fully curated model, to see how it
users have been identified. Each of these classes can be behaves, while a modeler may want to take an
assigned a role, and each role has specific needs: existing model from the repository and either
• Modelers create models and need a centralized improve on it or develop it further. The former user
place to store them. The required storage mecha- can either click on a link to a pre-created model
nism is a ▶ Distributed Version Control System simulation session file (if one is available for that
(DVCS) to allow collaborators to be able to work particular model), or they can download the raw
on a model simultaneously. model file itself and subsequently open it in their
• Curators check the quality of the models in the repos- preferred simulation tool and run the model. By con-
itory. They need to know when new models have trast the latter user needs to take advantage of the
been added, or old models have been updated, so ▶ Distributed Version Control System (DVCS), they
they can be checked and then published or rejected. should clone (or copy) the entire model ▶ workspace
• General users are usually interested in investigating (or folder) through a Mecurial client onto their own
the stored models. They want to be able to search computer. This workflow ensures that a model’s mod-
the repository for keywords, model categories, or ification history is properly recorded, and enables the
author names, and need an intuitive interface to other advantages as were discussed in the software
access and download the models. section above.
• Administrators control who has permission to
upload material to the model repository, and they Uploading a Model
require an interface to grant or revoke permission To upload a model, or any other associated data (such
settings for users of the system. as an image file or experimental data sets), into the
• Senior Investigators, or project leaders, often want CellML model repository, a user requires an account.
to present the capabilities of the models to funding If the model already exists in the repository and this is
C 378 CellML

a newer version, the new model has to be committed to ▶ Distributed Version Control System (DVCS)
the local clones of an existing ▶ workspace with ▶ JWS Online
a comment on what has been changed, and then pushed ▶ Systems Biology Markup Language (SBML)
back into the repository. It is then the responsibility of
the curators to check the quality of the model before
they ▶ expose (or publish) it. Alternatively, if the References
model being added is a completely new one, the crea-
tion of a new ▶ workspace is required. Depending on Hines ML, Morse T, Migliore M, Carnevale NT, Shepherd GM
(2004) ModelDB: a database to suport computational neuro-
the permissions granted to the user, they can either
science. J Comput Neurosci 17:7–11
create a new ▶ workspace themselves, or they have Hucka M, Finney A, Sauro HM, Bolouri H, Doyle JC, Kitano H,
to ask the curators to create one on their behalf. The Arkin AP, Bornstein BJ, Bray D, Cornish-Bowden A,
same pushing, commenting, exposure, and publication Cuellar AA, Dronov S, Gilles ED, Ginkel M, Gor V,
Goryanin II, Hedley WJ, Hodgman TC, Hofmeyr JH,
process then takes place.
Hunter PJ, Juty NS, Kasberger JL, Kremling A, Kummer U,
Le Novere N, Loew LM, Lucio D, Mendes P, Minch E,
Private Models and Password Protection Mjolsness ED, Nakayama Y, Nelson MR, Nielsen PF,
Sometimes model developers may want to utilize the Sakurada T, Schaff JC, Shapiro BE, Shimizu TS, Spence HD,
Stelling J, Takahashi K, Tomita M, Wagner J, Wang J (2003)
▶ Distributed Version Control System (DVCS) feature
The systems biology markup language (SBML): a medium for
of the CellML model repository, to track the changes representation and exchange of biochemical network models.
made to their model during the development process, Bioinformatics 19:524–531
without making their model publicly available until it Le Novere N, Bornstein B, Broicher A, Courtot M, Donizelli M,
Dharuri H, Li L, Sauro H, Schilstra M, Shapiro B, Snoep JL,
is complete. They can work in a private ▶ workspace
Hucka M (2006) BioModels database: a free, centralized
and choose not to ▶ expose their model until the very database of curated, published, quantitative kinetic models
end. Until they are made publicly available, models of biochemical and cellular systems. Nucleic Acids Res 34:
will not come up in search listings. Alternatively, D689–D691
Lloyd CM, Lawson JR, Hunter PJ, Nielsen PF (2008) The
a modeler may want to provide only a few, select
cellML model repository. Bioinformatics 24:2122–2123
individuals with access to their model, such as collab- Olivier BG, Snoep JL (2004) Web-based kinetic modelling using
orators, or reviewers of a journal article. If this is the JWS online. Bioinformatics 13:2134–2144
case the model can be ▶ expose but user access to
this exposure can be controlled by the repository
administrator granting permission to only a few spe-
cific users.
CellML
Persistent Model References
Each model entry in the repository has a unique iden- Andrew K. Miller1, Randall D. Britten1, Catherine M.
tifier in the form of a unique URL. Even if an older Lloyd1, David P. Nickerson1 and Poul M. Nielsen2
1
version of a model is superseded by a newer published Auckland Bioengineering Institute, University of
version, the URL of the old model is retained, with Auckland, Auckland, New Zealand
2
a note that that particular version of the model has been Department of Engineering Science, University of
superseded, or “expired.” This feature allows Auckland, Auckland, New Zealand
a modeler to refer to a specific model version, at
a specific point in time, and they can be confident
that a link in a journal article, or in correspondence Definition
with a colleague, will remain unbroken.
CellML (Hedley et al. 2001; Cuellar et al. 2006) is
a format for specifying mathematical models, building
Cross-References upon Extensible Markup Language (XML). It is
supported by a range of software tools and databases,
▶ BioModels Database: A Repository of and is most commonly used to specify mathematical
Mathematical Models of Biological Processes models of biological systems.
CellML 379 C
Characteristics used to represent parameters, state variables, and
named constants.
Systems biology researchers often need to exchange In CellML, the physical units of all variables and
mathematical models. One approach is to publish constant numbers (which can be dimensionless) must
a paper containing the mathematical equations in the be specified. If units are specified, they may either be
model. However, the process of manually converting one of the standard SI base or derived units, or units
a model to mathematical equations, and then back defined in the model. New units can be defined in terms C
into a computer modeling format is both time con- of other units with specified multipliers and offsets
suming and error prone. Another approach is to (e.g., a millimeter can be defined as a thousandth of
exchange programs for solving the model in a meter, nanomolar can be defined as a billionth of
a procedural language (such as Fortran or MATLAB). a mole per liter), and new base units can also be
This approach has several major drawbacks: it tends defined.
to result in models where the numerical algorithms A variable in one component can be connected to
used are intermixed with the model, it is difficult to a variable in another component. A connection
compose two models to make a model incorporating between two variables means that the two variables
features of each model, and it is hard to use the model represent the same quantity. If the two connected vari-
for any purpose other than the one it was originally ables are in the same physical units, then the connec-
coded for. tion is treated as equality. If they are in different, but
These problems are avoided by the CellML format dimensionally compatible units, then the CellML
(http://www.cellml.org), which acts as a container for processing software will automatically apply the
mathematical expressions encoded in Mathematical appropriate units conversion factor.
Markup Language (MathML), and so can represent In a CellML model, components can be arranged in
a wide variety of models. CellML tools are generally an encapsulation hierarchy. This can be used so that
focused on supporting the class of problems known some components (the encapsulation children)
as systems of differential algebraic equations describe the internal details of another component,
(DAEs). which provides a higher level interface (the encapsu-
CellML has been used to describe a wide range of lation parent). The internal details of each encapsula-
models relevant to systems biology. At the time of tion child component can in turn be described in the
writing, the CellML Model Repository included 500 next level of the encapsulation hierarchy, and so on to
models (including electrophysiology, gene regula- whatever depth is required.
tion, metabolism, and signal transduction models To ensure that there is a distinction between the
(Cooling et al. 2007; Nickerson and Buist 2008; internal and externally exposed parts of a model,
Terkildsen et al. 2008) through to endocrine models). components have two types of interface, the public
CellML allows models to be built in a reusable and interface and the private interface. A connection
modular fashion, and this has been exploited to build between variables in encapsulation siblings uses the
a modular library of synthetic biology models public interface of both components. A connection
(Cooling et al. 2010). between an encapsulation parent and an encapsula-
All mathematical expressions in a CellML model tion child uses the public interface of the encapsula-
are contained within components. A component is tion child and the private interface of the
a container that may represent concrete or abstract encapsulation parent. Variables can have different
concepts. For example, a component could represent visibility on the public and private interfaces. Their
an ion channel, a compartment in a cell, the surround- visibility is directional, and is set to either “in” or
ing environment of a model, or the collection of adjust- “out,” which indicates that the variable is visible on
able parameters. that interface, and may be connected to a variable of
Each CellML component may contain variables, opposite directionality. Visibility defaults to “none,”
which are named values that may or may not meaning that the variable is not visible on that
change with respect to other variables. Every named interface.
value in the mathematics within a component In large systems biology models, the same mathe-
needs to be declared as a variable; variables are matical constructs are often used multiple times, both
C 380 CellML

within a model, and across different models. For exam- licensed under Free/Open Source licenses so that it
ple, a particular mathematical formulation for an ion can be freely used. In addition, bindings that
channel might be used to represent both sodium and allow this implementation to be accessed from other
potassium channels in a cell model, and might be languages are available (e.g., for Python and Java).
reused in other models. Duplicating the markup for Modelers can use CellML as the primary source
the definition of an ion channel component multiple language for models, while leveraging numerical
times would be a repetitive and error-prone approach, tools in other environments. This is achieved using an
and improvements to the component would need to be API extension module that allows tools to convert
added manually to all copies. To address this, CellML models from CellML into procedural programming
1.1 uses a mechanism called importing. Importing languages like C or MATLAB.
allows components and units to be imported by linking In addition, modelers can make use of several pop-
to the Uniform Resource Identifier (URI) of the ular stiff and non-stiff numerical integrators, using an
imported model document from within the importing API module that allows simulations of models to be
model document. When a component is imported, run with numerical integrators from the GSL, CVODE,
the entire encapsulation hierarchy underneath it is and IDA packages.
implicitly imported, including the connections Another API module can be used to validate that
between components in that encapsulation hierarchy, models follow the rules of the CellML specification, as
so that complex submodels made up of multiple well as good modeling practices. For example, the
components can be imported. validation service can generate a warning if the units
Using the structures described so far, CellML in a mathematical expression are inconsistent.
allows the mathematics underlying a model to be There is also a range of tools for use by modelers:
described. However, for a model to be useful, it is one of these is OpenCell, which is based on the CellML
necessary to describe the correspondence between API. It is intended to be used by people creating or
parts of the model and parts of the real-world system working with CellML models. Models can be edited
being modeled. For this reason, CellML models can and validated, simulations can be run and results can be
include information, known as metadata, about plotted on graph axes. OpenCell is specifically
the model and its parts (Beard et al. 2009). This meta- designed to support working with multiple models
data is normally represented within the CellML model simultaneously, and allows the results from different
in Resource Description Format (RDF). Metadata models to be compared graphically on the same set of
specifications exist for basic information about a axes.
model, such as the details of the publication it is OpenCell also supports session files, which
based on, and the chemical species represented by a are a stored description of the complete OpenCell work-
concentration variable. There are also metadata speci- ing environment including the list of open models and
fications for stipulating parameters to numerical inte- information on which graphs to display. Session files
grators, and how to graphically display simulation can also include embedded figures; these figures can be
results. made interactive so that clicking on part of a figure will
A number of tools and libraries capable of bring up a corresponding trace on the graph.
processing CellML models have been developed COR (http://cor.physiol.ox.ac.uk) is another popu-
(Garny et al. 2008); a listing can be found at http:// lar software application for modelers, focused on the
www.cellml.org/tools. editing and use in simulation of CellML, with similar
The CellML API (Miller et al. 2010) is a description features to OpenCell. Other end-user tools that do not
of a programming interface for working with CellML focus on CellML, but include support for it are JSim,
models. Software developers can use the CellML API VirtualCell, and insilicoIDE.
when incorporating support for the CellML format into An important bioengineering application of CellML
their software. The API includes a core for performing is for multiscale organ simulation. Two simulation
basic model manipulations, as well as a number of systems that allow part of the mathematical model to
optional extension modules. The API comes with an be specified by means of CellML are Chaste and
efficient full reference implementation in C++, OpenCMISS.
Cellular Automata 381 C
Cross-References
Cellular Automata
▶ CellML Model Repository
▶ Systems Biology Markup Language (SBML) Carlo Cosentino, Basilio Vescio and Francesco Amato
Department of Clinical and Experimental Medicine,
Magna Graecia University of Catanzaro,
References Catanzaro, Italy C
Beard DA, Britten R, Cooling MT, Garny A, Halstead MDB,
Hunter PJ, Lawson J, Lloyd CM, Marsh J, Miller A et al
(2009) CellML metadata standards, associated tools and
Synonyms
repositories. Phil Trans R Soc A: Math Phys Eng Sci
367(1895):1845 Cellular spaces; Cellular structures; Homogeneous
Cooling M, Hunter P, Crampin EJ (2007) Modeling hypertrophic structures; Iterative arrays; Tessellation automata;
IP3 transients in the cardiac myocyte. Biophys
Tessellation structures
J 93(10):3421–3433
Cooling MT, Rouilly V, Misirli G, Lawson J, Yu T, Hallinan J,
Wipat A (2010) Standard virtual biological parts:
a repository of modular modeling components for synthetic Definition
biology. Bioinformatics 26(7):925
Cuellar A, Nielsen P, Halstead M, Bullivant D, Nickerson D,
Hedley W, Nelson M, Lloyd C (2006) CellML 1.1 Specifi- A cellular automaton (pl. cellular automata, abbrev.
cation. CellML Specification. http://www.cellml.org/specifi- CA) is a discrete model studied in computability
cations/cellml_1.1 theory, mathematics, physics, complexity science
Garny A, Nickerson DP, Cooper J, Santos RW, Miller AK,
(▶ Complexity), theoretical biology, and microstruc-
McKeever S, Nielsen PMF, Hunter PJ (2008) CellML and
associated tools and techniques. Phil Trans R Soc A: Math ture modeling. They are used to build parallel comput-
Phys Eng Sci 366(1878):3017 ing (▶ Parallel Computing, Data Parallelism)
Hedley WJ, Nelson MR, Bullivant DP, Nielsen PF architectures as well as to model and simulate
(2001) A short introduction to CellML. Phil Trans R Soc
physical systems. Moreover they can be used to inves-
Lond A: Math Phys Eng Sci 359(1783):1073
Miller AK, Marsh J, Reeve A, Garny A, Britten R, Halstead M, tigate and reproduce the emergence of patterns
Cooper J, Nickerson DP, Nielsen PF (2010) An overview of (▶ Spatiotemporal Pattern Formation), self-replication
the CellML API and its implementation. BMC Bioinformat- and self-similarity properties in natural systems.
ics 11(1):178
Nickerson D, Buist M (2008) Practical application of CellML
1.1: the integration of new mechanisms into a human
ventricular myocyte model. Prog Biophys Mol Biol Characteristics
98(1):38–51
Terkildsen JR, Niederer S, Crampin EJ, Hunter P, Smith NP
History
(2008) Using physiome standards to couple cellular func-
tions for rat cardiac excitation-contraction. Exp Physiol CA were first developed in the 1940s by Stanislaw
93(7):919–929 Ulam, who was investigating the growth of crystals,
using a simple lattice network as his model, and by
John Von Neumann, who was working on the problem
of self-replicating systems (Von Neumann 1966).
Cell-to-Cell Communication In 1946, Norbert Wiener and Arturo Rosenblueth
developed a cellular automaton model of excitable
▶ Cellular Communication media. Their specific motivation was the mathematical
description of impulse conduction in cardiac systems.
Their original work continues to be cited in modern
research publications on cardiac arrhythmia and
Cellular Aging excitable systems (Wiener and Rosenblueth 1946). In
the 1960s, cellular automata were studied as
▶ Senescence a particular type of dynamical systems and the
C 382 Cellular Automata

Cellular Automata, a b
Fig. 1 (a) Moore
neighborhood; (b) Von
Neumann neighborhood

r=0 r=1 r=0 r=1

r=2 r=3 r=2 r=3

connection with the mathematical field of symbolic of two-dimensional CA, the most used neighborhoods
dynamics was established for the first time (Hedlund are Moore neighborhood and Von Neumann neighbor-
1969). hood (Fig. 1). More formally, if we refer to a certain
In the 1970s, K. Zuse proposed that the physical cell in the grid as a location identified by discrete
laws of the universe are discrete by nature, and that the coordinates (x, y), being (x0, y0) the central cell and
entire universe is the output of a deterministic compu- being r the neighborhood radius, we can define its
tation on a giant cellular automaton (Zuse 1982). In Moore neighborhood as
those same years, a two-state, two-dimensional cellu-
lar automaton named “Game of Life” became very NM ðx0 ; y0 Þ ¼ fðx; yÞ : jx  x0 j <¼ r; jy  y0 j <¼ rg;
widely known, particularly among the early computing (1)
community. The Life CA is able to perform computa-
tions, and after much effort, it has been shown that the while its Von Neumann neighborhood is
Game of Life can emulate a universal Turing machine
(Rendell 2002). NV ðx0 ; y0 Þ ¼ fðx; yÞ : jx  x0 j þ jy  y0 j <¼ rg:
In the 1980s, Stephen Wolfram started a
(2)
systematical investigation of a very basic but essen-
tially unknown class of cellular automata, which he
Given N, the total number of cells, and being a ¼ (x, y)
called “elementary cellular automata.” The unexpected
a generic cell, we can define the neighboring function
complexity of the behavior of these simple automata
as (Sipper 1997) g : N  N ! 2 N  N, defined as
led Wolfram to suspect that ▶ complexity in nature
may be due to similar mechanisms (Wolfram 2002).
gðaÞ ¼ fa þ d1 ; a þ d2 ; . . . ; a þ dn g; (3)
Cellular Automata Structure
for all a 2 N  N, where di 2 N  N, i ¼ 1, . . ., n,
A CA consists of a regular grid of elements, named
is fixed.
cells, each in one of a finite number of states, such as
“On” and “Off” or “0” and “1.” The grid can be in any
finite number of dimensions. Rule
Each cell contains a copy of the finite automaton
Neighborhood (V, v0, f), where V is the set of cellular states, v0 is
For each cell, a set of cells called “neighborhood” is a particular state called quiescent state, and f is the
defined, consisting of the cell itself, together with the transition function f : V n ! V. This function is
surrounding cells within a certain distance. In the case subject to the constraint f(v0, v0, . . ., v0) ¼ v0, that
Cellular Automata 383 C
8
is, a cell whose neighborhood is entirely quiescent < 0; S<2_S>3
remains quiescent. The state vt(a) of a cell a at time vtþ1 ðaÞ ¼ nt ðaÞ; S¼2 (4)
:
t is precisely the state of its associated automaton at 1; S¼3
time t. Each cell a is connected to the n neighbor cells
a + d1, a + d2, . . ., a + dn. The neighborhood state
function ht : I  I ! V n is defined by ht(a) ¼ (vt(a), A compact representation of this rule is“B3/S23”:
vt(a + d2), . . ., vt(a + dn)), assuming that d1 ¼ (0, 0). This essentially means that, for the next generation, C
Now we can relate the neighborhood state of a cell a a cell has a new birth (B) if its neighborhood consists of
at time t to the state of that cell at time t + 1 by exactly 3 alive cells and survives (S) if there are two or
f(ht(a)) ¼ vt + 1(a). The function f is referred to as three living cells in its Moore neighborhood (r ¼ 1).
the CA rule and is usually given in the form of a rule
table, specifying all possible pairs of the form (ht(a), Properties
vt + 1(a)). Such a pair is termed a transition or rule- CA are actually simple computer programs capable of
table entry. An allowable assignment of states to all a remarkable range of behaviors. Some CA have been
cells in the space is called a configuration. By applying proven to be universal computers. Others exhibit prop-
the transition function to an allowable configuration, erties familiar from traditional science, such as
a new configuration is generated and reiterating this thermodynamic behavior, continuum behavior, con-
process gives out a succession of configurations, called served quantities, percolation, sensitive dependence
evolution. A rule (and its corresponding CA) is said to on initial conditions, and others. They have been used
be totalistic when the value of a cell at time t depends as models of traffic, material fracture, crystal growth,
only on the sum of the values of the cells in its biological growth, and various sociological, geologi-
neighborhood (possibly including the cell itself) at cal, and ecological phenomena. Another feature of
time t  1. simple programs is that making them more compli-
cated seems to have little effect on their overall
Evolution ▶ complexity and this argument has been used as evi-
An initial state (time t ¼ 0) is selected by assigning dence that simple programs are enough to capture the
a state for each cell. A new generation is created essence of almost any complex system (▶ Complex
(advancing t by 1), according to rule f. For example, System). Von Neumann showed that some CA can be
the rule might be that the cell is “On” in the next universal, in the sense that they can perform every
generation if exactly two of the cells in the neighbor- computation, just like a universal Turing machine.
hood are “On” in the current configuration, otherwise The Game of Life, given a proper initial configura-
the cell is “Off” in the next generation. In uniform CA, tion, can generate gliders (Fig. 2), or patterns that are
the rule for updating the state of cells is the same for able to replicate and propagate themselves through the
each cell, while it may change in nonuniform CA. grid. This property has revealed to be useful for
Typically, the rule does not change over time, and is performing calculations, and a universal Turing
applied to the whole grid simultaneously. machine has been implemented in Life, thus demon-
strating the universality of this CA (Rendell 2002).

The Game of Life Computing Architectures Based on CA


The best-known two-dimensional cellular automaton Toffoli and Margolus (1987) have pioneered the idea
is the famous Game of Life, devised by John H. of building computer architectures based on cellular
Conway in 1970. It consists of a two-dimensional automata. Conventional computer architectures are
grid of cells, whose state may be dead (0) or alive optimized for the arithmetic treatment of continuum
(1), and Let D ¼ fd0 ; d1 ; . . . ; d8 g ¼ ½1; 1  ½1; 1 , models. On the other hand, CA can faithfully model
P
8
continuum systems, such as fluids. Unlike differential
being d0 ¼ ð0; 0Þ, and let be S ¼ ht ðaÞðkÞ the sum of
k¼1 equations (▶ Ordinary Differential Equation (ODE)),
the states of all the neighbors of cell a. The following they can be realized exactly by digital hardware. With
rule describes Conway’s Game of Life cellular a more appropriate architecture, a performance factor
automaton: of at least 104 can be easily gained in the simulation of
C 384 Cellular Automata

20

15

10

−5
Cellular Automata, Fig. 2 A glider in the Game of Life: This
pattern can replicate and propagate itself through the CA grid 0 5 10 15 20 25 30 35 40

Cellular Automata, Fig. 3 Simulation of a flow through


a circular cylinder obtained by means of a lattice-gas cellular
cellular automata, thus allowing to simulate very com- automaton
plex physical phenomena. This is the case of the CAM
(Cellular Automata Machine) architectures
implemented in hardware by Toffoli and Margolus,
successfully used to model and solve fluid dynamics,
diffusion, and collective phenomena problems.

Applications
The following are some of the many applications of
cellular automata for the simulation of physical systems.
Hydrodynamic modeling: Differential equations,
such as the Navier-Stokes equation, capture important
macroscopic aspects of fluid dynamics; however, their
implementation on a digital computer is not the equa-
tion themselves, but finite models obtained from them
by truncation and roundoff. It is possible to simulate
analogous macrodynamics starting directly from
discrete microscopic models, cellular automata that
idealize the motion and collisions of individual parti-
cles. Figure 3 shows the simulation of a flow through
a circular cylinder obtained by means of a lattice-gas
cellular automaton (Wolf-Gladrow 2005) (Lattice-Gas
Cellular Automaton Models for Biology).
Growth processes: This is useful for the study of
tumor growth dynamics in tissues. Cellular automata Cellular Automata, Fig. 4 A CA model of tumor growth
are used to model cell behavior and interactions at the
microscopic level, in order to simulate and evaluate
growth dynamics, vascularization or response to ther-
apy. Figure 4 was originated by a model that takes into needs of the tumor, cell-doubling time, and an imposed
account four cell types: healthy cells, proliferating spherical symmetry (Kansal et al. 2000).
tumor cells, non-proliferating tumor cells, and necrotic Pattern formation: Figures 5 and 6 show example
tumor cells, using a simple set of rules and a set of four patterns found in nature (Spatio-temporal pattern for-
microscopic parameters that account for the nutritional mation), together with their reproductions obtained by
Cellular Communication 385 C
Cellular Automata,
Fig. 5 (a) Snowflakes and
(b) a snowflake pattern
reproduced by means of a two-
dimensional CA on
a hexagonal grid

a b

Cellular Automata,
Fig. 6 (a) Cone shell (conus
textile) and (b) pattern
generated by the rule 30 CA

means of cellular automata evolution. Snowflakes Toffoli M, Margolus N (1987) Cellular automata
(Fig. 5) are obtained by running a two-dimensional machines: a new environment for modeling. MIT Press,
Boston
CA on a hexagonal grid, while cone shell patterns Von Neumann J (1966) Theory of self-reproducing automata.
(Fig. 6) are generated by a rule 30 CA. University of Illinois Press, Urbana
Wiener N, Rosenblueth A (1946) The mathematical formulation
of the problem of conduction of impulses in a network of
connected excitable elements, specifically in cardiac muscle.
Cross-References Arch Inst Cardiol Mex 16:205–265
Wolf-Gladrow DA (2005) Lattice-gas cellular automata and
▶ Complex System lattice boltzmann models – an introduction. Springer, Berlin
▶ Complexity Wolfram S (2002) A new kind of science. Wolfram Media,
Champaign
▶ Lattice-Gas Cellular Automaton Models Zuse K (1982) The computing universe. Int J Theo Phys
▶ Ordinary Differential Equation (ODE) 21:589–600
▶ Parallel Computing, Data Parallelism
▶ Spatiotemporal Pattern Formation

References Cellular Communication

Hedlund GA (1969) Endomorphisms and automorphisms of the Xiaojuan Sun


shift dynamical system. Math Syst Theory 3:320–375 Zhou Pei-Yuan Center for Applied Mathematics,
Kansal AR, Torquato S, Harsh GR IV, Chiocca EA, Deisboeck
Tsinghua University of Beijing, Beijing, China
TS (2000) Simulated brain tumor growth dynamics using
a three-dimensional cellular automaton. J Theor Biol
203:367–382
Rendell P (2002) Turing universality of the game of life. In: Synonyms
Adamatzky A (ed) Collision-based computing. Springer,
London
Sipper M (1997) Evolution of parallel cellular machines: the Cell-to-cell communication; Intercellular
cellular programming approach. Springer, Heidelberg communication
C 386 Cellular Potts Model

Definition References

Cell communication is an important process by which Alberts B, Bray D, Hopkin K, Johnson AD, Lewis J, Raff M,
Roberts K, Walter P (2004) Essential cell biology. Garland
a cell sends or receives information from other cells.
Science, New York
Individual cells, such as yeast, need to sense and
respond to their environment, and communicate
with other cells to have any kind of “social life.” For
example, when a yeast is ready to mate, it secrets Cellular Potts Model
a small protein called a mating factor. Neighboring
yeast cells of opposite “sex” can detect this chemical Anja Voß-B€ohme1, J€orn Starruß2 and Walter de Back2
1
mating call and respond by halting their cell cycle Center for Information Services and High
progress and reaching out toward the cell that emitted Performance Computing (ZIH), Technical University
the signal. In a multicellular organism, cells must Dresden, Dresden, Germany
2
interpret the multitude of signals they receive from Centre for High Performance Computing (ZIH),
other cells to help coordinate their behaviors, such as Technical University Dresden, Dresden, Germany
cell fate decision in embryo development (Alberts
et al. 2004).
In typical cell communications, the signaling Synonyms
cell produces a particular type of signal molecule
that is detected by the target cell. The target cell CPM; Glazier–Graner–Hogeweg model; Potts model,
possesses receptor proteins that recognize and cellular/extended
respond specifically to the signal molecule. Signals
can act over long or short range, depending on the
way in which the signal is transported. For example, Definition
hormones produced in endocrine cell are secreted
into the bloodstream and are often distributed A cellular Potts model (CPM) is a spatial lattice-based
widely throughout the body. In embryo development, formalism for the study of spatiotemporal behavior of
signal proteins (morphogens) disperse from localized biological cell populations. It can be used when the
synthesis sites to form concentration gradients details of intercellular interaction are essentially deter-
across the developing tissue. Neuronal signals mined by the shape and the size of the individual cells
are transmitted along axons to remote target cells. as well as the length of the contact area between
Cells with membrane-to-membrane interface can neighboring cells.
engage in contact-dependent signaling (Alberts et al. Formally, a cellular Potts model is a time-discrete
2004). Markov chain (▶ Markov Chain). It is a lattice model
Each cell can only respond to a limited set of where the individual cells are simply connected domains
signals selectively depending on the specific receptor of nodes with the same cell index. A CPM evolves by
proteins. Most extracellular signal molecules cannot updating the cells’ configuration by one pixel at a time
pass through the plasma membrane. They bind to based on probabilistic rules. These dynamics are
specific cell-surface receptor proteins to induce the interpreted to resemble membrane fluctuations, where
intracellular signal transduction pathway. Receptor one cell shrinks in volume by one lattice site and
proteins act as transducers, binding to signaling mol- a neighboring cell increases in volume by occupying
ecules, and converting the signal from one physical this site. The transition rules follow a modified
form to another. Most cell-surface receptor proteins ▶ Metropolis algorithm with respect to a Hamiltonian.
belong to one of three large families: ion-channel-
linked receptors, G-protein-linked receptors, or Characteristics
enzyme-linked receptors. These families differ from
each other by the nature of signals they generate when Problem
the extracellular signal molecule binds to them The biological structure and function typically result
(Alberts et al. 2004). from the complex interaction of a large number of
Cellular Potts Model 387 C
Cellular Potts Model,
Fig. 1 Cell-surface
interaction in the Cellular Medium
Potts Model. Three cells, each
JT1,M JT1,M JT1,M JT1,M
one covering several lattice
sites, interact with each other
at the cell surfaces. The JT1,M JT1,M JT1,M
strength J of the interaction
depends on the cell types, type JT1,M JT1,M JT1,M C
T1 depicted in red, type T2 in
green. There are also JT1,T1
Celltype 1 Celltype 1
interactions between the cells
and the medium (white) JT1,T2 JT1,T2

JT1,T2 JT1,T2

JT1,M JT1,T2 JT1,T2

JT1,M JT1,T2 Celltype 2 JT1,T2

JT1,M JT2,M JT1,M

JT2,M JT2,M

components. When ▶ spatiotemporal pattern forma- expression of molecules that regulate intercellular
tion in cellular populations or tissues is considered, adhesion are responsible for cell sorting. Since then,
one is often interested in concluding characteristics of this formalism has been elaborated and applied to
the global, ▶ collective behavior of cell configurations study a wide range of morphogenetic phenomena in
from the individual properties of the cells and the developmental biology.
details of the intercellular interaction. However, even
if the basic cell properties and interactions are per- The Model
fectly known, it is possible that – due to the complex State Space
structure of the system – the collective traits cannot be A CPM assigns a value ðxÞ from a set
directly extrapolated from the individual properties. W ¼ f0; 1; :::; ng to each site x of a countable set S,
Therefore, appropriate mathematical models need to cp. Fig. 1. The set S resembles the discretized space
be designed and analyzed that help to accomplish this and is often chosen as a two- or three-dimensional
task on a theoretical basis. Cellular Potts models con- regular lattice. The set W ¼ f0; 1; :::; ng contains the
stitute a modeling framework that is applicable when so-called cell indices, where n 2 N is the absolute
the details of intercellular interaction are essentially number of cells that are considered in the model. The
determined by the shape and the size of the individual state of the system as a whole is described by config-
cells as well as the length of the contact area between urations Z 2 X ¼ Ws. Given a configuration Z 2 X,
neighboring cells. a cell is the set of all points in S with the same cell
This model class has been developed by Glazier and index, cellw :¼ fx 2 S : ðxÞ ¼ wg; w 2 Wnf0g:The
Graner (1993) in the context of cell sorting. The latter value 0 is assigned to a given node, if this node is not
refers to the observed segregation of heterotypic cell occupied by a cell but by medium. Each cell is of
aggregates into spatially confined homotypic cell clus- a certain cell type, which determines the migration
ters. The CPM was introduced to explore the tissue- and interaction properties of the cell, the set of all
scale consequences of the differential adhesion possible cell types being denoted by L. Denote by
hypothesis (▶ Differential Adhesion Hypothesis) that t : W ! L the map that assigns each cell its cell
holds that cell-type-dependent disparities in the type. A cell with index w 2 W has volume (for the
C 388 Cellular Potts Model

Kronecker symbol d it holds that dðu; vÞ ¼ 1 if u ¼ v Again mt , the target surface length, and at , the strength
and dðu; vÞ ¼ 0 otherwise) of the surface constraint, are parameters, t2L. Thus,
the typical structure of a CPM-Hamiltonian is
X
Vw ðÞ :¼ dðw; ðxÞÞ;
H ¼ Hs þ Hv þ H0 ; (4)
x2S

where Hs ; Hv are given in (1) and (2) and H0: X ! R is


and surface length a model-specific addend. See the section “Extensions
and Applications” for additional examples of H0.
1 X Transitions from one configuration to another fol-
Mw ðÞ :¼ dðw; ðxÞÞ:
2 low a special rule which is called modified Metropolis
interfaces fx;yg
algorithm (▶ Metropolis Algorithm). First, two addi-
tional parameters are specified: a so-called temperature
The sum in the last term is taken over all interfaces T > 0, which is a biological analogue of the energy of
of a given configuration  that are all pairs of lattice thermal fluctuations in statistical physics and is
neighbors which do not belong to the same cell. a measure of cell motility, and the transition threshold
h, which accounts for energy dissipation during forma-
Dynamics tion and breaking of intercellular bonds and avoids
A cellular Potts model (CPM) is a time-discrete oscillatory behavior (Savill and Hogeweg 1997;
Markov chain (▶ Markov Chain) with state space X, Ouchi et al. 2003). Then, the following algorithm is
where the transition probabilities are specified with the performed (1):
help of a Hamiltonian. The latter is a function H: X ! 1. Start with configuration Z.
R which often has a special structure. Usually, it is the 2. Pick a target site x 2 S at random with uniform
sum of several terms that control single aspects of the distribution on S.
cells’ interdependence structure. The standard CPM 3. Pick a neighbor y of x at random with uniform
uses the following two terms. First a surface interac- distribution among all lattice neighbors of x.
tion term 4. Calculate the energetic difference
DHxy :¼ Hðyx Þ  HðÞ of a transition  ! yx ,
X where yx ðzÞ :¼ ðyÞ if z ¼ x and yx ðzÞ :¼ ðzÞ
Hs ðÞ ¼ JðtððxÞÞ; tððyÞÞÞ;  2 X; (1)
interfaces fx;yg
otherwise.
5. Accept the transition by setting  :¼ yx with prob-
ability pðDHxy Þ, or ignore the transition with proba-
is specified. Here, J: L  L ! R, the matrix of so- bility 1  pðDHxy Þ, where
called surface energy coefficients, is assumed to be 1 if DHyx <h
symmetric. Second the volume constraint pðDHyx Þ ¼ ðDHyx hÞ=T
e otherwise
5. Go to 1 or end the algorithm.
X Consequently, only such transitions are possible
Hv ðÞ ¼ ltðwÞ ðVw ðÞ  vtðwÞ Þ2 ;  2 X: (2)
w2W where the index of at most one lattice site is changed,
resulting in a shift of the cell’s center of mass. The new
assignment to this lattice site is chosen from the cell
is used. Here Vt, the target volume, and lt, the strength
indices of the neighboring lattice sites. These dynam-
of the volume constraint, are cell-type-specific param-
ics are interpreted to resemble membrane fluctuations,
eters, t2L. Depending on the phenomenon under
where one cell shrinks in volume by one lattice site and
investigation, further summands can be included. For
a neighboring cell increases in volume by occupying
instance, a constraint can be put on the surface length
this site.
(Ouchi et al. 2003)
To complete the model, appropriate boundary con-
X ditions must be specified. If the influence of the bound-
Hm ðÞ ¼ atðwÞ ðMw ðÞ  mtðwÞ Þ2 ;  2 X: (3) ary shall be neglected, periodic boundary conditions
w2W are used. This means that the space can be thought of as
Cellular Potts Model 389 C
being mapped onto a torus. However, fixed boundary concept, where each cell consists of a row of standard
conditions, where the interaction between cell surfaces CPM cells (Starruß et al. 2007).
and confining environment is explicitly modeled, can Chemotactic response to some field c:S ! [0,1) of
be defined as well. signals can be modeled in the simplest form by an
P P
addend Hchemo ¼ w2Wnf0g lchemo ðwÞ x2cellw cðxÞ to
Extensions and Applications the Hamiltonian, where lchemo is possibly a cell-type-
The CPM model formalism has been used for several specific chemotactic response parameter (Glazier et al. C
problem-specific extensions. In general, this is done by 2007). If lchemo < 0, the cells prefer to move up the
including additional terms into the Hamiltonian (4). In chemotactic gradient, for lchemo > 0 they prefer to
some cases, these additional terms also depend on the move down the gradient. There have been several
chosen target spin, thereby changing the weights for more refined extensions to the CPM that model che-
the acceptance of a proposed transition in the modified motaxis (Glazier et al. 2007). One example is the
Metropolis algorithm. The latter extensions are called following kinetic extension used by Savill and
kinetic extensions, since they directly affect the Hogeweg (1997), where the positions of the target
transition rates. spin x and the trial spin y in a proposed transition
Cell motility emerges in the CPM implicitly from  ! yx are taken into account,
the fluctuations of the cells’ center of masses. To
X
explicitly model physical characteristics of cell motil- DHchemo ¼ lchemo ðwÞðcðyÞ  cðxÞÞ: (6)
ity such as cell persistence and inertia, additional terms w2W
that constrain the cell displacement per time step can
be added to the difference DH of the standard CPM- Hybrid and multiscale modeling: The CPM
Hamiltonian (4) that is calculated in step (3) of the can be coupled to auxiliary formalisms, typically
modified Metropolis algorithm. Inertia, for example, using systems of differential equations. A hybrid
has been modeled by constraining the cell velocity approach enables multiscale modeling in which
increment via the term molecular species are represented as continuous
quantities, and cells are treated as discrete entities.
X For instance, CPM parameters pertaining to cellular
DHinertia ðtÞ ¼ linertia ðwÞ properties can be under the control of ordinary differ-
w2W ential equations, representing subcellular processes
! !
k vel ðw; tÞ  vel ðw; t  DtÞk2 ; (5) such as gene regulation. CPM cell behavior can
also be linked to lattice-based reaction-diffusion
systems representing the biochemical microenviron-
!
where vel ðw; tÞ denotes the instantaneous center-of- ment through, for example, chemotaxis. A similar
mass velocity of the cell w at time t, linertia ðwÞ is a cell- approach can be adopted to spatially represent the
specific parameter, and Dt is one or more Monte Carlo intracellular biochemistry that exerts influence on
steps (Balter et al. 2007). Since the increment of the the protrusions and retractions in the CPM by kinetic
Hamiltonian depends on the proposed transition, this is modulation of transition probabilities (Marée et al.
a kinetic extension of the CPM. 2006).
Cell shapes arise in the CPM implicitly from sat-
isfying the volume constraint. In the two-dimensional Implementations
CPM, cells adopt approximately hexagonal When applied to specific biological problems, the
shapes, producing a space tiling pattern comparable CPM framework is typically used with several exten-
to epithelial tissues. Elongated cell shapes can be sions and modifications. Its analysis comprises exten-
modeled by imposing a cell length constraint sive numerical simulation studies. In an effort to
which renders the major axis of the ellipsoidal provide a common implementation for CPM simula-
approximation of the cell’s shape to be close to tions, CompuCell3D has been developed (Glazier et al.
a predefined target length or ratio (Zajac et al. CompuCell3D). This open source software imple-
2003). Rod cell shapes with particular stiffness have ments a large number of common CPM extensions
been modeled using a compartmentalized cell and provides a graphical modeling interface.
C 390 Cellular Spaces

Limitations and Merits Glazier JA et al. CompuCell3D, an open source modeling envi-
From a theoretical perspective, the CPM is poorly ronment, www.compucell3d.org
Glazier JA, Graner F (1993) Simulation of the differential adhe-
understood. Hence, the analysis of CPM models can sion driven rearrangement of biological cells. Phys Rev
effectively only be performed by numerical simula- E 47(3):2128–2154
tion. Important mathematical methods, such as rigor- Glazier JA, Balter A, Poplawski NJ (2007) Magnetization to
ous spatiotemporal limit procedures to derive the laws morphogenesis: a brief history of the Glazier-Graner-
Hogeweg model. In: Anderson ARA, Chaplain MAJ,
that guide the behavior of certain macroscopic vari- Rejniak KA (eds) Single cell-based models in biology and
ables, are not yet available. Since the classical Metrop- medicine, Mathematics and biosciences in interaction.
olis algorithm (▶ Metropolis algorithm) is modified in Birkh€auser Verlag, Basel, pp 79–106
the CPM, these models differ in essential aspects from Marée AFM, Jilkine A, Dawes A, Grieneisen VA, Edelstein-
Keshet L (2006) Polarization and movement of keratocytes:
classical equilibrium models. In addition, CPMs have a multiscale modelling approach. Bull Math Biol
been criticized because their calibration is often 68(5):1169–1211. doi:10.1007/s11538-006-9131-7
nontrivial. Cellular behaviors are specified in an indi- Ouchi NB, Glazier JA, Rieu J, Upadhyaya A, Sawada Y
rect or phenomenological manner via the Hamiltonian (2003) Improving the realism of the cellular Potts model
in simulations of biological cells. Physica A 329(3–4):451–
and the modified Metropolis algorithm. Consequently, 458
the relation between the parameters that control the Savill NJ, Hogeweg P (1997) Modelling morphogenesis: from
dynamics of the CPM and the biological-physical single cells to crawling slugs. J Theor Biol 184(3):229–235
quantities they represent often remains allusive. Starruß J, Bley T, Sogaard-Andersen L, Deutsch A (2007) A new
mechanism for collective migration in Myxococcus xanthus.
Despite these limitations, the CPM formalism has J Stat Phys 128(1–2):269–286
found applications in many topics, mainly in the field Zajac M, Jones GL, Glazier JA (2003) Simulating convergent
of developmental biology. Its spatial and cell-centered extension by way of anisotropic differential adhesion.
nature renders it suitable for the study of phenomena J Theor Biol 222:247–259
where a mesoscopic description of individual cell
shape and motility is important. It provides a flexible
modeling framework that allows incorporation of
problem-specific extensions. Moreover, coupling the
CPM to auxiliary model formalisms enables the explo- Cellular Spaces
ration of the complex interplay between several factors
at different biological scales, acting at the intracellular, ▶ Cellular Automata
the intercellular, and the tissue level.

Cross-References Cellular Structures

▶ Collective Behavior ▶ Cellular Automata


▶ Differential Adhesion Hypothesis
▶ Markov Chain
▶ Metropolis Algorithm
▶ Spatiotemporal Pattern Formation CE-MS-based Metabolomics

Tamaki Fujimori
Human Metabolome Technologies Inc., Tsuruoka,
References
Yamagata, Japan
Balter A, Merks RMH, Poplawski NJ, Swat M, Glazier JA
(2007) The Glazier-Graner-Hogeweg model: extensions,
future directions, and opportunities for further study. In: Synonyms
Anderson ARA, Chaplain MAJ, Rejniak KA (eds) Single
cell-based models in biology and medicine, Mathematics
and biosciences in interaction. Birkh€auser Verlag, Basel, Capillary electrophoresis mass spectrometry–based
pp 151–167 metabolomics
CE-MS-based Metabolomics 391 C
Definition property. The migration speed of metabolites is deter-
mined by hydrated ionic radius and valence. Because
Capillary electrophoresis (CE) mass spectrometry even the metabolites with the same mass can be sep-
(MS) is an analytical technique. CE is superior in arated based on the nature of CE, the separation of
separation efficiency of ionic metabolites. MS can pro- structural isomers is possible. For example, glucose
vide high sensitivity. The combination of CE with MS 1-phosphate, glucose 6-phosphate, and fructose
can lead to achieve the analytical platform with high 6-phosphate can be easily separated in CE. The CE C
separation capacity, resolution, and sensitivity. CE-MS analysis is also advantageous in terms of small vol-
has been often used for measurements of ionic metab- ume requirement (a few nanoliters) for sample
olites in metabolomics. CE-MS analysis is not suited to injection.
measurements of neutral metabolites. CE-MS analysis
is a complementary tool to reversed-phase LC-MS TOFMS
analysis which is specific to measurements of neutral Time-of-flight mass spectrometry (TOFMS) is often
metabolites. used as MS in CE-MS analysis. Accelerated ions by
applied voltage reach the detector under high vacuum.
Ions with low m/z reach the detector earlier than those
Characteristics with high m/z. The value of m/z is calculated on the
basis of the arrival time. TOFMS provides relatively
Selection of Analytical Machines high resolution, which enables estimation of composi-
Most of the metabolites in primary metabolism are tion formula.
ionic, hydrophilic, and polar. For example, metabo-
lites of glycolysis, TCA cycle, amino acid metabo- Peak Annotation
lism, and nucleotide synthesis show these Detected peak information including migration time
characteristics. Therefore, CE-MS analysis is suitable in CE and m/z is obtained in CE-TOFMS analysis.
for measurements of metabolites in primary metabo- Peaks of putative metabolites are assigned to those of
lism. Metabolomics using CE-MS analysis could lead known metabolites according to the peak informa-
to understandings of biochemical and biological tion. Human Metabolome Technologies Inc. pos-
mechanisms in primary metabolism. On the contrary, sesses library information about migration time and
CE-MS analysis is not suited to measurements of m/z of about 900 metabolites. These metabolites can
neutral metabolites because they cannot be separated be identified and quantified in reference to library
in CE. LC-MS is suitable for measurements of neutral information. Most amino acids, organic acids, sugar
metabolites. CE-MS analysis is a complementary tool phosphates, and nucleotides are included in the
to LC-MS analysis. Whether the usage of CE-MS or library.
LC-MS is selected in metabolomics depends on the
target metabolites. Metabolomics Application
CE-MS based metabolomics is applied to studies in
CE many fields including basic sciences, medical sci-
CE separates polar metabolites on the basis of their ences, nutrition, agriculture, and toxicology.
mass-to-charge ratios. The separation capacity of CE is Metabolomics could lead to elucidating of gene,
superior to that of gas chromatography (GC) or liquid enzyme, biomarker, and metabolic network. Specifi-
chromatography (LC). Although the sensitivity is low cally, CE-MS based metabolomics can depict the
in CE, it can be sufficiently compensated by high details of primary metabolism which consists of
sensitivity in MS. Low-molecular-weight metabolites many ionic metabolites. Several applications are
in biological samples can be analyzed in two modes for introduced below.
cationic and anionic metabolites. Soga et al. identified an oxidative biomarker in
After sample injection, the voltage is applied to hepatotoxicity by CE-MS analysis. The changes in
the capillary. Cationic metabolites migrate toward the liver metabolites following acetaminophen-induced
cathode side and anionic metabolites migrate toward hepatotoxicity were examined. Ophthalmic acid
the anode side on the basis of electrochemical increased in conjunction with glutathione
C 392 Cene

consumption in liver after the addition of acetamino-


phen. The changes were also observed in serum. Oph- Cene
thalmic acid might be a new biomarker for hepatic
oxidative stress. Eric Werner
Ishii et al. performed multi-omics (transcriptomics, Department of Physiology, Anatomy and Genetics,
proteomics, and CE-MS-based metabolomics) analy- University of Oxford, Oxford, UK
sis to examine metabolic regulation of Escherichia Department of Computer Science, University of
coli in response to gene disruption or changes in Oxford, Oxford, UK
growth rate. Metabolite levels were stable compared Oxford Advanced Research Foundation, Fort Myers,
to gene expression and protein levels. The results FL, USA
showed robustness of metabolic network against per-
turbation such as gene disruption and changes in Synonyms
growth rate.
Ohashi et al. examined metabolome changes in Cancer networks; Cenome; Developmental control net-
histidine-starved E. coli by CE-MS-based works; Developmental networks; Stem cell networks
metabolomics. The analysis quantified 198 metabolites
among 375 charged and hydrophilic metabolites in
primary metabolism. In E. coli, histidine starvation Definition
induced changes in many metabolic pathways includ-
ing glycolysis, TCA cycle, amino acid metabolism, A cene is a developmental network (▶ Developmental
and nucleotide biosynthesis. Control Networks) that controls the development of
multicellular organisms. The nodes in the cene are cell
control states. Edges in the network denote cell actions
including jumps to new cell states. There are branches
References in the network that denote cell division where each
daughter cell enters a possibly new control state. Some
Ishii N, Nakahigashi K, Baba T, Robert M, Soga T, Kanai A, branches stand for stochastic changes of control states.
Hirasawa T, Naba M, Hirai K, Hoque A, Ho PY, Kakazu Y, Other branches describe cell signaling protocols
Sugawara K, Igarashi S, Harada S, Masuda T, Sugiyama N, (Werner 2011a, b).
Togashi T, Hasegawa M, Takai Y, Yugi K, Arakawa K,
Iwata N, Toya Y, Nakayama Y, Nishioka T, Shimizu K,
Mori H, Tomita M (2007) Multiple high-throughput analyses Characteristics
monitor the response of E. coli to perturbations. Science
316(5824):593–597 Cenes can be linked together to form larger cenes. The
Monton MR, Soga T (2007) Metabolome analysis by capillary
electrophoresis-mass spectrometry. J Chromatogr A global developmental control network in a genome is
1168(1–2):237–246 called the ▶ cenome. The topology of the cene deter-
Ohashi Y, Hirayama A, Ishikawa T, Nakamura S, Shimizu K, mines its ideal dynamic phenotype.
Ueno Y, Tomita M, Soga T (2008) Depiction of metabolome Examples of cenes include ▶ stem cell networks,
changes in histidine-starved Escherichia coli by
CE-TOFMS. Mol Biosyst 4(2):135–147 ▶ cancer networks and terminal, and progenitor cell
Ramautar R, Somsen GW, de Jong GJ (2009) CE-MS in ▶ developmental control networks (for details see
metabolomics. Electrophoresis 30(1):276–291 Werner 2011a, b).
Ramautar R, Mayboroda OA, Somsen GW, de Jong GJ
(2011) CE-MS for metabolomics: developments and
applications in the period 2008–2010. Electrophoresis Cenes Are Executable Networks
32(1):52–65 Developmental control networks or cenes are execut-
Soga T, Baran R, Suematsu M, Ueno Y, Ikeda S, Sakurakawa T, able networks. The cell has an interpretive executive
Kakazu Y, Ishikawa T, Robert M, Nishioka T, Tomita M system, the IES, that interprets and executes the direc-
(2006) Differential metabolomics reveals ophthalmic
acid as an oxidative stress biomarker indicating hepatic tives in developmental control networks. This system
glutathione consumption. J Biol Chem 281(24):16768– co-evolved with the developmental control networks
16776 (Werner 2011a).
Cenome 393 C
Sub-cenes Within Cenes Definition
Any cene can link to another cene. This means the link
has further developmental progeny generated by that A cenome is the global developmental network
sub-cene. In this way more and more complex cenes (▶ Developmental Control Networks) that controls
are built up. the development of a multicellular organism.
The cenome is a type of global ▶ Cene. The nodes
Evolution of Cenes and Multicellular Organisms in the cenome are cell control states. Edges in the C
The evolution of multicellular organisms is directly network denote cell actions including jumps to
linked to the increasing complexity of their cenes. new cell states. There are branches in the network
These networks are an autonomous layer on top of that denote cell division where each daughter cell
normal gene networks (Werner 2011a, b). enters a possibly new control state. Some branches
stand for stochastic changes of control states.
Cross-References Other branches describe cell signaling protocols
(Werner 2011a, b).
▶ Cancer Networks
▶ Cenome
▶ Developmental Control Networks Characteristics
▶ Stem Cell Networks
Cenomes consist of linked-together ▶ developmental
References control networks or ▶ cenes. The global developmen-
tal control network in a genome is called the Cenome.
Werner E (2011a) On programs and genomes, arXiv:1110.5265v1 The topology of the cenome determines part of its
[q-bio.OT]. http://arxiv.org/abs/1110.5265
ideal dynamic phenotype. Since most genes are
Werner E (2011b) Cancer networks: a general theoretical and
computational framework for understanding cancer, shared between organisms they cannot be responsible
arXiv:1110.5865v1 [q-bio.MN]. http://arxiv.org/abs/ for the unique morphologies and functional pheno-
1110.5865v1 types of organisms. Cenes and the global cenome
complement genetic processes with multicellular
control processes.
Cenes The cenome consists of cenes (▶ Cene)
that can include ▶ stem cell networks, ▶ cancer
▶ Cancer Networks networks and terminal, and progenitor cell ▶ devel-
▶ Developmental Control Networks opmental control networks (for details see Werner
2011a, b).

Cenome Cenomes Are Executable Networks


The cenome of an organism is an executable network.
Eric Werner Each cell in a multicellular system has an interpretive
Department of Physiology, Anatomy and Genetics, executive system, the IES, that interprets and executes
University of Oxford, Oxford, UK the directives in the cenome. The IES co-evolved with
Department of Computer Science, University of the cenome (Werner 2011a).
Oxford, Oxford, UK
Oxford Advanced Research Foundation, Fort Myers,
FL, USA Cenome and Cene Control Have a Subsumption
Architecture
The cenome control network is a higher layer that
Synonyms subsumes lower level standard gene networks.
Lower level gene networks allow the cell to react
Cene; Developmental control networks to its local environment, while higher level cenes,
C 394 Centromere

which make up the cenome, guide the global


development of the embryo. Ceteris Paribus Laws

Evolution of Cenes and Multicellular Organisms Max Kistler


Any cene can link to another cene. This means the link IHPST, Université Paris 1 Panthéon-Sorbonne, Paris,
has further developmental progeny generated by that France
sub-cene. In this way, more and more complex cenes
are built up within a cenome. The evolution of
multicellular organisms is directly linked to the Definition
increasing complexity of their cenome (Werner
2011a, b). Regularities in ▶ special sciences typically have
exceptions. A law expressing a regularity that has
exceptions is sometimes called a “ceteris paribus law”
Cross-References
(or “cp-law”). A situation is an exception to a law if it
does not correspond to the regularity expressed by the
▶ Cancer Networks
law, but is not taken to refute the law, because the
▶ Cene
exception can be explained to be the result of interfer-
▶ Developmental Control Networks
ences that follow (other) laws. A cp-law expresses
▶ Stem Cell Networks
a regularity that exists in “normal” situations, or, liter-
ally, if “all else is equal,” i.e., if there are no perturba-
References tions resulting in an exception. In classical genetics,
Mendel’s law of segregation says that, in sexually
Werner E (2011a) On programs and genomes, arXiv:1110.5265v1 reproducing organisms, during gamete formation each
[q-bio.OT]. http://arxiv.org/abs/1110.5265
Werner E (2011b) Cancer networks: a general theoretical
member of an allelic pair separates from the other
and computational framework for understanding cancer, member to form the genetic constitution of an individual
arXiv:1110.5865v1 [q-bio.MN]. http://arxiv.org/abs/ gamete, so that there is a 50:50 ratio of alleles in the
1110.5865v1 mass of the gametes. However, some sets of gametes are
exceptional with respect to this law because some
organisms undergo “meiotic drive” leading to an
overproduction of gametes with one allele at the
expense of gametes with the other allele. No strict law
Centromere is true of real systems whose evolution is subject to
perturbations. One proposal for understanding ceteris
Rosella Visintin paribus laws is to take them to be strictly universal
IEO, European Institute of Oncology, generalizations that are true of ideal systems. However,
Milan, Italy the status of such “ideal systems” is problematic. It is
not clear how laws describing systems that are not real
can help explain and predict real systems. An alternative
Definition is to distinguish between laws of nature and laws “in
situ” (Cummins 2000), which are specific for certain
A region on the DNA, usually localized in the center of systems that Cartwright (1999) calls “nomological
chromosomes, where sister chromatids are closely machines.” Laws of nature determine relations between
associated. different properties of objects and have unrestricted
scope: All massive objects obey Newton’s law of uni-
versal gravitation. However, no such law determines
Cross-References directly and by itself the evolution of any particular
system: To obtain the equation of motion of
▶ Mitosis a particular system, one has to build a model that
Checkpoints 395 C
takes into account all properties of the objects
composing the system and all forces due to these prop- Characteristic Path Length
erties. To find the equation of motion of a charged
massive particle p such as a biological molecule, one Falk Schreiber
has to describe all massive and charged particles qi in Leibniz Institute of Plant Genetics and Crop Plant
the environment of the object, then calculate the forces Research (IPK), OT Gatersleben, Stadt Seeland,
that the objects qi exert on p. The motion of the molecule Germany C
is determined by the sum of all these forces. Exceptions Martin Luther University Halle–Wittenberg, Halle,
to generalizations bearing on particular systems can Germany
then be conceived as resulting from the neglect of
some properties of p, or of relevant objects qi in its
environment. Synonyms

Average path length


References

Cartwright N (1999) The dappled world, A study of the bound- Definition


aries of science. Cambridge University Press, Cambridge
Cummins R (2000) How does it work? vs. What are the laws?
Two conceptions of psychological explanation. In: Keil F, The characteristic path length l(G) of a ▶ graph
Wilson R (eds) Explanation and cognition. MIT Press, Cam- G ¼ (V, E) is defined as the average number of edges
bridge, pp 117–145 in the shortest paths between all vertex pairs given by

1 X X
lðGÞ ¼ splðu; u0 Þ; (1)
jV j ðjV j  1Þ u2V u0 2Vnfug
CFSE
where spl(u,u0 ) gives the number of edges in a shortest
▶ Modeling, Cell Division and Proliferation path between vertices u and u0 . In case of an
unconnected graph, the characteristic path length is
infinite, as the number of edges between two
unconnected vertices is considered infinite. In this
Chain Type case, formula (1) is often modified to sum over just
all connected vertex pairs. Alternatively, other mea-
▶ IMGT-ONTOLOGY, ChainType sures such as the harmonic mean or the average inverse
path length can be used.

Change of Antigenic Properties by References


Mutations
Junker BH, Schreiber F (2008) Analysis of Biological Networks.
Wiley Series on Bioinformatics, Computational Techniques
▶ Antigenic Drift and Shift
and Engineering, Wiley, 368
Watts D, Strogatz S (1998) Collective dynamics of small-world
networks. Nature 393:440–442

Change of Antigenic Properties by


Rearrangement of Viral Genome
Segments Checkpoints

▶ Antigenic Drift and Shift ▶ Cell Cycle Dynamics, Irreversibility


C 396 Chemical Master Equation

and
Chemical Master Equation

Hao Ge1 and Hong Qian2 vji ¼ the change in the number of Si molecules
1
School of Mathematical Sciences and Centre for caused by one Rj reaction; ðj ¼ 1; ; M; i ¼ 1; ; NÞ:
Computational Systems Biology, Fudan University, (2)
Shanghai, China
2
Department of Applied Mathematics, University of
Washington, Seattle, WA, USA The stoichiometry vector {vji} can be obtained from
the difference between the numbers of a molecular
species that are consumed and produced in the
Synonyms reaction.
Exact description of the rate function associates
Gillespie algorithm; Stochastic chemical kinetics with a reaction that can be either from phenomenolog-
ical models for stochastic chemical kinetics or from
Definition more fundamental molecular physics based on the
concept of elementary reactions. In general, the func-
Chemical master equation is the stochastic counterpart tion rj has the mathematical form:
of the chemical kinetic equation based on the law of
mass action. It describes the kinetics of chemical rj ðxÞ ¼ kj hj ðxÞ: (3)
reactions in a rapidly stirred tank with small volume
in terms of stochastic reaction times giving rise to
fluctuating copy numbers of reaction species. Here kj is the specific probability rate constant for
Consider a system of fixed volume V at constant reaction Rj, which is defined such that kjdt is the
temperature T. Let there be well-stirred mixture of probability that a randomly chosen combination of
N  1 molecular species {S1, . . . ,SN} and M  1 the reactants of Rj will react accordingly in the next
reactions {R1, . . ., RM}. One specifies the dynamical infinitesimal time interval dt. kj is intimately related to
state of this system by X(t) ¼ (X1(t), . . ., XN(t)), where the rate constants in the traditional law of mass action
Xi(t) is the copy number of molecular species Si in the kinetics.
system at time t. The function hj(x) in Eq. 3 measures the number
One describes the time evolution of X(t) from some of distinct combinations of Rj reactant molecules
given initial state X(t0) ¼ x0. Both single-molecule available in the state x. It is a combinatorial factor
experimental measurements and theoretical investiga- that can be easily obtained from the reaction Rj, i.e.,
QN xk !
tions have shown that X(t) is a stochastic process hj ðxÞ ¼ k¼1 mjk !ðxk mjk Þ!. In the transitional mass-
because the time at which a particular reaction occurs
action kinetics, this term is related to the product of
is random.
the concentrations of all reactants.
The chemical master equation kinetics assumes that
In general, for a chemical reaction
the system is well stirred such that at any moment each
reaction occurs with equal probability at any position in
space. Furthermore, it assumes that for each reaction Rj, Rj : mj1 S1 þ þ mjN SN ! nj1 S1 þ þ njN SN ; (4)
there is a corresponding rate function rj and a stoichi-
ometry vector vj ¼ (vj1, . . ., vjN), which are defined as
one has
rj ðxÞdt ¼ the probability; given XðtÞ ¼ x;
that one reaction Rj will occur some where
vji ¼ nji  mji ;
in the next infinitestimal time interval ½t; t þ dtÞ;
Y
N
xk ! (5)
ðj ¼ 1; ; MÞ: rj ðxÞ ¼ kj
m
k¼1 jk
!ðx k  mjk Þ!
(1)
Chemical Master Equation 397 C
If for any k and j, have xk >> mjk, then and now derive its evolutionary equations, i.e., the
approximately chemical master equation.
We take a time increment dt and consider the
variation between the probability of X(t) ¼ x and of
Y
N
xk mjk
r j ð xÞ ¼ k j (6) X(t + dt) ¼ x. This variation is
k¼1
mjk !
Pðx; t þ dtÞ  Pðx; tÞ ¼ Increasing of the probability in dt C
Then  Decreasing of the probability in dt
(9)
0 1

rj ðxÞ B n1 C N  
B kj V C Y xk mjk XN We take dt so small such that the probability of
¼BN C ; n¼ mjk : (7) having two or more reactions in dt is negligible com-
V @Q A k¼1 V
mjk ! k¼1 pared to the probability for only one reaction. Then
k¼1 increasing of the probability in dt occurs when
a system with state X(t) ¼ x  vj reacts according to
is precisely the reaction flux per unit volume in the Rj in (t, t + dt), the probability of which is rj(x  vj)dt.
mass-action kinetics; the (xk/V) is the concentration of Thus,
0 k V n1
species k, and the term in the parenthesis, kj ¼ Q
j
N ,
mjk ! X
M
k¼1 Increasing of the probability in dt ¼ rj ðx  vj ÞPðx  vj ; tÞdt:
j¼1
is the reaction rate constant in the traditional chemical
kinetics. Here, kj is regarded as “stochastic reaction (10)
0
constant,” while kj is the corresponding reaction
constant for Law of Mass Action. Similarly, when a system with state X(t) ¼ x reacts
The reaction rate constant is deduced only from according to any reaction channel Rj in (t, t + dt), the
experiments, or calculated based on Kramers-Marcus probability P(x, t) will decrease. Thus,
theory. It is usually a function of temperature but
is independent of the volume of a reaction system. X
M

However, the stochastic reaction constant kj depends Decreasing of the probability in dt ¼ rj ðxÞPðx; tÞ dt:
j¼1
on the system volume and the temperature.
From the rate function given from above, the (11)
state vector X(t) is a Markov jump process on the
nonnegative N-dimensional integer lattice. Substituting Eqs. 10 and 11 into Eq. 9, we obtain
Any Markov process can be described by two
completely different, complementary mathematical X
M
Pðx; t þ dtÞ  Pðx; tÞ ¼ rj ðx  vj ÞPðx  vj ; tÞ dt
methods. One follows its stochastic trajectories and j¼1
one consider its probability distribution function (12)
X
M
changing with time. The Gillespie algorithm gives  rj ðxÞPðx; tÞ dt;
the former, and the chemical master equation is for j¼1
the latter.
In the chemical master equation perspective, one no which yields, with the limit dt ! 0, the chemical
longer asks what are the copy number (or concentra- master equation:
tion) of species i at time t, but rather what is the
probability of the system having xi copies of species @ XM
Pðx; tÞ ¼ rj ðx  vj ÞPðx  vj ; tÞ dt
Xi at time t. @t j¼1
We focus on the probability
X
M
 rj ðxÞPðx; tÞ dt: (13)
Pðx; tÞ ¼ PrfXðtÞ ¼ xg; (8) j¼1
C 398 Chemical Master Equation

Characteristics mass-action kinetics. This is exactly due to the s  T


in Eq. 15. This can be explained by a theory of multiple
Consistent with Law of Mass Action timescales.
The relationship between deterministic law of mass
action and stochastic chemical master equation for Numerical Simulation: Gillespie Algorithm
chemical reactions was established by T. Kurtz in We now introduce stochastic simulation methods that
a general theory in which a stochastic Markov generate the random trajectories of X(t). Once we have
chain model for general chemical reaction is studied enough sample trajectories of a stochastic process, we
alongside its deterministic counterpart. Recall that will be able to calculate the probability distribution
both Gillespie algorithm and the chemical master function P(x, t) and all other statistical behaviors
equation are two different descriptions of the including the mean trajectory, variances, and
same stochastic, jump process. Let XV(t) denote this correlations.
stochastic process, where V is the volume of the The stochastic simulation algorithm (SSA) (also
system, and the initial condition (in the thermody- known as the Gillespie algorithm) is based on the
namic limit) is following next-reaction probability distribution:
assume the system is in state x at time t, and let
XV ð0Þ
lim ¼ x0 (14) pðt; jjx; tÞ ¼ probability that; given XðtÞ ¼ x;
V!1 V
the next reaction will occur in the
(16)
Then, the solution to the initial value problem of the infinitesimal time interval ½t þ t; t þ t þ dtÞ;
corresponding chemical kinetics based on the law of and will be the Rj reaction:
mass action model is denoted by c(t,x0). Kurtz
has shown that the relationship between these two In probability theory, there is an elementary theo-
solutions is rem, which states that

XV ðtÞ
lim Pr sup  cðt; x0 Þ > e ¼ 0; pðt;jjx;tÞ ¼ rj ðxÞ expðr0 ðxÞtÞ; 0  t < 1; j ¼ 1;:::; M
V!1 sT V (15)
(17)
for every T and e > 0;
where
It is important to note the mathematical subtlety of
s  T. The above result is only valid for finite time X
M

interval, even though it can be as long as one likes: The r0 ðxÞ ¼ rk ðxÞ (18)
k¼1
e is a function of T; greater the T, larger the e.
Compare the solution to the chemical master The key step of this simulation method is to
equation with initial distribution concentrated at x0, generate the pair of numbers (t, j) in accordance with
P(x,t|x0) and the deterministic kinetics c(t,x0); if the the probability Eq. 18: First generate two random
latter approaches to a steady-state value then the numbers a1 and a2 from the uniform distribution in
former cumulates probability at the steady state. In the unit interval, and then take
general, there is an agreement between the steady
states of the deterministic kinetics and the peaks of  lnða1 Þ
the stationary distribution of the chemical master t¼ ;
r0 ðxÞ
equation.
However, in the limit of infinitely large volume, the X
j
situation is quite different: The peaks of the stationary j ¼ the smallest integer satisfying ri ðxÞ > a2 r0 ðxÞ
distribution of chemical master equation may not i¼1
be consistent with the stable fixed points of the (19)
Chi-Squared Test 399 C
Below are main steps in the stochastic simulation Vellela M, Qian H (2007) Quasistationary analysis of
algorithm. a stochastic chemical reaction: Keizer’s paradox. Bull Math
Biol 69:1727–1746
1. Initialization: Let the initial state as X ¼ X0, and Vellela M, Qian H (2009) Stochastic dynamics and
t ¼ t0. nonequilibrium thermodynamics of a bistable chemical sys-
2. Simulation: Generate a pair of random numbers tem: the Schl€
ogl model revisited. J R Soc Interface 6:925–940
(t, j) according to the probability density function
(17). C
3. Update: Increase the time by t, and replace the ChIP
molecule numbers by X + vj.
4. Iterate: Go back to Step 2 unless the end of the ▶ Chromatin Immunoprecipitation
simulation procedure.
The exact stochastic simulation algorithm consists
with the chemical master equation and gives an exact
sample trajectory of the real system. ChIP-on-Chip Assay

Nobuo Shimamoto
Faculty of Life Sciences, Kyoto Sangyo University,
Cross-References
Kyoto, Japan
▶ Fokker–Planck Equation
▶ Law of Mass Action
▶ Markov Chain
Definition
▶ Stochastic Differential Equation
ChIP-on-CHIP Assay is a method to determine the
location of the genomic DNA to which a DNA-binding
protein binds. The protein is cross-linked to DNA,
References usually in vivo, and the DNA is fragmented.
The DNA fragments cross-linked with the protein are
Deard DA, Qian H (2008) Chemical biophysics: quantitative then immunoprecipitated, and hybridized to DNA chip
analysis of cellular systems, Cambridge texts in biomedical
engineering. Cambridge University Press, Cambridge
or directly sequenced to determine the location.
(Chap 11)
Ge H, Qian H (2009) Thermodynamic limit of a nonequilibrium
steady-state: Maxwell-type construction for a bistable Cross-References
biochemical system. Phys Rev Lett 103:148103
Gillespie DT (1977) Exact stochastic simulation of coupled
chemical reactions. J Phys Chem 81:2340–2361 ▶ Transcription in Bacteria
Kurtz TG (1972) The relationship between stochastic and
deterministic models for chemical reactions. J Chem Phys
57:2976–2978
Lei JZ (2010) Stochastic modeling in systems biology. J Adv
Chi-Squared Test
Math Appl (to appear)
Liang J, Qian H (2010) Computational cellular dynamics Larissa Stanberry
based on the chemical master equation: a challenge for Bioinformatics and High-throughput Analysis
understanding complexity. J Comput Sci Technol (Rev)
25:154–168
Laboratory, Seattle Children’s Research Institute,
McQuarrie DA (1967) Stochastic approach to chemical kinetics. Seattle, WA, USA
J Appl Probab 4:413–478
Qian H, Bishop LM (2010) The chemical master equation
approach to nonequilibrium steady-state of open biochemical
systems: linear single-molecule enzyme kinetics and
Synonyms
nonlinear biochemical reaction networks. Int J Mol Sci
11(9):3472–3500 Pearson’s chi-squared test
C 400 Chondrocytes

Definition References

The chi-squared test of goodness of fit evaluates Agresti A (2002) Categorical data analysis. Wiley, Hoboken
whether the observed frequencies differ from theoret-
ical values.
Chondrocytes

Characteristics Steven D. Rhodes


School of Medicine, Indiana University, Indianapolis,
The chi-squared test of independence is applied IN, USA
to count data that could be represented by the IJ
contingency table. For example, counting sightings
of three different bird species in four different Definition
locations would result in a 3  4 table, where each
cell frequency nij gives the number of sightings of the Chondrocytes are a special type of connective tissue
i-th species in the j-th location. One might be inter- cells that form cartilage. They are derived from mes-
ested to evaluate whether the abundance of species enchymal stem cells of the bone marrow cavity and
differ between the locations. The test statistic is reside in joints where cartilage is required to cushion
given by: bone-on-bone friction. Chondrocyte differentiation
can be induced in vitro from mesenchymal stem cell
cultures in the presence of ascorbic acid plus cytokines
XI X J
nij  mij such as transforming growth factor beta 1 (TGF-b1) or
X2 ¼ ;
i¼1 j¼1
mij TGF-b3. Chondrocytes can be defined phenotypically
by the expression of Cbfa1/Runx2, Type-II collagen,
Type-IX collagen, and Aggrecan. Histologically,
where mij are the expected frequencies under the null mature chondrocytes stain positively with toluidine
hypothesis. The expected frequencies are typically blue, signifying an abundance of proteoglycans within
unknown, but could be estimated by the extracellular matrix.
P P
^ij ¼
m nij nij =n, where n is the total number of
counts.j Substituting
i
^ij in place of mij , we obtain that,
m
asymptotically, the resulting test statisticX2 follows Cross-References
a chi-squared distribution with (I1)(J1) degrees of
freedom (Agresti 2002). ▶ Single Cell Assay, Mesenchymal Stem Cells

Conclusions
The chi-square test provides evidence of association
between the variables; however, it does not indicate Chorioallantoic Membrane (CAM) Assay
how the variables relate to each other. To truly under-
stand the nature of the association, the chi-squared test Marsha A. Moses
alone is not sufficient and should be followed by more Department of Surgery/Harvard Medical School,
detailed analysis, i.e., residual analysis, decomposi- Vascular Biology Program/Children’s Hospital
tion, odds ratios, etc. The chi-squared test relies on Boston, Boston, MA, USA
asymptotic distribution and is applicable only when
the cell frequencies are sufficiently large. For small
sample sizes, the Fisher’s Exact Test can be used. Definition
Furthermore, the chi-squared test is invariant with
respect to reodering of rows and columns. Hence, if The chorioallantoic membrane (CAM) assay is one of
one of the variables is ordinal, the appropriate statistics the earliest in vivo angiogenesis assays to be devel-
that respect the ordinality should be chosen. oped. The CAM of fertilized chicken eggs is
Chromatin Immunoprecipitation 401 C
challenged with either potential stimulators or
inhibitors of neovascularization by placement of Chromatin
polymers or other delivery systems impregnated
with the test substances at a site on the CAM imme- Vani Brahmachari and Shruti Jain
diately above the subectodermal plexus (Ausprunk Dr. B. R. Ambedkar Center for Biomedical Research,
et al. 1975). Zones of vessel clearance indicate University of Delhi, Delhi, India
anti-angiogenic activity, whereas increased vessel C
development in a spokewheel configuration indicates
angiogenic stimulation (Fernandez et al. 2003; Synonyms
Moses et al. 1990).
Compact DNA, Nucleosomes

Cross-References
Definition
▶ Neovascularization
The genome which is a linear DNA molecule is very
long relative to the size of the nucleus in the cells. For
References example, it is estimated that all the 23 pairs of
chromosomes present in the human cells if completely
Ausprunk DH, Knighton DR, Folkman J (1975) Vascularization stretched and lined up end to end, it will extend over
of normal and neoplastic tissues grafted to the chick chorio-
2 m, which is of many orders more than the size of the
allantois. Role of host and preexisting graft blood vessels.
Am J Pathol 79:597–618 nucleus in the cell. However, the DNA is wrapped
Fernandez CA, Butterfield C, Jackson G, Moses MA (2003) Struc- around a ball of proteins called the histones and further
tural and functional uncoupling of the enzymatic folded to be accommodated in the nucleus. This
and angiogenic inhibitory activities of tissue inhibitor of
metalloproteinase-2 (TIMP-2). J Biol Chem 278:40989–40995
compact DNA-protein complex is called chromatin.
Moses MA, Sudhalter J, Langer R (1990) Identification of an The DNA-protein complex forms repetitive units called
inhibitor of neovascularization from cartilage. Science ▶ nucleosomes. The compaction state of chromatin is
248:1408–1410 correlated to the transcriptional activity of DNA. The
chromatin provides the platform for interaction with
various regulatory factors that control transcription.

Choristoma
Cross-References
Barbara J. Davis
Section of Pathology, Tufts Cummings School of ▶ Epigenetics
Veterinary Medicine Biomedical Sciences, ▶ Nucleosomes
North Grafton, MA, USA

Definition Chromatin Immunoprecipitation

A heterotopic rest of cells in normally organized tissue Vani Brahmachari and Shruti Jain
in an abnormal site – common to find pancreatic tissue Dr. B. R. Ambedkar Center for Biomedical Research,
in the liver. University of Delhi, Delhi, India

Cross-References Synonyms

▶ Cancer Pathology ChIP


C 402 Chromatin Proteins

Chromatin
Immunoprecipitation,
Fig. 1 Outline of chromatin Sonicated to break up the cross-
immunoprecipitation (ChIP). Protein-DNA cross-linked linked chromatin to 500-200base
Immunoprecipitation with pair fragments
with formaldehyde &
anti-H3K4me is shown as an These fragments are not
Chromatin isolated precipitated as they do
example. The modification on
not have modified histone
histone is marked as a star H3 (marked as star).

Immunoprecipitated with
antibodies directed against histone
H3 methylated at lysine 4

DNA purified and


PCR amplified and
sequenced.

Definition References

Chromatin immunoprecipitation (ChIP) is a type of Collas P (2010) The current state of chromatin immunoprecipi-
tation. Mol Biotechnol 45:87–100
immunoprecipitation technique used to analyze the pro-
tein-protein and DNA-protein interactions (Collas 2010).
It aims to determine whether specific proteins, like tran-
scription factors are associated with specific genomic
regions like promoter elements or regulatory elements.
It can also be used to determine the specific histone Chromatin Proteins
(▶ epigenetics) modifications associated at the target
sites in the genome, indicating gene expression profiles. ▶ Histones
The steps involved are as follows (Fig. 1):
(a) The cells are taken and the protein-DNA
complexes (chromatin) are crosslinked by formal-
dehyde treatment.
(b) The DNA-protein complexes are then sheared and Chromodomain Helicase DNA Binding
DNA fragments associated with the protein(s) of (CHD)
interest are selectively immunoprecipitated using
antibody against the specific protein. Tetsuro Kokubo
(c) The associated DNA fragments are purified and Department of Supramolecular Biology, Graduate
sequences are identified by amplification or DNA School of Nanobioscience, Yokohama City
sequencing reactions. University, Yokohama, Kanagawa, Japan

Cross-References Definition

▶ Epigenetics In metazoans such as fly and human, the CHD class of


▶ Epigenetics, Drug Discovery ATP-dependent nucleosome-remodeling factors
Chromodomain Helicase DNA Binding (CHD) 403 C
includes many members, e.g., CHD1 and Mi-2/NuRD. arrays in the coding region of ADH2
In yeast, by contrast, the family has only a single (Xella et al. 2006). CHD is involved in multiple
member, CHD1 (Clapier and Cairns 2009) (summa- steps of mRNA biogenesis, including transcrip-
rized in Table 1). tional initiation, elongation, termination, and
CHD appears to function redundantly and/or splicing.
cooperatively with ISWI in remodeling nucleo- Mi-2/NuRD is a multi-protein complex that
somes. For instance, CHD and ISWI are both contains HDACs and methyl CpG-binding proteins C
required in yeast for nucleosome eviction from (see the definition “▶ mammalian HDAC”). This
the PHO5 promoter (Ehrensberger and Kornberg complex plays a crucial role in transcriptional repres-
2011) and for appropriate spacing of nucleosome sion rather than in activation.

Chromodomain Helicase DNA Binding (CHD), Table 1 Remodeler composition and orthologous subunits
Organisms
Family and composition Yeast Fly Human
SWI/ Complex SWI/SNF RSC BAP PBAP BAF PBAF
SNF ATPase Swi2/Snf2 Sth I BRM/Brahma hBRM or BRGI
BRGI
Noncatalytic Swi1/Adr6 OSA/ BAF250/
homologous subunits eyelid hOSAI
Polybromo BAF 180
BAP170 BAF 200
Swi3 Rsc8/Swh3 MOR/BAP155 BAF155, BAF170
Swp73 Rsc6 BAP60 BAF60a or b or c
Snf5 Sfh1 SNRI/BAP45 hSNF5/BAF47/INII
BAPI11/dalao BAF57
Arp7, Arp9 BAP55 or BAP47 BAF53a or b
Actin β-actin
a b
Unique
ISWI Complex ISW1a ISW1b ISW2 NURF CHRAC ACF NURF CHRAC ACF
ATPase Isw1 Isw2 ISWI SNF2L SNF2Hc
Noncatalytic Itc1 NURF301 ACF1 BPTF hACFI/
homologous subunits WCRF180
CHRAC14 hCHRAC17
CHRAC16 hCHRAC15
NURF55/ RbAp46
p55 or 48
Unique Ioc3 Ioc2, NURF38
Ioc4
CHD Complex CHD1 CHD1 Mi-2/NuRD CHD1 NuRD
ATPase Chd1 dCHD1 dMi-2 CHD1 Mi-2α/CHD3,
Mi-2β/CHD4
Noncatalytic dMBD2/3 MBD3
homologous subunits dMTA MTA1,2,3
dRPD3 HDAC1,2
p55 RbAp46 or 48
p66/68 p66α,β
Unique DOC-17
(continued)
C 404 CIA

Chromodomain Helicase DNA Binding (CHD), Table 1 (continued)


Organisms
Family and composition Yeast Fly Human
INO80 Complex INO80 SWR1 Pho- Tip60 INO80 SRCAP TRRAP/Tip60
dINO80
ATPase Ino80 Swr1 dIno80 Domino hIno80 SRCAP p400
Noncatalytic Rvb1,2 Reptin, Pontin RUVBL1,2/Tip49a,b
homologous subunits Arp5,8 Arp6 dArp5,8 BAP55 BAF53a
Arp4, Actin1 dActin1 Actin87E Arp5,8 Arp6 Actin
Taf14 Yaf9 dGAS41 GAS41
Ies2,6 hIes2,6
Swc4/ dDMAP1 DMAPI
Eaf2
Sw2/ dYL-1 YL-1
Vps72
Bdf1 dBrd8 Brd8/TRC/
p120
H2AZ, H2Av,H2B H2AZ,
H2B H2B
Swc6/ ZnF-HITI
Vps71
dTra1 TRRAP
dTip60 Tip60
dMRG15 MRG15
MRGX
dEaf6 FLJI1730
dMRGBP MRGBP
E(Pc) EPCI, EPC-
like
dING3 ING3
d
Unique Ies1,les3-5, Swc3,5,7 Pho
Nhp10
a
Swp82, Taf14, Snf6, SnfI 1
b
Rsc1 or Rsc2, Rsc3-5, 7, 9, 10, 30, Htl1, Ldb7, Rtt102
c
In addition, SNF2H associates respectively with Tip5, RSF1, and WSTF to form NoRC, RSF, and WICH remodelers
d
Amida, NFRKB, MCRS1, UCH37, FLJ90652, FLJ20309

Cross-References remodeling factor. Proc Natl Acad Sci USA


108(25):10115–10120
Xella B, Goding C, Agricola E, Di Mauro E, Caserta M (2006)
▶ Mechanisms of Transcriptional Activation and The ISWI and CHD1 chromatin remodelling activities
Repression influence ADH2 expression and chromatin organization.
Mol Microbiol 59(5):1531–1541

References

Clapier CR, Cairns BR (2009) The biology of chromatin


remodeling complexes. Annu Rev Biochem 78:273–304
CIA
Ehrensberger AH, Kornberg RD (2011) Isolation of an
activator-dependent, promoter-specific chromatin ▶ CIA/Asf1
CIA/Asf1 405 C
CIA/Asf1

Toshiya Senda1 and Naruhiko Adachi2


1
Biomedicinal Information Research Centre (BIRC),
National Institute of Advanced Industrial Science
and Technology (AIST), Tokyo, Japan C
2
Structure-guided Drug Development Project, JBIC
Research Institute, Japan Biological Informatics
Consortium, Tokyo, Japan

Synonyms

Asf1; CIA CIA/Asf1, Fig. 1 Crystal structures of CIA/Asf1 and its histone
complex. (a) Crystal structure of CIA (PDB code: 1ROC). (b)
Crystal structure of the CIA–histone-H3–H4 complex (PDB
code: 2IO5). CIA, H3, and H4 are shown in red, blue, and
Definition green, respectively

CIA (CCG1 interacting factor A)/Asf1 (anti-silencing


function 1) is the most conserved histone chaperone in DNA repair. In these nuclear processes, CIA/Asf1 seems
eukaryotes and plays important roles in various nuclear to play a role(s) in histone transfer and nucleosome
events such as transcription, replication, and DNA structural change. Furthermore, CIA/Asf1, as described
repair (Eitoku et al. 2008). The gene of this factor above, forms a complex with CAF-1 and HIRA, which
was first identified in genetic screening as a factor have been considered to be involved in nucleosome
with an anti-silencing function. However, at that formation in replication-dependent and -independent
time, the biochemical activity of Asf1 was unknown. manners, respectively. The tertiary structure of CIA/
After a few years, two groups have independently Asf1 revealed that it has a b-sandwich structure like
rediscovered this factor as one relevant to nucleosome immunoglobulin (Fig. 1a). CIA/Asf1 forms a complex
formation with histone H3–H4-binding activity. One with the histone H3–H4 dimer as elucidated by crystal
group identified the CIA/Asf1–histone-H3–H4 com- structure analyses (Fig. 1b). Biochemical analysis
plex as an activation factor of CAF-1, which is showed that CIA/Asf1 has an activity that disrupts the
involved in the nucleosome formation in DNA histone (H3–H4)2 tetramer into two dimers through the
replication. The other group identified CIA/Asf1 as formation of a CIA/Asf1–histone-H3–H4 complexes.
a histone chaperone that interacts with the CCG1
(cell cycle gene 1) subunit of the general transcription
factor TFIID. Further analysis revealed two subtypes Cross-References
of CIA/Asf1, Asf1a and Asf1b (CIA-I and CIA-II,
respectively). The CIA/Asf1 molecule is composed ▶ Histone Post-translational Modification to
of two parts, the N-terminal structured region with Nucleosome Structural Change
155 amino acid residues and a C-terminal acidic tail.
The amino acid sequence of the N-terminal structured
region is highly conserved among all eukaryotes; References
amino acid sequences of human and yeast CIA/Asf1
Eitoku M, Sato L, Senda T, Horikoshi M (2008) Histone chap-
show approximately 60% sequence identity. Genetic,
erones: 30 years from isolation to elucidation of the mecha-
biological, and biochemical studies have revealed that nism of nucleosome assembly and disassembly. Cell Mol
CIA/Asf1 is involved in transcription, replication, and Life Sci 65:414–444
C 406 Circadian

Figure 1 shows the circadian patterns of typical


Circadian human who rises early in the morning, eats lunch
around noon, and sleeps at night.
▶ Circadian Rhythm In mammals, the focus point of circadian
rhythms is a master clock, located in the
suprachiasmatic nuclei (SCN) of the anterior hypo-
Circadian Rhythm thalamus, which orchestrates the circadian program.
Circadian timing in mammals is organized in
Jinzhi Lei a hierarchy of multiple circadian oscillators. The
Zhou Pei-Yuan Center for Applied Mathematics, oscillatory machinery of the master clock is contained
Tsinghua University of Beijing, Beijing, China within single neurons (“clock cells”), and the SCN is
composed of numerous clock cells. The SCN receives
light information by a direct retinohypothalamic tract
Synonyms (RHT) to entrain the clock to the 24-h day, and in turn
coordinates the timing of slave oscillators in other
Biological clock; Circadian; Human clock brain areas and in peripheral organs (Reppert and
Weaver 2002).
Differences in clock protein levels and/or kinetics
Definition are considered as the main molecular basis of circa-
dian timing in slave oscillators. The intracellular
A circadian rhythm is the daily cycle of biological clock mechanism in mammals involves interacting
activity based on a 24-h period and influenced by positive and negative transcriptional feedback loops
regular variations in the environment, such as the alter- that drive recurrent rhythms in the RNA and protein
nation of night and day. Circadian rhythms include levels of key clock components (Reppert and Weaver
sleeping and waking in animals, flower closing and 2002). A detailed predictive model of the mammalian
opening in angiosperms, and tissue growth and differ- circadian clock was developed in Forger and Peskin
entiation in fungi. (2003).

00:00
22:30
Midnight 02:00
Bowel movements supperssed Deepest sleep
21:00
Melatonin secretion starts

04:30
19:00 Lowest body temperature
Highest body temperature
Highest blood pressure 18:30
18:00 06:00

17:00 06:45 Sharpest rise in blood pressure


Greatest cardiovascular efficiency 07:30 Melatonin secretion stops
and muscle strength

15:30 08:30 Bowel movement likely


Fastest reaction time 09:00 Highest testosterone secretion
14:30
Best coordination 10:00 High alertness
12:00
Noon

Circadian Rhythm, Fig. 1 Diagram of the circadian patterns of a typical human (Smolensky and Lamberg 2000)
Classification 407 C
References data. Roughly speaking, this means that C should be
able to classify so far unseen instances x 2 X correctly.
Forger DB, Peskin CS (2003) A detailed predictive model of the Apart from predictive accuracy, other criteria may of
mammalian circadian clock. Proc Natl Acad Sci USA
course play a role. For example, interpretable classi-
100:14806–14811
Reppert SM, Weaver DR (2002) Coordination of circadian fiers such as rule-based models (▶ Rule-based
timing in mammals. Nature 418:935–941 Methods) are often preferred to “black box” models
Smolensky M, Lamberg L (2000) The body clock guide to better delivering predictions which are not comprehensible C
health: how to use your body’s natural clock to fight illness
by a human user. Besides, the efficiency and scalability
and achieve maximum health. Henry Holt and Company,
New York of the underlying learning algorithms is of major
importance.

Cis-Acting Sequences in the Core Characteristics


Promoter
The predictive accuracy of a classifier C is typically
▶ Core Promoter Elements measured in terms of its expected loss (or risk) R(C),
that is, the expected value of the loss L(C(X),Y), where
(X,Y) is a sample taken at random from X  Y
according to a probability distribution P. Moreover,
Cis-Elements L : Y  Y !  is a (real-valued) loss function that
penalizes class predictions y^ ¼ CðxÞ deviating from
▶ Transcription Factors and DNA Elements in the true class y of an instance x:
Eukaryote
Z
RðCÞ ¼ EðLðCðXÞ; YÞÞ ¼ LðCðxÞ; yÞdPðx; yÞ
X Y

Classification The simplest loss function is the 0/1 loss, defined as


Lð^y; yÞ ¼ 0 if y^ ¼ y and Lð^y; yÞ ¼ 1 otherwise, though
Eyke H€ullermeier, Thomas Fober and other losses are used as well. In cost-sensitive classifi-
Marco Mernberger cation, for example, a different loss value
Philipps-Universit€at Marburg, Marburg, Germany cij ¼ Lðyi ; yj Þ can be specified for each pair of classes
ðyi ; yj Þ 2 Y 2 .
Since the probability distribution P is in general not
Definition known, the expected loss of a candidate model can
only be estimated. An obvious estimate is the so-called
In statistics and machine learning, classification refers empirical risk of a classifier C, namely, the average
to the problem of learning a classifier in the form of loss on the training data:
a mapping C from an instance space X to a finite set of
class labels Y ¼ fy1 ; y2 ; . . . ; yk g: Thus, classification 1X n
Remp ðCÞ ¼ LðCðxi Þ; yi Þ
is closely related to regression (▶ Regression Analy- n i¼1
sis), where the output space is numerical instead of
categorical. In order to induce a classifier, the However, for a classifier C learned on the training
learning algorithm has access to training data, data, the empirical risk will normally underestimate
which typically consists of a set of examples the true risk. Roughly speaking, this is because C has
fðx1 ; y1 Þ; . . . ; ðxn ; yn Þg  X  Y, that is, instances been optimally adapted to the training data, and thus
together with a corresponding class label. Classifica- may fit this data even better than the true risk mini-
tion is a specific type of supervised learning (▶ Learn- mizer C ¼ arg minC0 2H RðC0 Þ. In other words, mini-
ing, Supervised), and the main goal is to induce mizing the empirical risk on the training data comes
a classifier that generalizes well beyond the training with the danger of “▶ overfitting” the data, which
C 408 Classification

means that the true risk of a classifier C is much higher classifier that discriminates this class (considered as
than its empirical risk. the positive class) from all other classes (considered as
The danger of “over-fitting” typically increases negative). At prediction time, each binary classifier
with the flexibility (capacity) of the underlying predicts whether the query input is positive or not.
▶ hypothesis space H, that is, the set of candidate Tie breaking techniques (typically based on confidence
models from which a classifier C is chosen. This flex- estimates for the individual predictions) are used in the
ibility depends on the classification method or, more case of a conflict, which may arise when no class or
precisely, on the underlying model representation. more than one class is predicted.
▶ Linear models, for example, separate classes by
fitting a hyperplane in the instance space. They are Ordinal and Hierarchical Classification
less flexible than artificial neural networks, ▶ decision In standard classification, the set of class labels Y does
trees, or nearest neighbor methods, which can repre- not have any internal structure. In many applications,
sent highly nonlinear decision boundaries. The restric- however, the classes are not completely unordered.
tion to certain types of decision boundaries is called the Two important special cases are a total order
representation bias of a classification method. It (y1  y2  . . .  yk ) and a hierarchy, giving rise,
largely determines the inductive bias of the method, respectively, to ordinal classification (also called ordi-
which, informally speaking, produces a preference for nal regression in statistics) and hierarchical classifica-
specific types of models. A bias of that kind is needed tion. As examples consider, respectively, learning to
to enable a choice between a potentially infinite num- predict the expression of a gene on the scale
ber of mappings X ! Y which explain the training Y ¼ {underexpressed, normal, overexpressed} and
data equally well. Implementing a suitable inductive the functional class of a protein according to the hier-
bias and finding a proper trade-off between the com- archical EC nomenclature for enzymes.
plexity of a model and its performance on the training From a learning point of view, the structure of Y is
data are key prerequisites for successful learning additional information that a learner should try to
(Bishop 2006). exploit, and this is what existing methods for ordinal
and hierarchical classification essentially seek to do.
Binary Versus Multi-class Classification The basic assumption in this regard is that the structure
The simplest type of classification problem is of Y is reflected in the topology of the class distribution
binary (dichotomous) classification, in which Y con- in the instance space X . Moreover, corresponding
sists of only two classes, typically referred to as the algorithms normally seek to minimize loss functions
positive and negative class, respectively. For such other than the simple 0/1 loss, which is arguably inap-
problems, a multitude of efficient and theoretically propriate for a structured output space Y. For example,
well-founded classification methods exists. In fact, despite being wrong, predicting the functional class
the representation of models is often geared toward a-amylase (EC 3.2.1.1) for a protein that actually
the binary case, and sometimes even restricted to belongs to the class of b-amylases (EC 3.2.1.2) is still
this problem class. For example, methods such as better than predicting a phosphorylase (EC 2.4.1.1) –
▶ logistic regression and support vector machines the former two classes are both glycosidases, and
produce decision boundaries in the form of a separat- hence functionally related.
ing hyperplane that can only divide the instance space
into two parts. Feature Representation
Other methods, such as decision trees and the naı̈ve Instances x 2 X are normally described in terms of an
Bayes classifier, are not restricted in this way. Instead, attribute-value representation or “feature vector”,
they are directly applicable to the case of multi-class which means that x is a vector ðx1 ; . . . ; xm Þ and X the
(polytomous) classification, that is, problems involv- Cartesian product X 1  . . .  X m , with X i the domain
ing more than two classes. Algorithms that are inher- of the i-th attribute (i ¼ 1; . . . ; m). In fact, many
ently binary can still be used in a multi-class setting, methods for classification as well as widely used soft-
namely, through class binarization techniques. ware systems, such as the WEKA machine learning
A popular example is the one-against-rest binarization, toolbox (Hall et al. 2009), expect exactly this type of
where one takes each class in turn and learns a binary representation.
Classification 409 C
Prior to actually training a classifier, this represen- methods can be used on a “meta level,” more or less
tation is normally subjected to a number of independently of the underlying classifier.
preprocessing steps, including normalization, dimen- An important example of this type of approach is
sionality reduction (like ▶ Principal Component Anal- ▶ ensemble learning, where a potentially large number
ysis (PCA)), and ▶ feature selection or weighing. The of classifiers is trained instead of a single one (Rokach
selection of a small to moderate number of attributes 2010). At prediction time, each member of the ensem-
(features) is important for many learning algorithms, ble is queried, and the predictions thus obtained are C
the performance of which may deteriorate in the pres- combined in one way or the other, for example, by
ence of irrelevant, noisy, or simply too many features. means of a simple voting scheme or so-called stacking,
This aspect is especially critical in applications like where the optimal combination itself is formalized as
gene expression analysis, where the number of features a classification problem. In order to generate
(the genes) may largely exceed the number of a (diverse) ensemble of classifiers, different methods
instances (e.g., different types of tissue). can be used, including resampling methods such as
▶ bagging and boosting.
Classification of Structured Data
Especially in the life sciences, however, a feature rep- Classification in Systems Biology
resentation of the above kind is often not natural, and Systems biology offers a multitude of applications for
representing an object in terms of a fixed number of classification methods. For example, classification has
predefined features may come along with a considerable already been used for the construction and analysis of
loss of information. Instead, other representations, such metabolic networks, signaling and protein interaction
as sequences and graphs, appear to be more appealing. networks, gene regulatory networks, the analysis of
A small molecule, for example, is naturally represented microarray data, and the prediction of the influence
in terms of a graph, with each atom corresponding to of toxins on biochemical pathways (Muggleton 2005;
a node, and a bond between a pair of atoms modeled as Larranaga et al. 2006; Fogel et al. 2007). The use of
an edge; likewise, biological networks are commonly powerful predictors for protein–protein or protein–
represented as graphs or hypergraphs. Another example ligand interactions can help to reduce and organize
is modeling molecular structures on the sequence level, the “wet” laboratory work necessary to verify such
which comes down to representing them in terms of interactions in order to construct interaction networks.
a sequence over a finite number of symbols, where Moreover, classification tasks arise when extracting
different sequences can differ in length. the information contained in huge biological networks
Dedicated classification algorithms for handling this (Rapaport et al. 2007).
type of data have been developed in the recent years.
Especially interesting in this regard is kernel-based
machine learning (▶ Learning, Kernel-based), includ- Cross-References
ing support vector machines, which allows for general-
izing (“kernelizing”) conventional methods by means of ▶ Bagging
a mathematical construct called kernel function (Shawe- ▶ Data Mining–based Transcriptional Regulatory
Taylor and Christianini 2004). A kernel function K is Network Construction
a mapping X  X !  satisfying special properties, ▶ Decision Tree
and Kðx; x0 Þ can often be interpreted as a degree of ▶ Ensemble
similarity between the instances x and x0 . A kernel of ▶ Feature Selection
that kind can also be defined for structured data objects ▶ Learning, Kernel-Based
like graphs and sequences, thereby making kernel-based ▶ Learning, Supervised
algorithms amenable to this type of data. ▶ Linear Model
▶ Logistic Regression
Ensemble Learning ▶ Overfitting
In recent years, several methods for improving predic- ▶ Principal Component Analysis (PCA)
tive performance have been developed that go beyond ▶ Regression Analysis
the learning of a single classifier. Instead, such ▶ Rule-based Methods
C 410 Classification Assessment and Optimization

References Classification of Cancer Genesis, Table 1 Types of cancers


(A) Inborn errors of development
Bishop CM (2006) Pattern recognition and machine learning. (A1) Inherited error of development (5% of all clinical
Springer, Berlin cancers)
Fogel G, Fogel GB, Corne DW, Pan Y (2007) Computational
intelligence in bioinformatics. Wiley-IEEE Press, New York (A2) Induced errors of development (?% of all clinical
Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten cancers)
IH (2009) The WEKA data mining software: an update. (B) Sporadic cancers
SIGKDD Explorations 11(1):10–18 90% (?) of all clinical cancers
Larranaga P, Calvo B, Santana R, Bielza C, Galdiano J, Inza I,
Lozano JA, Armananzas R, Santafé G, Perez A,
Robles V (2006) Machine learning in bioinformatics. Brief
Bioinform 7(1):86–112 Characteristics
Muggleton SH (2005) Machine learning for systems biology. In:
Kramer S, Pfahringer B (eds) Inductive logic programming.
Springer, Berlin, pp 416–423 Inborn Errors of Development
Rapaport F, Zinovyev A, Dutreix M, Barillot E, Vert JP Among these, two varieties have been distinguished,
(2007) Classification of microarray data using gene net- namely, (A1) inherited inborn errors of development
works. BMC Bioinformatics 8(1):35–50
and (A2) induced inborn errors of development
Rokach L (2010) Ensemble-based classifiers. Artif Intell Rev
33:1–39 (Table 1). The inherited inborn errors of development
Shawe-Taylor J, Christianini N (2004) Kernel methods for pat- are the result of a process initiated by germ line
tern anaylsis. Cambridge University Press, Cambridge mutation(s) in the genome of one or both gametes
(sperm and/or ovum). If a mutated conceptus develops
from this mutated egg, necessarily all of its cells will
carry such genomic mutation(s); however, tumors
usually appear in a single organ. This type of tumor
Classification Assessment and (or malformation) represents the outcome of abnormal
Optimization organogenesis in one or more tissue(s) in said
offspring. Examples of this variety of inborn errors
▶ Receiver Operating Characteristic (ROC) Curve of development include retinoblastoma, Gorlin
syndrome, xeroderma pigmentosa, and BRCA-1 breast
neoplasias. For a comprehensive listing and descrip-
tion of these tumors, see Pizzo and Poplack (2011).
Classification of Cancer Genesis Tumors belonging to the (A2) induced inborn error of
development variety appear either perinatally, during
Carlos Sonnenschein and Ana M. Soto the first two decades after birth, or even later. Recent
Department of Anatomy and Cellular Biology, Tufts experimental evidence collected in laboratories world-
University School of Medicine, Boston, MA, USA wide using rodents verified that when embryos, fetuses,
and/or neonates become exposed to chemical (even
minute concentrations of environmental endocrine
Definition disruptors) and/or physical (radiation) carcinogens,
a number of premalignant and malignant neoplastic
Cancers can be grouped into two main types regarding lesions appeared in young and adult populations.
their genesis in the lifetime of humans. They are: There is evidence of a similar process in humans. For
(a) Inborn errors of development instance, in utero exposure of human fetuses to diethyl-
(b) Sporadic cancers stilbestrol (DES) during pregnancy was responsible for
The sporadic cancers have been estimated to the appearance of rare vaginal adenocarcinomas around
represent about 95% of the clinical cancers. As of puberty and for an increased incidence of breast tumors
recently, the inborn errors of development were in those women at the age of prevalence of this cancer
considered as representing less than 5% of all cancers. (Hatch et al. 1998; Palmer et al. 2006). Given the greater
These percentages are subject to reinterpretation as incidence of breast cancers among increasingly younger
new evidence accumulates (Table 1). women and the experimental data already accumulated,
Classification Performance Evaluation 411 C
it has been inferred that a good number of what so far Regarding ▶ metastases, the TOFT proposes that
have been considered sporadic cancers might, instead, the microenvironment in which the migrating
belong to this induced inborn error of development cancer cells (epithelial plus stromal emboli) land
group (Soto and Sonnenschein 2010). However, data plays either a “normalizing” (inhibiting) or a facilitat-
have yet to be collected before accurate estimates of ing effect on the proliferation of those cells. In this
the percentage of what have been once considered to context, the inhibiting or the facilitating effect of
be sporadic cancers become, in fact, examples of the microenvironments is susceptible to change as C
induced inborn errors of development. the host ages or if the host is exposed to additional
carcinogens later in life. Additionally, cancer
Sporadic Tumors treatments (chemotherapy, radiation) may affect the
These represent the bulk of tumors seen by clinicians microenvironment in which the cancer cells (epithelial
and diagnosed by pathologists that appear in adults plus stromal) that became detached from the primary
which have no obvious link to germ line mutations. tumor land. These contingencies will decide whether
Their exact pathogenesis is currently the subject of or not and when those “seeds” will become metastases.
intense controversy between the somatic mutation In contrast, the SMT claims that metastases will
theory (SMT) (Weinberg 2006) and the tissue organi- develop as a result of the acquisition of additional
zation field theory of carcinogenesis (TOFT) (Soto and mutations or their activation in the cells of the primary
Sonnenschein 2011) (▶ Cancer Theories). tumor (which would induce them to migrate) and/or in
The SMT states that tumor formation is due to the those cells that already migrated from the primary and
effect of one or the accumulation of more mutations landed at a distance of it.
over a lifetime in a single founder cell in a host
who was exposed to one or many carcinogens.
Alternatively, the TOFT posits that carcinogens target Cross-References
tissues as units in the process of carcinogenesis.
According to the TOFT, the disrupted interaction ▶ Cancer Theories
between tissues and among cells would fully explain ▶ Metastasis
the formation of a tumor. While the SMT explicitly
proposes that all tumors are originally monoclonal, the
TOFT postulates that the tumor mass is made up of References
multiple heterogeneous populations of cells that, from
the beginning of the carcinogenic process, have Pizzo PA, Poplack DG (2011) Principles and practice of
pediatric oncology, 6th edn. Lippincott Williams & Wilkins,
the capacity to either evolve into a full-blown tumor
Philadelphia
or to regress and become normalized depending of the Hatch EE, Palmer JR, Titus-Ernstoff L, Noller KL, Kaufman RH,
degree of tissue disruption the target tissue undergoes Mittendorf R, Robboy SJ, Hyer M, Cowan CM, Ervin A
(Soto and Sonnenschein 2011). et al (1998) J Am Med Assoc 280:630–634
Palmer JR, Wise LA, Hatch EE, Troisi R, Titus-Ernstoff L,
For the purpose of classifying tumors as either Strohsnitter W, Kaufman R, Herbst AL, Noller KL,
induced inborn errors of development or of the Hyer M et al (2006) Cancer Epidem Biomar 15:1509–1514
sporadic type, a bone of contention has been whether Sonnenschein C, Wadia PR, Rubin BS, Soto AM (2011) Journal
carcinogens have to be mutagenic. According to of Developmental Origins of Health and Disease 2:9–16
Soto AM, Sonnenschein C (2010) Nat Rev Endocrinol
the SMT, carcinogens are mutagens. The TOFT
6:363–370
acknowledges, instead, that disruptors of normal Soto AM, Sonnenschein C (2011) Bioessays 33:332–340
tissue-tissue or cell-cell interactions can be Weinberg RA (2006) The biology of cancer. Taylor & Francis,
carcinogens regardless of whether they are of a New York
physical or chemical nature. Increasing amounts of
experimental data suggest that carcinogens need not
be mutagenic. This is exemplified by environmental
endocrine disruptors that are not known to be Classification Performance Evaluation
mutagenic while retaining effective carcinogenic
properties (Sonnenschein et al. 2011). ▶ Receiver Operating Characteristic (ROC) Curve
C 412 Classification Tree

Most systems biology techniques, such as gene arrays


Classification Tree or mass spectrometry, rely on tissue disruption before
molecular phenotyping is undertaken. Clearly, as spa-
▶ Decision Tree tial and anatomical information is not preserved, the
resultant lists of genes, transcripts, or proteins can only
be partially translated into putative functional molec-
ular networks by the use of informatics and this will
Clinical Aspects of the Toponome be “error prone.” There is certainly no possibility to
Imaging System (TIS) compare protein anatomical location between chosen
cells, or to directly link molecular phenotype with
Michael Khan1 and Christine Waddington2 visible biology.
1
Department of Biological Science, Biomedical The current goal now is to microscopically examine
Research Institute, University of Warwick, protein expression in tissue sections or cell cultures.
Coventry, UK TIS is a new technique that combines traditional fluo-
2
MOAC/University of Warwick, Coventry, UK rescence microscopy with the ability to visualize
expression of several hundred proteins simultaneously
in the same tissue section or cell culture preparation. In
Definition disease diagnosis and subclassification, it is often crit-
ical to be able to directly relate molecular phenotype to
The Toponome Imaging System (TIS) (▶ TIS Robot) anatomy. Importantly, to define a particular cell line-
has several key clinical applications including disease age or stem cell identity, observation of several pro-
diagnosis and drug design. TIS can visualize the co- teins and other molecules may be needed and with TIS
location of many proteins present on the surface and this can be done in situ, allowing study of such cells in
inside cells. Location patterns of the resulting their relevant “niche.”
protein clusters can be recognized that are disease or Finally, the ability to visualize protein co-
drug reaction specific. Additionally, by using known localization on a large scale is potentially a very pow-
protein cluster patterns linked to a cell’s phenotype erful surrogate for protein interaction. TIS looks
(combinatorial molecular phenotype, CMP), and in directly at protein co-locations on the cell surface and
combination with other systems biology tools, inside the cell, and so can infer possible proteins work-
conclusions can be made about cell protein networks ing together. Although one cannot be certain that pro-
as a cell enters the diseased state. Drug therapies teins co-localizing are physically interacting,
can be targeted so that they are disease specific, and proximity can be employed to greatly narrow down
the opportunities for early disease diagnosis are options defining functional protein networks.
enhanced. To discover where proteins are co-locating in clus-
ters, comparison is made between the individual pro-
tein fluorescent images. Patterns of clustering between
Characteristics specimens of similar types can then be determined, and
visualized, as a mosaic of protein clusters directly
The Toponome Imaging System (TIS) (▶ TIS Robot) linking to the cell’s phenotype, these analyzed results
is an automated fluorescence technique with the ability being known as ▶ combinatorial molecular pheno-
to co-map at least 100 different proteins, or other TAG- types (CMPs). Combining this with knowledge of
recognizable biomolecules, on a single tissue section. associated protein networks, active proteins can be
To enable co-mapping, the system will generate identified, and predictions made of the functional
a location image for every protein, or biomolecule, importance of these proteins for the different cell phe-
being studied. notypes. Samples from different subjects will have
Proteins are known to interact with each other in a natural CMP variation; however, comparable fea-
a cell, and the challenge of understanding how tures between similar cell types are still observed due
this cellular interaction occurs is a major stated to the similarity of the underlying spatial arrangement
goal in both medical research and systems biology. of the protein networks.
Clinical Aspects of the Toponome Imaging System (TIS) 413 C

Clinical Aspects of the Toponome Imaging System (TIS), Fig. 1 A TIS comparison of CMPs between cancerous colon cells
(on the left) and normal colon cells in the same patient (on the right) (Bhattacharya et al. 2010)

Diseased and nondiseased tissues will show unique dermatitis exhibit distinct protein clusters: Some of
and different CMP patterns, and thus the patterns can these clusters are specific to either psoriasis or atopic
act like “fingerprints” in further comparative analysis dermatitis, whereas others are common to both condi-
of other tissue. An example of these different CMP tions (Schubert et al. 2006). If the clinician knows the
patterns can be seen in Fig. 1. The pictures show the unique “fingerprint” of each disease, only a small skin
CMP clusters displayed in color over a visual image sample is needed to give a clear diagnosis. Disease
(Phase) and over a fluorescent image (DAPI). The left- fingerprinting will also aid the clinician in situations
hand side shows colon cancer tissue, and the right- of uncertain or limited clinical symptoms, helping to
hand side normal colon tissue (both tissues are from prevent situations of misdiagnosis.
the same patient). In this example, 21 proteins were Another example of the clinical possibilities for
used (given acronyms such as PCNA, Bax, etc.) and it using TIS is in cancer studies. In cancer, the ability to
can be seen for each color of protein cluster which look at the tissue for the visible hallmarks of malig-
proteins contribute (shown as a 1 in the table). Lead nancy alongside molecular phenotype is critical, in
proteins (L) are present in all the main CMPs, while particular, as it enables the examination of single
other proteins may be absent in the clusters (A). Wild cells, which may be cancer cells that are invading,
card proteins (W) are present in some but not all dividing, or could be immune or stromal cells. Because
protein clusters and may have key significance in the TIS can detect individual cell protein patterns, a very
cell networks. small number of affected cells can give a positive
Using TIS analysis is a clinically powerful method cancer result; this enhances the chances of early detec-
because, as well as revealing which proteins are pre- tion, especially because cancer cells show very distinc-
sent and where, it indicates how they are co-located, tive differences in their topology in comparison to
giving further information about particular cells and normal, noncancerous cells. While much is known
subcellular regions. Importantly, being able to use about the biochemical pathways in cancers, an under-
a large number of biomarkers allows subtle differences standing is needed in cell organization of functions,
between similar samples to be determined. For exam- and this can be achieved using ▶ toponome analysis to
ple, compared to normal skin, psoriasis and atopic create insight into the key protein networks driving the
C 414 Clinical Aspects of the Toponome Imaging System (TIS)

disease. Schubert et al. (2009) analyzed prostate cancer can be found, for example, by also using mass spec-
tissue, determining a mosaic of distinct protein clusters trometry or gene analysis. Despite having pre-decided
that were specific to the observed cancerous cells. on the proteins to investigate, TIS remains a tool for
Using disease-specific CMP knowledge, two routes discovery science because previously unsuspected pro-
can now be taken: The first to target lead proteins in tein interactions may still be identified. Combining TIS
the development of the cancer with specifically with new knowledge from genomics, transcriptomics,
designed drug therapies, the second to aid the early and proteomics allows various differentially expressed
detection of cancer development. In instances where proteins to be studied in the context of intact anatomy.
two diseases can have shared but distinct disease path- The promise that TIS can detect clear differences
ways, the subtle differences in lead proteins can aid between normal and unhealthy tissue, even if there
understanding of the triggers for pathogenic events that are few cells and the differences are subtle, opens up
result in cancer. In addition, with the CMP clusters not the prospect of easy, early-stage cancer detection using
likely to occur by chance, a correct diagnosis can still blood or, for colon cancer, stool samples. There is
be made even if the cells are at the early stages of further potential that cancer stem cells will become
cancer formation. As technological advances are detectable, and drug therapies will be developed to
made, these protein clusters can be looked for in specifically target stem cells restricting cancer growth.
other body samples, such as from the blood, enabling With greater understanding of the cell’s network path-
a less invasive method of early cancer detection. ways, drug design can be made more disease specific
Sometimes the knowledge of genomic loci involved and enable more precise targeting, whether this is in
in the pathogenesis of a disease is limited. Because the the form of an analgesic or perhaps a unique drug for
TIS technique can furnish a toponomic picture of what a rare disease.
is occurring in the cell, proteins can be detected assem-
bling on the cell surface. This surface assembly is part
of a complex communication network involved with Cross-References
many different cell functionalities and, by monitoring
these protein networks, key proteins can be determined ▶ TIS Robot
and disease-altered network dynamics detected. This is ▶ Toponome Analysis
valuable in situations where diseases are rare, or there
is limited knowledge of cell protein pathways in
a disease. Given sufficient typical protein clusters, by
References
using comparisons with protein clusters from known
networks, it is possible to predict disease behavior and Bhattacharya S, Mathew G, Ruban E, Epstein D, Krusche A,
effective therapies. Hillert R, Schubert W, Khan M (2010) Toponome Imaging
Using a systems biology approach, by combining System: In situ protein network mapping in normal and
cancerous colon from the same patient reveals more than
the TIS results with proteome analysis and
five-thousand cancer specific protein clusters and their sub-
transcriptome analysis, an interpretation of multiple cellular annotation by using a three symbol code. J Proteome
levels of gene expression can take place leading to Res 9:6112–6125
further analysis, for example, comparing the effect of Linke B, Pierre S, Coste O, Angioni C, Becker W, Maier T,
Steinhilber D, Wittpoth C, Geisslinger G, Scholich K (2009)
drug interactions on different patients. Drugs are gen- Toponomics analysis of drug-induced changes in
erally designed to influence protein interactions and so arachidonic acid-dependent signalling pathways during
the effects of the drug can be easily detected using TIS: spinal nociceptive processing. J Proteome Res 8:4851–4859
Changes in the protein networks of synapses due to the Schubert W, Bonnekoh B, Pommer A, Philipsen L,
B€ockelmann R, Malykh Y, Gollnick H, Friedenberger M,
analgesic drug dipyrone have already been observed
Bode M, Dress A (2006) Analyzing proteome topology and
(Linke et al. 2009). function by automated multidimensional fluorescence
While TIS is a powerful and widely applicable microscopy. Nat Biotechnol 24:1270–1278
stand-alone tool, it also works well in complement Schubert W, Gieseler A, Krusche A, Hillert R (2009) Toponome
mapping in prostate cancer: Detection of 2000 cell surface
with other tools as part of a systems biology pipeline.
protein clusters in a single tissue section and cell type specific
The experimenter specifies which proteins or other annotation by using a three symbol code. J Proteome Res
molecules to study using TIS, and new target proteins 8:2696–2707
Closure, Causal 415 C
Closed World Assumption, Table 1 Sample instances about
Clinical Decision Support Systems alumni and their degrees obtained
Alumnus Degree obtained
▶ Biomedical Decision Support Systems Delani PhD in Molecular Biology
Anna PhD in Ecology
Peter MSc in Informatics
Dalila PhD in Genetics C
Clinical Systems Pathology

▶ Systems Pathology Example. Take the sample data in Table 1 and


a query “Which alumni do not have a PhD?”. Then
under the CWA, it answers with “Peter” because under
the CWA the system assumes this information is all
Cliquishness there is in the world.

▶ Clustering Coefficient

Closure, Causal

Clonal Division Matteo Mossio


IAS-Research Philosophy of Biology Group,
▶ Modeling, Cell Division and Proliferation Department of Logic and Philosophy of Science,
University of the Basque Country (UPV/EHU),
Donostia – San Sebastian, Spain

Closed Chromatin
Definition
▶ Heterochromatin
In biological systems, closure refers to a holistic fea-
ture such that their constitutive processes, operations,
and transformations (1) depend on each other for their
Closed World Assumption production and maintenance and (2) collectively con-
tribute to determine the conditions at which the whole
C. Maria Keet ▶ organization can exist.
KRDB Research Centre, Free University of According to several theoretical biologists, the con-
Bozen-Bolzano, Bolzano, Italy cept of closure captures one of the central features of
biological organization since it constitutes, as well as
evolution by natural selection, an emergent and dis-
Definition tinctively biological causal regime. In spite of an
increasing agreement on its relevance to understand
The Closed World Assumption (CWA) is the assump- biological systems, no agreement on a unique defini-
tion that that what is not known to be true, is false, tion has been reached so far.
so that absence of information is interpreted as
negative information. It assumes that complete infor-
mation about a given state of affairs is provided, Characteristics
which is useful for constraining information and
validating data in an application such as a relational The concept of closure plays a relevant role in biolog-
database. This is contrasted with the Open World ical explanation since it is taken as a naturalized
Assumption. grounding for many distinctive biological dimensions,
C 416 Closure, Causal

as purposefulness, normativity, and functionality taken as a necessary but not sufficient condition to
(Chandler and Van De Vijver 2000). define biological organization. Biological systems, in
The contemporary application of closure to the bio- fact, constitute a subclass of autonomous systems,
logical domain comes from a philosophical and theoret- which realize a specific form of operational closure,
ical tradition tracing back at least to Kant who claimed, which Varela labels, with Humberto Maturana,
in the Critique of Judgment, that biological autopoiesis (▶ Systems, Autopoietic) (Varela 1979).
systems should be understood as natural purposes The specificity of operational closure as autopoiesis is
(Naturzwecke), i.e., systems in which the parts are recip- that, unlike other possible forms, it describes the sys-
rocally causes and effects of the others, such that the tem at the chemical and molecular level, and supposes
whole can be conceived as organized by itself, self- relations of material production among its constituents.
organized. The essence of living system is a form of A crucial distinction is usually made between orga-
internal and circular causality between the whole and the nizational/operational and material closure, where the
parts, distinct from both efficient causality of the phys- latter indicates the absence or incapacity to interact.
ical world and the final causality of artifacts (Kant 1985). While being organizationally closed, biological sys-
One of the most influential contemporary charac- tems are structurally coupled with the environment,
terizations of closure in the biological domain has been with which they exchange matter, energy, and infor-
provided by Francisco Varela (1979). In his account, mation. The concept of biological closure implies then
he builds on an algebraic notion, according to which “a a distinction between two causal levels, an open and
domain K has closure if all operations defined in it a closed one – an issue which have been more explic-
remain within the same domain. The operation of itly addressed by the account proposed by Robert
a system has therefore closure, if the results of its Rosen (Rosen 1991).
action remain within the system (Bourgine and Varela Rosen’s account is based on a rehabilitation and
1992, p. xii).” reinterpretation of the Aristotelian categories of cau-
Applied to biological systems, closure is realized as sality and, in particular, on the distinction between
what Varela labels operational (or organizational) efficient and material cause. Let us consider an
closure, which designates an organization of processes abstract mapping f between the sets A and B, such
such that “(1) the processes are related as a network, so that f: A¼> > B. Represented in a relational diagram,
that they recursively depend on each other in the gen- we have (Fig. 1):
eration and realization of the processes themselves, When applied to model natural systems, Rosen
and (2) they constitute the system as a unity recogniz- claims that the hollow-headed arrow represents mate-
able in the space (domain) in which the processes rial causation, a flow from A to B, whereas the solid-
exist” (Varela 1979, p. 55). headed arrow represents efficient causation,
It should be noted that Varela himself has proposed, a ▶ constraint exerted by f on this flow.
over time, slightly different definitions of operational Rosen’s central thesis is that “a material system is
closure. In addition, more recent contributions have an organism [a living system] if, and only if, it is closed
introduced a theoretical distinction between organiza- to efficient causation” (Rosen 1991, p. 244), whereas
tional and operational closure. Whereas “organiza- a natural system is closed to efficient causation if and
tional” closure indicates the abstract network of only if its relational diagram has a closed path that
relations that define the system as a unity, “operational” contains all the solid-headed arrows. It is worth noting
closure refers to the recurrent dynamics and processes that, unlike the varelian tradition, Rosen takes closure
of such a system (Thompson 2007). as the definition of biological organization.
In Varela’s view, operational closure is closely According to Rosen, the central feature of
related to ▶ autonomy, the central feature of living a biological system consists in the fact that all compo-
organization. More precisely, he enunciates the “Clo- nents having the status of efficient causes are materially
sure Thesis,” according to which “every autonomous produced by and within the system itself. At the most
system is operationally closed” (Varela 1979, p. 58). In general level, closure is realized in biological systems
principle, the class of autonomous systems realizing among three classes of efficient causes corresponding
operational closure is larger than the class of biological to three broad classes of biological functions (▶ Func-
systems. As a consequence, operational closure is tion, Distributed) that Rosen denotes as metabolism
Closure, Causal 417 C
f at least some of the constraints that make work possible.
The cycle is a thermodynamic irreversible process,
which dissipates energy and requires a coupling
between exergonic (spontaneous, which release energy)
and endergonic (non spontaneous, which require
energy) reactions, such that exergonic processes are
constrained in a specific way to produce a work, C
A B which can be used to generate endergonic processes,
which in turn generate those constraints canalizing
Closure, Causal, Fig. 1 Relational diagram, descriprion see exergonic processes. In Kauffman’s terms: “Work
text
begets constraints beget work” (Kauffman 2000).
A complementary account of closure has been pro-
f posed by Howard Pattee, who focused on its informa-
tional dimension (Pattee 1982). In his view, biological
organization consists of the integration of two
intertwined dimensions, which cannot be understood
separately. On the one side, the organization realizes
a dynamic and autopoietic network of mechanisms and
A B Φ
processes, which defines itself as a topological unit,
Closure, Causal, Fig. 2 Relational diagram, descriprion see structurally coupled with the environment. On the
text other side, it is shaped by the material unfolding of
a set of symbolic instructions, stored and transmitted as
genetic ▶ information.
(f: A¼>> B), repair (F: B¼>> f), and replication (B: According to Pattee, the dynamic/mechanistic and
f¼>>F) (Fig. 2). informational dimensions realize a distinct form of
By providing a clear-cut theoretical and formal closure between them, which he labels semantic clo-
distinction between material and efficient causation, sure. By this notion, he refers to the fact that while
Rosen’s characterization explicitly spells out that bio- symbolic information, to be such, must be interpreted
logical organization consists of two coexisting causal by the dynamics and mechanisms that it constrains, the
regimes: closure to efficient causation, which grounds mechanisms in charge of the interpretation and the
its unity and distinctiveness, and openness to material “material translation” require that very information
causation, which allows material, energetic, and infor- for their own production. Semantic closure, as an
mational interactions with the environment. interweaving between dynamics and information, con-
More recently, the scientific work on biological stitutes then an additional dimension of organizational
closure has been developed in various directions closure of biological systems, complementary to the
(Chandler and Van De Vijver 2000). In particular, operational/efficient one.
a thriving research line has specifically focused on
the critical nature of systems realizing closure, which
must maintain a continuous flow of energy and matter Cross-References
with the environment in conditions far from thermo-
dynamic equilibrium. To capture this dimension of ▶ Autonomy
closure, Stuart Kauffman has proposed the notion of ▶ Constraint
Work-Constraint cycle (Kauffman 2000). ▶ Emergence
The Work-Constraint cycle represents an interpreta- ▶ Explanation in Biology
tion of organizational closure that links the idea of ▶ Explanation, Functional
“work” to that of “▶ constraint,” the former being ▶ Function, Distributed
defined, as “constrained release of energy into relatively ▶ Holism
few degrees of freedom.” A system realizes a Work- ▶ Information, Biological
Constraint cycle if it is able to use its work to regenerate ▶ Organization
C 418 Cluster Analysis

References Characteristics

Bourgine P, Varela FJ (1992) Toward a practice of autonomous The main idea behind TIS (TIS robot) (▶ Clinical
systems. MIT Press/Bradford Books, Cambridge, MA
Aspects of the Toponome Imaging System (TIS);
Chandler JLR, Van de Vijver G (eds) (2000) Closure: emergent
organizations and their dynamics. Ann NY Acad Sci 901 ▶ TIS Robot) data clustering is to find a way to interpret
Kant I (1987/1790) Critique of judgment. Hackett, Indianapolis ▶ toponome images without applying thresholds. Thus,
Kauffman S (2000) Investigations. Oxford University Press, a toponome image recorded with an antibody library of
Oxford
N tags is considered a N-dimensional data of fluores-
Pattee HH (1982) Cell psychology: an evolutionary approach
to the symbol–matter problem. Cogn Brain Theory cence grey value patterns g(x,y) ¼ (g1, . . ., gN)(x,y) with
5(4):325–341 (x, y) as pixel coordinates and gj(x,y) denoting the grey
Rosen R (1991) Life itself: a comprehensive inquiry into the value for the j-th tag at pixel (x, y). ▶ Clustering pro-
nature, origin, and fabrication of life. Columbia University
vides a valuable method for partitioning this data into
Press, New York
Thompson E (2007) Mind in life: biology, phenomenology, groups of similar grey value patterns. This partitioning
and the sciences of mind. Harvard University Press, has two aims: First, clustering achieves a data reduc-
Cambridge, MA tion, so that data sets can be analyzed and compared on
Varela FJ (1979) Principles of biological autonomy. North
a meta-level. Second, one wants to address the fact that
Holland, New York
similar co-location patterns may also belong to the
same functional group or to the same hierarchically
organized network. If cluster analysis is to be applied
Cluster Analysis to TIS data, the following aspects have to be
considered.
Tim W. Nattkemper Like in any other clustering application, it is not
Biodata Mining Group, Faculty of Technology, possible to define a criterion for an optimum cluster
Bielefeld University, Bielefeld, Germany result, since clustering tries to achieve two opposing
goals: connectedness and compactness (Handl et al.
2005). Transferred to the context of toponomics data
Synonyms ▶ clustering this can be seen as the conflict between
the two ideas:(1) to generate large clusters that repre-
Vector quantization sent one frequent co-fluorescence pattern with some
variation, and (2) to form small compact clusters that
contain only those proteins which are utmost similar.
Definition Cluster analysis algorithms can be divided into the
following two groups.
A large and high-dimensional TIS (TIS robot) (▶ Clin- Hierarchical cluster analysis methods organize the
ical Aspects of the Toponome Imaging System (TIS); input data (i.e., the fluorescence grey value patterns of
▶ TIS Robot) data set is reduced to a small set of protein one or two TIS images) into a tree structure exposing
co-location prototypes without the application of thresh- the relationships from the most similar to the most
olds. Different indices are proposed to evaluate the different fluorescence grey value patterns. In contrast,
▶ clustering result or to estimate an appropriate number partitioning or vector quantization cluster algorithms
of clusters in a data set. To allow visual diagnostics and successively assign each pattern g(x,y) to one distinct
to apply biological expert knowledge, a special visual- group, i.e., cluster. In the clustering result, each of the
ization icon is proposed. This way, important molecular clusters is characterized by a typical representative or
co-location patterns can be identified and localized in the cluster center, which is usually referred to as the
the image, which helps in different fields of systems prototype or the codebook vector. The most popular
biology (▶ Systems Biology Pathway Exchange partitioning cluster method is the ▶ K-means
(SBPAX)) like ▶ pathway analysis or protein network approach (Lloyd 1982). The objective is to find a set
analysis or other ▶ toponomics applications. of K prototypes so that the distances between each
Cluster Analysis 419 C
data item and its closest prototype are minimized. a certain range (e.g., [0; 1]) using a linear scaling or
Although the obtained clustering result may represent some nonlinear scaling function such as tan h (gj(x,y)),
only a locally optimal solution rather than the global so for all prototype components it is cj(k) 2 [0; 1]. An
optimum, the strategy is often used due to its algorith- essential first step in the evaluation is the visual inspec-
mic simplicity and efficiency, and has been recently tion of the prototypes, so a graphical display is needed
voted in the list of top ten algorithms in data mining to support visual ▶ data mining. A visual inspection of
(Wu et al. 2008). For large data sets such as toponome the prototypes allows the identification of interesting C
data, the consideration of each data item for prototype protein co-location patterns in the data and
adaptation can be computationally expensive, which a comparison with ▶ combinatorial molecular pheno-
motivates an online K-means version applying the types (CMPs), without the need of analyzing single
winner-takes-all (WTA) rule. It is common to repeat- images which eases the knowledge discovery process.
edly run the algorithm with different initializations If interesting prototypes are found, the associated data
(Hastie et al. 2001) and to take that solution as items can be analyzed in a subsequent step following
the clustering result, which shows the smallest the Shneiderman mantra of “Overview first, zoom in,
intra-class variance. Neural Gas clustering claims to and filter, details on demand.” However, suitable pro-
be an enhancement as it takes into account totype visualization is not as straightforward as it
a “neighborhood ranking” of all proteins that are seems. One way to display multivariate data are
assigned to a cluster – an advantage bought by an glyph or icon displays. According to Colin Ware “A
increase in computational running time. In TIS cluster glyph is a graphical object designed to convey multiple
analysis, one should usually favor partitioning cluster data values” (Ware 2004, p.145). Each data feature is
algorithms, since the tree plot for an entire image mapped to a different graphical attribute of the glyph
should be too large for a visual analysis. Thus, the such as size, shape, or color. We apply a glyph-based
following part of this entry focuses on partitioning visualization approach, which has been developed to
cluster algorithms. suit the needs of non-binarized signal co-location anal-
A number of quality measures have been proposed ysis (Loyek et al. 2011a, b; K€olling et al. 2012). This
to evaluate the outcomes of cluster algorithms. glyph approach combines visualization aspects known
Because of its opposing goals, no definite criterion from bar charts and star glyphs and is to some extent
can be formulated that describes an optimal clustering inspired by the sequence logo display, which represents
of a dataset. This pertains not only to the applied patterns in nucleotide or amino acid sequences. In
cluster algorithm but also to the “true” number of a sequence logo, for each position of a set of aligned
clusters of a dataset. Proposed measures that base sequences, e.g., nucleotide sequences, the four nucleo-
solely on the clustering itself and the underlying tides are arranged on top of each other sorted according
dataset range from early approaches (Calinski and to their frequency at that position. The character height
Harabasz 1974; Davies and Bouldin 1979) up to represents the frequency of the according nucleotide.
novel instruments (Yeung et al. 2001; Maulik and Through this visualization, a rapid identification of
Bandyopadhyay 2002). Another kind of assistance in prominent sequence patterns can be achieved as high
choosing a cluster algorithm was recently proposed in frequent nucleotides can directly be “read” from the
Yeung et al. (2001). They delineated an instrument logo. To construct a glyph for one signal co-location
called Figure of Merit (FOM) to evaluate cluster solu- ck (k ¼ 1,. . ., K), a horizontal box is drawn for each data
tions. The idea of their method is to integrate a kind of feature (see Fig. 1a). The height, as well as the length, of
bootstrapping approach and thereby estimate the pre- each box is scaled according to the feature’s value. To
dictive power of a cluster algorithm. increase differentiation between neighboring boxes,
One important aspect in TIS cluster analysis is the they are alternatingly colored in black and light shaded
evaluation of the outcome, i.e., the cluster result, which grey. This follows Ware’s suggestion for star glyphs or
is, in case of partitioning methods, represented by whisker plots to increase the number of dimensions by
the set of K prototypes {c(k)}k ¼ 1,. . ., K. Usually, changing length and width of the bars as well as using
the grey values of the images are transformed to different luminance levels. Furthermore, by employing
C 420 Cluster Analysis

Cluster Analysis, Fig. 1 (a) Generation of a glyph for one the assigned prototype color, are given. For comparison, two
cluster prototype c(k): Each box is scaled in height and width classic bar plot displays for two 15-dimensional cluster proto-
according to the features’ values. The prototype names (P1, . . ., types are displayed. Since this glyph display is much more
PN) are written in the boxes for fast association of the proteins to compact compared to a standard box plot (b), more prototypes
the boxes. The relative and absolute abundance of the prototype, can be displayed on the same screen space and a comparative
i.e., how many pixel are associated to that prototype, as well as analysis is easier

length as an attribute for data representation, a well- course). The grey box with the overlaid number shows
suited graphical parameter to encode quantitative data the relative abundance, below the absolute abundance
has been chosen, as has already been discussed for the of pixels is displayed, which are assigned to this proto-
bar plot. To allow for a fast identification of prominent type according to the best match criterion. This meta-
proteins, the protein names are directly incorporated information is very valuable for prototype analysis.
into the visualization. To this end, the associated protein If this glyph visualization is compared to a classic bar
name is written in each bar and scaled in height and graph (see Fig. 1b), it is evident that in the glyph display
length analogue to the bar itself. With this strategy, the association of proteins to individual graphical attri-
prominent protein co-localization can easily be identi- butes is much easier. Furthermore, besides being able to
fied by “reading” the glyph analogous to the reading of rapidly identify the dominant proteins, an advantage
a sequence logo. Figure 1a displays the construction of of the glyph display is that only features with high
the glyph display and shows two glyph examples on the values allocate space, whereas low value features are
right. In addition to the special box plot display of the squeezed in contrast to Fig. 1a). Thereby, space is only
feature vector components, the top bar of the glyph allocated proportional to the importance of the protein
shows the color which is assigned to the glyph, and and the total size of the glyph reflects the amount of
meta-information as the associated data set and the information provided by the prototype. In some appli-
prototype number can be displayed here as well. How cations, this might not be a desirable feature so that bar
color will be assigned to each prototype will be graphs, or glyphs with constant bar width would better
explained below. On the right, the glyph displays infor- be suited.
mation about the abundance of the feature combination Different strategies to assign color to the prototypes
in one selected TIS image (or in a set of TIS images of and the corresponding glyphs can be proposed. If one
Cluster of Differentiation 421 C
wants to achieve preservation of topologies between References
the cluster positions in the N- dimensional co-location
space and the color space, i.e., similar clusters get Calinski T, Harabasz J (1974) A dendrite method for cluster
analysis. Commun Stat Theory Methods 3:1–27.
similar colors, dimensional reduction can be applied.
doi:10.1080/03610927408827101. http://dx.doi.org/10.1080/
Using for instance Sammon mapping or ▶ principal 03610927408827101
component analysis (PCA), the N-dimensional proto- Davies DL, Bouldin DW (1979) A cluster separation measure.
types are mapped into a 3D color space where each IEEE Trans Pattern Anal Mach Intell PAMI-1:224–227 C
Handl J, Knowles J, Kell D (2005) Computational cluster
prototype is associated with a three-dimensional color
validation in post-genomic data analysis. Bioinformatics
vector h ¼ (h1, h2, h3).. A whole pseudo-color image 21:3201–3212
can be rendered by finding for each feature vector g(x,y) Hastie T, Tibshirani R, Friedman JH (2001) The elements of
its best matching prototype Nearest Neighbor statistical learning. Springer, New York
K€
olling J, Langenk€amper D, Abouna S, Khan M, Nattkemper
Method and its color. The pixel position x, y is plotted
TW (2012) WHIDE—a web tool for visual data mining
in the corresponding color. The most common color colocation patterns in multivariate bioimages. Bioinformat-
space is the RGB space, where color is produced by ics 28(8):1143–1150. doi:10.1093/bioinformatics/bts104.
additively mixing red, green, and blue primaries. Lloyd S (1982) Least squares quantization in PCM. IEEE Trans
Inf Theory 28:129–137
Another widely used space is the HSV space, where
Loyek C, Bungkowski A, Nattkemper TW (2011a) Web2.0
color is described by means of hue (H), saturation (S) paves new ways for collaborative and exploratory analysis
and value (V). of chemical compounds in spectrometry data. J Integr
Bioinform 8(2):158. doi:10.2390/biecoll-jib-2011-158
Loyek C, Langenk€amper D, K€ olling J, Niehaus K,
Nattkemper TW (2011b) A Web2.0 strategy for the collabo-
Clustering Algorithms and Tools rative analysis of complex bioimages. In: Advances in intel-
In general, there is a large variety of clustering algo- ligent data analysis X: 10th international symposium, IDA
rithms and tools; open source tools widely used in 2011, Porto, October 29–31, 2011, Proceedings. Lecture
notes in computer science (Applications, incl. Internet/
computational biology include Weka (Weka, Machine
Web, and HCI). Springer, Berlin/Heidelberg, pp 258–270
Learning Tool) and R (R, Data Analysis Tool) Maulik U, Bandyopadhyay S (2002) Performance evaluation of
▶ R, Programming Language. some clustering algorithms and validity indices. IEEE Trans
Pattern Anal Mach Intell 24:1650–1654
Ware C (2004) Information visualization: perception for design.
Morgan Kaufmann, San Francisco
Wu X, Kumar V, Ross GJ, Yang Q, Motoda H, Mclachlan G,
Cross-References Ng A, Liu B, Yu P, Zhou ZH, Steinbach M, Hand D,
Steinberg D (2008) Top 10 algorithms in data mining.
Knowl Inf Syst 14:1–37
▶ Clinical Aspects of the Toponome Imaging System
Yeung K, Haynor D, Ruzzo W (2001) Validating clustering for
(TIS) gene expression data. Bioinformatics 17:309–318
▶ Clustering
▶ Clustering, k-Means
▶ Data Mining
▶ Information Theory and Toponomics Cluster of Differentiation
▶ Knowledge Acquisition
▶ Learning, Supervised Rohan Dhiman
▶ Model Testing, Machine Learning Center for Pulmonary and Infectious Disease Control,
▶ Model Training, Machine Learning University of Texas Health Science Center at Tyler,
▶ Principal Component Analysis (PCA) Tyler, TX, USA
▶ Subset Surprisology and Toponomics
▶ TIS Robot
▶ Topology and Toponomics Definition
▶ Toponome Analysis
▶ Toponomics It stands for cluster of differentiation. It is
▶ Unsupervised Learning Methods a method conceptualized in 1st international
C 422 Clustering

workshop and conference on human leukocyte of the other clusters. From different clustering types,
differentiation antigens to provide nomenclature to there are three clustering patterns as below.
all the cell surface molecules present on white
blood cells. It includes various molecules some
which either act as ligands or receptors (Zola et al.
2005; Bernard and Boumsell 1984; Fiebig et al.
1984).

Cross-References

▶ Fluorescent Markers

References

Bernard A, Boumsell L (1984) Human leukocyte differentiation


antigens. Presse Méd 13(38):2311–2316
Fiebig H, Behn I, Gruhn R, Typlt H, Kupper H, Ambrosius H
(1984) Characterization of a series of monoclonal anti-
bodies against human T cells. Allerg Immunol (Leipz)
30(4):242–250
Zola H, Swart B, Nicholson I, Aasted B, Bensussan A, Boumsell
L, Buckley C, Clark G, Drbal K, Engel P, Hart D, Horejsı́ V, References
Isacke C, Macardle P, Malavasi F, Mason D, Olive D,
Saalmueller A, Schlossman SF, Schwartz-Albiez R, Cornish (2007) Cluster analysis. Mathematics learning support
Simmons P, Tedder TF, Uguccioni M, Warren H (2005) chapter 3.1.
CD molecules 2005: human cell differentiation molecules.
Blood 106(9):3123–3126

Clustering Coefficient

Clustering Guilhem Chalancon, Kai Kruse and M. Madan Babu


MRC Laboratory of Molecular Biology, University of
Kejia Xu Cambridge, Cambridge, UK
Institute of Systems Biology, Shanghai University,
Shanghai, China
Synonyms

Definition Cliquishness; Density of a subgraph; Transitivity;


Watts-Strogatz local clustering coefficient
A loose definition of clustering could be “the process
of organizing objects into groups whose members are
similar in some way.” Each cluster is then character- Definition
ized by the common attributes of the entities it con-
tains. Usually, the objects in a cluster are “more Many problems in network analysis converge to the
similar” to each other and “less similar” to the objects question of the cohesion of a graph, aiming to determine
Clustering Coefficient 423 C
the extent to which the nodes of a network are closely divided by the total number of connected triplets
connected with one another. Numerous strategies exist t3 found in this given set. A connected triplet
to measure the cohesion of a network and therefore corresponds to three nodes connected by at least
several metrics can be used (Kolaczyk 2009). In this two edges. Triangles are cliques of three nodes, in
respect, the clustering coefficient of a graph is widely which each node is directly connected to the two
used in network analysis. One can distinguish between others.
local measurements of the clustering of nodes in a graph Therefore, the intuition behind the global cluster- C
and global measurements of the clustering coefficient of ing coefficient is to measure the density of connected
an entire graph. triplets that form triangles in a graph, which indicates
how much edges are clustered together. This is also
referred to as the density of a subgraph (Kolaczyk
Characteristics 2009).
For a subset of nodes v 2 V in a graph G;
Local Clustering Coefficient
In a graph G composed of a set of vertices (nodes) V
tD ðvÞ
and a set of edges (links) E, the local clustering coef- 8v 2 V s:t: t3 ðvÞ > 0; cLðv 2 V Þ ¼
ficient, denoted Cv , measures the local density of edges t3 ðvÞ
in the neighborhood of a node v 2 V. Cv corresponds to
the proportion of links within the immediate neighbor- For the nodes v being part of connected triplets
hood of v, among all possible links connecting v and its (i.e., t3 ðvÞ > 0; ) Cv ¼ cLðvÞ. Therefore, the exten-
neighbors. sion of this definition to a whole graph G; which
Let ne ðvÞ denote the number of edges that connect defines the global clustering coefficient cLðGÞ;
the immediate neighbors of a node v. Let kv denote the is equal to the average of the local clustering coeffi-
node degree of v, that is, its number of immediate cient of nodes such that t3 ðvÞ > 0. Defining
neighbors. The maximal number of possible edges kV kas the size of such set of nodes, then cLðGÞ is
between the neighbors of v is therefore kv ðkv  1Þ=2 obtained by:
for an undirected graph. Therefore, in an undirected
P
graph with no self-loops, Cv is given as:
v2V tD ðvÞ 1 X
cLðGÞ ¼ ¼ Cv
t3 ðGÞ kV k v2V
2ne ðvÞ
Cv ¼
kv ðkv  1Þ
This definition is commonly used in network
With directed edges, the number of possible links in biology (Dong and Horvath 2007), for instance,
the neighborhood of v becomes kv ðkv  1Þ. to measure the modular organization of metabolic
Therefore: networks (▶ Metabolic Networks, Structure and
Dynamics).
ne ðvÞ cLT ðGÞ; also referred to as the transitivity of
Cv ¼
kv ðkv  1Þ a graph, is given by:
In both cases, a clustering coefficient of Cv ¼ 1 will
P
v2V t3 ðvÞcLðvÞ 3tD ðvÞ
indicate that v and its neighbors form a clique, that is,
a fully connected cluster of nodes (see Fig. 1). cLT ðGÞ ¼ ¼
t3 ðGÞ t3 ðGÞ
Inversely, a value of Cv close to zero indicates that v
is part of a loosely connected set of nodes.
As for the local definition of the clustering
Global Clustering Coefficient coefficient, a value of 1 indicates that the graph is
The global clustering coefficient cL corresponds to fully connected. Clustering coefficients are often
the number of triangles tD found in a set of nodes used in network biology to measure the cohesion
C 424 Clustering Coefficient

a
Local clustering coefficient
CL CL
IQU IQ
E UE
v

kv =2 kv =2 kv =3 kv =3 kv =4

ne(v )= 0 ne(v )= 1 ne(v )= 0 ne(v )= 1 ne(v )= 5

Cv =0 Cv =1 Cv =0 Cv =0.33 Cv =1

EDGES CONNECTED TO vi EDGES BETWEEN NEIGHBORS OF vi OTHER EDGES

b
Global clustering coefficient
Example
Triplet Triangle LOCAL DENSITY
(3) (Δ) τΔ(v )
cL(V ) =
τ3(v )

TRANSITIVITY
3τΔ(G )
cLT(G) =
τ3(G )

CLUSTERING COEFFICIENT

1
cL(G ) = Σ cL(v)
V⬘ v∈V⬘ G(V=7,E=15)

G(V=9,E= 36) cL(G) = 0.814

EDGES BEING PART OF A TRIPLET EDGES FORMING A TRIANGLE

Clustering Coefficient, Fig. 1 The density of connections in definition of the clustering coefficient relates to the proportion
a graph can be approached by local and global definitions of the of triplets that form triangles in a graph. The clustering coeffi-
clustering coefficient. (a) The local clustering coefficient repre- cient corresponds to the mean value of the local clustering
sents the density of connections among the neighbors of a node, coefficient (referred to as local density), while the transitivity
and ranges from 0 to 1. The higher the value, the more the node is of a graph gives the probability that the direct neighbors of
part of a densely connected cluster of nodes. A value of 1 a node are connected
indicates that the node is part of a clique. (b) The Global

into modules of protein–protein interactions or References


metabolic pathways.
Dong J, Horvath S (2007) Understanding network concepts in
modules. BMC Syst Biol 1:24
Cross-References Kolaczyk ED (2009) Statistical analysis of
network data. Number XII. Springer Series in Statistics,
▶ Metabolic Networks, Structure and Dynamics pp 94–97
Clustering Methods 425 C
Besides, there are also some other distances, such as
Clustering Methods Hamming distance, Mahalanobis distance, Chebyshev
distance, Minkowski distance, Euclidean distance,
Jiguang Wang Spearman distance, Jaccard coefficient, city block
Beijing Institute of Genomics, Chinese Academy of metric, and so on.
Sciences, Beijing, Beijing, China Based on similarity, there are many types of
clustering methods with different advantages and C
weaknesses.
Synonyms
K-Means Clustering
Automatic classification methods; Numerical taxon- Given n numeric points in d dimensional space and
omy methods; Typological analysis methods; fixed integer k, k-means clustering (▶ Clustering,
Unsupervised learning methods k-Means) attempts to partition these points into k
groups so as to minimize the sum of within-cluster
sum of deviation
Definition
X
k X 
Clustering methods are unsupervised data mining  x j  mi  ;
methods that assign objects into groups based on simi- i¼1 xj 2Gi

larity. Clustering methods such as k-means clustering,


hierarchical clustering, graph partitioning, spectral clus- where Gi is the ith group, and mi is the mean of points
tering, etc., have been widely used in marketing, library, in Gi
image processing, medicine, biology, and other fields. This problem has been proved to be ▶ NP-hard, and
it can be solved in time Oðndkþ1 log nÞ (Inaba et al.
1994), so it is usually solved with heuristic algorithm
Characteristics (▶ Heuristic optimization) as follows:
Step 1: Assign all points to a group by random;
▶ Clustering or ▶ clustering analysis is the process of Step 2: Repeat step 2.1 and step 2.2 until stable:
putting objects into groups whose members are simi- Step 2.1: Compute centroid of each group;
lar, so an important step in clustering methods is to Step 2.2: Reassign each point to its nearest centroid.
define similarity or distance. k-Means is simply and easy to understand, so it has
been widely used and works well in most applications,
Similarity Measure but the weaknesses are obvious too. The number of
Similarity or distance (dissimilarity) that determines groups k should be artificially chosen. k-Means tends
the probability of two objects to be in one group is to identify spherical clusters and is sensitive to outliers.
important in clustering. There are several frequently Centroid positions are easily to be distorted by outliers.
used methods to measure similarity or distance. In
Euclidean n-space, if x ¼ ðx1 ; x2 ; ; xn Þ; and Hierarchical Clustering
y ¼ ðy1 ; y2 ; ; yn Þ; then Instead of direct partition, hierarchical clustering aims
• The Euclidean distance is defined to build a dendrogram (▶ hierarchy) of groups.
sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
P n According to the strategy, hierarchical clustering can
as ðxi  yi Þ2 be divided into two types, “agglomerative” and “divi-
i¼1
P
n sive” (Fowlkes and Mallows 1983). “Agglomerative”
• The Manhattan distance is defined as jxi  yi j is a “bottom up” method following two steps:
i¼1
• The ▶ Pearson’s correlation coefficient is Step 1: Treat each object as a group.
Pn
Step 2: Merge the closest pair of groups until there is
ðxi 
xÞðyi 

defined as rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
i¼1
ffirffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi only one group left.
P n
2
P n
2 By contraries, “divisive” starts from one big group,
ðxi 
xÞ ðyi 

i¼1 i¼1
and follows a “top down” rule to divide groups.
C 426 Clustering Methods

In hierarchical clustering, it is necessary to measure the original n objects with k-dimension vectors.
the similarity or dissimilarity of two groups to decide Then use k-means algorithm to partition all these
which two groups should be merged or how a group vectors into k clusters. Similarly, there are also revised
should be divided. Several methods have been devel- algorithms like normalized spectral clustering
oped. Single linkage clustering defines distance according to Shi and Malik (2000) and normalized
between groups as the distance between the closest spectral clustering according to Ng et al. (2002).
pair of objects, one from each group, while complete Spectral clustering is successful mainly based on
linkage is opposite, defining that as the distance the fact that it does not make strong assumptions on the
between the most distant pair of objects. Average link- form of the clusters, but at the same time spectral
age clustering defines that as the average of distances clustering can be quite unstable under different choices
between all pairs of objects, where each pair is made up of the parameters or neighborhood graphs. It should be
of one object from each group. Besides, there are also used with care.
some other approaches like average group linkage, In addition to these classical clustering methods,
Ward’s hierarchical clustering method, and so on. there are also some other related methods including
Hierarchical clustering is also one of the most com- fuzzy c-means clustering (Bezdek and Ehrlich 1984),
monly used clustering methods. It has the advantage graph-theoretic methods (Sharan and Shamir 2000),
that any valid measure of distance can be used. Com- biclustering (Madeira and Oliveira 2004), probabilistic
paring with k-means, the observations themselves are clustering (Nikhil et al. 2005), Markov clustering
not necessary. Hierarchical clustering does not really (Dongen 2000), support vector clustering (Ben-Hur
produce groups, and the user should artificially decide et al. 2001), and so on. Clustering methods are useful
where to split the tree into groups. Besides, hierarchi- in exploring data, but most methods are still very ad
cal clustering is also sensitive to noise and outliers. hoc, depending on choosing proper similarity metric
and parameters.
Spectral Clustering
As described by Luxburg (2007), spectral clustering is
one of the most popular modern clustering algorithms. Cross-References
The basic idea of spectral clustering is to make use of
the spectrum of the graph Laplacian of data to perform ▶ Clustering
dimension reduction for clustering in low dimension. ▶ Clustering, k-Means
The first step in spectral clustering is to transform ▶ Heuristic Optimization
similarities or distances into a similarity graph to ▶ Heuristic Optimization
model the local neighborhood relationships between ▶ Hierarchy
objects. There are several methods to construct the ▶ NP-Hard
graph. e-neighborhood graph connects two objects ▶ Pearson Correlation Coefficient
when their distance is smaller than given e; k-nearest
neighbor connects an object with k nearest neighbors,
and kernel-based fully connected graph connects each
References
pair of objects with a weighted edge. Finally,
a similarity graph matrix W is obtained to represent Ben-Hur A, Horn D, Siegelmann HT, Vapnik V (2001) Support
local relationships between objects. The graph vector clustering. J Mach Learn Res 2:125–137
Laplacian is defined as Bezdek JC, Ehrlich R (1984) FCM: the fuzzy c-means clustering
algorithm. Comput Geosci 10(22):191–203
Dongen SV (2000) Graph clustering by flow simulation. Ph.D.
L ¼ D  W; thesis. University of Utrecht
Fowlkes EB, Mallows CL (1983) A method for comparing two
P
n hierarchical clusterings. J Am Stat Assoc 78(384):553–584
where D is a diagonal degree matrix with dii ¼ wij : Inaba M, Katoh N, Imai H (1994) Applications of weighted
j¼1 voronoi diagrams and randomization to variance-based
Based on the graph Laplacian, given number k-clustering. Proceedings of 10th ACM symposium on
k of clusters, unnormalized spectral clustering algo- computational geometry. 332–339. doi:10.1145/
rithms compute the first k eigenvectors to represent 177424.178042
Clustering, Hierarchical 427 C
Luxburg UV (2007) A tutorial on spectral clustering. Stat clustering is simple to apply and in comparison to other
Comput 17(4):395–416 clustering methods makes no parametric assumptions
Madeira SC, Oliveira AL (2004) Biclustering algorithms for
biological data analysis: a survey. IEEE Trans Comput Biol about the size, shape, or quantity of clusters (Jain and
Bioinform 1(1):24–45 Dubes 1988).
Ng A, Jordan M, Weiss Y (2002) On spectral clustering: analysis
and an algorithm. In: Dietterich T, Becker S, Ghahramani Dissimilarity Measure
Z (eds) Advances in neural information processing, systems
14. MIT, Cambridge, pp 849–856 Unlike k-means clustering (▶ Clustering, k-means), C
Pal NR, Pal K, Keller JM, Bezdek JC (2005) A possibilistic hierarchical clustering does not require one to specify
fuzzy c-means clustering algorithm. IEEE Trans Fuzzy Syst the number of clusters a priori. In contrast, it uses
13(4):517–530 a dissimilarity measure.
Sharan R, Shamir R (2000) CLICK: a clustering algorithm with
applications to gene expression analysis. In: Proceedings of The choice of the dissimilarity measure depends on
ISMB’00, pp 307–316 the data type (nominal, ordinal, ratio etc.). Common
Shi J, Malik J (2000) Normalized cuts and image segmentation. examples include Euclidean distance, Mahalanobis
IEEE Trans Pattern Anal Mach Intell 22(8):888–905 distance, Manhattan distance, and others.

Linkage Method
Recursive splitting or merging of the clusters can be
Clustering, Hierarchical represented by a binary tree known as dendrogram.
The tree provides a simple and convenient visualiza-
Larissa Stanberry tion of data hierarchy. Based on the dissimilarity
Bioinformatics and High-throughput Analysis measure, the data are grouped into a hierarchical tree
Laboratory, Seattle Children’s Research Institute, called dendrogram using a linkage method. The three
Seattle, WA, USA most popular linkage methods in hierarchical agglom-
erative clustering are
• Single linkage, or the nearest neighbor, method.
Definition Here, the dissimilarity between two groups is
defined as the minimum pairwise distance of points
Hierarchical clustering represents a collection of in the two groups.
methods which seeks to partition the data into groups • Complete linkage, or the furthest neighbor, where
by building a hierarchy of clusters. the dissimilarity between two clusters is taken as the
maximum distance between points in the clusters.
• Average linkage uses the average dissimilarity
Characteristics between the groups.
If the grouping structure is compact with
The goal of a clustering method is to partition the data well-separated clusters, all of the three clustering
into distinct groups where each group contains objects methods produce a similar result.
with similar characteristics. Hierarchical clustering Single linkage has certain advantages as compared
refers to the collection of methods which builds to other methods. In particular, it is known to correctly
a hierarchy of clusters. The hierarchical clustering identify the underlying structure of the data. This is not
methods are subdivided into two categories necessarily the case for the other linkage methods,
• Agglomerative, where each datum initially repre- unless points of high density are sufficiently separated.
sents a cluster and the two clusters are merged For example, when sets of original points become
together at each step of the hierarchy. Hence, the close or overlap, the complete, average, and K-means
hierarchy is built from the bottom up. clustering (▶ Clustering, k-means) yield several large
• Divisive, where initially all data represents a single clusters, giving an impression of distinct grouping in
cluster that is split as one moves down the the data regardless of the density. It has been shown
hierarchy. that the large clusters produced by these algorithms
Agglomerative clustering has been extensively depend on the range, but not on the true density of the
studied and applied in numerous studies. Hierarchical data set. Single linkage behaves differently, producing
C 428 Clustering, Hierarchical

Clustering, Hierarchical,
Fig. 1 The binary tree for 1.6
the simulated example 7 2
1.4
1
9 1.2
2
1.0

height
0 4
3
0.8
1 0.6
−1 5
10
10 3 4 7 9 6
8 0.4 1
6
−2 0.2 5 8

−2 −1 0 1 hclust (*, "single")

long-chained clusters that are difficult to interpret. The data set consisting of ten two-dimensional points
“chaining” effect indicates the lack of spatial separa- labeled from 1 to 10 (Fig. 1). Initially every point is
tion in case of touching or overlapping clusters. a cluster. The first step is to compute all pairwise
In addition, the single linkage is fully consistent for distances between the points. Here, we use the Euclid-
separating two disjoint, high-density clusters in one ean. The two closest points are 5 and 8 and will be
dimension, and is fractionally consistent for data in merged to form a new cluster. The next smallest dis-
higher dimensions. The fractional consistency means tance is between points 1 and 5, but 5 has already been
that if two disjoint population groups exist, there will merged with 8, thus 1 is added to a cluster {5, 8}. Next,
be two distinct single linkage clusters containing points 3 and 4 are merged together and so on. The
a positive fraction of the sample points from the process continues until all the nodes form a single
corresponding population groups. Hence, the single cluster. The result is the single linkage dendrogram
linkage is in a sense conservative, as it will not neces- shown in the left plot. The original points are marked
sarily detect all clusters, but will identify modal on the tree.
regions separated by a sufficiently deep valley The single linkage clusters can be obtained by cut-
(Hartigan 1975, 1977). ting the tree at a certain level. In the example, deleting
the largest edge will yield to two clusters a singleton
Cluster Identification {2} and a cluster containing all the remaining nine
The clusters are typically defined by setting an appro- points. Cutting, for example, at 0.9 would give three
priate inconsistency threshold, which is equivalent clusters {2}, {3,4,7,9}, and {10,6,1,5,8}.
to cutting the dendrogram at a certain level. For that To demonstrate the performance of the hierarchical
the length of each link in a cluster tree is compared clustering, we classify the famous iris data set that
with the length of neighboring links below it. If the gives the measurements (in cm) of the four variables,
length of the link does not differ significantly from the including sepal length and widths, and petal length and
length of the neighboring links, it means that objects, width. The data was collected on the three species of
joined at this level of the hierarchy, have similar Iris setosa, versicolor, and virginica and is freely avail-
characteristics. Large differences would indicate able as a part of R package.
some dissimilarity between the objects, and the link Figure 2 shows the dendrogram trees for the single,
is said to be inconsistent. Alternatively, one can con- complete, and average linkage. The three iris species
sider an adaptive rule that determines inconsistency are color coded to show their relative positions in the
threshold for each link. tree. The average and single-linkage trees look similar
misplacing only a few versicolor species into the
Example virginica branch. The complete linkage performs the
To gain a better understanding, consider the following worst as it identifies half of the virginica species to be
example of the single linkage algorithm for a small closer to the setosa and half to the versicolor. The
Clustering, Hierarchical 429 C
1.5
Height

1.0

0.5

0.0
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
v
v
g
v
g
g
g
v
v
v
v
v
v
v
g
g
g
v
v
v
v
v
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
v
v
v
v
v
v
v
v
v
g
v
g
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
Single Linkage C
6
Height

0
vv
vv
vv
vv
vv
vv
vv
vv
vv
vv
vv
vv
vv
vv
vv
vv
vv
gvv
vv
vv
vv
vv
vv
g v
gv
g v
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
gv
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
gs
ss
ss
ss
ss
ss
ss
ss
ss
ss
ss
ss
ss
ss
ss
ss
ss
ss
ss
ss
ss
ss
ss
ss
ss
s
Complete Linkage

4
3
Height

2
1
0
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
sv
s
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
g v
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
gv
g
gv
g
v
v
v
v
gvv
v
v
v
v
gv
g
g v
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
Average Linkage

Clustering, Hierarchical, Fig. 2 Binary trees for the iris data for the single-, complete and average linkage methods. The three iris
species were color-coded

misclassification error is about 9% for the average and ELKI package implements a single-linkage clustering
single linkage and is 16% for the complete linkage among others. Orange is a free data-mining software
method. that can be used for clustering. Hierarchical Clustering
Explorer is another useful tool for data visualization
Divisive Clustering and cluster analysis.
Divisive clustering was studied and used considerably
less extensively. The divisive method can be accom-
plished by recursively subdividing groups into two
clusters. The splitting continues until every cluster is Cross-References
a singleton. The divisive clustering does not necessar-
▶ Clustering, k-Means
ily leads to the convenient representation of the data as
a binary tree, although some methods were proposed to ▶ Clustering, Model-Based
▶ Model Testing, Machine Learning
address this issue.

Clustering Software and Tools


There are a number of existing software that offer References
various types of hierarchical clustering. The
R Project for Statistical Computing offers a number Hartigan JA (1977) Distribution problems in clustering. In:
of clustering options; for details see the CRAN task van Ryzin J (ed) Classification and clustering. Academic,
New York
view on Cluster Analysis and Mixture Models. The Hartigan JA (1975) Clustering algorithm. Wiley, New York
statistical package in Matlab (Mathworks) includes Jain AK, Dubes RC (1988) Algorithms for clustering data.
functions for agglomerative hierarchical clustering. Prentice-Hall, Englewood Cliffs
C 430 Clustering, k-Means

In K-means, the cluster labels are determined by an


Clustering, k-Means iterative algorithm that alternates between the two
steps:
Larissa Stanberry 1. Given the set of K means, assign each observation to
Bioinformatics and High-throughput Analysis the cluster with the closest mean.
Laboratory, Seattle Children’s Research Institute, 2. Update the K means by computing an average of the
Seattle, WA, USA observations within each cluster.
The algorithm converges when the cluster assign-
ments no longer change. The iteration steps reduce the
Synonyms value of the loss function thus assuring convergence.
To initialize, one can either randomly choose K points
k-means to represent cluster means or randomly assign K labels
to N observations.
Note that the cluster assignment may be
Definition suboptimal corresponding to the local rather than
global minimum of the loss function. To elevate this
K-means is an unsupervised clustering algorithm for problem, one can do multiple cluster runs with ran-
classifying data points into K distinct groups. dom initializations and choosing the final partition
with the smallest value of the loss function (Hastie
et al. 2009).

Characteristics Example
Here we demonstrate the performance of the K-means
K-means algorithm clustering algorithm on the famous iris data set
The K-means algorithm is based on the dissimilarity that gives the measurements (in cm) of the four vari-
measure given by squared Euclidean distance, i.e., for ables including sepal length and widths, and petal
p-dimensional data vectors x1 ¼ (x11 . . .,,x1p) and x2 ¼ length and width. The data was collected on the
(x21, . . ., x2p) three species of Iris setosa, Iris versicolor, and Iris
virginica. The data set is freely available as a part of
R package.
X
p
Figure 1 shows the results of clustering the
dðx1 ; x2 Þ ¼ ðxi1  x2i Þ2 data into four groups using K-means method with 25
i¼1
random initializations. Here, s, v, g stand for the
three different species setosa, versicolor, and
In K-means, the clusters are determined by mini- virginica. The cluster assignments are marked by the
mizing the loss function three different color. Clearly, K-means exactly iden-
tifies setosa species, but makes some misassignments
by classifying some versicolor with virginica and
X
K X vice versa. This is expected, since the setosa species
LðCÞ ¼ Nk dðxi ; xk Þ; are well separated, while the other two have some
k¼1 CðiÞ¼k overlap across different measurements. The
misclassification error for the K-means classification
is about 11%.
where k ¼ 1, . . ., K indexes clusters,
xk ¼ ð
x1k ; . . . ; xpk Þ is the mean vector of cluster k, Software
and Nk is the total number of points in cluster k. The The K-means clustering algorithm is available in
loss function L(C) is minimized when, within each a number of software packages both free and commer-
cluster, the average distance from any point to the cial including R, S-plus, SAS, Apache Mahout,
cluster mean is minimized. Matlab, and Mathematica.
Clustering, k-Means 431 C
2.0 2.5 3.0 3.5 4.0 0.5 1.0 1.5 2.0 2.5
v v v
v v vv v v vvvv vvvvv

7.5
vv v v
vv v v vvvvv v vv v v
g vvgvg
v ggv v v gg v v
v gg g vvv
g gggggv vvvvv
g
g
gg gv vv v
gg vvv

6.5
g
vvvgv v
vg
vg g
ggggg vv v v ggg
g vvvvvvv
g g
v
g v g vv
vgg vv gggvvvvvvvv v g vg v
v v vvv
Sepal.Length g
v gg ggv g
v
v ggg
g vvvg vv ggg
g
g vg
g v
v
vg g
v
ggvv v g gggg gv g g g vv
vggg
g g
gg
g s s s s ss ggg ggg vvv sss gggg
gg vv
v
gggg v g ss s s s ggggggg gggg

5.5
g sss sss s
s
sssss g ss s
sssssss
gg C
g ggg ssss
s
ss sss
sss ggg g gg g
v ss
ss
sss
ss s s
ssss
sss s g v ss
ssss
ss s g v
ss s s s sssss s
sss

4.5
s sss s sss ss

s s s
s s s ss sss
2.0 2.5 3.0 3.5 4.0

ss s
s s s vv ssss vv ss
ssss v v
s sssssssss v s ss
s
s
s v ss v
s s sssss s g vg v sss
ss
ss
gg vv sssss s g
g vvv
s ss s g vg vvvg v sss gg g
ggggv
v
vvvv ss s gg g v
sss s sssss g g g g
vvg vvvvgg
vvg
g v vv vv Sepal.Width sss
s ggggg vvvv
v s
ss ggg
gg
g
v vv v
vgv vvvv
vv
gg
g ggvgggvg
v
gg vv v sssss g ggggg gg vvvvv v
vv v
v v sss ggg v
v
v v
g ggggg
g vv
v v vv
g
v
g ggggggggg
vv
vgv
vv v v vv gg
g
gggg
v vv
ggg g vvvvv v
g gg
v gv v v g v gg gggg v gv vv g gvg vvvv v
g gg
s g g g vg
g s g g g gg v s g
g gg v
g g g

7
v v v vvvv
vv v v
vvv v v vvvv vv v
v vvv v v
v v v v vvv

6
vvvvvv vvvv v vv vv vvvv
vvv v v v vv v v v
vvvv
vvg
vv gvvg vvvv g
vvg
vg
gvg v g
vvv g
v g vgvg
v v
gggvvvv vv

5
g gg vg
vgg
vgg g g
ggggg
ggvg
v v
g gg g
v ggg g ggg g gggg gg vg g g g gggg
g g
ggggggg gg gg
g gggg
gggg g gg Petal.Length

4
ggg
g gg
gg g gg g g g
g
g g g

3
2
s ss s s ss
s s ss s
sssss
s sss
ssssssssss s sssss
ssssssss ss s sss s
ss
s
s sss s s s s s s s ss sss s

1
0.5 1.0 1.5 2.0 2.5

v vvvv vvvvv v v v vv vvv


v v
vv v
v v v
vvvvv
v vv
vvv v
vv v vvvvvv vvv
v
vvv vvvv vvv v v vv v vv vvv vvvvv
vvvvg v vvvv v
g
vvvvvvv v vv v vv g vgg gvvvvvvvvvv
v ggg g v g vgvvgg
g gg
g ggg g
vggg
v ggg
g gg
v v gvgg
g g gg
vgg
vgg g
gggggggvv v
g
gggg g
v
gggg g g ggggg
g ggg
gg gg ggg
gg gg
gg v
Petal.Width
ggg g g gg g gggg
g ggg
g
gggggggg g
g gg

s sss sss ss
sssss sssss s ss
sss s ss sss s s s ssssss
s ss ssssss s
s ssss ss
ss ssss
sss
ss s s ssssss
ss
s s
ss
s
sssss
s

4.5 5.5 6.5 7.5 1 2 3 4 5 6 7

Clustering, k-Means, Fig. 1 Color-coded K-means cluster assignments for the three different iris species setosa, s, versicolor, v,
and virginica, g

Conclusion which is not readily available in practice. Numerous


The K-means clustering is an efficient algorithm for variations have been proposed to improve the perfor-
data classification. The method is intuitively simple mance of the algorithm.
and is easy to implement. The K-means clustering,
however, has a number of assumptions and performs
poorly when these assumptions are violated. In partic-
References
ular, the method fails when clusters are nonspherical or
differ considerably in size. Furthermore, the algorithm Hastie T, Tibshirani R, Friedman J (2009) The elements of statis-
requires a prior knowledge on the number of clusters tical learning, 2nd edn. Springer, New York, pp 509–510
C 432 Clustering, Model-based

data is viewed as a sample from the mixture of normal


Clustering, Model-based distributions. Other examples include beta mixtures,
gamma mixtures, etc.. The K-means clustering algo-
Larissa Stanberry rithm is implicitly based on the Gaussian mixture
Bioinformatics and High-throughput Analysis model.
Laboratory, Seattle Children’s Research Institute,
Seattle, WA, USA Implementation
For estimation purposes, the data is specified as (yi, zi)
where zi ¼ (zi1, zi2, . . ., ziK) is the vector of binary
Synonyms variables such that zik ¼ 1, if yi is in cluster k, and zero
otherwise. The membership vector zi is considered
Mixture model as a sample from the multinomial distribution of
a single draw from K groups with probabilities
(p1,p2. . ., pk). By maximizing the likelihood function
Definition for the data (yi, zi), one estimates ^zik , which is
the conditional probability of observation yi being in
Model-based clustering is a classification technique cluster k. In hard clustering, point yi is ultimately
where the data is viewed as arising from the underlying assigned to a cluster with the largest value of ^zik .
mixture of probability distributions with each mixture In fuzzy clustering, the data point is not assigned
component representing a cluster. to any particular cluster, rather the results are returned
as a vector ð^zi1 ; ^zi2 ; . . . ; ^ziK Þ which gives the proba-
bilities of the ith point being in any one of the K
clusters.
Characteristics The model-based clustering can be solved
using the expectation maximization (EM) algorithm
Overview which iterates between computing the estimates
Model-based clustering treats data as arising ^zi given the current parameter estimates and estimat-
from the mixture of probability distributions. The ing the parameters by maximizing the likelihood
mixture components are assumed to correspond to given the current estimates ^zi (Fraley and Raftery
distinct clusters. The goal of the clustering approach 2002).
is to classify data into distinct groups by assigning
every data point to a mixture component. Estimating Number of Clusters
More specifically, if y ¼ (y1, y2,yn) is the observed In real data, the number of clusters is typically
data, the finite mixture model has a form unknown. One can estimate K by comparing models
with different number of mixture components. The
idea is to select a criterion to assess the model fit and
n X
Y K
choose the partition that corresponds to the model with
Lðy; Y; KÞ ¼ pk fk ðyi ; yk Þ
i¼1 k¼1
the highest value of the criterion. The most popular
criterion choices include Bayesian information crite-
rion (BIC), Akaike information criterion, and the min-
where K is the number of mixture components, pk is the imum description length.
probability that an observation belongs to the kth com-
ponent, fk ð ; yk Þ is the density of the kth component Software
with parameter yk, and Y ¼ ðy1; y2 ; :::; yK Þ are the An R package for normal mixture modeling MCLUST
parameters of the K components. Note that pk  0 and can be used for model-based clustering. The MCLUST
PK
i¼1 pk ¼ 1. package includes model-based clustering, the imple-
The component densities fk are chosen to best reflect mentation of the EM algorithm for four different
the data-generating mechanism. The most popular covariance mixture models, and the BIC approach
approach is the Gaussian mixture model, where the to estimating the model and the number of clusters.
Clustering, Model-based 433 C
2.0 2.5 3.0 3.5 4.0 0.5 1.0 1.5 2.0 2.5
g g g
ggg g g g gggg gggg
g

7.5
gg gg gg
gg g
g gggg gg g g
g v
vg vvggg vv g g
g vvg gvvgvgg vvvvvvgggggg v
vvvvv vg
g
g gg ggg

6.5
gvvg
g g g
g v gggg g g
vv g gg
v gg vg
vg vvvvgvgggggg
gg vv g g gg
vvv gg ggg
g
Sepal.Length g
vg
v gvvvvg v g vv gg
vvv vvvg gv g v vvg
vg
v g g
g
g
gg g v v v vgg v vvv g
v
gvv
vvvg
vvvvv s s s s ss v v
vvv g
vv vvvv sss vv vvv v gg
vv g g
vvv vvv g v

5.5
vvvv v ss ss s s
v v s ssssssssssss s
sssss
ssss
ssss v vv v
s
s s
ss
s
ss
sss
vvvv v
v v
C
v vvg sssssss g ss
sss s v g
sssss s s s ss v ss v
ss s s s sssss s
ss

4.5
s sss s s
sss ss

s s s
s s s s
ssss ss
2.0 2.5 3.0 3.5 4.0

s s
s sss s gg ssss gg sss gg
s ssss ss s
g s ssss g ss
s s g
s s ssss s v g gg ggg g sssssss
s vv gg g g s
ss
ss s v gg gg
s ss s s v vg g
vg ggvvgg gg
vgg Sepal.Width sss
ss vvvvvvvg
vvggggggg s
s s v
vvv g g g g
v g gg
sss s sssss v v v g
vggv g g sssssss gvgggg
ggggggg ss vvvv vvgvg
gggggg
vvvg vvvg
g vgvvvvv gg g
gg v vv
v vvvvv vv
vvvgvg ss
ss v
vvg
v vvg v vg gg g v vv ggg
v ggg
g g
g
v v v g
v v gg g
vv vvvg
ggg g
gv v vvvgv g v g v vvvvvvv vg vg g gggg
v
s v v g v s v vv vv g s vvvv vv v
vv v g
v
v v v
g gg gggg

7
gg g g
ggg g g gggg gg g
g ggg g g
g g g gg g ggg

6
ggggg gg gg gg gggg ggg g g gggg
gg ggg ggg g ggg g ggg g
v
gggg
gg vg vg g gv g
gvg g
vg
vg g v gg
vg g gg

5
v v g
vvvg v
g vvvv vvvvvvvvvvvvvv v g v
vv v vvvvvvvvv
v
vvvvvv
vvv
vgv
vvv
vvvvvv vvvvvvvv v vvvv
vv v

4
v vv v Petal.Length v vv
vv v vv v v
vv
v vv
v v v

3
2
s s s s ss sss ssss
s sss
sssssssssss s sssssssssssssssssss s s
sss
ss
s s
s sss s s s s s s ss
s
s

1
0.5 1.0 1.5 2.0 2.5

g gg gg g ggg g g gg gg
gg g gggggg gg
ggg g g g gg g g g ggggg gg
ggg ggg ggg g g gg g
g g
g g
g
g
g
g
g gg
g g ggggg gggg
gg gg
gg g
g vgggggg g
g gg g gggg
g gg
v g v ggg
g
vg g ggg
v v vg
v vvv vv v g
v vg
v v vgvvg
g g v
vvvvvvv v vvvvvgg
v v
v vvv g vvvv vvvv v v vvvvvvvvvvvvvv v vvvvvvvvvvvv
g
v v
vvvv vv v vvvvvvvvvvv v
Petal.Width
vv v vv v v vvvvvvv
ss sss sss s
ss s ssss ssss s
s s s s ssss
ssss ss
ss sss
sssss
sssss s s sssssssssss sss
sssssss s
ssss
ss
sss
ss
ssss
4.5 5.5 6.5 7.5 1 2 3 4 5 6 7

Clustering, Model-based, Fig. 1 Scatter plot of the three clusters identified in the iris data using model-based clustering. Clusters
are color-coded and the three species are labeled, i.e., setosa, s, versicolor, v, and virginica, g

The package is well documented and provides a good agreement between the true data and clustering
enhanced display and visualization tools. results. The misclassification rate is 3.3%.

Example
Figure 1 shows the classification of the iris data using Cross-References
the model-based clustering approach with
unconstrained covariance model as implemented in ▶ Bayesian Information Criterion (BIC)
MCLUST. The species are labeled with letters and ▶ Clustering
the three clusters are color-coded. Clearly, there is ▶ K-Means
C 434 CMP Motif

▶ Model Testing, Machine Learning The term dual is used in the formal sense of category
▶ Model Validation (▶ Mathematical Structure, Category) theory, namely,
that the defining operations are reversed.
References

Fraley C, Raftery A (2002) Model-based clustering, discriminant Characteristics


analysis and density estimation. J Am Stat Assoc 97:611–631

Coalgebra for Classical Algebra


Classical algebra studies structures endowed with
CMP Motif binary operations that obey various properties: asso-
ciativity in semigroups, existence of neutral and
▶ Combinatorial Molecular Phenotypes (CMPs) inverse elements in groups, distributivity in rings,
commutativity in Abelian algebras, absorption in
Boolean algebras, etc. The corresponding coalgebraic
structures are endowed not with operations that pro-
c-Myc duce a single element as a combination of two but
conversely with operations that produce a pair of ele-
Bernhard Schmierer ments as a decomposition of one. Applications of
Department of Biochemistry, Oxford Centre for coalgebraic structures in science are still extremely
Integrative Systems Biology (OCISB), University of scarce, compared with their algebraic counterparts.
Oxford, Oxford, UK The most important instance is the use of Hopf
coalgebras alongside Hopf algebras in standard
model particle physics (de Wild Propitius and Bais
Definition 1995). In biological sciences, coalgebra has been
used for formal accounts of genetics (Tian and Li
c-Myc is a proto-oncogene and member of the Myc 2004). There, the operation of genetic combination
family of transcription factors. c-Myc is a key activator from parents to offspring is reversed to a deduction of
of cell proliferation in response to growth factors and, genetic traits of the parents from those of their
when activated by mutation or overexpression, can offspring.
drive unrestricted proliferation. Hyperactivation of
c-Myc is observed in many cancers. Coalgebra for Universal Algebra
Universal algebra, more abstractly, studies properties
common to all the aforementioned structures and many
Cross-References others beside. It has been recognized only at the end
of the twentieth century that, when these universal
▶ Cell Cycle Signaling, Hypoxia properties are dualized rather than their specific manifes-
tations, many structures of systems theory and computer
science come into scope, most notably transition systems
Coalgebra (Aczel 1988) and automata (Adámek and Trnková
1990). This insight has attracted theoretical computer
Baltasar Trancón y Widemann scientists to the field, culminating in the proposal by
Ecological Modelling, University of Bayreuth, Rutten (2000) of universal coalgebra as a general theory
Bayreuth, Germany of system dynamics, with applications to stochastic
dynamics (▶ Markov Chain) and control.
In the language of category theory, both algebras
Definition and coalgebras are studied relative to a ▶ functor F that
specifies an expression language.
Coalgebra is the field of study of mathematical structures An F-algebra is a carrier set S together with an
that are dual to structures studied in the field of algebra. operation a: F(S) ! S; that is, single elements in S
Coalgebra 435 C
are produced by combination according to composite a unique homomorphism to any other F-algebra and is
expressions in F(S). Dually, an F-coalgebra is a carrier isomorphic to the term algebra that consists of finitely
set S together with an operation c: S ! F(S); that is, nested composite F-expressions. Thus, universal
single elements in S produce decompositions into algebra is equipped with a universal syntactic structure
composite expressions in F(S). for constructions, and the corresponding unique
The appearance of the carrier set S on the left or homomorphism is their systematic bottom-up
right hand side of a mapping makes a crucial difference interpretation. C
for the interpretation of the operation: Algebras Dually, for many functors F, the category of
formalize the construction and hence the causal F-coalgebras has a final object: There exists an
explanation of elements (▶ Causality; ▶ Reduction). F-coalgebra that admits a unique homomorphism
Coalgebras, on the other hand, formalize the from any other F-coalgebra. In contrast to the
characterization and hence the behavioral explanation term algebra, it contains also infinite and circular
of elements; they may take existence as granted and are expressions in the spirit of the non-well-founded set
hence indifferent to infinitely regressing and circular theory of Aczel (1988). It can be thought of as the
dependencies (▶ Holism). space of all possible ways of global system behavior
(Jacobs and Rutten 1997), and the corresponding
Mappings and Relations unique homomorphism is their systematic top-down
The notion of homomorphisms as structure-respecting representation.
mappings, classically defined individually for each
class of algebras, can be generalized in the categorical Application Examples
language to all functors F and to both algebra and As an example, consider the functor F(X) ¼ 1 + X on
coalgebra as commuting diagrams. The diagrams for the category of sets, which adds an extra element,
a homomorphism h between two F-algebras and for say *, to each set. Its initial algebra consists of the set
a homomorphism k between two F-coalgebras are as N of nonnegative integers and an operation that assigns
follows: zero to * and the successor to each number. Every
deterministic discrete dynamical system can be
h k represented as an F-algebra, consisting of the state
S T S T
space S and an operation that assigns a distinguished
initial state x0 to * and the successor to each state. The
a b c d unique homomorphism h : N ! S then computes
the trajectory of x0.
Dually, the final coalgebra for the same functor F
F(S ) F(T ) F(S ) F(T ) consists of the set N [ f1g of extended nonnegative
F(h) F(k)
integers and the predecessor operation that maps 1
(1) to itself and zero to *. Every subset U  S of states
of a dynamical system can be represented as an
The same statements can be rephrased in equational F-coalgebra, consisting of S and the evolution function
form as of the system, but restricted to U and mapping
all other states to *. The unique homomorphism
h a ¼ b FðhÞ FðkÞ c ¼ d k (2) h : S ! N [ f1g then assigns to each state the
number of steps the system may take before leaving
respectively, where ○ denotes the composition of the region U. U is stationary if and only if h(x) ¼ 1
mappings. for all x 2 U.

Distinguished (Co)algebras Equivalence


F-algebras and their homomorphisms form a category, Elements of different F-algebras may be related
and so do F-coalgebras and their homomorphisms. by originating from the same element of the initial
For many functors F, the category of F-algebras has F-algebra via the respective unique homomorphisms.
an initial object: There exists an F-algebra that admits In that case, one may speak of them as constructively
C 436 Coefficient of Correlation

equivalent. Elements of different F-coalgebras may


be related by being mapped to the same element Co-expression
of the final F-coalgebra via the respective unique
homomorphisms. In that case, one may speak of them Jiguang Wang
as behaviorally equivalent. Beijing Institute of Genomics, Chinese Academy of
The two relationships are logically similar to the Sciences, Beijing, Beijing, China
concepts of homology and analogy of comparative
biology. The genuinely coalgebraic notion of formal
behavioral equivalence has proven very useful in the Definition
abstract modeling of systems and processes in com-
puter science (Milner 1989). The same abstraction can Co-expression means the simultaneous expression of
be used to circumvent the issue of ▶ multiple realiza- two or more genes. Identifying co-expression genes is
tion of biotic phenomena, by modeling and an important method in exploring microarray data.
formalizing the essential behavior rather than the acci- Yu et al. (2003) show that genes targeted by the same
dental structures that cause it. transcription factors tend to show similar expression
profiles. Also, interacting proteins are often
significantly co-expressed (Jansen et al. 2003).
Cross-References

▶ Category Map References


▶ Causality
▶ Functor Jansen R, Yu HY, Greenbaum D, Kluger Y, Krogan NJ,
Chung S, Emili A, Snyder M, Greenblatt JF, Gerstein M
▶ Holism
(2003) A Bayesian networks approach for predicting pro-
▶ Markov Chain tein-protein interactions from genomic data. Science
▶ Multiple Realization 302(5644):449–453
▶ Reduction Yu HY, Luscombe NM, Qian J, Gerstein M (2003) Genomic
analysis of gene expression relationships in transcriptional
regulatory networks. Trends Genet 19(8):422–427

References

Aczel P (1988) Non-well-founded sets. CSLI, Stanford University


Press, Stanford Co-expression Network
Adámek J, Trnková V (1990) Automata and algebras in categories.
Kluwer, Dordrecht/Boston
de Wild Propitius M, Bais FA (1995) Discrete gauge theories. ▶ Functional/Signature Network Module for Target
Report 95-46, LPTHE, Paris, arXiv:hep-th/9511201v2 Pathway/Gene Discovery
Jacobs B, Rutten J (1997) A tutorial on (co)algebras and (co)
induction. EATCS Bull 62:222–259
Milner R (1989) Communication and concurrency. Prentice
Hall, New York
Rutten J (2000) Universal coalgebra: a theory of systems. Theor Cofactors
Comput Sci 249(1):3–80. doi:10.1016/S0304-3975(00)
00056-6
Tian J, Li B (2004) Coalgebraic structure of genetic inheritance. Tetsuro Kokubo
Math Biosci Eng 1(2):243–266 Department of Supramolecular Biology, Graduate
School of Nanobioscience, Yokohama City
University, Yokohama, Kanagawa, Japan

Coefficient of Correlation Synonyms

▶ Correlation Coefficient NC1; NC2; PC1; PC2; PC3; PC4


Coherent Type I FFL 437 C
Definition Roeder RG (1998) Role of general and gene-specific cofactors in
the regulation of eukaryotic transcription. Cold Spring Harb
Symp Quant Biol 63:201–218
USA (Upstream factor Stimulatory Activity) was Thomas MC, Chiang CM (2006) The general transcription
originally identified as an activity promoting machinery and general cofactors. Crit Rev Biochem Mol
a greater level of transcriptional activation by its dual Biol 41(3):105–178
function of repressing basal transcription and potenti-
ating activator-dependent transcription (Kaiser and C
Meisterernst 1996; Roeder 1998; Thomas and Chiang
2006). Further fractionation of USA revealed that it Coherent Type I FFL
contained four positive cofactors (PC1, PC2, PC3, and
PC4) and one negative cofactor (NC1). PC1, PC2, Jinzhi Lei
PC3, and NC1 were later found to be equivalent to Zhou Pei-Yuan Center for Applied Mathematics,
poly(ADP-ribose) polymerase-1, Mediator, DNA Tsinghua University of Beijing, Beijing, China
topoisomerase I and HMG1, respectively. PC1 stimu-
lates PIC formation at a post-TFIID binding step, and
PC3 enhances TFIID-TFIIA-DNA complex formation. Definition
PC4 has a single-stranded DNA binding activity and
plays a general role in the stabilization of various In the coherent type 1 FFL (C1-FFL), both transcrip-
protein-DNA interactions. NC1 binds to the TBP- tion factors X and Y are transcriptional activators
TATA complex and prevents the incorporation of (Fig. 1). The C1-FFL is a “sign-sensitive delay” ele-
TFIIB into PIC. Notably, each component of USA ment and a persistence detector. This means that this
shows dual functions (positive and negative), motif can provide pulse filtration in which short pulses
depending on whether activators are present or absent, of signal will not generate a response but persistent
similar to that observed in the original mixed fraction signals will generate a response after short delay
of USA. (Mangan and Alon 2003).
NC2 was isolated from a chromatographic fraction The C1-FFL has opposite effect in the OR
different from that of USA, as a factor that could input function comparing with those in the AND
associate with the TBP-TATA complex. This factor input function. With an AND input function, the
comprises two subunits, NC2a and NC2b, which C1-FFL shows a delay after stimulation, but no delay
interact with each other via their HFD (histone fold when stimulation stops, while with an OR input
domain). Interestingly, NC2 inhibits TATA-dependent function, the C1-FFL shows no delay after stimulation,
transcription while it promotes DPE-dependent but does show a delay when stimulation stops (Mangan
transcription. The negative function may be due to and Alon 2003; Alon 2007).
its inhibitory effect on the incorporation of TFIIA
and TFIIB into the PIC, and the positive function
may be related to its remarkable activity that mobilizes
TBP on DNA. SX
X

SY
Cross-References Y

▶ Transcription in Eukaryote
AND

Z
References
Coherent Type I FFL, Fig. 1 The coherent type-1 FFL with an
Kaiser K, Meisterernst M (1996) The human general co-factors. AND input function at the Z promoter. SX and SY are input
Trends Biochem Sci 21(9):342–345 signals for X and Y
C 438 Cohesin

References Definition

Alon U (2007) Network motifs: theory and experimental Collaborative and distributed biomedical applications
approaches. Nat Rev Genet 8:450–461
on the World Wide Web or biomedical
Mangan S, Alon U (2003) Structure and function of the
feed-forward loop network motif. Proc Natl Acad Sci USA “collaboratories” (Olsen et al. 2008) enable and social-
100:11980–11985 ize multimodal activities critical to biomedical
research, using social media technologies. They enable
near-real-time group discussion of methods and
Cohesin findings, sharing of reagents, cooperative construction
of databases, and publication and annotation of new
Rosella Visintin scientific communications.
IEO, European Institute of Oncology,
Milan, Italy
Characteristics

Definition The use of the Internet to support scientific communi-


cation is one of the major shifts in the practice of
Cohesin is a hetero-tetrameric complex organized in science in this era (Kling 1999).
a ring-like shape whose main function is to hold sister Collaborative and distributed web applications
chromatids together from DNA replication up to (or “collaboratories”) provide biomedical researchers
anaphase. and clinicians with online facilities for very-large-
scale cross-disciplinary biomedical research commu-
nication. They support essential scientific activities of
Cross-References reporting, documenting, searching, sharing, validating,
combining, collectively witnessing, critiquing,
▶ Mitosis reviewing, reproducing, integrating, and extending
the results and methods of experiment, observation,
and theoretical endeavor. As such they are part of an
ongoing transition in science generally from print-
Cold and Flu-Like Illness based to web-based scientific communications
(Borgman 2007).
▶ Viral Respiratory Tract Infections Many biomedical web communities are disease-
focused, such as the pioneering Alzforum (Kinoshita
and Clark 2007) as well as newer offerings such as Pain
Research Forum (http://painresearchforum.org) and
Collaborative and Distributed SFARI (https://sfari.org/).
Biomedical Applications Other collaboratories – many of them wikis – focus
on specific biological entities, such as WikiProteins
Tim Clark (http://conceptwiki.org/index.php/WikiProteins);
Department of Neurology, Massachusetts General domain ontologies, such as NeuroLex (http://
Hospital and Harvard Medical School, Boston, neurolex.org/wiki); or protocols, such as
MA, USA OpenWetWare (http://openwetware.org/) (Waldrop
2008).
Collaborative workflow repositories such as
Synonyms myExperiment (http://myexperiment.org) support
archival, retrieval, and discussion of analytical
Biomedical web communities; Collaboratories; workflows used in computational systems biology
Virtual laboratories and other data-intensive disciplines (Goble and
Collaboratories 439 C
DeRoure 2007). They provide distributed support for reuse, modification, and redistribution. In an
detailed scientific discourse on the computational open source WCMS, the community of its software
methods used to arrive at published results developers is typically organized online using its
(Gil et al. 2008). own software development collaboratory (e.g.,
Collaboratories markedly reduce time and distance http://drupal.org).
constraints upon participants, while providing essen- Informatics to support collaboratories can now
tial computational and communications support for be considered a distinct research agenda in itself C
large-scale cross-disciplinary integration of biological (Lee et al. 2003).
and clinical findings on a global scale.
Collaboratories are thus core infrastructure for
what Weston and Hood (Weston and Hood 2004) Cross-References
term “studying biological systems on a global level,”
ideally allowing “as many levels of information as ▶ Ontologies
possible to be integrated,” and are thus fundamental ▶ Wiki
enabling technology for the development of systems ▶ World Wide Web
biology.
Scientific blogs, such as Science in the Open (http://
cameronneylon.net/), Open Reading Frame (http:// References
www.sennoma.net/) or RRResearch (http://rrresearch.
fieldofscience.com/), are another important form of Borgman CL (2007) Scholarship in the digital age: informa-
tion, infrastructure, and the internet. MIT Press, Cambridge,
scientific social media. However, unlike
MA
collaboratories, which orient strongly toward collec- Carroll MW (2006) Creative commons and the new intermedi-
tive establishment and refinement of scientific aries. Mich St L Rev 45:55–59
methods, hypotheses, models, and findings, scientific Gil Y, Deelman E, Ellisman M, Fahringer T, Fox G, Gannon D,
Myers J (2008) Examining the challenges of scientific
blogs typically support group discussion of the peri-
workflows. IEEE Computer 40(12):24–32
odic publications of a single individual. Goble CA, DeRoure DC (2007) myExperiment: social network-
An important feature of biomedical and other ing for workflow-using e-scientists. Paper presented at the
collaboratories is the open-access nature of their con- proceedings of the 2nd workshop on workflows in support of
large-scale science, Monterey, CA
tent licensing. Creative commons or “CC” (http://
Kinoshita J, Clark T (2007) Alzforum. Methods Mol Biol
creativecommons.org/) licenses are standardized and 401:365–381. doi:10.1007/978-1-59745-520-6_19
widely recognized legal models for open-access con- Kling R (1999) What is social informatics and why does it
tent sharing. These licenses are frequently used for matter? D-Lib Magazine 5(1)
Lee FSL, Vogel D, Limayem M (2003) Virtual community
collaboratory content, with “CC-BY” (free to share
informatics: a review and research agenda. J Inf Technol
with attribution to the original author) a popular Theory Appl 5(1):47–61
option. CC licenses enable legally sound, free of Olsen GM, Zimmerman A, Bos N (2008) Scientific collaboration
charge, and end-to-end transactions in copyrighted on the internet. MIT Press, Cambridge, MA
Waldrop MM (2008) Science 2.0. Sci Am 298(5):68–73
works on the web (Carroll 2006).
Weston AD, Hood L (2004) Systems biology, proteomics,
Collaboratories are increasingly supported by and the future of health care: toward predictive, preventative,
standard web content management systems (WCMS), and personalized medicine. J Proteome Res 3:179–196.
providing reusable and customizable features, such as doi:10.1021/pr0499693
Drupal (http://drupal.org); by wiki platforms such as
MediaWiki (http://www.mediawiki.org) and Semantic
Mediawiki (http://semantic-mediawiki.org/); or by
web application frameworks such as Ruby on Rails
(http://rubyonrails.org). Collaboratories
WCMS used to support collaboratories in
bioscience are, as a rule, “open source software” – ▶ Collaborative and Distributed Biomedical
that is, their license terms provide for free use, Applications
C 440 Collective Behavior

Definition
Collective Behavior
The combinatorial molecular phenotype (CMP)
Andreas Deutsch is calculated from the alignment of a stack of
Center for Information Services and High Performance binary images originating from fluorescence signals
Computing (ZIH), Technical University Dresden, (gray values) generated by a Toponome Imaging
Dresden, Germany System (TIS) (▶ Clinical Aspects of the Toponome
Imaging System (TIS)). A CMP is a bijective
binary vector denoting the presence or absence (1 or
Definition 0) of any single out of N marker molecules
co-mapped by TIS (▶ Clinical Aspects of the
The term collective behavior describes the macroscopic Toponome Imaging System (TIS)) at a certain pixel/
behavior observed in systems that consist of a large voxel position or at multiple positions in the
number of interacting components. Collective behavior image. A CMP vector includes a certain number of
can result from self-organization, that is, the emergence pixel coordinates, at least one, which determine(s)
of patterns without external influence, and the formation the CMP frequency for statistical analysis. Several
of complex spatiotemporal patterns even from relatively distinct CMPs can be grouped as a so-called CMP
simple interactions. Examples of collective behavior motif (Fig. 1). A motif describes a set of CMPs by
include the sorting of cells by differential adhesion, the using a three-symbol code (Schubert 2007). The sym-
formation of fruiting bodies by microorganisms, and bol L (or 1) describes a marker or protein that all
organization of fish into huge swarms. CMPs of the group have in common. Symbol
A (or 0) is used when a certain marker is always
absent (anti-colocated) in the CMPs of the group. If
Colony-Forming Fibroblastic Cells a marker is variably present within a CMP group, the
symbol in the corresponding motif is a wild card
▶ Single Cell Assay, Mesenchymal Stem Cells W (or *).
A CMP at pixel position x,y is defined as binary
vector as follows:
Combinatorial Diversity

▶ Immune Repertoire Diversity CMPðx; yÞ ¼ ðs1 ; s2 ; s3 ; . . . ; si ÞTx;y


ðsi is a binary signal; s 2 f0; 1g; i ¼ f1; 2; 3; . . . ; Ng
j N ¼ number of markers ; x; y ¼ pixel coordinatesÞ
Combinatorial Molecular Phenotypes
(CMPs)
CMPs and resulting CMP motifs with their
Reyk Hillert lead protein(s) are direct quantitative measures of
Molecular Pattern Recognition Research (MPRR) the hierarchy of proteins interlocked as protein
Group, Otto-von-Guericke-University, Magdeburg, cluster networks. Lead proteins appear to exert
Germany control over the topology and function of protein
cluster networks (Schubert et al. 2006; Friedenberger
et al. 2007; Schubert et al. 2011; Schubert 2010).
Synonyms Hence, their detection by hierarchical toponome/
CMP analysis is a new way in medical systems
CMP motif; Multi-protein cluster; Three-symbol code; biology and drug target discovery (Schubert et al.
TIS code 2008).
Combinatorial Transcription Regulatory Network 441 C

Combinatorial Molecular Phenotypes (CMPs), Fig. 1 The figure shows the stepwise generation of binary images, as well as the
calculation of CMP vectors and the three-symbol code of the resulting CMP motif

References
Combinatorial Transcription Regulatory
Friedenberger M, Bode M, Krusche A, Schubert W (2007) Network
Fluorescence detection of protein clusters in individual
cells and tissue sections by using toponome imaging system: Yong Wang
sample preparation and measuring procedures. Nat Protoc
2(9):2285–2294 Academy of Mathematics and Systems Science,
Schubert W (2007) A three-symbol code for organized Chinese Academy of Sciences, Beijing, China
proteomes based on cyclical imaging of protein locations.
Cytometry A 71(6):352–360
Schubert W (2010) On the origin of cell functions encoded in the
toponome. J Biotechnol 149(4):252–259 Synonyms
Schubert W, Bonnekoh B, Pommer AJ, Philipsen L,
B€ockelmann R, Malykh Y, Gollnick H, Friedenberger M, Combinatorial regulation; Transcriptional factor
Bode M, Dress AW (2006) Analyzing proteome topology cooperativity
and function by automated multidimensional fluorescence
microscopy. Nat Biotechnol 24(10):1270–1278
Schubert W, Bode M, Hillert R, Krusche A, Friedenberger M
(2008) Toponomics and neurotoponomics: a new way to Definition
medical systems biology. Expert Rev Proteomics
5(2):361–369
Schubert W, Gieseler A, Krusche A, Serocka P, Hillert R (2011) A regulatory network consists of regulators such as
Next-generation biomarkers based on 100- parameter func- transcription factors or kinases that control the expres-
tional super-resolution microscopy TIS. N Biotechnol. sion or activity of their target genes (Chen et al. 2009).
doi:10.1016/j.nbt.2011.12.004 Compared to more commonplace contexts, these reg-
ulators can be thought of as managers in a social or
corporate setting controlling their common subordi-
nates. Usually, these regulators are not working alone
Combinatorial Regulation and, almost always, these multiple regulators are
partnering together to control their targets. Combinato-
▶ Combinatorial Transcription Regulatory Network rial regulation is not a well-defined term in Biology and
C 442 Combinatorial Transcription Regulatory Network

we refer to the combinatorial regulation generally when


the regulators (both TFs and kinases) perform their
function mostly in combination with other regulators
under different spatial and/or temporal conditions.
Specifically, combinatorial transcriptional regula-
tory network describes the combinatory regulatory
relationships among transcription factors. Transcrip-
tion factors (TFs) are proteins that dynamically read
and interpret the static genetic instructions in the DNA.
As mentioned above, transcription factors usually
cooperate with other TFs to facilitate (as an activator)
or inhibit (as a repressor) the recruitment of RNA
polymerase, using complex logic rules building upon
Combinatorial Transcription Regulatory Network,
simple ones (AND, OR, and NOT) to control the pre- Fig. 1 TF combinatorial regulation, structure of the a1/a2 com-
cise condition-dependent expression of target genes. plex and their binding DNA fragment. Red: TF a1, green: TFa2,
For example, one TF can help another TF to stabilize blue: DNA fragment
onto regulatory DNA sequence and recruit RNAP to
start transcription; several TFs may use diverse and
complex logic rules to control the phased temporal conditions in the environment, integration of multiple
expression of target genes; also, it has been demon- signaling inputs, and generation of highly specific
strated by computational model that, in yeast cell-cycle outputs with the help of a relatively small number of
modeling, the inclusion of synergistic interactions can regulators. Overall, transcriptional cooperativity
increase the prediction accuracy of transcriptional reg- among several TFs is believed to be the main
ulation to as much as 1.5–3.5-fold. mechanism of complexity and precision in transcrip-
Below, we show a picture for TF combinatorial tional regulatory programs.
regulation, which is the structure of the a1/a2 complex Combinatorial regulation is a common theme
and their binding DNA fragment. The TF a1 is shown in eukaryotic transcriptional circuits, as transcription
in red and the TFa2 is green and the DNA fragment is factors often work in different combinations to
blue (Fig. 1). The interaction interface between two regulate different sets of genes under different
transcription factors is highlighted by spheres repre- conditions. Many combinatorial interactions are due
sentation, which includes residues whose distance is to direct protein–protein contacts between sequence-
within 7 Å. specific DNA-binding proteins. These interactions are
It should be noted that the TF combinatorial regu- often much weaker than the protein–DNA interactions.
lation is not uniquely defined and can be interpreted in It is therefore not surprising that changes in the
different ways (transcriptional combinatorial regula- interactions between transcription factors play an
tion, TF synergism, TF cooperation, TF interaction, important role in transcriptional rewiring.
and TF cooperativity). It can refer to TF–TF physical/
genetic/regulatory interaction, co-occurrence of Patterns of Combinatory Regulation
DNA-binding sites close to each other in the same There are three different patterns of TF combinatorial
promoter region, and TF spatial and temporal regulation that suggest three possible types of TF
combinatorial co-regulation. cooperativity mechanisms in transcriptional control
(Fig. 2):
In type-I cooperativity (Subfigure A), TFs cooper-
Characteristics ate through physical interactions in a transcriptional
complex, and jointly regulate many target genes. Upon
Significance of Combinatorial Regulation the formation of the complex, the binding probability
Combinatory regulation is very important since it of RNA polymerase onto the promoter sequence will
allows for a sophisticated response to multiple either increase or decrease, thus affecting the
Combinatorial Transcription Regulatory Network 443 C
a Physical interaction cascade, belong to this type. Alternatively, one TF
can genetically interact with another TF which leads
to a joint phenotype. Here, TF regulatory or genetic
TF 1 TF 2 interactions are the dominant predictors for type-II
cooperativity. Examples of this type of cooperativity
Motif 1 Motif 2 include Sok2-Phd1, Yap6-Cup9, and Skn7-Yap1 pairs
Promoter Gene
in yeast, which are all detected by ChIP-chip experi- C
ments as belonging to regulatory feed-forward motifs
b Regulatory or genetic interaction and supported by literature evidences.
In type-III cooperativity (Subfigure C), TFs often
cooperate with each other in a competitive way (TF
TF 1 TF 2 competitive regulation), or in a redundant manner. In
Motif 1 Motif 2 this case, the DNA-binding sites of these TFs share
Promoter Gene high sequence identity or similarity. Their binding
motifs often overlap with each other, leading to com-
c Competitive cooperation petition between TFs for transcriptional control. In this
case, the motif-occurrence data-based features are the
TF 1 TF 2 dominant predictors. Some representative cases of
type-III cooperativity are Pdr1-Pdr3, Cat8-Sip4, and
Msn2-Msn4 in yeast.
Motif
Promoter Gene Methods to Detect or Predict TF Combinatorial
Regulations
Combinatorial Transcription Regulatory Network, Experimental methods for detecting TF interaction
Fig. 2 Different patterns of TF combinatorial regulation. (a) include co-immunoprecipitation and super-gel shift.
type-I cooperativity, (b) type-II cooperativity, (c) type-III One difficulty for these methods is that the TF interac-
cooperativity
tion is usually transient and hard to be captured. In
addition, these methods are generally time consuming,
subsequent transcriptional efficiency. Here, protein and it is difficult to apply them to mapping the whole-
physical interaction is the dominant predictor for this genome TF cooperativity network in the living cell.
kind of cooperative relationships. Examples of this Complementarily, a wide variety of computational
type of cooperativity include the Met4 Complex, approaches have been proposed to predict TF
CCAAT-binding factor complex, MBF complex, and cooperativity. Intuitive idea is that combinatorial
SBF complex in yeast. Since the two TFs cooperate in regulation and higher-order regulatory logic can be
a transcriptional complex and they stay together most revealed by examining overlaps of target genes and
of the time, we expect that other features are good binding sites between transcription modules. Such
predictors as well. For example, the co-evolution rela- quantitative modeling may provide insight into under-
tionship is a good predictor since interacting TFs tend lying mechanisms and design principles, which can be
to co-evolve in different genomes. tested by experiment.
In type-II cooperativity (Subfigure B), TFs cooper- In addition to some case studies of combinatorial
ate largely through genetic or regulatory interaction, regulation (Wang et al. 2009; Parisi et al. 2007),
and jointly regulate many target genes. Different from existing methods can be roughly classified into three
the first case, TFs interact in a genetic or regulatory catalogs. The first class is the binding motif–based
module that may or may not be explained by the methods. For example, Wagner (1999) employed
existence of a physical complex. For example, one a statistical technique to identity significant homotypic
TF can regulate a secondary TF, and they jointly reg- or heterotypic TF-binding site clusters. (Pilpel et al.
ulate the target genes. The well-known network motifs, 2001) used TF-binding motifs with gene expression
such as the feed-forward loop and the regulator data to look for cooperatively binding TFs.
C 444 Combinatorial Transcription Regulatory Network

(Hannenhalli and Levy 2002) predicted TF synergism genomic features, which provide further biological
by the co-localization of transcriptional factor–binding insights into the different types of TF cooperativity.
site (TFBS) on the genome at the specific distance
apart. (Das et al. 2004) created a multivariate adaptive
Cross-References
regression splines model to correlate the occurrences
and interactions of TFs to the logarithm of the ratio of
▶ Gene Regulatory Networks
gene expression levels. (Yu et al. 2006) identified
▶ Regulation
interacting binding motif pairs considering their orien-
tation and distance. Similarly, (Chang et al. 2006) used
promoter sequences and gene expression data to detect References
cooperativity. Recently, (Datta and Zhao 2008) used
a log-linear model to infer cooperative binding. The Balaji S, Babu MM, Iyer LM, Luscombe NM, Aravind L (2006)
second class is the target gene (TG)-based methods. Comprehensive analysis of combinatorial regulation using
(Banerjee and Zhang 2003) assumed that a TF pair is the transcriptional regulatory network of yeast. J Mol Biol
360:213–227
cooperative if their TGs are significantly co-expressed Banerjee N, Zhang MQ (2003) Identifying cooperativity among
while (Nagamine et al. 2005) considered TG interac- transcription factors controlling the cell cycle in yeast.
tions instead of TG co-expression. (Tsai et al. Nucleic Acids Res 31:7024–7031
2005) directly selected TF pairs with significant num- Chang Y-H, Wang Y-C, Chen B-S (2006) Identification of
transcription factor cooperativity via stochastic system
ber of common TGs and further applied the ANOVA model. Bioinformatics 22:2276–2282
test to determine synergy. (Balaji et al. 2006) applied Chen L, Wang RS, Zhang XS (2009) Biomolecular networks:
a network transformation procedure to ChIP-chip data methods and applications in systems biology. Wiley,
and analyzed combinatorial regulation among multiple Hoboken
Das D, Banerjee N, Zhang MQ (2004) Interacting models of
TFs. The third class is the transcription factor activity cooperative gene regulation. Proc Natl Acad Sci 101:16234–
(TFA)-based methods. (Yang et al. 2005) inferred the 16239
TFAs from gene expression data then assessed TF Datta D, Zhao H (2008) Statistical methods to infer cooperative
cooperativity by their TFA correlation. Wang binding among transcription factors in Saccharomyces
cerevisiae. Bioinformatics 24:545–552
(2007) used the similar framework to study combina- Hannenhalli S, Levy S (2002) Predicting transcription factor
torial regulation in yeast cell cycle. synergism. Nucleic Acids Res 30:4278–4284
In addition to the above case studies and the Nagamine N, Kawada Y, Sakakibara Y (2005) Identifying coop-
unsupervised framework using information from erative transcriptional regulations using protein-protein
interactions. Nucleic Acids Res 33:4828–4837
a single data source such as TF-binding motif, target Parisi F, Wirapati P, Naef F (2007) Identifying synergistic reg-
gene, and TF activity, Wang (2009) used Bayesian ulation involving c-Myc and sp1 in human tissues. Nucleic
networks, a supervised learning approach, to system- Acids Res 35:1098–1107
atically integrate diverse data sources at different Pilpel Y, Sudarsanam P, Church GM (2001) Identifying regula-
tory networks by combinatorial analysis of promoter ele-
levels to predict and analyze TF cooperativity. Specif- ments. Nat Genet 29:153–159
ically, they design a Bayesian network structure to Tsai H-K, Lu HH-S, Li W-H (2005) Statistical methods for
capture the dominant correlations among features and identifying yeast cell cycle transcription factors. Proc Natl
TF cooperativity, and introduce a supervised learning Acad Sci 102:13532–13537
Wang J (2007) A new framework for identifying combinatorial
framework with a well-constructed gold standard regulation of transcription factors: a case study of the yeast
dataset. This framework allows us to assess the predic- cell cycle. J Biomed Inform 40:707–725
tive power of each genomic feature, validate the supe- Wang Y, Zhang X-S, Chen L (2009) Predicting eukaryotic
rior performance of our Bayesian network compared to transcriptional cooperativity by Bayesian network
integration of genome-wide data. Nucleic Acids Res
alternative methods, and integrate genomic features. 37(18):5943–5958
Data integration reveals 159 high-confidence predicted Yu X, Lin J, Masuda T, Esumi N, Zack DJ, Qian J (2006)
cooperative relationships among 105 TFs, most of Genome-wide prediction and characterization of interactions
which are subsequently validated by literature search. between transcription factors in Saccharomyces cerevisiae.
Nucleic Acids Res 34:917–927
The existing and predicted transcriptional Yang Y-L, Suen J, Brynildsen M, Galbraith S, Liao J (2005)
cooperativities can be further grouped into three cate- Inferring yeast cell cycle regulators and interactions using
gories based on the combination patterns of the transcription factor activities. BMC Genomics 6:90
Community Genomics 445 C
(1) facilitating integration across datasets and disci-
Combinatorics of Sets plinary approaches to the study of the same organism
and (2) facilitating cross-species comparison across
▶ Subset Surprisology and Toponomics available datasets (Howe and Rhee 2008; Leonelli
and Ankeny 2012). Examples of community databases
are The Arabidopsis Information Resource, gathering
data about Arabidopsis thaliana (Koornneef and C
Committee Meinke 2010); and FlyBase, gathering data about Dro-
sophila melanogaster (Drysdale and FlyBase Consor-
▶ Ensemble tium 2008).

Cross-References
Common Cold
▶ Data-Intensive Research
▶ Viral Respiratory Tract Infections

References

Communication Theory Drysdale R, FlyBase Consortium (2008) FlyBase: a database for


the Drosophila research community. Methods Mol Biol
420:45–59
▶ Information Theory and Toponomics Howe D, Rhee SY (2008) The future of biocuration. Nature
455:47–50
Koornneef M, Meinke D (2010) The development of
Arabidopsis as a model plant. Plant Journal 61(6):909–921
Leonelli S, Ankeny RA (2012) Re-thinking organisms: the epi-
Community stemic impact of databases on model organism biology. Stud
Hist Philos Biol Biomed Sci 43(1):29–36
▶ Dynamic Modularity
▶ Module Network

Community Detection

Community Database ▶ Modules, Identification Methods and Biological


Function
Sabina Leonelli
ESRC Centre for Genomics in Society, University of
Exeter, Exeter, Devon, UK

Community Genome
Synonyms
▶ Metagenome
Model organism database

Definition
Community Genomics
Database collecting diverse types of data documenting
the same (set of) organisms, with the purpose of ▶ Metagenomics
C 446 Community Metabolomics

Definition
Community Metabolomics
A fundamental goal of biology is to understand the cell
▶ Metametabolomics as a system of interacting components which can be
represented as a network (Chen et al. 2009). The dis-
covery, analysis, and understanding of interactions
between molecules as well as the networked system
Community Proteomics
have received significant attentions in recent years. In
a network, each node corresponds to a molecule and an
▶ Metaproteomics
edge indicates a direct/indirect interaction between the
molecules. As the number of available molecular net-
works for various species rapidly increases, compara-
Community Structure tive analysis of them across species is proving to be
increasingly important. The comparative study of
▶ Modular Organization of Gene Regulatory Networks molecular networks can be topologically coarse-level
comparison such as the comparative analysis of degree
distribution, clustering coefficient, diameter, and rela-
Community Transcriptomics tive graphlet frequency distribution, or can be the
discovery of conserved subgraphs among networks.
▶ Metatranscriptomics Actually, the later one which is known as network
alignment problem in bioinformatics field is a well-
known concept for molecular network studies and will
be the key of this entry. We should note here that the
Compact Chromatin problems similar to network alignment have also been
studied in other fields like graph theory, data mining,
▶ Heterochromatin and computer vision. For example, the problem of
matching a query image to an existing image in the
database has been formulated as a graph-matching
problem in computer vision field, where each image
Compact DNA
is represented as a graph/network.
▶ Chromatin
Characteristics

Comparative Analysis of Molecular The network comparative analysis especially network


Networks alignment is similar in spirit to traditional sequence-
based comparative genomic analyses or structure-based
Shihua Zhang1 and Zhenping Li2 comparative analysis. It also offers a function-oriented
1
National Center for Mathematics and perspective that complements traditional sequence-
Interdisciplinary Sciences, Academy of Mathematics based and structure-based methods. Comparative
and Systems Science, Chinese Academy of Sciences, network analysis also enables us to uncover the network
Beijing, China (dis)similarity, identify conserved functional modules
2
School of Information, Beijing Wuzi University, across species, perform accurate ortholog prediction
Beijing, China and function prediction as well as transfer insights and
information across species.
In general, the goal of network alignment problem
Synonyms is to find a common and approximative subgraph with
a set of conserved edges across the input networks
Network alignment; Network querying (Fig. 1). There exists a mapping between the nodes of
Comparative Analysis of Molecular Networks 447 C
Species 1 Matched proteins Network alignment
(Condition/type 1) Match protein pairs that are
sequence-similar

PKSDIDVDLCSELMAKACSE-GV
PKS +D+DLCSEL+ KAC++ +
PKSSLDIDLCSELIIKACTDCKI

C
Biological networks

Conserved Matched
Species 2 interactions protein pairs
(Condition/type 2)

High-scoring
conserved subnetworks
Search
algorithm

Comparative Analysis of Molecular Networks, Fig. 1 A general framework for pairwise network alignment, with permission
from Sharan and Ideker (2006)

the subgraph which may correspond to the sequence taking into account both the sequence homology
similarity or function similarity of molecules (Ogata between the aligned molecules and the probabilities
et al. 2000). The goal is to maximize the overlap that interactions in the subnetworks are true or not
between the aligned networks and ensure that the pro- (Kelley et al. 2004), while the NetworkBLAST tool
teins mapped to each other are evolutionarily or func- detects conserved protein clusters rather than only
tionally related. Moreover, according to the number of paths across pairwise or multiple networks, by
input networks, the network alignment problem can be deploying a likelihood-based scoring scheme that
formulated as pairwise or multiple alignments. While weighs the denseness of a subnetwork versus the
according to the scope of node mapping desired, the chance of observing such subnetwork at random
alignment can be local or global which is analogous to (Sharan et al. 2005). By using this tool to compare
sequence alignment or structure alignment problem. multiple networks across different species, Suthram
We may also want to align a known or even unknown et al. (2005) explored whether the divergence of Plas-
small network to a large network or network database modium at the sequence level can be embodied at the
to find the conserved ones which is named as the level of the structure of its protein interaction network.
network querying problem. They found that Plasmodium has only three conserved
The majority of methods used for alignment of complexes versus yeast, and no conserved complexes
molecular networks have focused on local alignments against fly, worm, and bacteria. But yeast, fly, and
(Berg and L€assig 2004, 2006; Kelley et al. 2004; worm share an abundance of conserved complexes
Flannick et al. 2006; Liang et al. 2006) which attempts with each other. In another method MaWISh, the
to find local region similarity and find node mappings authors formulated network alignment as a maximum
independently for different regions. There have been weight induced subgraph problem and implemented an
many algorithms for local alignment in the literature. evolution-based scoring scheme to discover conserved
The PathBLAST tool searches for high-scoring linear modules. The method extends the concepts of evolu-
or tree substructures between two large networks, by tionary events in sequence alignments to that of
C 448 Comparative Analysis of Molecular Networks

duplication, match, and mismatch in network align- rankings with other ranking algorithms (Liao et al.
ments and evaluates the similarity between network 2009). The Graemlin method has also been extended
structures through a scoring function that accounts for to allow global network alignment which is based
these evolutionary events (Koyut€ urk et al. 2005), while on a learning algorithm that uses a training set of
the Graemlin is the first method which can identify known network alignments and their phylogenetic
dense conserved subnetworks of arbitrary structure. relationships to learn parameters for its scoring
This method scores a module by computing the log- function, and by automatically adapting the learned
ratio of the probability that it is subject to evolutionary objective function to any set of networks (Flannick
constraints and the probability that it is under no con- et al. 2008).
straints, while taking into account phylogenetic rela- Another aspect of network alignment studies is
tionships between species whose networks are being alignment for multiple networks which has gained
aligned (Flannick et al. 2006). Another example is great progress recently. Several of above methods
MNAligner (Li et al. 2007), which is designed for have also been developed for multiple cases. For
general molecular networks and combines both molec- example, Graemlin developed by Flannick et al.
ular similarity and topological similarity. This method (2006), which uses a probabilistic function for topol-
can detect conserved subnetworks in an efficient math- ogy matching, and can be applied to search for con-
ematical programming manner without requiring spe- served functional modules among multiple protein
cial structures on the networks. interaction networks. The C3Part-M method (Deniélou
Local alignments can be ambiguous, in which one et al. 2009) addressed the problem of multiple network
node can have different counterparts in different con- alignment with an exact and generic approach by
served modules. By contrast, a global network align- avoiding the explicit construction of the network align-
ment provides a unique alignment for each node in the ment multigraph, and then can deal with a large num-
smaller network to exactly one node in the larger ber of networks. By using microarray data from
network. We have to note that this may lead to multiple conditions and species, various comparative
suboptimal alignments in some local regions. Gener- studies have been conducted so as to reveal transcrip-
ally, the local network alignment algorithms have not tional regulatory modules, predict gene functions, and
been able to identify large subnetworks that have been uncover evolutionary mechanisms (Zhou and Gibson
conserved during evolution due to the algorithmic 2004). For example, Yan et al. (2007) have developed
complexity or modeling aspect (Berg and L€assig a graph-based data-mining algorithm NeMo to detect
2004; Kelley et al. 2004). Recently, global network frequent co-expression modules among a large number
alignment methods have been studied and designed for of gene co-expression networks across various condi-
aligning molecular networks (Singh et al. 2007; tions. They found a large number of potential tran-
Flannick et al. 2008; Zaslavskiy et al. 2009). Unlike scriptional modules, which are activated under
the local alignment algorithms, the IsoRank method multiple conditions.
(Singh et al. 2008) aims to maximize the overall match In addition to the network alignment method
between the two networks. The method is based on discussed above, the network querying technique is
spectral graph theory which computes scores of becoming a new major network comparative analysis
aligning pairs of nodes according to the score of their tool. The goal of network querying is to find a small
respective neighbors from different networks. In other network against a large-scale network or a database of
words, the score of a protein pair depends on the score large-scale networks which is a NP-hard problem
of their neighbors. And then sequence scores are incor- (Fig. 2). The network querying problem has been stud-
porated into these “topological” scores in the pairwise ied in the past few years and a few search tools have
alignment scores. Finally, IsoRank constructs the been developed, such as PathBLAST (Kelley et al.
node alignment with the repetitive greedy strategy of 2003, 2004), QPath (Shlomi et al. 2006), TORQUE
identifying among all protein pairs the highest scoring (Bruckner et al. 2009), and QNet (Dost et al. 2007).
pair (Singh et al. 2008). Recently, the IsoRankN PathBLAST (Kelley et al. 2003, 2004) developed by
has extended based on the notion of node-specific Kelley et al. can query a small pathways with length no
Comparative Analysis of Molecular Networks 449 C
Comparative Analysis of
Molecular Networks,
Fig. 2 Illustration of
metabolic pathways querying
in a pathway database,
redrawn from (Pinter et al.
2005)
C

more than 5 nodes. MetaPathwayHunter (Pinter et al. Cross-References


2005) developed by Pinter et al. can fast query for
smaller pathways or trees. QPath (Shlomi et al. 2006) ▶ Global Network Alignment
developed by Shlomi et al. can search for linear ▶ Local Network Alignment
pathways. NetMatch (Ferro et al. 2007) based on ▶ Network Alignment
a graph-matching algorithm can find the subgraphs of ▶ Network Querying
the original graph connected in the same way as the ▶ Networks Comparison
querying graph. The results of NetMatch can be
viewed as candidate network motifs as a result of
their similar topological features (Alon 2007). Besides References
the network querying tools discussed above, many
querying tools, such as BLAST for sequence querying Alon U (2007) Network motifs: theory and experimental
and DALI for structure querying, have been developed approaches. Nat Rev Genet 8:450–461
Berg J, L€assig M (2004) Local graph alignment and motif search
by researchers in other areas of computational biology,
in biological networks. Proc Natl Acad Sci USA 101:14689–
and have had a tremendous impact on the development 14694
of biological science (Zhang et al. 2008). Berg J, L€assig M (2006) Cross-species analysis of biological
To benefit from the accumulation of networked data networks by bayesian alignment. Proc Natl Acad Sci USA
103:10967–10972
for different species, it will be important to develop
Bruckner S, Huffner F, Karp RM, Shamir R, Sharan R (2009)
user-friendly network comparison tool. Recent Topology-free querying of protein interaction networks.
advances in the field demonstrate the great potential Nucleic Acids Res 37(suppl 2):W106–W108
of network comparison in elucidating network organi- Chen L, Wang RS, Zhang XS (2009) Biomolecular networks:
methods and applications in systems biology. Wiley,
zation, function, and evolution. Advances in computa-
Hoboken
tional methods and powerful software tools are being Deniélou Y, Boyer F, Viari A, Sagot M (2009) Multiple align-
made by interdisciplinary cooperation across different ment of biological networks: a flexible approach. Combina-
fields. With the development of powerful and sophis- torial pattern matching. LNCS 5577:263–273
Dost B, Shlomi T, Gupta N, Ruppin E, Bafna V, Sharan R (2007)
ticated network comparison tools, we expect to gain
QNet: a tool for querying protein interaction networks.
deep insight into essential mechanisms of biological Research in computational molecular biology. LNCS
systems at the network level. 4453:1–15
C 450 Competitive Repopulation

Ferro A, Giugno R, Pigola1 G, Pulvirenti A, Skripin D, Bader Zhang S, Zhang XS, Chen L (2008) Biomolecular network
GD, Shasha D (2007) NetMatch: a cytoscape plugin for querying: a promising approach in systems biology. BMC
searching biological networks. Bioinformatics 23:910–912 Syst Biol 2:5, Jan 18
Flannick J et al (2006) Graemlin: general and robust alignment Zhou XJ, Gibson G (2004) Cross-species comparison of
of multiple large interaction networks. Genome Res genome-wide expression patterns. Genome Biol 5(7):232
16(9):1169–1181
Flannick J et al (2008) Automatic parameter learning
for multiple network alignment. RECOMB LNBI
4955:214–231
Kelley BP et al (2003) Conserved pathways within bacteria and Competitive Repopulation
yeast as revealed by global protein network alignment. Proc
Natl Acad Sci USA 100:11394–11399 ▶ Single Cell Assay, Hematopoietic Stem Cell
Kelley BP, Yuan B, Lewitter F, Sharan R, Stockwell BR,
Ideker T (2004) PathBLAST: a tool for alignment of
protein interaction networks. Nucleic Acids Res 32
(Web Server issue):W83–W88
Koyut€urk M, Grama A, Szpankowski W (2005) Pairwise local Compiler
alignment of protein interaction network guided by models of
evolution. RECOMB LNBI 3500:48–65
Li Z, Zhang S, Wang Y, Zhang X, Chen L (2007) Alignment of José Román Bilbao Castro
molecular networks by integer quadratic programming. Bio- Supercomputing-Algorithms Group, University of
informatics 24(4):594–596 Almerı́a, Almerı́a, Spain
Liang Z, Xu M, Teng M, Niu L (2006) NetAlign: a web-based
tool for comparison of protein interaction networks. Bioin-
formatics 22:2175–2177
Liao CS, Lu K, Baym M, Singh R, Berger B (2009) IsoRankN: Definition
spectral methods for global alignment of multiple protein
networks. Bioinformatics 25:i253–i258
Ogata H, Fujibuchi W, Goto S, Kanehisa M (2000) A heuristic
A compiler is a computer program that translates a
graph comparison algorithm and its application to detect programmer-written code into something a processor
functionally related enzyme clusters. Nucleic Acids Res can “understand.” The programmer writes what is called
28:4021–4028 a “source code” in a specific language. Such a language
Pinter RY, Rokhlenko O, Yeger-Lotem E, Ziv-Ukelson M
(2005) Alignment of metabolic pathways. Bioinformatics
is easier to understand to the programmer than a
21:3401–3408 “machine language” composed of 1s and 0s. The com-
Sharan R, Ideker T (2006) Modeling cellular machinery piler then facilitates programming by allowing a much
through biological network comparision. Nature Biotechnol higher level of abstraction. Compiler development is
24:427–433
Sharan R, Suthram S, Kelley RM, Kuhn T, McCuine S, Uetz P,
a very important field of study as compiled programs’
Sittler T, Karp RM, Ideker T (2005) Conserved patterns of performances will strongly depend on the compiler
protein interaction in multiple species. Proc Natl Acad Sci ability to interpret the high level code and optimize it
USA 102:1974–1979 for the underlying machine it will be executed on.
Shlomi T, Segal D, Ruppin E, Sharan R (2006) QPath: a method
for querying pathways in a protein-protein interaction net-
work. BMC Bioinformatics 7:199
Singh R, Xu J, Berger B (2008) Global alignment of multiple Cross-References
protein interaction networks with application to functional
orthology detection. PNAS 105(35):12763–12768
Suthram S, Sittler T, Ideker T (2005) The plasmodium protein
▶ Multicore Computing
network diverges from those of other eukaryotes. Nature
438:108–112
Yan X, Mehan M, Huang Y, Waterman MS, Yu PS, Zhou XJ
(2007) A graph based approach to systematically reconstruct
human transcriptional regulatory modules. Bioinformatics Complementarity Determining Region
23(13):i577–i586
Zaslavskiy M, Bach F, Vert JP (2009) Global alignment of
protein-protein interaction networks by graph matching ▶ Complementarity Determining Region (CDR-
methods. Bioinformatics 25(12):i259–i267 IMGT)
Complementarity Determining Region (CDR-IMGT) 451 C
IMGT ®, the international ImMunoGeneTics informa-
Complementarity Determining Region tion system ® (http://www.imgt.org) (▶ IMGT® Infor-
(CDR-IMGT) mation system).
In a V-LIKE-DOMAIN (V domain of ▶ Immuno-
Marie-Paule Lefranc globulin Superfamily (IgSF) other than IG and TR),
Laboratoire d’ImmunoGénétique Moléculaire, Institut the three loops BC, C’C” and FG correspond,
de Génétique Humaine UPR 1142, Université structurally, to the three CDR-IMGT of a C
Montpellier 2, Montpellier, France V-DOMAIN, and are delimited by anchors at the
same positions, 26 and 39, 55 and 66, 104 and 118,
respectively.
Synonyms Amino acid positions of the CDR-IMGT of a
V-DOMAIN have always the same number according
CDR-IMGT; Complementarity determining region to the ▶ IMGT unique numbering for V domain
(Lefranc et al. 2003). This allows to define in an
▶ IMGT Collier de Perles the lengths of the CDR-
IMGT. These characteristics, based on the ▶ IMGT-
Definition ONTOLOGY concepts, are managed in the ▶ IMGT ®
information system databases and tools. Starting from
A Complementarity determining region (CDR-IMGT) amino acid sequences, the CDR-IMGT lengths are
is a loop region of a V-DOMAIN (▶ Variable (V) obtained using the IMGT/DomainGapAlign tool
Domain), delimited according to the IMGT unique (http:www.imgt.org) (Lefranc 2009; Ehrenmann et al.
numbering for V domain (Lefranc et al. 2003). There 2010).
are three CDR-IMGT in a V-DOMAIN: CDR1-IMGT The lengths of the three CDR-IMGT lengths of
(loop BC), CDR2-IMGT (loop C0 C00 ), and CDR3- a V-DOMAIN are shown between brackets and sep-
IMGT (loop FG). arated with dots. For example, the CDR-IMGT
In a V-DOMAIN (V domain of the immunoglobu- lengths of the V-DOMAIN displayed in Fig. 1 are
lins (IG) or antibodies and T cell receptors (TR)), [8.10.12]. The CDR-IMGT lengths of a basic
the amino acids of the CDR-IMGT bind to an V domain without gaps are [12.10.13]. For CDR3-
antigen (▶ Epitope) and confer the specificity to IMGT with more than 13 amino acids, additional
the IG and TR (▶ Paratope, ▶ IMGT-ONTOLOGY, positions are added at the top pf the loop (Lefranc
SpecificityType). The first two CDR-IMGT are part of et al. 2003).
the V-REGION (encoded by a ▶ variable (V) gene), The CDR-IMGT lengths of therapeutic
whereas the CDR3-IMGT corresponds to the junction antibodies are required for the World Health
and results from the rearrangement between a V gene Organization/International Nonproprietary Names
and a ▶ joining (J) gene (V-J rearrangement) or (WHO/INN) and are included in the INN definitions
between a V gene, a ▶ diversity (D) gene, and a J (Lefranc 2011). The standardized delimitation
gene (V-D-J rearrangement) (▶ Immunoglobulin Syn- of the CDR-IMGT has a crucial importance in anti-
thesis). The two anchors, 2nd-CYS 104 and J-TRP or body engineering and antibody humanization by
J-PHE 118 (▶ Framework region (FR-IMGT)), are CDR grafting: it allows to precisely delimit the
part of the JUNCTION but do not belong to the CDR amino acids from the original antibody
CDR3-IMGT. (murine, rat, etc.) that need to be grafted on an
Analysis of the JUNCTION and of the CDR3- human antibody framework, in order to preserve the
IMGT of V-DOMAIN nucleotide sequences is specificity of the original antibody in the therapeutic
performed by IMGT/JunctionAnalysis, integrated in monoclonal antibody, while decreasing its immuno-
IMGT/V-QUEST, the nucleotide sequence analysis genicity (Lefranc 2009, 2011; Ehrenmann et al.
tool (Lefranc 2009; Ehrenmann et al. 2010) of 2010).
C 452 Complementarity Determining Region (CDR-IMGT)

Complementarity Determining Region (CDR-IMGT), representation. (e) Spacefill 3D representation. The CDR1-
Fig. 1 Complementarity determining regions (CDR-IMGT) of IMGT, CDR2-IMGT and CDR3-IMGT regions are colored in
a V-DOMAIN. The CDR-IMGT correspond to three loops in red, orange and purple, respectively (IMGT Color menu). The
IMGT Colliers de Perles (on one layer, and on two layers with CDR-IMGT lengths are [8.10.12]. The antiparallel beta strands
hydrogen bonds) and in three-dimensional (3D) structures. (A to G) correspond to the ▶ Framework Region (FR-IMGT).
(a) Amino acid sequence (http://www.imgt.org) of an antibody Anchors positions, shown as squares in B and C (26 and 39, 55
V-DOMAIN (VH of the IG-Heavy chain) with delimitation of and 66, 104 and 118), and as spheres in D, belong to the
the three CDR-IMGT. (b) IMGT Collier de Perles on one layer. FR-IMGT. Hydrophobic amino acids (hydropathy index with
(c) IMGT Collier de Perles on two layers. Hydrogen bonds positive value) and tryptophan (W) found at a given position in
between the amino acids of the C, C0 , C00 , and F and G strands more than 50% of analysed IG and TR sequences are shown
and those of the CDR-IMGT are shown. (d) Ribbon 3D in blue in B and C
Complex Behavior 453 C
Cross-References understand living systems as being somehow both
lawful and yet also unpredictable. Yet in her book of
▶ Diversity (D) Gene the same name, Mitchell (2009) notes the current lack
▶ Epitope of a consensual definition of what precisely constitutes
▶ Framework Region (FR-IMGT) a complex system. She cites definitions of complexity
▶ IMGT Collier de Perles as the algorithmic information content of a system
▶ IMGT Unique Numbering (i.e., the minimum coding length of an algorithm C
▶ IMGT ® Information System capable of reconstructing the system), as the fractal
▶ IMGT-ONTOLOGY dimension of a system, and as the level of composi-
▶ Immunoglobulin Superfamily (IgSF) tional hierarchy possessed by a system’s structure. She
▶ Immunoglobulin Synthesis concludes from this potpourri of definitions that
▶ Joining (J) Gene “notions of complexity [. . .] have many interacting
▶ Paratope dimensions and probably can’t be captured by
▶ IMGT-ONTOLOGY, SpecificityType a single measurement scale.”
▶ Variable (V) Domain These definitions reflect a marked ambiguity in
▶ Variable (V) Gene what precisely we mean when we describe life
as being complex. Definitions in terms of
algorithmic information content regard complexity
References as complicatedness: Humans are more complex than
bacteria because they possess a more complicated
Ehrenmann F, Duroux P, Giudicelli V, Lefranc M-P (2010) Stan- behavioral repertoire. On the other hand, definitions
dardized sequence and structure analysis of antibody using
in terms of compositional hierarchy tackle a more
IMGT ®. In: Kontermann R, D€ ubel S (eds) Antibody engi-
neering, vol 2, 2nd edn. Springer-Verlag Berlin, Heidelberg, elusive quality of living systems as organized:
pp 11–31, chapter 2 Humans are more complex than bacteria because
Lefranc M-P (2009) Antibody database and tools: the IMGT ® their wide behavioral repertoire is organized around
experience. In: Zhiqiang An (ed) Therapeutic monoclonal
overarching purposes.
antibodies: from bench to clinic. Wiley, Hoboken,
New Jersey, pp 91–114, chapter 4 A synthesis of these quite differing criteria has
Lefranc M-P (2011) Antibody nomenclature: from IMGT- emerged from Kauffman’s (1993) analysis of the
ONTOLOGY to INN definition Mabs 3:1–2 behavior of a family of dynamical systems called NK
Lefranc M-P, Pommié C, Ruiz M, Giudicelli V, Foulquier E,
Boolean networks. When the connectivity K of
Truong L, Thouvenin-Contet V, Lefranc G (2003) IMGT
unique numbering for immunoglobulin and T cell receptor a Boolean network is below a certain threshold, the
variable domains and Ig superfamily V-like domains. Dev network exhibits ordered behavior, converging to
Comp Immunol 27:55–77 a temporally stable, globally coordinated attractor
state, while at higher values of K, the behavior
becomes chaotic, veering unpredictably between topo-
logically uncoordinated states. At intermediate values
Complex Behavior of K the network enters the complex regime, in which
the network divides into connected regions of compar-
Niall Palfreyman atively stable, topologically coordinated states, yet
Biotechnology and Bioinformatics, Weihenstephan- intermittently redistributing these coordinated states
Triesdorf University of Applied Science, Freising, and regions.
Germany Such complex behavior is certainly visually remi-
niscent of living systems – it is neither random nor
predictably ordered, but is characterized by dynamic
Definition metastability. A dynamical system is metastable with
respect to some parameter l if its dynamics remain
Since the 1990s, the term complexity has played an relatively stable over large variational regions of l, but
increasingly prominent role in our attempts to bifurcate abruptly across specific critical values of l.
C 454 Complex EGI and Enriched Complex EGI

Complex Behavior, 2.5


Fig. 1 Complex speciation Group1
2 Group2
of birds under a symmetrical
Group3
evolutionary rule 0.5 Group4

0.5

−0.5

−1
0 1 2 3 4 5 6 7 8 9 10 11 12

According to this account, complex behavior is


therefore behavior which is metastable with respect Complex EGI and Enriched Complex EGI
to one or more parameters which may vary either
spontaneously or in response to environmental inputs. Virginio Cantoni1,2, Alessandro Gaggia1 and
To put it differently: Complex behavior is the ability to Luca Lombardi1
1
switch between different, reliably reproducible, varie- Department of Computer Engineering and Systems
ties of behavior in response to varying environmental Science, University of Pavia, Pavia, Italy
2
conditions. Computational Biology, KTH Royal Institute of
For example, the orbiting of the Moon round the Technology, Stockholm, Sweden
Earth is not complex – it is too predictable. The motion
under gravity of three stars of equal mass about each
other is also not complex, being in general chaotic Definition
and completely unpredictable. By contrast, the
self-organization exhibited by Golubitsky and Stewart The most important advantage of the ▶ Extended
(2002, pp. 3–6: see Fig. 1) model of sympatric speci- Gaussian Image, the position invariance, is also its
ation is complex. Four distinct interacting groups of principal drawback: as a consequence, the translation
organisms evolve according to a single symmetrical of a 3D target object cannot be recovered. A solution to
rule parameterized by an exogenous parameter l. this problem is to represent object surfaces by their
Although initially the groups have almost identical EGI adding a support function: the signed distance of
traits, as the value of l varies smoothly over time, the oriented tangent plane from a predefined origin. In
this symmetry collapses, and the four trait groups this new representation, called Complex EGI (CEGI)
self-organize into two distinct species with clearly (Kang and Ikeuchi 1991) (Kang and Ikeuchi 1993), the
distinguishable traits. While the final trait value of weight at each patch, representing a discretized cell, is
any particular group depends sensitively on initial a complex number that encodes both area and distance
conditions, the bifurcation into the two species is (Fig. 1). The magnitude of the complex number is the
predictable. surface area of the object associated with that surface
normal, while the phase, which supports the displace-
ment information, is the signed distance of the surface
References patch from a predefined origin in the direction of its
normal.
Golubitsky M, Stewart I (2002) The symmetry perspective. The complex weight associated with the surface
Birkh€auser, Basel patch Ai is Ai,nkejdk, where Ai,nk is the area of patch Ai
Kauffman SA (1993) The origins of order. Oxford University
Press, New York
with the outward normal nk, and dk is the normal
Mitchell M (2009) Complexity: a guided tour. Oxford University distance of the plane within which Ai,nk lies to an
Press, Oxford assigned origin. For any given point in the CEGI
Complex EGI and Enriched Complex EGI 455 C
corresponding to normal nk, the magnitude of the distance from each one from the coordinate planes.
point’s weight is IAnkejdkI. Ank is independent of the The magnitude of the ECEGI representation is trans-
normal distance, and if the object is convex, the distri- lation-invariant. The ECEGI can be viewed as three
bution of Ank corresponds to the conventional EGI independent complex Gaussian spheres, each
representation. Equation 1 represents the CEGI com- corresponding to the axis (x, y, z) (Fig. 2).
plex weight: The weight is, in this case, represented by three
!
complex numbers given by (2): C
X
Nk
W ! ¼ Al;nk ejdl;k (1) !
nk
l¼1 X
Nk
W ! ¼ Ai;nk ejXi;k
x;n k
Also in the Enriched Complex Extended Gaussian i¼1
!
Image (ECEGI) (Hu et al. 2010) each patch of the 3D
X
Nk
object surface contributes with a complex weight to the W ! ¼ Ai;nk ejYi;k (2)
y;n k
associated point orientation of the Gaussian sphere. i¼1
!
However, while the CEGI uses only a scalar complex
X
Nk
number, the ECEGI uses a vector of three complex W ! ¼ Ai;nk ejZi;k
z;n k
numbers. The resultant weight is the sum of the i¼1
contributions of all surface patches having the normal
in common, where the exponent is given by the

Cross-References
n1

n1 A1e jdl ▶ Extended Gaussian Image


A1 ▶ Extended Gaussian Image for Pocket-Ligand
Matching

References

Hu Z, Chung R, Fung KSM (2010) EC-EGI: enriched complex


EGI for 3D shape registration. Machine vision and applica-
tions vol 21, pp 177–188
Complex EGI and Enriched Complex EGI, Fig. 1 Complex Kang SB, Ikeuchi K (1991) Determining 3-D object pose using
extended Gaussian image the complex extended Gaussian image. In: IEEE Computer

n1 n1
A1eidxy A1eidxz
n1
A1

n1
A1eidyz

Complex EGI and Enriched


Complex EGI,
Fig. 2 Enriched complex
extended Gaussian image
C 456 Complex System

Society conference on computer vision and pattern recogni- complex systems. The effects of an element’s behavior
tion, Maui, HI, pp 580–585 are fed back to in such a way that the element itself is
Kang S, Ikeuchi K (1993) The complex EGI, a new representa-
tion for 3D pose determination. IEEE Trans Pattern Anal altered (Kitano 2007). Emergent behavior in complex
Machine Intell 15(7):707–721 systems arises if all constituents of the system observed
on one level cannot explain the system properties on
a coarser or higher level (Walleczek 2000).

Complex System
References
Roberta Alfieri and Luciano Milanesi
Institute for Biomedical Technologies – CNR Kitano H (2007) Towards a theory of biological robustness. Mol
Syst Biol 3:137
(Consiglio Nazionale delle Ricerche), Segrate,
Rocha LM (1991) Systems modeling: using metaphors from
Milan, Italy nature in simulation and scientific models, Los Alamos
National Laboratory, Nov 1999
Walleczek J (2000) Self-organized biological dynamics and
nonlinear control. Cambridge University Press, Cambridge
Synonyms

Highly structured system


Complexity

Definition Marie I. Kaiser


Department of Philosophy, University of Cologne,
A complex system is a system composed of Cologne, Germany
interconnected parts which as a whole exhibit one
or more properties (behavior among the possible
properties) not obvious from the properties of the indi- Synonyms
vidual parts. This characteristic of every system is
called emergence and is true of any system, not just Complication; Intricacy
complex ones.
Complex systems are usually open systems since
they exist in a thermodynamic gradient and dissipate Definition
energy. In other words, complex systems are fre-
quently far from energetic equilibrium: but despite Complex systems are systems which show a range of
this flux, there may be pattern stability (Rocha 1991). different characteristics like the multiplicity of their
Complex systems may have a memory: the history of parts, the nonlinearity and nonadditivity of the inter-
a complex system may be important. Because complex actions between their parts, the sensitivity of their
systems are dynamical systems, they change over time, behavior to initial conditions, their hierarchical orga-
and prior states may have an influence on present nization, their self-organization, and the robustness
states. More formally, complex systems often exhibit and the emergence of their behavior. Some of these
hysteresis. Complex systems may be nested since the characteristics are necessary conditions for a system to
components of a complex system may themselves be be complex; others are just typical for many complex
complex systems. systems. The latter is due to the fact that nature exhibits
Generally relationships in a complex system are a variety of different kinds of complexity.
nonlinear. In practical terms, this means a small pertur-
bation may cause a large effect, a proportional effect, or
even no effect at all. In linear systems, effect is always Characteristics
directly proportional to cause. Moreover, relationships
contain feedback loops: Both negative (damping) and In recent decades, the focus of scientific research has
positive (amplifying) feedback are often found in shifted more and more to trying to understand and
Complexity 457 C
handle the complexity of nature. In biology, for some crucial insights of pursuing the first strategy
instance, the reductionistic assumption (▶ Reduction) will be revealed, whereas section “Different Kinds of
that there is a neat, simple gene-protein-trait-fitness Complexity” introduces some classifications of com-
relationship and that the behavior of a biological sys- plexity (second strategy).
tem can be understood merely by studying the parts of Before doing so, it should be stressed that the two
the system in isolation has been rejected. Instead, con- strategies are neither opposite nor incompatible. What
temporary biologists try to account for the complexity distinguishes them is the emphasis they put on the C
of nature by recognizing the “wholeness” (▶ Holism) diversity of complexity. Those who aim at a definition
of biological systems (e.g., Chong and Ray 2002), that of complexity which identifies necessary and jointly
is, by paying attention to the various interactions sufficient conditions for a system to be complex (first
between the parts of a system and to how these parts strategy) focus on the similarities among different
are integrated to the system as a whole. For example, cases of complexity. On the contrary, those who pursue
rather than examining the isolated functions of genes, the second strategy and distinguish different kinds of
biologists study the dynamics of entire ▶ gene regula- complexity put more emphasis on the actual variety of
tory networks. This focus shift toward complexity ways that systems are complex. However, it is also
issues is not restricted to the biological sciences, but possible to seek after a list of core features of complex
rather, it is a quite general trend. This is why some systems and at the same time account for their diver-
authors speak of a “complex system revolution” sity. For instance, one could abandon the requirement
(Hooker 2011, 6) that has taken place and still goes that these features must be necessary and allow also
on in contemporary science. features on the list that are only typical for many (but
not for all) complex systems. Furthermore, one could
What Is a Complex System? also combine the first and the second strategy by at first
Despite the fact that complexity issues more and more distinguishing different kinds of complexity, and then
gain center stage in several research fields, there exists specifying these types of complexity by identifying
neither a unified science of complex systems nor different sets of features that are associated with
a consensus about what complexity is and what makes these different kinds.
a system complex. Rather, the characterizations of com- What are the features that are said to be necessary
plexity partially vary from field to field and from author for complexity or that are at least typical for many
to author. There are two different ways of how one complex systems? The following main features are
could react to this situation. On the one hand, one widely associated with complex systems (which is
could argue that there is still much empirical and con- not to say that this list is exhaustive; for alternative
ceptual work left to do, and that sometime in the near approaches, see Hooker 2011 or Ladyman et al. 2012).
future, scientists and philosophers would have figured
out how to specify what a complex system is. On the Multiplicity of Parts
other hand, one could point out that the actual disagree- Complex systems typically consist of a large number of
ment on how to characterize complexity is not due to parts. In some cases, many of these parts are of the same
our insufficient knowledge. Rather, it arises from the or of similar kind (e.g., a swarm of birds is composed of
actual variety of ways that systems are complex. In birds of the same species). Other complex systems are
other words, nature exhibits different kinds of complex- made of components that belong to several different
ity that cannot be captured by a single definition. kinds (e.g., organ systems like the cardiovascular system,
Depending on which claim an author subscribes to, which consist of different kinds of tissues and cells).
he will favor one of two strategies for characterizing
complexity: first, several authors try to specify what Nonlinearity and Nonadditivity of Interactions
complexity is by proposing a list of features, each of The parts of a complex system causally interact
which is necessary and all together are sufficient for (▶ Causality) with each other in order to bring about
a system to be complex (e.g., Hooker 2011; Ladyman a particular behavior of the overall system. It is char-
et al. 2012); the second strategy consists in acteristic for many complex systems that these inter-
distinguishing different kinds of complexity (e.g., actions are nonlinear, more precisely, that the behavior
Mitchell 2003; Kuhlmann 2011). In this section, of the system is described by a mathematical function
C 458 Complexity

that is nonlinear (e.g., because the variable of interest is remains stable under a certain range of disturbances
squared). Nonlinear interactions frequently involve of the parts of the system.
positive or negative ▶ feedback.
Nonlinear dynamical equations are characterized by Emergence
nonadditivity, which means that numerical combina- The behavior of complex systems is often said to be
tions of solutions are in general not solutions. The emergent (▶ Emergence) in the sense of being
feature of nonadditivity is one reason why complex unexpected, unpredictable, or unexplainable on basis
systems are said to be “more than the sums of their of the knowledge about the parts of the system in
parts.” To put it another way, complex systems are not isolation (more on this in section “Emergence and the
aggregative systems (Wimsatt 2007) because their Limits of Reductionism”).
behavior does not remain invariant under interchanging
their parts, under changes in the number of parts, and Different Kinds of Complexity
under decomposing and rearranging their parts, and the A second way to characterize the phenomenon of com-
interactions among their parts are not linear. plexity is to distinguish different kinds of complexity.
At least two ▶ classifications of complexity are worth
Sensitivity to Initial Conditions being mentioned here. The first one has been intro-
The behavior of some complex systems is sensitive to duced by Mitchell (2003, 2009). She distinguishes
initial conditions, that is, their behavior differs largely three different kinds of complexity in biology: consti-
in the long run even if there are only minuscule differ- tutive complexity, dynamic complexity, and evolved
ences at the beginning. This is due to the fact that the complexity. “Constitutive complexity” refers to the
nonlinearity of the interactions between the system’s complexity of the structure that biological systems
parts allows small differences in the system state to be (like organisms) display. The structure of biological
amplified (▶ Amplification) into large differences in systems is complex if the system as a whole is being
the subsequent system trajectory. formed of numerous parts in nonrandom, non-simple
▶ organization (see also Simon 1962; Wimsatt 2007).
Hierarchical Organization “Dynamic complexity” concerns the complexity of the
Complex systems typically possess a hierarchical processes that biological systems are engaged in, for
nature (▶ Hierarchy), that is, they are parts of higher- instance, the developmental or evolutionary processes
level systems and they consist of parts that are them- that organisms undergo. Finally, “evolving complex-
selves (lower-level) systems which, in turn, may also ity” refers to the domain of alternative adaptive solu-
be composed of subsystems, and so on. For instance, tions that are available for certain adaptive problems. If
organisms consist of organ systems that are composed there exists a wide diversity of forms in life which have
of tissues which consist of cells, etc. This feature is evolved as solutions to the same adaptive problem,
also referred to as the multilevel character of complex there is said to be much evolving complexity.
systems. Recognizing the hierarchical nature of com- More recently, Kuhlmann (2011) has argued that it
plex systems is important for understanding how com- is important to distinguish compositional complexity
plexity can evolve (Simon 1962). from dynamical complexity. Although he uses similar
words as Mitchell, the two kinds of complexity he
Self-organization identifies are different from hers. This difference
Complex systems are self-organizing systems. might (at least partly) be due to the fact that Kuhlmann
▶ Self-organization means that an initially disordered is more interested in complex systems from physics
system becomes more organized or ordered because of and socio-economics, rather than from biology. What
the interactions of the parts of the system. The process Kuhlmann means by compositional complexity of
of self-organization is spontaneous, that is, it is not a system is the complicated organization of the setup
centrally controlled by any agent or subsystem. conditions of a system, that is, the fact that a system
consists of many parts and that the individual behav-
Robustness iors of the parts as well as their organization determine
The organization or order of a complex system the overall behavior of a system. Somehow surpris-
is said to be robust (▶ Robustness), that is, it ingly, Kuhlmann emphasizes that, in this case, the
Complexity 459 C
parts of compositionally complex systems interact developed which decouple the concept of
with each other in a linear fashion, which is why the a mechanistic explanation from the concept of
behavior of the system is a summation of the behaviors a reductive explanation because they allow higher-
of its parts. Contrary to this, in the case of dynamical level and contextual factors to figure center stage in
complexity, most facts about the nature of the parts of a mechanistic explanation (e.g., Bechtel 2008).
the system as well as their initial arrangement have no
bearing on the behavior of the system. Rather, what Causality and Complex Systems C
makes these kinds of systems complex is that although A second philosophical question concerns the causal
they are compositionally simple and the rules that structure of complex systems (▶ Causality). Do any
determine their dynamics are simple (but nonlinear), challenges and constraints for a philosophical theory of
they show patterns that are factually unpredictable and causation arise from the peculiarities of the causal
qualitatively unexpectable. structure of complex systems?
A possible challenge might be the existence of
Further Philosophical Issues downward causation, which is a topic that is also
Explaining Complexity frequently discussed in science itself. Downward cau-
One important philosophical question that arises in sation encompasses cases, in which entities from
the context of complex system research concerns the a higher level of organization causally affect entities
nature of scientific ▶ explanation. Do the explana- on a lower level (▶ Interlevel Causation). Downward
tions of the behavior of complex systems constitute causation between a whole (i.e., the higher-level
a special kind of explanations, as, for instance, Mitch- entity) and its parts (i.e., the lower-level entities) is
ell (2009) and Kuhlmann (2011) argue? Or do they regarded as particularly problematic because a central
belong to one or more of the established kinds of assumption in most theories of causation is that cause
explanation (like causal-mechanistic explanation, and effect must not be identical. However, if it is true
covering-law explanation, functional explanation, that the relation between a whole and the set of its
mathematical explanation, etc.)? A good starting organized parts is one of identity, as one could argue,
point for addressing this question is the widespread the whole cannot be causally related to its parts.
claim of biologists that reductionistic-mechanistic Other challenges and constraints of a theory of
research strategies (▶ Reduction; ▶ Mechanism) are causation that may arise from the investigation of
inappropriate or, at least, insufficient to investigate complex systems concern the context-sensitivity of
and explain the behavior of complex systems (e.g., their behavior and the nonlinearity of the interactions
Gallagher and Appenzeller 1999). This suggests that between their parts. The latter often involve cyclic
explanations that are developed in sciences of com- causal relations like ▶ feedback, which constitute
plex systems are non-reductive (▶ Explanation, a challenge for some theories of causation (e.g., for
Reductive) and non-mechanistic. However, it is far causal graph theories).
from clear what exactly this claim amounts to and
whether it is true for all explanations in this field (e.g., Emergence and the Limits of Reductionism
whether it also applies to computational explanations One of the features that are characteristic of complex
that can be found, for instance, in systems biology). systems is that their behavior is said to be emergent
One reason why the explanation of the behavior of (see section “What Is a Complex System”). However,
complex systems might be non-reductive is that these despite its ubiquity, the notion of ▶ emergence is left
explanations frequently appeal not only to the parts of notoriously vague. Most scientists use the term “emer-
the system, but also to contextual factors and higher- gence” in an epistemic sense, that is, they call
level factors (which is why they are said to be a behavior (or property) of a system emergent if it is
multilevel explanations; Mitchell 2009). Moreover, unpredictable, unexpectable, or unexplainable on the
the behavior of complex systems cannot be explained basis of present knowledge about the behavior (or
by referring only to the parts of the system in isolation properties) of its parts in isolation.
(which is, in turn, typical for reductive explanations; Others want to understand emergence ontologically
Kaiser 2011). However, in the last 15 years, several and link it to irreducibility. According to them, study-
accounts of mechanistic explanation have been ing emergent behaviors of complex systems reveals the
C 460 Complication

limits of reductionism (▶ Reduction). More precisely, Bechtel W, Richardson RC (2010) Discovering complexity.
it discloses the conditions under which the (exclusive) Decomposition and localization as strategies in scientific
research. MIT Press, Cambridge, MA
application of reductive research strategies (like Gallagher R, Appenzeller T (1999) Beyond reductionism.
decomposition, simplification of the system’s context, Science 284:7
and investigating parts in isolation; Wimsatt 2007; Hooker C (ed) (2011) Philosophy of complex systems. Elsevier,
Kaiser 2011) is not adequate any more. In other Amsterdam (¼ Handbook of the philosophy of science; 10)
Kaiser MI (2011) Limits of reductionism in the life sciences.
words, the emergent behavior of a complex system Hist Philos Life Sci 33:389–396
cannot be understood by decomposing the system Kuhlmann M (2011) Mechanisms in dynamically complex
into its parts, by examining the parts in isolation (i.e., systems. In: McKay Illari P, Russo F, Williamson J (eds)
separated from the original system), and by ignoring Causality in the sciences. Oxford University Press, Oxford,
pp 880–906
the context of the system. This is, for instance, due to Ladyman J, Lambert J, Wiesner K (2012) What is a complex
the fact that the parts of a complex system are orga- system? Eur J Philos Sci (online first)
nized or “integrated” (Bechtel and Richardson 2010) in Mitchell SD (2003) Biological complexity and integrative
such a complicated way that their behavior is co- pluralism. Cambridge University Press, Cambridge
Mitchell SD (2009) Unsimple truths. science, complexity, and
determined by the system’s organization. Furthermore, policy. University of Chicago Press, Chicago/London
the behavior of several complex systems heavily Simon HA (1962) The architecture of complexity. Proc Am
depends on certain parts of their context (i.e., it is Philos Soc 106:467–482
non-robust under variations of these contextual Wimsatt WC (2007) Re-engineering philosophy for limited
beings. Piecewise approximations to reality. Harvard
factors), which is why the context of the system cannot University Press, Cambridge, MA
be ignored altogether or grossly simplified.

Cross-References Complication

▶ Amplification ▶ Complexity
▶ Causality
▶ Classification
▶ Emergence
▶ Explanation in Biology Components of Variance
▶ Explanation, Reductive
▶ Feedback Regulation ▶ Experimental Design, Variability
▶ Gene Regulatory Networks
▶ Hierarchy
▶ Holism
▶ Interlevel Causation Composite Motifs
▶ Mechanism
▶ Organization Jianhua Ruan
▶ Reduction Department of Computer Science, University of Texas
▶ Robustness at San Antonio, San Antonio, TX, USA
▶ Self-Organization

Definition
References A set of transcription factor binding motifs, usually
located close to each other on the chromosome, asso-
Bechtel W (2008) Mental mechanisms. philosophical perspec-
tives on cognitive neuroscience. Taylor and Francis, New ciated with a cooperative set of transcription factors. It
York/London is also called cis-regulatory modules.
Computational Autopoiesis 461 C
demarcation were accepted, then autopoietic organi-
Computational Autopoiesis zation would be a required property of all properly
biological systems, that is, of the entire domain of
Barry McMullin systems biology. Computational autopoiesis is
The Rince Institute, Dublin City University, Dublin, concerned with testing the conjecture by building
Ireland computational universes (virtual worlds) specifically
designed to facilitate embedding of systems with C
autopoietic organization and investigating the extent
Definition to which these can then exhibit lifelike phenomenol-
ogy within those universes (including dynamic
Autopoiesis (“self-production”) denotes a particular self-maintenance, reproduction, and evolutionary
class of “circular” system organization. A system is growth of complexity). Computational autopoiesis is
said to be autopoietic if its organization satisfies two commonly classified as a subfield within ▶ Artificial
separate, but intertwined, “closure” properties: Life.
• Closure in production: The system is composed of
components which give rise to (realize or instanti- The Minimal Model
ate) processes of production which, in turn, collec- Partly because the definition of autopoiesis was still
tively produce more of those same components informal and qualitative, Maturana and Varela offered
• Closure in space: The self-construction of what they called a concrete “minimal model” of
a boundary between the system and the world in autopoietic organization. With this they put forward
which it is embedded, yet from which it distin- a relatively abstract, but still “chemical-like,” world,
guishes itself incorporating a minimal set of chemical species and
Computational autopoiesis denotes the study of reactions that would still suffice to allow their organi-
autopoietic systems realized in computational form – zation into an autopoietic system. With Ricardo Uribe,
that is, embedded in artificial, computer-generated, they took the highly original (for the time) further step
virtual worlds. of actually implementing this minimal model in the
form of a computer simulation of their abstract chem-
ical world (Varela et al. 1974).
Characteristics The minimal (computer-simulated) chemistry takes
places in a discrete, two-dimensional, space. Each
Motivation position in the space is either empty or occupied by
The term autopoiesis was coined in the early 1970s by a single particle. Particles generally move in random
Chilean biologists Humberto Maturana and Francisco walks in the space. There are three distinct particle
Varela. They introduced it in an attempt to character- types (chemical species), engaging in three distinct
ize what they saw as a decisive organizational dis- reactions:
tinction between living and nonliving systems – • Production: Two substrate (S) particles may react,
namely, that even the simplest living system (such in the presence of a catalyst (K) particle, to form
as a bacterial cell) has the capacity both to regenerate a link (L) particle.
all its own components and also to demarcate itself, as • Bonding: L particles may bond to other L particles.
a distinct system, from its surrounding environment. Each L particle can form (at most) two bonds, thus
In their view, this entails an essential self-referential allowing the formation of indefinitely long chains,
circularity in the organization of living systems; and which may close to form membranes. Bonded
they conjectured that this self-referential circularity is L particles become immobile.
at least a necessary (though hardly sufficient) condi- • Disintegration: An L particle may spontaneously
tion for the distinctive phenomenology of biological disintegrate, yielding two S particles. When this
systems. This conjecture is clearly relevant to what is occurs, any bonds associated with the L particle
now called systems biology insofar as if this are destroyed also.
C 462 Computational Autopoiesis

model, there was also an additional technical feature


(“chain-based bond inhibition”) which was not made
explicit in the published algorithm but was, in fact,
necessary to the claimed autopoietic operation. This
oversight was subsequently identified through the fail-
ure of some independent attempts to reproduce the
originally described results. The issue was eventually
clarified and resolved (McMullin 2004).

Elaborations
The earliest systematic elaborations of the minimal
model of computational autopoiesis were developed
by Milan Zeleny (1978). In particular, he reported
such phenomena as growth, change in shape, oscilla-
tion in chemical activity, and self-reproduction of
autopoietic entities. Prima facie, these represented
important demonstrations of progressively richer, life-
like, phenomena within the framework of computa-
tional autopoiesis. However, the algorithmic
Computational Autopoiesis, Fig. 1 Minimal autopoietic sys- (“chemical”) basis for these phenomena was not fully
tem embedded in 2D computational model. Small hollow detailed in the published reports. Technological limi-
squares are substrate (S); solid disk is catalyst (K); large hollow tations of the time meant that the source code (the
squares with bonds are membrane/link (L) particles
definitive documentation) was not widely distributed,
and, indeed, all copies are now believed to have been
Chains of L particles are permeable to S particles lost. The qualitative descriptions that are available
but impermeable to K and L particles. Thus, a closed suggest that, in at least some cases, the original
chain, or membrane, which encloses K or L particles constraints of local interaction, random motion of
effectively traps such particles. particles, and time-independent particle dynamics
The basic autopoietic system embedded in this may have been substantially relaxed. If that were so,
world (Fig. 1) consists of a closed chain (membrane) the interest of these results would be seriously
of L particles enclosing one or more K particles. diminished – in the sense that the more complex
Because S particles can permeate through the mem- “macroscopic” phenomena may have been, in effect,
brane, there can be ongoing production of L particles. implicitly programmed into individual microscopic
Since these cannot escape from the membrane, this particles. In any case, this particular line of elaboration
will result in the buildup of a relatively high concen- was not pursued further.
tration of L particles. On an ongoing basis, the mem- Some years later, Breyer et al. (1998) described
brane will rupture as a result of disintegration of another series of developments beyond the minimal
component L particles. Because of the high concentra- autopoiesis model. They first relaxed the original
tion of L particles inside the membrane, there should restrictions on the motion of bound L particles to
be a high probability that one of these will drift to the allow the formation of flexible chains and, indeed,
rupture site and effect a repair, before the K particle(s) membranes. Allowing multiple, doubly bonded,
escape, thus reestablishing precisely the conditions L particles to occupy a single lattice site further
allowing the buildup and maintenance of that high enhanced membrane motion. Adopting more sophisti-
concentration of L particles. cated “bond rearrangement” interactions then allows
Practical computer simulation experiments chain fragments formed within the cell to be dynami-
demonstrated that such dynamic, self-maintaining, cally integrated into the membrane. These mechanisms
autopoietic systems can indeed both emerge and per- together obviate the need for the chain-based bond
sist as discrete, self-demarcated, composite individuals inhibition mechanisms – since now, even if
embedded within this abstract world. In the original L particles spontaneously bond in the interior of the
Computational Autopoiesis 463 C
cells, the oligomers so formed remain mobile and, Related Concepts
through bond rearrangement, can still successfully Computational autopoiesis is distinct from, but clearly
function to repair membrane ruptures. In this way, related to several other, parallel, developments.
the model was reported as successfully supporting The chemoton concept of Tibor Gánti (2003),
cell growth. though apparently developed quite independently,
More recently, a variation of the minimal model has shares significant features with cellular autopoiesis.
been presented which demonstrates systematic motion It is again a proposal for an abstract, “minimal” cell, C
of the autopoietic cell in a manner reminiscent of consisting of a collectively autocatalytic network of
chemotaxis (Suzuki and Ikegami 2008). reactions which is enclosed within a membrane,
A specific criticism of the minimal model is that it which is also generated and maintained by the reac-
provides no mechanism for production of further tion network. It differs from the minimal autopoiesis
K particles (within a functioning autopoietic cell). model in explicitly including a “genetic subsystem.”
This is argued to be both a logical and practical defi- It is also rather more detailed in its analysis of the
ciency. Logical because, prima facie, it conflicts with required chemical dynamics (kinetics, etc.) and aims
the stated requirement of autopoiesis for closure of at supporting self-reproduction by growth and fission
production. But perhaps more importantly, from even in the minimal version. The chemoton has been
a practical point of view, in the absence of ongoing subject to various computer simulation studies. How-
production of K, then self-reproduction of this form of ever, these have been based on an ordinary differen-
autopoietic entity is impossible; and consequently, tial equation approach rather than spatially
there is also no possibility for spontaneous Darwinian distributed molecular (particle) models. This means
evolution among variant lineages of such entities. that, in particular, the distributed, spatial, dynamics
Breyer et al. (1998) did address this issue to the of membrane growth, fission, and individuation have
extent of presenting a number of specific mechanisms not been substantively modeled in the chemoton
for production of further K particles but did not pursue context.
that to the demonstration of autopoietic entities actu- Another related line of work involves much more
ally capable of sustained reproduction (i.e., which physicochemically realistic computer models of mini-
could continue up to the limit of available resources mal cellular structures. This line of attack poses some
of space and particles). scaling difficulties, as the computational demands of
Further substantive progress in demonstrating evo- physicochemical realism rise very rapidly. On the
lution of computational autopoietic systems has (to other hand, the cost of computation continues to fall,
date) not generally focused on direct elaboration of so this may well be a very fruitful domain of further
the original minimal model but instead has tended to research in the near future.
shift to computer modeling at a more coarse-grained Finally, there are specific conceptual connections
level. This has been at least partially driven by reasons between the idea of autopoiesis and Robert Rosen’s
of computational cost in the simulations. In the original metabolism-repair (MR) systems and “closure under
model, each lattice site can generally be occupied by at efficient causation” (Rosen 1991). However, it must be
most one particle: It is fine grained to the single mol- noted that Rosen explicitly argued that his form of
ecule level. In the so-called lattice artificial chemistry closure could not, even in principle, be embedded within
(LAC) models, by contrast, each single lattice site is a purely computational system. By contrast, computer
permitted to host many particles, of many distinct realization has been an explicit exemplar of autopoietic
molecular species. Using this approach, a variety of closure from the very start. The relationship between
additional phenomena have been reported, including autopoiesis (especially in its computational realizations)
self-reproducing autopoietic cells, with (limited) heri- and closure under efficient causation is, accordingly, an
table variation and Darwinian evolution; for example, issue of continuing active research and debate.
selection between varieties of catalyst particles having
different rates of membrane particle generation (Ono
and Ikegami 2002). Some example phenomena, Cross-References
including self-reproduction, have also been reported
in three-dimensional LAC models. ▶ Artificial Life
C 464 Computational Creativity

References Characteristics

Breyer J, Ackermann J, McCaskill J (1998) Evolving reaction- Understanding brain processes behind creativity and
diffusion ecosystems with self-assembling structures in thin
modeling them using computational means is one of
films. Artif Life 4:25–40
Gánti T (2003) The principles of life. Oxford University Press, the grand challenges for systems biology. Computa-
Oxford tional creativity is a new field, inspired by cognitive
McMullin B (2004) Thirty years of computational autopoiesis: psychology and neuroscience. In many respects,
a review. Artif Life 10:277–295. doi:10.1162/
human-level intelligence is far beyond what artificial
1064546041255548, http://alife.rince.ie/bmcm-alj-2004/
Ono N, Ikegami T (2002) Selection of catalysts through intelligence can provide now, especially in regard to
cellular reproduction. In: Standish R, Bedau MA, Abbass the high-level functions, involving thinking, reason-
HA (eds) Proceedings of the 8th international conference ing, planning, and the use of language. Intuition,
on artificial life (ALIFE 8). MIT Press, Sydney, Australia,
pp 57–64. http://web.archive.org/web/20040518183545/
insight, imagery, and creativity are important aspects
http://parallel.hpc.unsw.edu.au/complex/alife8/proceedings/ of all these functions. Computational models show
sub2844.pdf great promise both in elucidating mechanisms behind
Rosen R (1991) Life itself. Columbia University Press, such high-level mental functions, and in applications
New York
Suzuki K, Ikegami T (2008) Shapes and self-movement in
requiring intelligence (Duch 2007).
protocell systems. Artif Life 15(1):59–70. doi:10.1162/ Creativity, defined by Sternberg (1998) as “the
artl.2009.15.1.15104, http://dx.doi.org/10.1162/artl.2009.15. capacity to create a solution that is both novel and
1.15104 appropriate,” has often been understood in a narrow
Varela FJ, Maturana HR, Uribe R (1974) Autopoiesis: the orga-
nization of living systems, its characterization and a model.
sense, with a focus on big discoveries, inventions, and
Biosystems 5:187–196 creation of novel theories, arts, and music, but it also
Zeleny M (1978) APL-AUTOPOIESIS: experiments in self- permeates everyday activity, thinking, understanding
organization of complexity. In: Progress in cybernetics and language, providing flexible solutions to everyday
systems research, vol III. Hemisphere, Washington,
pp 65–84
problems (R. Richards, in Runco and Pritzke 2005,
pp. 683–688). Creativity research has been pursued
mostly in the domain of philosophy, education, and
psychology, with many research results published in
two specialized journals: Creativity Research Journal
and Journal of Creative Behavior. The “Encyclopedia
Computational Creativity of Creativity” (Runco and Pritzke 2005), written by
167 experts, does not contain any testable neurological
Włodzisław Duch or computational models of creativity. MIT Encyclo-
Department of Informatics, Nicolaus Copernicus pedia of Cognitive Sciences (Wilson and Keil 1999)
University, Toruń, Poland contains only a single page (of about 1,100 pages)
School of Computer Engineering, Nanyang about creativity, does not mention intuition at all, but
Technological University, Singapore, Singapore devotes six articles to logic, appearing in the
index almost 100 times. Logic has never been too
successful in modeling real-thinking processes that
Synonyms rely on intuition and creativity. The interest in research
on computational and neuroscience approaches to
Ideation; Ingenuity; Innovation creativity is thus quite recent.

Creativity from the Psychological and


Definition Neuroscientific Perspective
D.T. Campbell (1960) described creativity as a two-
Computational creativity is the capacity to find solu- stage process of blind variation and selective retention
tions that are both novel and appropriate using compu- (BVSR). This idea is the basis of combinatorial models
tational means. of creative thinking (Simonton 2010). It is also the
Computational Creativity 465 C
basis of evolutionary biological processes, where the not clear whether brain mechanisms behind transfor-
mechanisms of blind variations operate on many mational creativity are really different, requiring
levels, with selective retention due to the increased a change of the rules that are used to define conceptual
fitness in a given context. In this sense, one can say spaces, or is it rather due to the linking of many brain
that viruses, bacteria, and other living organisms patterns that form a new, higher-level complex
exhibit a primitive form of creativity by solving col- representing observations in a more coherent way.
lectively the problem of survival. However, the BVSR Despite the limitations of the current knowledge of C
idea is more general as it does not have to rely on the neural processes that give rise to the cognitive
specific Darwinian mechanisms. It has applications in processes in the brain, it is possible to propose
such diverse fields as immunology, psychiatry, neuro- a testable, neurocognitive model of creative processes.
science, cognitive sciences, memetics, linguistics, Although direct brain imaging of creative thinking has
anthropology, philosophy, and computer science not been done yet, the “Aha!” phenomenon, or insight
(Simonton 2010). Blind variation is never random: it experience (Sternberg and Davidson 1995) during
is structured by specific interactions of basic elements, problem solving, understanding a joke or a metaphor,
from molecular to social, determining probabilities of has been studied using functional MRI and EEG tech-
arising combinations. niques. Brain states during insight were contrasted
Psychological research on creativity has focused on with analytical problem solving that does not require
empirical research with gifted children, distinguishing insight (Kounios and Jung-Beeman 2009). Although
creativity from general intelligence, testing fluency, the insight experience is sudden, it is a culmination of
flexibility, and originality of thought in both visual a series of brain processes. A few seconds before the
and verbal domains (Runco and Pritzke 2005). Suc- insight, an alpha burst (i.e., a sudden high amplitude
cessful intelligence theory separates creative and cog- oscillations in the range of 8–12 Hz) is seen in the right
nitive components of intelligence (Sternberg 1998), occipital cortex (i.e., the visual cortex behind the rear-
with creativity implying not only high quality but most portion of the skull), and about 300 ms before the
also novelty. Creativity is not reducible to cognitive feeling of insight, a burst of gamma activity (in the
thinking skills. The four basic stages of problem solv- 40 Hz range) is observed in the right hemisphere ante-
ing according to the widely used Gestalt model involve rior superior temporal gyrus (i.e., the upper ridge of the
preparation, incubation (that may be followed by temporal lobe cortex, on the side of the brain). Alpha
a period of frustration), illumination (insight), and activity helps to decrease activation of irrelevant cor-
verification of solution, including communication. tex after the information stating the verbal problem has
These stages, not necessarily in the same sequence, been taken in, while gamma burst reflects the connec-
were identified in creative problem solving by individ- tion of distantly related patterns. The right brain hemi-
uals and small groups of people. sphere is able to create more abstract associations
Boden (1991) defined creativity as “a matter of based on meanings, avoiding close associations that
using one’s computational resources to explore, and the left hemisphere is routinely processing. The same
sometimes to break out of, familiar conceptual neural structures are probably involved in creative
spaces.” Concepts are patterns of brain activations thinking. This shows the need for multiple levels of
(Pulverm€ uller 2003; Duch et al. 2008), and exploration representations of concepts that help to constraint the
of conceptual spaces may be linked to transitions search for solutions of problems requiring creativity.
between brain activations. Processing remote, loose Intuition is also a concept difficult to grasp
associations between ideas is responsible for the asso- (Lieberman 2000; Myers 2002) but plays an important
ciative basis of creativity (Mednick 1962; Simonton role in mathematics, science, and general decision
2010). Exploratory creativity is incremental and com- making. It has been defined as “knowing without
binatorial in nature, usually restricted to personal dis- being able to explain how we know.” Intuition relies
coveries (novel only for one person), binding diverse on implicit learning, gaining tacit knowledge without
activity of brain areas in a new way. “Transformational being aware of learning. Insight into structural rela-
creativity” leads to ideas that are new for the whole tions is usually not present, only fast judgment or
humanity, i.e., big paradigm shifts (Boden 1991). It is response based on probability estimation. Social
C 466 Computational Creativity

intuition is the basis of nonverbal communication, and problem solving, although the problem of find a good
can be seen as the phenomenological and behavioral way to represent knowledge domain in which genetic
correlate of knowledge gained through implicit learn- processes operate may be as hard as the original prob-
ing (Lieberman 2000). lem. Other approaches to insight include the “small
The measurement of intuition is based on several world” network model of Schilling (2005) and the
tests and inventories (for example, the Myers-Briggs work on fluid concepts and creative analogies
Type Inventory, or Accumulated Clues Task), but (Hofstadter 1995), with some applications to design.
there is little correlation between them, so the concept A direct attempt to model creative processes in the
of intuition is not well defined from an operational brain is still not feasible, but inspirations from the
point of view. Significant correlations were found BVSR models may be used in a number of ways. The
between the Rational-Experiential Inventory (REI) results of the experimental and theoretical research in
intuition scale and some measures of creativity (Raidl this domain can be summarized in three points:
and Lubart 2001). Such tests reflect rather complex 1. Space: creativity involves neural processes that are
cognitive processes, and it is not clear which brain realized in the space of quasi-stable neural activi-
processes are behind these measures. ties, leading to patterns of activity that reflect rela-
From a computational point of view, it is much tions between concepts in some domain.
easier to create predictive models of data than to pro- 2. Blind variation: priming by concepts that represent
vide explanations (Duch 2007). In particular, it is dif- the problem leads to distributed fluctuating neural
ficult to explain decisions made by neural networks or activity constrained by the strength of associations
similarity-based systems. Using such systems for between patterns of neural activity coding concepts;
learning from partial observations can constrain the this process is responsible for imagination and the
search for solutions, avoiding combinatorial explosion flexible formation of transient novel associations.
that is a main problem in AI, making the reasoning 3. Selective retention: filtering of interesting results,
process feasible. discovering partial solutions that may be useful to
reach goals, amplifying or forming new associa-
Creativity from the Computational Perspective tions; in biological systems, this may involve emo-
Psychology and neuroscience agree that creativity is tional arousal.
a product of ordinary cognitive processes. The lack of The blind-variation process may require some
understanding of detailed mechanisms involved in cre- structuring to be effective. Brainstorming, free associ-
ative activity makes the development of creative com- ations, random stimulation, or lateral thinking have not
puting rather difficult. The need for everyday creativity been very successful in the generation of creative ideas
has been almost completely neglected by the artificial in advertising and product innovation (Goldenberg and
intelligence research community and may be credited Mazursky 2002). However, structured approaches,
for failures of many AI programs. Early attempts to based on higher-order rules and templates, led to
model intuition, insight, and inspiration from the AI excellent results. Computer-generated ideas based on
point of view have been summarized by Simon (1995). templates were rated significantly higher both for
His work has mostly been directed at understanding creativity and originality than the non-template
historical discoveries of scientific laws, as well as the human ideas. The associative processes may in this
search for new scientific knowledge of this kind in case have been guided by general rules. Connectionist
astronomy, physics, chemistry, and biology. Simon models for the generation of ideas within the brain-
made no attempts to connect search-based AI storming context can successfully predict factors that
approaches to processes in the brain. Research in auto- enhance brainstorming productivity. The model of Iyer
matic genetic programming (Koza et al. 2003) can be et al. (2009) is perhaps the most sophisticated, with
credited for useful patentable inventions in automated features, concepts, and cognitive control components
synthesis of antennas, analog electrical circuits, con- as separate neural layers. Ideas emerge in a multilevel,
trollers, metabolic pathways, genetic networks, and modular semantic space from itinerant attractor
other areas. These inventions have been mostly dynamics (i.e., activity of neuronal changes in the
restricted to optimized versions of known designs. itinerant way, attracted toward quasi-stable states
Genetic programming may be capable of creative where it slows down) shaped by context, synaptic
Computational Creativity 467 C
learning, and ongoing evaluative feedback. This model In computational models, these cognitive processes
generates novel ideas by multilevel dynamical search may be implemented in large-scale neural models, but
in various contexts, capturing the interplay between even the simplest approximations give interesting
semantic representations, working memory, focusing results (Duch 2006; Duch and Pilichowski 2007). The
attention on various features, and using reinforcement algorithm involves three major components:
signals. The model is quite useful for the elucidation of 1. An autoassociative memory (AM) structure, trained
the mechanisms of creativity. For example, it has on a large lexicon to learn statistical properties at C
found interesting associations for tourist activities the morphological level, providing the model of
such as “outing and vacation,” depending on the infor- a neural space and storing background knowledge
mation of the context. that is modified (primed) by keywords.
The simplest domain in which creativity is fre- 2. Blind-variation process (imagery), forming new
quently manifested and can be studied in experiments strings from combinations of substrings found in
as well as computational models is the invention and keywords and their synonyms, with probabilistic
understanding of neologisms. Poems by Lewis Carroll constraints provided by the AM to select only lex-
are full of neologisms, but novel words are also in ically plausible strings.
great demand for products, web sites, or company 3. Selective retention ranking the quality of the strings
names. In languages with rich morphological and representing neologisms from a phonological and
phonological compositionality (Latin-based, Slavic, semantic point of view.
and other families), out-of-vocabulary words appear Filters should estimate phonological plausibility
fairly often in conversations. In most cases, the mor- and “semantic density,” or the number of potential
phology of these words gives sufficient information to associations with commonly known words, calculat-
understand their meaning. Given keywords or a short ing the number of substrings in the lexical tokens that
description from which keywords are extracted primes may serve as morphemes in each new candidate
the brain at the phonetic and semantic level. The string. Many factors may be included here: general
structure of the language is internalized in the neural similarity between morphemes, personal biases,
space. Priming leads to blind variation at the level of tweaking phonology for neologisms with interesting
word constituents (syllables, morphemes), creating phonetics. The implementation of this algorithm led
a large number of transient resonant configurations to the generation of neologisms with highest ranks
of neural cell assemblies that remain unconscious. that have actually been used as company or domain
This process explores the space of possibilities that names in about 75% of cases. For example, starting
agree with internalized constraints on the statistical from an extended list of keywords, portal, imagina-
probabilities of phonological structure (phonotactics tion, creativity, journey, discovery, travel, time,
of the language) and morphological structure. Imag- space, infinite, the best neologisms included creatival
ery processes are approximated in a better way by (used by creatival.com), creativery (used by
taking keywords, finding their synonyms to increase creativery.com). Novel neologisms (not found by
the pool of words, breaking words into syllables and the Google search engine) included discoverity, asso-
morphemes, and combining the fragments in all pos- ciated with discovery of something true (verity), and
sible ways. Words that share properties with many linked to many morphemes: disc, disco, discover,
other words (i.e., patterns that code them in the brain verity, discovery, creativity, verity, and through pho-
overlap strongly with patterns for other words) have nology to many others. Another interesting word
a higher chance to win the competition for access to found is digventure, with many associations to dig
awareness. Many variants of words are created around and venture.
the same morphemes. The same word can be used with These examples show that computational
many meanings because the context creates specific approaches to creativity can, at least in restricted
brain activation patterns for this word. Creative brains domains, lead to results that are comparable with
spread the activation to more words associated with human ingenuity, and that blind-variation selective
initial keywords, produce more combinations, retention ideas based on the generalization of evolu-
selecting the most interesting ones using phonologi- tionary processes may be as useful in cognitive science
cal, affective, and semantic filters. as they are in life sciences.
C 468 Computational Data Analysis

References
Computational Data Analysis
Boden M (1991) The creative mind: myths and mechanisms.
Abacus, London
▶ Data-intensive Research
Campbell DT (1960) Blind variation and selective retention in
creative thought as in other knowledge processes. Psychol
Rev 67:380–400
Duch W (2006) Computational creativity. IEEE World congress
on computational intelligence, 16–21 July. IEEE Press, Van-
couver, pp 1162–1169
Computational Identification of
Duch W (2007) Towards comprehensive foundations of com- microRNA Targets
putational intelligence. In: Duch W, Mandziuk J (eds)
Challenges for computational intelligence. Springer, Berlin/ ▶ MicroRNA, Target Prediction
Heidelberg/New York, pp 261–316
Duch W, Pilichowski M (2007) Experiments with computational
creativity. Neural Inform Process Lett Rev 11(4–6):123–133
Duch W, Matykiewicz P, Pestian J (2008) Neurolinguistic
approach to natural language processing with applications Computational Linguistics
to medical text analysis. Neural Netw 21(10):1500–1510
Goldenberg J, Mazursky D (2002) Creativity in product innova-
tion. Cambridge University Press, Cambridge ▶ Natural Language Processing
Hofstadter DR (1995) Fluid concepts and creative analogies:
computer models of the fundamental mechanisms of thought.
Basic Books, New York
Iyer LR, Doboli S, Minai AA, Brown VR, Levine DS, Paulus PB
(2009) Neural dynamics of idea generation and the effects of Computational Methods for Mapping
priming. Neural Netw 22:674–686 B Cell Epitopes
Kounios J, Jung-Beeman M (2009) Aha! The cognitive neuro-
science of insight. Curr Dir Psychol Sci 18:210–216
▶ B Cell Epitope Prediction
Koza JR et al (2003) Genetic programming IV: routine human-
competitive machine intelligence. Kluwer, New York
Lieberman MD (2000) Intuition: a social cognitive neuroscience
approach. Psychol Bull 126(1):109–137
Mednick SA (1962) The associative basis of the creative process.
Psychol Rev 69:220–232 Computational Methods for
Myers DG (2002) Intuition: its powers and perils. Yale University Transcriptional Regulatory Networks
Press, New Haven
Pulverm€uller F (2003) The neuroscience of language On brain
circuits of words and serial order. Cambridge University Jianhua Ruan
Press, Cambridge Department of Computer Science, University of Texas
Raidl M-H, Lubart TI (2001) An empirical study of intuition and at San Antonio, San Antonio, TX, USA
creativity. Imagination, Cognition Personality 20(3):
217–230
Runco M, Pritzke S (eds) (2005) Encyclopedia of creativity,
vol 1–2. Elsevier, Burlington Definition
Schilling MA (2005) A “Small-orld” network model of cogni-
tive insight. Creativ Res J 17(2–3):131–154 The transcriptional level of gene expression is
Simon HA (1995) Explaining the ineffable: AI on the topics of
intuition, insight and inspiration. In: Proceedings of the 14th controlled, to a large extent, by specific interactions
international joint conference on artificial intelligence, between transcription factors (TFs) and the promoter
Montreal, pp 939–948 sequences of their target genes. The interactions
Simonton DK (2010) Creative thought as blind-variation and between TFs and target genes are combinatorial in
selective-retention: combinatorial models of exceptional cre-
ativity. Phys Life Rev 7:156–179 nature, and are typically many-to-many, i.e., each TF
Sternberg RJ (ed) (1998) Handbook of human creativity. controls many genes, and a gene can be controlled by
Cambridge University Press, Cambridge many TFs, forming complex transcriptional regulatory
Sternberg RJ, Davidson JE (1995) The nature of insight. MIT networks (TRNs). To understand gene functions in dif-
Press, Cambridge, MA
Wilson R, Keil F (eds) (1999) MIT encyclopedia of cognitive ferent biological processes, it is necessary to reconstruct
sciences. MIT Press, Cambridge, MA such regulatory networks from experimental data.
Computational Methods for Transcriptional Regulatory Networks 469 C
Characteristics expressed genes”) are often considered co-regulated.
A number of statistical methods have been developed
Data to identify differentially expressed genes, for example,
Input data used for computationally reconstructing Statistical Analysis of Microarray (SAM) (Tusher et al.
genome-scale TRNs may include gene expression data 2001), and Rank Products (Breitling et al. 2004).
measured by DNA microarrays or RNA-seq, promoter Statistically, genes exhibiting similar expression patterns
sequences that the cis-regulatory elements may be across multiple experimental conditions are more likely C
located, and protein-DNA interaction data obtained co-regulated than genes co-expressed under very specific
from high-throughput experiments such as chromatin conditions. Therefore, unsupervised methods perform
immunoprecipitation (ChIP) combined with microarrays better when the data set contains a large number of
(ChIP-chip), chromatin immunoprecipitation combined experimental conditions. Genes can then be grouped
with next-generation sequencing (ChIP-Seq), and into clusters, where genes in the same cluster have
universal protein-binding microarrays (PBM). similar expression patterns. This clustering-based
These data sets provide different, orthogonal, infor- method is generally considered a more reliably way of
mation and should be combined whenever possible. identifying co-regulated genes than methods based on
ChIP-based protein-DNA interaction data provides differentially expressed genes. Many clustering
the most direct evidence of physical interactions algorithms have been introduced from the data mining
between regulators and DNA. However it is relatively field, including hierarchical clustering (Eisen et al.
costly and may be limited by the availability of anti- 1998), k-means clustering (Tavazoie et al. 1999), self-
bodies specific to the regulators of interest. Further- organizing maps (Tamayo et al. 1999), and graph-
more, protein-DNA interaction is dynamic and is theoretic algorithms (Xu et al. 2002) (see Belacel et al.
affected by many factors, while ChIP-based data only (2006) for a survey). Bi-clustering methods have also
captures the interaction under a particular condition been introduced to find clusters of genes that are
and at a specific time point. Gene expression data co-expressed under subsets of conditions (Cheng and
combined with promoter sequences can be combined Church 2000; Tanay et al. 2002; Turner et al. 2005;
to reveal dynamic changes of transcriptional regulation Madeira and Oliveira 2004).
under different conditions. However, the simple
assumption that such method usually relies on, i.e., Identifying Common cis-Regulatory Elements
co-expressed genes are transcriptionally co-regulated, After obtaining a list of putatively co-regulated genes,
does not always hold. the next logical step is to find the common regulators, or
In addition, using promoter sequences one can only to find common cis-regulatory elements. The latter is
infer the binding preferences of a transcription factor. more often used, simply because the former relies on
The identity of the actual transcription factor needs to be protein-DNA interaction data which is still relatively
resolved, which is often not easy. The PBM technique rare and may not be available for the particular condition
may fill this gap by explicitly measuring the affinity of interest. In contrast, finding cis-regulatory elements
between regulators and all possible DNA subsequences only needs genomic sequence data – it attempts to find
of a fixed length. common short subsequences from the promoter regions
of the co-regulated genes (“motif finding”). Motif find-
Unsupervised Methods ing can also be classified into two categories: de novo
Identifying Putatively Co-regulated Genes motif finding and motif scanning. While de novo motif
An unsupervised method for reconstructing transcrip- finding does not require input of known motifs, motif
tional regulatory networks usually starts with identifying scanning relies on databases of known cis-regulatory
potentially co-regulated genes. Although co-regulated elements. With motif scanning, one searches for the
genes can be directly obtained from protein-DNA occurrence of a list of known cis-regulatory elements
interaction data, the most common source is gene expres- (with some tolerance of mismatches) from the promoters
sion data analysis, which relies on the simple assumption of the co-regulated genes, and then reports the ones with
that co-expressed genes are likely co-regulated. For the highest number of occurrence (or more accurately,
example, genes that show similar transcriptional with the highest statistical significance). The de novo
responses to the same treatment (“differentially motif finding method does not rely on databases of
C 470 Computational Methods for Transcriptional Regulatory Networks

known cis-regulatory elements; instead, it attempts to Supervised Methods


identify a subsequence pattern that appears more Another major direction to model transcriptional reg-
frequently in the co-regulated genes than would be ulatory networks from high-throughput data relies on
expected. Such subsequence patterns are often compared supervised machine learning. Unlike the unsupervised
with the known cis-regulatory elements for validations. methods, the supervised methods do not assume
The actual algorithm usually depends on the representa- knowledge of co-regulated genes. Instead, it builds
tion of motifs. quantitative or qualitative models to capture possible
Motif representation. A cis-regulatory element (motif) interactions between transcription factors and target
can be represented by either a consensus or genes. The assumption is that if a simple model can
a position-specific weight matrix. A consensus be used to explain the expression levels of one or more
describes the most frequent nucleotide in each posi- genes, then the variables used by the model are likely
tion of the binding motif, while a position-specific responsible for regulating the genes. Figure 1 shows
weight matrix depicts the frequency of each nucle- the relationship between several types of methods
otide occurring at each position of the binding under the supervised machine-learning framework.
motif. The first type of supervised methods attempts to
De novo motif finding. Many computational motif model gene expression levels/patterns with the regula-
finding approaches have been developed. These tory motifs on promoter sequences (Fig. 1, Boxes
methods differ from one another in their ways of A and B). In Box A, the expression levels of multiple
defining motifs, the objective functions for calculat- genes under a single condition are modeled as
ing motif significance, and the search techniques used a function of motif occurrences on their promoters.
to find the optimal (or near optimal) motifs. Most of That is, each gene is treated as an instance, and attri-
these algorithms can be classified into one of two butes are the motifs. Gene expression levels under
broad categories: heuristic optimization algorithms a particular condition are the response variables,
based on position-specific weight matrices (PSWM), which can be either discrete or continuous. Commonly
and enumerative algorithms based on consensuses. used models include regression-based and rule-based.
Examples of the first category include the well- Rule-based models have relatively better interpretabil-
known programs such as MEME (Bailey and Elkan ity over regression-based models. For examples,
1994), AlignACE (Roth et al. 1998), Gibbs Sampler Bussemaker et al. (2001) and others (Keles et al.
(Lawrence et al. 1993), and BioProspector (Liu et al. 2002; Conlon et al. 2003) modeled the expression
2001), while the second category can be exemplified levels of genes as a linear regression of putative bind-
by Weeder (Pavesi et al. 2001), YMF (Sinha and ing motifs, and applied feature selection techniques to
Tompa 2002), MultiProfiler (Keich and Pevzner find the most significant motifs. Hu et al. (2000) used
2002), and Projection (Buhler and Tompa 2002). decision rules to find motif combinations that best
Motif scanning. Motif scanning is relatively straight- separate two sets of genes. Simonis et al. (2004) com-
forward. Regardless of the motif representation, bined a string-based motif finding method and linear
two issues need to be decided. First, how to score discriminant analysis to identify motif combinations
the matches between a known motif and that can separate true regulons from false ones. In Box
a subsequence of the same length on a promoter B, the expression patterns of multiple genes under
sequence; second, what cutoff should be used to multiple conditions are modeled by the motifs on
select true matches instead of random matches. In their promoters. The difference between these methods
the case of consensuses, the simplest scoring func- and the ones represented by Box A is that each gene in
tion can be the number of matched nucleotides, and Box B has multiple response variables. To solve this
a cutoff is typically decided empirically based on problem one can either pre-cluster the genes and then
the number of mismatches that can be allowed. use gene cluster IDs as response variables, or conduct
Some programs can also take into consideration clustering and motif finding simultaneously. Probabi-
the nonuniform distribution of the different nucle- listic graphical models, e.g., Bayesian networks, were
otides and assign them different scores. In the case used to explain gene expression patterns from motifs
of position-specific weight matrices, the scoring (Beer and Tavazoie 2004). Phuong et al. (2004)
function is usually based on the information score. applied multivariate regression trees to model the
Computational Methods for Transcriptional Regulatory Networks 471 C
E D
C

regulators
motifs
C
1 1 0 1 0 0 1
0 1 0 1 1 0 0
1 0 0 0 1 0 1
1 0 0 0 0 1 0
0 1 1 0 0 0 0

genes
0 0 1 0 1 0 0
1 1 0 0 1 1 0
0 0 0 1 1 0 0
0 0 1 1 0 1 0
0 0 1 0 0 0 0
A 1 0 0 1 0 1 1
conditions
B

Computational Methods for Transcriptional Regulatory are modeled by the motifs on their promoters. Box C, the expres-
Networks, Fig. 1 Schematic representation of existing super- sion levels of a single gene under multiple conditions are modeled
vised machine-learning methods for modeling transcriptional reg- by the expression levels of putative regulators. Box D, the expres-
ulation. The bottom right matrix represents gene expression levels. sion levels of multiple genes under multiple conditions are
The bottom left matrix represents motif occurrences on promoter modeled by the expression levels of putative regulators. Box E,
sequences. The top matrix is the expression levels of regulators. the expression levels of multiple genes under multiple conditions
Box A, the expression levels of multiple genes under a single are modeled by the motifs on their promoters and the expression
condition are modeled by the motifs on their promoter. Box B, levels of putative regulators
the expression levels of multiple genes under multiple conditions

transcriptional regulation of gene expressions over example, Qian et al. (2003) applied support vector
several time points simultaneously. In these two machines (SVMs) to predict the targets of 36 yeast
methods, motifs will be associated with a cluster of transcription factors by identifying subtle relationships
genes that have similar expression patterns; this is between their expression profiles. Soinov et al. (2003)
similar to the unsupervised methods. However, in and Segal et al. (2003a) used decision trees and regres-
these methods clustering of genes and identification sion trees for similar purposes. The method in Soinov
of cluster-specific motifs are performed et al. (2003) used decision tree to identify possible
simultaneously. regulators for several cell-cycle genes individually,
The second type of supervised methods attempt to while the method in Segal et al. (2003a) proposed
model the expression levels of target genes by the a more sophisticated procedure suitable for whole-
expression levels of other genes, e.g., TFs and other genome analysis. The method first clusters genes
regulators (Fig. 1, Boxes C and D). In Box C, the according to their expression patterns, and then builds
expression levels of a single gene under multiple con- a regression tree for each cluster to represent their
ditions are modeled as a function of the expression common regulation program. The procedure then iter-
levels of putative regulators under the same set of atively refines the clusters and the trees.
conditions. That is, each condition is treated as an Finally, the third type of supervised methods model
instance, and the expression levels of putative regula- gene expression levels using both putative binding
tors under the condition are its attributes. Similar to the motifs on promoter sequences and the expression
first type of methods, both rule-like and regression- levels of putative regulators (Fig. 1, Box E). For exam-
based methods have been commonly applied. For ple, Middendorf et al. (2004) used an ensemble of
C 472 Computational Methods for Transcriptional Regulatory Networks

decision trees to model gene expression levels by com- (Ihmels et al. 2002; Bar-Joseph et al. 2003). Hybrid
bining putative binding motifs and the expression methods that combine gene clustering and classifica-
levels of putative TFs. The method by Ruan and tion iteratively have also been developed (Segal et al.
Zhang (2006), Bi-Dimensional Regression Tree 2003b; Beer and Tavazoie 2004).
(BDTree), is an extension to the multivariate regres-
sion tree approach of Segal (1992) and Phuong et al.
(2004). A multivariate regression tree recursively References
splits genes into groups, where the genes in each
group have similar selected attributes (motifs) and Bailey TL, Elkan C (1994) Fitting a mixture model by expecta-
tion maximization to discover motifs in biopolymers. Proc
responses (Segal 1992). Phuong et al. (2004) extended
Int Conf Intell Syst Mol Biol 2:28–36
the method to handle multiple responses, so that Bar-Joseph Z, Gerber GK, Lee TI, Rinaldi NJ, Yoo JY, Robert F,
the instances in each group have a similar pattern of Gordon DB, Fraenkel E, Jaakkola TS, Young RA, Gifford DK
responses across multiple conditions. The basic idea (2003) Computational discovery of gene modules and
regulatory networks. Nat Biotechnol 21:1337–1342
of BDTree, as suggested by its name, is to extend
Beer MA, Tavazoie S (2004) Predicting gene expression from
the multivariate regression tree approach to both sequence. Cell 117(2):185–198
dimensions of the expression matrix (Fig. 1, Box E). Belacel N, Wang Q, Cuperlovic-Culf M (2006) Clustering
On one dimension, each gene is treated as an instance, methods for microarray gene expression data. OMICS
10:507–531
where the attributes are the binding motifs on its
Breitling R, Armengaud P, Amtmann A, Herzyk P (2004) Rank
promoter sequence, and the responses are its expres- products: a simple, yet powerful, new method to detect
sion levels under different conditions. On this dimen- differentially regulated genes in replicated microarray exper-
sion, the approach recursively partitions genes so iments. FEBS Lett 573(1–3):83–92
Buhler J, Tompa M (2002) Finding motifs using random pro-
that those in the same subset have some common
jections. J Comput Biol 9:225–242
binding motifs and similar expression patterns across Bussemaker HJ, Li H, Siggia ED (2001) Regulatory element
the conditions. On the other dimension, each condition detection using correlation with expression. Nat Genet
is treated as an instance, while the attributes are 27:167–171
Cheng Y, Church GM (2000) Biclustering of expression data.
the expression levels of candidate regulators under
Proc Int Conf Intell Syst Mol Biol 8:93–103
the condition, and the responses are the expression Conlon EM, Liu XS, Lieb JD, Liu JS (2003) Integrating regula-
levels of genes under that condition. BDTree recur- tory motif discovery and genome-wide expression analysis.
sively partitions the conditions so that the expression Proc Natl Acad Sci USA 100:3339–3344
Eisen MB, Spellman PT, Brown PO, Botstein D (1998) Cluster
levels of the regulators and target genes within each
analysis and display of genome-wide expression patterns.
subset of conditions are similar. The final model, Proc Natl Acad Sci U S A 95:14863–14868
represented by a tree, can be considered as a set of Hu YJ, Sandmeyer S, McLaughlin C, Kibler D (2000) Combi-
rules, each of which has the form of “if a gene has natorial motif analysis and hypothesis generation on
a genomic scale. Bioinformatics 16:222–232
binding motif A, it will be up-regulated when TF B is
Ihmels J, Friedlander G, Bergmann S, Sarig O, Ziv Y, Barkai N
up-regulated.” (2002) Revealing modular organization in the yeast tran-
scriptional network. Nat Genet 31:370–377
Other Methods Keich U, Pevzner PA (2002) Finding motifs in the twilight zone.
Bioinformatics 18:1374–1381
Besides the supervised and unsupervised methods,
Keles S, van der Laan M, Eisen MB (2002) Identification of
several other approaches have been proposed to explic- regulatory elements using a feature selection method. Bioin-
itly identify synergies among regulators or regulatory formatics 18:1167–1175
elements without clustering or classifying genes. For Lawrence CE, Altschul SF, Boguski MS, Liu JS, Neuwald AF,
Wootton JC (1993) Detecting subtle sequence signals:
example, Pilpel et al. (2001) analyzed the combinato-
a gibbs sampling strategy for multiple alignment. Science
rial effects of motif pairs on gene expression profiles 262:208–214
and identified many significant motif combinations. Liu X, Brutlag DL, Liu JS (2001) Bioprospector: discovering
Also, several methods have been developed to conserved DNA motifs in upstream regulatory regions of co-
expressed genes. Pac Symp Biocomput 2001:127–38
combine gene expression clustering with additional
Madeira SC, Oliveira AL (2004) Biclustering algorithms for
information, such as promoter sequences, cellular biological data analysis: a survey. IEEE/ACM Trans Comput
functions, and genomic localization data Biol Bioinform 1:24–45
Computational microRNA Biology 473 C
Middendorf M, Kundaje A, Wiggins C, Freund Y, Leslie C
(2004) Predicting genetic regulatory response using classifi- Computational microRNA Biology
cation. Bioinformatics 20(Suppl 1):I232–I240
Pavesi G, Mauri G, Pesole G (2001) An algorithm for finding
signals of unknown length in DNA sequences. Bioinformat- Julio Vera and Ulf Schmitz
ics 17:S207–S214 Department of Systems Biology and Bioinformatics,
Phuong TM, Lee D, Lee KH (2004) Regression trees for regu- Institute of Computer Science, University of Rostock,
latory element identification. Bioinformatics 20(5):750–757
Pilpel Y, Sudarsanam P, Church GM (2001) Identifying regula- Rostock, Germany C
tory networks by combinatorial analysis of promoter ele-
ments. Nat Genet 29:153–159
Qian J, Lin J, Luscombe NM, Yu H, Gerstein M (2003) Predic- Synonyms
tion of regulatory networks: genome-wide identification of
transcription factor targets from gene expression data. Bio-
informatics 19:1917–1926 Kinetic modeling of microRNA regulation; Mathemat-
Roth FP, Hughes JD, Estep PW, Church GM (1998) Finding ical modeling of microRNA regulation; MicroRNA;
DNA regulatory motifs within unaligned noncoding MicroRNA clusters; MicroRNA databases and Web
sequences clustered by whole-genome mRNA quantitation.
Nat Biotechnol 16:939–945 resources; MicroRNA gene prediction; MicroRNA
Ruan J, Zhang W (2006) A bi-dimensional regression tree synthesis; MicroRNA target hubs; MicroRNA target
approach to the modeling of gene expression regulation. prediction; MicroRNA target regulation;
Bioinformatics 22:332–340 Regulatory networks
Segal MR (1992) Tree-structured methods for longitudinal data.
J Am Stat Assoc 87:407–418
Segal E, Shapira M, Regev A, Pe’er D, Botstein D, Koller D,
Friedman N (2003a) Module networks: identifying regula- Definition
tory modules and their condition-specific regulators from
gene expression data. Nat Genet 34:166–176
Segal E, Yelensky R, Koller D (2003b) Genome-wide discovery MicroRNAs (miRNAs) are small regulatory RNAs of
of transcriptional modules from DNA sequence and gene ~22nt length that regulate the activity and stability of
expression. Bioinformatics 19(Suppl 1):i273–i282 a large number of messenger RNAs (Bartel 2004;
Simonis N, Wodak SJ, Cohen GN, van Helden J (2004) Com- Ambros 2004; Filipowicz et al. 2008). miRNAs bind
bining pattern discovery and discriminant analysis to predict
gene co-regulation. Bioinformatics 20(15):2370–2379 to specific messenger RNAs and target them for cleav-
Sinha S, Tompa M (2002) Discovery of novel transcription age, deadenylation, or translational repression. In all
factor binding sites by statistical overrepresentation. Nucleic three processes, protein synthesis of the targeted mes-
Acids Res 30:5549–5560 senger RNA is prevented. Until today, a vast amount of
Soinov LA, Krestyaninova MA, Brazma A (2003) Towards
reconstruction of gene networks from expression data by human miRNAs has been detected (1.524 human
supervised learning. Genome Biol 4:R6 miRNAs are listed in the ▶ miRBase database, r18).
Tamayo P, Slonim D, Mesirov J, Zhu Q, Kitareewan S, Many of them are involved in the regulation of relevant
Dmitrovsky E, Lander ES, Golub TR (1999) Interpreting cellular processes, including cell signaling, metabo-
patterns of gene expression with self-organizing maps:
methods and application to hematopoietic differentiation. lism, apoptosis, and developmental processes (Ambros
Proc Natl Acad Sci USA 96:2907–2912 2004).
Tanay A, Sharan R, Shamir R (2002) Discovering statistically In addition to their function in cancer, numerous
significant biclusters in gene expression data. Bioinformatics studies have validated the role of miRNAs as
18(Suppl 1):S136–S144
Tavazoie S, Hughes JD, Campbell MJ, Cho RJ, Church GM key modulators in other high prevalence diseases,
(1999) Systematic determination of genetic network archi- including cardiovascular disorders like fibrosis
tecture. Nat Genet 22:281–285 and atherosclerosis, neurodegenerative diseases, auto-
Turner HL, Bailey TC, Krzanowski WJ, Hemingway CA immunity, and infectious processes including pulmo-
(2005) Biclustering models for structured microarray data.
IEEE/ACM Trans Comput Biol Bioinform 2:316–329 nary infections (▶ microRNA, Disease and Therapy).
Tusher VG, Tibshirani R, Chu G (2001) Significance analysis of On the other hand, strategies have been established
microarrays applied to the ionizing radiation response. Proc that use miRNAs as therapeutic agents by either
Natl Acad Sci USA 98(9):5116–5121 downregulating those that are abnormally
Xu Y, Olman V, Xu D (2002) Clustering gene expression data
using a graph-theoretic approach: an application of minimum overexpressed or replacing silenced ones with stable
spanning trees. Bioinformatics 18:536–545 nucleotide constructs.
C 474 Computational microRNA Biology

Characteristics composed of Drosha and DGCR8; (3) export from the


nucleus to the cytoplasm via Exportin 5; (4) Dicer-
The regulation of basic cell functions is controlled by mediated cleavage of the pre-miRNA into a short
complex intracellular biochemical networks involving miRNA duplex; and (5) Argonaute2 controlled final
interacting genes, proteins, and small molecules. processing, which generates a RNA strand that is
Recently, an additional level of regulation has been incorporated into the RNA silencing complex (RISC).
discovered subsequent to the identification of a class of This protein-RNA complex structure can bind, in
short non-coding RNAs called microRNAs. The first a sequence-dependent manner, to the 30 untranslated
two miRNAs (lin-4 and let-7) were identified in region (UTR) of messenger RNA targets
Caenorhabditis elegans (Lee et al. 1993). Later, homo- (▶ MicroRNA Biogenesis, Regulation). Hybridiza-
logues of these were found in higher organisms tions with the 50 UTR and ORF of targets have been
like D. melanogaster and soon also in humans. observed but are typically nonfunctional in terms of
This increased the interest in these small post- target repression.
transcriptional regulators, and subsequently many MicroRNA Target Regulation. In miRNA-induced
more miRNAs have been found. Today, it is known target repression, the mature miRNA, previously
that miRNAs are involved in the regulation of many loaded to the RISC complex, binds to a specific
critical cellular processes. For example, some of them locus on the targeted messenger RNA (▶ Target
post-transcriptionally repress genes that control criti- Site). Success in the hybridization of miRNA-RISC
cal features in developmental processes, including with the target site depends on the level of sequence
developmental timing and stem cell maintenance in complementarity, where the so-called seed
both plants and animals (Carrington and Ambros (▶ microRNA, Seed) region, nucleotides 2–8 nt of
2003). Additionally, many studies associate miRNAs the miRNA, plays a major role in miRNA-target rec-
with a whole bunch of diseases either as mediator or, ognition. Successful hybridization of miRNA-RISC
when deregulated, as causal factor. Here, of highest with the target side leads to post-transcriptional
interest is the implication of miRNAs in repression of the target gene (Fig. 2). The degree of
tumourigenesis via regulation of target genes which sequence complementarity between the targeted
play a role in tumor development and progression, like mRNA and the microRNA can designate the mecha-
Ras, c-Myc, BCL2, and cell cycle-dependent kinases nism by which the expression of a gene is hindered.
(Garzon et al. 2009). In this context, miRNAs can act Rarely observed in animals but common in plants are
as tumor suppressors when they downregulate onco- sites with extensive or perfect sequence complemen-
genes (Takamizawa et al. 2004), or like oncogenes (a. tarity, in which miRNAs direct Argonaute-catalyzed
k.a. oncomirs) by inhibiting tumor suppressor expres- cleavage of the target mRNA (▶ Target Cleavage).
sion (Iorio et al. 2005). Other miRNAs play With imperfect complementarity, a miRNA bound to
a fundamental role during malignant transformation a mRNA target site can induce deadenylation and
(the so-called metastamir, Hurst et al. 2009, subsequent destabilization (target deadenylation), or
▶ MicroRNA Families, Cancer Progression). it can repress protein synthesis of the target by
blocking translation initiation or elongation, or by
MiRNA Biogenesis and Target Regulation causing early translation termination.
MicroRNA Biogenesis. miRNAs go through a complex The identification of miRNA targets can be
biogenesis process in which a maturated single- achieved through various in silico, in vitro, and
stranded functional miRNA is generated from in vivo techniques (target identification, microRNA).
a miRNA gene (Fig. 1). MiRNA biogenesis involves These range from predictions of computational algo-
several successive steps: (1) transcription of the rithms to experimentally driven predictions using gene
miRNA gene into a primary transcript, pri-miRNA expression assays, sequencing, or mass spectrometry.
(miRNA genes are often embedded in introns of Targets are typically validated by reporter gene ana-
protein-coding genes or clusters of several consecutive lyses, immunoblotting, or immunoprecipitation. The
miRNA genes); (2) cleavage into a ~80-nt-long hairpin aim of target regulation studies is to quantify changes
RNA (stem-loop structure) referred to as precursor in target mRNA and protein concentration upon
miRNA (pre-miRNA) via a microprocessor complex miRNA overexpression or knockdown.
Computational microRNA Biology 475 C
Computational microRNA
Biology, Fig. 1 Sketch of the
miRNA biogenesis pathway.
The primary transcript of
a miRNA gene (pri-miRNA) is
cleaved in the nucleus by the
endonuclease Drosha, yielding
a 60–70-nt stem-loop
intermediate (pre-miRNA). C
The pre-miRNA is
translocated into the
cytoplasm and there cleaved
by Dicer. The resulting
double-stranded RNA
associates with Argonaute2
a member from the Argonaute
protein family (Ago2) to form
the miRNA-RISC loading
complex (miRLC). One strand
of the miRNA duplex guides
the RNA-induced silencing
complex (RISC) to specific
mRNA target that is
recognized by a certain
sequence complementarity to
the miRNA. This leads to
either repression or
Argonaute-induced
degradation of the target, with
subsequent reduction in the
protein level. ▶ MicroRNA
Biogenesis, Regulation

Networks Involving miRNA Regulation program or be used as a backup option to ensure the
MicroRNAs are often embedded in complex gene and activation of those programs.
signaling networks composed of transcription factors, MiRNAs are involved in positive feedback loops; in
miRNAs, and signaling proteins. Interestingly, the simplest of these structures, the expression of
miRNA-regulated networks are often enriched in reg- a miRNA is inhibited by one of its target proteins
ulatory motifs like feedback and feed-forward loops. (Fig. 3 top left). This motif can induce bistable expres-
Moreover, they exhibit other complex motifs that are sion of both the miRNA and its target, while under
specific to miRNA regulation. certain conditions this system transforms a transient
Feedback Loops. Often miRNAs regulate the activation signal into a sustained cellular response,
expression of signaling proteins that are involved in driven by the miRNA target or downstream signal
feedback loop structures. In other cases, the expression mediators. This confers robustness to those systems
of miRNAs can be regulated by their own targets or against fluctuations in gene expression and noise in
their interaction partners (▶ MicroRNA Regulation, the activation signal (Inui et al. 2010; ▶ MicroRNA
Signaling Pathways). MiRNAs embedded in feedback Regulation, Feed-Forward Loops). MiRNAs also par-
loop systems can reinforce a given transcriptional ticipate in negative feedback loops; in the simplest
C 476 Computational microRNA Biology

Computational microRNA Biology, Fig. 3 Basic regulatory


motifs involving miRNA regulation

regulated by direct TF interaction and indirect TF


interaction realized through miRNA repression
Computational microRNA Biology, Fig. 2 Sketch of the dif-
ferent modes of miRNA target repression. A microRNA can
(Fig 3. bottom left). In these motifs, the transcription
regulate its target by (a) blocking translation initiation or causing factor and the miRNA repress the expression of the
early translation termination (left, translation repression) or target gene at the transcriptional and post-transcrip-
(b) by inducing deadenylation and subsequent destabilization tional level to ensure complete gene silencing. In inco-
(right, target deadenylation and destabilization). Extensive or
herent feed-forward loops, a target is regulated by a TF
perfect sequence complementarity between microRNA and
mRNA leads to Argonaute-catalyzed cleavage of the target and a miRNA, but with opposite signs (positively and
mRNA, which is rare in animals but oftener in plants negatively), and is therefore inconsistently regulated
(not shown, ▶ target cleavage) (Fig 3. bottom right). Incoherent miRNA-mediated
feed-forward loops can display accelerated response
as compared to the simple regulation motif, but they
configuration, a target activates the expression of its can also realize a noise buffering function, fine-tuning
repressing miRNA (Fig. 3 top right). Those systems of the target gene, and can have the ability to convert
confer homeostasis to network response, fine-tune step-like activation into transient peak activation
gene expression, and can induce fast decay or signal (Osella et al. 2011).
termination (Tsang et al. 2007, ▶ MicroRNA Regula- MicroRNA Clusters. miRNA clusters are sets of two
tion, Feed-Forward Loops). or more miRNAs that are (a) transcribed from physi-
Feed-Forward Loops. MicroRNAs are also cally adjacent miRNA genes, (b) transcribed in
involved in many gene and signaling networks with the same orientation, and (c) not separated by
feed-forward structures. In this case, the combination a transcriptional unit or a miRNA in the opposite
of transcriptional and miRNA-mediated post- orientation (▶ MiRNA Cluster). Typically, members
transcriptional regulation plays a crucial role in con- of miRNA clusters are transcribed in synchrony upon
trolling gene expression. Feed-forward loops an activation event (Fig. 4 left). MiRNAs integrated in
containing miRNAs consist of three components: a cluster sometimes target common but more often
a transcription factor (TF), a miRNA directly regulated different target genes. One example is the miR-17-92
by the TF, and a target gene, which is regulated by both cluster, which is found on the human chromosome 13,
the TF and the miRNA (▶ MicroRNA Regulation, and it is composed of six miRNAs. Among others, the
Feed-Forward Loops). There exist two types of transcription factor c-Myc binds to the upstream region
miRNA-mediated feed-forward loops. In coherent of the miR-17-92 cluster to induce its expression.
feed-forward loops, a target gene is consistently Interestingly, two miRNAs belonging to the cluster
Computational microRNA Biology 477 C
Computational microRNA
Biology, Fig. 4 Illustration
of a miRNA cluster (left) and
miRNA target hub regulation
(right)

C
(miR-17-5p, miR-20a) post-transcriptionally regulate developed that could help to uncover many more puta-
E2F1, which is itself a transcription factor that cross tive miRNA genes. In ▶ miRNA gene prediction,
activates c-Myc. Thus, a complex regulatory network sequence-based approaches try to find genomic regions
of miRNAs and TFs is formed, including negative and which may reveal stem-loop structures when being
positive feedback loops (Aguda et al. 2008). expressed. Such methods are often complemented by a
MicroRNA Target Hubs. MiRNA target hubs are conservation analysis of candidate sequences and their
genes targeted by many miRNAs (▶ Target Hub, predicted structures. Additional approaches investigate
Fig 4. right). These miRNAs might belong to the structural properties, like stability and robustness, of the
same miRNA cluster or are subject to totally different stem loop in order to reduce the space of candidates.
and independent transcriptional regulation. Under the Machine-learning approaches use known miRNA pre-
assumption that miRNA target hubs are regulated by at cursors as training set and exploit their features as
least 15 miRNAs, Shalgi and coauthors (2007) found criteria to classify new candidates. Transcriptiomics
470 of such genes in the human genome. It has been based approaches search in ▶ RNA-seq data for short
suggested that miRNA target hubs are integrated in transcripts that can be mapped to genomic loci that may
larger regulatory networks, involving transcription form stem loops upon expression. Nowadays, the search
factors, signaling proteins, and miRNAs, enriched for miRNA genes in newly sequenced genomes benefits
with many of these regulatory motifs introduced from the large body of known miRNA genes, which can
above (feed-forward and feedback loops). This multi- be used in a homology search in order to come up with
plicity of miRNA target hub regulation can lead to an initial set of miRNA candidates (Mendes et al. 2009).
highly nonlinear features like miRNA-mediated cross
talk, cooperation between different transcription fac- Target Prediction
tors, and generation of tissue-specific expression pat- The first step in the functional characterization of
terns (Lai et al. 2012). miRNAs and their association with biological pro-
cesses is the identification of miRNA-regulated
Bioinformatic Tools and Algorithms for mRNA targets. Shortly after the first miRNA-target
miRNA Investigation pairs have been identified experimentally, computa-
Advancements in the understanding of miRNA biol- tional algorithms for the prediction of more targets
ogy have always been paved by bioinformatic tools have been developed (▶ MicroRNA Target Predic-
and algorithms. Computational methods are being used tion). Soon, patterns in miRNA-target recognition
for the prediction of miRNA genes and their targets, emerged and were exploited to improve target predic-
for the analysis of high-throughput expression experi- tion results. A central criterion for target candidates is
ments and functional assessment of miRNA targets. the sequence complementarity between a potential
Furthermore, bioinformatic approaches are used for ▶ target site and the mature miRNA, with special
data integration and knowledge transfer. emphasis on the ▶ seed region. Other criteria that
suggest the presence of functional miRNA-mRNA
MiRNA Gene Identification pairs are the evolutionary conservation of the target
Around the beginning of this century, it became clear site sequence, the thermodynamic stability of the puta-
that there exist more miRNAs than just lin-4 and let-7. tive miRNA-mRNA duplex, and the multiplicity of
And it was found that many of them are conserved in miRNA binding sites in the 30 UTR of the mRNA
different species. At that time, methods based on target. For many algorithms, target predictions were
▶ non-coding RNA prediction algorithms were made accessible in public databases. Some examples
C 478 Computational microRNA Biology

Computational microRNA Biology, Table 1 MicroRNA Web resources. It has to be noted that URLs might change for unforeseen
reasons. Similarly, users should be aware of the date of the last update of the database to avoid the use of outdated information
Relevant microRNA Web resources
Name Description URL
miRBase Primary miRNA registry www.mirbase.org
Pre-miRNA and mature miRNA sequence and structure information
microRNA.org miRanda predicted miRNA targets www.microrna.org
miRecords Database of validated miRNA-target interactions http://mirecords.biolead.org
miRGator v2.0 miRNA expression profiles http://mirgator.kobic.re.kr
Functional characterization of miRNAs
miR2Disease miRNA disease relationships www.mir2disease.org
miRWalk Target predictions from eight algorithms http://mirwalk.uni-hd.de
Validated targets
Predicted functional associations

of single and composite Web-accessible resources of compartmentalization of species, the spatiotemporal


miRNA target predictions are listed in Table 1. organization of those networks can hardly be under-
stood without mathematical modeling. There is no
MiRNA Web Resources default modeling approach to be used in the investiga-
Apart from databases for miRNA target predictions, tion of microRNA regulation. The appropriate model-
there exists a large variety of other useful miRNA Web ing framework to be used depends on several
resources and Web services (▶ MicroRNA Web circumstances, for example, the extent of biomedical
Resources). The data provided includes (a) primary knowledge that exists, the quality and quantity of the
data of miRNA genes, their sequences, and expression experimental data available, and the nature of the sys-
profiles; (b) processed data, like functional assign- tem’s properties to be analyzed.
ments and computational predictions; and (c) collec- Kinetic Modeling of miRNA Regulation. Kinetic
tions of literature-based knowledge. The most models describe the evolution in time of the expression
prominent example of a miRNA database is the levels for the different biochemical species (mRNAs,
▶ miRBase database, the primary miRNA sequence microRNAs, TFs, signaling proteins) involved in reg-
repository. New miRNA sequences are registered and ulatory networks. In this kind of models, reaction rates
annotated in this database. Additionally, it is a resource account for biochemical events like processing,
for sequence and structure information of pre-miRNAs degradation, and interaction of messenger RNAs,
and mature miRNAs (Enright and Griffiths-Jones miRNAs, and proteins, including the miRNA-induced
2008). Other examples are databases of target regulation of targeted genes. An exemplary model
predictions (e.g., miRWalk), collections of validated structure is presented in Table 2.
miRNA-target interactions (e.g., miRecords), Values of the model parameters characterizing the
databases of miRNA expression profiles (e.g., reaction rates are assigned by using parameter values
microRNA.org), and databases of functional miRNA taken from relevant publications and/or by applying
characterization (e.g., miR2Disease). An overview of data fitting techniques over quantitative data. In the
some relevant miRNA Web resources, a brief descrip- context of miRNA regulation, several studies have
tion, and their URL is provided in Table 1. used kinetic models to investigate miRNA regula-
tion, some of which have addressed general design
Mathematical Modeling of miRNA Regulation principles of miRNA regulation. For example,
In complex biochemical networks involving transcrip- Levine and coauthors (2007) developed a simple
tional regulation, miRNA-mediated repression, quantitative model for miRNA-mediated target
protein-protein interactions, and subcellular repression and used it to investigate how distinctive
Computational microRNA Biology 479 C
Computational microRNA Biology, Table 2 An exemplary model structure
d X In these equations, mRNAi accounts for
mRNAi ¼ ks e2 Fact ðTFe2 Þ  kd me2 mRNAi  ka mRi miRi mRNAi
dt messenger RNAs, ProtXi for proteins and
i
protein complexes, and TFi for transcription
d
miRi ¼ ks mRi Fact ðProtX; TFmRi Þ  kd mRi miRi  ka mRi miRi mRNAj factors. Reaction rates account for TF-
dt mediated synthesis of messenger RNAs,
d X X
Tgti ¼ ks e2 mRNAi  kd e2 TgProti  TgProtXi ProtXj þ Fk ðProtXÞ degradation of messenger RNAs, miRNA-
dt j k mediated regulation of target genes, protein C
synthesis and degradation, and protein-protein
interactions. Rate equations are constructed
using mass action kinetics, Hill equations,
power-law terms, and other formalisms. In
addition, parameters kx account for the rate
constants

responses in the mRNA and protein expression levels categories. In this way, the effective inhibitory level
of the target gene are affected by parameter settings, of a miRNA can in multivalued logic be described by
which are thought as being target gene several stages, for example, null, low, high, and max-
specific. Others have shown how to employ kinetic imum inhibition, while the expression of the target
modeling to investigate the regulation of miRNA- might be categorized into silenced, low, high, and
regulated networks that are of biomedical interest. maximal expression.
For example, Lee and collaborators (2010) devel- Graph Analysis of miRNA Networks. MicroRNA-
oped a network model that characterized miR-204 mRNA networks can be described by using
as tumor suppressor miRNA and uncovered previ- graphs that capture the processes by which mature
ously unknown connections between network topol- miRNAs control the translation of target mRNAs.
ogy and expression dynamics. Additionally, they In these graphs, elements involved in the network
validated 18 target genes related to tumor are represented as nodes. Edges that connect
progression. two nodes account for a relationship (interaction)
Logical Modeling of miRNA Regulatory Networks. between them, for example, the repression of
When microRNAs are embedded in regulatory net- a mRNA by a given miRNA (▶ MicroRNA-mRNA
works, the size limits the applicability of the kinetic Regulation Networks). Using this approach, miRNA-
modeling approach. Logical models represent mRNA regulation networks have proven to display
a suitable alternative for larger networks. These can, topological properties like fat-tail degree distribution,
for example, be regulatory networks involving dozens abundance of cybernetic motifs, and modular
of transcription factors, miRNAs, and other types of structures.
functional molecules (e.g., repressors, coactivators,
corepressors, ribonucleoproteins). The logical
approach can be classified into Boolean logic and Cross-References
multivalued logic (▶ MicroRNA-embedding Regula-
tion Networks, Logical Modeling). Boolean logic ▶ MicroRNA Biogenesis, Regulation
works on the basis of binary categories. For instance, ▶ MicroRNA Clusters
a miRNA can only be treated as absent or present, ▶ MicroRNA Families, Cancer Progression
while a transcription factor as inactive or active. ▶ MicroRNA Gene Prediction
Their time evolution is restricted to transitions ▶ MicroRNA Regulation, Feedback Loop
between two generic states encoded by 0 and 1. ▶ MicroRNA Regulation, Feed-Forward Loops
Multivalued logic instead generalizes the Boolean ▶ MicroRNA Regulation, Signaling Pathways
approach and allows the use of multiple ordinal ▶ MicroRNA Target Prediction
C 480 Computational Modeling

▶ MicroRNA Web Resources Lee Y et al (2010) Network modeling identifies molecular func-
▶ microRNA, Disease and Therapy tions targeted by miR-204 to suppress head and neck tumor
metastasis. PLoS Comput Biol 6(4):p.e1000730
▶ MicroRNA, Seed Levine E, Ben Jacob E, Levine H (2007) Target-specific and
▶ MicroRNA-embedding Regulation Networks, global effectors in gene regulation by MicroRNA. Biophys
Logical Modeling J 93(11):L52–L54
▶ MicroRNA-mRNA Regulation Networks Mendes ND, Freitas AT, Sagot MF (2009) Current tools for the
identification of miRNA genes and their targets. Nucleic
▶ MiRBase Acids Res 37(8):2419–2433
▶ Non-coding RNA, Prediction Osella M et al (2011) The role of incoherent microRNA-
▶ RNA-seq mediated feedforward loops in noise buffering. PLoS
▶ Target Cleavage Comput Biol 7(3):p.e1001101
Shalgi R et al (2007) Global and local architecture of the mam-
▶ Target Hub malian microRNA-transcription factor regulatory network.
▶ Target Identification, microRNA PLoS Comput Biol 3(7):p.e131
▶ Target Site Takamizawa J et al (2004) Reduced expression of the
let-7 microRNAs in human lung cancers in association
with shortened postoperative survival. Cancer Res
64(11):3753–3756
References Tsang J, Zhu J, van Oudenaarden A (2007) MicroRNA-mediated
feedback and feedforward loops are recurrent network motifs
in mammals. Mol Cell 26(5):753–767
Aguda BD et al (2008) MicroRNA regulation of a cancer net-
work: consequences of the feedback loops involving miR-
17-92, E2F, and Myc. Proc Natl Acad Sci USA
105(50):19678–19683
Ambros V (2004) The functions of animal microRNAs. Nature
431(7006):350–355
Bartel DP (2004) MicroRNAs: genomics, biogenesis, mecha- Computational Modeling
nism, and function. Cell 116:281–297
Carrington JC, Ambros V (2003) Role of microRNAs in plant
▶ Modeling Formalisms, Lymphocyte Dynamics and
and animal development. Science (New York, NY)
301(5631):336–338 Repertoires
Enright AJ, Griffiths-Jones S (2008) MiRBase: a database of
microRNA sequences, targets and nomenclature. In:
Appasani K (ed) MicroRNAs: from basic science to disease
biology. Cambridge University Press, Cambridge,
pp 157–171
Filipowicz W, Bhattacharyya SN, Sonenberg N (2008) Computational Modeling Languages
Mechanisms of post-transcriptional regulation by
microRNAs: are the answers in sight? Nat Rev Genet
9:102–114 ▶ Cell Cycle Modeling, Process Algebra
Garzon R, Calin GA, Croce CM (2009) MicroRNAs in cancer.
Annu Rev Med 60:167–179
Hurst DR, Edmonds MD, Welch DR (2009) Metastamir: the
field of metastasis-regulatory microRNA is spreading.
Cancer Res 69(19):7495–7498
Inui M, Martello G, Piccolo S (2010) MicroRNA Computational Models of Mechanisms
control of signal transduction. Nat Rev Mol Cell Biol
11:252–263 ▶ Mechanism, Dynamic
Iorio MV et al (2005) MicroRNA gene expression deregulation
in human breast cancer. Cancer Res 65(16):7065–7070
Lai X, Schmitz U, Gupta S, Bhattacharya A, Kunz M,
Wolkenhauer O, Vera J (2012) Computational analysis of
target hub gene repression regulated by multiple and coop-
erative miRNAs. Nucleic Acids Res 40:8818–8834
Lee RC, Feinbaum RL, Ambros V (1993) The C. elegans
Computer Modeling
heterochronic gene lin-4 encodes small RNAs with antisense
complementarity to lin-14. Cell 75:843–854 ▶ Lymphocyte Dynamics and Repertoires, Modeling
Concavity Tree for Protein–Protein Interaction Analysis 481 C
the root is a simple characterization of the whole
Computer Simulations object boundary, i.e., its ▶ convex hull. The next
level describes the set of objects obtained by
▶ Partial Differntial Equations, Numerical Methods subtracting the object from the convex hull. All
and Simulations deeper levels are elaborated in the same way, itera-
tively. A node that represents a convex shape corre-
sponds to a leaf in the tree, so it does not have any C
child.
Figure 1 shows an example of a shape (a), its
Computer-Based Patient Records (CPR) convex hull, concavities, and meta-concavities (b),
and its corresponding concavity tree (c). The shape
▶ Electronic Health Records generates five concavities as reflected in level one of
the tree. The four leaf nodes in level one correspond to
the highlighted triangular concavities shown
in (d), whereas the non-leaf node corresponds to the
concavity shown in (e). Similarly, the nodes in levels
Concavity Tree for Protein–Protein two and three correspond to the meta-concavities
Interaction Analysis highlighted in (f) and (g), respectively. Typically,
each node in a concavity tree stores information perti-
Virginio Cantoni1,2, Alessandro Gaggia1, nent to the part of the object the node is describing
Riccardo Gatti1 and Luca Lombardi1 (a feature vector for example), in addition to tree meta-
1
Department of Computer Engineering and Systems data (like the level of the node; height, number of
Science, University of Pavia, Pavia, Italy nodes, and number of leaves in the subtree rooted at
2
Computational Biology, KTH Royal Institute of the node, etc.).
Technology, Stockholm, Sweden

Characteristics
Definition
One of the most successful approaches for protein
First introduced by Sklansky (1972), further shape analysis and description is the structural one.
researched by Bachelor (1979, 1980) and Borgefors A complex shape can be segmented into its compo-
and Sanniti di Baja (1992), a concavity tree is a data nent (e.g., a pockets set), and each pocket can be
structure used for describing non-convex subsequently decomposed into simpler regions, and
two-dimensional shapes. It is a rooted tree in which the complete description is given in terms of the

Concavity Tree for Protein–Protein Interaction Analysis, Fig. 1 An object (a), its convex hull with green concavities (b), the
actual shape of each node (c) and the corresponding three levels concavity tree (d)
C 482 Concavity Tree for Protein–Protein Interaction Analysis

D1
B331
B321
B33
B111 B311
B31 B32
D B3
D2 B3
B11
B

B1 B1
A
C B2
A1
B12
B22

A2 C1 B21 B2
B211

Concavity Tree for Protein–Protein Interaction Analysis,


Fig. 2 A 2D representation example with a tunnel and three
pockets in a section composed of three connected components Concavity Tree for Protein–Protein Interaction Analysis,
(in brown). The closed curve in black corresponds to the Fig. 3 Continuing the 2D representation example of Fig. 2, the
first level convex hull, and the border in brown dotted-dashed details of the tunnel B component (light green) are shown. The
line embodies the area under analysis. Part of the second second level is composed of the three meta-concavities (B1,
level with three meta-concavities (A, B, C) is shown; in evidence B2, and B3). Each one has concavities at the third level: B1 has
also five third level termination-node components (A1, A2, C1, a termination-node concavity (B12) and a second component
D1, and D2) B11 which maintains a meta-concavity at the fourth level
B111; B2 has a termination-node concavity (B22) and
a second component B21 which maintains a meta concavity at
the fourth level B211; finally, B3 has three node concavities
(B31, B32, B33) each one maintaining a meta-concavity node-
component B311, B321, and B331 respectively, at the fourth
level
region’s features and their spatial relationship.
This process can be executed recursively. In this
way, a sequence of approximations is built, and,
at each stage, this structural hierarchical representa-
tion can be effectively applied for analyzing a level of detail sufficiently fine for the purpose is
and comparing complex shapes. This approach is reached.
particularly fruitful in proteomics (Cantoni et al. The final result is a hierarchical structure: the (meta)
2010), in which the morphology plays a fundamental concavity tree. At each level, the concavities can be
role, for example to study protein–protein interaction analyzed and described on the basis of a set of features
in which also a geometrical congruence for evaluated at each node. Obviously, the features defined
a normally extended part of the molecule surfaces is for concavities can also be computed for the meta-
required. concavities of the same levels.
A hierarchical refinement of the morphological In Fig. 2 the concavities (three “pockets” and
analysis of extended regions is performed to describe one “tunnel”) and second-level meta-concavities of
further details. The Convex Hull of each region at a 2D example are shown. Figure 3 shows concavities
every scale level is analyzed by the same process as and meta-concavities of level two, three, and
that applied to the whole initial shape, going through four for the tunnel of level one. The corresponding
concavity and meta-concavity. The process continues concavity tree is shown in Fig. 4. As an example
until all regions of the last level are convex, or until of the proposed data description, Fig. 5 is
Concentration Control Coefficient 483 C
a partial representation of the concavity tree References
for the three main pockets of the Crystal Structure
of uncomplexed HIV1 Protease subtype A Batchelor BG (1979) Computers and digital techniques. J IEEE
2(4):157–168
(PDB ID3IX0). Note that, each node can be
Batchelor BG (1980) Shape descriptors for labeling concavity
described quantitatively through geometrical and trees. J Cybernetics 10:233–237
biochemical parameters such as skewness, kurtosis, Borgefors G, Sanniti di Baja G (1992) Methods for
mouth aperture, surface to volume ratio, travel hierarchical analysis of concavities.In: Proceedings of the C
international conference on pattern recognition, vol 3.
depth, etc.
pp 171–175
Cantoni V, Gatti R, Lombardi L (2011) Geometrical constraints
for ligand positioning. In: Proceedings of the first interna-
tional conference on bioinformatics. pp 211–216
Sklansky J (1972) Measuring concavity on a rectangular
mosaic. IEEE Trans Comput C-21:1355–1364

A B C D
Concentration Control Coefficient

A1 A2 C1 D1 D2
Emma Saavedra and Rafael Moreno-Sánchez
Department of Biochemistry, National Institute for
B1 B2 B3 Cardiology “Ignacio Chávez”, Mexico City, Mexico

B11 B12 B22 B21 Definition

It is the degree of control that each enzyme exerts on


B111 B31 B32 B33 B211 the concentration of the pathway intermediaries. It is
written as CXai where X is any pathway metabolite and
a is the activity of each pathway enzyme i.
B311 B321 B331

Concavity Tree for Protein–Protein Interaction Analysis, Cross-References


Fig. 4 Representation of the concavity tree for the 2D section of
Fig. 2 and Fig. 3. Note that each node contains the information of
the feature vector previously presented
▶ Metabolic Control Theory

PDB: 3IX0

Concavity Tree for Protein–


1st Level
Protein Interaction
Analysis, Fig. 5 The
representation of the concavity 2nd Level
tree for the Crystal Structure of
uncomplexed HIV1 Protease 3rd Level
Subtype A (PDB ID 3IX0)
C 484 Concentration Graph Model

fX;Y ðx; yÞ
Concentration Graph Model fY ðyjX ¼ xÞ ¼ ; (2)
fX ðxÞ

▶ Graphical Gaussian Model where fX;Y ðx; yÞ gives the joint density of X and Y,
while fX ðxÞ gives the density of X.

Cross-References
Concept Extraction Task
▶ Causal Relationship
▶ Template Filling, Text Mining ▶ Conditional Independence

References

Conditional Distribution John R (1999) Mathematical statistics and data analysis.


Duxbury, Belmont
Lin Wang
School of Computer Science and Information
Engineering, Tianjin University of Science and
Technology, Tianjin, China Conditional Independence

Katsuhisa Horimoto
Synonyms Computational Biology Research Center, National
Institute of Advanced Industrial Science and
Conditional probability distribution; Conditional Technology, Koto-ku, Tokyo, Japan
probability density function; Conditional probability
mass function
Synonyms

Definition Bayes rule; Causal relationship; Conditional distribu-


tion; Correlation relationship
Letting X and Y be two random variables over the
same sample space S, the conditional probability
distribution of Y given X is the probability distribution Definition
of Y when X is known to be a particular value. For
discrete random variables, the conditional distribution In probability theory, two events A and B are condi-
of Y given the value x of X can be written as the tionally independent given a third event C, if the
following formula: occurrence or nonoccurrence of A and the occurrence
or nonoccurrence of B are independent events in their
conditional probability distribution given C. In the
PðX ¼ x \ Y ¼ yÞ standard notation of probability theory, A and B are
pY ðyjX ¼ xÞ ¼ PðY ¼ yjX ¼ xÞ ¼ ;
PðX ¼ xÞ conditionally independent given C if and only if
(1)
Pr ðA \ BjCÞ ¼ Pr ð AjCÞPr ð BjCÞ

For continuous random variables, the conditional or equivalently,


distribution of Y given the value x of X can be written
as Pr ðAjB \ CÞ ¼ Pr ð AjCÞ
Confidence Intervals 485 C
In other words, A and B are conditionally indepen-
dent if and only if, given knowledge of whether Confidence Intervals
C occurs, knowledge of whether A occurs provides
no information on the likelihood of B occurring, and Andreas Raue1 and Jens Timmer1,2,3
1
knowledge of whether B occurs provides no informa- Institute for Physics, University of Freiburg, Freiburg,
tion on the likelihood of A occurring. In statistics, two Germany
2
random variables X and Y are conditionally indepen- BIOSS Centre for Biological Signalling Studies and C
dent given a third random variable Z, if and only if they Freiburg Institute for Advanced Studies (FRIAS),
are independent in their conditional probability distri- Freiburg, Germany
3
bution given Z. This is generally written: Department of Clinical and Experimental Medicine,
Link€oping University, Link€oping, Sweden
X q Y jZ
Definition
and is read “X is independent of Y, given Z.”
Often, parameter values are unknown and have to be
estimated from experimental data. It is important not
Cross-References to rely on the mere estimated values and the predic-
tions that correspond to these values. It is necessary to
▶ Bayes Rule consider the uncertainties in the parameter estimation
▶ Causal Relationship procedure: from measurement uncertainties to param-
eter uncertainties and possibly non-▶ identifiability,
to uncertainties in the model predictions. In ▶ maxi-
mum likelihood estimation, uncertainties in the
parameter estimates are usually described by confi-
þ
Conditional Independency dence intervals. A confidence interval ½s i ; si of
^
a parameter estimate yi to a confidence level a sig-
▶ Bayesian Network Model nifies that the true value yi is expected to be inside
this interval with probability of a. For nonlinear
models, confidence intervals can be defined by
a threshold Da in the likelihood. This threshold
defines a confidence region
Conditional Probability Density Function
n o  
▶ Conditional Distribution yjw2 ðyÞ  w2 ð^yÞ < Da with Da ¼ Q w2df ; 1  a
(1)

whose borders represent likelihood-based confidence


Conditional Probability Distribution intervals (Meeker and Escobar 1995). w2(y) is
the weighted sum of squared residuals and the
▶ Conditional Distribution threshold Da is the 1a quantile of the w2df  distribu-
tion. The choice of df yields confidence intervals
that hold jointly for df number of parameters
(Press et al. 1990); often df ¼ 1 is desired. For high-
dimensional models, the profile likelihood can be
Conditional Probability Mass Function evaluated to derive likelihood-based confidence
intervals; see in ▶ structural and practical
▶ Conditional Distribution identifiability analysis.
C 486 Configuration Type

Cross-References
Conformational Epitope
▶ Identifiability
▶ Maximum Likelihood Estimation ▶ Discontinuous Epitope
▶ Structural and Practical Identifiability Analysis

References
Conjugation Reactions
Meeker WQ, Escobar LA (1995) Teaching about approximate
confidence regions based on maximum likelihood
▶ Phase II Enzymes
estimation. The Am Stat 49(l):48–53
Press WH, Teukolsky SLA, Flannery BNP, Vetterling WMT
(1990) Numerical recipes: FORTRAN. Cambridge
University Press, Cambridge

Connectionist Model

▶ Linear Additive Network Model


Configuration Type

▶ IMGT-ONTOLOGY, ConfigurationType

Connective Tissue

▶ Stroma
Confocal and Multiphoton Microscopy

Xiaohua Wu
Department of Pediatrics, Herman B Wells Center for Connectivity Theorem
Pediatric Research, Indiana University School of
Medicine, Indianapolis, IN, USA Emma Saavedra and Rafael Moreno-Sánchez
Department of Biochemistry, National Institute for
Cardiology “Ignacio Chávez”, Mexico City, Mexico
Definition

Confocal and multiphoton microscopy can achieve up Definition


to diffraction-limited resolution when imaging virtual
tissue sections in intact tissue volumes, which is an It links the kinetic properties of a particular enzyme
aspect of these methods referred to as “tissue section- (elasticity coefficients) with its ability to control the
ing” or “optical sectioning.” In particular, laser scanning pathway flux (flux control coefficient). The sum of the
confocal microscopy scans a focused laser beam inside products of the elasticity coefficient and the flux con-
the specimen and uses a pinhole to reject photons that trol coefficient of each of the pathway enzyme has to
arrive to the detector from out-of-focus areas. add up a value of zero.

Cross-References Cross-References

▶ Spectroscopy and Spectromicroscopy ▶ Metabolic Control Theory


Constant (C) Domain 487 C
constraints (Sauro and Ingalls 2004; Vallabhajosyula
Consensus Sequence et al. 2006). Examples are the total amount of atomic
elements (mass conservation) and the conservation of
Jianhua Ruan metabolites; the sum of ATP, ADP, and AMP remains
Department of Computer Science, University of Texas constant with time if the synthesis of these nucleo-
at San Antonio, San Antonio, TX, USA tides is not part of the network. A conservation anal-
ysis focuses on the identification of conserved C
moieties in networks and involves linear algebra
Definition methods (Sauro and Ingalls 2004; Vallabhajosyula
et al. 2006).
In transcriptional regulation studies, a consensus
sequence is a representation of a transcription factor-
binding motif. It is obtained from the multiple References
alignment of a collection of binding sites recognized
by the transcription factor, and depicts which nucleo- Sauro HM, Ingalls B (2004) Conservation analysis in biochem-
ical networks: computational issues for software writers.
tides are most abundant in the alignment at each posi- Biophys Chem 109(1):1–15
tion. For example, consider the following DNA Vallabhajosyula RR, Chickarmane V, Sauro HM (2006) Conser-
sequence, AccATGg. An upper-case letter means that vation analysis of large biochemical networks. Bioinformat-
the transcription factor has a very strong preference of ics 22(3):346–353
the corresponding nucleotide at that position, while
a lower-case letter represents the most abundant nucle-
otide at those positions. When multiple nucleotides are
equally likely to appear, IUPAC nucleotide codes can
also be used, for example, ATNNCY, where N stands Conserved Mass Analysis
for any base, and Y represents any pyrimidine.
▶ Conservation Analysis

Conservation Analysis

Sang Yup Lee Conserved Moiety Analysis


Department of Chemical and Biomolecular
Engineering and Department of Bio and Brain ▶ Conservation Analysis
Engineering, Korea Advanced Institute of Science and
Technology (KAIST), Daejeon, Korea

Synonyms Constant (C) Domain

Conserved mass analysis; Conserved moiety analysis; Marie-Paule Lefranc


Stoichiometric mass balance analysis Laboratoire d’ImmunoGénétique Moléculaire,
Institut de Génétique Humaine UPR 1142, Université
Montpellier 2, Montpellier, France
Definition

Conserved moieties are molecular subgroups that Synonyms


are conserved throughout the course of a network
evolution with respect to time due to stoichiometric C domain; C type domain; Constant domain
C 488 Constant (C) Domain

Constant (C) Domain, C domain description


Table 1 C domain C domain Receptor type per chain type
description per receptor type
and chain type C-DOMAIN IG CH1 Several CHa per IG-Heavy
CH2 chain (ex: 4 for IG-Heavy-Mu)
CH3
CH4
C-KAPPAb
C-LAMBDAb
TR C-ALPHA
C-BETA
C-GAMMA
C-DELTA
C-LIKE-DOMAIN IgSF other than C-LIKE-DOMAIN One or several domain(s) per
IG and TR chain
a
CH: C-DOMAIN of IG-Heavy chain
b
If the IG-Light chain is not specified, C-KAPPA and C-LAMBDA are described as CL
(C-DOMAIN of IG-Light chain)

Definition 104 (2nd-CYS). The amino acid at position 118,


although not conserved, is highlighted in the ▶ IMGT
The Constant (C) domain is a type of structural unit Collier de Perles as it is one of the two anchors of the
(domain) that, with the ▶ variable (V) domain, charac- FG loop.
terizes a protein chain belonging to the ▶ immunoglob- Analysis of C domain sequences can be performed
ulin superfamily (IgSF) (Duprat et al. 2004; Lefranc by tools of IMGT ®, the international ImMunoGeneT-
et al. 2005). The C domain comprises the C-DOMAIN ics information system ® (http://www.imgt.org)
of the immunoglobulins (IG) or antibodies and T cell (▶ IMGT® Information System):
receptors (TR), and the C-LIKE-DOMAIN of the IgSF • Nucleotide sequences of IG and TR C-DOMAIN by
proteins other than IG and TR. The C domain descrip- IMGT/V-QUEST
tion per receptor type and chain type, based on the • Amino acid sequences of C-DOMAIN and
▶ IMGT-ONTOLOGY concepts (▶ Chain Type, C-LIKE-DOMAIN by IMGT/DomainGapAlign
▶ Domain Type), is shown in Table 1. IMGT® labels (Ehrenmann et al. 2010)
are in capital letters. These tools align the user sequences with the closest
A C domain (C-DOMAIN or C-LIKE-DOMAIN) is C domains of the IMGT reference directory, create
usually encoded by one exon of a gene. Although the gaps according to the ▶ IMGT unique numbering for
C domain sequences may be very diverse, the structure C domain (Lefranc et al. 2005), delimit the strands and
is well conserved. The C domain (C-DOMAIN or loops, highlight differences with the closest reference
C-LIKE-DOMAIN) is made of seven antiparallel (s), and generate the ▶ IMGT Collier de Perles.
beta strands (A, B, C, D, E, F, and G) linked by beta Analysis of C domain three-dimensional (3D) struc-
turns (AB, DE, and EF), a transversal strand CD, and tures and interactions is available in the 3D database
loops (BC and FG), forming a sandwich of two sheets (IMGT/3Dstructure-DB) of the ▶ IMGT® information
(four strands on one sheet and three strands on the other system (Ehrenmann et al. 2010).
sheet). Allotypes in humans are antigenic determinants on
According to the ▶ IMGT unique numbering C-DOMAIN of the IG (CH of IG-Heavy gamma and
(Lefranc et al. 2005), the four conserved amino acids alpha chains, and C-KAPPA of IG-Light-Kappa
of a C domain always have the same position, cysteine chains) (Lefranc and Lefranc in press). They comprise:
23 (1st-CYS), tryptophan 41 (CONSERVED-TRP), • the Gm allotypes (‘gamma markers’) on CH of
conserved hydrophobic amino acid 89, and cysteine IG-Heavy-Gamma chains,
Constant (C) Gene 489 C
• the Am allotypes (‘alpha markers’) on CH of
IG-Heavy-Alpha chains, Constant (C) Gene
• the Km allotypes (or ‘kappa markers’) on
C-KAPPA of IG-Light-Kappa chains (Lefranc and Marie-Paule Lefranc
Lefranc in press) (IMGT Repertoire, http://www. Laboratoire d’ImmunoGénétique Moléculaire,
imgt.org) Institut de Génétique Humaine UPR 1142, Université
They have extensively been analysed in population Montpellier 2, Montpellier, France C
genetics and molecular evolution of the IG. They
presently regained a lot of attention in antibody
engineering and antibody humanization, in the anal- Synonyms
ysis of the immunogenicity of therapeutic antibodies.
The corresponding amino acid changes are identified C gene; Constant; Constant gene
and described according to the IMGT unique
numbering for C-DOMAIN (Lefranc and Lefranc
in press).
Definition

Cross-References The constant (C) gene, or “constant” is a ▶ leafconcept


of the “▶ GeneType” concept of identification (gener-
▶ Chain Type ated from the ▶ IDENTIFICATION axiom) of
▶ Domain Type ▶ IMGT-ONTOLOGY, the global reference in
▶ IMGT Collier de Perles ▶ immunogenetics and ▶ immunoinformatics
▶ IMGT Unique Numbering (Giudicelli and Lefranc 1999; Lefranc et al. 2004,
▶ IMGT ® Information System 2005, 2008; Duroux et al. 2008), built by IMGT®, the
▶ IMGT-ONTOLOGY international ImMunoGeneTics information system®
▶ Immunoglobulin Superfamily (IgSF) (http://www.imgt.org) (▶ IMGT® Information Sys-
▶ Variable (V) Domain tem). “Constant” identifies a gene that codes the con-
stant region of an immunoglobulin (IG) or antibody or
of a T cell receptor (TR) chain (▶ Chain Type).
References It is one of the four leafconcepts that are character-
istics of the IG and TR loci (Lefranc and Lefranc
Duprat E, Kaas Q, Garelle V, Lefranc G, Lefranc M-P 2001a, b), the other three being “variable” (V), “diver-
(2004) IMGT standardization for alleles and mutations of sity” (D), and “joining” (J) (▶ Variable (V) Gene,
the V-LIKE-DOMAINs and C-LIKE-DOMAINs of the ▶ Diversity (D) Gene, ▶ Joining (J) Gene).
immunoglobulin superfamily. In: Pandalai SG (ed) Recent
Research Developments in Human Genetics. Research Sign-
An IG or TR constant (C) gene in contrast to V, D,
post, Trivandrum, vol 2, pp 111–136, chapter 6 and J genes does not rearrange itself and has a
Ehrenmann F, Kaas Q, Lefranc M-P (2010) IMGT/3Dstructure- configuration identified as “undefined” (▶ Configura-
DB and IMGT/DomainGapAlign: a database and a tool for tion Type). A C gene is characterized by an acceptor
immunoglobulins or antibodies, T cell receptors, MHC, IgSF
and MhcSF. Nucleic Acids Res 38:D301–D317
splice in 50 (the splicing occurring with the 30 donor
Lefranc M-P, Pommié C, Kaas Q, Duprat E, Bosc N, Guiraudou splice of a J gene).
D, Jean C, Ruiz M, Da Piedade I, Rouard M, Foulquier E, The C region encoded by a C gene comprises one or
Thouvenin V, Lefranc G (2005) IMGT unique numbering several domains (▶ Constant (C) Domain) depending
for immunoglobulin and T cell receptor constant domains
and Ig superfamily C-like domains. Dev Comp Immunol
on the chain type (▶ Chain Type).
29:185–203
Lefranc M-P, Lefranc G (in press) Human Gm, Km and Am Cross-References
allotypes and their molecular characterization: a remarkable
demonstration of polymorphism. In: Tait B, Christiansen F
(eds) Methods in Molecular Biology – Immunogenetics, ▶ Chain Type
chapter 34 ▶ Configuration Type
C 490 Constant Domain

▶ Constant (C) Domain


▶ Diversity (D) Gene Constant
▶ Gene Type
▶ IMGT ® Information System ▶ Constant (C) Gene
▶ IMGT-ONTOLOGY
▶ IMGT-ONTOLOGY, IDENTIFICATION Axiom
▶ IMGT-ONTOLOGY, Leafconcept
▶ Immunogenetics Constraint
▶ Immunoinformatics
▶ Joining (J) Gene Jon Umerez and Matteo Mossio
▶ Variable (V) Gene IAS-Research Philosophy of Biology Group,
Department of Logic and Philosophy of Science,
University of the Basque Country (UPV/EHU),
References Donostia – San Sebastian, Spain

Duroux P, Kaas Q, Brochet X, Lane J, Ginestoux C, Lefranc M-P,


Giudicelli V (2008) IMGT-Kaleidoscope, the formal IMGT-
ONTOLOGY paradigm. Biochimie 90:570–583
Synonyms
Giudicelli V, Lefranc M-P (1999) Ontology for immuno-
genetics: the IMGT-ONTOLOGY. Bioinformatics (Approx.) boundary condition; Control
12:1047–1054
Lefranc M-P, Lefranc G (2001a) The immunoglobulin
FactsBook. Academic Press, London, pp 1–458
Lefranc M-P, Lefranc G (2001b) The T cell receptor FactsBook. Definition
Academic Press, London, pp 1–398
Lefranc M-P, Giudicelli V, Ginestoux C, Bosc N, Folch G, Constraint refers to a reduction of the degrees of free-
Guiraudou D, Jabado-Michaloud J, Magris S, Scaviner D,
Thouvenin V, Combres K, Girod D, Jeanjean S, Protat C,
dom of the elements of a system exerted by some
Yousfi Monod M, Duprat E, Kaas Q, Pommié C, Chaume D, collection of elements, or a limitation or bias on the
Lefranc G (2004) IMGT-ONTOLOGY for immunogenetics variability or possibilities of change in the kind of such
and immunoinformatics. In Silico Biol 4:17–29 elements.
Lefranc M-P, Clément O, Kaas Q, Duprat E, Chastellan P,
Although the term has several meanings in diverse
Coelho I, Combres K, Ginestoux C, Giudicelli V, Chaume
D, Lefranc G (2005) IMGT-Choreography for immunoge- scientific fields, the idea of a constraint is usually
netics and immunoinformatics. In Silico Biol 5:45–60 employed in relation to conceptualizations in terms
Lefranc M-P, Giudicelli V, Regnier L, Duroux P (2008) IMGT, of levels or ▶ hierarchies. Some general features of
a system and an ontology that bridge biological and
constraints such understood are the following:
computational spheres in bioinformatics. Brief Bioinform
9:263–275 constraints do not interact with the elements they
influence and their dynamics; they arise from
dynamics at different levels of ▶ organization;
constraint relations are always asymmetric, and may
give rise to new phenomena.
Constant Domain In the tradition of the theories of emergent evolution
of the early twentieth century and, even clearer, in the
▶ Constant (C) Domain theory of levels of integration of the organicist
tradition of the 1930s, higher levels are understood as
arising from lower-level elements or processes, whose
laws all obey, but concurrently exerting some specific
influence on those very elements or processes (see
Constant Gene Blitz 1992). Later on, in most approaches to hierarchy
theory, the very concept of constraint is profusely used
▶ Constant (C) Gene to account for the specificity of nontrivial inter-level
Constraint 491 C
relation, where lower level and upper level act upon life and the nature of biological function and heredity,
each other but in different ways (see, e.g., Allen and using the concept of constraint as an explanatory tool
Starr 1982; Salthe 1985). Even the introduction of derived from Mechanics. He develops in several
some specific senses of constraint in evolutionary papers the concept and the relevant distinction
biology (as developmental constraints and between holonomic (▶ Constraint, Holonomic) (struc-
historical constraints) is due to the attempt to tural-like) and non-holonomic (▶ Constraint, Non-
accommodate different explanations, stemming from holonomic) (▶ functional-like) kinds. Pattee claims C
diverse factors or forces, and is a potential alternative, that, in order to explain hereditary storage, transmis-
if not rival, to adaptive selection, in an unavoidably sion, and the action of genetic instructions, we should
multilevel explanatory construction. understand them in terms of non-holonomic con-
straints acting in the context of a very specific kind of
interlevel relation (semantic closure).
Characteristics Following his description we may recall that, in
physics, initial conditions and laws of movement pro-
The development of more specific characterizations of vide, in principle, an exhaustive description of the
constraint (or alike) are attempts to spell out the inter- possible behavior, future and past, of a mechanical
action among levels of organization by deriving them system. No other conditions are needed. The choice
from epistemologically legitimate concepts grounded of the coordinates of the system that specifies its space
in the physical sciences or in their explanatory armory. of configuration or space of states defines all the pos-
The difficult issue of interlevel relation is rendered sible degrees of freedom in the system or all the pos-
more concrete by Polanyi’s (1968) account of organ- sible trajectories of movement of its elements. These
isms (and machines) as dual control systems, which coordinates establish the variables of the movement
relies on an indistinct concept of boundary conditions. equations of the system. In this context, constraints are
Polanyi deliberately employs the machine example to those additional equations that are introduced as aux-
introduce his idea of boundary conditions “harnessing iliary conditions in order to define the specific mechan-
the laws of nature” that govern matter and the forces ical system subject to calculation (see, e.g.,
acting on it. In a machine, the harnessing is exerted Sommerfeld 1952). In other areas of Physics, con-
with a goal, which makes easier to describe the dual straints may be expressed in a more general form as
control: the design of the machine, by which the boundary conditions. In Chemistry, constraints refer to
machine does what it is intended for, and the laws of the steady state of elementary particles (chemical
physics and chemistry, that the components of the bonds). When referred to dissipative systems, the con-
machine and the machine itself must obey. cept acquires an even more specific sense as it becomes
In the organism “its structure serves as a boundary dynamical (an unstable macroscopic pattern that
condition harnessing the physical-chemical processes remains as long as there is energy contribution).
by which its organs perform their functions” (Polanyi Finally, its presence is patent in any form of biological
1968, 1308). The machine analogy goes on to compare regulation (starting from a membrane) and more con-
morphogenesis, as the process that develops that struc- troversial in its contribution to the understanding of
ture, with the shaping of the machine. Yet, as Polanyi evolutionary paths. In social and artificial systems, it is
emphasizes, the analogy ends here: Although both are clearly manifested in the form of rules. In sum, con-
systems under dual control (unlike inanimate systems straints refer to certain conditions or rules additional to
which may exhibit boundary conditions without being the laws of dynamics (that are taken as basic), that rule/
subject to dual control), organisms are not artificially govern the behavior of the elements and that arise from
designed. Therefore, the harnessing principle in organ- their aggregation.
isms has to be autonomous (▶ Autonomy), task that Whereas natural laws are, in principle, inexorable
Polanyi attributes to the informational account of and incorporeal, constraints are, by necessity, acciden-
heredity. tal or arbitrary, and require some distinct physical
Is precisely this the challenge that Pattee (see, e.g., materialization (as molecules, membranes, or sur-
1968, 1972) undertakes in his research on the origin of faces). Constraints are alternative descriptions of part
C 492 Constraint

of the system. Namely, constraints cannot be expressed behalf.” A system acts on its own behalf if it is able
in the same language than the microscopic description to use its work to regenerate at least some of the
of matter. In fact, what a constraint does is to selec- constraints that make work possible. When this occurs,
tively ignore microscopic degrees of freedom in order a Work-Constraint cycle is realized. In physical terms,
to obtain a simplification in the prediction or explana- it requires very specific conditions to occur. Actually,
tion of movement. The concept of constraint means the cycle is inevitably a thermodynamic irreversible
a selective loss of detail or a predetermined rule about process, which dissipates energy and requires cou-
what is going to be ignored. plings between exergonic (spontaneous, energy releas-
Therefore, for Physics constraining forces are, ing) reactions and endergonic (non-spontaneous,
unavoidably, linked to a new hierarchical level of energy requiring) ones, such that exergonic processes
description. When in Physics a constraint equation is are constrained in a specific way to produce work that
added to the equations of movement, we are always may be used to generate endergonic processes, which
dealing with two languages at the same time. The in turn generate those constraints canalizing exergonic
language of the equation of movement relates the processes. In Kauffman’s terms, “Work begets con-
detailed trajectory or the state of the system with straints beget work” (Kauffman 2000).
dynamical time, whereas the language of the constraint In evolutionary biology, the concept of constraint
does not deal at all with the same kind of system, but has mainly been introduced as a challenge by develop-
with another situation in which the dynamical detail mental approaches regarding the scope of selection and
has been purposely ignored. In other words, constric- the extent to which ▶ adaptation remains the main
tion forces are not detailed forces of the individual explanatory factor (see ▶ Explanation, Developmen-
particles but forces of the collections of particles or, tal). Besides the specific issue of connecting develop-
sometimes, forces of simple units averaged out in time. mental with evolutionary accounts, the concept of
In any case, some form of statistical averaging process constraint plays an important role, for instance, in the
has substituted microscopic detail. In Physics, then, in criticisms to adaptationism, in the discussion regarding
order to describe a constraint, the detailed dynamical the relevance of stasis and other macroevolutionary
description has to be abandoned. A constraint requires, patterns, or in the morphologically oriented proposals
therefore, an alternative description. of structuralism or ▶ complexity theories (see Orzack
Another relevant contribution to naturalize the con- and Sober 2001).
cept of constraint in the biological domain has been In an already classic paper that may be considered
provided by theoretical biologist Stuart Kauffman, to be an attempt to build a “consensus” position on the
who has recently proposed the idea of the “Work- subject, a developmental constraint is defined as “a
Constraint cycle” (Kauffman 2000, 2003). The Work- bias on the production of variant phenotypes or
Constraint cycle is supposed to capture what Kauffman a limitation on phenotypic variability caused by the
takes as a central feature of all biological organisms, structure, character, composition, or dynamics of the
namely, the fact of acting “on their own behalf” developmental system” (Maynard Smith et al. 1985,
(Kauffman 2003, 1089). Whereas this idea appears to 266). The origin of this bias or limitation may be
be in accordance with common intuition, Kauffman’s attributed to various sources (materiality, genetic
scientific challenge consists in giving a naturalized and dynamics, evolutionary pathways, complexity regime,
consistent account of it. The concept of a Work- etc.) but what is agreed upon is that they have an
Constraint cycle plays precisely this role. The main impact in evolution. A distinction is also drawn
idea is to link the idea of action to that of “work,” the between universal constraints, deriving from general
latter being defined, following Atkins, as “constrained laws of physics or from invariant properties of some
release of energy into relatively few degrees of free- material or complex systems, and local constraints,
dom” (Kauffman 2003, 1094). In this definition, the confined to particular taxa.
concepts of work and constraint are related: work is Amundson (1994), while accepting that constraint
constrained release of energy. This connection gives “implies some sort of restriction on variety or on
a way to interpret the slogan acting “on their own change,” claims that the key rests on the answer to
Constraint 493 C
the question: “What is being constrained?” Thus, he Cross-References
identifies two possible answers depending on whether
the explanandum is adaptation or form. We would ▶ Adaptation
therefore have to distinguish between understanding ▶ Autonomy
the effect of developmental principles on evolution as ▶ Complexity
constraints on adaptation, that is, as restrictions ▶ Constraint, Holonomic
imposed by embryology on the adaptive optimality of ▶ Constraint, Non-holonomic C
adult organisms on the potential of adaptation, or con- ▶ Explanation, Developmental
straints on form, in the sense that in the morphospace ▶ Explanation, Evolutionary
there are morphologies that cannot be achieved by the ▶ Explanation, Functional
process of development. ▶ Functional
This distinction is orthogonal to the previous one ▶ Hierarchy
between local and universal kinds of constraints since ▶ Interlevel Causation
both of them may operate, irrespectively, on the pros- ▶ Organization
pects of reaching more optimal adaptations and on the
scope of generation of organic form.
Sometimes, other kinds of constraints are men- References
tioned in the context of contending evolutionary expla-
nations, even if they are not generally accepted as Allen TFH, Starr TB (1982) Hierarchy. Perspectives for
specific constraints. For instance, historical constraints ecological complexity. The University of Chicago Press,
Chicago
are often brought up to contrast with developmental
Amundson R (1994) Two concepts of constraint: adaptationism
ones though maintaining the same feature of imposing and the challenge from developmental biology. Philos Sci
restriction on adaptation, but they are due to more 61:556–578
“evolutionary” common factors such as contingency Blitz D (1992) Emergent evolution. Qualitative novelty and the
levels of reality. Kluwer, Dordrecht
or accident creating some kind of bias. Ecological
Kauffman S (2000) Investigations. Oxford University Press,
constraints can also be sometimes evoked. Oxford
Regarding these uses and besides those distinctions Kauffman S (2003) Molecular autonomous agents. Phil Trans
about how to understand the concept, there is another R Soc A 361:1089–1099
Maynard Smith J, Burian R, Kauffman S, Alberch P, Campbell J,
point worth mentioning, since it allows connecting the
Goodwin B, Lande R, Raup D, Wolpert L (1985) Develop-
concept of developmental constraint with the more mental constraints and evolution: a perspective from the
general idea defined in the beginning of this entry. mountain lake conference on development and evolution.
According to Schwenk and Wagner (2003), this col- Q Rev Biol 60(3):265–286
Orzack SH, Sober E (eds) (2001) Adaptationism and optimality.
lective definition highlights the separation between
Cambridge University Press, Cambridge
constraint and selection by distinguishing the “gener- Pattee HH (1968) The physical basis of coding and reliability in
ation of variation from the operation of selection on biological evolution. In: Waddington CH (ed) Towards
that variation” (p. 54). In this sense, the force of selec- a theoretical biology, vol 1, Prolegomena. Edinburgh Uni-
versity Press, Edinburgh, pp 67–93
tion would assume the role of the general law upon
Pattee HH (1972) Laws and constraints, symbols and languages.
which some rules are imposed and, here too, some In: Waddington CH (ed) Towards a theoretical biology,
potential degrees of freedom are reduced due to what vol 4, Essays. Edinburgh University Press, Edinburgh,
Polanyi was calling boundary conditions and what pp 248–258
Polanyi M (1968) Life’s irreducible structure. Science
Pattee called constraints in their more general
160:1308–1312
approaches. Salthe SN (1985) Evolving hierarchical systems. Their structure
and representation. Columbia University Press, New York
Schwenk K, Wagner GP (2003) Constraint. In: Hall BK, Olson
Acknowledgments Funding for this work was provided by WM (eds) Keywords in evolutionary developmental biology.
grants GV/EJ IT505-10 from the Government of the Basque Harvard University Press, Cambridge, MA, pp 52–61
Country, and FFI2011-25665 from the Ministerio de Economı́a Sommerfeld A (1952) Mechanics, 2nd edn. Academic,
y Competitivad (MEC) and FEDER funds from the E.C. New York
C 494 Constraint, Holonomic

among degrees of freedom. These correlations among


Constraint, Holonomic coordinates are nonintegrable with the equations of
movement, since they introduce a different temporal
Jon Umerez and Matteo Mossio scale. Classically, these constraints do reduce more
IAS-Research Philosophy of Biology Group, degrees of freedom in the dynamical movement of
Department of Logic and Philosophy of Science, the system than the number of degrees of freedom
University of the Basque Country (UPV/EHU), that were necessary to define in the static description
Donostia – San Sebastian, Spain that specifies the initial conditions and the states or
configuration space of the system. They are mainly
found in biological and social systems, where pro-
Definition cesses of selection and classification are common,
opening up the state space and increasing the variety
Holonomic constraints are auxiliary conditions that of behaviors. Constraints whose equations contain
limit permanently the number of degrees of freedom time are called rheonomic.
of a system. They are, then, fixed and passive struc-
tures that, in a sense, are independent of time (at least
of the specific time frame, dynamics, of the system in Cross-References
particular). They establish some fixed relations among
some coordinates of the system that are mathemati- ▶ Constraint
cally integrable with the fundamental equations of
movement. They are integrable since they introduce
no new temporal dimension. They just reduce degrees
of freedom once and for good. They are mainly found Constraint-based Modeling
in most physical-chemical and artificial mechanistic
systems. Constraints whose equations do not contain Osbaldo Resendis-Antonio
time are called scleronomic. Center for Genomics Sciences-UNAM, Universidad
Nacional Autónoma de México, Cuernavaca, Morelos,
Mexico
Cross-References

▶ Constraint Synonyms

Genome-scale metabolic modeling

Constraint, Non-holonomic
Definition
Jon Umerez and Matteo Mossio
IAS-Research Philosophy of Biology Group, With the advent of high-throughput technology, there
Department of Logic and Philosophy of Science, has been a growing need to develop computational
University of the Basque Country (UPV/EHU), frameworks that contribute to analyze these data for
Donostia – San Sebastian, Spain unveiling the biological principles underlying meta-
bolic networks. Constraint-based modeling is
a paradigm in systems biology that contributes to this
Definition latter aim by considering physical, enzymatic, and
topological constraints underlying the phenotype in
Non-holonomic constraints are variable auxiliary con- a metabolic network. In global terms, this method can
ditions that limit in time the number of degrees of be stratified into four steps: metabolic reconstruction
freedom of the system. They are, then, dynamical of an organism, mathematical representation of the
structures that establish time-dependent relations metabolic network, in silico analysis, and experimental
Constraint-based Modeling 495 C
Constraint-based
Modeling, Constraint-based modeling.
Deterministic
Fig. 1 Constraint-based
modeling. Constraint-based Mathematical Topology Stochastic / Boolean
modeling is a paradigm in Network dx = s.v
representation dt
systems biology which
consists of four main steps. Maximize z = Σc .x
2 2

(1) genome scale metabolic In silico Feasible space.


reconstruction (region with Modeling. C
blue block), (2) mathematical -Physical
-Thermodynamics
Omics:
representation of the
metabolic network (black), -Enzymatic
Experimental S .v = 0 -Microarrays
(3) in silico analysis (red), and v3 -Metabolomics
(4) experimental assessment assessment.
-Proteomics
(green). A continuous -Fluxomics
feedback is needed to improve
the in silico predictions, v2
extend the metabolic v1
reconstruction and design Integration databases Metabolic
experiments for uncovering high-throughput data. Reconstruction.
the organizing principles in
metabolism

assessment of computational predictions. In the end, stoichiometric, thermodynamic, enzymatic capacities,


this approach supplies with a powerful scheme for: regulatory and kinetic constraints when they are avail-
(1) integrating HT data, (2) exploring the metabolic able. In general terms, this approach can be summa-
phenotype in a metabolic reconstruction, and rized by four steps: genome scale metabolic
(3) designing experiments to assess biological reconstruction, mathematical representation of the
hypothesis. metabolic network, in silico analysis, and experimental
assessment, see Fig. 1.

Characteristics Metabolic Reconstruction


The current DNA sequence technologies and the bio-
The advent of high-throughput technology has informatics tools available in the literature have facil-
triggered the appearance of a significant number of itated the development of metabolic databases in a
databases that constitute important sources of data for variety of organisms. In this contextual scheme, these
understanding the mechanisms by which cells organize databases can be used to survey how genes codified in
their biological activity at diverse biological levels. the genome contribute to determine the innate meta-
Given the high number of variables inherent to these bolic capacities in microorganisms. These databases,
databases, this enterprise requires computational such as KEGG (Kanehisa et al. 2008) and Biocyc
methods that are capable of dealing with the heteroge- (Caspi et al. 2010) to name a few, constitute a good
neous and high dimensionality nature of high- base to construct a metabolic network by identifying its
throughput data in a coherent and systematic fashion. components and defining their interactions (Feist et al.
Constraint-based modeling serves as a model- 2009; Reed et al. 2006; Resendis-Antonio et al. 2007,
concentric database that provides with a framework 2010). Furthermore, in order to characterize the feasi-
for describing and predicting metabolic phenotypes ble metabolic capacities in the organism, this informa-
in microorganisms. Having selected a metabolic tion usually is complemented with high-throughput
reconstruction in an organism, this computational data and a carefully genetic and metabolic literature
approach involves the application of a series of review of the organism under study (Reed et al. 2006).
constraints arising from the consideration of During this curation process, it is important to verify
C 496 Constraint-based Modeling

that all the reactions included in the reconstruction are certain reactions along the reconstruction. Other
such that they are charge and mass balanced, this issue constraint belonging to this classification is related
is essential for ensuring proper in silico results and with the second law of thermodynamics, the latter
interpretations (Thiele and Palsson 2010). being a fundamental law to regulate the direction of
the biological process.
Mathematical Representation of the Metabolic • Enzymatic: when one has information about the flux
Network capacity that one enzyme can be able to carry out.
The relation between the genotype and the phenotype For instance, uptake of oxygen is usually of 20
is a central issue when studying microorganisms in nMol•gDW/Hr.
systems and synthetic biology approaches (Palsson • Additional constraints can be imposed to delimit
2006). At a macroscopic level, the phenotype state in the metabolic capacities, for instance, regulatory
a cell is well characterized by the activity of metabolic constraints when the transcriptional regulatory net-
pathways, which in turn, mirror the genetic and protein work is known in one organism (Covert et al. 2004).
activity required for facing external environment. For Taking into account these constraints brings as
this reason, the study of the metabolic capacities in a consequence a reduction of the feasible metabolic
a genome scale reconstruction becomes relevant for space, this point is an important step toward a more
exploring the feasible phenotype space in an organism. accurate modeling of genome scale metabolic recon-
To this end, the set of reactions that form the metabolic struction, see Fig. 1 (black region). A variety of
reconstruction are mathematically represented through approaches have been suggested to explore the pheno-
the ▶ stoichiometric matrix, S. Thus, by assuming that type capacities of a metabolic reconstruction and link
metabolic concentrations are at steady state condition, their outputs to experimental and high-throughput data
the feasible metabolic space of flux distribution can be (Price et al. 2004). Among these approaches, ▶ flux
calculated by: balance analysis (FBA) is a paradigm in systems biol-
ogy for exploring the metabolic phenotype in an organ-
S v¼0 (1) ism and predicting its response under external
perturbation (Orth et al. 2010). In this biased approach,
where v represents a vector whose entries are the the metabolic activity in a reconstruction is obtained,
metabolic fluxes of the reactions included in the recon- first, by defining a function that represents certain type
struction. Hence, from a mathematical perspective, of biological purpose – called objective function, Z, and
the problem associated to the feasible phenotype then, identifying the metabolic flux distribution that
state is close related to solve the null space of the maximizes it. Depending on the case of study, this
stoichiometric matrix (Palsson 2006). objective function can represent a specific physiological
state (as growth rate or bacterial nitrogen fixation) and it
In Silico Analysis is calculated through linear optimization,i.e.,
Physical, thermodynamical, and enzymatic con-
" #
straints. Even though the feasible metabolic capacities X
in the metabolic network are quantitatively well Maximize Z ¼ c k Xk
k¼1
defined through Eq. 1, the number of possibilities
remains still huge and additional considerations are
such that:
required to limit their spectrum. Thus, in order to
limit even more this space, one imposes these main X
constraints when available: Si;j vj ¼ 0 i ¼ 1 . . . :m
• Physical: when we have information that certain
reactions occur in well-defined compartmentalized  a j  v j  bj j ¼ 1...n
organelle inside the cell, for instance, mitochondria
or cytoplasm. where Xk is a metabolite participating in the objective
• Thermodynamical: when we have information function and ck is a weight factor. Here m indicates
related with the irreversibility or reversibility of the number of metabolites and n the number of
Constraint-based Modeling 497 C
reactions along the reconstruction. Note that the latter capable of integrating high-throughput data and com-
two equations indicate that the solution is subject to putational simulations. This integrative description at
a steady state condition and is constrained by thermo- genome scale is useful for: (1) understanding the fun-
dynamic and enzymatic limits. This approach has damental activity of metabolism, (2) predicting pheno-
been successfully and extensively applied to explore type metabolic under internal or external
the consequences that gene perturbations produce on perturbations, and (3) designing more informative
phenotype in archaea, eukarya, and bacteria. The experiments at various biological layers. C
number of papers where FBA has contributed to
understand the metabolic activity in microorganism
has been rising recently; a representative set of Cross-References
applications in some organism can be reviewed in
Resendis-Antonio et al. (2007, 2012), Chang et al. ▶ Flux Balance Analysis
(2012) and Varum Mazumdar et al. (2009). ▶ Stoichiometric Matrix
In parallel with this approach, a variety of methods
have been proposed for extending the analysis of the
metabolic phenotype inherent to a genome scale References
metabolic reconstruction (Price et al. 2004). The appli-
Caspi R, Altman T, Dale JM, Dreher K, Fulcher CA, Gilham F,
cability of these methods ranges from sampling
Kaipa P, Karthikeyan AS, Kothari A, Krummenacker M,
the entire phenotypes in the null space of the stoichio- Latendresse M, Mueller LA, Paley S, Popescu L, Pujar A,
metric matrix to calculating the metabolic phenotype Shearer AG, Zhang P, Karp PD (2010) The MetaCyc data-
when knockout or gene deletion occurs in the base of metabolic pathways and enzymes and the BioCyc
collection of pathway/genome databases. Nucleic Acids Res
metabolic reconstruction (Segre et al. 2009). In addi-
38:D473–D479
tion, there are some algorithms devoted to explore the Chang RL, Ghamsari L, Manichaikul A, Hom EFY, Balaji S,
relation between dynamic behavior of metabolites, Fu W, Shen Y, Hao T, Palsson BO, Salehi-Ashtiani K,
topology of network, and biological functionality Papin JA (2012) Metabolic network reconstruction of
Chlamydomonas offers insight into light-driven algal
(Resendis-Antonio 2009; Jamshidi and Palsson 2008).
metabolisms. Mol Syst Biol 7:518
Covert MW, Knight EM, Reed JL, Herrgård MJ, Palsson BØ
Experimental Assessment of In Silico Predictions (2004) Integrating high-throughput and computational data
Constraint-based modeling lets us simulate real bio- elucidates bacterial networks. Nature 429(6987):92–96
Feist AM, Herrgard MJ, Thiele I, Reed JL, Palsson BO
logical processes and explore the resulting phenotypes
(2009) Reconstruction of biochemical networks in microbial
when changes in parameters occur due to genetic organisms. Nat Rev Microbiol 7(2):129–143
mutations or environmental perturbations. The struc- Jamshidi N, Palsson BØ (2008) Formulating genome-scale
tural organization of this approach is such that it lets us kinetic models in the post-genome era. Mol Syst Biol 4:171
Kanehisa M, Araki M, Goto S, Hattori M, Hirakawa M, Itoh M,
to move toward experimental assessment with high-
Katayama T, Kawashima S, Okuda S, Tokimatsu T,
throughput technology. Thus, transcriptome data, pro- Yamanishi Y (2008) KEGG for linking genomes to life and
teome, and metabolome profiles – and other omics – the environment. Nucleic Acids Res 36:D480–D484
become valuable data for refining the model and Mazumdar V, Snitkin E, Amar S, Segre D (2009) Metabolic
network model of a human oral pathogen. J Bacteriol
assessing the biological hypothesis emerging from in
191(1):74–90
silico analysis (Resendis-Antonio et al. 2012). This Orth JD, Thiele I, Palsson BØ (2010) What is flux balance
feedback, established between experiment design and analysis? Nat Biotechnol 28:245–248
computational evaluation, is a valuable and needed Palsson BØ (2006) Systems biology: properties of reconstructed
networks. Cambridge University Press, Cambridge/New York
task for completing our understanding of how micro-
Price ND, Reed JL, Palsson BO (2004) Genome-scale models of
organisms organize and control their metabolic microbial cells: evaluating the consequences of constraints.
response at specific environments. Nat Rev Microbiol 2:886–897
Reed JL, Famili I, Thiele I, Palsson BO (2006) Towards
multidimensional genome annotation. Nat Rev Genet
Conclusions
7(2):130–141
Constraint-based modeling of metabolic networks Resendis-Antonio O (2009) Filling kinetic gaps: dynamic
contributes to establish a systems biology platform modeling of metabolism where detailed kinetic information
C 498 Continuous Model

is lacking. PLoS One 4(3):e4967. doi:10.1371/journal. In Systems Biology, ordinary differential equations
pone.0004967 (ODEs) are the most established continuous modeling
Resendis-Antonio O, Reed JL, Encarnacion S, Collado-Vides J,
Palsson BØ (2007) Metabolic reconstruction and modeling representation. For example, one uses ODEs to model
of nitrogen fixation in Rhizobium etli. PLoS Comput Biol the concentration of each metabolite which is calcu-
3:10 e192 lated by a single ODE encapsulating all the reactions
Resendis-Antonio O, Checa A, Encarnación S (2010) Modeling where the metabolite is synthesized or consumed, with
core metabolism in cancer cells: surveying the topology
underlying the warburg effect. PLoS One 5(8):e12383. fluxes determining the transformations to and from
doi:10.1371/journal.pone.0012383 other metabolites in the metabolic network.
Resendis-Antonio O, Hernandez M, Salazar E, Contreras S, The use of continuous models is the main technique
Martinez-Batalla G, Mora Y, Encarnacion S (2012) Systems of the quantitative sciences. The antonym of continu-
biology of bacterial nitrogen fixation: high-throughput tech-
nology and its integrative description with constraint-based ous model is logic model (Please refer to the definition
modeling. BMC Syst Biol 5:120 of logic model).
Segre D, Vitkup D, Church G (2009) Analysis of optimality in
natural and perturbed metabolic networks. Proc Natl Acad
Sci U S A 99(23):15112–15117
Thiele I, Palsson BO (2010) A protocol for generating a
high-quality genome scale metabolic reconstruction.
Nat Protoc 5(1):93–121
Contractile Actomyosin Ring (CAR)

▶ Actomyosin Ring

Continuous Model

Yong Wang Contractile Ring


Academy of Mathematics and Systems Sciences,
Chinese Academy of Sciences, Beijing, China ▶ Actomyosin Ring

Synonyms
Control
ODE
▶ Constraint
Definition

Generally speaking, continuous model is


a mathematical model whose variables are real num- Controlled Vocabulary
bers, that is, variables take the continuous value (for
example, a real number between 0 and 1). The related J€org Hakenberg
term is discrete model whose variables take the dis- Department of Computer Science and Department of
crete value. Physicists often choose continuous math- Biomedical Informatics, Arizona State University,
ematical models for problems ranging from the Tempe, AZ, USA
dynamical systems of classical physics to the operator
equations and path integrals of quantum mechanics.
These mathematical models use the real or complex Definition
number fields and we argue that the real-number model
of computation should be used in the study of the A controlled vocabulary (CV) is a set of preselected,
computational complexity of continuous mathematical predefined, and authorized terms pertaining to
models. a specific domain.
Conventional Gene 499 C
Characteristics of a classical gene (initiation codon and stop codon in
the same gene unit).
Controlled vocabularies contain carefully selected The conventional gene includes two “GeneType”
terms to describe each anticipated concept in leafconcepts (▶ IMGT-ONTOLOGY, Leafconcept):
a certain domain. As opposed to natural language • “Conventional-with-leader” identifies any (coding
vocabularies, controlled vocabularies utilize fixed or not coding) gene other than IG or TR genes with
terms and phrases. CVs are employed to build subject a leader L region (or signal peptide). C
headings, taxonomies, and thesauri. These are gener- • “Conventional-without-leader” identifies any (cod-
ally used for annotation and search of documents, ing or not coding) gene other than IG or TR genes
clinical notes, studies, experiments, studied mate- with no leader L region (or signal peptide).
rials, etc. CVs alleviate both the processes of initial The other four “GeneType” leafconcepts are, in
indexing as well as retrieval, when users are contrast, specific to the immunoglobulins (IG) or anti-
restricted to use the same vocabulary and thus ambi- bodies and T cell receptors (TR) (▶ Variable (V) Gene,
guities such as homonyms, acronyms, and synonyms ▶ Diversity (D) Gene, ▶ Joining (J) Gene, ▶ Constant
are avoided. (C) Gene), have special characteristics (▶ Recombina-
tion Signal (RS) for V, D, and J, initiation codon and
stop codon in different genes V and C, respectively),
Cross-References and need DNA rearrangements during the biosynthesis
of the IG and TR chains (Lefranc and Lefranc 2001a,
▶ Entity Mention Normalization b) (▶ Immunoglobulin Synthesis).
▶ Information Retrieval The configuration of a conventional gene is always
▶ Named Entity Recognition identified as “undefined” (▶ Configuration Type).
▶ Text Mining

Cross-References

Conventional Gene ▶ Configuration Type


▶ Constant (C) Gene
Marie-Paule Lefranc ▶ Diversity (D) Gene
Laboratoire d’ImmunoGénétique Moléculaire, ▶ Gene Type
Institut de Génétique Humaine UPR 1142, Université ▶ IMGT ® Information System
Montpellier 2, Montpellier, France ▶ IMGT-ONTOLOGY
▶ IMGT-ONTOLOGY, IDENTIFICATION Axiom
▶ IMGT-ONTOLOGY, Leafconcept
Definition ▶ Immunogenetics
▶ Immunoglobulin Synthesis
The conventional gene, or “conventional” belongs ▶ Immunoinformatics
to the “▶ GeneType” concept of identification (gener- ▶ Joining (J) Gene
ated from the ▶ IDENTIFICATION Axiom) of ▶ Recombination Signal (RS)
▶ IMGT-ONTOLOGY, the global reference in ▶ Variable (V) Gene
▶ immunogenetics and ▶ immunoinformatics
(Giudicelli and Lefranc 1999; Lefranc et al. 2004,
2005, 2008; Duroux et al. 2008), built by IMGT ®, the References
international ImMunoGeneTics information system ®
(http://www.imgt.org) (▶ IMGT ® Information Sys- Duroux P, Kaas Q, Brochet X, Lane J, Ginestoux C, Lefranc
M-P, Giudicelli V (2008) IMGT-Kaleidoscope, the Formal
tem). The conventional gene allows to identify any
IMGT-ONTOLOGY paradigm. Biochimie 90:570–583
(coding or not coding) gene other than IG or TR genes. Giudicelli V, Lefranc M-P (1999) Ontology for immunogenet-
A conventional gene has by definition the characteristics ics: the IMGT-ONTOLOGY. Bioinformatics 12:1047–1054
C 500 Convergent Evolution

Lefranc M-P, Lefranc G (2001a) The immunoglobulin Cross-References


FactsBook. Academic, London, pp 1–458
Lefranc M-P, Lefranc G (2001b) The T cell receptor FactsBook.
Academic, London, pp 1–398 ▶ Mechanism, Conserved
Lefranc M-P, Giudicelli V, Ginestoux C, Bosc N, Folch G,
Guiraudo D, Jabado-Michaloud J, Magris S, Scaviner D,
Thouvenin V, Combres K, Girod D, Jeanjean S, Protat C,
Yousfi Monod M, Duprat E, Kaas Q, Pommié C, Chaume D,
Lefranc G (2004) IMGT-ONTOLOGY for immunogenetics
and immunoinformatics. In Silico Biol 4:17–29 Convex Hull
Lefranc M-P, Clément O, Kaas Q, Duprat E, Chastellan P,
Coelho I, Combres K, Ginestoux C, Giudicelli V, Chaume Virginio Cantoni1,2, Alessandro Gaggia1 and
D, Lefranc G (2005) IMGT-Choreography for immunoge-
netics and immunoinformatics. In Silico Biol 5:45–60 Luca Lombardi1
1
Lefranc M-P, Giudicelli V, Regnier L, Duroux P (2008) IMGT, Department of Computer Engineering and Systems
a system and an ontology that bridge biological and Science, University of Pavia, Pavia, Italy
computational spheres in bioinformatics. Brief Bioinform 2
Computational Biology, KTH Royal Institute of
9:263–275
Technology, Stockholm, Sweden

Convergent Evolution Definition

William Bechtel For a set of points S the Convex Hull H(S), in 2D space,
Department of Philosophy and Center for is the smallest convex polygon P that encloses S,
Chronobiology, University of California, San Diego, smallest in the sense that there is no other polygon P0
La Jolla, San Diego, CA, USA such that S  P0  P (Fig. 1).

Synonyms 3D Convex HULL


The previous definition can be easily extended to
Homoplasy the 3D space considering P as the smallest
polyhedron that encloses the 3D set of points S
(Fig. 2).
Definition

Convergent evolution is the process by which dis-


tantly related species independently evolve similar
genotypic or phenotypic traits. Trait convergence is
typically caused by the adaptation of organisms to
similar environments by means of natural selection.
However, traits can also converge as a result of design
constraints imposed by biological systems and
chance. Well-known examples representing the con-
vergence of gross morphological traits include the
visual systems of vertebrates, cephalopods, and
arthropods and the wings of birds, bats, and insects.
Traits that are similar due to convergent evolution are
called “analogous,” in contrast to “homologous”
traits, which are similar due to common evolutionary
or developmental descent. Convex Hull, Fig. 1 A 2D convex hull
Convex Optimization 501 C
The objective function is given by f0, and the func-
tions fi are inequality constraints. If there does not exist
a variable x which satisfies the constraints, the optimal
value is 1 by definition.
An optimization problem is called a convex opti-
mization problem if the objective function f0 and all
of the inequality constraints fi, i ¼ 1,. . ., m are con- C
vex, i.e., satisfy the condition fi((1a)x + ay) 
(1a)fi(x) + afi( y) for all a 2 [0,1] and x,y 2 ℝn
(Boyd and Vandenberghe 2004). In convex optimiza-
tion problems, any local minimum is guaranteed to be
a global minimum. Convex optimization problems
can be solved very efficiently by interior point
methods.
An important concept in convex optimization is
duality. The Lagrangian function associated with the
optimization problem (Eq. 1) is defined as
Convex Hull, Fig. 2 A 3D convex hull
X
m
Lðx; lÞ ¼ f0 ðxÞ þ li fi ðxÞ: (2)
i¼1
Cross-References
The dual optimization problem for (Eq. 1) is
▶ Extended Gaussian Image for Pocket-Ligand
defined as
Matching
max infn Lðx; lÞ: (3)
l2m
þ x2
Convex Optimization
For certain classes of convex optimization prob-
Frank Allg€ ower, Jan Hasenauer and Steffen Waldherr lems, for example, linear programs or ▶ semidefinite
Institute for Systems Theory and Automatic Control, programs, the dual problem is of the same class as the
University of Stuttgart, Stuttgart, Germany primal problem. Thus, solvers for these types of prob-
lems can also be applied to solve the dual problem.
The dual problem (Eq. 3) has the property that its
Definition optimal objective function value is always less than or
equal to the optimal value of the primal problem
Convex optimization is the task of solving a convex (Eq. 1). Also, evaluation of the objective function in
optimization problem. The goal of an optimization the dual problem for any feasible l directly gives
problem is to find a value for the optimization variable, a lower bound on the optimal value of the primal
lying within some given constraints, which minimizes problem and can be used to estimate how far the
an objective function. objective function value for any feasible x is away
Mathematically, an optimization problem over the from the optimal value. For convex optimization
optimization variable x in an n-dimensional vector problems, the optimal objective function values in
space over the real numbers is written as the primal and dual problem are even equal in many
cases. This property of duality can be used to certify
minimize
n
f0 ðxÞ nonexistence of solutions to ▶ feasibility problems
X2
(1) where an algorithm to solve the dual problem is
subject to fi ðxÞb bi ; i ¼ 1; :::; m: available.
C 502 Convex Programming

Cross-References
Core Promoter Elements
▶ Convex Programming
▶ Model Falsification Tetsuro Kokubo
▶ Semidefinite Programming Department of Supramolecular Biology, Graduate
School of Nanobioscience, Yokohama City
University, Yokohama, Kanagawa, Japan
References

Boyd S, Vandenberghe L (2004) Convex optimization. Synonyms


Cambridge University Press, Cambridge

Cis-Acting sequences in the core promoter

Definition
Convex Programming
The core promoter elements are cis-acting functional
Lin Wanglin sequences embedded within the core promoters.
School of Computer Science and Information Several representatives in the RNAPII system are
Engineering, Tianjin University of Science and summarized in Fig. 1 (Juven-Gershon and Kadonaga
Technology, Tianjin, China 2010; Smale and Kadonaga 2003). The first-
discovered and best-characterized is the TATA box,
which serves as a recognition site for TBP. This ele-
Synonyms ment is widely conserved from yeast to human, but is
found in only a small fraction (approximately 10–20%)
Convex optimization of all RNAPII promoters. Importantly, TBP is essential
for transcription, regardless of whether the TATA box
is present or absent in the core promoter. Therefore,
TBP must play some indispensable but as-yet
Definition unidentified role(s) in transcription, other than simple
binding to the TATA box. In some human promoters,
A following nonlinear programming is called a convex the TATA box is flanked by BREu and/or BREd, both
programming if f ðxÞ; gi ðxÞ; i ¼ 1; ; m; are convex of which are recognition sites for TFIIB. These ele-
functions in Rn : ments function positively or negatively depending on
the architecture of promoters and/or transcription fac-
Minimize f ðxÞ tors present in the system.
subject to gi ðxÞ  0; i ¼ 1; ; m; (1) The DNA region encompassing the transcription
Ax ¼ b; initiation site(s) often contains the initiator (Inr)
element (Fig. 1). Inr consensus sequences appear to
where A is an p  n matrix. be different in human (YYANWYY) and Drosophila
(TCAKTY) (underlined A indicates initiation site/+1).
In addition, even in human, there are multiple Inr
consensus sequences ranging from a very relaxed
References motif (YR) to a very strict motif (sINR;
GSCGCCATYTTG) (Dikstein 2011), depending on
Zhang XS (2000) Neural networks in optimization. Kluwer, the estimation method. Therefore, it is likely that
Dordrecht there are a variety of core promoter elements
Core Promoter Elements 503 C
initiation site

XCPE1/2 XCPE1: DSGYGGRASM (human)


XCPE2: VCYCRTTRCMY (human)
−8/9~+2

+27~+29
−40 +40 C
BREu TATA BREd Inr

−37~−32 −31~−24 −23~−17 −2~+4 +18~+22 +30~+33


SSRCGCC TATAWAAR RTDKKKK TCAKTY (fly) MTE
YYANWYY (human) CSARCSSAACGS
DPE
-RGWYVT
Bridge
SI SII SIII
DCE
+6~+11 +16~+21 +30~+34
CTTC CTGT AGC

Core Promoter Elements, Fig. 1 S(G/C), K(G/T), R(A/G), (motif ten element), DPE (downstream promoter element),
M(A/C), W(A/T), Y(T/C), V(G/A/C), D(G/A/T), BRE (TFIIB DCE (downstream core element), XCPE1/2 (X gene core pro-
recognition element), BREu (upstream BRE), BREd (down- moter element 1/2)
stream BRE), TATA (TATA box), Inr (Initiator), MTE

encompassing the initiation site(s), with only a fraction Nearly all of the core promoter elements are recog-
belonging to the original Inr. Consistent with this, nition sites for GTFs. As previously discussed, TATA
some human genes contain XCPE1 or XCPE2, which box and BREu/d are recognized by TBP and TFIIB,
were originally identified in the core promoter of hep- respectively. Importantly, TFIID (transcription factor
atitis B virus X gene. These elements are not only IID), a large protein complex comprising TBP and 14
structurally but also functionally different from Inr. TAFs (TBP-associated factors), is involved in the
In higher eukaryotes, some core promoter elements recognition of Inr, MTE, DPE, Bridge, and DCE. Spe-
are located downstream of transcription initiation cifically, Inr, MTE/DPE/Bridge and DCE may be rec-
site(s) (Fig. 1). Although MTE and DPE were origi- ognized by TAF1/TAF2, TAF6/TAF9, and TAF1,
nally identified as separate elements, it was recently respectively. However, it is still unclear which factor
proposed that they comprised two distinct functional recognizes XCPE1/2. Native promoters contain only
subregions, one of which is included in common in one or a few of these core promoter elements, probably
these two elements. The novel core promoter element because a weaker level of basal transcription is more
that contains the 50 -subregion of MTE and the advantageous for regulation by activators. In fact, an
30 -subregion of DPE is designated as “Bridge.” DCE artificial core promoter containing TATA box, Inr,
is another downstream core promoter element that also MTE, and DPE is highly active even in the absence
comprises three subregions (SI, SII, and SIII). DCE is of activators.
not related to MTE, DPE, and Bridge, because the Core promoter elements such as Inr that
distance from the transcription initiation site(s) is overlap with the initiation site(s) could function as
highly restricted for the latter three elements but not a platform for PIC assembly. However, these ele-
for the former element. Conversely, the TATA box ments may also contain as-yet uncharacterized but
occurs frequently with the former element but not crucial information that enables RNAPII to initiate
with the latter elements. transcription productively. Genetic studies indicate
C 504 Cores

that such information may be decoded by TFIIB,


TFIIF, and RNAPII, since these three transcription Corneal Pocket Assay
factors are required for accurate initiation site
selection. Marsha A. Moses
Department of Surgery/Harvard Medical School,
Vascular Biology Program/Children’s Hospital
Boston, Boston, MA, USA
Cross-References

▶ Transcription in Eukaryote Definition

This assay, originally developed for use in the rabbit but


now routinely conducted using rodent cornea, requires
References creation of a micropocket in the normally avascular
corneal immediately above the limbus into which test
Dikstein R (2011) The unexpected traits associated with core substances that represent putative angiogenesis stimula-
promoter elements. Transcription 2(5):201–206
Juven-Gershon T, Kadonaga JT (2010) Regulation of gene tors or inhibitors (proteins, cells, or tissues) can be placed
expression via the core promoter and the basal transcriptional to determine their effect on blood vessels that arise from
machinery. Dev Biol 339(2):225–229 the limbus (Gimbrone et al. 1974). Alternatively, blood
Smale ST, Kadonaga JT (2003) The RNA polymerase II core vessels stimulated by growth factors implanted in the
promoter. Annu Rev Biochem 72:449–479
pocket can be inhibited via systemic administration of
anti-angiogenic agents (Moses et al. 1999).

Cross-References
Cores
▶ Neovascularization
José Román Bilbao Castro
Supercomputing-Algorithms group, University of
Alméria, Alméria, Spain
References

Gimbrone MA Jr, Cotran RS, Leapman SB, Folkman J


Definition (1974) Tumor growth and neovascularization: an experimen-
tal model using the rabbit cornea. J Natl Cancer Inst
Core is the name currently given to a single processor 52:413–427
Moses MA, Wiederschain D, Wu I, Fernandez CA,
within a multiprocessor chip. So, if we read “This
Ghazizadeh V, Lane WS, Flynn E, Sytkowski A, Tao T,
processor is a quad core one,” it really means “this Langer R (1999) Troponin I is present in human cartilage
chip contains four cores of processors.” Cores are and inhibits angiogenesis. Proc Natl Acad Sci USA
encapsulated together to form a single chip and some 96:2645–2650
additional elements are added to improve communica-
tions among them.

Corpora for Biomolecular Event


Cross-References Extraction

▶ Multicore computing ▶ Text Corpora, Molecular Event Extraction


Correlation Coefficient 505 C
References
Corpora for Event Extraction
Rizzo D (2006) Fundamentals of anatomy & physiology. Delmar
Cengage Learning, Clifton Park
▶ Text Corpora, Molecular Event Extraction
Stocco C, Telleria C, Gibori G (2007) The molecular control of
corpus luteum formation, function, and regression. Endocr
Rev 28(1):117–149
Zhang B, Yan L, Tsang PC, Moses MA (2005) Matrix C
metalloproteinase-2 (MMP-2) expression and regulation by
Corpus Luteum tumor necrosis factor alpha (TNFalpha) in the bovine corpus
luteum. Mol Reprod Dev 70(2):122–132
Marsha A. Moses
Department of Surgery/Harvard Medical School,
Vascular Biology Program/Children’s Hospital
Boston, Boston, MA, USA Correlation

▶ Correlation Coefficient
Synonyms

Yellow body
Correlation Coefficient

Definition Fuyan Hu
Institute of Systems Biology, Shanghai University,
The corpus luteum is a temporary endocrine structure Shanghai, China
formed during the mammalian menstrual cycle. In the
early phase of the human menstrual cycle, 20–25 pri-
mary follicles containing oocytes begin to develop into Synonyms
secondary follicles, among which only one will mature
into a Graafian follicle containing a mature egg. During Coefficient of correlation; Correlation; Correlational
ovulation, the egg is released upon rupture of the statistics; Correlativity
Graafian follicle, in which the follicular cells then
undergo reprogramming to form the corpus luteum
(Rizzo 2006). During this process, the follicular cells Definition
exit cell cycle and undergo terminal differentiation into
estrogen- and progesterone-producing luteal cells. If A correlation coefficient is a statistic index which
fertilization and implantation do not occur, the corpus measures the degree to which two variables are related.
luteum soon degenerates to form the corpus albicans Generally, it varies from 1 (perfect negative correla-
(white body), and a new menstrual cycle is initiated tion) through 0 (no correlation) to +1 (perfect positive
(Stocco et al. 2007). Intense angiogenesis occurs during correlation). If a negative correlation is detected,
the formation of the corpus luteum, which is regulated whenever one variable has a high (low) value, the
by key angiogenic factors such as vascular endothelial other will have a low (high) value. If it is a positive
growth factor (VEGF) and matrix metalloproteinase-2 correlation, whenever one variable has a high (low)
(MMP-2) (Stocco et al. 2007; Zhang et al. 2005). value, so does the other.
The correlation coefficient of x1 and x2 is given by
the formula:
Cross-References
covðx1 ; x2 Þ E½ðx1  m1 Þðx2  m2 Þ
¼ (1)
▶ Neovascularization s1 s2 s1 s2
C 506 Correlation Network

where m1 is the arithmetic mean of x1 , m2 is the arith- correlation between the values on the two axes in
metic mean of x2 , covðx1 ; x2 Þ is the covariance of x1 a two-dimensional plot is quantified by the correla-
and x2 , s1 is the standard deviation of x1 , and s2 is the tion coefficient. Also, correlating values of one vari-
standard deviation of x2 . able with corresponding values at different times is
called autocorrelation.
Cross-References

▶ Correlation Relationship Cross-References

▶ Causal Relationship
References ▶ Conditional Independence

Rodgers JL, Nicewander WA. Thirteen ways to look at the


correlation coefficient. Am Stat. 1988;42(1):59–66.

Correlational Statistics

Correlation Network ▶ Correlation Coefficient

▶ Relevance Networks

Correlativity

Correlation Relationship ▶ Correlation Coefficient

Katsuhisa Horimoto
Computational Biology Research Center, National
Institute of Advanced Industrial Science and CoSBiLab
Technology, Koto-ku, Tokyo, Japan
▶ BlenX

Synonyms

Bayes rule; Causal relationship; Correlation coefficient Cost Function

▶ Objective Function
Definition

A correlation is the degree of the relationship


between two variables. A positive correlation is Covariance Graph
a relationship that the amounts of the two variables
simultaneously increase. In a negative correlation, as ▶ Relevance Networks
the amount of one variable increases, the amount of
another variable decreases. Note that there is no evi-
dence that changes in one variable cause changes in
the other variable. In statistics, correlation frequently Covariance Selection Model
means the degree to which two or more quantities are
linearly associated. In particular, the degree of ▶ Graphical Gaussian Model
Cross Tissue Disease Network 507 C
Definition
CpG Island
A cross view of the different molecular states,
Jianhua Ruan interaction of physiological functions and pathways
Department of Computer Science, University of Texas across multiple tissues of a diseased system could be
at San Antonio, San Antonio, TX, USA appropriately and comparably presented through
▶ cross tissue disease networks. CTDNs are C
constructed from specific gene/protein expression
Definition data across important tissues of diseased organisms
(Fig. 1).
In genetics, CpG islands refer to genomic regions that The application of CTDN could be properly defined
contain a high frequency of CG dinucleotides, where from the following examples:
the “p” simply indicates that C and G are on the same • It could be used to study the inter-tissue relation-
strand connected by a phosphodiester bond. CpG ships and identify genes, which are important in
islands are typically located near the transcription different tissues and can also act as information
start site of housekeeping genes or other genes that relays in the control of peripheral tissues in obese
are highly expressed. mice. The subnetworks, which are specific to the
tissue-to-tissue interactions, are enriched in the
genes having corresponding disease-relevant bio-
logical functions (Drake 2010).
CPM • CTDNs derived from gene co-expression data have
been shown to determine the pleiotropic effects of
▶ Cellular Potts Model single disease gene in human. Such genes differ in
their expression-based tissue-specific interactions
(Schadt 2009).
• The gene enrichment and genetic association ana-
Criterion Function lyses of CTDN modules facilitate identification of
important modules relating to macrophage func-
▶ Objective Function tion and metabolic traits in human obese patients.
This module has been shown to have high correla-
tion with obesity-specific clinical parameters.
Such networks can also describe causal relation-
ship of genes with disease-associated phenotypes
Cross Tissue Disease Network (Schadt 2009).
• Tissue-to-tissue networks enable the identification
Sanjeev Kumar1 and Shipra Agrawal2 of disease-specific genes that respond to the physi-
1
BioCOS Life Sciences Private Limited, Bangalore, ological changes in one tissue, which is actually
Karnataka, India triggered in some other tissues (Sieberts and Schadt
2
BioCOS Life Sciences Pvt. Limited, Institute of 2007).
Bioinformatics and Applied Biotechnology, • CTDN could also identify genes related
Bangalore, Karnataka, India to communication between tissues. These net-
works provide a first step toward understanding
of the complex diseases by laying out the hierarchy
Synonyms of interacting molecular networks. These
interacting networks in turn define various physi-
Cross tissue signature network; Tissue-to-tissue ological states in the mammalian systems (Drake
network 2010).
C 508 Crossover Design

Cross Tissue Disease


Tissue A
Network, Fig. 1 A tissue-to-
tissue network is generated
from each possible tissue pairs
by identifying the significantly
correlated gene co-expression
traits

Cross-tissue network

Tissue B Tissue C

References
Cross-Validation
Drake TA (2010) Genes and pathways contributing to
obesity: a systems biology view. Prog Mol Biol Transl Sci Joo Chuan Tong
94:9–38 Data Mining Department, Institute for Infocomm
Schadt EE (2009) Molecular networks as sensors and drivers
of common human diseases. Nature 461:218–223 Research, Singapore, Singapore
Sieberts SK, Schadt EE (2007) Moving toward a Department of Biochemistry, Yong Loo Lin School of
system genetics view of disease. Mamm Genome Medicine, National University of Singapore,
18:389–401 Singapore

Definition

Training data set will be randomly partitioned into


i-number of subsets with equal size. Then one subset
Crossover Design
will be evaluated by a model trained by the data from
the I-1 subsets, this process is repeated for i times.
▶ Repeated Measures
A good model trained on data points that are represen-
tative for the population should consistently produce
high accuracy across the i-number of validation. This
process is called cross-validation.

Cross Tissue Signature Network Cross-References

▶ Cross Tissue Disease Network ▶ TAP Translocator, In Silico Prediction


Cyclins and Cyclin-dependent Kinases 509 C
CRSP Cyclin Detection

▶ Mediator ▶ Quantifying Lymphocyte Division, Methods

C
Cyclin-dependent Kinase 1 (CDK1)/
Cumulative Causation Anaphase-Promoting Complex (APC)
Oscillator
▶ Positive Feedback
▶ Cell Cycle of Early Frog Embryos

Curation
Cycling Egg Extract
Sabina Leonelli
ESRC Centre for Genomics in Society, University of ▶ Xenopus laevis Egg Extract
Exeter, Exeter, Devon, UK

Definition
Cyclins and Cyclin-dependent Kinases
Ensemble of activities involved in developing and
maintaining a database (Howe and Rhee 2008; Stein Marcos Malumbres
2008; Leonelli 2010). Spanish National Cancer Research Centre (CNIO),
Madrid, Spain

Cross-References
Synonyms
▶ Bio-Ontologies
▶ CellML Model Curation Cell cycle-regulated kinase complexes

Definition
References
Cyclins are proteins initially identified by their differ-
Howe D, Rhee SY (2008) The future of biocuration. Nature ential protein levels during different phases of the
455:47–50
▶ cell cycle. This characteristic depends on specific
Leonelli S (2010) Packaging data for re-use: databases in
model organism biology. In: Howlett P, Morgan MS (eds) amino acid sequences that are recognized by the pro-
How well do facts travel? The dissemination of tein degradation machinery. Cyclins are known to
reliable knowledge. Cambridge University Press, function as the regulatory subunit of heterodimeric
Cambridge, MA
protein kinase complexes that also contain a catalytic
Stein LD (2008) Towards a cyberinfrastructure for the biological
sciences: progress, visions and challenges. Nat Rev Genet subunit with serine/threonine kinase activity, known as
9(9):678–688 cyclin-dependent kinase (Cdk).
C 510 Cyclins and Cyclin-dependent Kinases

Characteristics to DNA Damage (▶ Cell Cycle Arrest After DNA


Damage) whereas M-cyclins are divergent cyclins
Cyclins were first discovered as oscillating proteins involved in ion transport. Cables1,2 are cyclin-like
that accumulate and suddenly disappear during cell proteins that are both partners and substrates of Cdks
division in sea urchin eggs (Jackson 2008). These pro- and may be involved in apoptosis. Several other
teins are usually not expressed in quiescent cells but cyclins including Cyclin F, G1,2, I, I2, JL, M1-4, and
they are synthesized in dividing cells to regulate entry O do not have cognate Cdk partners. Despite their
into the cell cycle and progression throughout the name, it is also unclear whether the protein levels of
different phases of the cell cycle. However, what many of these cyclins are tightly regulated by protein
makes these proteins special is the tight control of synthesis and degradation.
their levels by protein degradation. Cyclins contain
specific amino acid sequences, such as the destruction Mammalian Cdks
(D)-box, that are recognized by the ubiquitin-mediated The Cdk denomination was originally applied to
protein degradation machinery. Specific ubiquitin kinases whose function was clearly demonstrated to
ligases recognize these residues and tag these proteins be dependent on a cyclin. This denomination is now
through the addition of ubiquitin residues. These applied to 20 different kinases (CDK1-20; Fig. 1)
ubiquitinated cyclins are then quickly degraded by although their control by cyclins is not well charac-
the proteasome. terized in some cases (Malumbres et al. 2009). All
Cyclins contain a common amino acid stretch, Cdks display high similarity in their kinase domain
known as the cyclin box, responsible for binding and and contain a specific stretch of amino acids, usually
activation of their partner kinases, the Cdks. Mono- known as the PSTAIRE domain (or a1 helix),
meric Cdks are inactive as kinases and their activation involved in the binding to cyclins (Morgan 2007).
requires the interaction with cyclins. The binding Similarly to cyclins, this family of proteins includes
between cyclins and Cdks results in crucial changes members with a clear implication in the cell division
in the structural conformation of the kinase domain of cycle. Thus, CDK1, 2, 3, 4, and 6 play distinct func-
Cdks (Morgan 2007). In addition, cyclins are thought tions in the progression throughout the cell cycle
to provide additional recognition sites for specific sub- upon activation by specific members of the A-, B-,
strates. Thus, cyclins determine the substrate specific- D-, and E-cyclins. On the other hand, other Cdks,
ity and timing of activation of Cdks. In addition, their such as CDK7-9 and CDK11 display critical roles in
interaction with Cdks modulates other crucial aspects ▶ transcription and RNA processing, usually in com-
of Cdk function such as its control by upstream regu- bination with C-, H-, K-, L-, and T-cyclins
latory enzymes or by subcellular localization (Malumbres and Barbacid 2005).
(Malumbres and Barbacid 2005).
Regulation and Function of Cdks in the Control of
Mammalian Cyclins the Cell Cycle
Cyclins are defined by the presence of one or two Only a few cyclin-Cdk complexes are known to be
cyclin boxes. About 35 cyclins exist in the human directly involved in the regulation of the cell cycle.
genome (Fig. 1). Many of these proteins display little CDK1, the mammalian ortholog of the yeast CDC2/
sequence similarity outside the cyclin box and their CDC28 kinase (▶ Cell Cycle, Budding Yeast; ▶ Cell
functional homology is therefore unclear. The Cycle, Fission Yeast), is a major regulator of G2 and
N-terminal region usually contains regulatory and ▶ mitosis (▶ Mitotic Kinases) in combination with
targeting domains, including the D-box, that are spe- A- and B-type cyclins. CDK2 (and perhaps CDK3)
cific for each cyclin class. are activated by E- and A-type cyclins and may par-
Major cell cycle cyclins are grouped into a cluster ticipate in cell cycle control during ▶ DNA replica-
that contains A-, B-, D-, and E-type cyclins. Cyclins C, tion and G2 (▶ Cell Cycle Transitions, G2/M) and in
H, K, L, and T are mostly involved in the control of DNA repair. CDK4 and CDK6, on the other hand, are
transcription and RNA processing. G-type cyclins specifically activated by D-type cyclins and control
have been proposed to play a role in the response the entry into the cell cycle and G1 progression in
Cyclins and Cyclin-dependent Kinases 511 C
Cyclin B3
Cyclin D1
Cyclin D2
Cyclin D3
Cyclin E1
Cyclin E2
Cyclin A1
Cyclin A2 CDK1
Cyclin B1 CDK2 C
Cyclin B2 CDK3
Cyclin G1 CDK5
Cyclin G2 CDK14
Cyclin C CDK15
Cyclin H CDK16
Cyclin I2 CDK17
Cyclin O CDK18
Cyclin I CDK4
Cyclin J CDK6
Cyclin JL CDK7
Cyclin L1 CDK20
Cyclin L2 CDK11A
Cyclin Y CDK11B
Cyclin YL1 CDK10
Cyclin YL2 CDK12
Cyclin YL3 CDK13
Cables1 CDK9
Cables2 CDK8
Cyclin T1 CDK19
Cyclin T2
Cyclin K
Cyclin F
Cyclin M1
Cyclin M3
Cyclin M2
Cyclin M4

Cyclins and Cyclin-dependent Kinases, Fig. 1 Representa- indicate interactions whose biological significance is not clear.
tive interactions between mammalian cyclins and Cdks. CDK5 Evolution distances in cyclins and Cdks are represented at
may interact with some cyclins (e.g., D-type cyclins) although different scale
this binding does not result in kinase activation. Dotted lines

response to mitogenic signaling (Malumbres and both activating and inactivating phosphorylations. In
Barbacid 2005). It is not clear whether most of the mammals, CDK7 is the catalytic subunit of the com-
other Cdks play direct roles in the regulation of the plex known as Cdk-activating protein (CAK). CAK
cell cycle. CDK5 for instance is activated by cyclin- phosphorylates several Cdks at a critical threonine
like proteins that are almost uniquely expressed in residue at the T-loop of the kinase domain, and this
brain cells. As indicated above, CDK7-9 and phosphorylation is required to unblock the active-site
CDK11 are involved in transcription and RNA cleft. On the other hand, two additional kinases,
processing and the information available for most WEE1 and MYT1, are able to phosphorylate two
other Cdks is too scarce to assign a clear role in the residues located in the roof of the kinase ATP-binding
control of the cell cycle. site and these phosphorylations inhibit Cdk kinase
The activation of cyclin-Cdk complexes is tightly activity. Dephosphorylation of these sites is mediated
controlled at different levels including phosphoryla- by CDC25(A-C) phosphatases. Both WEE1 and
tion and dephosphorylation events, as well as folding CDC25 enzymes are critical regulators of Cdk acti-
and subcellular localization. Cdks are regulated by vation in response to multiple signals such as those
C 512 Cyclins and Cyclin-dependent Kinases

involved in the DNA damage response (Morgan monitored by several ▶ cell cycle checkpoints
2007; Jackson and Bartek 2009) (▶ Cell Cycle Arrest which sense possible abnormalities and generally
After DNA Damage). Cdks are also regulated by result in the inhibition of Cdk activity and cell cycle
direct binding to several ▶ Cdk inhibitors of the arrest.
Ink4 (p16Ink4a, p15Ink4b, p18Ink4c and p19Ink4d) and In general, Cdks are considered as the major
Cip/Kip (p21Cip1, p27Kip1 and p57Kip2) families. engines that drive entry and progression throughout
Ink4 proteins bind CDK4 and CDK6 thus preventing the different phases of the eukaryotic cell cycle. Not
the interaction of the monomeric Cdk with their acti- surprisingly, their activity is deregulated in human
vating cyclins. Cip/Kip inhibitors, on the other hand, cancer (▶ Cell Cycle, Cancer Cell Cycle and
generally bind and inhibit cyclin-Cdk complexes gen- Oncogene Addiction) and inhibiting Cdk activity is
erating inactive trimeric complexes (Sherr and Rob- now considered as an attractive therapeutic approach
erts 1999) against tumor cell proliferation (Malumbres and
The typical phosphorylation sequence recognized Barbacid 2009).
by Cdks is [S/T]*PX[K/R], where [S/T]* indicates
the phosphorylated serine or threonine residue, P is
proline, X represents any amino acid, and [K/R] rep- Cross-References
resents a basic amino acid lysine or arginine two
positions downstream of the phosphorylated residue. ▶ Cdk Inhibitors
Cdks exert their effects in the cell cycle by phosphor- ▶ Cell Cycle
ylating a large number of proteins in the cell, thus ▶ Cell Cycle Arrest After DNA Damage
regulating many aspects of cellular architecture and ▶ Cell Cycle Checkpoints
metabolism. In response to mitogenic signals, cyclin ▶ Cell Cycle Transition, Principles of Restriction
D-CDK4/6 complexes phosphorylate and inhibit the Point
retinoblastoma protein (pRB), a transcriptional ▶ Cell Cycle Transitions, G2/M
▶ repressor that defines the ▶ restriction point in ▶ Cell Cycle Transitions, Mitotic Exit
mammals by preventing the synthesis of cell cycle ▶ Cell Cycle, Budding Yeast
proteins. pRb exerts this function by inhibiting E2F ▶ Cell Cycle, Fission Yeast
transcription factors and recruiting chromatin ▶ DNA Replication
remodeling complexes thus repressing transcription ▶ Mitosis
at specific sites. pRB may be also phosphorylated by ▶ Mitotic Kinases
many other Cdks such as CDK1, CDK2, CDK3, ▶ Repressor
CDK9. Upon pRB inactivation, many proteins ▶ Transcription
required for the synthesis of DNA and mitosis are
synthesized, and cells become committed to ▶ DNA
replication and progression throughout the following References
phases of the cell cycle. CDK2 and CDK1, in com-
plex with E-, A-, and B-type cyclins, are responsible Jackson PK (2008) The hunt for cyclin. Cell 134:199–202
for the phosphorylation of a significant number of Jackson SP, Bartek J (2009) The DNA-damage response in
human biology and disease. Nature 461:1071–1078
proteins. CDK1 (▶ Mitotic Kinases), the major Cdk Malumbres M, Barbacid M (2005) Mammalian cyclin-
activity during G2 and M, phosphorylates more than dependent kinases. Trends Biochem Sci 30:630–641
70 proteins involved in DNA condensation, formation Malumbres M, Barbacid M (2009) Cell cycle, Cdks and cancer:
of the spindle, Golgi remodeling, nuclear envelop a changing paradigm. Nat Rev Cancer 9:153–166
Malumbres M, Harlow E, Hunt T, Hunter T, Lahti JM,
breakdown, etc. (Malumbres and Barbacid 2005).
Manning G, Morgan DO, Tsai LH, Wolgemuth DJ (2009)
Mitotic exit (▶ Cell Cycle Transitions, Mitotic Exit) Cyclin-dependent kinases: a family portrait. Nat Cell Biol
requires the proteasome-dependent degradation of A- 11:1275–1276
and B-type cyclins, thus switching off Cdk activity Morgan DO (2007) The cell cycle: principles of control. New
Science Press, London
and allowing DNA decondensation, spindle disas-
Sherr CJ, Roberts JM (1999) CDK inhibitors: positive and neg-
sembly, reformation of the nuclear envelop, ative regulators of G1-phase progression. Genes Dev
etc. Normal progression throughout the cell cycle is 13:1501–1512
Cytokinesis 513 C
Cyclosome Cytokinesis

▶ Anaphase-Promoting Complex (APC/C) Sebastian Mana-Capelli and Dannel McCollum


▶ Anaphase-Promoting Complex Inhibitors Department of Molecular Genetics and
Microbiology, University of Massachusetts,
Worcester, MA, USA C
Cytogenetics
Definition
Jingky Lozano-K€uhne
Department of Public Health, University of Oxford, Cytokinesis is the process by which one cell
Oxford, UK divides into two daughter cells with equivalent
genetic and cytoplasmic content. Cytokinesis initiates
after the chromosomes have segregated and culmi-
Definition nates by dividing the cell perpendicular to the
mitotic spindle after the completion of telophase,
Cytogenetics is a branch of genetics that deals with thus bringing closure to the ▶ cell cycle. The process
the study of the structure and function of the chromo- of cell division requires extensive rearrangements of
somes (Prabhu Britto and Ravindran 2007). the cytoskeleton and vesicle trafficking apparatus.
Much of the spatial and temporal regulation of cyto-
kinesis is controlled by ▶ small GTPases and regula-
References tory kinases that coordinate this process with the
nuclear cycle.
Prabhu Britto A, Ravindran DG (2007) A review of cytogenetics
and its automation. J Med Sci 7:1–18

Characteristics

Cytokinesis can be universally divided into three major


Cytokines events: determination of the cleavage plane, assembly
(and constriction) of the cytokinesis apparatus, and cell
Rohan Dhiman
abscission. However, different organisms have devel-
Center for Pulmonary and Infectious Disease Control,
oped remarkably variable approaches to cytokinesis.
University of Texas Health Science Center at Tyler, Specification of the division plane is the least con-
Tyler, TX, USA
served of the three major events. Once the site is
determined, metazoans and fungi divide through the
use of a contractile ▶ actomyosin ring. In contrast,
Definition plants assemble membranes and cell wall in
a structure called the phragmoplast, which assembles
These are low molecular weight molecules secreted by in the cell middle and then expands outward toward the
various immune cells which modulate the immune
cell cortex. The final joining of cell membranes to
system by acting as intercellular mediators (Gilman
bring about completion of cytokinesis is called abscis-
et al. 2001) sion (Guertin et al. 2002). In this chapter, we will
survey the mechanisms by which different cell types
undergo cytokinesis, with special emphasis on the
References model systems that have been extensively studied.
Both commonalities and differences (e.g., presence of
Gilman A, Goodman LS, Hardman JG, Limbird LE
(2001) Goodman & Gikman’s the pharmacological basis of a cell wall, or the use of a contractile ring) are
therapeutics. Mc Graw-Hill, New York discussed below.
C 514 Cytokinesis

Determination of the Cleavage Plane Fission Yeast


The division plane in Schizosaccharomyces pombe is
Animal Cells and Some Lower Eukaryotes determined by the position of the nucleus in late G2/
In animal cells and some lower eukaryotes (but not early mitosis (Oliferenko et al. 2009). Fission yeast
yeast), the selection of the division plane is deter- cells are cylindrical in shape, and the nucleus is
mined by the position and orientation of the mitotic roughly positioned at the center of the cell by inter-
spindle at anaphase. Although the cell division plane phase microtubules, leading to the generation of two
typically bisects the mitotic spindle, it has long been daughter cells of approximately the same size. Mid1,
argued whether the signal to position the furrow orig- a paralog to metazoan anillins, plays a central role in
inates from the central spindle or overlapping astral the determination of the cleavage plane. Upon entry
microtubules from opposite poles. More recent evi- into mitosis, the bulk of Mid1 translocates out of the
dence suggests that both types of microtubules can nucleus in a Polo-kinase-dependent manner and local-
induce furrow formation, with the emphasis izes to protein assemblies called nodes at the equator of
depending on the cell type. For example in large the cell. Mid1 then recruits other components of the
cells such as early embryos, the central spindle is actomyosin ring to the medial plane of the cell, stim-
a long distance from the cortex, and in these cells, ulating equatorial interphase nodes to mature into
the astral microtubules seem to have a more important cytokinesis nodes that will eventually form the acto-
role in positioning the cell division site. There has myosin ring (Pollard and Wu 2010).
also been a debate over the role of astral microtu-
bules, which could act either by promoting constric- Plant Cells
tion in the cell middle, or inhibiting cortical tension at Plant cells do not divide through actomyosin ring con-
the cell poles (Eggert et al. 2006). At the molecular striction but by the continuous deposition of membrane
level, there is evidence that spindle microtubules and cell-wall components at a medial plane between
control the equatorial cortex contractility by regulat- the recently divided nuclei. Shortly after mitotic entry,
ing the activity of the small GTPase RhoA, the master microtubules, microtubule-associated proteins, and
driver of actomyosin ring formation and constriction. actin form a cortical structure, called ▶ pre-prophase
The RhoA GAP MGCRacGap and the GEF ECT2 band, at the future division site. The pre-prophase band
both localize to the spindle midzone in anaphase, marks the cortex by localization of landmark proteins
where they form a complex that is regulated by sev- before it is disassembled in pro-metaphase. Although
eral mitotic kinases, including Cdk1, Aurora B, and a number of proteins have been implicated in pre-
Plk1. Unfortunately, it is not clear yet how the signal prophase band formation and function, the precise
is transduced from the spindle to the cortex (Piekny mechanisms governing their action are still unclear
et al. 2005). (M€uller et al. 2009).

Budding Yeast Assembly and Action of the Cytokinesis Apparatus


Unlike animal cells, the plane of division in Saccharo-
myces cerevisiae is determined before ▶ mitosis. Since Animal Cells and Some Lower Eukaryotes
this type of yeast divides through budding, the plane of In mammalian cells, actomyosin ring assembly and
cytokinesis is the narrow connection between the constriction is regulated by RhoA (see above). Active
mother and daughter cell, known as the bud neck. RhoA localizes to the cortical side of the division
The site selection for bud growth is determined in plane, where it interacts with formin, promoting
G1/S by the localization of landmark proteins. Inter- assembly of actin filaments. RhoA also interacts with
estingly, as in mammalian cells, the landmark proteins the multidomain protein anillin, which binds actin,
recruit a small GTPase (in this case RSR1), which myosin, and possibly anchors the actomyosin ring to
starts a GTPase cascade resulting in growth of the the cell membrane. Later in anaphase, RhoA activates
bud and assembly of the septin ring, which acts as Rock kinase, which phosphorylates the regulatory
a scaffold for actomyosin ring formation (Oliferenko chains of myosin, inducing bipolar myosin filament
et al. 2009). assembly and motor activity. A large number of other
Cytokinesis 515 C
components contribute to actomyosin ring assembly Fission Yeast
and dynamics, including septins and various actin- As previously described, actomyosin ring assembly in
binding proteins, such as the Arp2/3 complex and S. pombe is driven by protein complexes called cyto-
cofilin. Actomyosin ring constriction was initially kinesis nodes. Actin filaments form from nodes and are
explained through the “purse string” model whereby cross-linked by myosin, whose contractile activity is
bipolar myosin filaments walk along actin filaments to thought to bring about condensation of the nodes into
bring about constriction of the ring in a manner similar a discrete actomyosin ring. Interestingly, the Mid1- C
to muscle constriction. Although there is considerable dependent pathway only appears to operate prior to
evidence for this model, other models are also anaphase. At anaphase onset, a signaling pathway
supported by experimental data, including global con- called the SIN takes over from Mid1 to promote acto-
traction of the cortex and weakening of resistive forces myosin ring maturation and stability. SIN mutants
in the cleavage furrow (Eggert et al. 2006). Alterna- form rings in early mitosis through the Mid1-
tively, cell division could come about by global con- dependent pathway, but the rings fall apart in ana-
traction of the cortex and astral microtubule stimulated phase. In contrast, cells lacking Mid1 fail to assemble
relaxation at the cell poles distal from the cleavage rings in early mitosis, but assemble misplaced rings in
furrow. The amoeba Dictyostelium discoideum usually late mitosis. In addition, activation of the SIN pathway
undergoes cytokinesis in a similar fashion to metazoan in interphase can trigger actomyosin ring assembly and
cells. However, cells defective in myosin II still form constriction independent of Mid1. Together, these
equatorial furrows through a passive mechanism that results suggest that fission yeast uses two pathways to
requires adhesion to a substrate and involves cytoskel- promote ring assembly; the Mid1 pathway, which posi-
eton actin dynamics and cortical tension. This could be tions the ring in early mitosis, and the SIN-dependent
explained by models showing that contraction can be pathway, which promotes ring assembly in anaphase
generated in absence of motor proteins through (Pollard and Wu 2010).
a ratchet diffusion based mechanism involving fila-
ment cross-linking and bundling proteins (Sun et al. Plants Cells
2010). During anaphase, most plant cells use the remnants of
the mitotic spindle to interdigitate microtubules and
Budding Yeast actin at the plane previously marked by the pre-
Cytokinesis in budding yeast occurs at the bud neck, prophase band, forming the phragmoplast. Molecular
a small connection of about 1 um in diameter between motors then use microtubules to drive Golgi-derived
the mother cell and the bud. Septins arrive at the bud vesicles containing cell-wall components and mem-
neck by late G1 and sequential localization of evolu- brane to the phragmoplast, where the new cell wall is
tionary conserved components of the actomyosin ring synthesized. The phragmoplast then expands toward
result in a functional contractile ring by late anaphase. the region of the cortex marked by the pre-prophase
Since the metaphase spindle resides in the mother cell, band (M€uller et al. 2009).
cytokinesis is halted until one ▶ spindle pole body
enters the bud at the end of anaphase. Multiple mech- Cell Abscission
anisms appear to block the mitotic exit network (MEN) Abscission is the process by which the membrane
from triggering initiation of actomyosin ring constric- connections between daughter cells are severed.
tion until the daughter spindle pole body has passed Although it is clear that vesicle fusion is important
through the bud neck and into the daughter cell for this step, cell abscission as a whole is still poorly
(Balasubramanian et al. 2004). Ingression of the acto- understood, and the various model systems will be
myosin ring is closely followed by deposition of dealt with together.
a septum. It is thought that because of the high turgor In animal cells, furrow ingression leads to the for-
pressure in yeast cells, the actomyosin ring cannot mation of an intercellular bridge connecting both
cause constriction on its own, but instead acts to daughter cells. Inside the bridge, stabilized
guide the primary septum, which provides most of overlapping microtubules remaining from the central
the force required for constriction. spindle recruit g-tubulin and other proteins, organizing
C 516 Cytometry, Cytofluorimetry, Microfluorimetry, Analytical Cytology

into an electron dense structure called the ▶ midbody Cross-References


(Steigemann and Gerlich 2009). Actomyosin ring con-
striction proceeds perpendicularly to the midbody until ▶ Actomyosin Ring
both structures interact. At this point, two events must ▶ Cell Cycle
happen at the cellular bridge before the two daughter ▶ Midbody
cells complete cytokinesis: microtubule disassembly/ ▶ Mitosis
severing and membrane fusion. Microtubules at one ▶ Pre-Prophase Band
side of the midbody are usually severed through ▶ Small GTPases
a mechanism that is not fully understood but involves ▶ Spindle Pole Body
the microtubule severing protein spastin. Spastin is
recruited to the midbody by the endosomal sorting
complex, or ESCRT, which is essential for cell abscis- References
sion. The thin connection between daughter cells is
then sealed by vesicle fusion. Recycling endosomes Balasubramanian MK, Bi E, Glotzer M (2004) Comparative
analysis of cytokinesis in budding yeast, fission yeast and
and exocytic vesicles accumulate at the cellular bridge
animal cells. Curr Biol 14:806–818
through interactions with t-SNARE proteins and tar- Eggert US, Mitchinson TJ, Field CM (2006) Animal cytokinesis:
get-site components of the exocyst complex, recruited from parts list to mechanisms. Annu Rev Biochem 75:
at specialized membrane domains at both sides of the 543–566
Guertin DA, Trautmann S, McCollum D (2002) Cytokinesis in
midbody. However, the exact nature of the vesicle
eukaryotes. Microbiol Mol Biol Rev 66:155–178
fusion events at the midbody is not known. In addition, M€uller S, Wright AJ, Smith LG (2009) Division plane control in
there is evidence that ESCRT proteins can promote plants: new players in the band. Trends Cell Biol 4:180–188
cell abscission by self-assembly into spiral-shaped Oliferenko S, Chew TG, Balasubramanian MK (2009) Position-
ing cytokinesis. Genes Dev 6:660–674
fibers that induce membrane curvature (Steigemann
Piekny A, Werner M, Glotzer M (2005) Cytokinesis: welcome to
and Gerlich 2009). the Rho zone. Trends Cell Biol 12:651–658
Although fission and budding yeast both require Pollard TD, Wu JQ (2010) Understanding cytokinesis: lessons
a mechanism for final partitioning of the cellular mem- from fission yeast. Nat Rev Mol Cell Biol 11:149–155
Steigemann P, Gerlich DW (2009) Cytokinetic abscission: cel-
branes upon contractile ring constriction, evidence for
lular dynamics at the midbody. Trends Cell Biol 11:606–616
a discrete abscission step has only been observed in Sun SX, Walcott S, Wolgemuth CW (2010) Cytoskeletal cross-
budding yeast. However, the mechanisms governing linking and bundling in motor-independent contraction. Curr
abscission are unclear. Once abscission partitions the Biol 15:649–654
cytoplasm of the two daughter cells, cell separation is
achieved through degradation of the division septum.
This process is tightly regulated to avoid cell lysis and
involves glucanases (fission yeast) and chitinases (bud- Cytometry, Cytofluorimetry,
ding yeast) (Balasubramanian et al. 2004). Microfluorimetry, Analytical Cytology
In higher plants, vesicle fusion at the division
plane results in an outward growth of the cell plate ▶ Cell Cycle Analysis, Flow Cytometry
toward the cell cortex. The plane of cell plate growth
is oriented by the expanding phragmoplast, which is
guided toward the spatial cues left by the pre-
prophase band. How membrane fusion is coordinated Cytostatic Factor (CSF) Egg Extract
to occur all around the cell cortex at once remains
a mystery. ▶ Xenopus laevis Egg Extract
D

D Gene initiation and perpetuation of the inflammatory


response (Janeway 1989). Many DAMPs are nuclear
▶ Diversity (D) Gene or cytosolic proteins with defined intracellular function
that, when released outside the cell following tissue
injury, move from a reducing to an oxidizing milieu
resulting in their functional denaturation (Rubartelli
DAH and Lotze 2007). Also, following necrosis (a kind
of cell death), tumor DNA is released into the extra-
▶ Differential Adhesion Hypothesis nuclear space/extracellular microenvironment and
functions as a DAMP (Farkas et al. 2007).
Protein DAMPs include intracellular proteins, such
as heat-shock proteins or HMGB1 (high-mobility
Damage-associated Molecular Patterns group box 1), and proteins derived from the extracel-
lular matrix that are generated following tissue injury,
Harpreet Singh such as hyaluronan fragments. Examples of nonprotein
Computational Biology Service Unit, Cornell DAMPs include ATP, uric acid, heparin sulfate, and
University, Ithaca, NY, USA DNA.
Biomedical Informatics Centre, Indian Council of
Medical Research, New Delhi, India
Cross-References

Synonyms ▶ Microbe-associated Molecular Pattern


▶ Pattern Recognition Receptors
Alarmins; DAMP

References
Definition Farkas AM, Kilgore TM, Lotze MT (2007) Detecting DNA:
getting and begetting cancer. Curr Opin Investig Drugs
Damage associated molecular patterns (DAMPs) are 8(12):981–986
molecules released by stressed cells. Interactions Janeway C (1989) Immunogenicity signals 1,2,3 . . . and 0.
Immunol Today 10(9):283–286
between PRRs and DAMP initiate and perpetuate
Rubartelli A, Lotze MT (2007) Inside, outside, upside down:
immune response similar to pathogen-associated damage-associated molecular-pattern molecules (DAMPs)
molecular pattern molecules (PAMPs) that drive and redox. Trends Immunol 28(10):429–436

W. Dubitzky et al. (eds.), Encyclopedia of Systems Biology, DOI 10.1007/978-1-4419-9863-7,


# Springer Science+Business Media LLC 2013
D 518 DAMP

informaticians. The SEEK is also actively


DAMP codeveloped by one of its first adopters, the Virtual
Liver project, and is available as a free-standing plat-
▶ Damage-associated Molecular Patterns form (http://www.sysmo-db.org/) for academic
research groups.
The SEEK encourages the use of community stan-
dards by providing tools to assist with data manage-
Data and Model Management Platform, ment and annotation, as data exchange and reuse rely
SEEK on sufficient annotation, consistent metadata descrip-
tions, and the use of standard exchange formats for
Franco du Preez models, data, and the experiments that they are
SysMO-DB team, Manchester Centre for Integrative derived from. The SEEK makes use of its own set of
Systems Biology, University of Manchester, minimum information models for each data type,
Manchester, UK known as JERMs (Just Enough Results Model). The
JERMs are derived from the minimum information
models established by MIBBI and are available as
Definition spreadsheet templates. As entries in such templates
often include items from controlled vocabularies,
The ▶ SysMO-DB project in the ▶ SysMO consortium such as ontologies, the ▶ SysMO-DB team also
is developing a general platform to enable the sharing developed an open source tool called RightField,
and exchange of research data/models/processes in which makes dynamic browsing of ontologies and
systems biology consortia. This platform is known as embedding of their elements possible within JERM
the SEEK (1) and emphasis is placed on conserving and templates.
increasing the reusability of research outputs. The SEEK The SEEK’s recommended model format is SBML
provides an index of consortium resources and acts as (▶ Systems Biology Markup Language (SBML)),
a gateway to other tools and services. Examples include which is used by the JWS Online model repository
integration with the ▶ JWS Online model repository to and simulator, that has been a part of the SEEK since
enable model simulations, and the PubMed plugin that its inception. The aim of this integration is to provide
allows publications to be linked to supporting data and researchers and modelers with easy access to modeling
author profiles. This transforms the SEEK from a static standards such as ▶ Systems Biology Markup Lan-
repository to an active, dynamic resource. guage (SBML), ▶ SBGN, and ▶ MIRIAM compliant
model annotation.

Characteristics
Cross-References
The SEEK is accessible via a web-based client and
built on an open source philosophy. It has been adopted ▶ JWS Online
by projects outside the ▶ SysMO consortium, includ- ▶ MIRIAM
ing The Virtual Liver project (http://seek.virtuelle- ▶ SBGN
leber.de/), EraSysBio + (http://www.erasysbio.net), ▶ SysMO
and UniCellSys (http://www.unicellsys.eu/) by the ▶ Systems Biology Markup Language (SBML)
time of writing. Development of the SEEK follows
a rapid, incremental cycle that is strongly user oriented
by virtue of frequent interactions, in site visits and References
workshops with a focus group of users, the
Wolstencroft K, Owen S, du Preez FB, Krebs O, Mueller W,
▶ SysMO-PALS (Project Area Liasons), who are rep-
Goble C, Snoep JL (2011) The SEEK: a platform for sharing
resentatives from each ▶ SysMO project and are data and models in systems biology. Methods Enzymol
a mixture of experimentalists, modelers, and 500:629–655, Elsevier, Amsterdam
Data Integration and Visualization 519 D
biology–related activities, from functional characteri-
Data Collection zation of genomic and proteomic data to the develop-
ment of mathematical models of biological processes,
▶ Data Integration requires an integrated view of all relevant data useful
to accomplish those tasks.

Data Collection (Integration) from Cross-References


Distributed Sources D
▶ Cell Cycle Database
▶ Distributed Data Access

References
Data Deluge Stein LD (2003) Integrating biological databases. Nat Rev Genet
4(5):337–345
▶ Data-intensive Research

Data Integration Data Integration and Visualization

Roberta Alfieri and Luciano Milanesi Steve R. Pettifer and Teresa K. Attwood
Institute for Biomedical Technologies – CNR Faculty of Life Sciences and School of
(Consiglio Nazionale delle Ricerche), Segrate, Milan, Computer Science, University of Manchester,
Italy Manchester, UK

Synonyms Synonyms

Data collection; Data warehouse Interoperability; Semantic integration

Definition Definition

In systems biology, data integration is an important Data integration is the process of bringing together
approach to better understand the main features of information from multiple, diverse sources such that
a biological process, because it represents a way to it can be interrogated as a whole to provide holistic
combine interesting information related to the reaction knowledge that is greater than the sum of its parts. In
involved in specific network. One possible method to particular, data integration aims to seamlessly expose
develop an integration system is the data warehousing information inherent in the relationships between con-
approach (Stein 2003), which allows the integration of cepts. There are numerous technological challenges
information stored in different biological databases. relating to the scalability of data-integration systems,
The necessity for data integration is widely approved as well as complex issues concerning both the nature of
in the bioinformatics and systems biology community the data itself and the means by which the data may be
since bioinformatics data are currently spread across understood by humans. Visualization is the process of
different databases and they are stored in a wide vari- making data human intelligible, enabling human intu-
ety of formats. Moreover, the achievement of ition and expert knowledge to be applied in areas
interesting results in most bioinformatics and systems where algorithmic interrogation is unrealistic.
D 520 Data Integration and Visualization

Systems biology attempts to take a “universal” Many data-integration issues arise from the need to
view of complex biological phenomena, treating reconcile disparate source database schemas; the
them as an integrated whole rather than as individual, more diverse the schemas, the harder the problem.
independently functioning components: it is as much Traditional (relational) databases explicitly encode
about understanding the interactions between biolog- schemas into the database architecture; this requires
ical entities as it is about the entities themselves. This the schemas to be “decomposed” when loaded into
approach necessitates studies at a variety of levels, a warehouse, or at run-time in a federated system.
from individual small molecules and macromolecular Schemas can be very simple, with all information
complexes to their interactions in a variety of interre- encoded in a single monolithic table of records – this
lated contexts: e.g., the specific biochemical path- makes some queries trivial, but is inflexible, making
ways in which they participate, the molecular re-purposing the data difficult. Alternatively, they may
networks those pathways may form, and, ultimately, encode records with finer levels of granularity – this
the complete organisms to whose evolution and func- provides greater flexibility, but imposes on users the
tioning those networks and pathways contribute. need to formulate significantly more complex queries.
Supporting systems perspectives with information Contemporary approaches attempt to obviate the need
technology is challenging in terms of both data inte- for schemas, relying on the use of ontologies to give
gration and visualization: managing the individual meaning and structure to database contents. Here,
components in isolation is hard enough; integrating “triple stores” and “schemaless” databases provide
this information to provide “system-wide” views is a generic underlying technology, and shared ontol-
even more daunting. ogies for reconstituting those data are managed by
the community.
Various community-driven initiatives are evolving
Characteristics data-integration standards (e.g., BioSharing,
BioDBcore, BioPax) in close collaboration with
Issues with Data-Integration Technologies publishers, journals, database curators, software devel-
Traditionally, data integration has involved either opers, and so on (Field et al. 2009; Gaudet et al. 2011;
data warehousing or database federation. In data Demir et al. 2010). These are intended to help users
warehousing, data are extracted from various sources locate and access information dispersed within data-
(databases, “flat files,” and other information- bases; to help shape the data-preservation/manage-
management systems), transformed into a common ment/sharing policies implemented by journal editors
schema (and also possibly audited for quality), and and funders; and to encourage software developers to
finally loaded into what is logically a single (usually embrace and extend community-endorsed standards,
relational) data-store for querying by end-users. This like the systems biology markup language (SBML,
process is often referred to as “ETL” (extract, trans- Hucka et al. 2004) and CellML (Garny et al. 2008).
form, load). Federating databases, however, retains Alongside ontologies and schemas, numerous “mini-
the original data sources and schemas at their original mum information” standards have evolved as a means
locations, instead providing a distributed query mech- of establishing a baseline for the information deposited
anism that interrogates the sources in unison, as if in data repositories (Taylor et al. 2008). These check-
they were logically a single resource. In essence, lists and guidelines aim to improve the quality and
federated systems perform a similar process to ETL integrity of recorded data, leading to improved oppor-
but “on the fly.” Warehousing is a predominantly tunities for data integration.
centralized approach, and federation is inherently
distributed; thus, the same issues of balancing space
complexity (storage) and time complexity (computa- Issues with the Data
tional load) apply as with other distributed architec- Irrespective of data-integration technologies, there are
tures. Similarly, the usual pros and cons associated also issues at the data level, especially with identity
with scalability, run-time performance, and data and nomenclature. Getting humans to agree on the
integrity must be taken into consideration when meaning of words is hard; getting them to understand
designing data-integration platforms. when they are using the same name to mean different
Data Integration and Visualization 521 D
Data Integration and Visualization, Table 1 The variety of names for ‘the same’ gene and its protein product in three different
species, including the many different protein alternative names and gene synonyms
Species Protein Alternative name Gene Synonym UniProtKB:ID
Saccharomyces DNA replication licensing cell division control MCM4 CDC54, MCM4_YEAST,
cerevisiae factor MCM4 protein 54 HCD21 P30665
Drosophila DNA replication licensing protein disc proliferation dpa MCM4_DROME,
melanogaster factor MCM4 abnormal Q26454
Mus musculus DNA replication licensing CDC21 homolog, Mcm4 Cdc21, MCM4_MOUSE,
factor MCM4 P1-CDC21 Mcmd4 P49717
D

things is harder; getting a machine to understand the Visualization


difference is harder still. Precision is key; but unfortu- Humans are intuitive pattern matchers. Computers, by
nately, adherence to standard nomenclatures has been contrast, are incapable of the leaps of intuition that are
limited in the life sciences. the essence of human thought processes. Machines
Consider the protein shown in Table 1: this has five must be programmed to find specific patterns; this
protein and six gene names in three different species. requires programmers to characterize those patterns
Humans can unravel such complexity relatively easily; in terms that are machine comprehensible. Visualiza-
but once we involve computers in the loop of compre- tion bridges the gap between human intuition and
hension, the task becomes more complicated, and the machine pattern-matching, and constitutes the set of
level of precision required to specify a particular tools and paradigms that allow computers to aid
protein/gene becomes more difficult to achieve: e.g., humans in knowledge discovery from large, complex
a text-mining algorithm exploring the literature for data-sets.
gene dpa or protein “disc proliferation abnormal” Aside from standard histograms, scatter graphs,
would miss other mentions of the same gene (Mcm4, etc., the life sciences also tend to use network-,
HCD21, CDC54, etc.) or protein (cell division control structure-, and sequence-visualization approaches.
protein 54, CDC21 homolog, P1-CDC21) unless it had Network visualization is important because systems
been programmed to use comprehensive synonym biology tries to understand relationships between bio-
dictionaries. logical entities. Computationally, these relationships
Consistent use of identifiers is also an issue. Con- can be represented as “graphs” in the mathematical
sider ovine rhodopsin. This protein entered the PIR sense, i.e., collections of objects (represented by
database with identifier (ID) OOSH, accession num- “nodes”), some of which are linked in some way
ber (AC#) A03155. Successive changes to its (with relationships represented by “edges”). Numerous
sequence and annotation then led to changes of its exchange formats (e.g., GraphML) have been devel-
AC# to A93264, A90319 and then A30407. Later, the oped for encapsulating the properties of generic math-
protein also appeared in Swiss-Prot with ID OPSD ematical graphs, with more specific formats (like
$SHEEP, AC# P02700. Its ID then changed to SBML, CellML, and BioPax) for capturing networks
OPSD_SHEEP; finally, the PIR and Swiss-Prot and pathways in machine-readable form. The systems
entries acceded to UniProtKB, where changes biology graphical notation (SBGN) initiative defines
and refinements continued to be made. Today, a mechanism for making such networks human read-
UniProtKB archives 82 versions of the entry for able, providing both a graphical notation for displaying
ovine rhodopsin, including these three different IDs and a schema for encoding network diagrams
and five different AC#s; the sequence itself is also (Le Novère et al. 2009).
stored in UniParc, with AC#s UPIUPI000059C30D Structure visualization encompasses techniques
and UPI0000130E18, the latter being the currently and formats for representing small molecules
active sequence of record. Initiatives such as the (consisting of perhaps a few tens of atoms and
MIRIAM Registry (Laibe and Le Novère 2007) and bonds), through to macromolecular structures (involv-
its associated naming scheme will be crucial in ing hundreds or thousands of atoms and bonds). For
untangling such “identity crises” in future. small molecules, compact representations such as the
D 522 Data Integration and Visualization

IUPAC international chemical identifier (InChiTM) are disparate data drawn from a field that is itself growing
often sufficient to define a molecule’s topological in complexity. It has become clear that the ad hoc
structure (but the use of InChi as a chemical identifier mechanisms and file formats that have served the
is hotly contested, with many arguing that “semantic- field for decades are no longer sufficient, and greater
free” identifiers offer a more reliable foundation for use of ontologies, standards, and identification
data integration). A plethora of other file formats schemas will be required to help extract knowledge
explicitly capture atomic coordinates for small mole- from our growing data collections. With increasing
cules; for large molecules like proteins, where the complexity, however, our ability to integrate data
atomic structure is typically determined experimen- meaningfully using ontologies and schemas is being
tally using techniques like X-ray crystallography, for- pushed to its limits. Ultimately, if we are to be able to
mats such as that defined by the protein data bank grasp the subtleties of communication, we will need to
(PDB, Berman et al. 2000) are often used for data be able to more effectively capture the relationships
exchange. As well as encoding the empirically deter- between data and the biomedical literature (i.e., how
mined coordinate data, this format supports basic data are described in scientific articles). Tools for
annotation of the underlying sequence. managing this relationship will become crucial in
Sequence visualization techniques usually involve future.
displaying ordered lists of amino acids (in a protein) or
nucleotides (in a chromosome or genome). Here, the Cross-References
main challenge is to provide mechanisms for mapping
regions of interest (topological domains, coding/non- ▶ Alignment, Protein Interaction Networks
coding regions, etc.) onto a sequence, and to allow ▶ Applied Text Mining
comparison between multiple sequences (e.g., to deter- ▶ Biological Network Model
mine their similarity). The data formats for sequences ▶ Biological System Model
are relatively simple compared to those for structure ▶ Bio-Ontologies
and network visualization: e.g., the FastA and PIR ▶ Controlled Vocabulary
formats primarily comprise strings of characters from ▶ Data Integration
the relevant sequence alphabet; the “general feature ▶ Data Mining
format” (GFF) and the schema of the distributed anno- ▶ Directed Acyclic Graph
tation system (DAS) also encode regions of interest ▶ Distributed Data Access
that may be mapped to the sequence. Typically, ▶ Gene Ontology
sequences are depicted visually as horizontal rows of ▶ Graph
color-coded blocks, representing residues or nucleo- ▶ Graph Algorithms in Network Analysis
tide bases. Such visualizations allow users to scroll ▶ Graph Alignment, Protein Interaction Networks
back and forth, zoom in and out, reorder the sequences ▶ Graphical Model
or their features, render them in 3D, etc. Until recently, ▶ Interoperability
users would tend to align tens or hundreds of ▶ Knowledge
sequences; however, developments in next-generation ▶ MIRIAM
sequencing (NGS) are likely to drive this into the ▶ MIRIAM URI
thousands, pushing the limits of sequence visualization ▶ Ontology
into new dimensions. ▶ Ontology Structure
▶ SBML
Conclusions ▶ Text Mining
Capturing life science data for the purposes of data
integration and visualization – whether for sequence, References
structure, or network analysis – is challenging. As
techniques like NGS create more data than ever before, Berman HM, Westbrook J, Feng Z et al (2000) The protein data
bank. Nucleic Acids Res 28:235–242
the problems become more complex. In attempting to
Demir E, Cary MP, Paley S et al (2010) The BioPAX community
paint holistic pictures, systems biology must take into standard for pathway data sharing. Nat Biotechnol 28:
account the increasing convolutions of integrating 935–942
Data Integration, Breast Cancer Database 523 D
Field D, Sansone SA, Collis A et al (2009) Omics data sharing. Characteristics
Science 326:234–246
Garny A, Nickerson DP, Cooper J et al (2008) CellML and
associated tools and techniques. Philos Transact A Math Motivation
Phys Eng Sci 366:3017–3043 Breast cancer is one of the most common cancer types:
Gaudet P, Bairoch A, Field D et al (2011) Towards BioDBcore: Approximately, it affects 1 out of 10 women and rep-
a community-defined information specification for biologi- resents the 25% of all tumors that hit women. From
cal databases. Nucleic Acids Res 39:D7–D10
Hucka M, Finney A, Bornstein BJ et al (2004) Evolving a lingua a scientific point of view, it is increasingly believed
franca and associated software infrastructure for computa- that using a systems biology perspective it is possible
tional systems biology: the Systems Biology Markup to develop better strategies for cancer treatment; this D
Language (SBML) project. Syst Biol 1:41–53 consideration is due to the fact that the complex behav-
Laibe C, Le Novère N (2007) MIRIAM resources: tools to
generate and resolve robust cross-references in systems ior of living systems can be hard to predict from the
biology. BMC Syst Biol 1:58 properties of individual parts (such as genes, proteins,
Le Novère N, Hucka M, Mi H et al (2009) The systems biology and cells). In this context, a multilevel integration of
graphical notation. Nat Biotechnol 27:735–741 the available knowledge regarding both biological
Taylor CF, Field D, Sansone SA et al (2008) Promoting coherent
minimum reporting guidelines for biological and biomedical components and pathways is a crucial task in order to
investigations: the MIBBI project. Nat Biotechnol 26: promote the system perspective.
889–896
Functionality
The G2SBC Database contains several types of molec-
ular alterations associated with breast cancer. These
Data Integration, Breast Cancer alterations encompass the genome (mutations and
Database SNPs), the transcriptome (RNA expression level and
splicing variations), and the proteome (protein expres-
Ettore Mosca, Ivan Merelli and Luciano Milanesi sion level, sequence, structure, and localization).
Institute for Biomedical Technologies – CNR Molecular alteration data are integrated with informa-
(Consiglio Nazionale delle Ricerche), Segrate, tion concerning the molecular network layer (path-
Milan, Italy ways and interactions, retrieved from databases as
KEGG Pathway and BioGRID, respectively) and the
cellular layer (breast cancer types and tissue images
Synonyms from the Human Protein Atlas, mathematical models
of carcinogenesis, tumor growth, and tumor response
Genes-to-systems breast cancer database from literature).
The Web site (implemented employing PHP
and JavaScript) by which the database can be
Definition accessed is mainly divided into three sections. The
first concerns the query system, which allows to
The Genes-to-Systems Breast Cancer Database retrieve data from the three levels of biological enti-
(shortly, G2SBC Database) is a freely available Web ties: molecular components, the molecular systems,
resource that collects data about genes, transcripts, and and the cellular layer.
proteins reported in the scientific literature as altered in The second section concerns the analysis tools
breast cancer cells; alterations encompass different available by means of the Web interface. In particular
types of mutations and expression variations of tran- there are two tools that rely on the application of
script and proteins. These data are integrated in graph theory to biological networks for highlighting
a multilevel database (from genes, transcripts, and pro- interactions, protein complexes, and hubs.
teins to molecular networks, cell populations, and tis- Concerning the gene-annotation enrichment analysis,
sues), which is coupled with a series of analysis tools the G2SBC Database provides a tool that enables
concerning cellular biochemical pathways, protein– the functional annotation of gene lists provided by
protein physical interactions, protein structures, and the user or retrieved exploring the data from the
mathematical models of cell behavior. database.
D 524 Data Management

Data Integration, Breast DIFFERENTIAL EXPRESSION


NORMAL BREAST CANCER TYPES
Cancer Database, AND PATHWAYS
Fig. 1 Integration between
protein expression in different Cadherin
breast cancer types and p120ctn
biochemical pathways data.
Lobular carcinoma in situ RPTPs
A use case showing the −p −p
differential expression of LMW-PTP
some proteins belonging to the PTPμ
“adherens junction” KEGG PTPs
VEPTP
map in lobular carcinoma in PTP1B
situ and invasive lobular LAR SHP-1
carcinoma tissue images
collected from the Human
Protein Atlas database. Green:
Cadherin
downregulation; yellow: Invasive lobular carcinoma
p120ctn
similar expression
RPTPs −p −p
LMW-PTP

PTPμ
PTPs
VEPTP PTP1B
LAR SHP-1

Lastly, the G2SBC Database maintains a model- ▶ Gene Set and Protein Set Expression Analysis
oriented section, which involves two aspects. The ▶ Interaction Networks
first one concerns the interaction among cell cycle ▶ KEGG Pathway Database
regulation and breast cancer: due to this connection it
is possible to retrieve the breast cancer genes involved
in cell cycle control and simulate the associated math- References
ematical models. The second regards the mathematical
models related to carcinogenesis, tumor growth, and Kanehisa M, Goto S, Furumichi M, Tanabe M, Hirakawa M
(2010) KEGG for representation and analysis of molecular
response to treatments.
networks involving diseases and drugs. Nucleic Acids Res
The data integration approach employed to develop 38(Database issue):D355–D360
this resource enables the collection of a large amount Mosca E, Alfieri R, Merelli I, Viti F, Calabria A, Milanesi L
of records and the Web tools provided allow to infer (2010) A multilevel data integration resource for breast can-
cer study. BMC Syst Biol 4:76
nontrivial knowledge: one example is the integration
Stark C, Breitkreutz BJ, Chatr-Aryamontri A, Boucher L,
of protein expression data available for different types Oughtred R, Livstone MS, Nixon J, Van Auken K,
of breast cancer with the cellular pathways, Fig. 1. Wang X, Shi X, Reguly T, Rust JM, Winter A, Dolinski K,
Tyers M (2011) The BioGRID interaction database:
2011 update. Nucleic Acids Res 39(Database issue):D698–
Accessibility
D704
The G2SBC Database is freely accessible at http:// Uhlen M, Oksvold P, Fagerberg L, Lundberg E, Jonasson K,
www.itb.cnr.it/breastcancer. An extensive help section Forsberg M, Zwahlen M, Kampf C, Wester K, Hober S,
is available and contains some use cases covering the Wernerus H, Bj€ orling L, Ponten F (2010) Towards
a knowledge-based human protein atlas. Nat Biotechnol
different sections of the G2SBC Database.
28(12):1248–1250

Cross-References

▶ Comparative Analysis of Molecular Networks Data Management


▶ Data Integration, Breast Cancer Database
▶ Functional Enrichment Analysis ▶ Knowledge Management
Data Mining 525 D
designs. In contrast, data mining is more interested in
Data Management: Data Processing making inferences from data without much back-
ground information or typically without control over
▶ Distributed Data Management data producing process, for example, experimental
design. John Chambers (1993) coined the term
“greater statistics” as everything related to learning
from data, and data mining falls within that definition.
Data Mining Thus, data mining can also be thought of as a field of
statistics making heavy use of visualizations and com- D
Aleksi Kallio1 and Jarno Tuimala2 putationally intensive methods with an emphasis on
1
CSC, IT Center for Science Ltd, Espoo, Finland algorithmic efficiency.
2
Finnish Red Cross Blood Service, Helsinki, Finland Machine learning is a related field that shares many
of its methods with data mining. The distinction
between the two is largely historical; machine learning
Definition evolved from artificial intelligence research, whereas
data mining evolved from information science. Artifi-
Data mining is the process of applying computational cial intelligence problems are often conceptually well
methods to large amounts of data in order to reveal new defined such as making a correct medical diagnosis
non-trivial and relevant information. Data mining is based on data from a patient, whereas the data mining
not only used for finding interesting patterns from the problem could be, for example, to mine the database of
data but also for exploring large data sets, for building medical records to find new and surprising associa-
models that describe the relevant properties of data, tions. Partly due to the somewhat different goals,
and for making predictions based on the data (Hand these two communities of computational data analysis
et al. 2001). Due to rapidly evolving biological mea- typically rely on different statistical frameworks (Hand
surement instruments such as sequencers and array et al. 2001; Baldi and Brunak 2001). Machine learning
scanners, data mining problems have become central research has embraced the flexible Bayesian frame-
in modern biosciences (Baldi and Brunak 2001). work (▶ Bayesian inference) for probabilistic model-
ing, while data mining often uses computationally
lightweight frequentist statistics to process large data
Characteristics sets.

There is some confusion on the meaning of the term Data Mining Methods
data mining and its relation to overlapping terms such Data mining is to a large extent defined by the methods
as machine learning and statistics. As a field of science, that are used in the field. The methods and their goals
data mining sits between information technology and are varied, but can be roughly divided into two main
statistics. Data mining originated from knowledge dis- categories: they either try to model the data (learn the
covery research in information science, and it has global structure of the data) or try to find patterns of
traditionally had very little interaction with statistics. interest (learn local structures from the data). From
This history also explains much of the critique toward a computational point of view, data mining problems
data mining, to a large extent from the statistical com- are often NP-hard, meaning that they do not have an
munity. Early data mining methods have been criti- optimal solution algorithm and need approximated
cized for so-called “data fishing”: searching for solutions. This inherent complexity is also one of the
complex patterns without controlling the probability reasons why data mining is seen as an interesting area
of them being just random artifacts in the data (Hand for computer science research.
et al. 2001). The most common data mining tasks can be divided
However, the difference between statistics and data into clustering (unsupervised learning), classification
mining is vague: statistics is especially interested in (supervised learning), regression (▶ Regression
inference and prediction, typically based on some Analysis), association mining, and text mining
a priori hypotheses and supporting experimental (▶ Text Mining) (Hand et al. 2001; Hastie et al. 2009).
D 526 Data Mining

Clustering (▶ Clustering) tries to find groups of (▶ Data Mining–based Transcriptional Regulatory


similar or similarly behaving entities, such as genes, Network Construction) and protein interaction
proteins, or metabolites, from the data. In general, networks.
clustering is blind to the known structures of the Text mining (▶ Text Mining) methods locate pat-
data in a sense that it does not use any knowledge terns and trends from textual data sets. The two major
outside the data to assign the entities to the clusters. application areas in systems biology are analyzing
Clustering methods are at the core of data mining, genomic and proteomic sequences for patterns of inter-
although arguably no longer a central point of est and mining literature databases for associations
research. A large variety of clustering methods exists, between, for example, genes and proteins.
but among the most well-known ones are k-means
clustering (▶ Clustering, k-Means) and hierarchical Visualization
clustering (▶ Clustering, Hierarchical) (Fig. 1). One Humans are skilled at quickly locating patterns
typical application area for clustering is gene expres- from visual representations, which can be harnessed
sion analysis, where clustering can be used for, for for efficient data exploration. In statistics, data is
example, grouping cancer samples to discover novel often summarized with variables such as mean
subtypes. and variance. Visualizations can be thought of as
Classification (▶ Classification) falls into two cate- an extension to simple variables, allowing the over-
gories. First, if knowledge of the grouping of the enti- view of more complex structures (e.g., contour plots)
ties, such as the sex of the providers of the biological and multiple variables simultaneously (e.g., scatter
samples, is available, then classification can be applied plots). Visualizations can also be coupled with
to find the themes that make the groups different. data mining techniques. For example, by using
Second, once such themes are identified, classification dimensionality reduction techniques to bring the
can predict the grouping of new entities. Perusing the number of dimensions down to two, and then by
previous example on cancer gene expression data, using scatter plots to show the result, a high-
classification methods can be used to learn the expres- dimensional data set can often be shown in an under-
sion patterns of known cancer subtypes and then to standable form.
classify new cancer samples. Common classification
techniques are discriminant analysis, decision trees, Validation
nearest-neighbor methods, neural networks, and sup- Not all discovered patterns are necessarily robust or
port vector machines. valid. Data mining methods often find patterns that are
Regression (▶ Regression) comprises a very large unique to the analyzed data, and cannot be detected
family of methods. The general idea of regression is to from independent data sources. The same basic phe-
find a function of predictor variables that fits the nomenon exists in different forms in different data
observed response with the least error. Hence, regres- mining tasks: in modeling, the problem is known as
sion can be used for, for example, studying an “overfitting” (▶ Overfitting), whereas in pattern
organism’s response to the dosage of a drug. The searching and statistical testing, it is known as “false
range of methods varies from the classical linear positives.” In all cases, one is led to draw overly
regression (▶ Linear Regression) through generalized optimistic conclusions.
linear models to generalized additive mixed models There are different approaches to guarantee valid and
and mixture models. robust results. One approach is to split the data into
Association methods (▶ Association Rule) search a training set (▶ Model Training, Machine Learning)
for relationships between variables. Association rule and a test set (▶ Model Testing, Machine Learning).
mining has found only a few applications in bioinfor- The selected data mining method is applied on the
matics, since most implementations of it need categor- training data only, and once the pattern has been dis-
ical data, and typical measurements in biology are covered, its validity is assessed using the test data
made on a continuous scale. However, especially in (▶ Model Validation, Machine Learning). Some data
systems biology, mining associations in the form of types allow randomization methods to be used, where
networks has been studied extensively. Example the whole data is used in training, but the trained method
applications include gene regulatory networks is also applied to a large number of randomized versions
Data Mining 527 D

Data Mining, Fig. 1 Visualization of a hierarchical clustering of gene expression samples. Genes are clustered so that they have
similar expression levels in the sampled conditions. Visualization was produced with Chipster software
D 528 Data Mining–based Transcriptional Regulatory Network Construction

of the original data. Only those original results that are Cross-References
unlikely in randomized data are selected.
Validation is hard to do well, and it often takes ▶ Bayesian Inference
considerable effort to program a validation approach ▶ Classification
that is suitable for the particular data at hand. Ready- ▶ Clustering, Hierarchical
made solutions in most software programs often do not ▶ Clustering, k-Means
perform satisfactorily. ▶ Clustering, Model-Based
▶ Data Mining–based Transcriptional Regulatory
Tools and Data Management Network Construction
It is common to store structurally simple data sets as ▶ Linear Regression
flat files, such as comma separated tables. More com- ▶ Model Testing, Machine Learning
plex data sets are typically stored in SQL databases, in ▶ Model Training, Machine Learning
XML files or in custom data storage systems. Data ▶ Model Validation, Machine Learning
warehouse technologies commonly used in the indus- ▶ Overfitting
try are less often seen in bioscientific data mining. ▶ R, Programming Language
For preprocessing and analysis, most commonly ▶ Text Mining
used tools are either generic scripting languages (e.g.,
Perl and Python), interactive data analysis environ-
ments (e.g., R (▶ R, Programming Language) and References
Matlab), and graphical tools (e.g., Weka). Knowledge
discovery professionals in the industry often use spe- Baldi P, Brunak S (2001) Bioinformatics: the machine learning
approach, 2nd edn. MIT Press, Massachusetts
cialized data mining packages, but in systems biology
Chambers J (1993) Greater or lesser statistics: a choice for future
in academia, a set of open source command line tools is research. Stat Comput 3:182–184
favored. As the need for data mining methods is grow- Hand DJ, Mannila H, Smyth P (2001) Principles of data mining.
ing due to developments in measurement technology, MIT Press, Massachusetts
Hastie T, Tibshirani R, Friedman J (2009) The elements of
commercial companies have started to provide special-
statistical learning: data mining, inference, and prediction,
ized graphical tools, which have been later followed by 2nd edn. Springer, New York
open source developments. These tools aim to provide
advanced data analysis capabilities to users without
computational background.
Analysis workflows differ, but they all contain
some variants of the following steps: preprocessing, Data Mining–based Transcriptional
data exploration, and main analysis. In preprocessing, Regulatory Network Construction
the data is formatted correctly, filtered for unneces-
sary or dubious content, and often normalized. Explo- Xing-Ming Zhao
ration ranges from outputting simple statistics to Institute of System Biology, Shanghai University,
elaborate visualizations. The main analysis is often Shanghai, China
constructed in the form of a statistical test, although
sometimes a less strict approach is used. Especially
for systems biology work, an additional step is taken Synonyms
after the main analysis: result integration and expla-
nation. In the integration and explanation step, the Classification; Data mining; Regression
results are often validated (Model Cross-validation,
Machine Learning) against independent data sources,
such as the many biological databases that are pub- Definition
licly available. The intent is to find support for the
results and also to find sound explanations that help to Data mining is a new field that aims to analyze large
understand the results as part of the larger “systems datasets and extract knowledge from data, and it is an
view”. interdisciplinary field involving statistics, pattern
Data Mining–based Transcriptional Regulatory Network Construction 529 D
recognition, information theory, and so on. Generally, feature selection techniques include maximum rele-
data mining techniques can extract informative pat- vance/minimum redundancy (MRMR), Entropy, and
terns from data and construct decision-making sys- Mutual information, etc. The selected features can help
tems. The most popular techniques in data mining to interpret the model and important patterns.
include clustering, classification, regression, etc. Data
mining is being widely used in various fields, including Data Mining Algorithms
image processing, marketing relationship, and medi- Various data mining methods can be grouped into
cine (Fayyad et al. 1996). three groups, that is, supervised methods, unsupervised
Recently, data mining has been playing an impor- methods, and semi-supervised methods, among which D
tant role in bioinformatics, based on which various the supervised and unsupervised methods are widely
tools have been developed to construct transcriptional used in transcriptional regulatory network construction.
regulatory networks. Instead of determining individual In supervised methods, there are some regulations that
protein-gene regulations with traditional biological are known in advance and are therefore used as training
experiments, the integration of emerging high- set to infer the patterns for prediction of new regula-
throughput data and data mining is able to predict tions. The supervised methods widely used to construct
transcriptional regulations in an automatic way. In transcriptional regulation networks include artificial
general, the regulation relationship can be predicted neural network, decision tree, genetic algorithm, and
as a classification or regression problem, which aims to support vector machine. On the other hand, if there are
extract regulation patterns from the experimental data. not any known regulations available, the unsupervised
methods will be helpful, among which the most popular
one is clustering.
Characteristics
Artificial Neural Networks
Data Representation and Preprocess Artificial Neural Network (ANN) model is designed to
In data mining, the data are usually formulated as mimic the real biological neural network that can pro-
vectors, where each sample is represented as one vec- cess information and make decision. In general, the
tor. Therefore, the similarity between samples can be ANN model consists of three parts, including input
estimated. In constructing transcriptional regulatory layer, output layer, and hidden layer. There are some
network, one of the most commonly available data nodes in each layer that work like the neurons, and the
source is microarray data that describes the expression nodes are interconnected to simulate the information
profile of genes under different conditions. The gene flow between neurons. The structure of the ANN
expression data can be easily used for data mining. model reflects the mapping from input variables to
Other kinds of data, for example, ChIP-on-chip (also output variables. The ANN model can handle large
known as ChIP-chip) data, can also be represented as dataset and is robust against noisy data. However, the
numeric vectors. ANN model works as black box and it is difficult to
However, it is not a trivial task to extract informa- interpret the results obtained in some cases.
tive patterns due to the noise inherited in the experi- Veiga et al. (2008) designed a feed-forward (FF)
mental data. Some data mining techniques are and bi-fan (BF) regulatory motif prediction model for
therefore used to reduce noise and sometimes also Escherichia coli based on multilayer perception artifi-
help to reduce the dimensionality of the data, such as cial neural networks (ANNs). The regulatory motifs
principal component analysis (PCA). Since the gene predicted consist of transcription factors and regulated
expression data can be measured as time series data, genes, and are highly enriched. Hart et al. (2006) found
some signal processing methods, for example, Fourier that single-layer feed-forward ANN models can effec-
transform, are helpful to reduce noise and transform tively discover gene network structure by integrating
the data into another description space so that the global in vivo protein-DNA interaction data (ChIP/
signal in the data can be easily detected. Another Array) with genome-wide microarray data. The ANN
important issue in data mining is feature selection, models were successfully applied to construct the yeast
which aims to select important patterns so that the cell cycle transcriptional regulatory network, which is
performance of the classifier can be improved. Popular composed of hundreds of genes.
D 530 Data Mining–based Transcriptional Regulatory Network Construction

Genetic Algorithms transcription factors act in concert to regulate the


The genetic algorithm (GA) is a heuristic optimization expression of their targets.
method that works like the evolution of biological
processes and aims to find out the global optimization Support Vector Machine
in the solution space. There are some chromosomes in Support vector machine (SVM) is a newly developed
GA, which denote possible solutions, and combination supervised classifier that is especially useful for dataset
and mutation are used to change the structure of the with high dimensionality and small samples, and it can
chromosomes so that the evolution proceeds to better work on both classification and regression problems.
solution. The objective function of GA is called fitness The success of SVM is attributed to its two new fea-
function and is specially designed for the algorithm’s tures. Firstly, SVM uses kernel technique to describe
goal based on expert knowledge. GA performs very the relationship between samples, which enables SVM
well in searching for optimal solutions although it is to work efficiently without considering the dimension-
time consuming in some cases. ality of the data. Secondly, SVM only uses the samples
In the construction of transcriptional regulatory net- near the classification hyperplane, that is, support vec-
work, it is important to determine the topological tors, to build the classifier so that it can work robustly
structure of the regulatory network. Seema and against noise.
Ramanatha (2010) successfully applied GA to infer Qian et al. (2003) used support vector machines
the structure of transcriptional regulatory network. (SVMs) to predict the targets of a transcription factor
Kikuchi et al. (2003) presented a modified version of based on the association relationships between their
GA that not only predicts the structure of the regula- expression profiles. In particular, SVMs successfully
tory network but also predicts the dynamics of the predicted the targets of 36 transcription factors for
regulations. In particular, a new fitness function was Saccharomyces cerevisiae based on the microarray
designed to improve the performance. data obtained under different physiological conditions.
Kumar et al. (2007) presented another framework for
Decision Tree predicting protein-DNA interactions based on
Decision tree is a decision-making system that utilizes sequence information and SVMs, and gave promising
a tree-like data structure. Decision tree has been widely results.
used as a decision-making system for a long time due
to its simple structure and interpretability. The popular Clustering
decision tree learning methods include Classification Clustering is one of the most popular unsupervised
and Regression Trees (CART) and Chi Square Auto- learning methods in data mining. Clustering generally
matic Interaction Detection (CHAID). Decision tree is groups samples into different clusters so that the sam-
robust against noisy data and can handle large dataset ples in the same cluster are similar based on the pat-
in a short time. In addition, the decision tree can terns in the data while dissimilar between clusters. The
perform feature selection automatically, which can most important and difficult things in clustering are
help to interpret the model. how to describe the similarity between samples and
Ruan et al. (2009) built an ensemble classifier of what criteria should be used to make a group for a set of
decision trees to construct transcriptional regulatory samples.
network, where each decision tree is learned based on In biology, the genes that are always co-expressed
one specific gene expression dataset. The yeast tran- under various conditions are assumed to be
scriptional regulatory network was constructed by this co-regulated by same regulators. Therefore, the
classifier that associates the gene expression level with genes can be clustered into groups based on their
binding affinity between transcription factors and expression profiles. If a set of genes are clustered
DNAs detected by chromatin immunoprecipitation into one group, these genes are regulated by either
and microarray (ChIP-chip) experiment. regulators outside of the group or regulators within
From the decision tree ensembles, the logical the group. Based on the above assumptions, various
rules can be extracted to explain how a set of clustering techniques have been applied to construct
Data Sampling 531 D
transcriptional regulatory network. For example, de
Hoon et al. (2003) first clustered the genes into dif- Data Parallel
ferent groups and then constructed a transcriptional
regulatory network based on the clustering results for ▶ General-Purpose Computation, Graphics Processing
Bacillus subtilis. Units
▶ Grid Computing, Parallelization Techniques

Cross-References
D
▶ Chromatin Immunoprecipitation Data Sampling
▶ Entropy
▶ Feature Selection Haiying Wang and Huiru Zheng
▶ Maximum Relevance/Minimum Redundancy School of Computing and Mathematics, Computer
(MRMR) Science Research Institute, University of Ulster,
▶ Mutual Information Jordanstown, UK

Synonyms
References
Sampling; Statistical sampling
de Hoon MJL, Imoto S, Kobayashi K, Ogasawara N,
Miyano S (2003) Inferring gene regulatory networks
from time-ordered gene expression data of Bacillus
subtilis using differential equations. Pac Symp Biocomput Definition
8:17–28
Fayyad U, Piatetsky-Shapiro G, Smyth P (1996) From data
mining to knowledge discovery in databases. AI Mag
In machine learning (▶ Model Validation, Machine
17:37–54 Learning), data sampling (Cochran 1977) is a widely
Hart CE, Mjolsness E, Wold BJ (2006) Connectivity in the yeast accepted process concerned with the selection of an
cell cycle transcription network: inferences from neural net- unbiased subset of data, which are representative of
works. PLoS Comput Biol 2(12):e169
a larger population, for the purposes of constructing
Kikuchi S, Tominaga D, Arita M, Takahashi K, Tomita M
(2003) Dynamic modeling of genetic networks using predictive models with machine learning algorithms.
genetic algorithm and s-system. Bioinformatics The size of the sample is the number of data items in
19(5):643–650 a sample, typically denoted as an integer number N.
Kumar M, Gromiha M, Raghava G (2007) Identification of
The major benefit of applying data sampling in the
DNA-binding proteins using support vector machines and
evolutionary profiles. BMC Bioinformatics 8:463 context of machine learning is it can effectively speed
Qian J, Lin J, Luscombe NM, Yu H, Gerstein M (2003) Predic- up the modeling process, which allows the analyst to
tion of regulatory networks: genome-wide identification of build a model and make prediction with relatively little
transcription factor targets from gene expression data.
cost and effort. To achieve this, a sample is required to
Bioinformatics 19(15):1917–1926
Ruan J, Deng Y, Perkins EJ, Zhang W (2009) An ensemble contain the essence and reflect the characteristics of the
learning approach to reverse-engineering transcriptional reg- entire dataset. The key to meet this fundamental
ulatory networks from time-series gene expression data. requirement is randomness, i.e. allowing each data
BMC Genomics 10(Suppl 1):S8
item in the database to have the same probability of
Seema S, Ramanatha KS (2010) Inference of gene regulatory
network using modified genetic algorithm. In: Proceedings of being selected.
the international symposium on biocomputing, Calicut, Types of data sampling commonly used in machine
Kerala, pp. 1–8 learning include the following:
Veiga DF, Vicente FF, Nicolás MF, Vasconcelos AT
(2008) Predicting transcriptional regulatory interactions
• Simple Random Sampling
with artificial neural networks applied to E. coli multidrug Each data item has the same chance of being
resistance efflux pumps. BMC Microbiol 8:101 selected in the data set.
D 532 Data Storing and Querying

• Stratified Random Sampling


The entire dataset is first divided into k disjoint Database Accession Number
groups with each group having the size of N1, N2,
. . ., Nk. These subgroups are non-overlapping and Teresa K. Attwood
together they are comprised of the whole popula- Faculty of Life Sciences and School of Computer
tion, i.e., N1 + N2 + . . .+ Nk ¼ N. Then for each Science, University of Manchester, Manchester, UK
subgroup, a simple random sample is taken in
a number proportional to its size when compared
to the whole population. The collection of these Synonyms
subsets constitutes a stratified sample. The sub-
groups are referred as strata and the whole proce- AC#; Database AC
dure is called stratified random sampling.
• Cluster Sampling
The entire dataset is divided into groups of items, i.e.,
clusters, and each cluster becomes a sample unit. This Definition
is an example of two stage sampling. In the first stage,
analysis is carried out on a population of clusters and A database accession number, rather like a database
each cluster has the same chance of being included in identifier, is a short code used to uniquely identify
the sample. After this process, a random number of a particular entry or record within a particular
items within these clusters is selected. database. The code normally contains alphanumeric
characters, and is usually designed to be machine
readable (they are seldom, if ever, human readable).
Cross-References For example, P02700 is the accession number that
identifies the entry for ovine rhodopsin in the
▶ Model Validation, Machine Learning UniProtKB:Swiss-Prot (UniProt consortium 2011)
protein sequence database: Unlike its largely
human-readable database identifier (OPSD_SHEEP),
References this accession number is neither informative nor par-
ticularly memorable to humans.
Cochran WG (1977) Sampling Techniques. Wiley, New York As already mentioned, accession numbers are data-
base specific, and different databases adopt different
numbering conventions. Hence, for example, in the
PIR protein sequence database, ovine rhodopsin has
Data Storing and Querying the accession number A03155.
Information pertinent to ovine rhodopsin,
▶ Distributed Data Management which belongs to a superfamily of G protein–
coupled receptors (GPCRs), may also be found
in protein family databases like InterPro (Hunter
et al. 2009), PROSITE, PRINTS, Pfam, and so on:
Data Warehouse Examples of accession numbers for the GPCR super-
family entries in these databases are IPR00026,
▶ Data Integration PS00237, PR00237, and PF0001, respectively. By
contrast with protein sequence database numbering
schemes, it is broadly possible to decipher which
is the parent database from protein family
Database AC database accession numbers: For example, IPR
denotes InterPro; PS denotes PROSITE; PR,
▶ Database Accession Number PRINTS; and PF, Pfam. The number itself, however,
Database Identifier 533 D
gives nothing away about the protein family entry
with which it is associated. Database Code
Database accession numbers are intended to
provide a stable means of tracking down particular ▶ Database Identifier
database entries. Consider, for example, the DNA rep-
lication licensing factor MCM4. Although its Swiss-
Prot identifier has oscillated since March 1993, from
CD21_YEAST, to C21H_YEAST, CC54_YEAST,
CDC54_YEAST, and finally to MCM4_YEAST, its Database ID D
accession number remained the same throughout –
that is, P30665. Nevertheless, accession numbers can ▶ Database Identifier
and do change between database releases. Thus, for
example, the accession number for the same protein in
the PIR protein sequence database began as S25527,
then became S26641, and finally S56050. Sometimes,
different database entries in the same database are Database Identifier
merged or replaced with or by others (e.g., when
a computer-annotated TrEMBL sequence is discov- Teresa K. Attwood
ered to be redundant with an existing manually anno- Faculty of Life Sciences and School of Computer
tated Swiss-Prot sequence). In such cases, the Science, University of Manchester,
accession number of the deprecated sequence is Manchester, UK
retained so that it too can be tracked. For example,
in November 2010, the TrEMBL entry D6W429
(ID, D6W429_YEAST) was replaced by UniProtKB: Synonyms
Swiss-Prot entry P30665 (ID at that time,
CDC54_YEAST): Effectively, the entries were Database code; Database ID; DBID
merged and D6W429 was retired. At that point, the
accession number record of the revised entry was
updated to read, “AC P30665; D6W429.” In this
case, P30665 is denoted the primary accession number Definition
and D6W429 the secondary accession number.
Retaining all the accession numbers in this way A database identifier is a short code or name that is
makes tracking the history of particular database enti- used to uniquely and reliably identify a particular
ties much easier. entry or record within a particular database. The
code normally contains alphanumeric characters
(but may also sometimes contain other symbols),
Cross-References and, by contrast with database accession numbers, is
usually designed to be human readable or, at least,
▶ Data Integration and Visualization human decipherable. For example, OPSD_SHEEP
is the code that identifies the entry for ovine rhodop-
sin in the UniProtKB:Swiss-Prot (UniProt consortium
References 2011) protein sequence database: Here, OPSD
is a shorthand that identifies the protein “rhodopsin”;
Hunter S, Apweiler R, Attwood TK, Bairoch A, Bateman A et al SHEEP is, self-evidently, a label that identifies
(2009) InterPro: the integrative protein signature database. the species in which this particular rhodopsin is
Nucleic Acids Res 37(Database issue):D211–D215
found.
UniProt Consortium (2011) Ongoing and future developments
at the universal protein resource. Nucleic Acids Res As already mentioned, identifiers are database
39(Database issue):D214–D219 specific, and different databases adopt different
D 534 Database of Quantitative Cellular Signaling (DOQCS)

naming conventions. Hence, for example, in the PIR


protein sequence database, ovine rhodopsin has Database of Quantitative Cellular
identifier OOSH. Signaling (DOQCS)
Information pertinent to ovine rhodopsin, which
belongs to a superfamily of G protein–coupled recep- G. V. Harsha Rani and Upinder S. Bhalla
tors (GPCRs), may also be found in protein family Neurobiology, National Center for Biological
databases like InterPro (Hunter et al. 2009), Sciences, Tata Institute of Fundamental Research,
PROSITE, PRINTS, Pfam, and so on: Examples of Bangalore, India
identifiers for the GPCR superfamily entries in these
databases are 7TM_GPCR_Rhodpsn, G_PROTEIN_
RECEP_F1_1, GPCRRHODOPSN, and 7TM_1, Synonyms
respectively. Although some of the identifiers here
are more cryptic than others, in each case, the code DOQCS: Database of quantitative cellular signaling;
is broadly readable or decipherable, each in some way GENESIS: General neural simulation system;
pointing to 7TM proteins, to GPCRs and/or to MATLAB: Matrix laboratory; MIRIAM: Minimal
rhodopsins. information required in the annotation of models;
Although database identifiers were intended to MySQL: My structured query language; ODE: Ordi-
provide a stable means of tracking down particular nary differential equation; PHP: Hypertext preproces-
database entries, in practice, they can and do change sor; URL: Uniform resource locator
between database releases. Thus, for example, prior
to March 1993, the Swiss-Prot identifier for ovine
rhodopsin, OPSD_SHEEP, was OPSD$SHEEP. Definition
Less subtle and more frequent changes also occur;
hence, for example, in March 1993, the Swiss-Prot The Database of Quantitative Cellular Signaling
identifier for DNA replication licensing factor (DOQCS) is a repository of models of signaling path-
MCM4 was CD21_YEAST: This changed to ways available at http://doqcs.ncbs.res.in. This is
C21H_YEAST, CC54_YEAST, CDC54_YEAST, a curated database in which all models have been
and finally to MCM4_YEAST in 1994, 1995, 2005, implemented and tested. The database is free for
and 2011, respectively. This kind of identifier vola- content access and download. Models are presented
tility can make tracking particular database entities in various data formats, and each entry includes reac-
problematic, and is partly why additional forms of tion diagram, annotation, and links to other models
identification, in the form of accession numbers, and literature.
are also vital.

Characteristics
Cross-References
Role
Technical advances in biological information capture
▶ Data Integration and Visualization
have yielded intimidating quantities of data pertaining
to many areas of biology, including biochemical sig-
naling. In parallel, continuing developments in soft-
References ware and simulators have helped researchers to
develop and explore increasingly detailed biological
Hunter S, Apweiler R, Attwood TK, Bairoch A, Bateman A models of complex signaling pathways. Model data-
et al (2009) InterPro: the integrative protein bases are an emerging category of bioinformatics
signature database. Nucleic Acids Res 37(Database Issue): tools that serve the intersection of these trends
D211–D215
UniProt Consortium (2011) Ongoing and future developments
(Ghosh et al. 2011).
at the universal protein resource. Nucleic Acids Res DOQCS was designed to integrate several modeling
39(Database issue):D214–D219 requirements to provide a resource for model
Database of Quantitative Cellular Signaling (DOQCS) 535 D

Database of Quantitative Cellular Signaling (DOQCS), enzyme, and reaction tables. The identity of reactants (substrates
Fig. 1 Entity Relationship diagram for the DOQCS database. and products of reactions and enzymes) and hence the connec-
The accession table is used as an entry point into the model. The tivity of the model is stored in the “Molecule List” table. The
accession number is the unique identifier for each model. Most channel and geometry information are stored in respective
other tables in the database refer to this (broken line). Similarly, tables. Original and converted model files are stored as large
each pathway has a unique pathway number used by other tables text objects in the “File Format” table
(the black line). Model parameters are stored in the molecule,

development. It includes a collection of models of inter-compartment diffusion as a reaction term.


signaling pathways, but has a special emphasis toward DOQCS is particularly rich in models on synaptic
organizing the data both in terms of schematic descrip- plasticity and MAPK signaling.
tions as well as searchable model data. It provides
reaction schemes and associated rate constants, Database Structure
enzyme parameters, concentration terms, as well as DOQCS is implemented using ▶ relational database;
contextual data including data sources and tissue type. it is structured as a set of accessions, each of
which represents a complete model. Accessions may
Scope consist of one or more signaling pathways, and
DOQCS includes compartmental chemical kinetic each pathway is specified in terms of several mole-
models solved using systems of ▶ ordinary differen- cules, enzymatic reactions, and binding reactions.
tial equations (ODE), and also using stochastic This conceptual structure has been previously
methods such as the ▶ Gillespie Stochastic Simula- described (Sivakumaran et al. 2003) and is briefly
tion (Gillespie 2007). It also includes a few one- recapitulated here. The underlying table structure
dimensional spatial models which have been has been revised based on additional navigation and
expanded out into large ODE systems by treating data requirements considered in this paper (Fig. 1).
D 536 Database of Quantitative Cellular Signaling (DOQCS)

Database of Quantitative Cellular Signaling (DOQCS), Fig. 2 Screenshot of overview page of DOQCS accession, showing
directory-tree-like hierarchy structure, block diagram and basic annotations

As before, DOQCS is implemented using Linux/ Original and converted model files are stored as
Apache/MySQL/PHP. Large Text Objects in the File Formats table.
The accession table specifies the primary model
entry into the database and holds substantial contextual Database Interface
data as well as a summary diagram for each model. The The web interface to DOQCS facilitates navigation
contextual data include ▶ MIRIAM compliant annota- either through links (URL-based navigation) or
tions. Further fields including tissue, cell type data in through searches (form-based navigation) (Fig. 2).
addition to fields on curation level and model type The URL-based navigation matches the conceptual
annotation are added. organization of the database into accessions, pathways,
The Pathway table provides additional overview and biochemical entities. These levels are explicitly
information about the models, including reaction dia- represented using a directory-tree-like hierarchy in the
grams and annotations. web interface. The root of the tree is the accession,
The remaining tables provide low-level model folders include different pathways within the acces-
details: molecular identity, rate constants, and sub- sion, and the molecules, reactions, and enzymes are
strate/product lists. represented as entries within the pathways. A click on
Compartmental details are stored in Geometry any level of the tree brings up the details pertaining to
table. that level of the database, as mentioned above.
Databases for Kinetic Models 537 D
Overview data is provided on the accession and path- Cross-References
way pages, and detailed biochemical parameters are
found on the reaction and molecule pages. ▶ Gillespie Stochastic Simulation
Each of these pages includes richly hyperlinked ▶ MIRIAM
results from the navigation. For example, each pathway ▶ Ordinary Differential Equation (ODE)
page has links to all related pathways in the database. ▶ Relational Database
Further, each biochemical entity page has links to all the ▶ Systems Biology Markup Language (SBML)
reactions or other molecules that interact with it.
The second major navigation option is through D
searches. This uses a query-based form present at the References
top of each page of DOQCS. In additional to simple
queries as previously described (Sivakumaran et al Bhalla US (2002) Use of Kinetikit and GENESIS for modeling
signaling pathways. Methods Enzymol 345:3–23
2003), additional several complex searches
Ghosh S, Matsuoka Y, Asai Y, Hsin KY, Kitano H (2011)
are possible for textual pattern matching within Software for systems biology: from tools to integrated plat-
various subfields, as it is common on most databases. forms. Nat Rev Genet 12(12):821–832
More sophisticated searches can also be carried out Gillespie DT (2007) Stochastic simulation of chemical kinetics.
Annu Rev Phys Chem 58:35–55
based on connectivity or functional criteria.
Sivakumaran S, Hariharaputran S, Mishra J, Bhalla US
Typical connectivity criteria specify that a given (2003) The database of quantitative cellular signaling: man-
molecule is a substrate of another. A functional crite- agement and analysis of chemical kinetic models of signaling
rion might be that a given molecule acts as an enzyme. networks. Bioinformatics 19(3):408–415
Vayttaden SJ, Bhalla US (2004) Developing complex signaling
models using GENESIS/Kinetikit. Sci STKE 219:l4
Data Sources and Curation
Except for few models that were explicitly developed
for the database by the curation team, all other
models are published models which have been tested
before putting online and in the accession table citation Database Search
information is included as a link to PubMed. This
procedure frequently reveals ambiguities in model rep- ▶ Protein Identification Analysis
resentation, and in some complex models, the original
implementation is difficult to replicate. In all cases, the
extent of the fit between published and internally tested
representations is reported as an important entry in the Databases for Kinetic Models
validation field in the table.
Jacky L. Snoep
File Formats Department of Biochemistry, Stellenbosch University,
DOQCS supports three file formats: Kinetikit, SBML, Matieland, South Africa
and MATLAB. All models in the database have Manchester Institute for Biotechnology, University of
been implemented using GENESIS/Kinetikit simula- Manchester, Manchester, UK
tor (Bhalla 2002; Vayttaden and Bhalla 2004) and Molecular Cell Physiology, Vrije Universiteit
are available in its internal model file format Amsterdam, Amsterdam, The Netherlands
which includes annotation information. In cases
where the original publication includes model
files, these are also presented unchanged for down- Definition
load. Most models have also been converted to
MATLAB and SBML. All converted models have Over the last decade, systems biology (SB) has been
been tested for equivalent function to the GENESIS/ one of the fastest growing fields in the life sciences.
Kinetikit form. There are many interpretations and definitions for the
D 538 Databases for Kinetic Models

SB field (Westerhoff and Alberghina 2005), but most Models have increased in size for two reasons:
scientists agree that an SB study combines experimen- firstly because larger systems are being studied and
tal and theoretical (including modeling) approaches. secondly because the models are more detailed. In
As such the increase in number of SB studies correlates some cases, the modelers are actually attempting to
with the increase in mathematical models for biologi- build replicas of the real system. In the latter case,
cal system, which is evident from a recent inventory where the model variables have a well-defined
(Huebner et al. 2011). Not only did the number of mechanistic interpretation, it is important to annotate
models increase, also the size of the models has the model. The information in such annotated models
increased. The latter effect is due to the larger size of is ideally suited for storing in relational databases.
the systems being studied but more importantly due to Strong searching capabilities, easy access and dis-
a more complete representation of the biological semination via standard formats and protocols, linking
system in the models. to other databases, and possibilities of incorporation of
The aim of many SB studies is to relate systems automated workflows are just a few of the added
behavior to characteristics of the components of the advantages of storing models in databases.
system. This is a subtle but important difference from
the classic theoretical biology approaches, where core Minimal Requirements for Model Databases
models were constructed to describe biological behav- A model repository or database must fulfill three min-
ior using mathematical equations that were as simple imal criteria to make it useful. Firstly, the models’
as possible, not necessarily reflecting biological descriptions must be correct, i.e., they must be identi-
mechanisms. Specifically in so-called bottom-up cal to the model description in the publication where
SB models, a strong mechanistic link is maintained the model was first presented (see ▶ CellML Model
between model components and biological entities. Curation). Secondly, the models must be available for
Such models can be used to critically test our download from the repository in a standard model
biochemical knowledge of a given system (e.g., description format, such as ▶ Systems Biology
Teusink et al. 2000). Markup Language (SBML) or ▶ CellML. Thirdly,
The increase in number and size of kinetic models the models must be annotated (model annotation).
and the more realistic representation of biological A minimal annotation should link the model to the
components in these models have led to the develop- reference where it was published and it should be
ment of a number of model database initiatives. In this clear how the model variables and parameters link to
section, descriptions of several of these initiatives are the species and constants in that publication.
given and the advantages of storing models in such A nice-to-have functionality for kinetic model
databases and the consequences for model reuse, anno- repositories is a simulation tool, such that the models
tation, and dissemination are discussed. in the repository can be directly inspected and simu-
lated, for instance in a web browser. The first initia-
tives that stored kinetic models of biological systems
Characteristics started out as model repositories with a simulation
engine (see ▶ JWS Online and the ▶ Virtual Cell
Why Model Databases (VCell) Modeling and Analysis Platform).
The increased size of kinetic models in SB studies A large number of model databases have been ini-
necessitates the storage of the models in a publically tiated in the last decade. Some of these initiatives focus
accessible form. Whereas it is easy to code a two- on a specific subset of models, such as models for
variable model from a publication, the effort and signal transduction pathways (▶ DOQCS: Database
chances of making errors becomes much larger with of Quantitative Cellular Signaling), for the cell cycle
increased model sizes. Clearly, if a model were avail- (▶ Cell Cycle Database), for neuronal models
able in a repository from which it can be downloaded (modelDB), while other databases are more general
and used without recoding the model, this would save repositories (▶ BioModels Database: a repository of
a lot of work and would also eliminate coding errors mathematical models of biological processes, ▶ JWS
(if the model in the database were properly curated). Online, ▶ CellML, ▶ Virtual Cell, ▶ WebCell).
Databases for Kinetic Models 539 D
Model Annotation What Information Should Be Stored in Model
Whereas the focus in theoretical biology studies used Databases?
to be on the mathematical aspects of a model, i.e., on For each model entry in the database, its description
the analytical and numerical analyses, in systems biol- file in a standard format and an annotation file would
ogy studies the direct link to experimental data is much be the minimal information to be stored (in SBML, the
stronger and it therefore becomes more important to two can be combined in a single xml file). In this
relate model components to biological entities, i.e., to section, the focus is on models described as determin-
annotate models. istic ordinary differential equations, which is the class
In addition to a minimal reference annotation to in which the majority of the kinetic models in SB D
link models in the database to the scientific publica- studies are described. For a more general treatment,
tion in which the model is described, a further anno- see the section on kinetic models (▶ Kinetic Modeling
tation of model components to biological entities and Simulation).
using controlled vocabularies (e.g., SBO terms) A typical model description would include the fol-
greatly enhances the application strength of mathe- lowing: (1) description of a set of reactions, (2) rate
matical models. The annotation of models makes it equations for the reactions, (3) parameter values, and
much easier to search for specific components in (4) initial conditions. In addition, models can contain
models, or to compare or even link models for events, functions, and algebraic equations, but the above
overlapping systems. four components are generic. The reaction description
The Minimal Information Required in the Annota- and the rate equations can be combined to give a set of
tion of Models (▶ MIRIAM) is an example of guide- differential equations, and some models are defined in
lines for annotating models, and it links model differential equations only, without specifying the reac-
variables to ontology terms (e.g., to a systems biology tions and their rate equations. The standard model
ontology, SBO term) and thereby to unique identi- description formats allow for these different ways to
fiers. Thus, where modelers can use names as G6P, describe kinetic models. Whether a model is described
Glu6P, Glc6p, or X2 to denote a variable such as in terms of reactions or as ODEs does not influence the
glucose 6-phosphate, if such variable names are numerical integration (since these are performed as
linked to an identifier such as a ChEBI number they differential equations), but the model description can
can all be related to glucose 6-phosphate. This anno- affect the functionality of the model.
tation relates model constituents unambiguously to
a known entity, where the references information is The Importance of Specifying Reactions
given in a {“data type,” “identifier,” “qualifier”} trip- If a model is described in ODEs without explicit ref-
let. The “data type” is given as a Unique Resource erence to the reactions in a system, it is not always
Identifier and can be an Uniform Resource Locator possible to dissect the contributions of the individual
(url) or a Uniform Resource Name (urn), for instance reactions to an ODE.
for a model variable Ca2+ which is a reactant in The famous Lotka-Volterra model for predator-
a reaction the data type could be a url: “http://www. prey interaction might serve as an example to illustrate
ebi.ac.uk/chebi/” with identifier: “CHEBI:29108” the point. This model is usually presented in the form
and qualifier: “is.” MIRIAM extends beyond the of ODEs:
annotation of model components; it also requires
 
a reference correspondence, standard model format, dðx½tÞ=dt ¼ x½t ða  b y½tÞ
and the possibility of instantiation of a model
 
simulation. dðy½tÞ=dt ¼ y½t ðc x½ t   dÞ
Biomodels (▶ BioModels Database: a repository of
mathematical models of biological processes) was one with x[t] and y[t] the densities of prey and predator,
of the first model database initiatives to strongly pro- respectively, and a, b, c, and d positive parameters.
mote model annotation and has played an important From these ODEs, it is not immediately clear that the
role in development in many model description, anno- system consists of three reactions: (1) a birth rate of
tation, and simulation standards. prey (a*x[t]), (2) a death rate of predator (d*y[t]), and
D 540 Databases for Kinetic Models

(3) a rate for prey consumption by predator leading to to represent biological pathways and is widely
a decrease of prey (  b y½t x½t) and an increase of supported by database initiatives.
predator (c y½t x½t). From the ODEs, it cannot be Thus, although the traditional way of defining
automatically derived that (3) is a single reaction models in terms of ODEs without explicit reference
with a different stoichiometry for prey consumption to reactions has no consequence for the time integra-
and predator growth. If the ODEs would be given in tion of the model, it does affect the functionality of the
terms of reactions this uncertainty would not exist, for model both in terms of comparing to other resources
instance: and in terms of analyses that can be made with the
model. For instance, automated drawing of reaction
dðx½tÞ=dt ¼ v1  v2 network graphs or ▶ metabolic control analysis
(MCA) can only be performed for models defined in
dðy½tÞ=dt ¼ e v2  v3 terms of reaction steps.

with v1 ¼ a x½t, v2 ¼ b  y½t  x½t, and e ¼ c/b, and Rate Equations


v3 ¼ d * y[t]. When the ODEs are given in terms of For some model organisms, a fairly complete set of
reactions, a reaction network can automatically be reactions occurring in a cell has been constructed (at
drawn: least for metabolism). Such structural models, which
define a reaction network, have been defined for much
1 X 2
e
Y 3 larger system than have been used for kinetic models.
The reason for this is that the information needed for
The Lotka-Volterra model is not a mechanistic building a kinetic model is more extensive and not as
model and the translation to explicit reactions is not simple and condition independent as for structural
so important. In addition, it is not difficult to make the models.
translation into reactions with some background infor- For well-studied enzymes, a kinetic mechanism
mation. However on the basis of the ODEs only, might be known and then a rate equation can be
a computer cannot make the translation to the network derived from the mechanism. However, for many
automatically. For larger models, such a translation enzymes no mechanism is known and then often
becomes much more difficult, even for humans. See a generic rate equation is used based on a simplified
Conradie et al. 2010, for an example of a study where mechanism such as a random order rapid equilibrium
considerable effort was made to translate a model binding mechanism. If the enzyme can be studied in
defined in ODEs back to the original reactions for isolation, kinetic parameters can be estimated by
which it was defined. fitting the rate equation on the experimental data set
If a model is defined in terms of reactions, it is easier for the isolated enzyme. If the enzyme cannot be
to identify the biological system it refers to and to studied in isolation, the kinetic parameters are often
search for components or compare (parts of) models. fitted on behavior of the complete system. In the latter
A reaction is defined in terms of substrates and prod- case, it is much more difficult to make specific per-
ucts and a unique identifier can be given for a reaction turbations to the enzyme and even simpler rate equa-
(e.g., KEGG, see entry ▶ KEGG pathway database; tions must be used since the parameters are often not
www.genome.jp/kegg/). If the process is enzyme cat- identifiable.
alyzed, as is the case for most reactions in the cell, The above paragraph indicates some of the difficul-
a corresponding E.C. number exists. Such identifiers ties of obtaining suitable rate equations for kinetic
are important as they can give information on thermo- models, which are treated in more detail in the sections
dynamic constants (e.g., Keq for a reaction), and they on model construction and ▶ model validation. For
can give indications for enzyme kinetic constants as this section, it is sufficient to know that for many
stored in enzyme kinetic databases. Furthermore, by systems we can define the reaction network with
defining a mathematical model in terms of reactions it good confidence and can store this information well
is possible to describe the modeled system in in a database. But for the equations that can be used to
a standard format. BioPAX (Biological Pathway describe the rate of the reactions, the situation is not so
Exchange; www.biopax.org/) is a standard language simple.
Databases for Kinetic Models 541 D
Although the number of possible rate equations that interactivity and exchange between different data-
can be used in mathematical models might appear bases. Whereas molecules, reactions, and enzyme
limitless, the situation is not quite that bad. Due to identifiers are largely species and condition indepen-
the limited number of substrates and products for dent, this does not hold for rate equations or parameter
most reactions, the possible combinations, even when values. Thus, the strong point of connecting databases
taking several mechanisms into account, is not that big. to search for kinetic parameters for rate equations
Thus, libraries for rate equations have been should be used with caution and might be difficult to
constructed, and when storing models in databases automate.
a reference to such a library can be made if a standard A simple example might make the problem clearer. D
mechanism was implemented for a reaction. In prac- Let us consider the well-known Michaelis-Menten
tice many models will also include nonstandard equation for the enzyme-catalyzed conversion of S to P
rate equations that are derived specifically for
a reaction, and these must be included in the model
storage as well. Some model simulators such as
COPASI (www.copasi.org/) refer to an internal rate
equation library. When building a model, the user can
select a rate equation from the library or use a user-
defined reaction (that is then added to the library).
When exporting the model from the simulator, each
equation is given explicit in the SBML file.
V ¼ Vm S=ðKm þ SÞ
Kinetic Parameters
One of the advantages of using rate equation libraries For the original derivations, Michaelis and Menten
is that it is possible to give identifiers to each of the assumed equilibrium binding of enzyme and substrate
model parameters. In much the same way that mod- and then the Km value has a mechanistic interpretation
elers pick names for variable species, they also use as the dissociation constant (Kd) of the enzyme-
names for parameters that cannot always be uniquely substrate complex (Km ¼ Kd ¼ kd/ka; with kd the
related to known constants. If parameters in rate equa- rate constant for the dissociation reaction and ka the
tions were annotated, it would be possible to relate rate constant for the association reaction). In a later
them to existing kinetic constants and make links to derivation, Briggs and Haldane relaxed the equilib-
database resources. rium binding assumption to a quasi-steady-state
Unique identifiers for small chemical species and approximation for the enzyme-substrate complex,
for enzymes are quite well accepted and used, but for which gives a different mechanistic interpretation for
rate equations this is not so common. This makes it the Km value (Km ¼ (kd + kcat)/ka). Although the
harder to store kinetic information in a database and assumptions for the derivation and the resulting inter-
limits the strength of database searches and functional pretation of the Km value are different, the equation for
comparisons with other rate equations or parameter the description of the enzyme activity is identical.
values. The situation is worse for reactions for which Interestingly, in practice the Km value is determined
we do not know the kinetic mechanism as now differ- as the substrate concentration giving half-maximal reac-
ent rate equations can be used to describe the same tion rate, irrespective of the mechanism or assumptions.
reaction. The functional behavior of different rate The enzyme characteristics and the experimental con-
equations might be similar (this is to be expected if ditions determine whether the equilibrium binding
they were constructed from the same data set), and assumption or the quasi-steady-state approximation
would therefore not lead to very different model sim- assumption holds (if any). However, the mechanistic
ulation results. However, the different equations make interpretation of the Km value is independent of the
it hard to compare (or use) kinetic parameters obtained experimental determination, and many researchers
for different rate equations. have become a bit careless in using and reporting Km
This touches on an important aspect of one of the values. If the Km value is operationally defined as the
advantages of using databases for model storage: substrate concentration giving half-maximal enzyme
D 542 Databases for Kinetic Models

activity, this can still work fine. However, for multiple Linking Models to Data Sets
substrate/product reactions, the rate equation becomes One can roughly distinguish two types of data sets that
dependent on the kinetic mechanism used, and for are connected to mathematical models: data for model
ordered binding mechanisms the definition of the inhi- construction and data for model validation. In principle
bition constants is related to the mechanism. For such these should be independent data sets, and they can be
reactions, it is important to publish the kinetic rate very different. For instance in a typical bottom-up
equation together with the Ki values. modeling approach, it is possible to have kinetic data
Irrespective of the mechanistic interpretation of for each of the individual enzymes that is used for
kinetic parameters, their values are usually dependent parameterization of the enzyme kinetic rate equation
on the assay conditions under which they were deter- as part of the model construction. In such a study, data
mined. The kinetic parameter values should be deter- for the complete system could be used for model val-
mined under conditions that closely reflect the idation. Of course, this is just a scenario and other
conditions that are simulated in the model. Where types of data sets are possible. In many studies it is
a model simulates the intracellular cytosol, it is hard not possible to separate the model components func-
to imitate those conditions ex vivo. Often kinetic tionally from the system, and then one cannot charac-
parameters are determined in vitro, and some guide- terize the individual components in isolation.
lines for mimicking cytosolic conditions have been Because the bottom-up approach nicely separates
formulated (van Eunen et al. 2010). the model construction data sets from the model vali-
The contents of databases are dependent on the way dation data sets, I will expand a little further on how
the data are represented in the scientific literature. For such data sets can be linked to model storage in data-
the representation of enzyme kinetic data, standards bases. The first question to ask is whether experimental
have been formulated by the ▶ STRENDA commis- data should be stored in a model database. On the one
sion (Standards for Reporting Enzymology Data, hand, one can argue that the data are not part of the
http://www.beilstein-institut.de/en/projects/strenda/). model description, and are not necessary for model
A number of databases exist where enzyme kinetic simulation or analysis and could therefore be kept
data are collected. This might seem in conflict with the separate from the model. On the other hand, the data
above advice to use published kinetic parameters with for model construction are closely linked to the model
caution, but these database initiatives give information and can enhance the model functionality significantly.
on the following: (1) the experimental conditions Firstly, if all data for model construction are made
under which the parameters were determined, (2) the available, the model construction becomes
associated rate equation, and (3) the experimental data a completely transparent process and could be
used for the parameter determination (e.g., SABIO- reproduced by other scientists, which is an important
RK). Some kinetic parameters such as Keq and kcat aspect of scientific research. Secondly, it enables
are relatively constant (e.g., not sensitive to a particular researchers to make changes to the model, for instance
kinetic mechanism assumed in its estimation), when using the same experimental data sets but applying
they are measured under the correct physiological con- different rate equations. A last advantage to explicit
ditions. Other kinetic parameters, such as Vmax values linking of the data sets to the model is that it makes it
can vary largely as they are not only dependent on the possible to completely separate the construction and
assay conditions but also on expression level of the validation data sets.
enzyme, i.e., on the growth conditions of the organism. The above given advantages should be sufficient
The relevance of this long section on enzyme motivation to link experimental data sets to the exper-
kinetic parameters for model databases is that data- imental data that was used for the model construction,
bases that store the kinetic parameters will facilitate but it does not provide with a strong argument to store
the reuse of the parameter values. There is a significant the data in the same database as where the models are
risk involved in reusing parameter values, if the user stored. For instance it would be possible to store all the
does not check carefully how the parameter values experimental data in a specialized database for enzyme
were determined. A close link of the parameter values kinetics, such as SABIO-RK (sabio.villa-bosch.de/) or
and rate equations to experimental data seems Brenda (http://www.brenda-enzymes.info/), and then
necessary. link the model to the external database. Whereas such
Databases for Kinetic Models 543 D
a link to external databases is easily made, it is some- (www.unicellsys.eu/), EviMalaR (www.evimalar.org/),
times more convenient to also store the experimental the Virtual Liver (www.virtual-liver.de/), and many
data in the model database. For instance, several of the more research projects.
model databases (e.g., JWS Online and Biomodels) are
part of data management systems of large research Web-Services and Workflows
projects, and then it can be advantageous to keep the One of the biggest advantages of using a database for
data in a secure database and have direct access to the storing data is the possibility to structure the data
data without external links. For instance, JWS Online according to its expected use. One can store the data
is used in active model development where for each such that expected queries run most efficiently. It is D
reaction there is a data set available that is directly a relatively simple step to not only structure the data
available via the network schema. These data sets are storage but also the way in which queries are made and
sometimes updated and linked to versioned models. output is given. Most databases allow access to their data
Only the final model is released to the public and via so-called web-services (see ▶ Web Service), which
then the data is made available. are structured queries. There are different types of web-
Clearly, different solutions can be used to store services, which are treated in more detail in another
models, kinetic parameters, rate equations, and the section. For the model databases, such web-services can
associated experimental data. It is important that be a simple search query to select all models for a certain
a link between the model and the experimental data is organism, or pathway, but it can also involve running
made explicit and that it can be followed, how the link simulations of models that have been selected before.
is made is not so important. In some of the above- Such workflows that run a set of web-services in succes-
mentioned systems biology projects, in addition to sion can be defined in software tools specifically
experimental data there is a lot of additional informa- designed for this task, such as Taverna, or in more generic
tion stored and often a much greater functionality is programs such as Mathematica. Workflows are very
provided in a complete data management structure. powerful tools that automate tedious and repetitious jobs.
An important aspect of web-service construction is
Data and Model Management Structures the formulation of the specific format in which the
The multidisciplinary character of systems biology query must be made and the answer will be given.
projects and the scale of many of these projects neces- Recent initiatives have focused on the formulation of
sitate the collaboration between sometimes-large num- standards for model simulation descriptions. MIASE
bers of research groups. Although each research (Minimal Information About a Simulation Experi-
project could in principle come up with its own unique ment) proposes a minimal set of information needed
solution for data and model management, it would to reproduce simulation experiments, and SED-ML
make more sense to develop more generic tools that (Simulation Experiment Description Markup Lan-
can be used by many research groups. An example of guage) encodes this information.
such a data management system that is being devel-
oped centrally is the data management group for the The Silicon Cell Initiative
▶ SysMO projects (www.sysmo.net/). The SEEK Model databases greatly enhance the accessibility of
(▶ Data and Model Management Platform, SEEK) published kinetic models. The storage of curated
(Wolstencroft et al. 2011) is the centrally developed models is important to prevent losing the models,
software package that gives a wide functionality to which are often custom coded, and to make the models
each of the individual research groups, one of which publicly available in standard formats. The reproduc-
is the storage and curation of mathematical models, but ibility of (model simulation) results is an important
it also makes it possible to make explicit links between aspect of the scientific process and this is ensured in
models and experimental data. the curation process of the model databases.
The SEEK has been a large success and is now also Model accessibility is also important for future
implemented in several other SB projects. SEEK can model reuse. Whether a specific model will be reused
be downloaded, installed, and adapted to anyone’s is largely dependent on how it was constructed. If
needs, and this has helped strongly in the package a model was made to address a specific research
being incorporated in, e.g., the UniCellSys project question, the model structure and kinetic parameters
D 544 Data-Driven Science

might be very dependent on the specific conditions ▶ CellML Model Curation


that were applied. Then model reuse might be limited ▶ Data and Model Management Platform, SEEK
to these conditions. However, if a model was ▶ DOQCS: Database of Quantitative Cellular
constructed using the above-discussed bottom-up Signaling
approach, using experimental data for each of the ▶ JWS Online
individual reaction steps, the model (or parts of it) ▶ KEGG Pathway Database
can be reused. If the kinetics of the individual reaction ▶ Kinetic Modeling and Simulation
steps were measured under physiological conditions, ▶ Metabolic Control Analysis
they are to a large extent model independent. This ▶ Model Validation
holds for the kinetic mechanism and the kinetic ▶ Silicon Cell
parameters but to a lesser extent to the Vmax values ▶ STRENDA
(which are dependent on the enzyme expression ▶ SysMO
levels). Since the enzyme expression levels will be ▶ Systems Biology Markup Language (SBML)
dependent on the growth conditions, these could very ▶ Virtual Cell (VCell) Modeling and Analysis
well be model dependent. Platform
The ▶ silicon cell initiative (e.g., Snoep and ▶ WebCell
Westerhoff 2004) advocates the use of rate equations ▶ Web Service
based on kinetic parameters that were experimentally
determined for the individual reactions. Construction
of kinetic models with such rate equations followed References
by an independent model validation for the system
would lead to kinetic models that can be reused in Conradie R, Bruggeman FJ, Ciliberto A, Csikász-Nagy A,
Novák B, Westerhoff HV, Snoep JL (2010) Restriction
a modular approach to building models of larger sys-
point control of the mammalian cell cycle via the cyclin
tems. Whereas it is unlikely that a detailed kinetic E/Cdk2:p27 complex. FEBS J 277:357–367
model for a whole cell can be constructed in a single Huebner K, Sahle S, Kummer U (2011) Applications and
step with a bottom-up approach, the Silicon Cell trends in systems biology in biochemistry. FEBS J
278:2767–2857
approach would be to validate models for parts of
Snoep JL, Westerhoff HV (2004) The silicon cell initiative.
the cell and then merge them to simulate a larger Curr Genom 5:687–697
part of the system. After merging, the resulting Teusink B, Passarge J, Reijenga CA, Esgalhado E, Van der
model can again be validated. In such a modular Weijden CC, Schepper M, Walsh MC, Bakker BM,
Van Dam K, Westerhoff HV, Snoep JL (2000) Can yeast
approach, a gradual increase in model size will pre-
glycolysis be understood in terms of in vitro kinetics of the
vent the accumulation of experimental error in the constituent enzymes? Testing biochemistry. Eur J Biochem
individual model parameters. Model validation is 267:5313–5329
crucial in this approach, and the definition of modules van Eunen K, Bouwman J, Daran-Lapujade P, Postmus J,
Canelas AB et al (2010) Measuring enzyme activities under
should largely be determined by whether they can be
standardized in vivo-like conditions for systems biology.
validated, i.e., define a module such that it can be FEBS J 277:749–760
validated. Westerhoff HV, Alberghina L (eds) (2005) Systems biology,
In this section, several aspects of model databases definitions and perspectives, vol 13, Topics in current genet-
ics. Springer, Berlin, pp 13–30
were introduced and discussed. A number of these
Wolstencroft K, Owen S, du Preez F, Krebs O, Mueller W, Goble
database initiatives will be highlighted in assays and C, Snoep JL (2011) The SEEK: a platform for sharing data
several additional definitions will be formulated. and models in systems biology. Methods Enzymol
500:629–655, Chapter 29

Cross-References

▶ BioModels Database: A Repository of


Mathematical Models of Biological Processes Data-Driven Science
▶ Cell Cycle Database
▶ CellML ▶ Data-intensive Research
Data-Intensive Research 545 D
Characteristics
Data-Intensive Research
What does it mean for research to be based on empir-
Sabina Leonelli ical evidence? This question, one of the oldest within
ESRC Centre for Genomics in Society, University of the philosophy and history of science, is being
Exeter, Exeter, Devon, UK reformulated and reconsidered within contemporary
biological and biomedical science. In these areas, and
particularly within system biology with its emphasis
Synonyms on data sharing and interdisciplinary integration, tech- D
nological innovation and shifting ideas about what
Computational data analysis; Data deluge; Data-driven counts as evidence have transformed practices of data
science; Data-intensive science collection, dissemination and analysis, with profound
methodological consequences for experimental and
modeling practices. This entry sketches some charac-
Definition teristics of this broad trend.

Data-intensive research can be characterized as the The Data Deluge


attempt to extract biological knowledge from the The activities of data gathering and data use appear to
huge amounts of data produced through experiments have acquired relative independence from other scien-
and high-throughput technologies (e.g., new genera- tific activities such as hypothesis-testing, modeling,
tion ▶ DNA sequencing) and disseminated through and explanation. Up to the second half of the twentieth
cyberinfrastructures (e.g., community databases and century, biological data were largely produced as evi-
▶ Bio-Ontologies). Data-intensive research encom- dence to support a specific experimental hypothesis.
passes a wide variety of scientific methods, whose This is still the case in several research areas, but not
common feature is to rely on the accumulation and within molecular and system biology, where high-
sharing of evidence on a large scale and across throughput technologies such as sequencing and
research contexts as a starting point for the research micro-array experiments have changed the way in
process. Also central to data-intensiveness is the idea which data are produced. In these fields, the activity
of automated data analysis, defined as the extraction of of data gathering has become increasingly automated
biologically significant patterns from data through and technology-driven, resulting in the production of
computational means, with as little human intervention billions of data-points in need of a biological interpre-
as possible (see also ▶ Automated Reasoning). Com- tation (Hey et al. 2009). Consequently, massive
putational tools for ▶ data mining are expected to research efforts are being devoted to the dissemination
facilitate the generation of new hypotheses and thus of data online, in the hope that free and widespread
to help identify fruitful research directions, which can access to large datasets will enable scientists to use
be explored further through in vivo experimentation. them to understand biological phenomena, thus gener-
At the same time, both champions and critics of data- ating new paths toward discovery.
intensive research recognize that data analysis cannot Thanks to the variety of computational tools devel-
be understood as purely inductive and that the human oped to collect, store, and distribute them, data are now
input and skills involved, such as the ability to interpret available to researchers on an unprecedented scale.
data, construct models, and formulate hypotheses, can- This partnership between biology and computer sci-
not be fully automated. Data-intensive research is thus ence constitutes both the strength and the weakness of
not one single, inductive, computer-driven method for data-intensive research. Several commentators have
discovery. Rather, it encompasses a variety of methods argued that the extraction of knowledge from such
of data analysis, all of which rely on iterative feedback large, cross-disciplinary datasets constitutes a new sci-
between ▶ experimentation in vivo and the consulta- entific method, often depicted as “data-driven.” The
tion of data available in silico; and between inductive, underlying idea is that data already available online
deductive, and explorative reasoning (Kell and Oliver constitute formidable sources of insight, which can be
2004; O’Malley et al. 2009). used to generate research programs without
D 546 Data-Intensive Research

necessarily starting from a specific hypothesis to be also have biological significance, and what they
tested and without necessarily possessing the same teach us about the biology of organisms needs to
expertise as the original data producers (Kell and Oli- be investigated through further experimentation.
ver 2004). At the same time, it has become clear that Yet, computational analysis here offers a shortcut
the sheer scale and diversity of data to be analyzed toward discovery by pointing to patterns that have
requires the creation of sophisticated tools for data at least the potential to enhance existing under-
mining, which in turn needs to be informed by relevant standings of biological phenomena.
expertise in the theory and practice of all the relevant • Discovery through spotting gaps
domains within biology (Blake and Bult 2005, Buetow Data mining can lead to discovery by pointing
2005). researchers toward areas of investigation that were
not previously charted. A good example is the dis-
Three Data-Intensive Research Methods covery of “ultra-conserved regions” in vertebrate
The following three cases provide good examples of DNA and their regulatory role in development
the advantages and limitations of data-intensive (Blake and Bult 2005); or, more generally, the dis-
research methods: covery of correlations between the presence of spe-
• Discovery through triangulation of existing cific alleles in an individual’s genotype and the
evidence occurrence of a phenotypic trait, which might
The opportunity to access data through a web of open the way to the investigation and discovery of
interlinked repositories and databases is enabling mechanisms responsible for the development of
scientists to retrieve more and more data of possible that trait (this is the approach underlying genome-
relevance to their research interests. It is now pos- wide association studies). In these ways, data-
sible to gather and integrate data obtained on a wide intensive research helps to identify gaps in the
variety of organisms by laboratories across the existing knowledge about a given entity, thus open-
globe, no matter the specific expertises and interests ing up new areas for investigation.
guiding the production of data at each location. This
is particularly true in the case of research on Beyond Induction
▶ model organisms, where researchers can retrieve As exemplified by the above cases, at the core of data-
large portions of the data available on the same set intensive research is an emphasis on the epistemic
of phenomena through community databases. This value of data beyond the experimental context in
unrivaled level of data sharing is fueling the discov- which they are originally produced. Supporters of
ery of new regulatory roles of specific genes or data-intensive research stress that the way in which
pathways. The triangulation of existing evidence data can be used as evidence varies depending on the
thus furthers and transforms existing understand- scientific context in which they are considered. This
ings of phenomena. This research strategy is hardly flexibility as “raw materials” of science is what makes
new, yet the use of digital technologies makes it the dissemination and integration of data across bio-
tremendously more efficient. It is crucial that the logical domains into an effective route toward dis-
data in question are in a digital format, which makes covery (Leonelli 2009). This does not necessarily
it possible to disseminate them widely and retrieve mean that the accumulation and sharing of data con-
them instantly. stitutes the best starting point for inquiry, yet critics of
• Discovery through data mining data-intensive approaches have stressed the risk
Online databases can also be searched for emerging of “induction beckoning again” (Allen 2001) and
patterns or correlations that could not have been emphasized that proponents of data-driven science
predicted otherwise. A striking instance of this tend to portray data as the primary source of scientific
method is the idea of “random walks” through knowledge and to stress the value of inductive
data, where software is used to mine datasets to procedures over and above other research methods.
spot statistically significant patterns (e.g., gene This impression is heightened by the frequent
expression). It is not obvious that these patterns juxtaposition of data-intensive methods to
Data-Intensive Research 547 D
“hypothesis-driven,” deductive research, which sug- ▶ ontologies (Renear and Palmer 2009). Indeed, it
gests that data mining can lead to the formulation of has been claimed that data-intensive science cannot
testable claims without recourse to preconceived advance without substantial investment in a reliable
hypotheses (Evans and Rzhesky 2010). These inter- cyberinfrastructure which can be easily updated by
pretations of data-intensive research as purely induc- users, particularly when this approach is put to the
tive and independent from existing theoretical service of systems biology (Philippi 2006).
expectations are very problematic and untenable in 2. Successful examples of data-intensive science illus-
the face of actual scientific practice. Reusing data for trate the need for these methods to be embedded in
the purposes of discovery involves a complex ensem- a wider spectrum of scientific practices, ranging D
ble of skills and methodologies, which go well from theoretical analysis to ▶ experiments and
beyond an inductive approach. Extracting biologi- field observations. Data-intensive research thrives
cally meaningful inferences from high-throughput when exposed to iterative feedback with in vivo
genomic data involves, for instance, reliance on the- research (O’Malley et al. 2009).
ories about gene expression and regulation, specific
models of the biological processes being regulated, The Heuristic Value of Data-Intensive Science
common standards for the formatting and visualiza- The relation between data-intensive science and other
tion of genomic data, and familiarity with the instru- forms of research is difficult to describe in simple and
ments and organisms from which data were originally general terms, yet this complexity is precisely what
obtained. This methodological complexity is what makes data-intensive methods interesting as a new
makes it difficult to pinpoint the epistemic character- approach to scientific inquiry. Recent advances in
istics of data-driven research as a unique, emerging bioinformatics and successful applications of data-
mode of inquiry. At the same time, methodological driven methods have shown that the analysis of data
complexity aligns this form of research with the goals in silico cannot be fully automated, nor should it be
and methods favored in system biology: the pursuit of disjoint from experimental research in vivo. How-
complex, interdisciplinary interactions; and the use of ever, computational tools have the power to substan-
a variety of methods to achieve an integrated under- tially transform how research is performed and the
standing of living organisms (Philippi 2006). ways in which ▶ experiments are set up, carried out,
and verified. Data-intensive science has great heuris-
The Limits of Automation tic value, since findings emerging from the analysis of
Another characteristic common to the three examples large datasets can be used to challenge and re-direct
of data-driven methods given above is reliance on the means and targets of experimental research. This
automated data analysis: “smart” software is assigned does not mean that data-driven methods should
a prominent role in facilitating the extraction of pat- always be conceived as the starting point for experi-
terns from data, either through statistical analysis or mental inquiry. They constitute resources that can
through search mechanisms in databases. While auto- potentially complement all stages of research and
mated techniques for data analysis and hypothesis gen- can thus be fruitfully applied to any project, even if
eration are proliferating and becoming increasingly in each case their function is likely to be different
sophisticated, there are two good reasons to believe depending on the specific goals, context, and
that biological research cannot and should not be fully resources available.
automated:
1. Research within the field of bioinformatics, and
particularly database curation, has shown that data Cross-References
mining processes are only reliable when resources
and expertise are invested in selecting high-quality ▶ Automated Reasoning
data for insertion in databases; and in building data- ▶ Bio-Ontologies
bases that make use of efficient search engines, ▶ Community Database
visualization systems for data, and interoperable ▶ Data Mining
D 548 Data-Intensive Science

▶ DNA Sequencing
▶ Experiment dbSNP
▶ Ontology Lookup Service for Controlled
Vocabularies and Data Annotation Jingky Lozano-K€uhne
Department of Public Health, University of Oxford,
Oxford, UK
References

Allen JF (2001) Bioinformatics and discovery: induction Synonyms


beckons again. Bioessays 23(1):104–107
Blake JA, Bult CJ (2005) Beyond the data deluge: data
integration and bio-ontologies. J Biomed Inform Single-nucleotide polymorphism database
39(3):314–320
Buetow KH (2005) Cyberinfrastructure: empowering a ‘third
way’ in biomedical research. Science 308:821–824
Evans J, Rzhesky A (2010) Machine science. Science
Definition
329(5990):399–400
Hey T, Tansley S, Tolle K (eds) (2009) The fourth paradigm. A database containing information about genetic
Data-intensive scientific discovery, Microsoft Research, variations. It was established by the National Center
Redmond. http://research.microsoft.com/en-us/collabora-
for Biotechnology Information as a central repository
tion/fourthparadigm. Accessed 31 August 2010
Kell DB, Oliver SG (2004) Here is the evidence, now what is the of data for both single-base nucleotide substitutions
hypothesis? The complementary roles of inductive and and short deletion and insertion polymorphisms
hypothesis-driven science in the post-genomic era. Bioessays (Sherry et al. 2001).
26(1):99–105
Leonelli S (2009) On the locality of data and claims about
phenomena. Philos Sci 76(5):737–749
O’Malley MA, Elliott KC, Haufe C, Burian RM (2009) Philoso- Cross-References
phies of funding. Cell 138:611–615
Philippi S (2006) Addressing the problems with life-science
▶ SNPedia
databases for traditional uses and systems biology. Nature
7:769–773
Renear AH, Palmer CL (2009) Strategic reading, ontologies,
and the future of scientific publishing. Science References
325(5942):828–32
Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski
EM, Sirotkin K (2001) dbSNP: the NCBI database of genetic
variation. Nucleic Acids Res 29:308–311

Data-Intensive Science
DDBJ Genome Resources
▶ Data-Intensive Research
Akos Dobay1 and Maria Pamela Dobay2
1
Institute of Evolutionary Biology and Environmental
Studies (IEU), University of Zurich, Zurich,
Date Hub Switzerland
2
Department of Physics, Ludwig-Maximilians
▶ Hub University, Munich, Germany

Definition
DBID
The DNA Data Bank of Japan (DDBJ; http://www.
▶ Database Identifier ddbj.nig.ac.jp) is a collection of nucleotide
DDBJ Genome Resources 549 D
sequence data. Although DDBJ collects data DDBJ Genome Resources, Table 1 List of the databases
worldwide, most of the direct submissions are available at DDBJ
from Japanese researchers. DDBJ continuously Service
exchanges the collected data with the European name Description
Bioinformatics Institute (▶ EBI Genome Resources) INSD- The INSD-core data contain all the traditional
core sequences of complete genomes, but exclude
and the National Center for Biotechnology Informa- whole-genome shotgun (WGA) sequences, mass
tion Database (NCBI; http://www.ncbi.nlm.nih.gov). sequence for genome annotation (MGA), and third
The three databases are part of the International party annotation (TPA) (Kaminuma et al. 2010;
Nucleotide Sequence Database (INSD; http://www. Kaminuma et al. 2011) D
insdc.org) consortium, whose function is to WGS The whole-genome shotgun contains large sets of
overlapping and finished sequences without
ensure the integrity of the shared information. annotation from ongoing genome projects
Apart from the genome resources, DDBJ also has (Kaminuma et al. 2010; Kaminuma et al. 2011)
resources for protein sequences and protein MGA The mass sequence for genome annotation is
structures. comprised of sequences that are produced in a large
quantity for the purpose of genome annotation
(Kaminuma et al. 2010)
TPA Third party annotation data are assembled using
Characteristics primary entries of publicized nucleotide sequence
data collections, with additional features
DDBJ was an initiative of Japanese molecular biolo- determined by experimental or inferential methods
from a pool of experts (Cochrane et al. 2010)
gists and biophysicists (Tateno and Gojobori 1997)
DTA DDBJ trace archive is a permanent repository of
that began its activities in 1986, in collaboration DNA sequence chromatograms
with the European Molecular Biology Laboratory DRA DDBJ sequence read archive is a repository for
(EMBL; http://www.embl.de) and the genetic sequencing data from next-generation sequencing
sequence database (Genbank; http://www.ncbi.nlm. technologies. A web-based metadata creation tool
nih.gov/genbank) of the National Institutes of called MetaDefine has been released in March 2010
to facilitate the submission process
Health (NIH; http://www.nih.gov). The maintenance
DAD The DDBJ amino acid database contains amino acid
and the development of DDBJ are organized by sequences extracted from the nucleotide flat files
the Center for Information Biology and DNA present in the DDBJ periodical release and TPA dataset
Data Bank of Japan (CIB-DDBJ; http://www.cib. GTPS The Gene trek in prokaryotic space is a re-annotated
nig.ac.jp/) of the National Institute of Genetics database that uses motif scans
(NIG; http://www.nig.ac.jp/english/index.html). GIB The genome information broker is a comprehensive
data repository of complete microbial genomes
Ninety-nine percent of the data coming from Japanese
GIB-V The genome information broker for viruses is a
researchers and available from INSD are submitted repository for complete virus genomes. For other
through DDBJ (Kaminuma et al. 2010, 2011). virus genome resources, refer to the ▶ NCBI viral
Researchers from China, Korea, and Taiwan genomes resources
are also mostly submitting their data through CIBEX The Center for Information Biology Gene
Expression database (http://cibex.nig.ac.jp) is a
DDBJ. When the submission does not include
public database for mircoarray data (▶ Microarray
a large number of sequences, as the case is from data, parallel and distributed preprocessing)
whole-genome shotgun (WGS) data or mass sequence DOR The DDBJ omics archive stores quantitative data
data for genome annotation (MGA) (Sugawara from both microarrays (▶ Microarray data, parallel
et al. 2009), DDBJ offers a user interface and distributed preprocessing) and new high-
throughput sequencing platforms. DOR also
called SAKURA. Up to now, DDBJ has released integrates CIBEX and exports the data to
several databases. As the case is in the ▶ NCBI ArrayExpress database (▶ EBI genome resources;
BioProject genome resource, researchers have the Kodama et al. 2010)
option to submit their projects prior to full comple- GTOP The genomes TO protein structures and function
tion. DDBJ also provides genomic information database consists of data analyses of proteins by
application of various computational tools to the
as well as resources for protein sequences and protein amino acid sequences of genome projects
structures. Table 1 summarizes the available data- sequenced to its entirety (Kawabata et al. 2002;
bases in DDBJ. Fukuchi et al. 2009)
D 550 De Novo Computational Discovery of Motifs

Other Resources
The DDBJ also includes patent data transferred from De Novo Computational Discovery of
the Japan Patent Office (JPO; http://www.jpo.go.jp), Motifs
the Korean Intellectual Property Office (KIPO; http://
www.kipo.go.kr), as well as from the United States Jianhua Ruan
Patent and Trademark Office (USPTO; http://www. Department of Computer Science, University of Texas
uspto.gov) and the European Patent Office (EPO; at San Antonio, San Antonio, TX, USA
http://www.epo.org).

Definition
Cross-References
A class of computational methods for finding tran-
▶ EBI Genome Resources scriptional binding motifs within a set of promoter
▶ Genome Annotation sequences. This is different from the scenario where
▶ Microarray Data, Parallel and Distributed one needs to search promoter sequences for the binding
Preprocessing sites of a known transcription factor binding motif
▶ NCBI Bioproject Genome Resources (e.g., a consensus or a PSWM).
▶ NCBI Viral Genomes Resources

References
Death Rate
Cochrane G, Bates K, Apweiler R, Tateno Y, Mashima J,
Kosuge T, Mizrachi IK, Schafer S, Fetchko M (2010)
▶ Life Span, Turnover, Residence Time
Evidence standards in experimental and inferential
INSDC third party annotation data. OMICS J Int Biol 10 ▶ Lymphocyte Population Kinetics
(2):105–113
Fukuchi S, Homma K, Sakamoto S, Sugawara H, Tateno Y,
Gojobori T, Nishikawa K (2009) The GTOP database in
2009: updated content and novel features to expand and
deepen insights into protein structures and functions. Nucleic
Acids Res 37(suppl 1):D333–D337 Decentralized Version Control
Kaminuma E, Mashima J, Kodama Y, Gojobori T, Ogasawara O,
Okubo K, Takagi T, Nakamura Y (2010) DDBJ launches a
new archive database with analytical tools for next- ▶ Distributed Version Control System (DVCS)
generation sequence data. Nucleic Acids Res 38:D33–D38
Kaminuma E, Kosuge T, Kodama Y, Aono H, Mashima J,
Gojobori T, Sugawara H, Ogasawara O, Takagi T, Okubo K,
Nakamura Y (2011) DDBJ progress report. Nucleic Acids Res
39:D22–D27
Kawabata T, Fukuchi S, Homma K, Ota M, Araki J, Ito T,
Ichiyoshi N, Nishikawa K (2002) GTOP: a database of
Decision Rule
protein structures predicted from genome sequences. Nucleic
Acids Res 30:294–298 ▶ Prediction Rule
Kodama Y, Kaminuma E, Saruhashi S, Ikeo K, SugawaraH TY,
Nakamura Y (2010) Biological databases at DNA data bank
of Japan in the era of next-generation sequencing technolo-
gies. Adv Exp Med Biol 680:125–135
Sugawara H, Ikeo K, Fukuchi S, Gojobori T, Tateno Y (2009)
DDBJ dealing with mass data produced by the second
generation sequencer. Nucleic Acids Res 37:D16–D18
Decision Theory
Tateno Y, Gojobori T (1997) DNA data bank of Japan in the age
of information biology. Nucleic Acids Res 25:14–17 ▶ Bayesian Decision Analysis
Decision Tree 551 D
Figure 1 illustrates decision tree structures based on
Decision Tree two simple examples from everyday life. To empha-
size the tree analogy, the decision trees depicted in
Daniel Berrar1 and Werner Dubitzky2 Fig. 1 have their root node at the bottom of the diagram
1
Interdisciplinary Graduate School of Science and and the leaf nodes at the top. In practice, decision trees
Engineering, Tokyo Institute of Technology, are usually visualized with the root node at the top and
Midori-ku, Yokohama, Japan the leaf nodes at the bottom.
2
Biomedical Sciences Research Institute, University of Consider Fig. 1. In the Name Title example, there
Ulster, Coleraine, UK are three leaf nodes labeled Mr, Mrs, and Miss, and two D
non-leaf nodes labeled with the attributes Gender (root
node) and Marital Status. The branches are labeled
Synonyms with the corresponding attribute values: male and
female (for attribute Gender) and married and not
Classification tree; Regression tree married (for attribute Marital Status). In the Jogging
example, there are seven leaf nodes (labeled Yes and
No, respectively), three non-leaf nodes (labeled with
Definition the attributes Outlook, Humidity, and Temperature),
and ten branches labeled with the corresponding attri-
A decision tree refers to both a concrete decision bute values. Figure 1 also illustrates the corresponding
model used to support decision making and decision rules (▶ Prediction Rule).
a method to construct such models automatically A decision tree whose leaf nodes are labeled with
from data. As a model, a decision tree refers to discrete class labels is referred to as classification tree
a concrete information or knowledge structure to sup- (▶ Model Testing, Machine Learning). A decision tree
port decision making, such as classification (▶ Model that uses continuous values or value ranges is referred
Testing, Machine Learning) and regression to as regression tree (▶ Regression Analysis) (Breiman
(▶ Regression Analysis) tasks, processes or analyses. et al. 1984). A decision-tree structure represents
As a method, a decision tree comprises various tech- a ▶ directed acyclic graph which satisfies the follow-
niques to construct decision tree models in a highly ing properties:
automated fashion using specific algorithms and mea- 1. There is exactly one node, called the root, into
sures from the field of statistics, machine learning which no edges (branches) enter.
(▶ Model Validation, Machine Learning) and artifi- 2. Each node other than the root has exactly one enter-
cial intelligence. ing edge (branch).
3. There is a unique path from the root to each non-
root node.
Characteristics Each path from a decision tree’s root node to a leaf
node can be interpreted as a decision rule (▶ Decision
Decision-Tree Structure Rule, ▶ Machine Learning) which has a condition and
A decision tree is composed of nodes and branches that conclusion part. This may be expressed using the
connect the nodes (Quinlan 1993; Hastie et al. 2001; IF-THEN notation or the symbol for logical implica-
Duda et al. 2001). Two basic node types are distin- tion as follows:
guished: leaf nodes and non-leaf nodes (a special non-
leaf node is the root node). Each non-leaf node is IF condition THEN conclusion
labeled with an attribute or a question. The branches
condition ) conclusion
emanating from a non-leaf node correspond to the
possible values of the attribute or the answers to the
question. The leaf nodes of a decision tree are labeled If the input information meets all the conditions
with a class or category. described in the condition part, then the conclusion
D 552 Decision Tree

Name Title: Decision tree model used to Yes Jogging: Decision tree model
No
determine name title used to decide whether
or not to go jogging
< 20 C ≥ 20 C
Mr Mrs Miss
Temperature No

No

m
married not married Yes Yes

mediu
low < 25 C
male high
Marital Status ≥ 25 C

Humidity Temperature Yes


female
rainy
Gender sunny
overcast
Outlook

Corresponding decision rules: Some of the corresponding decision rules:


R1: IF male THEN Mr R1: IF sunny AND high THEN No
R2: IF female AND married THEN Mrs R2: IF sunny AND medium AND <20C THEN Yes
R3: IF female AND not married THEN Miss
...
R7: IF overcast THEN Yes

Decision Tree, Fig. 1 Simple decision trees facilitating classi- jogging (right diagram). Circles depict leaf nodes, boxes depict
fication tasks in everyday life: determining the English (name) non-leaf nodes, and arrows represent branches. The rule sets
title or honorific (left diagram), and deciding whether or not to go below the decision trees describe the corresponding decision rules

stated in the conclusion part is asserted. Thus, the 2. Derive the decision tree (or equivalent set of deci-
decision structure of a decision tree can be formulated sion rules) by means of an inductive process that
as a set of decision rules (this is illustrated in Fig. 1). generalizes from the available facts (data).
The decision rules derived from a decision tree may be In not so well-understood domains (like biological
associated with quantities expressing confidence and knowledge discovery), the machine learning
support as illustrated in Fig. 2. approach has the added advantage that by automati-
cally generating a decision tree from available data it
Decision-Tree Construction may be possible to reveal relationships in the data
Once a decision tree is constructed, it can be used to aid that may not be obvious to the investigator or
decisions of a decision maker, humans or machines. decision maker. A decision-tree learning algorithm
There are two basic approaches for creating decision determines, for instance, which attribute should be
trees. Firstly, decision trees may be generated manu- placed at the root node given the input data,
ally by knowledge engineers working with human thus providing an insight as to what attribute has
experts or textbooks. This approach is effective in the strongest influence in the partitioning of the
well-understood domains but involves a considerable underlying instances. The knowledge-discovery
human effort and time. Secondly, decision trees may aspect as well as the symbolic encoding of the knowl-
be derived automatically from available examples edge they represent make decision trees a useful
(data) by suitable machine learning (▶ Induction) method for exploring biological data. Decision
algorithms. The two basic steps involved in the trees have been widely used for exploratory
machine learning approach are: data analysis, classification, and regression tasks in
1. Obtain facts (data) about the decision making prob- biology (Zhang et al. 2001; Kingsford and Salzberg
lem or studied phenomenon. 2008).
Decision Tree 553 D
Training Set Partition of Learning Set Based on Attribute X
(Learning Instances)

Element of partition whose Element of partition whose


instances have X = 1 instances have X = 0

Instance Attributes Class Instance Attributes Class Instance Attributes Class


# X Y Label # X Y Label # X Y Label
#1 1 0 A #1 1 0 A #6 0 1 A
#2 1 0 A #2 1 0 A #7 0 0 B
#3 1 1 A #3 1 1 A #10 0 1 B D
#4 1 1 A #4 1 1 A
#5 1 0 A #5 1 0 A entropy (X=0) = 0 .92
#6 0 1 A #8 1 0 B
#7 0 0 B #9 1 0 B
#8 1 0 B
#9 1 0 B entropy (X=1) = 0.86
#10 0 1 B

entropy (training set ) = 0.97

R1: IF X=0 AND Y=0 THEN B


Non-Leaf Node #1
class A: 6 instances [support = 0.1, confidence = 1.0]
class B: 4 instances
R2: IF X=0 AND Y=1 THEN A OR B

X=1 [support = 0.2, confidence = 0.5]


X=0
R3: IF X=1 AND Y=0 THEN A
Non-Leaf Node #2 Non-Leaf Node #3 [support = 0.3, confidence = 0.6]
class A: 2 instances class A: 5 instances
class B: 2 instances class B: 2 instances R4: IF X=1 AND Y=1 THEN A
[support = 0.2, confidence = 1.0]
Y=0 Y=1 Y=0 Y=1
See rule R1

Leaf Node #1 Leaf Node #3


class A: 0 instances class A: 2 instances See rule R4
class B: 1 instance class B: 0 instances

Leaf Node #2 Leaf Node #4


See rule R2 class A: 1 instance class A: 3 instances See rule R3
class B: 1 instance class B: 2 instances

Decision Tree, Fig. 2 Illustration of decision-tree learning based on a training set with ten instances and two attributes. Top:
Training set and partition determined for root node. Bottom: resulting final decision tree (left) and decision rules (right)

Decision-tree learning generalizes from observa- and a column represents either an attribute and its
tions by a process of induction. This process takes as values or the class labels.
input a set of specific observations or instances and Using the training instances as input, a decision-tree
generates general rules that cover them. To facilitate learning algorithm generates a decision-tree model by
automated decision-tree learning, the learning obser- recursively partitioning the instances via the following
vations or instances (referred to as training set main steps:
[▶ Model Training, Machine Learning]) are usually 1. Initialize. Provide the full set of observations (training
expressed as a table where a row represents an instance set) as input to the decision-tree learning algorithm.
D 554 Decision Tree

2. Analyze attributes: Determine the attribute that pro- applied to the remaining attributes (in this case only
duces the most uniform grouping or partitioning of attribute Y) at the next level. This leads to the deci-
instances based on the class label of the instances. sion-tree model and decision rule (▶ Prediction Rule)
3. Create node: If the predefined uniformity threshold depicted in Fig. 2. The model consists of three non-leaf
is reached, create a leaf node and label it with the nodes (including the root node) and four leaf nodes.
corresponding class label. Otherwise, create a non- Whereas three leaf nodes are unambiguously associated
leaf node that tests the attribute, assign to it the with the class label A and B, respectively, Leaf Node #2
instances of the corresponding partition, then go to cannot be assigned to a unique class label.
Step 2. The class label frequencies represented by the leaf
The eventual result of the decision-tree learning nodes are used to express likelihood estimates for the
procedure outlined above is a decision-tree model in classification of unseen instances. Unseen instances
which all or the majority of instances at a leaf node are those that comply with the structure of the learning
show the same class label. instances but have not been used in the decision learn-
A challenge in decision-tree learning is to maximize ing process. To illustrate this idea, consider the deci-
uniformity or purity of the instances assigned to the sion tree model in Fig. 2. An unseen instance that is
leaf nodes of a decision-tree model while ensuring that characterized by the attribute values X ¼ 1 and Y ¼ 0
the model generalizes well to unseen instances, i.e., would be labeled (classified) with the class label A.
instances that do not feature in the training set. The Because three out of five instances in the corresponding
latter is also referred to as generalization ability, or Leaf Node #4 carry the class label A, the conditional
bias-variance trade-off. probability for this classification would be given as 3/5.
Measures of uniformity, purity, or homogeneity The conditional probability associated with the predicted
commonly employed by classification tree learning outcome of decision tree and similar models is also
algorithms include the information theory measures referred to as confidence. Another measure that is fre-
of entropy (▶ Entropy), the Gini index, and the quently used in the context of decision trees is called
Kolmogorov–Smirnov distance. Regression-tree support. The support of a decision rule (corresponding to
learning algorithms make use of variance minimiza- a path from root to a leaf node in a decision tree model) is
tion techniques for the same purpose. determined as the proportion of instances in the learning
The following example illustrates the concept of set that satisfy the rule. For example, the decision rule R3
decision tree learning for a binary classification task corresponding to Leaf Node #4 in the decision tree
(▶ Model Testing, Machine Learning), i.e., a task model in Fig. 2 has a support of 3/10 because 3 of 10
involving exactly two class labels. The training set items (#1, #2, and #5) satisfy the rule. The aim in
(▶ Model Training, Machine Learning) in this example decision tree learning is to construct a decision tree
comprises ten instances, each described by two numeric model with a high confidence and support.
attribute values and one class label. The attributes are
called X and Y and assume values from the set {0,1}, Strengths and Weaknesses of Decision Trees
and the class labels are drawn from the set {A,B}. Six Strengths
instances in the training set carry the class label A, and • Decision-tree models capture knowledge in an
four the label B. Figure 2 depicts the training set as well easy-to-interpret knowledge structure, either as
as a partition (composed of two groups of instances) hierarchical trees or sets of decision rules.
obtained from applying the information-theoretic • Decision-tree learning offers a powerful approach
entropy uniformity measure. According to this measure, to automatically discover decision-tree models
higher uniformity corresponds to more information, from large data sets. Decision-tree learning can be
which is corresponds to lower entropy. used to reveal important relationships in data.
Because the ▶ information gain for the partition • The computational costs associated with decision-
derived from the values of attribute X is higher than tree learning are low to moderate.
that for attribute Y, attribute X is chosen to split the data • Decision trees can process both discrete and con-
at the root node. This is illustrated by the decision tree tinuous variables and have an intrinsic ability to
depicted in Fig. 2. The same learning process is handle missing values.
Declarative Language 555 D
Weaknesses
• Decision-tree learning is highly sensitive to Declarative Language
changes in the training set. Small changes in the
learning set may lead to considerably different Amanda Clare1 and Martin Swain2
1
decision-tree models. This is also referred to as the Department of Computer Science, Aberystwyth
stability problem in machine learning. University, Aberystwyth, UK
2
• For a given data set or decision problem, the Institute of Biological, Environmental, and Rural
orthogonal decision regions generated by axis- Sciences, Aberystwyth University, Aberystwyth,
parallel decision-tree learning methods may not Ceredigion, UK D
sufficiently approximate the true underlying deci-
sion regions.
Synonyms
Decision-Tree Algorithms and Tools
There is a large variety of decision-tree algorithms and Declarative programming
tools; open source tools widely used in computational
biology include Weka (Weka, Machine Learning
Tool) and R (▶ R, Programming Language).
Definition

A declarative programming language is a program-


Cross-References
ming language where the programmer specifies the
goal or what should be achieved, rather than how
▶ Directed Acyclic Graph
a goal should be achieved. Logic programming
▶ Entropy
languages, such as Prolog, are declarative languages,
▶ Induction
as are many database query languages. They are often
▶ Information Gain
contrasted with imperative languages such as C/C++,
▶ Model Training, Machine Learning
Java, FORTRAN, Python, and Perl. Imperative lan-
▶ Model Validation, Machine Learning
guages are concerned with procedures: the program-
▶ Overfitting
mer specifies a series of instructions or statements
▶ Prediction Rule
that must be executed one after another in a specific
▶ R, Programming Language
order to achieve the output or goal of the program.
▶ Regression Analysis
Declarative languages are concerned with the rela-
tions between statements. In declarative languages
based on logic such as Prolog, computations are
References based on the evaluation of logical statements using
mechanisms such as ▶ deduction, ▶ induction, or
Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classifi- ▶ abduction: there is an important separation
cation and regression trees. Chapman and Hall/CRC,
between the logical statements and the mechanisms
Boca Raton
Duda RO, Hart PE, Stork DG (2001) Pattern classification, of reasoning with those statements.
2nd edn. Wiley-Interscience, New York
Hastie T, Tibshirani R, Friedman J (2001) The elements of
statistical learning, Springer Series in Statistics. Springer,
New York
Kingsford C, Salzberg S (2008) What are decision trees? Nat Cross-References
Biotech 26(9):1011–1013
Quinlan JR (1993) C4.5: programs for machine learning. ▶ Abduction
Morgan Kaufmann, San Mateo
▶ Deduction
Zhang H, Yu CY, Singer B, Xiong M (2001) Recursive
partitioning for tumor classification with gene expression ▶ Induction
microarray data. Proc Natl Acad Sci USA 98(12):6730–6735 ▶ PROLOG
D 556 Declarative Programming

example in proof, the two deduction rules that if


Declarative Programming a model satisfies a conjunction, then it also satisfies
each of the conjuncts,
▶ Declarative Language
f^j
f
Decrease j
▶ Reduction and if a model satisfies a disjunction, then it also
satisfies one of the disjuncts,

Deduction f _ j
f j j
C. Maria Keet
KRDB Research Centre, Free University of In addition, there are two rules for the quantified for-
Bozen-Bolzano, Bolzano, Italy mulas. First, if a model satisfies a universally quanti-
fied formula (8), then it also satisfies the formula where
the quantified variable has been substituted with some
Definition term (and the prescription is to use all the terms which
appear in the tableaux),
Deduction is a way to ascertain if some theory T, which
consists of one or more axioms expressed in a suitable 8x:f
logic language, entails a conclusion, which is an axiom ffX=tg
a that is not explicitly asserted in T; this is written as 8x:f
T j¼ a. That is, a can be derived from the premises
using a set of deduction rules. and, second, for an existentially quantified formula, if
a model satisfies it, then it also satisfies the formula
Usage where the quantified variable has been substituted with
Deduction is used widely in ▶ knowledge representa- a new Skolem constant,
tion and the semantic web, including checking consis-
tency of ▶ bio-Ontologies, using automated reasoners 9x:f
(▶ Automated Reasoning). ffX=ag

Characteristics Example
Let us take some arbitrary theory T that contains two
There are various ways how to ascertain T j¼ a, be it axioms stating that relation R is reflexive (8x.R(x,x),
manually or automatically. One can construct a step- a thing relates to itself) and asymmetric (8x,y. R(x,y)
by-step proof (▶ Proof, Logic) “forward” from the ! :R(y,x); if a thing a relates to b by relation R, then b
premises by applying the deduction rules or prove it does not relate back to a). We then can deduce, among
indirectly such that T [ {:a} must lead to others, that T [ {:8x,y.R(x,y)} is satisfiable. We do
a contradiction. The former approach is called natural this by demonstrating that the negation of the axiom is
deduction, whereas the latter is based on techniques unsatisfiable.
such as resolution, matrix connection methods, and To enter the tableau, we first rewrite the asymmetry
sequent deduction (which includes tableaux). into a disjunction using equivalences, that is, 8x,y.
Concerning deduction rules for tableaux and first R(x,y) ! :R(y,x) is equivalent to 8x,y. :R(x,y) ∨ :R
order predicate logic formulae, we have, as with the (y,x), and add a negation to {:8x,y.R(x,y)}, which
Deductive Reasoning 557 D
Deduction, Table 1 Tableau example
Number Tableau Explanation
1 "x.R(x,x) Reflexivity axiom in the original theory T
2 "x,y. ¬R(x,y) Ú ¬R(y,x) Asymmetry axiom in the original theory T
3 "x,y.R(x,y) The negated axiom added to theory T
4 Substitute x for term a in 1,2,3
5 R(a,a)
6 "y. ¬R(a,y) Ú ¬R(y,a)
7 "y.R(a,y) D
8 Substitute y for term a in 2 and 3
9 R(a,a)
10 ¬R(a,a) Ú ¬R(a,a)
11 R(a,a)
12 Split the disjunction of 10
13 ¬R(a,a) ¬R(a,a) Which each generate a clash with 9 and 11, hence, :8x,y.R(x,y) is entailed by T

thus becomes 8x,y.R(x,y). Then, to start the tableau, we Characteristics


have three axioms (Table 1).
Deductive reasoning allows one to infer statements
Cross-References that are logical consequences of a set of given
statements (Genesereth and Nilsson 1987). For
▶ Automated Reasoning instance, given a theory stating that all swans
are white (8x:swanðxÞ ! whiteðxÞ) and that Odette
References is a swan (swan(Odette)), it can be deduced that
Odette is white (white(Odette)). The example uses
Hedman S (2004) A first course in logic – an introduction to one of the general inference rules of first order
model theory, proof theory, computability, and complexity. logic, which can be written as 8x:pðxÞ!qðxÞ
qðaÞ
and pðaÞ
.
Oxford University Press, Oxford, UK
Portoraro F (2010) Automated reasoning. In: Zalta E (ed) The top part denotes the statements present in the
Stanford encyclopedia of philosophy. Stable URL, given theory (read as “whenever a property p
http://plato.stanford.edu/cgi-bin/encyclopedia/archinfo.cgi? holds for some object x, another property q also
entry=reasoning-automated holds for x” and “p holds for the given object a”),
and the bottom part the deduced statement (“q holds
for a”).
Deductive Reasoning In contrast to ▶ induction, deduction does not
hypothesize new knowledge, but makes information
Angelika Kimmig implicitly contained in a logical theory explicit.
Departement Computerwetenschappen, Katholieke It thus is truth-preserving: whenever the premises
Universiteit Leuven, Heverlee, Belgium of deductive inference hold, the conclusions must
hold as well. Deduction either generates additional
Definition statements that follow logically from the theory,
or verifies a given statement by constructing
Deduction is the process of inferring whether a given a proof (▶ Proof, Logic). Deductive reasoning is the
statement is entailed, that is, whether it logically fol- basis for theorem proving and logic programming
lows from a theory. (Flach 1994).
D 558 Deductive-nomological (DN) Analysis

Cross-References References

▶ Induction Carnap R (1966) Philosophical foundations of physics. Basic


Books, New York
▶ PROLOG
Hempel CG, Paul O (1948/1965) Studies in the logic of expla-
▶ Proof, Logic nation. In: Hempel CG (ed) Aspects of scientific explanation.
Free Press, New York, pp 245–295

References

Flach P (1994) Simply logical – intelligent reasoning by Degree Centrality


example. Wiley, The Netherlands
Genesereth M, Nilsson N (1987) Logical foundations of artificial
intelligence. Morgan Kaufmann, San Francisco Deepak Sharma1 and Avadhesha Surolia2
1
Translational Health Science and Technology
Institute, Gurgaon, India
2
Molecular Biophysics Unit, Indian Institute of
Science, Bangalore, India
Deductive-nomological (DN) Analysis

Max Kistler Definition


IHPST, Université Paris 1 Panthéon-Sorbonne,
Paris, France Degree centrality is defined as the number of links
incident upon a node (i.e., the number of ties that a
node has). If the network is directed (meaning that ties
Definition have direction), then two separate measures of degree
centrality are defined, namely, indegree and outdegree.
The deductive-nomological (DN) analysis, explicitly Indegree is a count of the number of ties directed to
formulated by Hempel and Oppenheim (1948/1965), the node (head endpoints) and outdegree is the number
has had enormous influence as an account of of ties that the node directs to others (tail endpoints).
both causation and scientific explanation, in particu- In such cases, the degree is the sum of indegree and
lar in the tradition of logical empiricism. Instead outdegree.
of considering that explanation by laws replaces
causal explanation, the DN analysis suggests that
causation can be analyzed in terms of lawful, or Cross-References
“nomological,” explanation. Carnap has given
a classical statement of this view: “What is meant ▶ Pathway Targeting, Antimycobacterial Drug Design
when it is said that event B is caused by event A?
It is that there are certain laws in nature from
which event B can be logically deduced when they
References
are combined with the full description of event A.”
(Carnap 1966, p. 194). Bang-Jensen J, Gutin G (2007) Digraphs: theory, algorithms and
applications, 1st edn. Springer, London
Diestel R (2010) Graph theory, 4th edn. Springer, Heidelberg
Freeman LC (1978/1979) Centrality in social networks: conceptual
Cross-References clarification. Soc Netw 1:215–239
Sabidussi G (1966) The centrality index of a graph. Psychometrika
▶ Causality 31:581–603
Dense Overlapping Regulon Motif 559 D
breastfeeding, and needle sharing during intravenous
Delocalized drug abuse. Worldwide an estimated 10–20 million
people are infected with HLTV-1, the etiological
▶ Function, Distributed agent of adult T-cell leukemia/lymphoma (ATL), and
HTLV-1 associated myelopathy/tropical spastic
paraparesis (HAM/TSP). Endemic areas include
Southwest Japan, Caribbean islands, Central Africa,
Deltaretroviridae South America, and Melanesia. HTLV-2 infections
are endemic in Central Africa and native populations D
Christian Sch€onbach of North, Central, and South America and associated
Department of Bioscience and Bioinformatics, Kyushu with HAM/TSP-like illness. HTLV-3 and HTLV-4 are
Institute of Technology, Iizuka, Fukuoka, Japan largely uncharacterized in terms of pathogenicity and
epidemiology (Wolfe et al. 2005).

Synonyms
Cross-References
Deltaretrovirus
▶ HTLV, Cellular Transcription

Definition
References
Retroviridae is a family of retroviruses whose replica-
tion involves one reverse transcription step. Older pub- Index of Viruses – Retroviridae (2006) In: B€uchen-Osmond
C (ed) ICTVdB – The Universal Virus Database. Columbia
lications often refer to three retrovirus subfamilies
University, New York (Version 4)
Oncovirinae, Lentivirinae, and Spumavirina. Nowa- Van Dooren S, Salemi M, Vandamme AM (2001) Dating
days, the classification is obsolete and retrovirus is the origin of the African human T-cell lymphotropic
grouped into two subfamilies Orthoretrovirinae and virus type-i (HTLV-I) subtypes. Mol Biol Evol 18(4):661–
671
Spumaretrovirinae. The genera Betaretrovirus, Wolfe ND, Heneine W, Carr JK, Garcia AD, Shanmugam V,
Gammaretrovirus, Alpharetrovirus, Deltaretrovirus, Tamoufe U, Torimiro JN, Prosser AT, Lebreton M, Mpoudi-
and Lentivirus belong to the subfamily of Ngole E, McCutchan FE, Birx DL, Folks TM, Burke DS,
Orthoretrovirinae (Index of Viruses – Retroviridae Switzer WM (2005) Emergence of unique primate
T-lymphotropic viruses among central African bushmeat
2006). Deltaretroviridae are complex retroviruses,
hunters. Proc Natl Acad Sci USA 102(22):7994–7999
whose genomes contains besides LTR-gag-pol-env-
LTR a number of accessory genes. Deltaretrovirus
infects vertebrates. Bovine leukemia virus (BLV),
Human T-lymphotropic virus 1, 2, 3, and 4 (HTLV-1,
-2, -3, and 4) and Simian T-lymphotropic virus 1, 2, Deltaretrovirus
and 3 (STLV-1, -3, -4) were isolated from cow, human,
and various nonhuman primates. HTLV-1 and its sub- ▶ Deltaretroviridae
types are thought to originate from various indepen-
dent interspecies transmissions from simians to
humans starting around 50,000 years ago. The most
recent HTLV-1 subtype f probably emerged some
3,000 years ago (Van Dooren et al. 2001). Dense Overlapping Regulon Motif
HTLV-1 and HTLV-2 are human-pathogenic
viruses that are transmitted by sexual contact, ▶ Dense Overlapping Regulons
D 560 Dense Overlapping Regulons

Dense Overlapping Regulons Design Explanation

Guangxu Jin ▶ Explanation, Functional


Systems Medicine and Bioengineering,
Bioengineering and Bioinformatics Program,
The Methodist Hospital Research Institute,
Weill Medical College, Cornell University, Houston,
TX, USA Design of Experiments

Alexandros Kiparissidis, E.N. Pistikopoulos and


Synonyms Athanasios Mantalaris
Biological Systems Engineering Laboratory,
Dense overlapping regulon motif; DOR Department of Chemical Engineering, Centre for
Process Systems Engineering, Imperial College,
London, UK
Definition

Dense overlapping regulons (DORs) were found in the Synonyms


Escherichia coli transcriptional regulation network.
A set of operons, Z1, . . ., Zm, are each regulated by DOE; Optimal experiment design
a combination of a set of input transcription factors,
X1, . . ., Xm.
Definition

References The aim is to design experiments in order to maxi-


mize the information content of the measurements in
Shen-Orr SS, Milo R, Mangan S, Alon U. Network motifs in the the context of their utilization for estimating the
transcriptional regulation network of Escherichia coli. Nat
model parameters. This is equivalent to minimizing
Genet. 2002;31:64–8
the variances of the parameters to be estimated. The
variances are a measure for the uncertainty of the
parameters, also represented by individual confidence
interval approximations. Design of experiments
Density of a Subgraph (DoE) aims at minimizing the variances of the param-
eters to be estimated. Experiment design for parame-
▶ Clustering Coefficient ter precision aims at determining optimal
experimental settings and measurement times in
order to maximize the information content from the
measured data generated by these experiments. This
Deoxyribonucleic Acid Sequencing is equivalent to minimizing the confidence ellipsoid
of the parameters to be estimated.
▶ DNA Sequencing

Characteristics

Depletion Rate DoE aims to address the following questions:


• What should be the initial conditions for the
▶ Life Span, Turnover, Residence Time experiment?
▶ Lymphocyte Population Kinetics • How long should we run the experiment?
Design of Experiments 561 D
• How should we vary the controls (e.g., the time A-optimality
profiles of feed flowrates)?
• When should we take the measurement samples?
The overall aim is to generate the maximum amount
of information for a subsequent estimation of the
model parameters, while trying to maintain the process
within the required operating envelope. In mathemat- E-optimality
ical terms, we want to minimize some measure c of the θ2
variance-covariance matrix, ny, of the parameters (y) to D
be estimated:

minx CðV # Þ: (1)


D-optimality
The variance-covariance matrix is given by:
θ1
 1
V # ¼ H# ; (2) Design of Experiments, Fig. 1 The different design criteria
for a two-dimensional confidence ellipsoid (Adapted from PSE,
Ltd)
where H# is the information matrix which is a ny x ny
matrix (ny is the number of parameters (y) to be esti-
mated) and is given by:
This minimizes the sum of the variances of the
0  1
individual parameter estimates. It corresponds to
@ @
X @ym zil ðriml Þ @yn zil ðriml Þ
N exp X Nm;il
X
H# ¼ @ A: minimizing the dimensions of the smallest hyper rect-
l¼1 i2SV m¼1
s2il zil ðriml ; bil Þ angle within which the confidence ellipsoid can be
inscribed.
m; n ¼ 1; :::; ny
D-optimality: minimize the determinant of the var-
(3) iance-covariance matrix:

where x is the set of experiment decision variables in 1


all experiments, NEXP is the number of experiments, CD ðV # Þ ¼ det ðV # ÞNy : (5)
SVi is the set of measured state variables in experiment
l, Nil the number of sampling points for measured This is also known as the minimum volume crite-
variable i in experiment l, riml the m-th measurement rion since it minimizes the volume of the confidence
time for variable i in experiment l, zil(riml) the model- ellipsoid.
predicted value of variable i at time point riml in E-optimality: minimize the largest Eigen value of
experiment l and s2il zil ðriml ; bil Þ The variance of the the variance-covariance matrix:
measurement error of variable i at time point riml in
experiment l.
CE ðV # Þ ¼ lMAX ðV # Þ: (6)
In order to compare the magnitude of different
variance-covariance matrices, various real-valued
functions have been suggested as a measure of opti- The Eigen values of the variance-covariance matrix
mality. The three most commonly used criteria are: correspond to the lengths of the minor and major axes
A-optimality: minimize the trace of the variance- of the confidence ellipsoid. By minimizing the largest
covariance matrix: Eigen value, the design renders the confidence ellip-
soid as spherical as possible.
1 X Ny Figure 1 shows a graphical interpretation of the
C A ðV # Þ ¼ ðV # Þm;m : (4) different design criteria for a two-dimensional confi-
N y m¼1
dence ellipsoid.
D 562 Designing Experiments for Sound Statistical Inference

Cross-References ▶ bias, i.e., it avoids systematic errors in the conclu-


sions, and ensures that the experiment is efficient,
▶ Designing Experiments for Sound Statistical (▶ Efficiency) i.e., it minimizes the uncertainty in the
Inference conclusions for a given amount of cost.
▶ Global Sensitivity Analysis
▶ Model-based Experimental Design, Global
Sensitivity Analysis Characteristics
▶ Optimal Experiment Design, Ill-Posed Problems
As an example, consider an experiment which com-
pares the expression of genes in subjects with type II
References diabetes to healthy controls using a whole
transcriptome shotgun sequencing (RNA-seq) (Wang
Process Systems Enterprise (1997–2011) gPROMS. www. et al. 2009) technology. Statistical experimental design
psenterprise.com/gproms
is characterized by the following steps.

Step I: Define the Problem


Designing Experiments for Sound Define Who or What Is Being Studied
Statistical Inference The first step in experiment planning is to clearly
define the populations and the conditions to be
Melissa Key1 and Olga Vitek2 represented by the subjects in the study. In systems
1
Center for Computational Diagnostics, Indianapolis, biology, it may be practical to initially focus the study
IN, USA on smaller populations. In the example of type II diabe-
2
Department of Statistics, Department of tes, if literature suggests that the effect of the disease is
Computer Science, Purdue University, West Lafayette, different in men and women, the experiment may limit
IN, USA the scope of the study to a single gender initially, then
plan a follow-up experiment to verify that the conclu-
sions apply to the other gender. Subject availability may
Synonyms impose additional constraints. For example, if the dia-
betes study can only access subject samples from
Design of experiments; Experimental planning; a single health-care institution, the population is limited
Statistical experimental design to that institution only, and a larger follow-up experi-
ment is necessary to broaden the conclusions.

Definition Define What Is Being Measured


It is also important to define the ▶ response, i.e., a
An ▶ experimental design is a protocol that defines all measurable aspect of the biological samples for which
aspects of a planned experiment. This includes the variation between conditions or treatments is of
a definition of the populations of biological organisms interest. In systems biology, many responses are quanti-
and the conditions of interest, procedures for selecting tative measurements at the molecular level, such as gene
the individuals from the populations or allocating them expression. It is common to simultaneously measure
to treatments, and the organization of experimental a large number of responses. For example, the diabetes
material from the selected individuals in the experi- experiment simultaneously quantifies the expression of
mental framework. Statistical experimental design tens of thousands of genes on each biological sample.
(Kreutz and Timmer 2009; Montgomery 2001; The definition of the response can be nontrivial. In the
Oehlert 2000) is conducted in conjunction with example RNA-seq experiment, gene expression can be
statistical inference, i.e., in situations where measure- quantified separately for each isoform, or as the sum of
ments on the selected individuals are used to make the expression of all of its isoforms, and the latter can be
biologically relevant conclusions regarding the calculated over the unions or the intersections of the
unknowns. Statistical experimental design avoids exons (Garber et al. 2011). The definitions can have
Designing Experiments for Sound Statistical Inference 563 D
important implications for interpreting the data and for The two sources of variation lead to two possible
the biological conclusions. replication types. Biological replicates are multiple
experimental units with the same condition or treat-
Translate the Research Question into a Statistical ment. For example, in the diabetes experiment, biolog-
Hypothesis ical replicates are multiple subjects from a disease
Many systems biology experiments aim at finding rel- group. Technical replicates are multiple measurements
ative changes in the response between conditions. In on the same experimental units. They address the pre-
the diabetes example, of interest are (log)-fold changes cision of the measurement protocol but not the biolog-
in the transcription rate between the populations of ical variation. Even sensitive technologies such as D
healthy subjects and the subjects with the disease, RNA-seq do not eliminate the presence of biological
separately for each gene. In statistical terminology, variation. Therefore, experiments that focus on the
this translates into testing the null hypothesis of “no populations always require biological replicates as
change” against the alternative hypothesis, e.g., log- part of the design.
fold change different from zero. Not all nonzero values
of (log)-fold change are of practical importance, and Randomization
the range of biologically relevant values needs to be Randomization guards the experiment against ▶ bias.
specified in advance. Bias occurs when the experimental units with different
conditions are selected or handled in systematically
Step II: Sample Selection and Resource Allocation different ways, not intended by the purpose of the
Define the Experimental Unit study. The two sources of variation (biological and
▶ Experimental units are the basic currency of sample technical variation) lead to two sources of bias. The
selection and resource allocation. An experimental unit first occurs when subjects selected from the groups
is the subject, sample, or object that carries the condition differ in known or unknown biological characteristics
or the treatment. For example, in the diabetes experi- (such as age, gender, or ethnicity) that affect the
ment, the experimental unit is a person (i.e., a patient). If response. The second occurs due to systematic differ-
multiple treatments are assigned to the same subject in ences in the technical aspects of the experiment
different areas, (e.g., a different topical treatment is between conditions, such as in protocols of specimen
applied to different patches of the skin), then the exper- collection or time of data acquisition.
imental unit is an area (e.g., skin patch). If a treatment is When these sources are unaccounted for, they
assigned to pools of biological material from multiple become confounding factors, i.e., they affect the
subjects, then the experimental unit is a pool. response in addition to the condition or treatment,
and bias the results. Confounding cannot be removed
Replication by increasing the sample size or by demonstrating the
Quantities such as transcript abundance vary naturally reproducibility of the results in a repeated instance of
in the populations and introduce biological variation in the same workflow. Instead, confounding can be
the response. In addition, sample processing and han- removed by randomization. A randomization of the
dling can interfere with its composition, and measure- experimental units (e.g., a random selection of samples
ment technologies can be somewhat imprecise, from the underlying population) and the randomization
introducing technical variation. As the result, the of the order of sample processing and data acquisition
observed difference in the response can be either the distribute the confounding factors roughly equally
systematic effect of the condition or treatment, or across groups and eliminate the bias.
a random artifact of these sources of variation. The
goals of replication are to (1) evaluate the extent of Blocking
biological and technical variation, (2) assess whether A completely randomized design has two drawbacks.
the observed change in the response is likely to arise First, although randomization averages the allocation
from this variation by random chance, and (3) increase of confounding factors between conditions, it can yield
the precision of the conclusions, since increasing the unequal allocations in experiments with a small num-
number of replicates increases our confidence that the ber of replicates. Second, in randomized experiments,
difference is indeed systematic. the variability of the response within each group is the
D 564 Designing Experiments for Sound Statistical Inference

combination of the biological and technical variation, to a statistician once the data are collected. In practice,
and it may be difficult to detect the systematic differ- experimental design and statistical modeling are
ences between conditions. tightly interconnected. The understanding of the down-
Blocking improves upon these two aspects when stream statistical analysis helps us determine the opti-
confounding factors are known in advance, and is mal design and maximize our ability to detect
performed by imposing restrictions on randomization. quantitative changes in the response for a given
The two sources of variation (biological and technical) amount of replication and cost.
lead to two types of blocking. In the context of sample Statistical inference requires a probability model
selection, blocking is sometimes referred to as that describes three aspects of the experiment. First,
matching. In the diabetes example, a block is the model describes the sources of systematic varia-
a combination of known confounding factors tion in the response (such as conditions, subjects, and
(e.g., age, gender, ethnicity). The design enforces an blocks) in the experimental design. Second, the
equal number of subjects with diabetes and controls in model describes the scope of interpretation that we
a block, and randomly selects subjects with these char- associate with the biological replicates, i.e., whether
acteristics from the population. When multiple mea- we restrict our conclusions to the subjects in the
surements are possible on a same subject, e.g., two study or generalize the conclusions to the entire
treatments can be applied to two skin patches, the underlying populations. In systems biology, many
subject forms a block and receives both treatments, experiments aim at generating initial hypotheses for
and the pairs of the patches are matched. a subsequent follow-up, and the number of biological
For data acquisition, blocking is sometimes referred replicates is too small to adequately represent the
to as multiplexing. If a particular experimental step is underlying populations. In these cases, it is useful to
noisy but can accommodate multiple samples, it can be restrict the scope of the conclusions to the selected
viewed as a block. Blocking enforces the constraint that samples only and treat them as fixed units. On the
the step processed an equal number of samples from other hand, experiments at the validation stage aim at
each condition. An example of block is a two-color generalizing the conclusions to the underlying
cDNA mocroarray, and a variety of strategies for allo- populations, and in this case, the biological replicates
cating samples to arrays have been proposed (Dobbin are best viewed as random selections from the
and Simon 2002). In an RNA-seq experiment, blocking populations and are represented in a mixed or
can utilize the capacity to label the samples with sam- multilevel model.
ple-specific sequences (“bar codes”) that allow multiple Finally, the probability model describes the nature
samples to be sequenced in a same lane of a flow-cell, of the nonsystematic variation in the response, based
making within-lane differences more consistent (Auer on the information from a pilot study or from the
and Doerge 2010). Blocking strategies for mass spec- literature. For example, in gene expression
trometry-based proteomic experiments have also been microarrays, the variation of log-response can be
discussed (Oberg and Vitek 2009). A randomization of assumed normally distributed and leads to models
samples within blocks is always required to account for such as analysis of variance (ANOVA) (▶ Analysis
the unknown confounding effects. of Variance (ANOVA) Tables). In the RNA-seq exam-
Blocking enforces a strict balance of the ple, the response is the count of reads and can be
confounding factors between conditions and prevents described using a Poisson or a negative binomial
bias. In addition, if the downstream statistical analysis distribution. Frequentist modeling treats the unknown
distinguishes the variation of the response between the parameters of these distributions as fixed, while
blocks from the within-block variation between the Bayesian models specify the prior distributions of
conditions, it increases the sensitivity of tests. all the parameters, and Empirical Bayes models
estimate the parameters of the prior distributions
Step III: Statistical Modeling from the data.
Describe the Anticipated Properties of Data in
a Statistical Model Describe the Model-Based Testing
Unfortunately experimentalists often consider statisti- A consequence of each probability model is the proce-
cal modeling as a separate task, which can be deferred dure for testing the scientific hypothesis of interest.
Designing Experiments for Sound Statistical Inference 565 D
For example, in analysis of variance, testing the null changes in the response between conditions requires
hypothesis of the same expected gene expression in more replication. Experiments with larger variation
two conditions leads to the student’s t-test. Count also require more replication. An increase in the num-
response leads to the Fisher’s exact test or tests based ber of biological replicates is always more efficient
on Poisson regression. In experiments with multiple than an increase in the number of technical replicates,
responses, such as the expression of genes, a separate and experiments without biological replication cannot
hypothesis is tested for each gene and the testing generalize their conclusions to the underlying
requires controlling a multivariate error rate, such as populations. Experiments with more responses, and
the false discovery rate. also experiments with a smaller proportion of expected D
changes in multiple responses, have more opportunity
Derive Model-Based Requirements of Sample Size for false discoveries and therefore require more
The combination of the protocol of replication, ran- replication.
domization, and blocking and of the probability model The implementation of the steps of the experimental
allows us to calculate the sample size (▶ Power and designs described in this entry depends substantially
Sample Size) (Lenth 2001; Wittes 2002), i.e., the desir- on the characteristics of the biological system and on
able number of biological and technical replicates, the technology at hand. The rapid technological
with two goals. First, sample size calculations are advances introduce constant changes into how
used to evaluate the expected operational characteris- the experiments are performed and into the properties
tics of the experiment. The number of replicates should of the resulting datasets. At the same time, they moti-
be large enough to enable the detection of biologically vate new developments in statistical methodology.
significant changes in the response, but not too large in As the result, experimental planning becomes increas-
order to optimize the cost. Second, sample size calcu- ingly complex and cannot always be achieved in a fully
lations allow us to compare various strategies of sam- automated fashion. A close collaboration between
ple selection and resource allocation, e.g., various experimentalists and statisticians at all steps of the
strategies of allocation samples to blocks. The optimal experiment, starting from the earliest stages of exper-
strategy will maximize our ability to detect a change in iment planning, is highly recommended.
the response for a fixed number of biological and
technical replicates.
In frequentist modeling, each test is characterized Cross-References
by five properties: (1) the probability of type I error
(i.e., the probability of rejecting the null hypothesis ▶ Analysis of Variance
when it is true), (2) the probability of type II error ▶ Design of Experiments
(i.e., the probability of not rejecting the null hypothesis ▶ Experimental Design, Variability
when the alternative is true), (3) the extent of known ▶ False Discovery Rate (FDR)
sources of variation, (4) the biologically significant ▶ Fisher’s Test
change in the response, and (5) the number of biolog- ▶ Frequentist Inference
ical and technical replicates in each condition. ▶ Hypothesis Testing
A modification is introduced for experiments with ▶ Hypothesis Testing, Bayesian vs Frequentist
multiple responses, such as RNA-seq. The probability ▶ Hypothesis Testing, Parametric vs Nonparametric
of type I error in (1) is replaced with the false discovery ▶ Mixed and Multi-Level Models
rate, and an additional quantity (6), the expected pro- ▶ Multiple Hypothesis Testing
portion of the true null hypotheses in all tests, is spec- ▶ Poisson Regression
ified. The quantities (l)–(6) are interconnected, and ▶ Power and Sample Size
a prior specification of all of them but one allows us ▶ Quantitative Experiment Design
to solve for the remainder. In particular, the specifica- ▶ RNA-Seq
tion of (l)–(4) and (6) allows us to solve for the sample ▶ Sample Variability, Inter-groups
size. ▶ Sample Variability, Intra-groups
Several general conclusions can be made from the ▶ Statistical Methods in Systems Biology
calculations of sample size. Detection of smaller ▶ Student’s t-Test
D 566 Deuterated Glucose (2H-Glucose)

References

Auer PL, Doerge RW (2010) Statistical design and analysis of


RNA sequencing data. Genetics 185(2):405–416
Dobbin K, Simon RM (2002) Comparison of microarray designs
for class comparison and class discovery. Bioinformatics
18(11):1438–1445
Garber M, Grabherr MG, Guttman M, Trapnell C (2011) Com-
putational methods for transcriptome annotation and quanti-
fication using RNA-seq. Nat Methods 8(6):469–477
Kreutz C, Timmer J (2009) Systems biology: experimental Developmental Biology, Classical Sea Urchin Experi-
design. FEBS J 276:923–942 From Melissa ments, Fig. 1 Spemann Mangold experiment – When a small
Lenth RV (2001) Some practical guidelines for effective sample region of the embryo, the dorsal lip, is grafted to the opposite
size determination. The Am Statist 55(3):187–193 (ventral) side of a host gastrula embryo (on the left), the resulting
Montgomery DC (2001) Design and analysis of experiments, Xenopus laevis tadpole develops a Siamese twin 3 days later
5th edn. Wiley, New York (right)
Oberg AL, Vitek O (2009) Statistical design of quantitative mass
spectrometry-based proteomic experiments. J Proteome Res
8(5):2144–2156 Exp design
Oehlert GW (2000) A first course in design and analysis of
experiments, 1st edn. W. H. Freeman, New York the framework set by the Entwicklungsmechanik (e.g.,
Wang Z, Gerstein MB, Snyder M (2009) RNA-seq: Roux, His, etc.) in the 1890s: perturbing at various stages
a revolutionary tool for transcriptomics. Nat Rev 10:57–63 the development in order to unravel the mechanisms at
Wittes J (2002) Sample size calculations for randomized con-
each stage of development. Sea urchins were easy to
trolled trials. Epidemiol Rev 24:39–53, From Stephane
handle model organisms, even if until Hans Spemann
the embryological techniques were quiet crude.
The first crucial experiment was done by Hans
Driesch (1894): a sea urchin, when divided after the
Deuterated Glucose (2H-Glucose) first cell stage (or before the fourth), develops into two
sea urchins. The later the stage, the smaller
▶ Lymphocyte Labeling, Cell Division Investigation the embryos. This experiment was, first by Driesch
himself, seen as an argument for vitalism (the vital
power being efficient in both embryo; cutting it off
should have simply broken a purely material disposal).
Deuterated Water (2H2O) Hans Spemann did two important experiments: in
the first, one half of a blastomere of a sea urchin
▶ Lymphocyte Labeling, Cell Division Investigation develops into a complete one. In the second one,
done with Hilde Mangold (1924), a set of sea urchin
cells from one embryo induces Siamese twins when
grafted in another urchin embryo (although not any
Developmental Biology, Classical Sea cut of the urchin leads to this result) (Fig. 1). This
Urchin Experiments second kind of experiments has been understood as an
argument against preformism (see ▶ Preformation
Philippe Huneman and Epigenesis). Specifically, it gave an evidence
Institut d’Histoire et de Philosophie (IHPST), des for embryonic induction. Spemann and Mangold
Sciences et des Techniques, Université Paris 1 saw as an “organizer” the substance which induces
Panthéon-Sorbonne, Paris, France the second sea urchin embryo. More precisely,
because the fate of the transplanted cells could there-
fore be traced during development, Spemann and
Definition Mangold were able to demonstrate that the graft
became a notochord, yet induced neighboring cells
Manipulations of sea urchin embryos have been impor- to change fates. These neighboring cells adopted dif-
tant benchmarks in the development of embryology, after ferentiation pathways that were more dorsal, and
Developmental Control Networks 567 D
produced tissues such as the central nervous system, Characteristics
somites, and kidneys. Afterward, against Spemann’s
vitalism, it has been proven that even killed cells from Developmental control networks or cenes can be
the organizer substance were able to induce the devel- linked together to form larger cenes. The global devel-
opment, which prevented vitalistic interpretations of opmental control network in a genome is called the
the organizers. cenome. The topology of the cene determines its ideal
dynamic phenotype. Developmental control networks
can be deterministic or stochastic. They can be condi-
Cross-References tionally activated by satisfaction of some condition F. D
They can involve cell-cell signaling. Cenes are abstract
▶ Explanation, Developmental but executable networks that guide the development of
▶ Preformation and Epigenesis multicellular organisms. Changes in developmental
control networks can result in major evolutionary tran-
sitions in the morphology and function of organisms.
Examples of cenes include ▶ stem cell networks,
Developmental Cancer Networks ▶ cancer networks and terminal, and progenitor cell
Developmental Control Networks (for details see Wer-
▶ Cancer Networks ner 2011a, b).

Developmental Networks are Executable


Networks
Developmental control networks or cenes are execut-
Developmental Control Networks able networks. The cell has an interpretive executive
system, the IES, that interprets and executes the direc-
Eric Werner tives in developmental control networks. This system
Department of Physiology, Anatomy and Genetics, co-evolved with the developmental control networks
University of Oxford, Oxford, UK (Werner 2011a).
Department of Computer Science, University of
Oxford, Oxford, UK Subnetworks within Developmental Control
Oxford Advanced Research Foundation, Fort Myers, Networks
FL, USA Any network can link subnetwork of another develop-
mental network. This means the link has further devel-
opmental progeny generated by that subnetwork. In
Synonyms this way more and more complex cenes are built up.

Cancer networks; Cenes; Developmental networks; Developmental Networks Subsume Gene


Stem cell networks Networks
Developmental control networks subsume and control
lower level reactive network in the cell. The cell is
Definition a living organism that needs to react to local information
to survive. Thus, the cell is locally reactive, but is
A developmental control network or cene is a network globally controlled by its developmental networks.
that controls the development of multicellular organ- Hence, there are levels of network control in the cell.
isms by controlling cell states. The nodes in the net- In generating the embryo, the global developmental
work are cell control states. Edges in the network network makes use of a cell’s ability to move, to com-
denote cell actions including jumps to new cell municate, and to react to its environment. Hence, while
states. Branches in the network denote cell division the global developmental network does not uniquely
where each daughter cell enters a possibly new determine the outcome of the ontogeny of an organism,
control state. it constrains to reach its ultimate form and functions.
D 568 Developmental Control Networks

Developmental Control p
Networks, Fig. 1 Flexible
exponential-linear stochastic 1− p
stem cell network. One stem j1
cell (S) divides to produce two
cells (J1 and J2) that each J1
S J2 T
stochastically activates either
the stem cell itself or
a terminal cell (Werner 2011b)
j2 1− q

Stochastic Developmental Networks stem cell results in a terminal tumor that does not
A stochastic developmental control network contains develop further because it consists only of cells of
links to nodes that have an associated probability p and terminal type T. This shows that stochastic cancer
only jump to that node with probability p. Consider an Stem Cell Networks (▶ Cancer Networks) can in
example of a stochastic, flexible developmental net- some cases go into spontaneous remission. While
work whose characteristics and topology are deter- this network can also exhibit exponential growth
mined by the probabilities of its links: even in a stochastically linear probability distribu-
The network in Fig. 1 is very flexible. Depending tion, because of the two backward loops, there is
on the probability distribution, the network can a diminishing probability that it remains exponential.
range between being exponential, linear, or terminal, Thus, whether this network results in linear or expo-
as well as every mixture in between. Furthermore, nential proliferation depends on the probability
the network can be deterministic, stochastic, or distribution.
mixture of both. This flexibility is partly the result For this stochastic network, if the probabilities
of separating out the probability distributions for the p ¼ 1q then the higher the probability of p the more
behaviors of the two daughter cells. It shows that the network approximates a deterministic linear devel-
the architecture of the network imposes constraints opmental network. Since, in this case the distribution is
on what kinds of developmental dynamics are in antisymmetric, the cell population partition of cell
principle possible. The probability distribution pre- types consists of an equal number of exponential
supposes a network architecture of possible develop- stem cells and terminal cells, with the majority of
mental paths. cells being linear stem cells. This corresponds to the
The cell S is only a stem cell stochastically and not observed distribution in epidermal basal stem cells. If,
intrinsically when p < 1 and q < 1. When p ¼ q ¼ 1 the on the other hand, we have a symmetric distribution
network is deterministic exponential. When p ¼ 1, where p ¼ q then the higher the probability of p
q ¼ 0 or p ¼ 0, q ¼ 1 the network is deterministic the more the network approximates a deterministic
linear, that is, a deterministic 1st order geometric stem exponential network. Thus, the type of cancer network
cell network. If p ¼ q ¼ 0 the network is terminal. we have depends on the probability distributions over
When p ¼ 1 and q < 1, or when q ¼ 1 and p < 1, the connecting stochastic links.
then the network is mixed deterministic linear with
stochastic exponential tendencies. The Network Evolution of Multicellular Organisms
If probabilities p ¼ q then as p and q approach 1 the The evolution of multicellular organisms is directly
network approaches the behavior of a deterministic linked to the increasing complexity of their
exponential network. However, if p and q are differ- developmental control networks. These networks are
ent then this network can simulate a linear stochastic an autonomous layer on top of normal gene networks
network as well when, for example, p approaches (Werner 2011a, b).
1 and q approaches 0, or vice versa. As the probabil- Developmental control networks provide a power-
ities p and q decrease, the more frequently the cancer ful framework for explaining diverse developmental
Developmental Networks 569 D
phenomena beyond stem cells and cancer. These across many species, and developmental theory (unlike
include bilateral symmetry and the evolution of the the evolutionary viewpoint) is mostly interested in
internal skeleton in bilateral symmetric animals commonalities across phyla (e.g., in regular constant
(Werner 2012a, b). developmental mechanisms such as apoptosis) rather
than in differences. Developmental modules exist at
many levels: genetic (GRNs), cellular (morphogenetic
Cross-References fields), and tissues (germ layers: ectoderm, etc.), and
therefore may have some overlap.
▶ Cancer Networks Although quasi-independence defines modules in D
▶ Cene general, developmental modules are not necessarily
▶ Cenome the same modules as the ones identified by physiology
▶ Stem Cell Networks or morphology (Winther 2001). For instance, the
mesoderm – a developmental module – gives rise to
the heart (a physiological module), but is also involved
References in the production of the vertebrate eye (another phys-
iological module).
Werner E (2011a) On programs and genomes, arXiv:1110.5265v1 Modularity seems tied to evolvability (Wagner and
[q-bio.OT]. http://arxiv.org/abs/1110.5265
Altenberg 1996) because it entails that no variation in
Werner E (2011b) Cancer networks: a general theoretical and
computational framework for understanding cancer, a subpart is likely to change the functioning of the
arXiv:1110.5865v1 [q-bio.MN]. http://arxiv.org/abs/1110. whole; therefore, mosaic evolution can be possible.
5865v1 Developmental modularity fulfills the same evolution-
Werner E (2012a) The origin, evolution and development
ary requirements. It raises the question of its evolu-
of bilateral symmetry in multicellular Organisms.
arXiv:12073289v1 [q-bioTO]. http://arxiv.org/abs/1207.3289 tionary origins: is it given with the first elementary
Werner E (2012b) How to grow an organism inside-out: evolu- eukaryote and generally the most basic of cell mecha-
tion of an internal skeleton from an external skeleton in nisms? Or has it been selected for some advantages, or
bilateral organisms. arXiv:12073624v1 [q-bioTO]. http://
evolved as a by-product of selection for some devel-
arxiv.org/abs/1207.3624
opmental mechanisms? No consensus is yet attained.

Cross-References
Developmental Module
▶ Explanation, Developmental
Philippe Huneman
Institut d’Histoire et de Philosophie (IHPST), des
Sciences et des Techniques, Université Paris 1 References
Panthéon-Sorbonne, Paris, France
Oliveri P, Tu Q, Davidson E (2008) Global regulatory logic for
specification of an embryonic cell lineage. PNAS
105(16):5955–5962
Definition Wagner G, Altenberg L (1996) Complex adaptations and the
evolution of evolvability. Evolution 50(3):967–976
A developmental module is a set of cells, or genes, Winther R (2001) Varieties of modules: kinds, levels, origins,
and behaviors. J Exp Zool 291:116–129
which is more intrinsically connected than connected
to its surroundings, which is constant across some
clades, and which plays a specific causal role in devel-
opment. For example endoderm is a developmental
module, but also the Gene Regulatory Network of the Developmental Networks
skeletogenic micromere cell lineage in sea urchins
(Oliveri et al. 2008). Developmental modules are ▶ Cene
important units of analysis because they are constant ▶ Developmental Control Networks
D 570 Developmental Systems Theory

gradual change over time, the correlation between


Developmental Systems Theory genes and phenotype varies wildly, making it infeasi-
ble to talk about the informational content of any
Niall Palfreyman particular determinant in isolation from the entire
Biotechnology and Bioinformatics, Weihenstephan- genetic and environmental matrix.
Triesdorf University of Applied Science, Freising, Rather, information is an intrinsic aspect of the
Germany developmental process which is reconstructed in each
individual organism, and is therefore strongly depen-
dent on time and context. Consequently, in DST, the
Definition concept of a genetically programmed organism is
replaced by the idea of a life cycle process which
Developmental Systems Theory (DST) regards evolu- reconstructs itself out of an entire developmental sys-
tion as change not in gene frequencies, but in the tem of genetic, environmental, and other resources
spatial and temporal structure of developmental pro- available to the life cycle.
cesses or systems. DST arose out of Oyama’s (2000) Like Oyama, various authors have objected to the
concerns about the role of ▶ information in the Mod- use of the information metaphor. A recent analysis
ern Synthesis (MS) of evolutionary biology, which concluded that, instead of expanding it as in DST,
holds that genes, whether individually or in combina- information should be dropped from the biological
tion, encode information about the developmental lexicon because it is inimical to biological thought
construction of phenotypes. The MS thus makes an and thus is hindering the development of a theory of
inherently dualistic distinction between material organisms (Longo et al. 2012).
genes and bodies, and the ▶ information encoded in The rather abstract formulation of DST has on the
them, which “passes through bodies and affects them, one hand excited criticism that it is far removed from
but it is not affected by them on its way through” the practicalities of evolution in a slowly changing
(Dawkins 1995). environment. On the other hand it has also facilitated
A central difficulty of this account is that it views dialogue between a growing community of authors
the genome as a representation of development, who emphasize the need to integrate developmental,
whereas the genome is in reality just one among evolutionary, and ecological processes into a single
many dynamical components in a developmental sys- coherent theory. The link between DST and evolu-
tem which includes ribosomes, methylation, RNA tionary-developmental biology is clear, and the man-
splicing, transcription factors, intercellular signaling, ifestly hierarchical definition of the life cycle meshes
and environmental and cultural resources. If we wish well with multilevel theories of selection. Also, from
to hold to the metaphor of information, we are forced the DST perspective, ontogeny, phylogeny, and
to regard that information as distributed throughout niche-construction are respectively the enaction of
the entire developmental system. Yet as Oyama life cycle, evolutionary and cultural continuity in the
(2000) points out, this “means that information is internal structural relationships of a developmental
not ‘out there,’ that it is not in the nucleus or anyplace system.
else, that it is a way of talking about certain interac-
tions rather than their cause or a prescription for
them.” References
Oyama’s point is that there certainly exist continu-
ities and correlations between organisms and their Dawkins R (1995) River out of eden: a Darwinian view of Life.
Basic Books, New York
environments which we may refer to as information;
Longo G, Miquel PA, Sonnenschein C, Soto AM (2012) Is
however, information in this sense is not a commodity information a proper observable for biological organization?
which can be carried, stored, or transmitted in genes or Prog Biophys Mol Biol 109:108–114
in any other way. Nijhout (in Oyama et al. 2001), for Oyama S (2000) The ontogeny of information: developmental
systems and evolution. Duke University Press, Durham
instance, reports on a computer simulation of gene
Oyama S, Griffiths PE, Gray RD (eds) (2001) Cycles of contin-
frequencies in a population subject to phenotypic gency: developmental systems and evolution. MIT Press,
selection. While the resulting phenotype exhibits Cambridge, MA
Differential Evolution 571 D
Steinberg MS (1962) On the mechanism of tissue reconstruction
Differential Adhesion Hypothesis by dissociated cells. I. Population kinetics, differential adhe-
siveness, and the absence of directed migration. PNAS
48(9):1577–1582
Anja Voss-B€ ohme Voss-Boehme A, Deutsch A (2010) On the cellular basis of cell
Center for Information Services and High Performance sorting kinetics. J Theor Biol 263(4):419–436
Computing (ZIH), Technical University Dresden,
Dresden, Germany

D
Synonyms Differential Equations with Deviating
Arguments
DAH
▶ Dynamical Systems Theory, Delay Differential
Equations
Definition

The Differential Adhesion Hypothesis (DAH) is a


theory advanced by Steinberg (1962) to explain the Differential Evolution
mechanisms of cell sorting. The latter are in vitro
observations, where mixed heterotypic cell aggregates Zhong-Yuan Zhang
sort out into homotypic clusters. The sorting proceeds School of Statistics, Central University of Finance and
via the coalescence of small clusters into larger ones Economics, Beijing, China
until a complete de-mixing of cell types is achieved.
The DAH postulates that, analogous to the de-mixing
of immiscible fluids, differences in the cell-type- Definition
specific strengths of intercellular adhesion cause mea-
surable tissue surface tensions which drive the sorting Differential evolution (DE) (Xu et al. 2007) tries to
process to minimize these tensions. It predicts that find the optimal solution of an objective function f(x) :
round cell aggregates emerge where either the cell ℝn ! ℝ by collective-intelligence-based-search strat-
type with the highest homotypic intercellular adhesion egy. DE is initialized with a swarm of particles {xi ϵ
is in the center of the aggregate and is surrounded by ℝn: i ¼ 1, 2, ···, N} that serves as the candidate solu-
cells with lower homotypic intercellular adhesion or tions, and the particles are updated by turns. For exam-
a serial arrangement of homotypic clusters arises. The ple, when updating the particle xi at step t + 1, one
DAH has been challenged both by experimental and randomly selects three other distinct particles a, b, and
theoretical works; for a review see Green (2008). By c firstly, then every dimension j of xi is mutated with
now, it is fairly generally excepted that differential the predefined probability Pr as follows:
adhesion causes cell sorting, although there is
a recent debate on whether additional intercellular ðtþ1Þ
xij ¼ aj þ gðbj  cj Þ;
interactions could contribute to cell sorting and affect
the final sorted pattern as well (Green 2008; Krieg et al.
2008; Voss-Boehme and Deutsch 2010). where the parameter g 2 [0, 2] is predefined and called
the differential weight.
References Otherwise:

ðtþ1Þ ðtÞ
Green JB (2008) Sophistications of cell sorting. Nat Cell Biol xij ¼ xij :
10(4):375–377
Krieg M, Arboleda-Estudillo Y, Puech PH, K€afer J, Graner F, ðtÞ ðtþ1Þ
Of xi and xi , the one that has higher fitness with
M€uller DJ, Heisenberg CP (2008) Tensile forces govern
germ-layer organization in zebrafish. Nat Cell Biol respect to the objective function f(x) is passed on to the
10(4):429–436 next generation.
D 572 Differential Expression Analysis

The iteration process is terminated when some stop distinct cell types. Three major categories of potency
criterion is satisfied. are totipotency, pluripotency, and multipotency. Toti-
potent cells are capable of differentiating into every
cell type of a particular organism in addition to the
Cross-References extraembryonic tissues. An example of totipotent
cells is those produced upon the fusion of a sperm
▶ Identification of Gene Regulatory Networks, and an egg up to the stage of a morula. Pluripotency
Neural Networks describes the potential of a stem cell to differentiate
into cells comprising any of the three germ layers:
ectoderm (gut, lung, etc.), mesoderm (blood, bone,
References muscle, etc.), and endoderm (skin, nervous system,
etc.). Examples of pluripotent stem cells include
Xu R, Venayagamoorthy GK, Wunsch DC II (2007) Modeling of embryonic stem cells and induced pluripotent stem
gene regulatory networks with hybrid differential evolution
(iPS) cells. Multipotent cells are those that can differ-
and particle swarm optimization. Neural Networks
20(8):917–927 entiate into multiple cell lineages, but only to
a restricted family of closely related cell types. Hema-
topoietic stem cells are an example of multipotent
stem cells as they can give rise to all blood cells but
Differential Expression Analysis not other tissue types such as neurons, muscle, or
epithelium.
▶ Gene Expression Biomarkers, Ranking
▶ Relative Expression Analysis
Cross-References

▶ Single Cell Assay, Mesenchymal Stem Cells


Differential Expression of Homologous
Genes

▶ Genomic Imprinting
Diffuse Division

Differential-Difference Equations ▶ Modeling, Cell Division and Proliferation

▶ Dynamical Systems Theory, Delay Differential


Equations

Diffusion Approximation to Chemical


Master Equation
Differentiation Potency
▶ Stochastic Processes, Fokker-Planck Equation
Steven D. Rhodes
School of Medicine, Indiana University, Indianapolis,
IN, USA

Diffusion Driven Lattice-Gas Model for


Definition Translation
Differentiation potency is the property of a particular ▶ Stochastic Modeling of Translation Elongation and
cell, such as a stem cell, to give rise to multiple Termination
Digital Organism 573 D
• Euclidean distance (Table 2)
Diffusion Processes qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
DE ðp; q Þ ¼ ðx  sÞ2 þ ðy  t Þ2 (2)
▶ Stochastic Processes, Fokker-Planck Equation

• Chessboard distance (8-connectivity) (Table 3)

Digital Metrics
D8 ðp; qÞ ¼ maxðjx  sj; jy  tjÞ (3) D
Virginio Cantoni, Riccardo Gatti and Luca Lombardi
Department of Computer Engineering and Systems
Science, University of Pavia, Pavia, Italy These definitions can be easily extended to the 3D
space.

Definition

Given two pixels p and q in a bi-dimensional space, Cross-References


with coordinates (x, y) and (s, t), the following dis-
tances are defined as follows: ▶ Distance Transform and Travel Depth
• City block distance (4-connectivity) (Table 1)

D4 ðp; qÞ ¼ jx  sj þ jy  tj (1)

Digital Metrics, Table 1 City block distance: pixels classifi-


Digital Organism
cation in a 5  5 neighborhood
4 3 2 3 4
Christoph Adami
3 2 1 2 3 Department of Microbiology and Molecular
2 1 0 1 2 Genetics, Michigan State University, East Lansing,
3 2 1 2 3 MI, USA
4 3 2 3 4

Synonyms
Digital Metrics, Table 2 Euclidean distance: pixels classifica-
tion in a 5  5 neighborhood Avidian
2√2 √5 2 √5 2√2
√5 √2 1 √2 √5
2 1 0 1 2
√5 √2 1 √2 √5 Definition
2√2 √5 2 √5 2√2
A digital organism is a self-replicating computer
program, usually within the Tierra or Avida evolution
Digital Metrics, Table 3 Chessboard distance: pixels classifi- platforms.
cation in a 5  5 neighborhood
2 2 2 2 2
2 1 1 1 2
2 1 0 1 2 Cross-References
2 1 1 1 2
2 2 2 2 2 ▶ Artificial Evolution
D 574 Dimer

Dimer Directed Acyclic Network

Xiaoping Liu ▶ Directed Acyclic Graph


Institute of Systems Biology, Shanghai University,
Shanghai, China

Directory
Definition
▶ Workspace
A dimer is a molecule consisting of two subunits called
monomers. In biochemistry and molecular biology,
dimers of proteins or nucleic acids are often observed.
If the two subunits constituting a dimer are the same Disassembly of the Pre-initiation
monomers, the dimer is called a homodimer. If the two Complex
subunits constituting a dimer are different monomers,
the dimer is called a heterodimer. ▶ PIC Disassembly

Directed Acyclic Graph


Discontinuous Epitope
Lin Wang
School of Computer Science and Information Ramachandran Srinivasan
Engineering, Tianjin University of Science and G.N. Ramachandran Knowledge Centre for Genome
Technology, Tianjin, China Informatics, Institute of Genomics and Integrative
Biology, Delhi, India

Synonyms
Synonyms
Acyclic digraph; Directed acyclic network
Conformational epitope

Definition
Definition
A directed graph, or digraph, is a graph with directions
assigned to its edges. A digraph is usually denoted by Epitopes whose residues are distantly placed in the
D ¼ (V, A) where V¼fv1 ;    ; vn g is a finite set of sequence brought together by physicochemical folding
nodes and A is a set of ordered pairs of nodes called constitute discontinuous epitopes. The epitope struc-
arcs. In a digraph, a cycle C¼fc1 ;    ; ck g of D is ture is defined by protein folding process when the
a closed and non-repetitive sequence of nodes in V residues forming a discontinuous epitope are juxta-
such that ðcj ; cjþ1 Þ 2 A; j ¼ 1;    ; k  1, c1 ¼ ck , posed, enabling the antibody to recognize its three-
and ci 6¼ cj ; i; j ¼ 1;    ; k  1. A directed acyclic dimensional structure.
graph is a digraph without any cycles.

References
Discrete Model
Zhang XS (2000) Neural networks in optimization. Kluwer,
Dordrecht ▶ Logical Model
Disease Databases 575 D
References
Disease Classification or Discrimination
Baek S, Tsai C-A, Chen JJ (2009) Development of biomarker
classifiers from high-dimensional data. Brief Bioinform
James J. Chen
10:537–546
U.S. Food and Drug Administration, National Center Ramaswamy S, Tamayo P, Rifkin R, Mukherjee S, Yeang CH,
for Toxicological Research (HFT-20), Angelo M, Ladd C, Reich M, Latulippe E, Mesirov JP,
Jefferson, AR, USA Poggio T, Gerald W, Loda M, Lander ES, Golub TR
(2001) Multiclass cancer diagnosis using tumor gene expres-
sion signatures. Proc Natl Acad Sci 98:15149–15154
D
Synonyms

Disease identification; Disease taxonomy Disease Databases

Jingky Lozano-K€uhne
Definition Department of Public Health, University of Oxford,
Oxford, UK
Disease classification is to systematically group dis-
eases into classes in a hierarchical structure according
to characteristics of disease: etiology, pathology, phys- Definition
iology, prognosis, and combinations of these. Every
disease is grouped into one and only one class. Disease Disease databases are resources containing informa-
discrimination is used to identify a set of features that tion on diseases, syndromes, and other medical
classy diseases into classes. Disease identification is to conditions. They may provide specific information on
examine and classify diseases into specific classes. the signs and symptoms, risk factors, treatment
Disease taxonomy is the practice and science of clas- regimen, and/or results of studies done on known
sification of diseases. Disease taxonomy is diseases and medical conditions. Diseases databases
a completing list of all disease types in the classifica- are available both in printed and online publications.
tion system. Nowadays, online resources have already supplanted
International Statistical Classification of Diseases many printed publications. Selected online disease
and Related Health Problems is a system of codes for databases used in systems biology researches are
classifying diseases and health problems published by presented below.
the World Health Organization. The Medical Dictio-
nary for Regulatory Activities is a clinically validated
international medical terminology used by the biophar- Characteristics
maceutical industry and is used as the adverse event
classification dictionary endorsed by the ICH. Classification of Disease Databases
Molecular classification of diseases based on geno- A disease database can be categorized as general
mic and proteomic profiling is used within the field of or specialized database depending on its scope.
systems biology. Molecular classification uses statisti- General disease databases contain a wider scope of
cal and machine learning methods to identify molecu- diseases and medical conditions, while specialized
lar markers of a specific disease or to develop databases are limited to certain types of diseases
prediction models to classify disease types (Baek such as cancer, tropical diseases, or genetic diseases.
et al. 2009). Classification models select a set of Many online disease databases such as the
molecular features to discriminate between different “MedlinePlus,” “Diseases Database,” and the
types of disease or between disease and normal groups “Online Mendelian Inheritance in Man (OMIM)”
(Ramaswamy et al. 2001). The selected molecular are publicly accessible for free. Some databases are
features, individually or as a set, may be further devel- maintained by universities and government institu-
oped to be probable biomarkers or valid biomarkers for tions, while others are maintained by private organi-
disease classification or disease discrimination. zations or interest groups.
D 576 Disease Databases

General Disease Databases registration is required for authors or owners who


1. MedlinePlus (http://www.nlm.nih.gov/medlineplus/) wish to submit documents for sharing to enable
is an online publication of the United States National them to upload their files to the Website. All sub-
Library of Medicine (NLM) that provides informa- mitted documents are first reviewed by the
tion on diseases, health conditions, treatments, and OpenMED editor before it is posted online for free
wellness issues. This database provides reliable and public access. Non-English contributed documents
up-to-date health information for free. The health are also accepted provided that the abstract and
topics on the Website are categorized by body loca- keywords are in English. OpenMED is hosted by
tion/system, disorders or conditions, diagnosis and the Bibliographic Informatics Division of National
therapy, demographic groups, and health and well- Informatics Centre in India (OpenMED@NIC
ness issues. One can find meaning of medical terms, 2011).
search information on a disease, or view medical There are other online databases that may be useful
illustrations and videos on the Website. It is linked in doing research on diseases such as the
to the ▶ MEDLINE/▶ PubMed database for further MediLexicon (http://www.medilexicon.com/), the
information on researches on a certain disease and NCBI Biosystems Database (http://www.ncbi.nlm.
other related topics of interest such as systems biol- nih.gov/biosystems/), Free Medical Journals (http://
ogy (National Library of Medicine 2011). www.freemedicaljournals.com/), GPnotebook (http://
2. Diseases Database (http://www.diseasesdatabase. www.gpnotebook.co.uk), Jeghers Medical Index
com) is one of the free Websites that provides (http://www.jeghers.com/), SciVerse Scopus (http://
a medical textbook-like index and search options www.info.sciverse.com/scopus/), and also country-
about human diseases, medical disorders, signs and specific databases like the Indian Biomedical Journals
symptoms, medications, and other information. It Database (http://medind.nic.in/imvw/).
has dictionary-type definitions of terms which are
linked to the National Library of Medicine’s Specialized Disease Databases
Unified Medical Language System. It also has 1. Prion Disease Database or PDDB (http://prion.
subject-specific links to other Web resources. This systemsbiology.net) is a public database for systems
cross-referenced database is funded by the Medical biology research on Prion disease, a fatal neurode-
Object Oriented Software Enterprises Ltd and has generative disorder found in humans and animals.
been designed for physicians and other health The database is a product of the joint effort between
workers and students. Researchers will find general the Institute for Systems Biology in Seattle, Wash-
knowledge about their disease of interest in the data- ington and the McLaughlin Research Institute in
base and numerous links to other related databases Great Falls, Montana, USA. It contains genetic
and references. The Website follows the Health on data, microarray data, and other datasets that are
the Net (HON) code of conduct for medical and useful for genetics and systems biology studies
health Websites and is edited by Dr. Malcolm H. related to Prion disease. The information available
Duncan who is also the Director of the database’s in the PDDB are collected from public sources and
sponsoring company (Duncan 2011). collaborating laboratories. PDDB users can also
3. OpenMED@NIC (http://openmed.nic.in/) is an share, store, and also analyze their own data through
open-access database which contains archives of the Website. Analysis software tools are provided on
peer-reviewed scientific articles and technical doc- the Website. PDDB is powered by the GDxBase
uments in the field of Medical and Allied Sciences. software which is also used in different disease
It includes a big collection of published articles Websites (Gehlenborg et al. 2009).
about diseases which is indexed by year and subject. 2. The Human microRNA Disease Database (HMDD,
The site also includes conference proceedings and http://202.38.126.151/hmdd/mirna/md/) is a data-
journal articles on biological phenomena, cells, base containing information on microRNAs
enzymes, and genetic processes among others. No (▶ Cell Cycle Regulation, microRNAs) and their
registration is required for searching the archive and disease associations. It contains microRNA names,
downloading documents. However, a one-time diseases, dysfunction evidences, literature citations,
Disease Marker Identification 577 D
and tissue expression pictures in some cases. The Cross-References
database is not only useful for studying the associ-
ation of microRNAs with diseases but also for ▶ Apoptosis
investigating their roles in biological processes ▶ Cell Cycle Regulation, microRNAs
such as tissue differentiation, embryonic develop- ▶ MEDLINE
ment, cell growth, proliferation, and ▶ Apoptosis ▶ Mendelian Traits
(Lu et al. 2008). ▶ MEDLINE and PubMed
3. Online Mendelian Inheritance in Man (OMIM,
http://www.ncbi.nlm.nih.gov/omim/) is a public D
database primarily designed for physicians, health References
professionals, students, and researchers concerned
with genetic disorders. OMIM is the online version Baxevanis AD (2002) Searching Online Mendelian Inheritance
in Man (OMIM) for information for genetic loci involved in
of the database of ▶ Mendelian traits and disorders
human disease. Curr Protoc Bioinform 1.2.1–1.2.15
which was initiated by Dr. Victor A. McKusick in doi:10.1002/0471250953.bi0102s00/
the early 1960s. The printed versions entitled Bibliographic Informatics Division of National Informatics
“Mendelian Inheritance in Man (MIM)” were Centre, India (2011) OpenMED@NIC. World Wide Web
URL: http://openmed.nic.in/. Accessed 18 May 2011
published between 1966 and 1998. It was made
Duncan MH (ed) (2011) Diseases database. World Wide Web
available online starting 1987 through collaborative URL: http://www.diseasesdatabase.com. Accessed 18 May
efforts of the National Library of Medicine and the 2011
Willian H. Welch Medical Library at Johns Gehlenborg N, Hwang D, Lee IY, Yoo H, Baxter D, Petritis B,
Pitstick R, Marzolf B, Dearmond SJ, Carlson GA, Hood L
Hopkins University and was further developed in
(2009) The Prion disease database: a comprehensive
1995 by the National Center for Biotechnology and transcriptome resource for systems biology research in
Information (NCBI). OMIM contains information prion diseases. Database (Oxford): bap011. Epub 2009
on Mendelian traits and disorders and more than Sep 17. doi:10.1093/database/bap011
Lu M, Zhang Q, Deng M, Miao J, Guo Y, Gao W, Cui
12,000 genes. It also contains full citation informa-
Q (2008) An analysis of human microRNA and disease
tion, pictures of disorders (where appropriate), associations. PLoS ONE 3(10):e3420. doi:10.1371/journal.
and links to other genetic resources. Its content pone.0003420
is authored and edited at the McKusick-Nathans McKusick-Nathans Institute of Genetic Medicine, Johns
Hopkins University (Baltimore, MD), National Center for
Institute of Genetic Medicine, Johns Hopkins
Biotechnology Information, National Library of Medicine
University School of Medicine (OMIM 2011). (Bethesda, MD) (2011). Online Mendelian Inheritance in
Baxevanis (2002) published an article about the Man, OMIM (TM). World Wide Web URL: http://www.
details of OMIM’s layout of records and specific ncbi.nlm.nih.gov/omim/. Accessed 18 May 2011
National Library of Medicine (2011) About MedlinePlus. World
data entries as a guide for searching information for
Wide Web URL: http://www.nlm.nih.gov/medlineplus/
genes involved in human diseases. aboutmedlineplus.html. Accessed 18 May 2011
There are numerous databases available online for
specific diseases. It would be a big challenge to count
and index these continuously growing resources. What
was mentioned above are just the commonly accessed
databases in systems biology research. To name addi- Disease Identification
tional ones, there is the Pathogenic Pathway Database
for Periodontitis (http://bio-omix.tmd.ac.jp/disease/ ▶ Disease Classification or Discrimination
perio/), the Integrated Clinical Omics Database
(http://omics.tmd.ac.jp/icod_en/portal/top.do), the
Kyoto Encyclopedia of Genes and Genome (KEGG,
http://www.genome.jp/kegg/), the Rare Metabolic
Diseases Database (RAMEDIS, http://www.ramedis. Disease Marker Identification
de), BIOBASE Biological Databases (http://www.
biobase-international.com/), and many more. ▶ Host–Pathogen Systems, Target Discovery
D 578 Disease Mechanism Networks

Disease Ontologies
Disease Mechanism Networks Disease ontologies are biomedical ontologies provid-
ing coverage for the domain of diseases, disorders,
▶ Ontology Analysis of Biological Networks illness, etc. The degree to which these words are
▶ Functional/Signature Network Module for Target synonymous is subject to debate (Ceusters and
Pathway/Gene Discovery Smith 2010), but disease ontologies generally cover
conditions understood as or suspected of being
a deviation from a healthy status and, less frequently,
diagnostic criteria for these conditions. Some
Disease Ontology disease ontologies also cover the manifestations of
these conditions, that is, the signs and symptoms
Olivier Bodenreider associated with them. However, the relations between
Lister Hill National Center for Biomedical conditions and their manifestations are usually not
Communications, US National Library of Medicine, recorded in ontologies, as such relations are not def-
Bethesda, MD, USA initional for most diseases. In fact, except for the
so-called pathognomonic manifestations, there is
a probabilistic, not systematic relation between
Definition a manifestation and a condition. Phenotypes are the
observable characteristics of organisms resulting
Background from the genetic makeup of a particular organism.
Disease ontologies can be traced back to the seven- There is partial overlap between phenotypes and dis-
teenth century, when health authorities in London used ease manifestations, and phenotypes may be covered
a standard list of about 200 causes of death to compile by disease ontologies.
accurate health statistics known as the Bills of Mortal- The principal use of disease ontologies is to
ity (Bodenreider 2008). This list was later integrated support the annotation of diseases in biomedical
into the International Classification of Diseases, the datasets, including the curation of knowledge bases,
11th revision of which is currently under active devel- the clinical documentation of electronic health
opment. Among the several hundredths of biomedical records and the indexing of the biomedical literature.
ontologies currently available, a few dozen provide Disease ontologies are also used for aggregation
coverage of diseases. A selection of these ontologies purposes (e.g., for grouping myocardial infarction
is presented in this brief review. and mitral stenosis under cardiovascular diseases),
as well as for clinical decision support (e.g., drugs
Biomedical Ontologies contra-indicated with asthma should not be pre-
Biomedical ontologies represent the properties of scribed to patients diagnosed with specific forms of
biomedical entities and their relations to other bio- asthma, such as seasonal asthma and occupational
medical entities. As such, biomedical ontologies are asthma).
artifacts used to represent and share knowledge about
the biomedical domain (Bodenreider and Stevens
2006). More specifically, biomedical ontologies tend Characteristics
to focus on definitional knowledge (i.e., what is
always true of biomedical entities), as opposed to Existing Disease Ontologies and their
assertional knowledge, usually found in knowledge Characteristics
bases. Ontologies also differ from terminologies, In this section, we review 17 ontologies, which roughly
whose focus is purely naming, and from thesauri, in qualify as disease ontologies according to the defini-
which knowledge is usually organized for a specific tion above. The list of ontologies is shown in Table 1,
purpose (e.g., information retrieval). Despite these along with a URL from which more information can be
differences, the term “ontology” is often used loosely, obtained. A list of salient characteristics for disease
as an umbrella name for these various kinds of ontologies is presented in Table 2. Finally, for each
artifacts. ontology, we provide a brief description and a list of
Disease Ontology 579 D
Disease Ontology, Table 1 List of disease ontologies
DO Disease Ontology – http://diseaseontology.sourceforge.net/
DSM Diagnostic and Statistical Manual of Mental Disorders – http://www.psych.org/
HPO Human Phenotype Ontology – http://www.human-phenotype-ontology.org/
ICD International Classification of Diseases – http://www.who.int/classifications/icd/en/
ICPC International Classification of Primary Care – http://www.globalfamilydoctor.com/wicc/
IDO Infectious Disease Ontology – http://www.infectiousdiseaseontology.org/
LOINC Logical Observation Identifiers Names and Codes – http://loinc.org
MEDCIN MEDCIN – http://www.medicomp.com/ D
MedDRA Medical Dictionary for Regulatory Activities – http://www.meddramsso.com/
MeSH Medical Subject Headings – http://www.nlm.nih.gov/mesh/
MPATH Mouse Pathology Ontology – http://www.pathbase.net/
MPO Mammalian Phenotype Ontology – http://www.informatics.jax.org/searches/MP_form.shtml
NCI Thes. NCI Thesaurus – http://ncit.nci.nih.gov/
NDF-RT National Drug File-Reference Terminology – http://evs.nci.nih.gov/ftp1/NDF-RT/
OMIM Online Mendelian Inheritance in Man – http://www.ncbi.nlm.nih.gov/omim/
PATO Phenotypic Quality Ontology – http://obofoundry.org/wiki/index.php/PATO:Main_Page
SNOMED CT SNOMED CT – http://www.ihtsdo.org/

Disease Ontology, Table 2 List of salient characteristics for disease ontologies


Component The disease ontology is a component of a broader ontology
Specialized The disease ontology only covers a specific group of diseases
Human The disease ontology mainly covers human diseases
OBO The disease ontology is part of the Open Biomedical Ontologies (OBO) family of ontologies
Clinical The disease ontology is mainly used in clinical practice
Definitions The disease ontology includes definitions (textual or logical)
Translations The disease ontology is available in other languages than English
Publicly available The disease ontology is publicly available
X-ref The disease ontology has cross-references to other disease ontologies (natively or through the UMLS)

characteristics summarized in Table 3, with additional annotation of the genetic diseases listed in OMIM.
notes in Table 4. Developed by a consortium including Charité Hos-
• Disease Ontology (DO): Controlled terminology pital (Berlin) and the University of Cambridge
originally created for annotation purposes as part (UK).
of the NuGene project at Northwestern University. • International Classification of Diseases (ICD):
Still under development. Classification from the World Health Organization
• Diagnostic and Statistical Manual of Mental Dis- (WHO) family of health classifications, with many
orders (DSM): Standard classification of mental local adaptations. ICD9-CM, developed by the Cen-
disorders in the United States, developed by the ter for Medicare & Medicaid Services (CMS) for
American Psychiatry Association and used by use in the US, includes clinical modifications.
a wide range of mental health professionals across Broad coverage of diseases and health problems.
clinical settings. • International Classification of Primary Care
• Human Phenotype Ontology (HPO): Controlled (ICPC): Classification of reasons for encounter,
vocabulary for the phenotypic features encountered diagnoses or problems, and process of care. Devel-
in human hereditary and other diseases, used for the oped by the World Organization of Family
D 580 Disease Ontology

Disease Ontology, Table 3 Some characteristics of 17 disease ontologies (see Table 2 for the definitions of the characteristics)
Component Specialized Human OBO Clinical Definitions Translations Publicly available X-ref
DO No No Yes Yes No Textual No Yes Native
DSM No Yes Yes No Yes None* No No UMLS
HPO No Yes Yes Yes No* Textual No Yes Native
ICD No No Yes No Yes None Yes No UMLS
ICPC No Yes* Yes No Yes None Yes Yes Native to ICD,
UMLS
IDO No Yes Yes Yes No Textual No Yes None
LOINC No Yes Yes No Yes Logical* Yes Yes UMLS
MEDCIN Yes No Yes No Yes None No No UMLS
MedDRA No Yes* Yes No Yes None Yes No UMLS
MeSH Yes No Yes* No No Textual Yes Yes UMLS
MPATH No No No Yes No Textual No Yes None
MPO No No No Yes No Textual No Yes None
NCI Thes. Yes Yes* Yes No Yes* Logical, No Yes Native, UMLS
textual
NDF-RT Yes No Yes No Yes Logical* No Yes UMLS
OMIM No Yes Yes No Yes Textual* No Yes UMLS
PATO No No Yes* Yes No Logical, No Yes None
textual
SNOMED CT Yes No Yes No Yes Logical Yes Yes* UMLS

Refer to additional notes in Table 4

Disease Ontology, Table 4 Additional notes on Table 3 items


DSM Provides diagnostic criteria for many mental disorders
HPO PhenExplorer is a clinical diagnostic tool based on HPO annotations of OMIM diseases
ICPC Primary care can be considered a specialty
IDO IDO borrows concepts from other OBO ontologies
LOINC Although LOINC does not use description logics (DL), its organization is close to DL representation
MedDRA MedDRA is not restricted to any medical specialties, but focuses on adverse events
MeSH MeSH covers, but is not limited to human diseases
NCI Thes. NCIt essentially covers cancers and cancer-related diseases; used in clinical research
NDF-RT Weak logical definitions (primitive classes)
OMIM OMIM contains extensive narrative descriptions more than definitions
PATO PATO represents all kinds of phenotypes, including in humans
SNOMED CT Freely available for use in the IHTSDO member countries

Doctors (Wonca). Coverage of diseases and • Logical Observation Identifiers Names and Codes
health problems at the level of detail required for (LOINC): Set of names and codes for laboratory
primary care. and other clinical observations (elements of clinical
• Infectious Disease Ontology (IDO): Set of ontologies phenotypes). Developed at the Regenstrief Institute.
for specific infectious diseases, including malaria, Coverage restricted to clinical observations.
influenza, and tuberculosis, sharing a core ontology. • MEDCIN: Developed by Medicomp Systems,
Covers entities relevant to both biomedical and clini- MEDCIN is a vocabulary for clinical documenta-
cal aspects of most infectious diseases. Developed tion and a knowledge base for clinical decision
by the Infectious Disease Ontology Consortium. support. It provides coverage for elements
Disease Ontology 581 D
including symptoms, medical history, physical Disease Ontology, Table 5 Ontology repositories providing
examination, tests, and diagnoses. coverage of disease entities
• MedDRA: Created by a pharmaceutical industry trade UMLS Unified Medical Language System – https://uts.nlm.
group, the Medical Dictionary for Regulatory Activ- nih.gov
ities (MedDRA) is a medical terminology used to BioPortal NCBO BioPortal – http://bioportal.bioontology.org/
classify adverse event information associated with
the use of medications, vaccines, and medical devices,
especially for reporting to regulatory agencies. Terminology Standard Development Organization
• Medical Subject Headings (MeSH): Controlled D
(IHTSDO) for use in electronic health records and
vocabulary developed by the US National Library adopted by seventeen countries to date. Broad cov-
of Medicine for the indexing and retrieval of the erage including diseases.
biomedical literature, especially in the MEDLINE
bibliographic database. Broad coverage including Ontology Repositories
diseases. Many of the disease ontologies listed above are present
• Pathbase pathology ontology (MPATH): Ontology in ontology repositories (Table 5), which offer a con-
of mutant and transgenic mouse pathology pheno- venient way of integrating disease resources annotated
types used for the annotation of Pathbase, to different ontologies. The Unified Medical Language
a repository of histopathology images. Developed System (UMLS) is a terminology integration system
by the Pathbase European Consortium. developed by the US National Library of Medicine.
• Mammalian Phenotype Ontology (MPO): Con- The UMLS establishes a correspondence among terms
trolled vocabulary for the annotation of mammalian from different terminologies for a given biomedical
phenotypes, currently used for the annotation of entity. It integrates a number of the terminologies
phenotypic data in mouse and rat databases. Devel- presented above, as well as many other biomedical
oped at the Jackson Laboratory. Coverage restricted terminologies. Developed by the National Center for
to phenotypes. Biomedical Ontology (NCBO), The BioPortal is
• NCI Thesaurus (NCIt): Controlled vocabulary another such repository, which offers mapping
developed by the National Cancer Institute to sup- among terms from different ontologies. The BioPortal
port the integration of information related to cancer provides systematic coverage of the ontologies from
research. Broad coverage including diseases. the Open Biomedical Ontologies (OBO) family, as
• National Drug File-Reference Terminology well as many other ontologies. The NCBO also
(NDF-RT): Reference terminology for medications, indexes resources, such as clinical trials and gene
providing information including pharmacologic expression databases, in reference to ontology entities
class, therapeutic intent, mechanism of action, and from the BioPortal.
physiologic effect. Produced by the US Department
of Veterans Affairs, Veterans Health Administra-
Acknowledgments This research was supported in part by the
tion (VHA). Coverage of diseases through their Intramural Research Program of the National Institutes of Health
relations to drugs (therapeutic intent). (NIH), National Library of Medicine (NLM). This manuscript is
• Online Mendelian Inheritance in Man (OMIM): loosely based on a study of desiderata for disease ontologies,
coauthored with Dr. Anita Burgun and presented to the First
Knowledge base on human genetic diseases devel- International Conference on Biomedical Ontology in 2009.
oped at Johns Hopkins University and available
through the NCBI Entrez system. Coverage
restricted to genetic diseases. References
• Phenotypic Quality Ontology (PATO): Ontology of
phenotypic qualities, intended for use in a number Bodenreider O (2008) Biomedical ontologies in action: role in
of applications, primarily defining composite phe- knowledge management, data integration and decision sup-
notypes and phenotype annotation. Coverage port. Methods Inf Med 47(suppl 1):67–79
Bodenreider O, Stevens R (2006) Bio-ontologies: current trends
restricted to phenotypes.
and future directions. Brief Bioinform 7(3):256–274
• SNOMED CT: The largest clinical terminology, Ceusters W, Smith B (2010) Foundations for a realist ontology of
maintained by the International Health mental disease. J Biomed Semantics 1(1):10
D 582 Disease Progression

If kin ¼ koutS, synthesis and elimination cancel each


Disease Progression other out and the status is in homeostasis. A disease
process (dp) may affect the rate of synthesis or elimi-
▶ Disease Progression Modeling nation. This may be modeled by introducing temporal
change in the rate of synthesis

dkin =dt ¼ fdp; synth ðkin ; tÞ (2)

Disease Progression Modeling or in the rate of elimination

Anyela Camargo1 and Jan T. Kim2 dkout = dt ¼ fdp; elim ðkout ; tÞ: (3)
1
Institute of Biological, Environmental and Rural
Sciences, Aberystwyth University, Aberystwyth, Models of disease progression may provide insight into
Ceredigion, UK the biology behind the evolution of a target disease, and
2
School of Computing Sciences, University of East help identify the best treatment that either control or
Anglia, Norwich, Norfolk, UK stop disease progression. The selection of the best
course of action is a function of the effect of a given
treatment on disease status over time. It is therefore
Synonyms paramount that disease progression models include bio-
logical aspects (i.e., genetic, transcription, and cross
Disease system analysis; Disease progression talks) that are involved in the evolution of the disease.

Definition References

Disease progression describes the change of disease Chan PLS, Holford NHG (2001) Drug treatment effects on disease
status over time as function of disease process and progression. Annu Rev Pharmacol Toxicol 41(1):625–659
Dayneka NL, Garg V, Jusko WJ (1993) Comparison of
treatment effects. The status of a subject, such as
four basic models of indirect pharmacodynamic responses.
a patient, may be represented by a numerical J Pharmacokinet Pharmacodynamics 21:457–478.
quantity S (Chan and Holford 2001). In practice, ↗ doi:10.1007/BF01061691
biomarkers are frequently used as a proxy to monitor
disease status. In a healthy subject, mechanisms of
homeostasis ensure that the status S is relatively con- Disease Risk Assessment
stant and remains within a normal range. The change in
disease status shows minimal variation over time t, Sanjeev Kumar1 and Shipra Agrawal2
which mathematically is expressed as dS/dt  0 1
BioCOS Life Sciences Private Limited, Bangalore,
A disease process is characterized by a change of S, Karnataka, India
taking it outside of the normal range. As disease pro- 2
BioCOS Life Sciences Pvt. Limited, Institute of
gresses, the status S continues to move further away Bioinformatics and Applied Biotechnology,
from the normal range, or in terms of change over time, Bangalore, Karnataka, India
dS/dt 6¼ 0.
The change of status S over time can be described
by mathematical expressions that model biological Definition
processes. As an example from Dayneka et al.
(1993), change can be explained in terms of Disease risk assessment could be defined as the sys-
a constant rate of synthesis kin and a first-order process tematic evaluation and identification of ▶ risk factors
of decay or elimination kout: responsible for a disease, estimation of risk levels
and finding possible ways to counter the onset and
dS=dt ¼ kin  kout S: (1) progression of a disease within the population.
Disease Risk Assessment 583 D
Disease Risk Assessment, A
Fig. 1 Factors involved in
disease risk assessments: The
Evidence-based
information from evidence- Data
based data (demographic, SNP
population-based data, and markers
family history) is processed to Demographic Population Family
get the SNP markers. The SNP -based history
marker data is used to predict Disease
Disease Risk
the disease risk. An integrated prediction
Assessment
approach from proteomics and tools D
genomics provide clues to
identify predictive biomarkers
(expression), which also help Integrated Predictive
in disease risk assessment approach using biomarkers
B
Proteomics
and Genomics

The following facts are very important and considered investigations, are conducted on a group of people
for disease risk assessment (Fig. 1): to gather evidences on a probable cause for
• Estimation of risk factors involves the evaluation of a disease.
variables and factors which can be indicative of the
likelihood development of a particular disease. For
example, the risk factors such as age, smoking, Disease Risk Assessment is a Challenge for
elevated levels of cholesterol and LDL, low levels Complex Diseases
of HDL, family history of premature coronary heart
disease, etc. are indications of incurring cardiovas- Complex diseases result from the combined effects of
cular disease. multiple genetic and environmental factors and each
• As the current focus has shifted toward preventive such factor alone has only marginal contribution to the
intervention than the disease cure, the risk assess- disease. Type 2 diabetes, coronary heart disease, and
ment involves genotyping of tissues to identify myocardial infarctions are examples of such diseases
“SNP markers” associated with a genetic disease. where genetic profiling has led to limited predictive
The SNP/genetic markers could be used to estimate values. Complete knowledge of causal mechanisms,
the disease susceptibility in a new born/unborn which involves identification of all the possible
child. This becomes useful in administrating the combinations of the causal factors (gene–gene interac-
preventive treatment to the child. tions, gene–environmental interactions), is required.
• Integrated approaches in proteomics and genomics Monogenic diseases like Huntington disease, PKU,
can create a rich resource of “predictive and hereditary cancers where identification of causal
biomarkers” that is, signature molecules, which mechanisms and risk prediction in these diseases are
are biochemically measurable. Such biomarkers comparatively straightforward are caused by DNA
can help in secondary disease prediction (i.e., variations in a single gene. (Teramoto et al. 2008;
whether a diabetic person has a susceptibility of Wei et al. 2009).
developing cardiovascular disease).
• Evidence-based data (▶ Evidence-based Medicine)
such as family history, demographic, and clinical References
data can be used to develop disease prediction tools.
Such predictive models having variables (factors Teramoto T, Ohashi Y, Nakaya N, Yokoyama S, Mizuno K,
Nakamura H, MEGA Study Group (2008) Practical risk
which may influence the disease) gathered from
prediction tools for coronary heart disease in mild to moder-
cohort-based studies can be used in these studies. ate hypercholesterolemia in Japan: originated from the
Cohort-based studies which are analytical MEGA study data. Circ J 72(10):1569–75
D 584 Disease System Analysis

Wei Z, Wang K, Qu HQ, Zhang H, Bradfield J, Kim C, expression and regulation and also establishes phylo-
Frackleton E, Hou C, Glessner JT, Chiavacci R, Stanley C, genetic relationships between different organisms.
Monos D, Grant SF, Polychronakos C and
Hakonarson H (2009) From disease association to risk An upcoming extension of genomic studies is
assessment: an optimistic view from genome-wide associa- transcriptomics which involves identification of
tion studies on type 1 diabetes. PLoS Genetics 5(10): actively expressing genes at any given point of time.
e1000678 Microarray technology is generally used to quantify
the expression levels of various mRNAs and RNA-
Seq provides details about the nucleotide sequence of
transcripts being expressed. Proteomics deals with
Disease System Analysis the comprehensive analysis of the entire complement
of proteins present in a given system. Mass spectrom-
▶ Disease Progression Modeling etry is the main tool used for proteomic studies and it
has revolutionized this field due to its extremely high
sensitivity and diverse qualitative and quantitative
applications. Metabolomics involves the study of the
Disease System, Malaria metabolites (metabolic intermediates, hormones, sig-
naling molecules, etc.) present in a given system at
Pragyan Acharya, Manish Grover and Utpal Tatu a particular time and provides an instantaneous
Department of Biochemistry, Indian Institute of glimpse into the physiology of any system. Bioinfor-
Science, Bangalore, Karnataka, India matics makes use of computational and statistical
approaches to develop algorithms and databases
which help to integrate and analyze the information
Synonyms obtained from genomic, proteomic, and metabolomic
studies.
Bioinformatics; Genomics; Metabolomics; Parasitol-
ogy; Proteomics; Transcriptomics
Characteristics

Definition Malaria
Malaria is a debilitating disease affecting almost
Malaria continues to be an enormous threat to the 200–300 million people worldwide annually. It is
society, and the emergence of drug-resistant parasite caused by the protozoan parasite belonging to the
strains has severely challenged the methods of disease genus Plasmodium, which is carried by the female
prevention and control. In order to combat this situa- Anopheles mosquito and transmitted by mosquito
tion, better understanding of the disease is required to bites in humans. Malaria inflicts mostly children and
identify new drug targets and vaccine candidates. is centered around the tropical and subtropical regions
Hence, we need to expand our research horizon and of the world, being most widespread in Africa, some
move beyond the regular reductionist approach of parts of South America, and Asian countries including
studying a single gene/protein or a biochemical path- India. In humans, malaria is caused by infection of the
way. With the help of high-throughput technologies red blood cell with any of the five species of the
and interdisciplinary tools, a holistic approach toward parasite – Plasmodium falciparum, Plasmodium
the understanding of the disease has been developed vivax, Plasmodium malariae, Plasmodium ovale, and
(Fig. 1). These initiatives commonly referred to as Plasmodium knowlesi. Of these, maximum mortality
“systems biology” are further classified into geno- and morbidity is caused by P. falciparum followed by
mics, proteomics, and metabolomics based on the P. vivax. All types of malaria are associated with
component of the biological system they deal with. febrile episodes with periodic paroxysms and chills in
Genomics is concerned with the study of genomes of addition to nausea, headache, and general weakness.
organisms and majorly involves DNA sequencing. It Population groups such as children, pregnant women,
provides insights into the mechanism of gene HIV/AIDS-infected individuals and travelers to
Disease System, Malaria 585 D

Ring RB
te
zoi C
M ero
Tro
p ho
Lab Cultures +T,+P,+M zo
it
P e
+T,+ +T
Clinical Isolates +T,+P ,+
-P P,+
-T, M
s

+T
yte

,-P
D
t

Metabolomics
toc

on
hiz
pa

Schi
P
Sc

,+
He

+T,+
+T

-P

zo
+T,-P
-T,

ies

Pro

P,+M

nt
ud

teo
St

mic
l
ica

Systems
n

s
cli

Biology of

+T,+
+T,

Mero
-T,-P
Spo
Saliv

+T,-P

Malaria

-P

P,+M
rozo
a

zoite
ry G

Bi
ites

oin
ics
land

for
-T m tom
,-P at
scr
ip ,-P
s

ics -T
n

es e
+T Tra P

yt al
,-P ,+

oc m
+T

et Fe
O -T,-P -T,-P

am nd
oc

G ea
ys +T,-P
+T,-P

al
t

M
acro
and M
Ook
inet Micro tocytes
e Game Human
Mid
gut

Mosquito

Disease System, Malaria, Fig. 1 Life cycle of P. falciparum the particular stage in the parasite’s life cycle (Adapted and
and the systems biology approaches used for understanding the modified from Das A et al., Systems Biology of Malaria: An
disease. “T,” “P,” and “M,” respectively, represent the availabil- Indian Perspective. Biobytes Vol. 5, 2009)
ity of transcriptomic, proteomic, and metabolomic evidence for

malaria endemic areas are at a higher risk for malaria further compounded by the emergence of drug-
due to lower immunity (reviewed by Gracia 2010). resistant strains of the parasite. Both P. falciparum
Malaria is a disease that has been a major research and P. vivax have become resistant to common anti-
focus for several years. Yet, vaccines against malaria malarials and recently new P. falciparum strains
are elusive. The best bet in the development of resistant to artemisinin, one of the most effective
malaria vaccines has been the RTS,S vaccine antimalarial available, have also emerged (reviewed
(Mosquirix) which has been recently demonstrated by Gracia 2010). Traditionally, most studies on
by Joe Cohen, a GlaxoSmithKline research scientist, malaria have focused on the biochemical, cell biolog-
to provide about 50% protection in infants (Olotu ical, and genetics of single gene or protein. In recent
2011). Still, the most popular and effective methods times however, the focus in malaria research has
of controlling malaria are control of its mosquito shifted to the analyses of gene and protein expression
vector and the use of bed nets. However, malaria is at the global level, revealing several interesting fea-
not completely preventable and continues to be an tures of this parasite that may be useful in the discov-
enormous threat to the society. This problem is ery of novel antimalarial drug targets.
D 586 Disease System, Malaria

Systems Biology of Malaria range, parasitemia and hematocrit (reflecting severity


The recent availability of genome sequences of many of disease). The parasite transcriptome profiles thus
pathogens has enabled systems analyses of infectious obtained were then statistically clustered to obtain
diseases. The advantage of using systems biology gene expression patterns for different parasite groups,
approaches is that they allow visualization of if any. The genes were also mapped onto their
a system as a whole in response to its environment. In S. cerevisiae orthologs in order to identify the possible
case of the malaria parasite, functional genomics data, pathways. This study revealed the presence of three
global transcriptome data, and proteome data from distinct physiological clusters of P. falciparum as
both laboratory strains and clinically isolated parasites found within the malaria-infected individuals. These
as well as metabolome data from laboratory strains are corresponded to first, active growth based on glyco-
now available and have made striking revelations lytic metabolism, second, a starvation response accom-
about the physiology of the malaria parasite. panied by metabolism of alternative carbon sources,
and third, an environmental stress response. The gly-
Genomics colytic state closely resembled the ring stages of the
The deadliest form of malaria is caused by parasite in culture; however, the other two states were
P. falciparum, which alone is responsible for about novel and specific to the parasites in vivo, indicating
a million deaths annually. The complete life cycle of that gene expression profiles in the wild may differ
P. falciparum consists of three stages – the mosquito significantly from those in cultures.
stage, the liver stage, and the human blood stages. The
blood stages of the malaria parasite are responsible for Proteomics
most of the pathophysiology associated with malaria. Although interesting, transcriptomes may not reveal
As a result, the blood stages of P. falciparum have the true physiological states of clinical malaria para-
received maximum research focus. The release of the sites. There is increasing awareness about greater
complete genome sequence of P. falciparum in 2002 reliability and accuracy of proteomics over
incited systems biology analyses of the parasite. The transcriptome analyses. It has been demonstrated
22.8 Mb genome of P. falciparum was shown to con- that in many cases transcriptome profiles are not
tain 14 linear chromosomes, a circular plastid-like reflective of the protein complement of a cell, as
genome, and a linear mitochondrial genome (Gardner temporal and spatial differences arise between the
et al. 2002). About 3.9% of the P. falciparum genome two. In fact, studies that correlate the transcriptome
was shown to encode for different families of antigenic and proteome data from laboratory cultures of
determinants essential for parasite virulence. Subse- P. falciparum indicate that there is a significant time
quent transcriptome analysis showed that about 60% delay between the abundance of transcripts and
of the 5,409 predicted open reading frames of the corresponding proteins in the asexual stages of the
parasite were transcriptionally active in the blood parasite (Le Roch et al. 2003). Most recently, the
stages of the parasite (Bozdech et al. 2003). This clinical proteome of P. falciparum and P. vivax has
study utilized 7,462 individual 70 mer oligonucleo- been reported for the first time (Acharya et al. 2009;
tides representing 4,488 of the 5,409 ORFs manually 2011). Although proteomic analyses of the parasite-
annotated by the malaria genome sequencing consor- infected RBC had been carried out earlier, the prote-
tium. The transcriptome analysis of the blood stages omics analysis of clinical parasites isolated directly
revealed for the first time that induction of any parasite from patients is a challenge due to several factors
gene occurred once per cycle and only at a time when it such as low parasite density in the venous blood of
is required. More recently, transcriptome analyses of patients and the presence of abundant host proteins
parasites isolated from malaria patients have been car- that mask identification of low abundant parasite pro-
ried out, which have revealed the existence of distinct teins. This study had identified about 100 proteins
parasite physiologies in the wild (Daily et al. 2007). from the major malaria parasite P. falciparum. The
This study analyzed the transcriptome of malaria par- highlights of the study were the identification of
asites isolated from venous blood samples from 43 several well-known and putative drug targets in
malaria patients from Senegal with a diverse age P. falciparum and the detection of several proteins
Disease System, Malaria 587 D
in clinical parasites that were not expressed by the which Geldanamycin, an Hsp90 inhibitor shown to
laboratory culture of the parasite. This study is the abrogate parasite growth in cultures, may work reiter-
first of its kind since it utilized parasites directly ating the usefulness of such a systems level approach in
isolated from the peripheral blood without culture generating testable hypotheses (Banumathy et al.
adaptation or influence by external factors. 2003). Similarly, another bioinformatic study
addressed the question of PfEMP1 diversity, which is
Metabolomics the major antigenic protein present on the infected
More recently, metabolome analysis of the parasite erythrocyte membrane and a potential vaccine candi-
has been carried out using laboratory cultures. date for malaria. The study was conducted in seven D
Metabolomic analyses in models of infectious dis- genomes using sequence alignment and distance tree
eases have been especially useful in elucidating analysis (Rask et al. 2010). It described multiple novel
metabolic modulation of the host by the parasite and features about PfEMP1 domain organization thereby
may aid in the detection of biomarkers. The most providing a platform for the understanding of PfEMP1
striking revelation of the metabolome analysis of expression and function.
P. falciparum has been the presence of citrate,
aconitate, and a–ketoglutarate, which are metabolic Systems Analysis of Malaria Vector
intermediates of the tricarboxylic acid (TCA) cycle, Malaria eradication programs center around vector
in the parasite asexual stages. The study not only control methods in a large way and thus understanding
provided the evidence for a functional TCA cycle of the biology of the Anopheles mosquito is equally
but also suggested it to be organized in a branched important as that of the parasite biology inside the
pathway rather than the canonical cyclic pathway, human host. Malaria transmission in the wild is largely
with amino acids glutamate and glutamine as the determined by vectorial competence i.e., ability of the
major carbon sources (Olszewski et al. 2010). This particular mosquito species or strain to replicate and
study further highlighted the importance of transmit the parasite to human host. Systems biology
metabolomic technologies in elucidating the architec- approaches have unraveled some aspects of this host-
ture of metabolic networks and identification of novel pathogen interaction.
plausible drug targets. Metabolomic analysis of urine Whole genome sequencing of Anopheles gambiae
and plasma samples from P. berghei–infected mice and other insects has enabled systematic analysis of
(rodent model of malaria) revealed the presence of divergent protein families which have evolved to facil-
a unique biomarker, pipecolic acid, in the urine of itate specific interactions with the parasite. The viru-
malaria-infected mice which was completely absent lence caused by the sporogenic development of the
in urine samples collected from normal mice (Li et al. parasite in the mosquito imposes a fitness cost on the
2008). This suggests the use of metabolic profiling as vector which can be expressed as reduction of mos-
a diagnostic tool in malaria infections. quito survival or decrease in parasite fecundity. Vector
longevity will definitely have a better impact on
Bioinformatics malaria transmission and this is also emphasized by
In addition to the above experimental approaches, bio- the expanded immune repertoire present in the mos-
informatics analysis of Plasmodium genes at a systems quito. Large-scale functional genetic screen of mos-
level has revealed several interesting features of the quito genes pointed out PRRs (putative pattern
parasite. A systems level interactome analysis of chap- recognition receptors), TEP (thioester-containing pro-
erones in Plasmodium falciparum has been reported teins), CTL (C-type lectins) and LRIM (leucine rich-
based on the presence of orthologs of parasite proteins repeat immune proteins) to be especially important in
and established yeast-two-hybrid data (Pavithra et al. this regard (reviewed by Bongfen et al. 2009). This
2007). This study predicted the possible roles of sev- conjecture was further supported by a global proteomic
eral hypothetical proteins based on their interactions study on the saliva of A. gambiae which revealed
with P. falciparum–encoded chaperones and uncovers overrepresentation of proteins involved in signaling
parasite-specific chaperone-dependent pathways. In and immune response functions (Choumet et al.
addition, the study proposed a putative mechanism by 2007). A recent large-scale genomic study addressed
D 588 Disease System, Malaria

the susceptibility of different Anopheles populations to References


infection by human malaria parasites and found that
the exophilic subgroup is more susceptible than the Acharya P, Pallavi R, Chandran S, Chakravarti H et al
(2009) A glimpse into the clinical proteome of human
endophilic subgroup (Reighle et al. 2011). Such stud-
malaria parasites Plasmodium falciparum and Plasmodium
ies will play an important role in determining the vivax. Proteomics Clin Appl 3:1314–1325
epidemiology of malaria and develop better control Acharya P, Pallavi R, Chandran S, Dandavate V et al (2011)
and prevention strategies. Clinical proteomics of the neglected human malarial parasite
Plasmodium vivax. PLoS One 6(10): e26623
Banumathy G, Singh V, Pavithra SR, Tatu U (2003) Heat shock
Conclusion protein 90 function is essential for Plasmodium falciparum
From the above discussion, it is clear that systems growth in human erythrocytes. J Biol Chem 278(20):
level approaches have been extensively initiated to 18336–18345
Bongfen SE, Laroque A, Berghout J, Gros P (2009) Genetic and
understand the biology of the malaria parasite and
genomic analyses of host-pathogen interactions in malaria.
have indeed been fruitful in uncovering several Trends Parasitol 25(9):417–422
novel aspects of parasite biology that would not Bozdech Z, Llinas M, Pulliam BL, Wong ED, Zhu J, DeRisi JL
have been possible by classical studies involving (2003) The transcriptome of the intraerythrocytic cycle of
Plasmodium falciparum. PLoS Biol 1(1):E5
one gene or one protein. There are emerging evi-
Choumet V, Carmi-Leroy A, Laurent C, Lenormand P et al
dences which highlight that physiology of the parasite (2007) The salivary glands and saliva of Anopheles gambiae
is different under in vitro and in vivo conditions as an essential step in Plasmodium life cycle: a global prote-
(reviewed by LeRoux et al. 2009), and, thus, efforts omic study. Proteomics 7(18):3384–3394
Daily JP, Scanfeld D, Pochet N, Le Roch K et al (2007) Distinct
should be made to understand the biology of the
physiological states of Plasmodium falciparum in malaria
parasite directly isolated from malaria patients. This infected patients. Nature 450:1091–1095
will give insight into the real disease scenario and Gardner MJ, Hall N, Fung E, White O et al (2002) Genome
enable better understanding of disease pathogenesis, sequence of the human malaria parasite Plasmodium
falciparum. Nature 419(6906):498–511
transmission, and therapeutic targets. In the future,
Gracia LS (2010) Malaria. Clin Lab Med 30(1):93–129
these studies may reveal several features of malaria Le Roch KG, Zhou Y, Blair PL, Grainger M et al (2003) Discov-
infection and will thereby contribute to the discovery ery of gene function by expression profiling of the malaria
of novel antimalarial drug targets as well as vaccine parasite life cycle. Science 301(5639):1503–1508
LeRoux M, Lakshmanan V, Daily JP (2009) Plasmodium
candidates. Such studies have provided an opportu-
falciparum biology: analysis of in vitro versus in vivo growth
nity for scientific groups with different expertise to conditions. Trends Parasitol 25(10):474–481
pursue research in a coherent manner, and this forms Li JV, Wang Y, Saric J, Nicholson JK et al (2008) Global
the key feature for the success and impact of systems metabolic responses of NMRI mice to an experimental Plas-
modium berghei infection. J Proteome Res 7(9):3948–3956
biology studies.
Olotu A, Moris P, Mwacharo J, Vekemans J et al (2011)
Circumsporozoite-Specific T Cell Responses in Children
Vaccinated with RTS,S/AS01(E) and Protection against
Cross-References P falciparum Clinical Malaria. PLoS One 6(10): e25786
Olszewski KL, Mathew MW, Morrisey JM, Garcia BA,
Vaidya AB, Rabinowitz JD, Llinas M (2010) Branched tri-
▶ Biological Disease Mechanism Networks carboxylic acid metabolism in Plasmodium falciparum.
▶ Biomarkers Nature 466(7307):774–778
▶ Biomarkers, Clinical Relevance Pavithra SR, Kumar R, Tatu U (2007) Systems analysis of
chaperone networks in the malarial parasite Plasmodium
▶ Graph Alignment, Protein Interaction Networks
falciparum. PLoS Comput Biol 3(9):1701–1715
▶ Gene Expression Biomarkers Rask TS, Hansen DA, Theander TG, Gorm Pedersen A,
▶ Host–Pathogen Interactions Lavstsen T (2010) Plasmodium falciparum erythrocyte
▶ Interactome membrane protein 1 diversity in seven genomes – divide
and conquer. PLoS Comput Biol 6(9):e1000933
▶ Mass Spectrometer
Reighle MM, Guelbeogo WM, Gneme A, Eiglmeier K et al
▶ Proteomics (2011) A cryptic subgroup of Anopheles gambiae is highly
▶ Systems Pharmacology, Drug-Target Networks susceptible to human malaria parasites. Science
▶ Vaccinomics 331(6017):596–598
Disease System, Parkinson’s Disease 589 D
sleep dysfunctions and dementia. Most PD cases are
Disease System, Parkinson’s Disease sporadic arising spontaneously with unknown origin.
PD is common among subjects aged >60 years with
Rajeswara Babu Mythri1, Shireen Vali2 and the prevalence and severity increasing with age. Inter-
M. M. Srinivas Bharath1 estingly, men are more affected than women (Hoehn
1
Department of Neurochemistry, National Institute of and Yahr 1967; Chaudhuri et al. 2006). Diagnosis of
Mental Health and Neurosciences (NIMHANS), PD even today is symptomatic and depends on neuro-
Bangalore, Karnataka, India logical recognition of clinical symptoms.
2
Cell Works Group Inc., Bangalore, India D
Pathology of PD
Voluntary movement is controlled in the brain by the
Synonyms nigrostriatal pathway involving dopaminergic neurons
originating from the substantia nigra (SN) region of the
Movement disorders; Neurodegenerative diseases ventral mid brain and projecting into the striatum
(Fig. 1). These neurons synthesize and supply the
neurotransmitter ▶ dopamine (DA) to the striatum
Definition where it participates in a complicated neurochemical
network that ultimately controls body movement. Bio-
Parkinson’s disease (PD) is a movement disorder and chemical, pathological, and imaging data from PD
an age-associated neurodegenerative disease. Motor patients have indicated that a gradual loss of these
impairment in PD is caused by degeneration of dopa- dopaminergic neurons causing decreased DA supply
minergic neurons in the substantia nigra (SN) of the result in motor impairment and PD symptoms (Burke
midbrain which leads to depletion of the neurotrans- 1998).
mitter dopamine in the striatum. Neurodegeneration Although there are several approaches for PD ther-
in PD is a culmination of several interdependent pro- apy, a permanent cure is not available. Most pharma-
cesses including oxidative stress, mitochondrial dam- cological drugs strive at replenishing the lost DA; but
age, protein aggregation, proteasome inhibition, many patients develop motor complications with
etc. However, the dynamics and interdependence of chronic treatment (Diaz and Waters 2009). Further,
the disease pathways, their temporal order, synergy, most drugs do not exhibit significant neuroprotection.
and regulation cannot be understood completely by The failure to obtain an effective PD drug is attributed
isolated experiments. Systems biology based partly to the lack of complete understanding of the
dynamic modeling and predictive analyses supported pathology at the molecular level. Therefore, there is
by experimental data can address this issue and pro- a need to obtain a comprehensive network of
vide a superior understanding of PD pathology at the interacting molecules and pathways involved in neu-
molecular level aimed at improved diagnosis and ronal death in PD.
therapy.
Molecular Mechanisms in PD
The etiology of PD is contributed by a combination of
Characteristics physiological ▶ aging, environmental factors, and
genetic mutations. Neurodegeneration in PD involves
Parkinson’s disease (PD) is an age-associated neuro- interdependent mechanisms such as ▶ oxidative stress,
degenerative disease clinically defined as a movement mitochondrial damage (▶ Mitochondrial Dysfunction,
disorder. The clinical symptoms of PD include Parkinson’s Disease), proteasome inhibition
akinesia (impaired body movement), rigidity, resting (▶ Proteasome Inhibition, Parkinson’s Disease), pro-
tremor, and postural instability. The patients also tein ▶ aggregation, neuroinflammation, etc. (Betarbet
exhibit nonmotor symptoms including autonomic dys- et al. 2002). The brain is particularly vulnerable to
function, cognitive, neurobehavioral, sensory, and oxidative stress because, it (1) consumes relatively
D 590 Disease System, Parkinson’s Disease

Disease System, Parkinson’s Disease, Fig. 1 Nigrostriatal pathway in normal and PD brains

higher amounts of oxygen (2) accumulates lipids and the precise relationship, synergy, and temporal order
iron that promote oxidative damage (3) has lower among these pathways are not clear (Fig. 2).
antioxidant defenses. Among the brain cells, neurons
in general and dopaminergic neurons in particular are Systems Biology Applications in
highly susceptible to oxidative damage because they Neurodegeneration
(1) generate oxidative stress during dopamine metab- It could be surmised that designing experiments to
olism (2) have lower levels of the antioxidant ▶ gluta- analyze the comprehensive dynamics of all the events
thione (GSH) and (3) increased iron content. in PD could be difficult. However, systems biology
Mitochondrial damage in SN neurons is caused by based in silico predictive technology can address this
oxidative stress mediated selective inhibition of issue by recapitulating an accurate and simultaneous
mitochondrial complex I (CI) (Bharath et al. 2002; view of individual and interlinked pathways at the
Abou-Sleiman et al. 2006). molecular level. Such a virtual platform should include
Inhibition of protein processing is linked to aggrega- all the relevant proteins and their genes and transcripts
tion of cellular proteins in the SN dopaminergic neurons with their relationship quantitatively represented. The
as intraneuronal protein deposits called “Lewy bodies platform should integrate these species in intra and
(LBs)” (▶ Lewy Bodies, Role of Alpha-syn). LBs rep- intercellular pathways providing a comprehensive
resent a pathological hallmark of PD with a-synuclein view at the cellular and tissue level. The platform
(a-syn) protein as the major component (Shults 2006). should be certified against predefined in vitro and
Neuroinflammatory pathways also contribute to the in vivo studies reported in scientific literature with
degenerative process in PD (Glass et al. 2010). flexibility to include new data. Such a platform could
Although PD involves a complex network of events, assay all intermediate and endpoint biomarkers and
Disease System, Parkinson’s Disease 591 D

Disease System, Parkinson’s Disease, Fig. 2 Interplay among different disease pathways in PD

manipulate different triggers, inhibitors, and activa- stress, proteasomal dysfunction, protein aggrega-
tors. This also includes percentage, knockout, dose– tion, neurotrophin/growth factor signaling, etc.
response, overexpression, and mutational analysis. 3. Amalgamation of signals from all cell types
Different customized studies can be defined and exe- involved, including dopaminergic neurons,
cuted through an interactive graphical user interface microglia, and astrocytes and relevant pathways
mode or a high-throughput approach. covering all disease stages.
Accordingly, a typical in silico PD platform should 4. Incorporation of different triggering factors includ-
include an exhaustive list of ▶ molecular markers ing environmental toxins (rotenone, 1-methyl-4-
representing important pathways in normal physiology phenyl-1,2,3,6-tetrahydropyridine or MPTP
and neurodegeneration as follows: (▶ Parkinson’s Disease, MPTP) and genetic factors
1. Markers that represent physiological aging to dis- (familial mutations) which induce
tinguish aging and neurodegeneration. Aging state neurodegeneration in the aging neurons.
is depicted by markers of oxidative stress, mito- We recently generated such a platform to carry out
chondrial activity, and quantitative changes in insu- simulations linking disease pathways in PD with
lin growth factor, melatonin, homocysteine, etc. experimental validation (Vali and Bharath 2009). The
2. Pathways representing PD state including neurode- methods in model construction are as follows and are
generative and neuroinflammatory events and applicable to similar biological phenomena:
related endpoint biomarkers of mitochondrial dys- A bottom–up approach was adopted to build the plat-
function, oxidative stress, endoplasmic reticulum form. Initially, the various phenomena related to
D 592 Disease System, Parkinson’s Disease

neuronal dynamics were built as base modules and mitochondrial damage with therapeutic potential in
progressively integrated. These modules included: PD. In a third study, modeling envisaged that the
(a) Comprehensive mitochondrial bioenergetics A53T mutant of a-syn can accumulate and disrupt
(b) Generation of reactive oxygen and nitrogen spe- mitochondrial function in dopaminergic neurons. In
cies (ROS/RNS) including dopamine metabolism the presence of proteasome inhibition, mitochondrial
and iron turnover is further reduced, resulting in decreased ATP
(c) GSH and Ca2+dynamics synthesis and in turn decreasing GSH synthesis.
(d) Proteasome inhibition and protein aggregation
Modeling involved a detailed study of the individual
pathways and components. Interactions among these Cross-References
components and their regulation were arranged to
obtain a basic interaction map. The static map was ▶ Aging
made dynamic by incorporating (1) physiological con- ▶ Dopamine
centrations of individual molecules and enzymes/ ▶ Genetic Factor in Parkinson’s Disease
proteins (2) catalytic rates and reaction mechanisms in ▶ Glutathione
flux equations. Most of these data were obtained from ▶ Lewy Bodies, Role of Alpha-syn
published scientific literature involving experimental ▶ Mitochondrial Dysfunction, Parkinson’s Disease
methods. After individual modules were built and vali- ▶ Molecular Markers, In Silico Parkinson’s Disease
dated against published data, the cross-talk among these Platform
phenomena was integrated such that the output from ▶ Oxidative Stress, Protein Damage
a module either created the input to another module or ▶ Parkinson’s Disease, MPTP
provided regulatory control. These interactions cross- ▶ Proteasome Inhibition, Parkinson’s Disease
linked and integrated the base modules thus generating
a complete integrated system. The performance of such
a complex network would be different from isolated References
experimental data obtained from these phenomena.
Abou-Sleiman PM, Muqit MM, Wood NW (2006) Expanding
The modeling of kinetic phenomena such as time- insights of mitochondrial dysfunction in Parkinson’s disease.
dependent potential differences in trans-mitochondrial Nat Rev Neurosci 7:207–219
membrane potential, proton motive force, ion fluxes, Betarbet R, Sherer TB, Di Monte DA, Greenamyre JT
and other metabolic reactions were performed utilizing (2002) Mechanistic approaches to Parkinson’s disease path-
ogenesis. Brain Pathol 12:499–510
modified “Ordinary Differential Equations” and “Mass Bharath S, Hsu M, Kaur D, Rajagopalan S, Andersen J (2002)
Action Kinetics” and the integration of the pathways Glutathione, iron and Parkinson’s disease. Biochem
were solved by the Radau and Euler methods. Such Pharmacol 64:1037–1048
modeling experiments reconcile numerous formerly Burke RE (1998) Parkinson’s disease. In: Koliatsos VE, Ratan
RR (eds) Cell death and disease of the nervous system.
unrelated features of PD in a chronological manner Humana, Totowa, pp 459–475
thus elucidating disease progression based on the com- Chaudhuri KR, Healy DG, Schapira AH (2006) Non-motor
bined molecular actions of different mechanisms. symptoms of Parkinson’s disease: diagnosis and manage-
Using this platform, we carried out few studies ment. Lancet Neurol 5:235–245
Diaz NL, Waters CH (2009) Current strategies in the treatment
exploring the mechanistic and therapeutic aspects of of Parkinson’s disease and a personalized approach to man-
PD (Vali and Bharath 2009). Firstly, we integrated agement. Expert Rev Neurother 9:1781–1789
GSH metabolism and mitochondrial dysfunction asso- Glass CK, Saijo K, Winner B, Marchetto MC, Gage FH
ciated with PD. This study inferred that the mitochon- (2010) Mechanisms underlying inflammation in
neurodegeneration. Cell 140:918–934
drial damage affected the cellular GSH synthesis Hoehn MM, Yahr MD (1967) Parkinsonism: onset, progression
thereby enhancing the oxidative damage and exacer- and mortality. Neurology 17:427–442
bating neurodegeneration. Secondly, we have also Shults CW (2006) Lewy bodies. Proc Natl Acad Sci USA
traced the neuroprotective function of curcumin 103:1661–1668
Vali S, Bharath MMS (2009) Dynamic virtual prototype of
(a polyphenol from turmeric) and its bioconjugates. Parkinson’s disease at the cellular and molecular abstraction
We found that curcumin and its conjugates induced level: focus on neurodegeneration physiology. Biobytes
GSH production, protected against oxidative stress and 5:40–46
Disease-oriented Causal Networks 593 D
variable affects the certainty of the state of another
Disease Taxonomy variable hence these causal networks are graphical
representations of causal relationships between the
▶ Disease Classification or Discrimination variables of the graph. For example, obesity is one of
the factors for developing insulin resistance, which in
turn is the cause for type 2 diabetes.
Disease-oriented Causal Networks

Sanjeev Kumar1 and Shipra Agrawal2 Characteristics D


1
BioCOS Life Sciences Private Limited, Bangalore,
Karnataka, India Usage of the Disease-Oriented Causal Networks
2
BioCOS Life Sciences Pvt. Limited, Institute of Disease mechanisms can be explained using causal
Bioinformatics and Applied Biotechnology, networks. These networks are used to identify genes,
Bangalore, Karnataka, India underlying pathways or interactions that can be causal
driver of the disease. For example, causal genes
involved in the pathogenesis of a disease can be
Definition revealed by studying gene regulatory networks,
which are directed graphs and depict causal interac-
DOCN represents the relationship across disease, can- tions and functional correlation between the genes.
didate genes, regulatory genes and their functions to The intricate genetic interactions and gene-to-
define the causal relationship through a gene or protein phenotype correlation of a complex disease could be
network. The connectivity across the network is better understood by integrating data from multiple
established through directed graphs where the nodes sources (genotype information, gene-expression, PPI
or genes are variables, which are connected through information, etc.) to construct the causal probabilistic
edges. The graph indicates interaction between and networks. Such networks are based on probabilistic
across the genes (nodes) to ultimately describe the approaches where it is believed that one node in the
causal mechanism of a disease. In the DOCN, the network affects the state of other nodes in the network
graph shows how the change of the state of one (Fig. 1).

ACGTGGCCGTCAA
Causal SNP
ACGTGGGCGTCAA

Genotype Information Data integration &


Network modeling

Regulatory gene
Protein interaction
information Causal genes
Causal probabilistic
network

Disease-oriented Causal
Networks, Fig. 1 Data
integration and network
modeling to create a causal Gene-expression
probabilistic disease network information
D 594 Diseasome

Data Integration to Create Causal Disease Definition


Networks
The advantage of integrating the multiple molecular Dispositions are properties that need enabling condi-
data that is, RNA profiling, genotyping/SNP typing, tions for being manifest.
protein expression and/or protein–protein interaction
is to enhance the identification of genes in genomic
loci or disease loci. Further, the data integration Characteristics
methods also help in knowing whether given loci are
jointly associated with the disease. This association Dispositional Versus Categorical Properties
may be due to the co-alteration in transcript/protein Properties of objects or systems are usually distin-
levels, or two closely linked loci are altered indepen- guished into dispositional properties and categorical
dently to affect the RNA/protein levels. The statistical properties. Everyday paradigm cases of dispositional
procedures are used examine the joint probabilities of properties are, for instance, courage and fragility; par-
genotype association, RNA/protein expression, and adigm cases of categorical properties are shape and
clinical disease data. The entire data could be modeled structure of an object or a system. Whether it is possi-
to know whether they are related in a causal or reactive ble to explicitly define individual dispositions (and to
relationship. provide a general definitional scheme for “disposition”
as a generic term) is a major dispute in the debate about
dispositions (see section “Conditional Analysis”).
References In the case of dispositional properties, it is impor-
tant to distinguish between an object or system having
Barabási A-L, Gulbahce N, Loscalzo J (2012) Network medi- a property on the one hand and manifesting (▶ mani-
cine: a network-based approach to human disease. Nat Rev festation) the property under ▶ enabling conditions on
8(4):286–295
the other hand. A person may be a courageous person
Schadt EE, Friend SH, Shaywitz DA (2009) A network view of
disease and compound screening. Nat Rev Drug Disc all his life, but has only few occasions to show coura-
8(4):286–295 geous behavior. Similarly, a glass may be fragile but
this disposition will become manifest (i.e., the glass
will break) only if given certain enabling or stimulus
conditions obtain (e.g., striking of the glass). On the
contrary, if objects or systems possess categorical
Diseasome properties (e.g., the roundness of a billiard ball) they
will be manifest unconditionally. Thus, concepts for
▶ Biological Disease Mechanism Networks categorical properties do not entail the distinction of
having the property and manifesting it.
In the biological sciences many examples of disposi-
tional properties can be found. These examples include:
the capacity of amino acid chains to fold into a specific
Disposition three-dimensional structure, the capacity of genes to
become activated, the ability of muscle fibers to contract,
Andreas H€uttemann and Marie I. Kaiser the pluripotency and totipotency of cells, the fitness
Department of Philosophy, University of Cologne, (capacity to reproduce successfully and survive) of organ-
Cologne, Germany isms, the ▶ evolvability/adaptability of populations, and
the sustainability of ecosystems.

Synonyms The Importance of Dispositions in Science


Dispositions have been controversial since early
Capacity; Potentiality; Power; Tendency modern times because they were conceived of as
Disposition 595 D
hidden causes (▶ causality) that bring about effects analysis, the necessary and sufficient conditions for
(i.e., their manifestations). Moliére in his Le Malade s having D can be symbolized as follows:
imaginaire ridicules explanations (▶ Explanation
in Biology) in terms of dispositions by pointing out SCA : Ds $ ðEs ! MsÞ
that one might explain why opium puts people
to sleep by appealing to its “dormitive virtue.” which is to be read as: s has Disposition D if and only
However, it is explanatory empty to refer to hidden if: if s were confronted with E, then s would necessarily
causes that are epistemically accessible only via manifest M. Thus, given SCA and given the knowl-
a single effect. edge regarding how to test the counterfactual claim D
Since the 1930s (cf. Carnap 1936) it became appar- “Es ! Ms,” we know under which conditions we can
ent that dispositional concepts do play an important legitimately attribute D to s.
role in science and furthermore interest in disposi- One problem with the SCA is that manifestations
tions as an analytical tool for characterizing science cannot easily be specified. What exactly are the man-
has resurged considerably in the last two decades ifestations of being courageous or of fragility (cf. Prior
(e.g., Mellor 2000; Choi and Fara 2012). The 1985, 6–10)? Likewise, it is difficult to spell out the
concept of a disposition is an important tool in exact enabling conditions for a disposition (e.g., break-
the analysis of science because it points to the fact ing, hitting, and throwing in particular ways). This is
that the properties/behavior of systems may only even truer for biological dispositions because the way
be manifest given certain enabling or stimulus condi- in which the context is involved in the manifestation is
tions, for example, contextual factors. This is partic- diverse and complicated, and the enabling conditions
ularly true in the biological sciences. We attribute are very complex.
many properties to biological systems (e.g., the abil- Another significant problem for the simple condi-
ity of muscle fibers to contract) that become manifest tional analysis is a family of counterexamples that
only given the presence of specific enabling condi- shows that the right hand side of SCA (Es ! Ms) is
tions (e.g., the presence of ATP and an appropriate neither necessary nor sufficient for the left hand side
stimulus). (Ds). There are various such counterexamples
discussed under the headings of “antidotes,” “finks,”
Conditional Analysis “masks,” etc. For example, if we understand “fatally
Certain aspects of dispositions have been debated (see poisonous” as “disposed to kill if ingested,” someone
Mumford 1998 for a comprehensive overview). We might take the poison but, nevertheless, survive
will discuss some of these issues in order to clarify because of some antidote that has been ingested as
what is implied by the attribution of a disposition to an well (Bird 2007, 27). In such a case, the substance is
object or a system. fatally poisonous, but the manifestation does not take
First, what are the conditions under which we can place even though the enabling conditions (ingestion)
legitimately attribute a disposition D to a system s? To did occur. A fortiori the right hand side of SCA is not
give a precise answer to this question requires speci- a necessary condition for the left hand side. There are
fying the relation between having a disposition and possible interferences, which invalidate SCA. Thus,
manifesting it. One major issue in the debate about the manifestation of a disposition requires not only
dispositions is whether this connection can be made enabling conditions but also the absence of interfering
more precise – whether particular dispositions can be factors. Only if all of these conditions can be listed
defined explicitly in terms of their manifestations and explicitly, the SCA would provide an explicit defini-
enabling conditions. tion of a dispositional concept. It is, however,
The starting point for such attempts is the so-called a controversial issue whether it is even in principle
▶ simple conditional analysis (SCA). Let Ds stand for possible to list all relevant factors. Take the example
system s having the disposition D, that is, s being of the differentiability of cells. The process of mani-
disposed to M (manifestation) provided enabling con- festation, that is, the differentiation of a cell into
ditions E obtain. According to the simple conditional a specific cell type is a very complex and temporally
D 596 Disposition

extended process, which requires that many genes are efficacy. For example, whether a population is evolvable
correctly activated or repressed, that plenty of proteins or not is not independent of contextual factors like
are properly synthesized and interact in the right way migratory abilities and landscape topography. Hence,
with each other. According to the ▶ complexity of the the intrinsic character of the biological disposition
differentiation process, numerous factors could disturb “evolvability” is called into question.
this process and prevent the manifestation. It is hardly
imaginable that one could (even in principle) prepare Single-Track Versus Multi-Track Dispositions
a complete list of all possible interfering factors. Courage, it seems, is a disposition that will be
manifested in different situations by different behaviors.
Intrinsicality It is a multi-track disposition, that is, one disposition
A second instructive debate concerns the with multiple possible manifestations. However, the
▶ intrinsicality of dispositions. Roughly speaking, SCA-tradition has often assumed that dispositions are
a property is intrinsic if a system possesses the property individuated in terms of one set of enabling conditions
independently of what is going on in its context. Shape and one manifestation (single-track dispositions). The
is an intrinsic property, whereas being smaller than drawback is a proliferation of dispositions, for example,
everybody else in the room is an extrinsic (relational) different courage-dispositions – one for each kind of
property. courageous behavior, for example, courage in the face
The rationale for attributing a disposition to of death and courage in the face of financial stress.
a particular system seems to imply that dispositions In biology, there are many possible candidates for
are intrinsic. The rationale is as follows: The phenom- multi-track dispositions: the manifestation of
enon of sugar dissolving in water is, strictly speaking, evolvability for a population can result in different
a property of a combined system – sugar plus water. If changes of gene frequency of a population; the
we describe the phenomenon in terms of a disposition pluripotency of stem cells can become manifest in
being manifest rather than in terms of a property of muscle cells, bone cells, etc. But on closer inspection
a compound system, we usually introduce a distinction it becomes apparent that the characterization of these
between a system (e.g., sugar), which is endowed with dispositions as “multi-track” depends on a fine grained
a disposition, and external, for example, contextual analysis of the manifestation states. If we raise the
conditions. If we ascribe solubility to sugar, then we graininess of the analysis, just one and not multiple
focus on those conditions for obtaining of the phenom- possible manifestation states can be identified. For
ena that are due to sugar only. The disposition (solu- example, the evolvability of a population will be man-
bility) comprises exactly those conditions of the ifest if its gene frequency has changed independent of
phenomenon that the system (sugar) possesses inde- the kind of gene whose frequency has changed and
pendently of what is going on in the context. Thus, independent of the exact dimension of the change.
even though the manifestation of dispositions (e.g., the
dissolving in the case of solubility) depends on extrin- Reduction
sic factors, it is usually held that the disposition itself A further frequently disputed question concerns the
(e.g., the solubility of salt) is intrinsic. But issue of ▶ reduction. Dispositions, such as fragility,
intrinsicality may not be a necessary feature of dispo- are necessary conditions for the obtaining of the man-
sitions (cf. McKitrick 2003; Choi and Fara 2012). The ifestation (provided the simple conditional analysis or
challenge is particularly clear in the case of some something akin is correct). This is often analyzed as:
biological systems: The importance of the context Fragility is causally efficacious (▶ causality) in bring-
undermines the claim that all dispositional properties ing about the manifestation. An ensuing question that
are intrinsic. As Alan Love (2003) has pointed out for has been widely discussed is whether a disposition can
the example of the ▶ evolvability of populations be considered causally efficacious on its own or
(▶ adaptation), in many cases external factors are not whether it is causally efficacious in virtue of an under-
only the enabling conditions for biological dispositions. lying causal basis, such as molecular structure.
Rather, they determine jointly with intrinsic factors the It is important to distinguish two issues in this
very nature of the disposition as well as its causal debate. First, fragility and other every-day dispositions
Distance Transform 597 D
are macroscopic properties. We tend to assume that Mellor DH (2000) The semantics and ontology of dispositions.
macroscopic properties of systems can be reduced to Mind 109:757–780
Mumford S (1998) Dispositions. Oxford University Press,
their molecular structure. A glass, for instance, is frag- Oxford
ile in virtue of its molecular structure. This, however, is Prior E (1985) Dispositions. Aberdeen University Press,
true for dispositional and categorical properties alike. Aberdeen
The glass has its shape (a categorical property) in
virtue of its molecular structure (and/or arrangement)
as well. So this is not a special issue for dispositions.
A second, different, issue is whether there can be Distance Field D
bare dispositions or whether every dispositional prop-
erty needs to be reduced to categorical properties, such ▶ Distance Transform
as the microstructural configuration. The question is ▶ Distance Transform and Travel Depth
whether there might be irreducible dispositional prop-
erties that cannot be identified with a set of categorical
(e.g., microstructural) properties. The physical prop-
erty “charge” or other fundamental dispositions might Distance Map
be candidates for bare dispositions because there
are no microstructural properties that they might be ▶ Distance Transform
identified with. ▶ Distance Transform and Travel Depth

Cross-References
Distance Transform
▶ Adaptation
▶ Causality Virginio Cantoni1,2, Riccardo Gatti1 and
▶ Complex System Luca Lombardi1
▶ Complexity 1
Department of Computer Engineering and Systems
▶ Enabling Conditions Science, University of Pavia, Pavia, Italy
▶ Evolvability 2
Computational Biology, KTH Royal Institute of
▶ Explanation in Biology Technology, Stockholm, Sweden
▶ Intrinsicality
▶ Manifestation
▶ Reduction Synonyms
▶ Simple Conditional Analysis (SCA)
Distance field; Distance map

References
Definition
Armstrong DM, Martin CB, Place UT (1996) Dispositions –
a debate. Routledge, London The Distance Transform (DT) is an operator usually
Bird A (2007) Nature’s metaphysics: laws and properties. applied into the domain of 2D binary image (where
Oxford University Press, Oxford
Carnap R (1936) Testability and meaning. Phil Sci 3:420–471
each point is classified as foreground or background)
Choi S, Fara M (2012) Dispositions. In: Zalta EN (ed) The but it can be extended to 3D domains too. The result of
Stanford Encyclopedia of Philosophy (Spring 2012 Edition), the transform is a new image whose foreground pixels
http://plato.stanford.edu/archives/spr2012/entries/dispositions/ are labeled with a value that represents the minimum
Love A (2003) Evolvability, dispositions, and intrinsicality. Phil
Sci 70:1015–1027
distance from the background.
McKitrick J (2003) A case for extrinsic dispositions. Aust J Phil There are many different types of DT which differ
81:155–174 mainly on the type of metric used to evaluate the
D 598 Distance Transform and Travel Depth

Distance Transform,
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Fig. 1 Distance transform
examples with city block and 0 0 0 1 1 1 0 0 0 0 0 0 0 1 1 1 0 0 0 0
chessboard metrics
0 0 0 1 2 2 1 1 0 0 0 0 0 1 2 1 1 1 0 0
0 0 1 2 3 3 2 1 0 0 0 0 1 1 2 2 2 1 0 0
0 1 1 2 3 3 3 2 1 0 0 1 1 1 2 3 2 1 1 0
0 0 0 1 2 2 2 2 1 0 0 0 0 1 2 2 2 1 1 0
0 0 0 1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

distance. Most commonly used ▶ digital metrics are ▶ convex hull (or a related surface), is called travel
“chessboard” metric, “Euclidean” metric, or “city depth. For the evaluation of this feature, an effective
block” metric (Fig. 1). tool is given by the ▶ distance transform.
The simplest method to obtain a DT is to apply
a sequence of erosion operations from mathematical mor-
phology with a proper structuring element defined through Characteristics
the chosen metric. This sequence of operations must be
performed until all foreground pixels are covered. In biology, the travel depth parameter has been intro-
The DT is applied in skeletonization, shape descrip- duced by Coleman and Sharp (Coleman and Sharp 2006)
tion, and symmetry evaluation processes. (Coleman and Sharp 2010). The travel depth is the value
of the distance associated to each point of a pocket sur-
face from the ▶ convex hull (a 2D example is shown in
Cross-References Fig. 1). The shape and the properties of the molecular
surface determine what interactions are possible with
▶ Distance Transform and Travel Depth ligands or other macromolecules. In particular, the active
sites are generally described as shallow and deep spots on
the molecular volume. Experimentations have shown
that active sites usually correspond with areas on the
Distance Transform and Travel Depth surface having a high travel depth value and thus they
coincide with the bottom of protein’s pockets.
Virginio Cantoni1,2, Riccardo Gatti1 and Luca
Lombardi1 How to Obtain Travel Depth for a Given Protein 3D
1
Department of Computer Engineering and Systems Structure
Science, University of Pavia, Pavia, Italy The first step is to define the reference protein’s vol-
2
Computational Biology, KTH Royal Institute of ume inside a cubic grid of voxels (Cantoni et al. 2010).
Technology, Stockholm, Sweden This reference volume could be, for example, the van
der Walls volume, the solvent excluded surface (SES),
or the solvent accessible surface (SAS). The second
Synonyms step is to build the convex hull of the molecule’s
volume that will define the boundaries in the 3D
Distance field; Distance map space in which to apply the distance transform algo-
rithm. The convex hull of a molecule is the smallest
convex polyhedron that contains the molecule voxels.
Definition In R3 the convex hull is constituted by a set of facets,
usually triangles and a set of ridges (boundary ele-
For the pockets analysis in proteomics, the minimum ments) that are edges. Each triangle that belongs to
distance of a point from a reference surface, that is the the convex hull must then be inside the 3D grid and
Distributed Data Access 599 D
• BCH represents the surface of the convex hull.
• minn2ne(dn + wn) represents the minimum value
among the distances de in the near neighbors
belonging to D already defined, incremented by
the displacement wj between the locations (e, n):
C
that is, if e and n have a common face wn ¼ 1; if
A
e and n have a common edge wn ¼ √2; if e and
D
E n have a common vertex wn ¼ √3. At each iteration,
new voxels, inside R, are reached by the propaga- D
B tion process and the value they take is determined
F
by the neighbor distance (from the convex hull) and
the voxels distance from the neighbor involved; this
in order to simulate an isotropic propagation pro-
cess and the proper distance evaluation.
Distance Transform and Travel Depth, Fig. 1 A 2D • E ¼ Ø corresponds to the regime condition: no other
example. A, B, C, and D are points with high travel depth changes are given and the connected component of
value and possible active sites candidates R, adjacent to the border BCH, is completely
covered.
also all the voxels belonging to the volume of the The travel depth represents the distance of each
convex hull must be labeled inside the grid. The region voxel of A from BCH.
on which the distance transform is applied is called
concavity volume and is obtained by:
Cross-References
R ¼ CH \ RV (1)
▶ Convex Hull
where R is the concavity volume, CH the convex hull, ▶ Distance Transform
and RV the reference molecular volume (e.g., the SES).
Within the region R the following propagation is References
applied:
( ) Cantoni V, Gatti R, Lombardi L (2010) Segmentation of SES for
1 if i 2 BCH protein structure analysis. In: Proceedings of bioinformatics
Di ¼ 2010, Valencia, INSTICC Press: Setubal, PT, pp 83–89
0 otherwise
Coleman RG, Sharp KA (2006) Travel depth, a new shape
A ¼ BCH ; descriptor for macromolecules: application to ligand bind-
ing. J Molecular Biology 362:441–458
N ¼ ðA  K Þ \ R; Coleman RG, Sharp KA (2010) Protein pockets: inventory,
E ¼ N  A; shape and comparison. J Chem Inf Model 50:589–603
while E 6¼  do
8e 2 E : de ¼ minn2ne ðdn þ wn Þ;
A ¼ N; Distributed Data Access
N ¼ ðA  K Þ \ R;
E ¼ N  A; Eugenio Cesario
done ICAR-CNR (Istituto di Calcolo e Reti ad Alte
Prestazioni), Rende, CS, Italy
Where:
• A represents the increasing set of voxels contained
in R; E corresponds to the recruited set of near Synonyms
neighbors of A contained in R (i.e., the voxels
reached by the last propagation step). Data collection (integration) from distributed sources
D 600 Distributed Data Access

Definition researchers all over the world. For example, an orga-


nization that generates data, that is, a sequencing lab-
Distributed Data Access refers to the search and explo- oratory, starts by processing raw data (collected by
ration of data stored in distributed data repositories, as sample tracking), followed by analytical processing
well as the gathering of data from various sources to to translate the signal to measurements, and finally to
find interesting and useful information and answer obtain biological data (such as sequence tags or abun-
a specific scientific question. dance of gene or proteins). Daily production rates are
in the order of tens of gigabytes (Topaloglou et al.
2004). For example, from 1996 to 2010 the GenBank
Characteristics (the NIH genetic sequence database populated by an
annotated collection of all publicly available DNA
Biological Data Issues sequences) increased the number of entries with an
Biological data (protein structure and function, DNA exponential rate, that is, from 1 million sequences to
sequences, and metabolic pathways) can concern many 49 million (GenBank 2011).
life science domains, such as genetics, structural biol-
ogy, microarrays, pharmacology, etc. They are Data Access and Integration
characterized by two main features: heterogeneity Biological information is highly interconnected and
and huge volume. often context-dependent. In practice, genomic data
and its associated information generated by experi-
Heterogeneity mental or computational methods is stored in hundreds
Biological repositories are highly diverse in both vari- of independent, overlapping, and heterogeneous data
ety and granularity. For example, biological data types resources geographically distributed. They are stored
can vary from images and drawings to graph struc- in a variety of formats, ranging from unstructured data
tures, from unstructured text to data tables, from (i.e., textual data) to strongly structured database data,
sequence to three-dimensional protein structures, depending on its content (Baralis and Fiori 2008).
etc. Moreover, each researcher can focus on different Moreover, there are millions of articles composing
levels of biological problems. This makes the reposi- the scientific research literature, most of them accessi-
tory store data with different granularities, such as, ble on the Web.
from genomic and protein sequences to protein activ- Biological scientists often require the execution of
ities, from cell structure to two-dimensional or three- “cross-queries,” that is, the discovery of information
dimensional structured data of huge molecules (Tao from different repository locations and the merging of
2006). Table 1 shows a list of the biological data types results retrieved by various datasets (Haider et al.
and a related schematic description of their content 2009). For such a reason, a typical data-integration
(BioInfoBank Library 2010). problem is the gathering of data from various sources
to find relevant information and answer a specific sci-
Huge Volume entific question. To do that, the simple solution of
Due to the advances of the research activity in the life moving all these data into a central location for inte-
science, huge amounts of data are generated by grated querying with other resources is unfeasible, due

Distributed Data Access, Table 1 Types of biological data


Experimental Data revealed from direct laboratory experiments (observations, digital images, notes, etc.)
data
Raw data Data which have never been a subject of manipulation or processing
Sequence data Data containing protein sequences or obtained from a DNA sequencing process
Structure data Data modeling three-dimensional protein structures, DNA, RNA, or small molecules
Phylogenetic Data about evolutionary relations among various groups of organisms (information is revealed through molecular
data sequencing data and morphological data matrices)
Metabolic data Data containing metabolic pathways (enzymatic reactions in living organisms) and systems biology information
Distributed Data Access 601 D
to physical transfer challenges (too long transfer time) enabling users to ask complex questions over data
and privacy-preserving issues. On the contrary, the sources that may be located at different geographical
most common approach currently exploited consists locations. It is used to access many of the large
in maintaining the information stored in geographi- biological datasets in the public domain, such as
cally distributed databases whereby individual data dbSNP Ensembl genomic, Uniprot protein, Reactome
providers are responsible for updates and release pathway, HGNC gene name, Wormbase genomic, and
cycles (Smedley et al. 2008). The query response is PRIDE proteomic data (a complete list is available at
obtained by collecting and merging the information http://www.biomart.org/biomart/martview/). As of
obtained from various data sources and return a final March 2009, BioMart Central Portal brings together D
result to the user. Lastly, the results to be returned an extensive range of databases serving more than 100
should be in standard formats and where possible, datasets with an average monthly usage of over one
semantically annotated to ensure interoperability with million server hits (Haider et al. 2009).
other databases and tools (Haider et al. 2009). In the
following we describe two systems that have been Data Analysis
developed for biological data integration. The number of available complete genomic sequences
The Distributed Annotation System (DAS) (Dowell is doubling almost every 12 months (Meyer 2006),
et al. 2001) is a widely adopted protocol for dynami- whereas according to Moore’s law, available compute
cally integrating a wide range of biological data from cycles (i.e., computational power) double every
geographically diverse sources. Its applicability is 18 months. That is, biological data to be processed
growing and evolving in response to new challenges are growing more quickly than computational and
facing integrative bioinformatics. An extended version technological instruments. Additionally, since the
of the DAS specification (version 1.53E) incorporates analysis of genomic sequences requires binary com-
several recent developments, including its extension to parisons of the genes involved in it, the computational
serve new data types and an ontology for protein fea- overhead is very high. The impact of such issues is
tures (Jenkinson et al. 2008). Data distribution, plotted in Fig. 1 (Meyer 2006), which contrasts the
performed by DAS servers, is separated from visuali- number of genetic sequences obtained with the number
zation, which is done by DAS clients that integrate of annotations generated. The figure shows that the
information from multiple servers. It allows a single knowledge (annotations, models, patterns) has
machine to gather up sequence annotation information a sublinear rate with respect to the available data
from multiple distant Web sites, collate the informa- sequences which they are extracted from.
tion, and display it to the user in a single view. The To handle this abundance in data availability
DAS specification has several client implementations (whose rate of production often far outstrips the
(Jenkinson et al. 2008): the Ensembl genome browser capability of the scientists to analyze it), automatic
(to display data from a wide variety of genomic, gene, data analysis techniques are used. In particular, the
and protein sequence coordinate systems), SPICE (to exploitation of Data Mining algorithms (Fayyad et al.
combine protein sequence and structural annotations), 2006; Grossman et al. 2001) in science helps scientists
the DASMIweb portal (to integrate protein-protein and in hypothesis formation and gives them a support on
domain-domain interaction datasets), and iPfam (to their scientific practices and solving environments,
compare the interaction topologies of different sources getting the benefits coming from knowledge that can
by overlaying them in a node graph). be extracted from large data sources. Moreover,
Recently the BioMart Central Portal, a system since data is large and is maintained over geographi-
offering access to a wide set of biological datasets, cally distributed sites, the computational power of
has been implemented (Haider et al. 2009). It is distributed systems is often exploited for knowledge
a Web server interface of BioMart software (Smedley discovery in scientific data. Distributed Data Mining
et al. 2009) and provides a unified view over disparate algorithms are very suitable to such a purpose.
data sources that enable bioscientists to retrieve data The Grid (Foster et al. 2003) is a privileged
from one or multiple sources in a simple and efficient computing infrastructure to develop applications over
way. It provides access to a variety of datasets that can geographically distributed sites. The Grid involves the
be queried independently or in a federated way integrated and collaborative use of remote computing
D 602 Distributed Data Access

Distributed Data Access, 100000000

Number of entries in database


Fig. 1 Using a logarithmic
scale, the growth of sequence 10000000
databases and annotations
1000000

100000
Sequences
10000
Annotations
1000

100

1982
1984
1986
1988
1990
1992
1994
1996
1998
2000
2002
2004
2006
Year

power, storage, software, and data managed and shared Dowell RD, Jokerst RM, Day A, Eddy SR, Stein L (2001) The
by different organizations. This technology has shown distributed annotation system. BMC Bioinformatics 2:7
Eurogrid (2001) http://www.eurogrid.org/
to be very reliable in solving large-scale bioinformat- Fayyad U, Haussler D, Stolorz P (2006) Mining scientific data.
ics-related problems and improving the efficiency and Commun ACM 39(11):51–57
effectiveness of the computation on biological data Foster I, Kescbrselman C, Nick J, Tuecke S (2003) The physi-
(Cesario and Talia 2010). In the last years, various ology of the grid. In: Berman F, Fox G, Hey A (eds) Grid
computing: making the global infrastructure a reality. Wiley,
international scientific projects have been developed New York, pp 217–249
(and are currently under development) on this field. GenBank (2011) http://www.ncbi.nlm.nih.gov/genbank/
The most important are Euro BioGrid (Eurogrid 2001), Grossman RL, Kamath C, Kegelmeyer P, Kumar V, Namburu
Asia Pacific BioGrid (APBiogrid 2001), UK BioGrid RR (2001) Data mining for scientific and engineering appli-
cations. Kluwer Academic, Dordrecht/Boston
(UKBiogrid 2001), and North Carolina BioGrid Haider S, Ballester B, Smedley D, Zhang J, Rice P,
(NCBiogrid 2001) showing that the Grid is a reliable Kasprzyk A (2009) BioMart Central Portal – unified access
and useful infrastructure for the management and anal- to biological data. Nucleic Acids Res 37:23–27
ysis of distributed biological data. Jenkinson AM, Albrecht M, Birney E, Blankenburg H, Down T,
Finn RD, Hermjakob H, Hubbard TJP, Jimenez RC, Jones P,
Kahari A, Kulesha E, Macı́as JR, Reeves GA, Prlic A (2008)
Integrating biological data – the distributed annotation
Cross-References system. BMC Bioinformatics 9:S3
Meyer F (2006) Genome sequencing vs. Moore’s law: cyber
challenges for the next decade. Trends and tools in bioinfor-
▶ General-Purpose Computation, Graphics Processing matics and computational biology. CTWatch Quart 2(3)
Units http://www.ctwatch.org/quarterly/articles/2006/08/genome-
▶ Grid Computing, Parallelization Techniques sequencing-vs-moores-law/
NCBiogrid (2001) http://www.ncbiogrid.org/
Smedley D, Swertz MA, Wolstencroft K, Proctor G,
Zouberakis M, Bard J, Hancock JM, Schofield P (2008)
References Solutions for data integration in functional genomics:
a critical assessment and case study. Brief Bioinform
APBiogrid (2001) http://compaq.apbionet.org/grid/ 9(6):532–544
Baralis E, Fiori A (2008) Exploring heterogeneous biological Smedley D, Haider S, Ballester B, Holland R, London D,
data sources. In: 19th international workshop on database and Thorisson G, Kasprzyk A (2009) BioMart – biological
expert systems applications, Turin, Italy, pp 647–651 queries made easy. BMC Genomics 10:22
BioInfoBank Library (2010). http://lib.bioinfo.pl/courses/view/160 Tao C (2006) Toward making online biological data machine
Cesario E, Talia D (2010) Using grids for exploiting the abun- understandable. In: Proceedings of the 5th international
dance of data in science. Scalable Comput Pract Exp semantic web conference (ISWC 2006) doctoral consortium,
11(3):251–262 Athens, GA, November 2006, pp 992–993
Distributed Data Management 603 D
Topaloglou T, Davidson SB, Jagadish HV, Markowitz VM, Distributed Data
Steeg EW, Tyers M (2004) Biological data management: Information organized in a semantically related way,
research, practice and opportunities. In: Proceedings of the
thirtieth international conference on very large data bases, also divided in pieces of information with rules to be
Toronto, pp 1233–1236 able in reconstructing them. Pieces of information may
UKBiogrid (2001) http://www.mygrid.org.uk/ represent data duplication (minimizing data loss risks)
or partition (saving space).

Distributed Data Management Characteristics D

Pietro Hiram Guzzi1, Giuseppe Tradigo2 and Availability of data in digital way with an increasing
Pierangelo Veltri2 data precision and quality is increasing the need of
1
Department of Experimental Medicine and Clinic, both support and spaces for data storing as well as
University Magna Gratia of Catanzaro, Catanzaro, procedures and structures for data exchanging. In
Italy such a scenario, the term Distributed Data Manage-
2
Department of Medical and Surgical Sciences, ment refers to a set of methodologies, architectures,
University Magna Græcia of Catanzaro, and tools enabling the efficient management of data
Catanzaro, Italy stored in geographically distributed databases aiming
both to reduce access time and to allow efficient
knowledge extraction. In research contexts, distributed
Synonyms data management refers to a set of technologies
enabling the realization of a distributed laboratory in
Data Management: Data processing; Data storing and which different research unit collaborate performing
querying different experiments on the same project.
Distribution may help in saving spaces and improve
data availability and replication. It is the case of bio-
Definition logical data management where data produced are
huge and information extraction often is a hard task.
Database For example, data produced by many experimental
A database is a set of information semantically orga- platforms used in the biological field have been largely
nized and correlated such that it can be used to guide used in many studies to identify molecules that may be
processes or opportunely combined to form knowledge. related to human diseases (Cannataro et al. 2010).
Databases can be reconducted to database management Computational intensive applications for data manip-
systems that include all procedures necessary to orga- ulation require distributed processing, as in biological
nize, store, index, and query data and to keep data applications, where, thanks to always more accurate
always consistent, that is, always valid. techniques to manipulate or to simulate information
obtained from molecules are available (Cannataro
Distributed Processing et al. 2010), a typical study involves large number of
The distributed processing means solving an samples and huge amount of data. Data distribution
input problem by means of subprocesses evaluated on allows retrieving information in similar way as in
different processing unit. Processing unit may centralized data management structure, allowing scal-
be located on single computational machine or can be ability in terms of data and users, where parallel data
dislocated on different machine wired and located in manipulation from different users allows to improve
the same building or geographically distributed. knowledge of the database.
Main requirements of distributed data storing with
Data Management several nodes each with part of database and part of
Organizing information in data containers for storing, local data are as follows:
saving, and maintaining them allowing querying and • The introduction of a commonly shared data model
information retrieval. able to capture both raw data of the experiment and
D 604 Distributed Query Optimization

related metadata; currently existing approaches are Cross-References


often based on XML-based languages for the rep-
resentation of data and metadata, for example, all ▶ General-Purpose Computation, Graphics Processing
the languages developed by the HUPO-PSI Units
initiative (www.hupo-psi.org). ▶ Grid Computing, Parallelization Techniques
• The definition of a uniform and widely accepted
access and manipulation strategy for such large
datasets enabling the sharing of information coming
from single experiment on a specified laboratory References
has to be shared and validated in a distributed lab-
oratory environment. Branden C, Tooze J (1999) Introduction to protein structure,
2nd edn. Garland Science, New York. ISBN 0815323050
• The definition of a high performance data transfer Cannataro M, Guzzi PH, Veltri P (2010) Protein-to-protein
strategy. interactions: technologies, databases, and algorithms. ACM
• The definition of a set of rules and prescription Comput Surv 43(1):1
guaranteeing data privacy. Valduriez P (1993) Parallel database systems: open problems
and new issues. Distrib Parallel Databases 1(2):137–165.
The advantages of distribution for data management doi:10.1007/BF01264049, ISSN 0926–8782
has been considered in previous studies (see for Veltri P (2008) Algorithms and tools for analysis and manage-
instance Valduriez (1993)), and it has been becoming ment of mass spectrometry data. Brief Bioinform 9(2):
always more and more required for solving problems 144–155
where computational time requires several nodes and
input data is very large (it is the case for instance of
proteins interactions simulation or protein structure
prediction Branden and Tooze (1999)). As an appli-
cation example, consider a distributed laboratory as Distributed Query Optimization
a framework environment composed of several
nodes, each one associated to a laboratory running ▶ Distributed Query Processing
its local database. The framework efficiently supports
distributed storage and manipulation of experimental
data. Each node contains an application programming
interface mounted on a data storage system
hosting data produced by the laboratory. Scientists Distributed Query Processing
may cooperate working each in his/her own labora-
tory each one running tens of experiments on different Steve R. Pettifer and Teresa K. Attwood
available biological data sets (as for instance for Faculty of Life Sciences and School of Computer
mass spectrometers as in Veltri (2008)). Such Science, University of Manchester, Manchester, UK
a configuration can potentially lead to terabytes of
data produced each week or even each day. Pushing
these considerations to their limit data processing Synonyms
(e.g., reduction of noise and allowing data compara-
ble in terms of instrument accuracy) can easily scale- Distributed Query Optimization; Distributed Querying
up minimal computational requirements. Algorithms
for data manipulation and information extraction, that
often use access to external databases, in a such Definition
scaled-up context become heavy time-consuming
tasks, whereas in a distributed data management A distributed query is a kind of database query that
environment allow the cooperation and problems interrogates multiple databases. These can be
tractable. colocated (typically for performance), or distributed
Diversity 605 D
across geographically separate locations (typically as computer files. It is most commonly used in
used for data integration). Each component database projects where a team of people may be working on
usually holds only a part of the integrated whole. The the same files concurrently. In contrast to a central-
source query is translated into several individual ized version control system, in a distributed version
queries, which are executed on the separate databases. control system there is no single central repository.
The results of each query are then reassembled to give In the case of the CellML model repository, this
the required result. allows modelers to be able to work independent
of the online CellML model repository, and share
their changes directly with each other until they D
Cross-References decide the model is ready to be uploaded into the
repository.
▶ Data Integration and Visualization The version control software employed by the
CellML model repository is Mecurial (http://mercu-
rial.selenic.com/).

Distributed Querying

▶ Distributed Query Processing Cross-References

▶ CellML Model Repository

Distributed Revision Control

▶ Distributed Version Control System (DVCS) References

Mecurial http://mercurial.selenic.com/

Distributed Version Control System


(DVCS)
Distribution-Free Tests
Catherine M. Lloyd
Auckland Bioengineering Institute, University of ▶ Hypothesis Testing, Parametric vs Nonparametric
Auckland, Auckland, New Zealand

Synonyms
Disturbance
Decentralized Version Control; Distributed revision
control; Revision control; Source control; Version ▶ Perturbation
control

Definition
Diversity
Version control is the management of changes to
documents, programs, and other information stored ▶ Diversity (D) Gene
D 606 Diversity (D) Gene

(upstream) and in 30 (downstream) a recombination


Diversity (D) Gene signal (50 D-RS and 30 D-RS, respectively) (▶ Recom-
bination Signal (RS)). These 50 D-RS and 30 D-RS are
Marie-Paule Lefranc specifically recognized by the enzyme recombinase
Laboratoire d’ImmunoGénétique Moléculaire, that, in the most usual chronology, allows first a D
Institut de Génétique Humaine UPR 1142, Université gene to be rearranged to a J gene, and then a V gene
Montpellier 2, Montpellier, France to be rearranged to the previously rearranged D-J gene.
The sequence resulting from the V-D-J rearrangement
encodes the V-DOMAIN (▶ Variable (V) Domain) of
Synonyms the IGH, TRB and TRD chains (▶ Chain Type)
(Lefranc and Lefranc 2001a, b).
D gene; Diversity; Diversity gene

Cross-References
Definition
▶ Chain Type
The diversity (D) gene, or “diversity” is ▶ Configuration Type
a ▶ leafconcept of the “▶ GeneType” concept of iden- ▶ Constant (C) Gene
tification (generated from the ▶ IDENTIFICATION ▶ Conventional Gene
Axiom) of ▶ IMGT-ONTOLOGY, the global refer- ▶ Gene Type
ence in ▶ immunogenetics and ▶ immunoinformatics ▶ IMGT ® Information System
(Giudicelli and Lefranc 1999; Lefranc et al. 2004, ▶ IMGT-ONTOLOGY
2005, 2008; Duroux et al. 2008), built by IMGT ®, the ▶ IMGT-ONTOLOGY, IDENTIFICATION Axiom
international ImMunoGeneTics information system ® ▶ IMGT-ONTOLOGY, Leafconcept
(http://www.imgt.org) (▶ IMGT ® Information Sys- ▶ Immunogenetics
tem). “Diversity” identifies a gene that rearranges ▶ Immunoglobulin Synthesis
at the DNA level and codes the diversity region of ▶ Immunoinformatics
the variable domain of an immunoglobulin (IG) ▶ Joining (J) Gene
or antibody or of a T cell receptor (TR) chain ▶ Recombination Signal (RS)
(▶ Chain Type). ▶ Variable (V) Domain
It is one of the four leafconcepts that are ▶ Variable (V) Gene
characteristics of the IG and TR loci (Lefranc
and Lefranc 2001a, b), the other three being “vari-
able” (V), “joining” (J), and “constant” (C) (▶ Vari-
References
able (V) Gene, ▶ Joining (J) Gene, ▶ Constant (C)
Gene). Duroux P, Kaas Q, Brochet X, Lane J, Ginestoux C, Lefranc M-P,
The diversity (D) genes are observed in three loci: Giudicelli V (2008) IMGT-Kaleidoscope, the Formal IMGT-
IG heavy (IGH), TR beta (TRB), and TR delta (TRD), ONTOLOGY paradigm. Biochimie 90:570–583
Giudicelli V, Lefranc M-P (1999) Ontology for immunogenet-
where they participate to V-D-J rearrangements ics: the IMGT-ONTOLOGY. Bioinformatics 12:1047–1054
(Lefranc and Lefranc 2001a, b) (▶ Immunoglobulin Lefranc M-P, Lefranc G (2001a) The immunoglobulin
Synthesis). An IG or TR diversity (D) gene has three FactsBook. Academic Press, London, pp 1–458
possible and exclusive configurations (▶ Configura- Lefranc M-P, Lefranc G (2001b) The T cell receptor FactsBook.
Academic Press, London, pp 1–398
tion Type): a germline configuration (before DNA
Lefranc M-P, Giudicelli V, Ginestoux C, Bose N, Folch G,
rearrangement), a partially rearranged configuration Guiraudou D, Jabado-Michaloud J, Magris S, Scaviner D,
(after D-J DNA rearrangement or, less frequently, Thouvenin V, Combres K, Girod D, Jeanjean S, Protat C,
V-D or D-D rearrangements), and a rearranged config- Yousfi Monod M, Duprat E, Kaas Q, Pommié C, Chaume D,
Lefranc G (2004) IMGT-ONTOLOGY for immunogenetics
uration (after a complete V-D-J DNA rearrangement,
and immunoinformatics. In Silico Biol 4:17–29
that eventually may involve several D). In the germline Lefranc M-P, Clément O, Kaas Q, Duprat E, Chastellan P,
configuration, a diversity (D) gene possesses in 50 Coelho I, Combres K, Ginestoux C, Giudicelli V, Chaume D,
DNA Methylation 607 D
Lefranc G (2005) IMGT-Choreography for immunogenetics
and immunoinformatics. In Silico Biol 5:45–60 DNA Damage Checkpoint
Lefranc M-P, Giudicelli V, Regnier L, Duroux P (2008) IMGT,
a system and an ontology that bridge biological and
computational spheres in bioinformatics. Brief Bioinform ▶ Cell Cycle Checkpoints
9:263–275

DNA Damage Response


Diversity Gene D
▶ Cell Cycle Arrest After DNA Damage
▶ Diversity (D) Gene

DNA Labeling
Dividing Cells Depletion
▶ Modeling, Cell Division and Proliferation
▶ Quantifying Lymphocyte Division, Methods

DNA Chip DNA Methylation

Yan Zhang
▶ DNA Microarrays
Key Laboratory of Systems Biology, Shanghai
Institutes for Biological Sciences, Chinese Academy
of Sciences, Shanghai, China
DNA Content

▶ Lymphocyte Labeling, Cell Division Investigation


Definition
▶ Quantifying Lymphocyte Division, Methods
DNA methylation refers to the addition of a methyl
group to the 5 position of the cytosine pyrimidine ring
or the number 6 nitrogen of the adenine purine ring in
DNA strand. This modification can be inherited through
DNA Damage
cell division. The attachment of a methyl group to these
nucleotides can serve many important biological pur-
Paolo Plevani
poses and is a crucial part of normal development and
Dipartimento di Scienze Biomolecolari e
cellular differentiation in higher organisms.
Biotecnologie, Università di Milano, Milan, Italy

Characteristics
Definition
The DNA in many different types of organisms can
DNA is modified by numerous chemico-physical
undergo DNA methylation, though it does not always
agents causing a variety of lesions on the DNA
necessarily serve the same function. In plants, for exam-
molecule.
ple, scientists believe that methylation occurs to deacti-
vate genes that could otherwise cause harmful
Cross-References mutations. In fungi, DNA methylation is used to mod-
erate and control the expression of certain genes based
▶ Cell Cycle Checkpoints on the particular conditions affecting the fungus.
D 608 DNA Methylation

DNA Methylation, Unmethylated cytosine


Fig. 1 Inheritance of the CH3
DNA methylation pattern. The
T T C G A T T A C G A
DNA methyltransferase can 5’ 3’
methylate only the CG
sequence paired with 3’ 5’
A A G C T A A T G C T
methylated CG. The CG
sequence not paired with
CH3
methylated CG will not be
methylated
DNA replication

CH3

T T C G A T T A C G A T T C G A T T A C G A
5’ 3’ 5’ 3’

3’ 5’ 3’ 5’
A A G C T A A T G C T A A G C T A A T G C T

CH3
Methylation Methylation

CH3 CH3

T T C G A T T A C G A T T C G A T T A C G A
5’ 3’ 5’ 3’

3’ 5’ 3’ 5’
A A G C T A A T G C T A A G C T A A T G C T

CH3 CH3

Not recognized by
DNA methyltransferase

Methylation in mammals similarly moderates and characteristics necessary for multicellular life from
inhibits the expression of certain genes; additionally, it a single immutable sequence of DNA. DNA methyla-
is involved in the production of chromatin, a protein- tion also plays a crucial role in the development of
DNA complex that makes up the structure of nearly all types of cancer (Jaenisch and Bird 2003).
chromosomes. DNA methylation at the 5 position of cytosine has
DNA methylation stably alters the gene expression the specific effect of reducing gene expression and has
pattern in cells. DNA methylation is typically removed been found in every vertebrate examined. In adult
during zygote formation and reestablished through somatic tissues, DNA methylation typically occurs in
successive cell divisions during development. How- a CpG dinucleotide context; non-CpG methylation is
ever, the latest research shows that hydroxylation of prevalent in embryonic stem cells (Lister et al. 2009).
methyl group occurs rather than complete removal of
methyl groups in zygote (Iqbal et al. 2011). Some
methylation modifications that regulate gene expres-
References
sion are inheritable and are referred to as epigenetic
regulation (Fig. 1). Iqbal K, Jin SG, Pfeifer GP, Szabó PE (2011) Reprogramming of
In addition, DNA methylation suppresses the expres- the paternal genome upon fertilization involves genome-
sion of viral genes and other deleterious elements that wide oxidation of 5-methylcytosine. Proc Natl Acad Sci
USA 108(9):3642–3647
have been incorporated into the genome of the host over
Jaenisch R, Bird A (2003) Epigenetic regulation of gene expres-
time. DNA methylation also forms the basis of chroma- sion: how the genome integrates intrinsic and environmental
tin structure, which enables cells to form the myriad signals. Nat Genet 33(suppl (3s)):245–254
DNA Microarrays 609 D
Lister R, Pelizzola M, Dowen RH, Hawkins RD, Hon G, Tonti- Definition
Filippini J, Nery JR, Lee L, Ye Z, Ngo QM, Edsall L,
Antosiewicz-Bourget J, Stewart R, Ruotti V, Millar AH,
Thomson JA, Ren B, Ecker JR (2009) Human DNA This term refers to hybridization-based analytic plat-
methylomes at base resolution show widespread epigenomic forms composed of hundreds to millions of DNA
differences. Nature 462(7271):315–322 probes with defined sequences arrayed on a solid sur-
face to measure, in parallel, nucleic acids present in
a complex sample (Wheelan et al. 2008). The sample is
first labeled with a fluorescent dye and then hybridized
DNA Microarrays to the DNA microarray. After hybridization and wash- D
ing, the fluorescence signal intensities from each probe
J€
urg B€ahler and Samuel Marguerat are recorded using a laser scanner, providing
Department of Genetics, Evolution & Environment a semiquantitative measure for the amount of nucleic
and UCL Cancer Institute, University College London, acid molecules whose sequence is complementary to
London, UK any given probe. One or two samples can be hybridized
on a single microarray. When two samples are ana-
lyzed together, they are labeled with different dyes and
Synonyms the relative intensities of the two dyes are analyzed for
each probe (two-color array; Fig. 1). When only one
Arrays; cDNA microarrays; DNA chip; Microarrays; sample is analyzed, the absolute signals of the probes
Oligo microarrays are used instead. Microarray probes are either PCR

dye A
Culture A Culture B ratio
dye B

Data extraction Gene 1 12


and analysis Gene 2 0.3
Gene 3 1.1
...

RNA
extraction

Microarray
scanning
sm3445

Microarray

cDNA synthesis
and labelling
+
Microarray
hybridisation
+

dye A dye B

DNA Microarrays, Fig. 1 Schematic representation of a two-color microarray experiment


D 610 DNA Modification

products or shorter DNA primers (typically 25–60 Cross-References


nucleotides). Probes are either printed on a solid sur-
face using a robot, or synthesized directly on the ▶ DNA Replication
surface.
DNA microarrays provide a versatile platform to
analyze RNA transcript levels and structures as well as
genome structure (array CGH). When applied to the DNA Repair
study of ▶ transcriptomes, DNA microarrays typically
consist of probes directed against annotated gene fea- Paolo Plevani
tures. However, a specialized type of microarray, Dipartimento di Scienze Biomolecolari e
called “tiling array,” consists of probes with sequences Biotecnologie, Università di Milano, Milan, Italy
tiled systematically across a genome. Tiling arrays
permit the analysis of the transcriptional landscape of
cells or tissues without being restricted by existing Definition
gene annotations.
The various types of molecular processes repairing
specific classes of lesions in the DNA.
Cross-References

▶ Cell Cycle Analysis, Expression Profiling Cross-References

▶ Cell Cycle Checkpoints


References

Wheelan SJ, Martı́nez Murillo F, Boeke JD (2008) The incred-


ible shrinking world of DNA microarrays. Mol Biosyst
4:726–732
DNA Replication

Zoi Lygerou1, K. K. Koutroumpas2 and John Lygeros2


1
School of Medicine, Laboratory of General Biology,
DNA Modification University of Patras, Patras, Greece
2
Automatic Control Laboratory, ETH Zurich, Zurich,
▶ Post-Replication Modification Switzerland

Synonyms
DNA Polymerases
DNA synthesis
Zoi Lygerou
School of Medicine, Laboratory of General Biology,
University of Patras, Patras, Greece Definition

DNA replication is the process of making an identical


Definition copy of the genetic material within each cell
(Alberts et al. 2007; DePamphilis 2006). In eukaryotes,
DNA polymerases are enzymes that synthesize new DNA replication takes place during a defined period
DNA by moving along the template strand and synthe- of the ▶ cell cycle, called S (for synthesis) phase.
sizing a new strand of complementary DNA sequence DNA replication must be carried out with great
by nucleotide polymerization in the 50 to 30 direction. precision every time the cell divides, so that genetic
DNA Replication 611 D
information is preserved. Control mechanisms ensure place at the replication fork. Okazaki fragments on the
that every base of the genome is replicated once and lagging strand are stitched together by the action of the
only once per cell cycle, thereby safeguarding genomic endonuclease Fen1 and DNA ligase. A fork protection
integrity. complex (consisting of timeless, tipin, claspin, and
And1/ctf4) safeguards integrity of the fork when poly-
merases are forced to stall. Topoisomerases ensure that
Characteristics topological tension introduced by replication is
relieved. Newly synthesized DNA is repackaged into
We present key characteristics of DNA replication in chromatin by histone deposition complexes such as D
eukaryotic cells. CAF-1, which is recruited to the replication fork
by interactions with PCNA. The replisome copies
Replication Forks Move Continuously Along the the leading and lagging strand at the same time and
Genome as Replisomes Catalyze DNA Synthesis replication forks move continuously along the genome,
Each cell must accurately copy millions of bases of producing identical copies of the cell’s genetic material.
DNA (six billion base pairs in a human cell) before
every cell division. DNA replication initiates from DNA Replication in Eukaryotes Is Complex and
thousands of sites along eukaryotic chromosomes, Uncertain
called ▶ replication origins. Unwinding of the double- Origin selection and activation is a crucial part of
stranded DNA at replication origins (origin firing) cre- replication and various organisms have evolved differ-
ates a replication bubble consisting of two Y-shaped ent ways to define origins of replication (Gilbert 2004).
DNA structures called ▶ replication forks where DNA In bacteria, there is a single, sequence-specific origin
synthesis takes place (Fig. 1). Replication forks move of replication and origin activation is deterministic: the
outward in both directions from the origin as the DNA is origin fires in every cell cycle with high fidelity. At the
replicated, eventually merging with a replication fork other extreme, in early fly and frog embryos origin
moving in the opposite direction (fork conversion). selection is a stochastic process. In Xenopus
Since the two strands of the template DNA have oppo- preblastula embryos, where replication must be com-
site orientations (antiparallel) and DNA synthesis can pleted fast, replication initiates apparently at random
only take place in the 50 to 30 direction, one of the DNA and at short intervals (8–15 kb) without discernible
strands in a replication fork is replicated continuously sequence specificity. Most eukaryotic cells seem to
(leading strand) while the other is replicated in follow an intermediate route between a fully determin-
a backstitch fashion in pieces of around 100–200 base istic and a fully random origin selection mechanism.
pairs, called Okazaki fragments (lagging strand). Replication initiates from relatively specific regions
Multisubunit protein complexes comprising over 100 along the genome. In each cell cycle a fraction of
proteins (▶ Replisome) carry out the steps required for these regions are activated, giving rise to a different
DNA synthesis (DePamphilis 2006). A DNA helicase, distribution of initiation events along the genome at
made up from the hetero-hexameric MCM complex every S-phase. Moreover, the timing of firing of each
assisted by the tetrameric GINS complex and Cdc45 origin of replication is not fully determined: though
(CGM complex), unwinds the double-stranded tem- some origins tend to fire on average early and others
plate ahead of the replication fork, while replication late, generating a reproducible timing program in a
protein A (RP-A) binds and stabilizes the single- population of cells, the time at which a given origin
stranded DNA exposed by the helicase. ▶ DNA Poly- will fire may differ from cell to cell, giving rise to
merase a-primase lays down RNA-DNA primers for uncertainty also in the time domain. The process of
replication and is then replaced (polymerase switching) DNA replication thus follows a unique pattern in each
by polymerase e on the leading strand and polymerase d cell in a population. Every cell must therefore remem-
on the lagging strand. The replication clamp PCNA ber, at every point in time, which parts of its genome
(proliferating cell nuclear antigen), a homo-trimer have been replicated, and should not be replicated
loaded by the clamp loader RF-C (replication factor a second time and which parts remain unreplicated.
C) encircles the DNA, holds the polymerases in place, Such molecular memory is brought about by origin-
and choreographs the multiple transitions that take bound multisubunit protein complexes.
D 612 DNA Replication

DNA Replication,
Fig. 1 The eukaryotic DNA Fork movement
replication fork

MCM
3’
PCNA GINS 5’
Pol d
Cdc45
Lagging strand
5’
3’ RP-A

Okazaki fragment
Pol e
PCNA

d
an
str
ing
5’ ad
Le
3’

Dynamic Origin-Bound Complexes Safeguard The Cell Cycle Control System Orchestrates
Once per Cell Cycle Replication Transitions
Every part of the genome must be replicated once and DNA replication must be accurately controlled in
only once per cell cycle. This is ensured by ordered space and time and must be coordinated with cell
transitions in protein complexes bound at origins of cycle progression. How are the ordered transitions of
replication (Blow and Dutta 2005, Fig. 2). When mito- origin-bound complexes brought about at the correct
sis is completed and cells move into a new G1 phase, point in time during the cell cycle? Cyclin dependent
all putative origins become licensed (▶ DNA Replica- kinases (CDKs) (▶ Cyclins and Cyclin-dependent
tion Licensing) for a round of replication by the assem- Kinases), the master regulators of the cell cycle,
bly of ▶ prereplicative complexes onto origins, cross-talk with origin-bound complexes to ensure
consisting of the six subunit origin recognition com- once per cell cycle replication (Blow and Dutta 2005,
plex (ORC), the loading factors Cdc6/Cdc18 and Cdt1, Fig. 2). When mitosis is completed, CDK activity
and the six subunit MCM complex, which will later act levels are low. Only then can pre-replicative com-
as the replicative helicase. As cells move into S-phase, plexes assemble at origins of replication (window of
specific origins are activated by modifications and opportunity). Increase in CDK activity levels at the
recruitment of additional factors (Cdc45, the GINS G1/S transition signals conversion of pre-replicative
complex, Dpb11/TopBP1, Sld2/RecQL4, Sld3/ complexes to pre-initiation complexes and entry into
Treslin, Mcm10, and others) which turn the pre- S-phase. CDK activity levels over the G1/S transition
replicative complex into a pre-initiation complex, threshold however inhibit further assembly of pre-
lead to activation of the helicase, recruitment of poly- replicative complexes, ensuring that licensing will
merases, and origin firing. Complexes which remain at only take place again after mitosis has been completed
origins after firing (or passive replication from a fork and CDK activity levels have dropped (▶ Cell Cycle
coming in from a nearby origin) are at the post- Transitions, Mitotic Exit). The CDK cycle therefore
replicative state, and cannot support replication initia- restricts licensing and replication to different windows
tion again until mitosis has been completed and a new of the cell cycle, guarding against re-replication. To
round of licensing has taken place. It is therefore the ensure that mitosis only occurs after DNA replication
composition of complexes along the DNA which dic- has been completed, replication complexes signal to
tate when and where replication will initiate. the cell cycle control system to inhibit CDK activity
DNA Replication 613 D
DNA Replication, Replication
Fig. 2 DNA replication
licensing
Replisome

Origin Licensing Block to re-replication


D
ORC Cdt1 ORC
Cdc6 G1 G2

MCM
Low CDK High CDK
Pre-replicative Complex Post -replicative Complex

from reaching the levels required for mitotic entry single-cell level and statistically analyze the properties
(▶ G2/M Checkpoint) until every part of the genome of the process at the population level. Spiesser et al.
has been replicated. Defects in the cross-talk between (2009) modeled replication in Saccharomyces deter-
CDKs and origin-bound complexes can lead to over- ministically using data for location and firing times of
replication of the genome or a catastrophic entry into a fraction of replication origins. De Moura et al. 2010
mitosis with unreplicated DNA. developed a stochastic model of DNA replication
which was employed for quantitative analysis of the
Mathematical Models of DNA Replication dynamics of replication of chromosome VI of S.
As eukaryotic DNA replication is characterized by cerevisiae, while Yang et al. 2010 presented an ana-
a high degree of uncertainty, both in the location and lytical model of DNA replication with which replica-
in the time of activation of origins along the genome, tion timing across the S. cerevisiae genome was
mathematical models have been employed to capture analyzed. A model to analyze replication fork failure
how DNA replication may be progressing in each cell in metazoan cells was developed by Blow and Ge
in a population and to allow accurate interpretation of 2009. The model assumes stochastic origin activation
experimental data (reviewed in Hyrien and Goldar in a cluster of 5–100 potential origins on a circular
2010). One of the first models to be developed was 250 kb DNA molecule, modeling replication in a series
a stochastic model for DNA replication based on the of discrete time steps.
KJMA model of phase transition kinetics, originally
used to analyze single-molecule data of DNA replica- Uncertainty and Robustness in DNA Replication
tion in cell-free extracts of Xenopus laevis embryos Model predictions and single cell experiments indicate
and further exploited for analyses of DNA replication that random selection and activation of origins of rep-
dynamics (Herrick et al. 2002; Hyrien and Goldar lication results in an exponential distribution of dis-
2010). A stochastic hybrid model of eukaryotic DNA tances between active origins. Such a distribution
replication incorporating exact locations and firing would produce infrequent large inter-origin gaps,
propensities of origins along a complete genome was which would need a long time to be replicated. This
proposed by Lygeros et al. 2008. The model was complication of random origin selection has been
instantiated using experimental data for Schizosac- named the random completion or random gap prob-
charomyces pombe and Monte Carlo simulations lem. Several hypotheses have been proposed to resolve
were used to reproduce full genome replication at the this paradox (reviewed in Legouras et al. 2006;
D 614 DNA Replication

Hyrien and Goldar 2010), including redistribution of the process of DNA replication is organized in time
a limiting factor, defined spacing of active origins, or and space and how it cross-talks with other cellular
origin redundancy. It has also been suggested that processes, such as transcription and the inheritance of
S-phase may in fact last longer than previously epigenetic states.
assumed, occupying much of what is currently thought
of as the G2 phase of the cell cycle (Lygeros et al. 2008).
Given the uncertainty inherent in stochastic origin Cross-References
selection, why have eukaryotes not opted for
a deterministic mode of origin selection? It is likely ▶ Cell Cycle
that stochastic origin selection offers robustness because ▶ Cyclins and Cyclin-dependent Kinases
of redundancy (Legouras et al. 2006): there are many ▶ DNA Polymerases
more origins ready to fire than those actually required to ▶ DNA Replication Licensing
complete S-phase which can be used if need arises. One ▶ Prereplicative Complex
example of this is DNA damage: when cells experience ▶ Replication Fork
DNA damage and active forks arrest, the presence of ▶ Replication Origin
dormant origins becomes essential for the completion of ▶ Replisome
DNA replication (Blow and Ge 2009). Differences in
transcriptional programs in different cell types offers
another example: transcription cross-talks with origin
selection and origin usage changes under different met-
References
abolic conditions or differentiation. The excess in puta- Alberts B, Johnson A, Lewis J, Raff M, Roberts K, Walter P
tive origins may therefore ensure timely completion of (2007) Molecular biology of the cell, 5th edn. Garland
replication under various, often adverse, conditions. Science, New York
Blow JJ, Dutta A (2005) Preventing re-replication of chromo-
somal DNA. Nat Rev Mol Cell Biol 6:476–486
Higher-Order Organization of DNA Replication Blow JJ, Ge XQ (2009) A model for DNA replication showing
Within the Cell Nucleus how dormant origins safeguard against replication fork fail-
We have thus far considered DNA replication as ure. EMBO Rep 10:406–412
a linear process along the DNA. DNA is however de Moura APS, Retkute R, Hawkins M, Nieduszynski CA
(2010) Mathematical modelling of whole chromosome rep-
tightly packaged within the cell nucleus and DNA lication. Nucleic Acids Res 38:5623–5633
replication is topologically organized, providing an DePamphilis ML (2006) DNA replication and human disease.
additional level of regulation. Replication origins CSHL Press, Cold Spring Harbor
appear to fire in clusters (of 6–12 active origins within Gilbert DM (2004) In search of the holy replicator. Nat Rev Mol
Cell Biol 5:848–855
an approximately 1 Mb region) which are co-regulated Herrick J, Jun S, Bechhoefer J, Bensimon A (2002) Kinetic
and are visible within the cell nucleus as replication model of DNA replication in eukaryotic organisms. J Mol
foci (or factories). Such a topological organization Biol 320:741–750
may offer a number of advantages: sequestration of Hyrien O, Goldar A (2010) Mathematical modelling of eukary-
otic DNA replication. Chromosome Res 18:147–161
replication proteins within factories may help increase Legouras I, Xouri G, Dimopoulos S, Lygeros J, Lygerou Z
their local concentration facilitating the kinetics of (2006) DNA replication in the fission yeast: robustness in
DNA replication; co-replication of origins within the face of uncertainty. Yeast 23:951–962
a similar chromatin context may facilitate the inheri- Lygeros J, Koutroumpas K, Dimopoulos S, Legouras I, Kouretas P,
Heichinger C, Nurse P, Lygerou Z (2008) Stochastic hybrid
tance of epigenetic modifications at a given locus; modeling of DNA replication across a complete genome. Proc
local organization in clusters provides the possibility Natl Acad Sci USA 105:12295–300
for differential regulation within a cluster (e.g., local Spiesser TW, Klipp E, Barberis M (2009) A model for the
firing of dormant origins in the vicinity of DNA dam- spatiotemporal organization of DNA replication in Saccha-
romyces cerevisiae. Mol Genet Genomics 282:25–35
age) and outside the cluster (e.g., global inhibition of Yang SC, Rhind N, Bechhoefer J (2010) Modeling genome-wide
replication when DNA damage is present elsewhere in replication kinetics reveals a mechanism for regulation of
the genome). Future work will hopefully elucidate how replication timing. Mol Syst Biol 6:404
Doob–Gillespie Algorithm 615 D
References
DNA Replication Licensing
Maxam AM, Gilbert W (1977) A new method for sequencing
DNA. Biochemistry 74(2):560–564
Zoi Lygerou
School of Medicine, Laboratory of General Biology,
University of Patras, Patras, Greece

DNA Synthesis
Definition D
▶ DNA Replication
DNA replication licensing is the process of
marking origins of replication as competent for
a new round of replication. It takes place in late
mitosis and G1 and consists of the loading of Document Classification/Categorization
the hexameric MCM complex, which will later
act as the replicative helicase, onto origins of replica- ▶ Text Classification
tion. It requires the origin recognition complex
(a six subunit complex which binds to origin
DNA) and the MCM loading factors Cdc6/Cdc18
and Cdt1. Document Retrieval

▶ Information Retrieval
Cross-References

▶ DNA Replication
DOE

▶ Design of Experiments

DNA Sequencing

Jingky Lozano-K€uhne Domain Ontology


Department of Public Health, University of Oxford,
Oxford, UK ▶ Cell Cycle Ontology (CCO)

Synonyms
Domain Type
Deoxyribonucleic acid sequencing
▶ IMGT-ONTOLOGY, DomainType

Definition

DNA sequencing is the procedure of determining the


order of nucleotide bases, i.e., adenine, guanine, cyto- Doob–Gillespie Algorithm
sine, and thymine, in the DNA molecule (Maxam and
Gilbert 1977). ▶ Stochastic Simulation Algorithm
D 616 Dopamine

family of receptors which are stimulatory in function


Dopamine and their activation results in increased production of
the second messenger cyclic adenosine-50 -
Rajeswara Babu Mythri1, Shireen Vali2 and monophosphate (cAMP), while D2, D3, and D4 recep-
M. M. Srinivas Bharath1 tors belong to the D2-like family of receptors which
1
Department of Neurochemistry, National Institute of are primarily inhibitory and inhibit the production of
Mental Health and Neurosciences (NIMHANS), cAMP. Whatever be the type of receptor, on comple-
Bangalore, Karnataka, India tion of its function, DA is degraded by either mono-
2
Cell Works Group Inc., Bangalore, India amine oxidase A/B (MAO A/B) or catechol-O-methyl
transferase (COMT) at the synaptic cleft or following
reuptake into the presynaptic neuron. Alternatively,
Definition the DA undergoing reuptake into the presynaptic neu-
ron is repackaged into vesicles for the next cycle of
DA is a catecholamine that occurs in a wide variety of synaptic activity.
animals, including both vertebrates and invertebrates. DA synthesizing (dopaminergic) neurons project
In the brain, DA functions both as a neurotransmitter axons to larger brain regions which control behavior
and a neurohormone. DA functions by activating the and cognition, voluntary movements, motivation, pun-
five types of DA receptors termed D1–D5. DA is ishment and reward, inhibition of prolactin production,
produced in several areas of the brain, including the sleep, mood, attention, working memory, and learning.
SN and the ventral tegmental area. DA is released by DA is commonly associated with the pleasure system
the hypothalamus as a neurohormone which inhibits of the brain, providing feelings of enjoyment and rein-
the release of prolactin from the anterior lobe of the forcement to motivate a person proactively to perform
pituitary. Swedish scientist Arvid Carlsson, the Nobel certain activities. DA is released by naturally reward-
laureate performed fundamental biochemical experi- ing experiences such as food, sex, use of certain drugs,
ments demonstrating the neurotransmitter role of DA and stimuli that are associated with them.
which later paved the way for the first clinical therapy The nigrostriatal pathway that involves dopaminer-
of PD. gic neurons from the SN into the striatal region of the
DA is also a precursor for other catecholamine brain is the region of focus in PD pathology. This
neurotransmitters, epinephrine and norepinephrine. pathway is directly involved in controlling voluntary
Biosynthesis of DA in the body occurs from the movement. Gradual loss of these neurons in PD
amino acid precursor, L-tyrosine, by a two-step reac- patients causes drastic DA deficiency in the striatum
tion (Fig. 1). consequently reducing the ability to perform smooth
DA synthesized at the presynaptic terminal follow- and controlled movements.
ing suitable stimulus, is released into the synaptic cleft It could therefore be surmised that DA can be
and is taken up by DA receptors on the postsynaptic administered as a therapeutic molecule for controlling
terminal. D1 and D5 receptors belong to D1-like the motor symptoms of PD. However, DA supplied as

Dopamine, Fig. 1 Synthesis of DA from L-tyrosine


Drug Discovery 617 D
a drug acts on the sympathetic nervous system, pro-
ducing effects such as increased heart rate and blood DRIP
pressure. Since DA cannot cross the blood-brain bar-
rier, it does not directly affect the central nervous ▶ Mediator
system. Therefore, the therapeutic approach for PD
involves the administration of DA precursor
L-dihydroxy phenyl alanine (L-DOPA or levodopa),
which can cross the blood-brain barrier and induce Drug Discovery
DA synthesis. D
Riza Theresa Batista-Navarro
National Centre for Text Mining, Manchester
Cross-References Interdisciplinary Biocentre, Manchester, UK

▶ Disease System, Parkinson’s Disease


Synonyms

Pharmaceutical discovery

DOQCS: Database of Quantitative


Cellular Signaling Definition

▶ Database of Quantitative Cellular Signaling Drug discovery is “the process of identifying chemical
(DOQCS) entities that have the potential to become therapeutic
agents” (Decker and Sausville 2007).
There are two different approaches to drug discov-
ery: empirical and rational. Empirical drug discovery
involves finding a compound that produces a desired
DOR therapeutic effect in vitro. Initially, there is no
understanding of the candidate drug’s mechanism of
▶ Dense Overlapping Regulons action. In rational drug discovery, on the other hand,
the target is known from the beginning; scientists then
attempt to find or design compounds which would
interact with the target of interest (Decker and
Sausville 2007).
Dosage Compensation

▶ X Chromosome Inactivation Cross-References

▶ Biological Activity
▶ Drug Target
Doubling Time ▶ Natural Product Resources

▶ Modeling, Cell Division and Proliferation

References

Decker S, Sausville E (2007) Drug discovery. In: Atkinson AJ,


Downward Looking Abernethy DR, Daniels CE, Dedrick RL, Markey SP (eds)
Principles of clinical pharmacology, 2nd edn. Academic,
▶ Reduction Burlington, pp 439–447
D 618 Drug Disease Networks

possible network that a drug might influence in


Drug Disease Networks a metabolic system (Schwartz and Nacher 2009).

▶ Systems Pharmacology, Drug Disease Interactions


Cross-References

▶ Systems Pharmacology, Drug Disease Interactions


▶ Systems Pharmacology, Drug-Target Networks
Drug Metabolism

▶ Epigenetics, Drug Discovery


▶ Pharmacokinetics and Pharmacodynamics References

Handorf T, Ebenh€
oh O, Heinrich R (2005) Expanding metabolic
networks: scopes of compounds, robustness, and evolution.
J Mol Evol 61:498–512
Schwartz JM, Nacher JC (2009) Local and global modes
Drug Scope, Metabolic of drug action in biochemical networks. BMC Chem
Biol 9:4
Jean-Marc Schwartz
Manchester Institute of Biotechnology, Faculty of Life
Sciences, University of Manchester, Manchester, UK

Drug Target
Definition
Riza Theresa Batista-Navarro
The metabolic drug scope is an extension of the National Centre for Text Mining, Manchester
concept of scope in metabolic pathways. The scope Interdisciplinary Biocentre, Manchester, UK
of a metabolic compound is the set of all compounds
that can be generated in principle by transformations of
the seed compound, irrespective of kinetic and ther- Synonyms
modynamic laws that determine the rate at which these
transformations might actually take place (Handorf Molecular drug target; Target
et al. 2005). When a set of several seed compounds is
considered, the resulting scope is the set of all
compounds that can be generated by transformations Definition
and combinations of the seeds.
The scope of metabolic compounds is generated A drug target is a macromolecule that undergoes
through an expansion process. The principle used is a specific interaction (e.g., binding, inhibition)
that for any reaction to take place, all necessary sub- with a drug. A target is linked to a specific disease;
strates must be present. Starting from the seed com- the interaction between a drug and a target is
pounds, products generated by metabolic reactions expected to elicit an effect on the course of the
using the seeds are iteratively added, until no further disease.
reaction is possible. There are four main types of drug targets: proteins,
By extension, the metabolic drug scope is the scope polysaccharides, lipids, and nucleic acids. Among
generated by the enzymatic targets of a drug. The them, proteins are considered the best source of drug
metabolic drug scope is constructed by the expansion targets as most drugs have been shown to interact with
of a set of seeds containing the substrates and products them. Proteins can be further divided into seven fam-
of all metabolic reactions targeted by the drug. Essen- ilies: G-protein-coupled receptors, ion channels, pro-
tially, the metabolic drug scope represents the largest tein kinases, zinc metalloproteases, serine proteases,
Dynamic Bayesian Networks 619 D
nuclear hormone receptors, and phosphodiesterases
(Giegel et al. 2007). Drug Targets
Some targets belong to or are associated with path-
ogenic organisms (e.g., bacteria, viruses, and fungi), ▶ Epigenetics, Drug Discovery
which cause infectious diseases (Gies and Landry ▶ Host–Pathogen Systems, Target Discovery
2008). ▶ Pharmacogenomics

Cross-References D
Drugs
▶ Natural Product Resources
▶ Xenobiotics

References

Giegel DA, Lewis AJ, Worland P (2007) Diversity versus focus Duration Analysis
in choosing targets and therapeutic areas. In: Taylor JB,
Triggle DJ (eds) Comprehensive medicinal chemistry II.
Elsevier, Oxford, pp 753–770 ▶ Survival Analysis
Gies J, Landry Y (2008) Molecular drug targets. In: Wermuth ▶ Survival Analysis, Fundamental Statistical
CG (ed) The practice of medicinal chemistry, 3rd edn. Techniques
Elsevier Science and Technology, Amsterdam, pp 85–105

Drug Target Strategies Dye Dilution

▶ Pathway Targeting, Antimycobacterial Drug Design ▶ Quantifying Lymphocyte Division, Methods

Drug Target, Off-Target Dynamic Bayesian Networks

Ravi Iyengar Lin Wang


Department of Pharmacology and Systems School of Computer Science and Information
Therapeutics, Mount Sinai School of Medicine, Engineering, Tianjin University of Science and
New York, NY, USA Technology, Tianjin, China

Definition Synonyms

Target of a drug that is not the primary target, which Dynamic conditional independent graphs
may give rise to undesirable pathophysiology leading
to adverse events.
Definition

Cross-References Dynamic Bayesian network is a representation of


stochastic evolution of a set of random variables
▶ Adverse Events X ¼ {x1, x2, . . ., xn} over discretized time. It consists
▶ Primary Target of a directed graph representing conditional indepen-
▶ Systems Pharmacology dences and a family of conditional distributions
D 620 Dynamic Conditional Independent Graphs

P(xi(t)|Pa[xi(t-1)]), where Pa[xi(t-1)] represents the


set of parent nodes of the node xi(t). The temporal Dynamic Metabolic Flux Analysis
process of a dynamic Bayesian network is assumed to
be a time-homogeneous Markov process: Yun Lee, I-Chun Chou, Melissa L. Kemp and
Eberhard O. Voit
The Wallace H. Coulter Department of Biomedical
P½XðtÞjXð0Þ; Xð1Þ;    ; Xðt  1Þ ¼ P½XðtÞjXðt  1Þ;
Engineering, Georgia Institute of Technology and
(1) Emory University, Atlanta, GA, USA
The joint distribution over all the possible trajecto-
ries of the process is decomposed into the following
Definition
product form:
It is easier to deduce the static structure of a biological
Y
T
system from experimental data than to account for its
P½Xð0Þ; Xð1Þ;    ; XðTÞ ¼ Pð0Þ P½XðtÞjXðt  1Þ;
t¼1
full dynamic repertoire. Nonetheless, knowledge of the
(2) static structure reveals strict constraints that can form
the basis for constructing dynamic models. While this
conversion step from static to dynamic models is any-
Therefore, given an initial state of random vari-
thing but solved, this essay discusses some generic
ables, their evolution is given by:
strategies toward accomplishing the task.
Y
T
P½Xð1Þ;    ; XðTÞjXð0Þ ¼ P½XðtÞjXðt  1Þ Characteristics
t¼1 Experimental and Computational Strategies of Model
Y
T Y
n Construction
¼ Pðxi ðtÞjPa½xi ðt  1ÞÞ; Metabolic pathway systems are characterized by three
t¼1 i¼1 classes of components: metabolites and their concen-
trations, enzymes and modulators, and fluxes that
(3) describe how much material flows through the sys-
tems. Sole knowledge of metabolites at the normal
state of a pathway system is not sufficient to deduce
fluxes, and sole knowledge of fluxes does not permit
References inferences on the metabolite concentrations. Three
trends in metabolic analysis are in the process of con-
Chen L, Wang RS, Zhang XS (2009) Biomolecular networks. verging and have the potential to offer new insights:
Wiley, New York
1. Systematic profiling of metabolite concentrations
using NMR spectroscopy and mass spectrometry
2. Experimental metabolic flux analysis
3. Computational methods
Dynamic Conditional Independent Many platforms are now available to provide
Graphs targeted analysis of hundreds of metabolites in a single
biological sample. The metabolic profiles thus
▶ Dynamic Bayesian Networks obtained have been used to distinguish molecular alter-
ations between two samples, whether derived from
benign and metastatic cancer, or from a mutant and
the corresponding control. Current profiling methods
are mainly based on two analytical techniques: nuclear
Dynamic Mechanistic Explanation magnetic resonance (NMR) spectroscopy and mass
spectrometry (MS) coupled to a pre-separation tech-
▶ Mechanism, Dynamic nique such as chromatography or electrophoresis.
Dynamic Metabolic Flux Analysis 621 D
Of the two platforms, MS-related techniques are con- metabolite pool, the fluxes governing its synthesis
sidered more sensitive and have been used to obtain and degradation are equal, thus leading to the name
one-time snapshots of thousands of metabolites. NMR “flux balance analysis” (FBA; Palsson 2006). FBA
spectroscopy has its own advantages, because it is extends stoichiometric network analysis by also
noninvasive and of a high-throughput nature and can, accounting for thermodynamic and other physico-
therefore, be used to execute dense in vivo time-series chemical constraints that limit the reactions within
measurements of moderately large collections of the network. Furthermore, and in contrast to MFA,
metabolites. With considerable advances being made FBA identifies a particular flux distribution under the
on both analytical fronts, it is to be expected that time- assumption that the cell strives to meet a specific D
resolved profiles of thousands of metabolites will objective such as maximizing growth.
become the norm rather than the exception in the A significant feature – but also its major weakness –
foreseeable future. of FBA is that it contains no information about metab-
In contrast to the concentrations of enzymes and olite concentrations. This may pose a problem if the
metabolites that define cellular metabolism, metabolic purpose of an investigation is, for example, to investi-
fluxes cannot be measured directly but rather need to gate the effect of a certain mutation on the intracellular
be inferred from measurable quantities. A method level of a metabolite serving as a biomarker. In such a
developed for this purpose, metabolic flux analysis case, dynamic, kinetics-based models that center on
(MFA), dates back to the 1970s, where one of its metabolite concentrations appear to be a better fit.
early applications was to infer the rates of intracellular Traditionally, the formulation of dynamic models of
reactions from the measurements of secreted metabo- metabolic pathway systems starts with finding
lites and from coarse knowledge of the pathway struc- a functional representation for each reaction that best
ture (Aiba and Matsuoka 1979). Since then, the use of describes its kinetics in vitro. Given the explicit repre-
stable isotopes (mostly by feeding 13C-labeled sub- sentations of individual reactions, the next step is to
strates to cultured cells) was shown to refine the flux integrate them into a system of ordinary differential
estimates considerably, especially for ratios of fluxes equations (ODEs) where each equation describes the
at branch points (Stephanopoulos 1999). These local temporal change in one metabolite as a difference
flux ratios, when combined with a stoichiometric between the sums of rates (fluxes) of its synthesis and
model of the metabolic pathway system, furthermore degradation. Lastly, having determined the initial con-
permit the determination of absolute rates for all fluxes centrations, one solves the ODE model to obtain the
through an iterative fitting algorithm (Wiechert 2001). metabolic concentrations at different time points,
Evidently, this process poses a challenge for undertak- which are not necessarily at steady state, and compute
ing MFA in higher organisms because their pathway fluxes if needed. Overall, the design of dynamic
structure is more complex and, in many cases, models requires many kinetic details, but these models
ill-defined. To address this challenge, more advanced eventually offer the ability to predict all metabolite
experimental design and techniques like dynamic concentrations and fluxes under non-steady-state
labeling are required, but so far their application has conditions.
been limited (Voit et al. 2004; Schaub et al. 2008). Interestingly, there is little overlap between the
In addition to experimental approaches, computa- two different kinds of metabolic models. The only
tional methods have been developed and successfully common characteristic is that they both yield flux
used to study metabolic pathway systems, thanks in estimates at the steady state, although with distinct
part to advances in computers and the increasingly tactics: one uses a top-down approach by directly
easy access to them. Most existing methods fall into predicting the fluxes under a quasi-steady-state
two categories: (1) stoichiometric and flux-based assumption, whereas the other takes a bottom-up
models; and (2) dynamic, kinetics-based models. approach through solving the integrated ODE model
Models from the former class are based on the toward the steady-state. For a metabolic pathway
assumption that the metabolic transients are much system, however, the kinetic data obtained individu-
faster than both cellular growth and environmental ally with purified enzymes may not reflect the true
fluctuations. Consequently, the metabolic fluxes are kinetic behavior in vivo. Therefore, a sensible
assumed to be in a quasi-steady state where, for any approach would be first to determine the flux
D 622 Dynamic Metabolic Flux Analysis

identify a flux distribution that achieves maximal


growth by solving the task in Eq. 3:

maximize v4
subject to v1  v2  a1 v4 ¼ 0 (3)
Dynamic Metabolic Flux Analysis, Fig. 1 A generic linear v 2  v 3  a2 v 4 ¼ 0
pathway with feedback inhibition by the product (X2)
Notably, no explicit accounts of metabolite concen-
trations are involved in FBA, and the result exclusively
distribution with methods of MFA or FBA and then to addresses fluxes.
infer the kinetic parameters based on concentration As an alternative, one can construct an ODE
measurements and functional descriptions of individ- model using traditional enzyme kinetics. For instance,
ual reactions. Such a merger of two otherwise distinct assuming that all reactions follow the classic
modeling techniques has the following attraction: not Michaelis–Menten type kinetics and that the inhibition
only is the resulting model more accurate at the level by X2 is competitive, the corresponding ODE model is
of fluxes, but one also gains information on metabo- formulated as:
lite concentrations. This type of merger is greatly
facilitated by using canonical models within the X0 ¼ constant
framework of Biochemical Systems Theory (BST) V1 X0 V2 X1
X_ 1 ¼ v1  v2 ¼ 
(Savageau 1976; Voit 2000) because mechanistic X0 þ K1 ð1 þ X2 =K4 Þ X1 þ K2 (4)
assumptions regarding the enzymatic processes can V2 X 1 V 3 X2
X_ 2 ¼ v2  v3 ¼ 
be minimized. X1 þ K2 X2 þ K3
Let us illustrate the idea with a linear pathway
(Fig. 1). Suppose a fixed amount of substrate X0 is Solving Eq. 4 requires knowledge of the initial
converted into product X2 in two steps, with X2 metabolite concentrations and of all kinetic parameters
inhibiting the synthesis of the intermediate X1 through (Vi and Ki). In practice, these are often determined
feedback. using purified enzymes and thus prone to uncertainties
In FBA, the assumption of a metabolic quasi-steady due to the fact that in vivo systems are quite different
state leads to the following flux balance equations: from in vitro experiments. A novel means of parameter
estimation, thanks to the advent of metabolic time-
v1  v2 ¼ 0 series profiles, is to substitute the differentials on the
(1)
v2  v3 ¼ 0 left-hand side of Eq. 4 with estimated slopes at discrete
time points, transform the coupled system of differen-
Equation 1 is underdetermined since the number of tial equations into several sets of decoupled algebraic
fluxes exceeds the number of metabolites (equations) equations, and identify the optimized values of param-
where the steady-state constraints are imposed (i.e., eters via regression (Voit and Almeida 2004). For
X1 and X2). Therefore, infinitely many solutions exist. some metabolic systems, we can even derive the
In this case, a particular solution may be obtained if dynamic flux profiles from the slope data in the first
one of the three fluxes can be directly measured, as it place and then estimate the parameters on a flux basis
might be the case for substrate uptake (v1) or product (Goel et al. 2008).
formation (v3). If none of the three fluxes is measur- In cases where the time-series metabolic profiles are
able, one might be able to make reasonable assump- not accessible, the steady-state flux distributions, as
tions regarding the biomass composition of the system, determined by MFA or FBA, can be valuable for
such as: parameter estimation. The idea is that instead of
using the estimated fluxes at multiple time points
v4
a1 X1 þ a2 X2 ! Biomass; (2) within one experiment, one may slightly perturb the
system many times and record or predict the steady-
where a1 and a2 represent stoichiometric coefficients state responses. With flux and concentration data from
for the two species. Linear optimization can be used to multiple perturbation experiments, the task is again
Dynamic Metabolic Flux Analysis 623 D
reduced to fitting parameters individually for each flux. ln n2
To illustrate the point, it is convenient to model the
linear pathway in Fig. 1 with power-law functions, as
proposed in BST:
1 f2,1
2
X0 ¼ constant 3
X_ 1 ¼ v1  v2 ¼ g1 X 01;0 X 21;2  g2 X 12;1 :
f f f
(5)
X_ 2 ¼ v2  v3 ¼ g2 X12;1  g3 X23;2
f f
ln g2 D
ln X1
In Eq. 5, each flux has two or three unknown param-
eters (rate constants gi and kinetic orders fi,j). Thus, we Dynamic Metabolic Flux Analysis, Fig. 2 Estimation of f2,1
need flux and concentration data at three or more and g2 using data from three observed steady states. The colored
different steady states for parameter estimation. This circles refer to the log-transformed values (ln X1, ln v2) taken
argument is valid, because a flux, say v2, reduces to either from the nominal steady state (1) or from new steady states
where, for instance, the input substrate X0 is increased (2) or the
a linear equation, when we take logarithms of the activity of enzyme catalyzing v3 is decreased (3)
power-law representation:

ln v2 ¼ ln g2 þ f2;1 ln X1 : (6)

We can see that the kinetic order f2,1 is the slope in it was shown that even without a comprehensive set of
the plot of ln v2 versus ln X1, while the logarithm of concentration data, the conversion can still be accom-
the rate constant ln g2 is the y-intercept (Fig. 2). plished through a combination of model reduction and
If experimentally feasible, it is important that enough optimization methods (Lee and Voit 2010).
data are obtained to allow a statistical regression
analysis.
Many experimental strategies are available for arti- Cross-References
ficially driving the pathway of interest to different
metabolic steady states. For instance, one may slightly ▶ 13C Metabolic Flux Analysis
perturb the system, which is initially at a nominal ▶ Flux Balance Analysis
steady state, by changing the amount of substrate fed ▶ Metabolic Flux Analysis
into the pathway. Another approach is to genetically ▶ Metabolic Networks, Structure and Dynamics
modify the activity of pathway enzymes, for instance, ▶ Metabolic Pathway Analysis
through a gene knockdown experiment. In the latter ▶ Ordinary Differential Equation (ODE)
approach, one must also measure the relative enzyme ▶ Optimization Algorithms for Metabolites
activities compared to those in the control to adjust for Production
the bias introduced to the rate constants. ▶ Parameter Estimation, Metabolic Network
Overall, the transformation of steady-state, flux- Modeling
based models into dynamic, kinetics-based models ▶ Pathway Modeling, Metabolic
requires large amounts of concentration measure-
ments, which despite the substantial improvements
that have recently been made in the field of References
metabolomics, remains a challenge. The issue is espe-
Aiba S, Matsuoka M (1979) Identification of metabolic model:
cially significant in plant biology: not only do plants citrate production from glucose by Candida lipolytica.
synthesize a far more diverse array of metabolites than Biotechnol Bioeng 21:1373–1386
do animals and microorganisms, but we also lack Goel G, Chou I-C, Voit EO (2008) System estimation from
metabolic time-series data. Bioinformatics 24:2505–2511
the ability to identify the majority of signals from
Lee Y, Voit EO (2010) Mathematical modeling of
metabolic profile data (Saito and Matsuda 2010). monolignol biosynthesis in Populus xylem. Math Biosci
Nevertheless, even in this complicated case of plants, 228:78–89
D 624 Dynamic Metabolic Networks, k-Cone

Palsson BØ (2006) Systems biology: properties of reconstructed under external perturbations or internal gene dele-
networks. Cambridge University Press, Cambridge tions. In the assumption that linear perturbations
Saito K, Matsuda F (2010) Metabolomics for functional geno-
mics, systems biology, and biotechnology. Annu Rev Plant occur around a metabolic steady state, one can expect
Biol 61:463–489 that even though the perturbation induces changes
Savageau MA (1976) Biochemical systems analysis: a study of in the metabolic concentrations of the network,
function and design in molecular biology. Addison-Wesley, the system recovers its initial metabolic profile. The
Reading
Schaub J, Mauch K, Reuss M (2008) Metabolic flux analysis in dynamical description of this relaxation process is
Escherichia coli by integrating isotopic dynamic and described by:
isotopic stationary 13C labeling data. Biotechnol Bioeng
99:1170–1185 dm 0
Stephanopoulos G (1999) Metabolic fluxes and metabolic engi- ¼J m (1)
neering. Metab Eng 1:1–11 dt
Voit EO (2000) Computational analysis of biochemical systems: 0
a practical guide for biochemists and molecular biologists. where J is a diagonal matrix whose entries are the
Cambridge University Press, Cambridge eigenvalues of the Jacobian matrix while m is a vector
Voit EO, Almeida J (2004) Decoupling dynamical systems for
pathway identification from metabolic profiles. Bioinformat-
that specifies the modes of the system, i.e., the set of
ics 20:1670–1681 metabolites whose concentrations dynamically corre-
Voit EO, Alvarez-Vasquez F, Sims KJ (2004) Analysis of late at each timescale. Timescale distribution is defined
0
dynamic labeling data. Math Biosci 191:83–99 by the negative inverse of J eigenvalues.
Wiechert W (2001) 13C metabolic flux analysis. Metab Eng
Even though this theoretical framework can be
3:195–206
straightforwardly applied to genome-scale metabolic
reconstructions, a major limitation in dynamic model-
ing is the current lack of kinetic information. With
the purpose to overcome this limitation, the k-cone
Dynamic Metabolic Networks, k-Cone has been suggested as an appealing scheme for
exploring dynamic behavior of genome-scale recon-
Isaac F. López-Moyado and structions. This formalism refers to the feasible
Osbaldo Resendis-Antonio space of kinetic parameters that ensures a steady-
Center for Genomic Sciences-UNAM, Universidad state behavior in metabolic networks. From
Nacional Autónoma de México, Cuernavaca, a mathematical point of view, this space is defined
Morelos, Mexico through the next equation:

S  diagðCÞ  k~ ¼ 0 (2)
Synonyms
where S is the stoichiometric matrix and diag(C)
Feasible parameter space; k-cone space; Modal is a diagonal matrix whose entries are determined
analysis by a function of metabolic concentrations at steady
state (C ¼ Pxi jSij j , where SRij refers to the reactant
R

stoichiometric coefficients). In addition, k~represents


Definition a vector whose dimensionality is determined by
the number of metabolic fluxes in the network. The
Genome-scale metabolic reconstruction constitutes combination of modal theory and k-cone space
a paradigm in systems biology that currently supply with a statistical pipeline for exploring
extends its escope to eukaryotes, prokaryotes, and and surveying the dynamical behavior of genome-
archaea (Oberhardt et al. 2009). Among a variety scale metabolic reconstructions, this framework
of biological questions that can be surveyed with being independent of a complete knowledge of
these metabolic reconstructions, the dynamical the kinetics underlying the metabolic network
description of metabolism is a noteworthy issue for and strongly dependent on metabolome data, see
exploring how the metabolic phenotype changes Fig. 1.
Dynamic Metabolic Networks, k-Cone 625 D

Dynamic Metabolic Networks, k-Cone, Fig. 1 General over- parameters ensuring a steady state in the metabolic systems
view. (a) The metabolic state of a system can be obtained from a Jacobian library can be calculated. (d) Finally, the modal
metabolome data. (b) From this information and assuming that library is recovered from the diagonalization of the Jacobian
law mass action govern all metabolic reactions in the network, library. The Jacobian and the modal libraries contain informa-
the feasible space of kinetic parameters, the k-cone, can be tion about the timescales and the metabolic pools formed during
calculated. (c) Once defined the feasible set of kinetic the relaxation process, respectively

Characteristics the cell. Thus, with the purpose of studying the


dynamic properties of metabolism, it is convenient to
Modal Analysis find a way to study how a metabolic network relaxes to
Cells are continuously exposed to environmental per- its steady state after an environmental perturbation has
turbations whose biological effects are controlled by occurred (Kauffman et al. 2002). Modal analysis is
genetic, protein, and metabolic circuits for ensuring a conceptual framework which is useful for this
a proper functional state. In the particular situation purpose by assuming that the perturbations occurred
where the perturbation is small, one expects that ini- very close to a metabolic steady state. In this context,
tially the metabolite concentrations change inside the the temporal evolution of the systems is obtained by
cell; however, with time, these concentrations recover linearizing the dynamic mass balance equations
their original values characterizing a functional state in around a reference point, i.e., the steady state.
D 626 Dynamic Metabolic Networks, k-Cone

In general, the temporal behavior of a metabolite In this contextual scheme, the Jacobian matrix con-
concentration is calculated as a balance among the tains information about the topology of the metabolic
metabolic flux of those reactions that contribute to network and the thermodynamics of the reactions
increase and decrease the production and use of (Jamshidi and Palsson 2008).
metabolites. This principle is conserved at genome As we mentioned earlier, the objective of using
scale and the temporal behavior of a metabolic net- the theory of modal analysis is to recover dynamical
work integrated by n reactions and m metabolites is information of the system, in particular (1) how the
given by: metabolism rearranges its constituents while it relaxes
toward a steady state and (2) which are the timescales
d~
x during this process. Even though these questions can
~
¼SV (3)
dt be explored from Eq. 5, modal analysis is a proper
formalism to explore these issues in a more direct
where S denotes the ▶ stoichiometric matrix which is way. Thus, by applying a diagonalization on Eq. 5 we
formed from the stoichiometric coefficients of the obtain:
reactions that comprise a metabolic reconstruction. 0
This matrix is organized in such a way that J ¼ M  J  M1 (7)
every column corresponds to a reaction and every
row corresponds to a metabolic compound (as exem- Notably, J 0 defined in Eq. 7 is a diagonal matrix
plified by Fig. 2 panel D). In addition, V ~ refers to with the eigenvalues of J ordered descendingly and the
a vector that contains the fluxes of the reactions term M is a matrix whose columns are integrated by the
included in the reconstruction. Here d~ x eigenvectors of J. By substituting Eq. 7 into Eq. 4 we
dt is a vector
with time derivatives of the metabolite concentrations obtain:
and it is equal to zero when the system has reached
a steady state. x0
d~
x 0 ¼ M  J 0  M1  ~
¼ J ~ x0 (8)
Here we limit our description to analyze the dt
dynamical metabolic profile when small perturbations
are applied to the metabolic reconstruction. Thus, in d~x0 0

M1  ¼ M1  M  J 0  M1  ~


x (9)
order to model the deviation from the steady-state dt
concentration, we selected ~ x 0 ¼~
x ~
xst as our central
By defining the metabolic modes as:
variable where ~xst and ~
x indicate the concentrations of
the metabolites at the steady and perturbed state,
respectively. Under this assumption, a Taylor m ¼ M1  ~
x0 (10)
0
expansion around ~ x lets us to obtain:
Equation 9 can be written in compact form as:
0
d~
x
x0
¼ J ~ (4) dm
dt ¼ J0m (11)
dt
with J being the Jacobian matrix, which in turn is
Thus, meanwhile m contains information of how the
obtained through:
network coordinates and organizes its metabolites to
reach the
0 steady-state condition, and the diagonal
J ¼SG (5) matrix J specifies the timescales in which this process
happens. This latter issue is given
0 by the negative
where G is the gradient matrix formed with the partial inverse of the eigenvalues of J (Kauffman et al.
derivatives between the flux of the i-esime reactions of 2002; Resendis-Antonio 2009).
~ Vi , and the concentration xj :
V,
k-Cone Space
@Vi
G¼ (6) As we mentioned in the last section, the dynamical
@xj description of metabolism constitutes a cornerstone
Dynamic Metabolic Networks, k-Cone 627 D
a dx′
= J · x′
b dt
A x′ = x − xst

k6 k1 J=S·G

∂Vi
k5 k2 G=
∂xj
k4
B
D
C J = M · J ′· M −1
k3
m = M −1 · x ′

dm
= J ′· m
A V2 + V5 −V1 − V6 dt
B = V1 + V4 −V2 − V3
V3 + V6 −V4 − V5 Supposing steady-state
C
and law of mass action
c A = k 2B + k 5C − (k 1 + k 6)A dx
=S·V=Q·k
B = k 1A + k 4C − (k 2 + k 3)B dt
Q = S · diag (x0)
C = k 6A + k 3B − (k 4 + k 5)C
Q · k = 0 ⇒ null (Q) ⇒ k-cone

V1 k1A k1
k1 k2 k3 k4 k5 k6 V2 k2B k2
−1 −1 V3 k3B k3
1 0 0 1 A V= = k=
S= 1 −1 −1 1 0 0 B V4 k4C k4
0 0 1 −1 −1 1 V5 k5C k5
C
V6 k6A k6

−A B 0 0 C −A
d Q = S · diag (xst) = A −B −B C 0 0
0 0 B −C −C A

k1 0 0
0 k2 0
∂Vi 0 k3 0 −k1 −k6 k2 k5
Gij = = J=S·G= k1 −k2 −k3 k3
∂xj 0 0 k4
0 0 k5 k5 k4 −k4 −k5
k6 0 0

Dynamic Metabolic Networks, k-Cone, Fig. 2 Example of composed of three metabolites which interconvert to each other
the application of the theory of modal analysis and the k-cone with reaction constants ki . (c) Box showing the balance equations
formalism to a hypothetical metabolic network. (a) The box in terms of the fluxes Vi or the metabolite concentrations and
shown in this figure recapitulates the mathematical equations kinetic constants. (d) Box exemplifying the matrices mentioned
described in this essay. (b) A hypothetical metabolic network in the text constructed according to the network from (b)

in systems biology for surveying the mechanism by scales. For instance, in the example depicted in Fig 2,
which cells respond under external perturbations. it can be seen that the knowledge of the kinetic
However, given that a variety of kinetic information constants is needed to obtain the Jacobian matrix J
is unknown in most of the biological systems, this (panel D). With the purpose of overcoming this issue,
formalism has been applied in few cases at genome some databases are emerging (Rojas et al. 2007;
D 628 Dynamic Metabolic Networks, k-Cone

Scheer et al. 2011) that store the numerical values of space let us identify and explore the potential response
some parameters required for analyzing certain biolog- of the metabolic phenotype for a microorganism and
ical circuits at well-defined physiological conditions. allows us to survey how the metabolism coordinately
Although, these databases represent important contri- acts to reach a steady state after one perturbation
butions to enrich the model and experimentally assess occurs. Remarkable, by selecting an ensemble of
its outcomes predictions, there is evidence that the kinetic parameters belonging to the k-cone, one can
numerical values of the kinetic parameters can vary build a library of feasible dynamic behavior and
depending on the physiological context. In fact, there explore the average or most frequent behavior.
is evidence that measurement in vitro and in vivo can Furthermore, this dynamic library can contribute to
differ by some orders of magnitude (Famili et al. classify those parameters that have a high from those
2005). In this contextual scheme, the elaboration of with a low numerical variability for recovering the
alternative procedures that allows us to estimate the steady state.
range of each parameter participating in the reactions In order to apply this framework at the genome
conforming a metabolic network would be of great scale, an important issue is to have a well-defined
value. metabolic profile at a steady state. As described in
In general terms, the temporal evolution of the Fig. 1, the variety of technologies used in metabolome
metabolite concentrations can be founded by solving high-throughput data can fulfill this latter requirement.
Eq. 3, but this requires a complete knowledge of the As we have seen throughout this essay, the combined
~ and the initial conditions, a fact that is
values of S, V, effect of theory of modal analysis and the k-cone
not always fulfilled. However, if we suppose that all formalism supply with a pipeline to explore and survey
the reactions obey law of mass action, Eq. 3 can be the dynamical behavior of genome-scale metabolic
written as: reconstructions. This method is strongly dependent
on quantitative metabolic profile of the organism in
study and independent of a complete or partial knowl-
d~
x
¼ S  diagðCÞ  k~ (12) edge of the kinetics underlying the metabolic network
dt
(Resendis-Antonio 2009).

where diag(C) is a diagonal matrix whose entries are


determined by a function of metabolic concentrations Cross-References
at steady state (C ¼ Pxi jSij j , such that SRij refers
R

uniquely to the reactant stoichiometric coefficients) ▶ Jacobian Matrix


and k~ is a vector with the unknown kinetic constants ▶ Law of Mass Action
and whose dimensionality is determined by the number ▶ Stoichiometric Matrix
of metabolic fluxes included in the metabolic recon-
struction. Given that the numerical values of the
kinetic constants remain along time, the numerical References
value of k~ can be straightforward identified at the
steady-state regime. In such a situation, Eq. 12 is Famili I, Mahadevan R, Palsson BO (2005) k-Cone analysis:
written as: determinig all candidate values for kinetic parameters on
a network scale. Biophys J 88(3):1616–1625
Jamshidi N, Palsson BO (2008) Formulating genome-scale
Q  k~ ¼ 0 (13) kinetic models in the post genome era. Mole Syst Biol 4:171
Kauffman KJ, Pajerowski JD, Jasmshidi N, Palsson BO,
Edwards JS (2002) Description and analysis of metabolic
where, we have stated that Sdiag(C) ¼ Q for simplic-
connectivity and dynamics in the human red blood cell.
ity in notation. Thus, we concluded that the numerical Biophys J 83(2):646–662
range of the kinetic parameters can be estimated Oberhardt MA, Palsson BO, Papin JA (2009) Applications of
through the right null space of Q. The feasible space genome-scale metabolic reconstructions. Molec Syst Biol
5:320
of kinetic parameters that ensure the presence of
Resendis-Antonio O (2009) Filling kinetic gaps: dynamic
a steady-state behavior in metabolic networks is called modeling of metabolism where detailed kinetic information
the k-cone (Famili et al. 2005). This multidimensional is lacking. PLoS ONE 4(3):e4967
Dynamical Systems Theory, Asymptotics and Singular Perturbations 629 D
Rojas I, Golebieswki M, Kania R, Krebs O, Mir S, Weidemann A, usually network modules (▶ Modularity; ▶ Module
Wittig U (2007) SABIO-RK: a database for biochemical reac- network) are of an imaginable form for decomposition
tions and their kinetics. BMC Syst Biol 1(Suppl 1):S6
Scheer M, Grote A, Chang A, Schomburg I, Munaretto C, because of network ▶ modularity. And second, the
Rother M, Sohngen C, Stelzer M, Thiele J, Schomburg D dynamical connectivity between different modules
(2011) BRENDA, the enzyme information system in 2011. (▶ Modularity; ▶ Module network) is quantified
Nucleic Acids Res 39:670–676 under various conditions. The typical method for this
purpose is modular response analysis (MRA) proposed
by Kholodenko et al. (2002).
Dynamic Modeling and Simulation D
Cross-References
▶ Kinetic Modeling and Simulation
▶ Modular Organization of Gene Regulatory
Networks
Dynamic Modularity
References
Junhua Zhang
Academy of Mathematics and Systems Science, Alexander RP, Kim PM, Emonet T, Gerstein MB (2009)
Chinese Academy of Sciences, Beijing, China Understanding modularity in molecular networks requires
Key Laboratory of Random Complex Structures and dynamics. Sci Signal 2(81):pe44
Kholodenko BN, Kiyatkin A, Bruggeman FJ, Sontag E,
Data Science, Chinese Academy of Sciences, Beijing, Westerhoff HV et al (2002) Untangling the wires: a strategy
China to trace functional interactions in signaling and gene
National Center for Mathematics and Interdisciplinary networks. Proc Natl Acad Sci USA 99:12841–12846
Sciences, Chinese Academy of Sciences, Beijing,
China
Dynamical System Model Invalidation
Synonyms
▶ Model Invalidation
Community; Modular; Modularity; Module network

Dynamical Systems Theory, Asymptotics


Definition and Singular Perturbations

The main idea of dynamic modularity comes from the John R. King
need to systematically explain the influence of a simple School of Mathematical Sciences, University of
genetic change or environment perturbation on the Nottingham, Nottingham, UK
behavior of an organism, which is also the ultimate
goal of studying biological networks. However,
research on dynamics of molecular networks remains Synonyms
very challenging even though a huge number of high-
throughput data are available nowadays because the Matched asymptotic expansions; Multiscale and
state space of networks grows exponentially with homogenization methods; Regular and singular pertur-
the number of network components (Alexander bation methods
et al. 2009). So, methods to reduce the complexity of
the analysis are of great interest. Thus, dynamic mod-
ularity appears, which can be used to explore molecu- Definition
lar network dynamics to some extent by two steps.
First, the network is decomposed into smaller building Asymptotic methods comprise a class of systematic
blocks that can be analyzed more easily, among which mathematical procedures that allow the dominant
D 630 Dynamical Systems Theory, Asymptotics and Singular Perturbations

processes operating in a given model to be identified, for for prescribed constants S0 and E0. An essential
the model to be reliably simplified by neglecting those first step in the application of asymptotic techniques
effects that are identified as negligible and for the error is to non-dimensionalize the model, thereby reducing
arising from the simplification to be rather precisely the number of parameters present, identifying the
characterized. The techniques are widely applied in relevant parameter groupings and, most importantly
the study of differential-equation models but are equally in the current context, characterizing the relative
applicable to discrete systems, differential-delay equa- importance of these dimensionless groups and hence
tions, and stochastic models, for example. Typical of the processes that each embodies. The choice of
applications in systems biology include in reducing the non-dimensionalization is somewhat arbitrary, but
complexity of network models (possibly expressing it is often convenient to scale out the initial data, i.e.,
them in modular form) and in integrating across dispa- in the case of Eqs. 1 and 2 to set, having noted that
rate spatial and/or temporal scales in multiscale formu- E ¼ E0  C,
lations. The techniques permit the extraction of
parameter dependencies from, and more extensive ana- ^
S ¼ S0 S; ^
C ¼ E0 C; t ¼ ^t=k1 S0 : (3)
lytical treatment of, the governing models, as well as
complementing their numerical study. This yields

d S^    
¼ r  1  C^ S^ þ k1 C^ ;
Characteristics dt (4)
d C^   
¼ 1  C^ S^  ðk1 þ k2 Þ C; ^
The numerous general texts available on asymptotic d^t
methods include Murray (1984); Hinch (1991);
Holmes (1995); and Kevorkian and Cole (1996). An with
ordinary differential equation (ODE) model that exem-
plifies many of the key aspects of singular-perturbation S^ ¼ 1; C^ ¼ 0 at ^t ¼ 0: (5)
methods as applied in systems biology is one that is
widely adopted as a deterministic description of the Here the number of parameters has been reduced
enzymatic reactions from five in Eqs. 1 and 2 to three dimensionless
groupings
k1 k2
S þ E Ð C! P þ E
k1 r ¼ E0 =S0 ; k1 ¼ k1 =k1 S0 ; k2 ¼ k2 =k1 S0 ; (6)

in which a substrate S reacts with an enzyme E to form the first of which is a concentration ratio, while the
a complex C that decomposes irreversibly into the other two are ratios of possible timescales. The extent
enzyme and a product P. Using the same notation for to which the number of parameters can be reduced
the concentrations of each of these species then gives depends on the problem; a direct application of the
the ▶ Ordinary Differential Equation (ODE) Buckingham p theorem gives a lower bound on how
many fewer there are in the dimensionless formulation
(and the actual value, two, in the above example),
dS
¼ k1 ES þ k1 C; while the number of variables to be scaled, e.g., three
dt
in the case of Eq. 3, is typically an upper bound.
dE
¼ k1 ES þ ðk1 þ k2 ÞC; (1) If reasonable (at least order of magnitude) estimates
dt
dC are available for each of the parameters, the next step is
¼ k1 ES  ðk1 þ k2 ÞC; to determine which of the dimensionless groupings is
dt
small (or large) and hence available for exploitation
in an asymptotic analysis (subtleties can arise in iden-
typically subject to the initial conditions tifying the combination of dimensionless parameters
that can be most effectively exploited in this way – see
S ¼ S0 ; E ¼ E0 ; C ¼ 0 at t ¼ 0 (2) Segel and Slemrod (1989), for example – and
Dynamical Systems Theory, Asymptotics and Singular Perturbations 631 D
distinguished limits, described below, have a role to The reduced problems obtained in the above fash-
play in this regard). In the case of regular perturbation ion are more amenable to analytical solution (as in
problems, a uniformly valid approximation is obtained Eq. 8) than is the full problem, are numerically better
simply by discarding all terms multiplied by the small conditioned (their stiffness having been removed), and
parameter. Singular perturbation problems are more contain fewer parameters (e.g., Eqs. 7 and 9 each
delicate, since different approximations hold on differ- involve only the single grouping k1 þ k2 ), so can
ent scales, thereby capturing the ▶ slow-fast dynam- more readily be fitted to experimental data. It is note-
ics: this can be illustrated by considering the limit of worthy in particular that for such purposes it may not
Eqs. 4 and 5 in which r is small. Then for ^t ¼ O (1) (the be necessary to have any quantitative information D
shortest relevant timescale, corresponding to the inner about the small parameter, beyond that it is indeed
region or boundary layer) the appropriate approxima- small. In the above, only leading order approximations
tion to Eq. 4 is, at leading order in r, have been given – more accurate expressions can be
obtained by expanding the solutions in each region in
dS^0 d C^0   powers of the small parameter and balancing the terms
¼ 0; ¼ 1  C^0 S^0  ðk1 þ k2 ÞC^0 ; (7)
d^t d^t that involve the same power (in more complicated
examples, terms depending on the logarithm of the
so that S^0 ¼ 1 and small parameter may also need to be introduced in
order to match successfully etc.).
  The above assumes the problem to contain a single
C^0 ¼ 1  eð1þk1 þk2 Þ^t =ð1 þ k1 þ k2 Þ: (8) small (or large) parameter. In many applications (and
almost invariably in the case of complex systems, such
The outer region sets ^t ¼ t=rk2 (to obtain as are common in the modeling of ▶ gene regulatory
a balance in the first of Eq. 4), S^ ¼ S and C^ ¼ C and networks and of ▶ metabolic and signaling networks)
a different approximation then holds as r ! 0, namely, multiple such parameters are present; typically one of
these would be identified as that in which the expan-
d S0
k2 ¼ ð1  C0 ÞS0 þ k1 C0 ; sions are to be performed and the relative sizes of the
dt others are then characterized by expressing them in
0 ¼ ð1  C0 ÞS0  ðk1 þ k2 ÞC0 ; terms of powers of this small parameter; there can,
however, be ambiguity in how this is accomplished.
whereby the second equation in Eq. 4 is replaced by its Distinguished limits, in which such characterizations
quasi-steady approximation (▶ Flux Balance Analysis, are identified in part on mathematical grounds as giv-
for example) leading to a Michaelis-Menten expres- ing the fullest set of relevant balances within the equa-
sion (a special case of the ▶ Hill Equation): tions, can then be of particular value, as they can also
be (because of their broad validity) when limited infor-
mation is available about the sizes of some of the
d S0 S0 0 ¼ S0
¼ ; C : (9) parameters.
dt k1 þ k2 þ S0 k1 þ k2 þ S0 The discussion above pertains to circumstances in
which the method of matched asymptotic expansions
The initial data on Eq. 8 are provided by the applies. Another broad set of techniques (multiple
matching condition scales, having two-timing as a special case) was origi-
nally developed largely in the analysis of nonlinear
oscillations (whereby rapid oscillations are subject to
lim S0 ðtÞ ¼ lim S^0 ð^tÞ ¼ 1
t!0 ^t!þ1 a slow rate of decay, for example: since the oscillations
persist, the fast and slow scales must be captured for all
(more generally, however, the matching conditions times, rather than the former being relevant only in the
need not correspond to the original initial conditions). boundary layer); here secularity conditions, instead of
The approximations valid on each of these timescales matching, play a central role. This class of techniques is
can be combined into a uniformly valid or composite of particular importance in integrating between scales
approximation. (homogenization – see Mei and Vernescu (2010), for
D 632 Dynamical Systems Theory, Bifurcation Analysis

instance), having the potential within systems biology ▶ Master Equation


to embed cell-scale behavior systematically within tis- ▶ Mean-Field Approximation
sue-level models, say (cf. ▶ mixed and multi-level ▶ Metabolic and Signaling Networks
models). ▶ Mixed and Multi-level Models
Since the above example is an ODE system, it ▶ Modularity-Based Network Decomposition
should be emphasized that the techniques are equally ▶ ODE: Ordinary Differential Equation
effective in the analysis of, inter alia, ▶ partial differ- ▶ Partial Differential Equation (PDE), Models
ential equation (PDE) models, differential-delay ▶ Slow-Fast Dynamics
equations, discrete systems, and stochastic models
(cf. van Kampen (2007), for example, reduction of
the ▶ master equation to a ▶ Fokker-Planck equation References
being a common upshot) and may be of value in under-
pinning the development of a ▶ mean-field approxi- Barenblatt GI (1996) Scaling, self-similarity and intermediate
asymptotics: dimensional analysis and intermediate asymp-
mation. Given the disparate timescales that almost
totics. Cambridge University Press, Cambridge
invariably arise in such applications, they can be of Hinch EJ (1991) Perturbation methods. Cambridge University
particular effectiveness in simplifying a complex Press, Cambridge
▶ biological network model containing numerous dis- Holmes MH (1995) Introduction to perturbation methods.
Springer, New York
tinct pathways and may then provide systematic
Kevorkian JK, Cole JD (1996) Multiple scale and singular per-
approaches to ▶ modularity-based network turbation methods. Springer, New York
decomposition. King JR, Chapman SJ (2001) Asymptotics beyond all orders and
The methodologies in question remain areas of stokes lines in nonlinear differential-difference equations.
Eur J Appl Math 12:433–463
active research and issues that go beyond those
Mei CC, Vernescu B (2010) Homogenization methods for
described above include the following. multiscale mechanics. World Scientific, Singapore
(a) Asymptotics beyond all orders. In certain applica- Murray JD (1984) Asymptotic analysis. Springer, New York
tions, the calculation of algebraic terms (those Segel LA, Slemrod M (1989) The quasi-steady-state assump-
tion: a case study in perturbation. SIAM Rev 31:446–447
involving powers of the small parameter) does
van Kampen NG (2007) Stochastic processes in physics and
not suffice in the sense that important phenomena chemistry. Elsevier, Amsterdam
manifest themselves only in terms that are expo-
nential small: these terms are “hidden” beyond
a (divergent) algebraic series and hence lead to
additional complexities; a particular application
is to the failure of signal propagation in spatially Dynamical Systems Theory, Bifurcation
discrete systems – see King and Chapman (2001) Analysis
and references therein.
(b) Intermediate asymptotics (Barenblatt 1996). An Alan Champneys and Krasimira Tsaneva-Atanasova
independent variable, rather than one of the dimen- Department of Engineering Mathematics, University
sionless constants, can be taken to be the small or of Bristol, Bristol, UK
large quantity, the analysis of large-time behavior
(such as traveling-wave propagation in Fisher’s
equation) being a common such application. Synonyms

Numerical continuation; Parameter studies;


Cross-References Path-following; Qualitative analysis

▶ Biological Network Model


▶ Flux Balance Analysis Definition
▶ Fokker–Planck Equation
▶ Gene Regulatory Networks Bifurcation theory refers to the study of qualitative
▶ Hill Equation changes to the state of a system as a parameter is varied.
Dynamical Systems Theory, Bifurcation Analysis 633 D
It can be applied to ▶ steady state systems, or to dynam- In contrast with many equations arising in physics,
ical systems and can be understood best at the level of engineering, and economics, a key feature of most
a mathematical model, although recent techniques biological models of the form (Danino et al. 2010) is
allow the method to be applied to experiments with that they are nonlinear, essentially because of large
feedback control. Typically the theory is applied to deformations, excitability, thresholding and interac-
a ▶ continuous model, but can also be used in ▶ discrete tion processes governed by the ▶ law of mass action.
models and mathematics, difference equations. There Nonlinear systems can have multiple ▶ stable steady
are dedicated numerical implementations of bifurcation states, or ▶ attractors, that can have many different
theory using path-following, or numerical continuation. stable regions, or basins of attraction in state space. D
There is a distinction between a local ▶ bifurcation, They can also often support ▶ limit cycle periodic
which can be understood in terms of a change to the oscillations. Moreover, as a parameter is varied,
number or stability of simple steady states, and a global a new attractor can emerge out of “thin air,” typically
bifurcation, which cannot. Often global bifurcations when an unstable state gains local stability. For this
cause catastrophic changes to the ▶ attractor of reason, usual numerical methods and computer simu-
the system. Typical local examples are the ▶ Hopf lation for ordinary differential equations (▶ Partial
bifurcation which leads to the onset of oscillation and Differntial Equations, Numerical Methods and Simu-
the ▶ saddle-node bifurcation where a stable steady lations; ▶ Ordinary Differential Equation (ODE),
state is created or destroyed, often leading to bistability. Model) can be highly unreliable in understanding the
true dynamics of the system, if the method is based on
simulation from fixed initial conditions. Bifurcation
Characteristics theory can be especially useful in this context as it
enables the tracing of paths of unstable states as param-
Once a model of a biological system has been eters vary and hence determine precisely the transi-
constructed and, consequently, the number of tions (or “bifurcation points”) at which qualitatively
parameters in the model carefully defined and distinct stable behavior emerges.
evaluated – perhaps using ▶ optimization and param- The codimension of a bifurcation is defined as the
eter estimation – one may wonder how the long-term number of parameters required to observe a bifurcation
behavior of a dynamical system changes when these in a structurally stable way. So a codimension-one bifur-
parameters are changed. This question is the basis of cation can be observed at an isolated value of a single
bifurcation theory and it underlies a qualitative parameter, whereas a codimension-two bifurcation
understanding of many biological processes and tran- would typically only be seen at an isolated point in
sitions, such as the onset of oscillation, switching, a two-parameter diagram. There is a distinction drawn
morphogenesis, multi-stability, emergence, and between local bifurcations that can be understood in
localization. terms of loss of ▶ stability of a simple state such as an
Bifurcation theory can be applied to a wide variety equilibrium or a limit cycle and global bifurcations that
of deterministic models and processes, including cannot. Often global bifurcations involve
▶ partial differential equation (PDE) models rearrangements of stable and unstable manifolds of
and ▶ dynamical systems theory, delay differential other simple states, such as in homoclinic bifurcations.
equations but for simplicity this entry shall consider A codimension-one bifurcation can often be
only the context of ▶ dynamical systems theory, represented in a bifurcation diagram that depicts
ordinary differential equations. Specifically, consider a measure, or norm, of a system state against a single
such a parametrized ▶ ODE model written in parameter; see Fig. 1 for examples. Codimension-one
state-space form: bifurcations can also be used to divide regions in
a parameter plain in which qualitatively distinct
x ¼ f ðx; AÞ; (1) bifurcations can occur.
A comprehensive treatment of local bifurcations of
where x E Rn is the set of states of the system, l E Rp is codimension-one and two, and many examples
a parameter set, and a dot denotes differentiation with of global bifurcations can be found in Kuznetsov
respect to time. (2004). That book also contains analytical and
D 634 Dynamical Systems Theory, Bifurcation Analysis

a b
(x) (x )

λ1 λ2 λ λ

bistability region

c d
(x ) (x )
x2 x1

Dynamical Systems Theory, Bifurcation Analysis, norm or scalar measure of the vector state x. (b) A supercritical
Fig. 1 Schematic bifurcation diagrams depicting pitchfork bifurcation (upper plot) and a transcritical bifurcation
codimension-one local bifurcations. (a) An s-shaped fold featur- (lower plot). (c) A supercritical Hopf bifurcation (upper plot)
ing two saddle-node bifurcations (at parameter values l1 and l 2) and a subcritical Hopf bifurcation (lower plot). Here, a curve
and a consequent parameter interval between these two values in composed of solid circles represents a path of stable limit cycle
which bistability is observed. In this and subsequent panels, solid oscillations, whereas a curve of open circles represents a path of
lines represent paths of stable steady states and dashed lines unstable limit cycles. (d) A representation of a supercritical Hopf
unstable steady states. Also ||x|| represents a characteristic bifurcation in state and parameter space

numerical techniques for analyzing bifurcations in systems theory, stability analysis in general can be
practical examples, principally using the theory of found in Strogatz (1994). For more on numerical tech-
center manifolds and normal forms. That theory can niques for performing parameter continuation and
be seen as a counterpart to more traditional ▶ dynam- bifurcation analysis, see Krauskopf et al. (2007).
ical systems theory, asymptotics, and singular pertur- Rather than reiterate this theory, this entry shall try
bations. The qualitative geometry or topology of to give a qualitative flavor to how bifurcation theory
bifurcations is also stressed by Shilnikov et al. can underlie several key phenomena in biological
(2001). Many examples in biological systems, espe- systems. Specifically treated are threshold behavior,
cially in ▶ reaction-diffusion-advection equations are oscillation, bi- and multi-stability, synchronization
found in Murray (2007), and applications to cell biol- and emergence of collective behavior, before some
ogy in Fall et al. (2002). A more elementary introduc- final remarks on bifurcation theory applied directly to
tion to bifurcation theory and nonlinear dynamical experiments.
Dynamical Systems Theory, Bifurcation Analysis 635 D
Threshold Examples:
It is common in biological systems for a threshold • The ▶ Goodwin oscillator is a simplified model of
concentration of some chemical signal to be there in circadian rhythms based on a negative ▶ feedback
order for certain behavior to be triggered. Thresholds loop.
can often be understood in terms of the phenomenon of • The ▶ repressilator and oscillating network is
excitability, which in itself is a property of a dynamical a simple ▶ network motif constructed by ▶ model-
system with two timescales, where a transient pushes ing and simulation: synthetic models and methods.
the system beyond the region where a large excitation It models a ▶ gene regulatory network composed of
occurs. In systems that reach steady state though, the three genes which mutually repress each other in D
variation of a parameter can cause a global bifurcation sequence according to a ring structure.
that makes the large excitation occur. • Spatiotemporal regulation of extracellular signal-
regulated kinase has been shown to result in rapid
Examples: and sustained nuclear-cytoplasmic oscillations
• Control of calcium oscillations, see for example (Shankaran et al. 2009).
(Sneed and Keener 2008)
• ▶ Cell cycle model analysis, bifurcation theory, Bistability and Multi-stability
which is a canonical example of the practical appli- Bistability refers to the existence of two distinct
cability of bifurcation theory in explaining how attractors to which the system may evolve given
biological decisions occur as emergent bifurcation different initial states or perturbations. A typical bifur-
events upon integrating the various environmental cation scenario in which bistability occurs is via a pair of
and internal parameters ▶ saddle-node bifurcations that are connected in an
S-shaped fold, see Fig. 1a. Such folded structures are
Oscillation often seen as part of the the unfolding of a codimension-
Periodic oscillations are important in biology, two cusp bifurcation point (see Fig. 2a, b) which is one
appearing in diverse areas such as ▶ Circadian of the elementary catastrophes of singularity theory.
rhythms, metabolic networks and their evolution, Bistability also often accompanies subcritical
heart beats, cell signaling, etc. Oscillations can be bifurcations, where an extra fold causes the unstable
described by simple harmonic motion, governed by bifurcating branch to turn around, become stable, and
second-order linear ODEs, but such descriptions suffer coexist with the primary branch, again see Fig. 1.
from several serious limitations. First, they are not Multi-stability refers the situation when there are
robust, as a small amount of damping destroys the more than two competing attractors, each with distinct
oscillation. Second, ▶ oscillation amplitude and basins of attraction, which can occur via a sequence of
phase depends on the initial condition. Finally, the bifurcations, or directly via a global bifurcation, such as
frequency is typically not adaptable. In contrast, stable that caused by a Shilnikov-type homoclinic bifurcation.
limit cycles of nonlinear models are robust, because
following a small ▶ perturbation away from the cycle, Examples:
the system will return to the cycle by itself. Also, if the • The Toggle switch (Gardner et al. 2000) is
dynamics changes a little, a limit cycle will still exist, a synthetic biology construct that matches what is
close to the original one. believed to be a common ▶ network motif in sys-
The canonical way to generate oscillation from tems biology. It is composed of two genes that
a system that is otherwise at rest is via a ▶ Hopf mutually repress each other, which causes
bifurcation. These oscillatory instabilities can occur bistability between steady states in which either
in two ways (see Fig. 1c), either as a ▶ supercritical one gene or the other is expressed at high levels.
bifurcation in which a stable limit cycle is created at • Multi-stability is also seen in unusual cell-division
small amplitude, or as a ▶ subcritical bifurcation, phenotypes in ▶ cell cycle model analysis, bifurca-
often accompanied by bistability which would tion theory and in recent systems biology models of
cause the jump to a fully formed large-amplitude tumors as competing attracting states in cancer
attractor. dynamics (Huang et al. 2009).
D 636 Dynamical Systems Theory, Bifurcation Analysis

a b
l2 nx
solution
1 stable state states

codim 2 l2
cusp point

2 stable bifurcation
states, 1 unstable set
l1 l1
c d
16 600

period decreasing
14
500
12 TB
10 400
8 SNP SN
300

α1
μ2

DH δ = 0.1
6 period increasing
HB 200 δ = 0.01
4 HC δ = 0.001 oscillations
2 100
no oscillations
0 CP SNIPER 0
2 3 4 5 6 7 8 9
−20 −10 0 10 20 0.1 1
μ2 p (μM)

Dynamical Systems Theory, Bifurcation Analysis, saddle-node combine), DH a degenarate Hopf bifurcation
Fig. 2 Examples of two-parameter bifurcation diagrams indi- (a transition between super and subcritical cases), SNIPER rep-
cating parameter regions in which qualitatively distinct behavior resents a Saddle Node of Infinite PERiod bifurcation point
is observed. (a) A cusp bifurcation point linking two curves of (a kind of global bifurcation), SN a saddle node of equilibria,
saddle-node bifurcations. (b) A representation of the cusp in SNP a saddle node of periodic orbits, HB a Hopf bifurcation, and
3D showing a norm of the solution state on the vertical axis. HC a homoclinic bifurcation. (d) Two-parameter bifurcation
(c) Codimension-two bifurcations in a simple model for plateau diagram of an open-cell model for calcium oscillations (Sneed
bursting in excitable systems that arises in the unfolding of and Keener 2008). The lines (for three different values of
a certain degenerate codimension-three bifurcation point; see a parameter d) represent Hopf bifurcation curves in the param-
(Golubitsky et al. 2001). Here CP represents a cusp bifurcation eter plane and separate regions in which oscillations do and do
point, TB a Takens-Bogdanov point (where Hopf and not exist

Synchronization and Emergence of Behavior Experimental Bifurcation Theory


Many biological systems composed of near identical Recently, the possibility of performing bifurcation
cells, components, or organisms are known to undergo analysis directly in ▶ feedback controlled experiments
common collective dynamics. The simplest such state has raised (Sieber et al. 2008). This poses the possibil-
is that of synchronization, where each state oscillates, ity of direct intervention in vitro or in vivo in order to
typically periodically perfectly in time with the others. analyze, or indeed to influence and control bifurcations
This is an example of a symmetric state of a dynamical to desirable or undesirable states as an external or
system and the onset or loss of synchronicity can be internal parameter is varied. Such technology is likely
understood as a form of symmetry breaking bifurca- to have a significant impact on synthetic biology and
tion. Other collective states include spatially localized personalized medicine.
patterns of either steady state or dynamic behavior. For
more on the bifurcation theory of pattern formation
systems, see (Hoyle 2006). Cross-References

Example: ▶ Attractor
• Collective oscillations in proliferating bacterial ▶ Bifurcation
population (Danino et al. 2010). ▶ Bifurcation, Supercritical and Subcritical
Dynamical Systems Theory, Delay Differential Equations 637 D
▶ Cell Cycle Model Analysis, Bifurcation Theory Huang S, Ernberg I, Kauffman S (2009) Cancer attractors: a
▶ Circadian Rhythm systems view of tumors from a gene network dynamics
and developmental perspective. Semin Cell Dev Biol
▶ Continuous Model 20(7):869–876
▶ Differential-Difference Equations Krauskopf B, Osinga HM, Galn-Vioque J (2007)
▶ Dynamical Systems Theory, Asymptotics and Numerical continuation methods for dynamical systems:
Singular Perturbations path following and boundary value problems. Springer,
New York
▶ Dynamical Systems Theory, Delay Differential Kuznetsov YA (2004) Elements of applied bifurcation theory,
Equations 3rd edn. Springer, New York
▶ Feedback Regulation Murray JD (2007) Mathematical biology. Springer, New York, D
▶ Gene Regulatory Networks third (in two parts) edition
Shankaran H, Ippolito DL, Chrisler WB, Resat H, Bollinger N,
▶ Goodwin Oscillator Opresko LK, Wiley HS (2009) Rapid and sustained nuclear-
▶ Hopf Bifurcation cytoplasmic erk oscillations induced by epidermal growth
▶ Law of Mass Action factor. Mol Syst Biol 5:332
▶ Limit Cycle Shilnikov LP, Shilnikov AL, Turaev DV, Chua LO
(2001) Methods of qualitative theory in nonlinear dynamics.
▶ Metabolic Networks, Evolution World Scientific, Singapore
▶ Network Motif Sieber J, Gonzalez-Buelga A, Neild SA, Wagg DJ, Krauskopf
▶ Partial Differntial Equations, Numerical Methods B (2008) Experimental continuation of periodic orbits
and Simulations through a fold. Phys Rev Lett 100:244101
Sneed J, Keener J (2008) Mathematical physiology. Springer,
▶ Optimization and Parameter Estimation, Genetic New York, second, in two parts edition
Algorithms Strogatz SH (1994) Nonlinear dynamics and chaos: with appli-
▶ Ordinary Differential Equation (ODE) cations to physics, biology, chemistry, and engineering.
▶ Oscillation Amplitude Westview, Boulder
▶ Partial Differential Equation (PDE), Models
▶ Periodic Oscillation
▶ Perturbation
▶ Reaction-Diffusion-Advection Equation
▶ Repressilator and Oscillating Network Dynamical Systems Theory,
▶ Saddle-Node Bifurcation Delay Differential Equations
▶ Stability
▶ Stability, States and Regions Patrick Nelson
▶ Steady State CCMB 2017 Palmer Commons,
University of Michigan, Ann Arbor, MI, USA

References Synonyms
Danino T, Mondragn-Palomino O, Tsimring L, Hasty J (2010)
A synchronized quorum of genetic clocks. Nature
Differential equations with deviating arguments;
463(7279):326–330 Differential-difference equations; Functional differen-
Fall CP, Marland ES, Wagner JM, Tyson JJ (2002) Computa- tial equations; Time delay
tional cell biology. Springer, New York, In memory of Joel
Keizer
Gardner TS, Cantor CR, Collins JJ (2000) Construction of
a genetic toggle switch in Escherichia coli. Nature
403(6767):339–342 Definition
Golubitsky M, Josic K, Kaper TJ (2001) An unfolding theory
approach to bursting in fast-slow systems. Global Analysis Delay differential equations (DDE) are equations
of Dynamical Systems. Festschrift dedicated to Floris
Takens for his 60th birthday, pp 277–308
whose solution depends on not just a single initial
Hoyle R (2006) Pattern formation: an introduction to methods. condition at time, t ¼ t0, but also on the past history
Cambridge University Press, Cambridge of the system. DDEs can be classified as retarded or
D 638 Dynamical Systems Theory, Delay Differential Equations

Dynamical Systems Theory, Delay Differential Equations, Table 1 Summary of Research on DDEs
Topic Problem formulation Approach Applications
Nonlinear y˙(t) ¼ f(y(t), y(t  t), u(t)) Perturbation method with Lambert HIV, chatter
function
P
_ þ Ni¼1 Ai yðt  ti Þ þ ByðtÞ ¼ uðtÞ Superposition or modified Lambert
Multiple delays yðtÞ HIV, multiple regenerative effect
function in chatter
Time-varying y˙(t) + A(t)y(t  t) + B(t)y(t) ¼ u(t) Floquet theory, Wronskian matrix Milling chatter
with Lambert function

neutral and continuous or discrete. In general, DDEs has been led by Bellman, Cooke, Hale, Kuang,
a discrete delay differential equation can be written as and Stepan (Bellman and Cooke 1963; Stepan 1989;
Hale 1977; Corless et al. 1996). Time delays can be
dn y used to represent self-oscillating systems, economic
¼ f ðt; a0 ðtÞyðtÞ; a1 ðtÞy0 ðtÞ;... aðtÞn1 yn1 ðtÞ;
dtn (1) futures, and in engineering, pure delays are often
b0 ðtÞyðt  tÞ. ..bn1 ðtÞyn1 ðt  tÞÞ used to ideally represent the effects of transmission,
transportation, and inertial phenomena. They are also
where t represents a point of time in the past history quite popular in biology, where they can be used to
of the equation. Note, when bi ¼ 0 for all i that the model gestation, maturation, transcription, and
system is just an ordinary differential equation. How- numerous cell-cycle phenomena. Delay differential
ever, when bi ¼ 0 for i > 0 then the system is defined as equations constitute basic mathematical models for
a retarded differential equation. If bi ¼
6 0 for any i > 0 the such real phenomena. However, there is a principal
system is defined to be of neutral type. Equation 1 is an difficulty in studying DDEs hidden in their special
example of a DDE with only one time delay, t ¼ t but transcendental character which leads to an infinite
there can be multiple time delays, t1, t2,. . ., tn, in the spectrum of frequencies. In other words, all DDEs
system. Most DDEs in physics, engineering, and biol- will result in an infinite number of eigenvalues.
ogy are of the discrete type with one time delay. Another Hence, they are often solved using numerical methods,
representation for a DDE is defined as a continuous asymptotic solutions, approximations (e.g., Padé)
delay differential equation. and graphical approaches or through the study of
bifurcation of their characteristic equations to
 Z 1 
dy determine their stability.
¼ f t; yðtÞ; yðt  tÞgðtÞdt (2)
dt 0 Characteristic Equation (zeros of the transcenden-
tal equation) The characteristic equation for a discrete
Note that when one considers the distribution time delay equation at steady state, with a single delay
function g(t) to be a gamma distribution (more (( Eq. 1), with bi ¼ 0 for i > 1) can be written as
precisely an Erlang distribution), the system can
reduce to the discrete delay type. P1 ðlÞ þ P2 ðlÞelt ¼ 0 (3)

where P1 and P2 are functions of the eigenvalues, l of


Characteristics the equation. P1 + P2 ¼ 0 represent the characteristic
equation for the system when t ¼ 0. For the equation
Delays are inherent in many physical, biological, given by Eq. 2, the characteristic equation looks like
economic, and engineering systems. The first appear-
ance of DDEs was in a paper by Kondorse in 1777 but P1 ðlÞ þ P2 ðlÞFðlÞ ¼ 0 (4)
their use did not become popular until the early 1940s
and 1950s when Nyquist, Chebotarev, and Pontryagin where the Pi are the same as in Eq. 3 but instead of the
pioneered work investigating the stability of DDEs explicit exponential term, elt, we find, F(l) to be the
(Pontryagin 1955; Krall 1965). More recently, the Laplace R transform of the delay kernel, defined as
1
advances into the theory, solution, and application of FðlÞ ¼ 0 gðtÞelt
Dynamical Systems Theory, Delay Differential Equations 639 D
The basic idea behind studying the characteristic imaginary parts and squaring both sides. Then
equation is to determine conditions on the parameters allowing m ¼ n2 and defining the resulting equation
and the time delay, t, that causes a bifurcation in as S(m)) has a positive real root m∗ ¼ (n∗)2, such that
stability of the system. One starts by looking at 2. S0 (m∗) 6¼ 0
the conditions for stability when t ¼ 0 (all roots of
characteristic equation lie in the left half plane) and Example
then consider t > 0 to find when the roots of the
characteristic equation cross the imaginary axis l2 þ al þ b þ ðcl þ dÞelt ¼ 0: (5)
(El’sgol’ts and Norkin 1973). As t varies, these roots D
change. Of interest is any critical values of t at which A steady state with this characteristic is stable for t ¼ 0
a root of this equation transitions from having negative if all of the roots of
to having positive real parts. If this is to occur,
there must be a boundary case, a critical value of t, l2 þ ða þ cÞl þ ðb þ dÞ ¼ 0
such that the characteristic equation has a purely
imaginary root. have negative real part. By the Routh-Hurwitz condi-
Early stability methods developed and presented in tions, this occurs if and only if a + c > 0 and b + d > 0.
the classic papers of Pontryagin (1955) and Nyquist Letting l ¼ in we arrive at the following form of
(Krall 1965) have been used for many years to study equation
bifurcations in transcendental equations. However,
these methods rely heavily on the principal of the argu-
SðmÞ ¼ m2 þ ða2  c2  2bÞm þ ðb2  d 2 Þ ¼ 0: (6)
ment for determining where the poles of the
transcendental equations are located. In other words,
they use geometric principles to determine the number Let A a2  c2  2b and B b2  d2. Equation 6
of roots of these equations. The monograph by has a positive real root in two circumstances. Clearly,
Chebotarev and Meiman (1949) shows how to extend since the lead coefficient is positive, if B < 0 then there
the Routh-Hurwitz criteria for polynomials to quasi- is a positive real root. If B > 0, the roots of Eq. 6 are
polynomials. However, it has been noted that the
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
application of the Chebotarev criterion as an analytical A A2  4B
tool is not effective practically. Recent results using ;
2
Sturm sequences (Forde and Nelson 2000) relax the
need for the application of the argument principle and and there is a simple positive root if and only if A < 0.
provides an analytical criterion that is practical to use. Thus one can conclude
The Sturm sequence provides an algorithm for
determining stability of low degree, i.e., less than degree Proposition 1. A steady state with characteristic
4, polynomials and is explained by the following. (Eq. 5) is stable in the absence of delay, and becomes
Given a system of differential equations unstable with increasing delay if and only if
dy
dt ¼ f ðyðtÞ; yðt  tÞÞ with a discrete delay t, and 1. a + c > 0 and b + d > 0
a stable steady state, ys , for t ¼ 0, will lead to 2. Either b2 < d2, or b2 > d2 and a2 < c2 + 2b.

X
N X
M Analytical Solutions
lt
ai l þ e
i
bi l ¼ 0
i
Finding analytical solutions for DDEs is difficult and
i¼1 i¼1 most of the theory to date focuses on computation
methods for solutions or methods for determining sta-
as the characteristic equation of the system about ys. bility via computation or analytics. However, a group
Then there exists a t∗ > 0 for which ys undergoes at the University of Michigan (Asl and Ulsoy 2003; Yi
a nondegenerate change of stability if and only if the et al. 2007) recently developed an analytic approach,
equation based on the matrix Lambert function, for the complete
1. S(m) ¼ 0 (defined by substituting l ¼ m + in into solution of a system of linear constant coefficient
the characteristic equation, separating the real and DDEs. This method can be applied to study eigenvalue
D 640 Dynamical Systems Theory, Delay Differential Equations

assignment, pole placement, controllability and one can rewrite as


observability, and time-varying coefficients. The
method has been validated in engineering problems SeSt x0 þ AeSt x0 þ Ad eST eSt x0 ¼ ðS þ A þ Ad eST ÞeSt x0 ¼ 0:
where delay is significant, e.g., regenerative chatter in (11)
a machining operation on a lathe and a biological
problem, e.g., control of drug therapies. The matrix Because the matrix S is an inherent characteristic of
Lambert function-based solution approach for DDEs is a system and independent of initial condition, we can
analogous to the use of the matrix exponential for the conclude that for Eq. 11 to be satisfied for any arbitrary
free and forced solution of linear constant coefficient initial condition, x0, and every time, t, we must have
ordinary differential equations. Systems with multiple
time delays and nonlinearities arise quite naturally in S þ A þ Ad eST ¼ 0: (12)
engineering and biology and yet little attention has
been paid to their analyses. To explain, look at In the special case that Ad ¼ 0, the delay term in
a second-order linear system of DDEs with state Eq. 7 disappears, Eq. 7 becomes ODE, and Eq. 12 is
space given by ~x.
S þ A ¼ 0 , S ¼ A: (13)
xðtÞ þ AxðtÞ þ Ad xðt  tÞ ¼ 0: (7)
Then, substitution into Eq. 8 yields
A and Ad are the linearized coefficient matrices and
are functions of the dynamical system. The analytical xðtÞ ¼ eAt x0 (14)
method to solve scalar DDEs, and systems of DDEs
using the matrix Lambert W function was introduced This is the typical solution to ODE in terms of the
by Asl and Ulsoy (2003) and extended by Yi (2007) to matrix exponential. Multiply TeST eAT on both sides of
obtain the solution of general systems of DDEs in Eq. 12 and rearrange to obtain
matrix-vector form. First assume a solution form for
Eq. 7 as TðS þ AÞeST eAT ¼ Ad TeAT : (15)

In the general case, when the matrices A and Ad do


xðtÞ ¼ eSt x0 ; (8)
not commute, neither do S and A; thus

where S is n  n matrix. In the usual case, the


characteristic equation for Eq. 7 is obtained from TðS þ AÞeST eAT 6¼ TðS þ AÞeðSþAÞT : (16)
the equation by looking for nontrivial solution of the
form estC where s is a scalar variable and C is con- Consequently, to adjust the inequality in Eq. 16 and
stant (Hale 1977). However, such an approach can to take advantage of the property of the matrix Lambert
neither lead to any interesting result nor help in deriv- W function defined by
ing a solution to systems of DDEs in Eq. 7. Alterna-
tively, one could assume the form of Eq. 8 to derive WðHÞeWðHÞ ¼ H; (17)
the solution to systems of DDEs in Eq. 7 using the
matrix Lambert W function. Substituting into Eq. 7 we introduce an unknown matrix Q so that satisfies,
yields

TðS þ AÞeðSþAÞT ¼ Ad TQ: (18)


SeSt x0 þ AeSt x0 þ Ad eSðtTÞ x0 ¼ 0; (9)

and using the property of the exponential Comparing Eqs. 17 and 18 we note that

eSðtTÞ ¼ eSðTþtÞ ¼ eSðTÞ eSt (10) ðS þ AÞT ¼ WðAd TQÞ: (19)


Dynamics Modeling 641 D
Then from Eq. 19, solving for S gives embedded in the various commercial software pack-
ages, such as Matlab, Maple, and Mathematica.
1 The following table provides an overview of where
S¼ WðAd TQÞ  A: (20)
T the current state of studying DDEs is.

Substituting Eq. 20 into Eq. 15 yields the following


condition, which can be used to solve for the unknown Cross-References
matrix Q:
▶ Dynamical Systems Theory, Asymptotics and D
Singular Perturbations
WðAd TQÞeWðAd TQÞAT ¼ Ad T: (21) ▶ Mathematical Model, Model Theory
▶ Systems Network in HIV
To date, many examples have been studied and
Eq. 21 always has a unique solution Qk for each
branch, k. However, a proof of this result is needed. References
The solution is obtained numerically, for a variety of
initial conditions, using the “fsolve” function in Asl FM, Ulsoy AG (2003) Analysis of a system of linear
differential equations. ASME J Dyn Syst Meas Control
Matlab. The matrix Lambert W function defined in 125:215–223
Eq. 17 contains an infinite number of branches Bellman R, Cooke K (1963) Delay differential equations. Aca-
(Corless et al. 1996). Corresponding to each branch, demic, New York/London
k (¼  1,. . ., 1, 0, 1,. . ., 1), of the Lambert Chebotarev NG, Meiman NN (1949) The Routh-Hurwitz
problem for polynomials and entire functions. Trudy Mat
W function, for Hk ¼ AdTQk, we compute the eigen- Inst Steklov 26
values ^lki , i ¼ 1, 2, of Hk and the corresponding Corless RM et al (1996) On Lambert’s W function. Adv Comput
eigenvector matrix Vk. Hence, the matrix Lambert Math 5:329–359
W function is El’sgol’ts LE, Norkin SB (1973) An introduction to the theory
and application of differential equations with deviating
arguments. Academic, New York
Wk ð^
lk1 Þ 0 Forde J, Nelson P (2000) Applications of Sturm sequences to
Wk ðHk Þ ¼ Vk V1 : (22) bifurcation analysis of delay differential equation models.
0 W k ð^
lk2 Þ k J Math Anal Appl 300:273–284
Hale JK (1977) Theory of functional differential equations.
Spring, New York
Finally, Sk is computed corresponding to Wk from Krall AM (1965) Stability criteria for feedback systems with
Eq. 20 and summated to be the solution to the systems a time lag. SIAM J Control 2:160–170
of DDEs Eq. 7 as Kuang Y (1993) Delay differential equations with applications
to population biology. Academic, Boston
Pontryagin LS (1955) On the zeros of some elementary
X
1 transcendental functions. Am Math Soc Transl 2:95–110
xðtÞ ¼ eSk t Ck (23) Stepan G (1989) Retarded dynamical systems: stability and
k¼1 characteristic functions. Longman Scientific and Technical,
Burnt Mill
Yi S, Ulsoy AG, Nelson PW (2007) Delay differential equa-
where the Ck is a 2  1 coefficient matrix computed tions via the matrix Lambert W function and bifurcation
from a given preshape function x(t) ¼ g(t), which analysis: application to machine tool chatter. Math Biosci
is initial state of DDEs Eq. 7, for t 2 [T, 0] Eng 4:1–12
(Yi et al. 2007).
Each branch of the Lambert W function can be
computed analytically as shown in Corless et al.
1996), and one of the merits of the matrix Lambert Dynamics Modeling
W function approach is that one can compute all of the
branches of the function using commands already ▶ Lymphocyte Dynamics and Repertoires, Modeling
D 642 Dysplasia

uniformity and polarity, and may either stop prolifer-


Dysplasia ating after cessation of the stimulus that evoked the
growth or progress to neoplasia.
Barbara J. Davis
Section of Pathology, Tufts Cummings School of
Veterinary Medicine Biomedical Sciences, Cross-References
North Grafton, MA, USA
▶ Cancer Pathology

Definition

Dysplasia is an excessive, disorderly grown tissue in


which cells “abnormally increase in number,” lose
E

Early Embryonic Oscillator Characteristics

▶ Cell Cycle of Early Frog Embryos The European Bioinformatics Institute, an outstation
of the European Molecular Biology Laboratory
(EMBL; http://www.embl.org), was created in 1992
to respond to the requirements for organizing data
EBI Genome Resources from sequencing efforts, notably those originating
from the genome projects (Brooksbank et al. 2003).
Akos Dobay1 and Maria Pamela Dobay2 Located on the Welcome Trust Genome Campus in
1
Institute of Evolutionary Biology and Environmental Hinxton near Cambridge (UK), the EBI web portal
Studies (IEU), University of Zurich, Zurich, constitutes the European node of the INSD consortium
Switzerland (Brooksbank et al. 2003). Other members are the DDBJ
2
Department of Physics, Ludwig-Maximilians and the NCBI. Information on new entries and updates
University, Munich, Germany are exchanged between these databases on a daily basis.
The EBI provides access to more than 60 databases
containing high-quality annotated information and raw
Synonyms
data that range from nucleotide sequences, whole-
genome projects, to protein sequences, protein struc-
EMBL genome database
tures, and protein interactions. The data can be browsed
and analyzed using various tools hosted on the site, and
Definition that are also available for download. The EBI constitutes
the primary repository in Europe for sequence data.
The European Bioinformatics Institute (EBI; http://
www.ebi.ac.uk) is the main European repository of DNA and RNA Sequence Databases Available
nucleotide sequence data. EBI exchanges the collected from EBI
data with the DNA Data Bank of Japan (DDBJ; http:// The main nucleotide sequence database is the European
www.ddbj.nig.ac.jp) and the National Center for Bio- Nucleotide Archive (ENA). ENA is built from several
technology Information Database (NCBI; http://www. databases: the EMBL Nucleotide Sequence Database
ncbi.nlm.nih.gov) on a daily basis. The three databases (Kulikova et al. 2007) or EMBL-Bank created in early
are part of the International Nucleotide Sequence 1980; the Sequence Read Archive (SRA), a repository
Database (INSD; http://www.insdc.org) consortium, set up in 2009 for raw data obtained from next-
whose function is to ensure the integrity of the shared generation sequencing platforms; and the European
information. Apart from the genome resources, EBI Trace Archive (ETA), composed of raw data from elec-
also has resources for protein sequences and structures, trophoresis-based sequencing machines. The EBI also
as well as access to sequences from patents. provides access to whole genome databases across the

W. Dubitzky et al. (eds.), Encyclopedia of Systems Biology, DOI 10.1007/978-1-4419-9863-7,


# Springer Science+Business Media LLC 2013
E 644 EBI Genome Resources

phylum Chordata through the project Ensembl (http:// Microarray Experiment (MIAME) recommendations
www.ensembl.org). Ensembl was launched in 1999 of the international Microarray Gene Expression Data
as a joint collaboration between EBI and the (MGED; http://www.mged.org) Society. The EBI also
Welcome Trust Sanger Institute (Flicek et al. 2011). provides access to patent data resources. These resources
Ensembl Genomes (http://www.ensemblgenomes.org) include the biology-related abstracts of patent applica-
is an extension of the Ensembl project to non-vertebrate tions, chemical compounds available from the Chemical
species such as bacteria, protists, fungi, plants, and Entities of Biological Interest (ChEBI; http://www.ebi.
metazoa. ac.uk/chebi), protein sequences of the European Patent
Office (EPO; http://www.epo.org), the Japan Patent
Protein Sequence Databases and Functional Office (JPO; http://www.jpo.go.jp), the Korean Intellec-
Information Available from EBI tual Property Office (KIPO; http://www.kipo.go.kr)
The Universal Protein Database (UniProt; http://www. and the United States Patent and Trademark Office
uniprot.org), a consortium created in 2002, is a joint (USPTO; http://www.uspto.gov), as well as patent
effort between the EBI, the Swiss Institute of Bioin- equivalents.
formatics (SIB; http://www.isb-sib.ch), and the Protein
Information Resource (PIR; http://pir.georgetown.edu,
The UniProt Consortium 2008). UniProt is comprised
Cross-References
of six different services:
1. The UniProt Knowledgebase (UniProtKB) consists
▶ DDBJ Genome Resources
of the two sections: the UniProtKB/Swiss-Prot for
▶ Metagenomics
curated protein information and the UniProtKB/
▶ NCBI BioProject Genome Resources
TrEMBL for automated annotation.
2. The UniProt Reference Clusters (UniProt/UniRef)
databases, which contain clustered sets of sequences
from the UniProtKB. References
3. The UniProt Archive, a nonredundant protein data-
Brooksbank C, Camon E, Harris MA, Magrane M,
base. Sequence retrieval is done through a unique
Martin MJ, Mulder N, O’Donovan C, Parkinson H,
protein identifier (UPI). Tuli MA, Apweiler R, Birney E, Brazma A, Henrick K,
4. The UniProt Metagenomic and Environmental Lopez R, Stoesser G, Stoehr P, Cameron G (2003) The
Sequences (UniProt/UniMES) database, which is European Bioinformatics Institute’s data resources. Nucl
Acids Res 31:43–50
a repository specifically developed for
Flicek P, Amode MR, Barrell D, Beal K, Brent S, Chen Y,
▶ Metagenomics. Clapham P, Coates G, Fairley S, Fitzgerald S, Gordon L,
5. The UniProtKB-GOA, which provides high-quality Hendrix M, Hourlier T, Johnson N, K€ah€ari A, Keefe D,
gene ontology (GO; http://www.geneontology.org) Keenan S, Kinsella R, Kokocinski F, Kulesha E, Larsson P,
Longden I, McLaren W, Overduin B, Pritchard B, Riat HS,
annotations to proteins in UniProtKB/Swiss-Prot
Rios D, Ritchie GRS, Ruffier M, Schuster M, Sobral D,
and UniProtKB/TrEMBL. Spudich G, Tang AY, Trevanion S, Vandrovcova J,
6. The UniProtKB Sequence/Annotation Version Vilella AJ, White S, Wilder SP, Zadissa A, Zamora J,
Archive (UniSave), which is a repository of Aken BL, Birney E, Cunningham F, Dunham I, Durbin R,
Fernández-Suarez XM, Herrero J, Hubbard TJP, Parker A,
UniProtKB/Swiss-Prot and UniProtKB/TrEMBL Proctor G, Vogel J, Searle SMJ (2011) Ensembl 2011. Nucl
entry versions. InterPro, an integrated documenta- Acids Res 39:D800–D806
tion resource used for the classification and auto- Kulikova T, Akhtar R, Aldebert P, Althorpe N, Andersson M,
matic annotation of proteins and genomes, can be Baldwin A, Bates K, Bhattacharyya S, Bower L, Browne P,
Castro M, Cochrane G, Duggan K, Eberhardt R, Faruque N,
used to add in-depth annotation, including GO terms,
Hoad G, Kanz K, Lee C, Leinonen R, Lin Q, Lombard V,
to protein signatures in UniSave. Lopez R, Lorenc D, McWilliam H, Mukherjee G,
Nardone F, Pastor MPG, Plaister S, Sobhany S, Stoehr P,
Other Databases Available from EBI Vaughan R, Wu D, Zhu W, Apweiler R (2007) EMBL
nucleotide sequence database in 2006. Nucl Acids Res 35:
The ArrayExpress Archive is a database of microarray
D16–D20
data for gene expression and functional genomics in The UniProt Consortium (2008) The universal protein resource
accordance with the Minimum Information About a (UniProt). Nucl Acids Res 36:D190–D195
Ecological Modeling 645 E
content in DNA. Defining organisms in relation to the
ECM embedding context, i.e., their environment including
history, appears as easier than in relation to their constit-
▶ Extracellular Matrix uents. No reconstruction of life from nonliving building
blocks is yet possible. These features also set the stage
for viewing living systems as open with respect to their
Ecological Explanation environment. Since the times of Uexk€ull (Buchanan
2009), the question has been posed but not settled to
▶ Explanation, Functional what extent relationships between organisms and their
environment are appropriately described by mechanisms
or whether the role of life constitutes in turn another E
Ecological Modeling modeling relationship. Hence, proper definitions of
organism, environment, and modeling may turn out to
Michael Hauhs be recursive.
Ecological Modeling, University of Bayreuth, Environmental models (and environmental model-
Bayreuth, Germany ing) can be regarded as a field closely related to eco-
logical modeling. An environment can only be defined
relative to a system or interface of which it serves as
Definition environment. In environmental models, the key role of
organisms is relaxed for characterizing the potential
Ecology studies organisms in their relationship to their behavior of the environment. In this case, abiotic sys-
living and nonliving environment. This relationship can tems serve as a reference for the environment.
either be conceptualized as a function of some internal
structure or as characteristic behavior at the respective
external interface. Models can be used to formalize these Characteristics
options. There are several scientific versions of the
notion of models as exemplified by the usages in math- Ecological Modeling Classically
ematics, physics, or engineering. Nothing has to be Environmental modeling was envisioned by John von
a model and anything can be a model (Mahr 2009), Neumann as a key application area when computing
even an organism, e.g., ▶ model organism. This defini- machinery was introduced. He argued that computing
tion deals with models of organisms or ecosystems. The would be critical in areas such as meteorology where
distinction between conceptual and computational results are relevant for society; they solve numerically
models has been useful. In the following context of a so far unsolvable problem and hence are capable of
▶ artificial life (AL), the discussion will be restricted to demonstrating the power of computing to a wide audi-
models which can be expressed or implemented ence (McGuffie and Henderson-Sellers 2005).
on a computer. Hence, in ecological modeling, the The subsequent success of computer models in opera-
empirical phenomena derived from scientific study as tional weather forecasting (prediction) has aptly demon-
in ecosystem science (on the one hand) or in land-use strated his vision. Computer models transformed weather
traditions as in agriculture, forestry, or nature conserva- prediction from an art into scientifically based technology.
tion (on the other hand) are linked with the theoretical In this area, ▶ process-based models could be shown to
results and engineering potential of computer science. outperform empirical models (McGuffie and Henderson-
In other words, the definition above does not restrict Sellers 2005). The success of environmental models has
ecological modeling neither to a subfield of ecology been used as a blueprint for other modeling approaches,
nor to a subfield of computer science. especially those in the field of ecological modeling or
Organism is the central notion of biology and also climate models (Edwards 2010).
ecology. Characterizing and expressing its key features It is often assumed that computer models can
require biological language and notions. One such key become equally important when studying the phenom-
feature is that all life on earth is embedded into one ena of life. However, this puts the above definition of
single evolution history as indicated by the shared ecological modeling into a dilemma: On the one hand,
E 646 Ecological Modeling

many ecologists regard ecosystems and living systems an interface, e.g., between an ecosystem and its abiotic
almost by definition as ▶ complex systems. Also, the environment. Computers have been routinely used as
atmosphere might be perceived as complex, but unlike (reductionistic) generators of virtual (living) behavior.
large scale geo-systems, the complexity of life has no However, it has not been widely recognized that com-
formalization and appears as such on all scales. Fea- puter science also offers theoretical tools studying the
tures of living systems and especially behavior such as “design of behavior.” Theoretical computer science
reproduction or evolution are often described as the could accordingly be summarized by a corresponding
result of ▶ emergence or interaction (Oyama et al. slogan “Computers, as they could be.”
2000), a behavior towards the environment. Further,
symptoms of unsettled theoretical and conceptual Ecological Modeling for Artificial Life
issues in biology can be found under the notion of An area where such a link seems natural, but has also
holism; even the notion and meaning of causality is not been fully established so far, is Artificial Life
still contested (Laland et al. 2012). research. (▶ Artificial Life) has constituted itself with
On the other hand, each computer program runs on the slogan “Life as it could be” (Langton 1989), in the
a mathematical machine, and any ecological models sense of, could be artificially generated. Models in AL
implement implicitly a formal approach to ecology, and have been focused at the reductionist approach in
any ▶ artificial life model implements a formal approach which living behavior is sought to be generated artifi-
to biology. This implied theory can be seen as a hypothet- cially from abstract building blocks. The commentary
ical core to a theory about life and can be interpreted as of the results as “something is missing” by Brooks
reductionism. The goal of reductionist ecological and (2001) still largely holds. AL seems to share with
much of AL modeling is to follow the successful exam- ecology the strange relationship between methodolog-
ples set by environmental modeling above. This, however, ical rigor and practical relevance. Using this analogy
implies that ecological and AL models attempt serving would imply focusing in a parallel manner on attempts
holism and reductionism at the same time. “formalizing holism,” i.e., studying life as virtual
Models in ecology appear as either practically (rather than artificial) behavior with methods from
(empirically) relevant or as methodological rigorous computer science. So far, however, there has been little
but rarely both at the same time (Peters 1991). This contact in the research agendas of AL and ecological
leaves the field of ecological modeling with two oppo- modeling as indicated by the few respective entries in
site points of orientation: mainly empirical with AL or ecological modeling conferences.
a pragmatic focus on usefulness or mainly theoreti- Rosen proposed to formalize modeling relations
cal/speculative while remaining untestable. between organisms and environment in categorical lan-
A promising modeling approach to the ability of guage (Rosen 1991). In the language of (▶ Mathematical
organisms of behaving strategically and deciding due Structure, Category) theory, there are two modeling par-
to an actual environment is ▶ agent based modeling. adigms available which encompass the above dilemma:
This modeling technique is a new rapidly expanding Firstly, a functional modeling paradigm which has
field not only of ecological modeling (Grimm and been adopted from physics and which provides
Railsbeck 2005) but also in other fields such as sociol- a constructive interpretation to the slogan: “Life as it
ogy or economy. It is, however, still characterized could be artificially and symbolically constructed on
by an unsettled relationship to theory (O’Sullivan a computer.” It derives the environmental relationship
and Haklay 2000) and a strained relationship with of life as a function of state. Secondly, there is an inter-
empirical studies (Clifford 2007). active modeling paradigm which characterizes life at
Within the life sciences, ecological modeling has the interface with its environment as emergent behavior.
been perceived as an applied science. The theoretical It can be adopted from computer science and provides
background of the models being applied stems from a classificatory interpretation to the slogan: “Life as
physics. In such a reductionist perspective, the rela- it could behave virtually on a computer.” In the related
tionship of an organism or ecosystem with an environ- field of computational intelligence, the latter approach
ment is modeled as a function of state (e.g., implied by has become famous by the so-called Turing test.
a natural law). In a holistic perspective, the relation- A correspondent behavioral classification of life has
ship can be modeled as emergent behavior observed at been proposed (Cronin et al. 2006).
Edge Betweenness Centrality 647 E
Since the behavior-based definition of intelligence Mahr B (2009) Die informatik und die logik der modelle.
by A. Turing, theoretical computer science has come Informatik Spektrum 32(3):228–249
McGuffie K, Henderson-Sellers A (2005) A climate modelling
up with a unifying approach of formalizing behavior of primer. Wiley, London
systems (Rutten 2000). Categorically, this approach O’Sullivan D, Haklay M (2000) Agent-based models and
can be abstracted as (▶ Coalgebra). The two modeling individualism: is the world agent-based? Environ Plann
approaches above are categorical duals of each other. A 32:1409–1425
Oyama S, Griffiths PE, Gray RD (2000) Cycles of contingency –
However, only one of them has dominated the fields developmental systems and evolution. MIT Press,
of AL and ecological modeling so far, i.e., the former Cambridge
physical one. The second approach exists as a theoret- Peters RH (1991) A critique for ecology. University Press,
ical possibility: In this perspective, the relationships of Cambridge
living system to their context and how they behave
Rosen R (1991) Life itself – a comprehensive inquire into the E
nature, origin, and fabrication of life. Columbia University
towards their environment become the defining fea- Press, New York
tures of life. The corresponding states observable at Rutten JJMM (2000) Universal coalgebra: a theory of systems.
interfaces are the mere (epistemic) signatures of this Theor Comp Sci 249(1):3–80
behavior rather than their material causes.

Edge Betweenness
Cross-References ▶ Edge Betweenness Centrality

▶ Agent-Based Modeling
▶ Artificial Life Edge Betweenness Centrality
▶ Coalgebra
▶ Complex System Long Jason Lu1 and Minlu Zhang2
▶ Emergence 1
Division of Biomedical Informatics, Cincinnati
▶ Mathematical Structure, Category Children’s Hospital Research Foundation, Cincinnati,
▶ Model Organism OH, USA
▶ Process-based Model 2
Department of Computer Science, University of
Cincinnati, Cincinnati, OH, USA

References
Synonyms
Brooks R (2001) The relationship between matter and life.
Nature 409:409–411 Edge betweenness
Buchanan B (2009) Onto-ethologies: the animal environments
of Uexkull, Heidegger, Merleau-Ponty, and Deleuze. State
University of New York Press, New York
Clifford NJ (2007) Models in geography revisited. Geoforum Definition
39:675–686
Cronin L, Krasnoger N, Davis BD, Alexander C, Robertson N,
The edge betweenness centrality is defined as the num-
Steinke JHG, Schroeder SLM, Cooper G, Gardner PM,
Kholbystov AN, Siepmann P, Whitaker BJ, Marsh D (2006) ber of the shortest paths that go through an edge in
The imitation game – a computational chemical approach to a graph or network (Girvan and Newman 2002). Each
recognizing life. Nat Biotechnol 24(10):4 edge in the network can be associated with an
Edwards PN (2010) A vast machine: computer models, climate
edge betweenness centrality value. An edge with
data, and the politics of global warming. MIT Press,
Cambridge a high edge betweenness centrality score represents
Grimm V, Railsbeck SF (2005) Individual-based modeling and a bridge-like connector between two parts of a net-
ecology. Princeton University Press, Princeton work, and the removal of which may affect the com-
Laland KN, Sterelny K, Odling-Smee J, Hoppitt W, Uller T (2012)
munication between many pairs of nodes through the
Cause and effect in biology revisited: is Mayr’s proximate-
ultimate dichotomy still useful? Science 334:1512–1516 shortest paths between them. Figure 1 illustrates an
Langton C (1989) Artificial life. Addison-Wesley, Reading example of eight nodes in a network, and the red
E 648 Edge Ontology

interactions between biological molecules. Different


types of relationships between pathway components
can be represented by edge ontology. Lu et al. (2007)
provide a prototype of an edge ontology that divides
edge into four levels: direction, type, subtype, and
specification. Edge ontology is quite a novel concept
still far from being completed.

Edge Betweenness Centrality, Fig. 1 An edge with a high


edge betweenness centrality value. In the network, the red edge References
between two red nodes has the highest betweenness centrality
score among all edges Lu LJ, Sboner A, Huang YJ, Lu HX, Gianoulis TA, Yip KY, Kim
PM, Montelione GT, Gerstein MB. Comparing classical
pathways and modern networks: towards the development
of an edge ontology. Trends Biochem Sci. 2007;32(7):
edge between two red nodes has a high edge between-
320–331.
ness centrality value of 16. The removal of this edge
will result in a partition of the network into two densely
connected subnetworks.
Efficiency
Cross-References Olga Vitek
Department of Statistics, Department of Computer
▶ Biological Applications of Network Modules
Science, Purdue University, West Lafayette, IN, USA
▶ Modules in Networks, Algorithms and Methods

Definition
References
Efficiency is a property of experimental design. It is
Girvan M, Newman ME (2002) Community structure in social
and biological networks. Proc Natl Acad Sci USA inversly proportional to the variance of the data-
99(12):7821–7826 derived estimates of the unknowns. Smaller variation
leads to higher efficiency.

Edge Ontology Cross-References

Jiguang Wang ▶ Designing Experiments for Sound Statistical


Beijing Institute of Genomics, Chinese Academy of Inference
Sciences, Beijing, Beijing, China

Synonyms EFM

Arrow ontology ▶ Elementary Mode

Definition
Eigen Decomposition
Inspired by ▶ gene ontology, edge ontology provides
a hierarchical vocabulary of terms for describing ▶ Principal Component Analysis (PCA)
Electronic Health Record Systems 649 E
by a given linear transformation, the prefix “eigen”
Eigenvalue is used, as in eigenfunction, eigenmode, eigenface,
eigenstate, and eigenfrequency. Eigenvalues and eigen-
Tianshou Zhou vectors have many applications in both pure and applied
School of Mathematics and Computational Sciences, mathematics. They are used in matrix factorization, in
Sun Yet-Sen University, Guangzhou, quantum mechanics, and in many other areas.
Guangdong, China

Definition Elasticity Coefficient


E
The eigenvectors of a square matrix are the nonzero Emma Saavedra and Rafael Moreno-Sánchez
vectors that, after being multiplied by the matrix, either Department of Biochemistry, National Institute for
remain proportional to the original vector (i.e., change Cardiology “Ignacio Chávez”, Mexico City, Mexico
only in magnitude, not in direction) or become zero.
For each eigenvector, the corresponding eigenvalue is
the factor by which the eigenvector changes when Definition
multiplied by the matrix. The prefix eigen- is adopted
from the German word “eigen” for “own” in the sense It is a quantitative value of the ability of a pathway
of a characteristic description. The eigenvectors are enzyme (or group of enzymes) to change its rate or
sometimes also called characteristic vectors. Similarly, activity when changes in its ligands (substrates, prod-
the eigenvalues are also known as characteristic ucts, activators, inhibitors) are varied during the path-
values. way function. It is written as eaiX where a is the rate or
The mathematical expression of this idea is as activity of a pathway enzyme i and X is any of its
follows: if A is a square matrix, a nonzero vector u is ligands. The elasticities are positive for ligands that
an eigenvector of A if there is a scalar l (lambda) increase the enzyme rate whereas they are negative for
such that: those that decrease it. The elasticity coefficients are
intrinsic properties of the enzymes since they only
Au ¼ lu depend on its particular kinetic capabilities.

The scalar l is said to be the eigenvalue of A


corresponding to u. An eigenspace of A is the set of Cross-References
all eigenvectors with the same eigenvalue together
with the zero vector. However, the zero vector is not ▶ Metabolic Control Theory
an eigenvector.
These ideas often are extended to more general
situations, where scalars are elements of any filed,
vectors are elements of any vector space, and linear Electronic Health Record Systems
transformations may or may not be represented by
matrix multiplication. For example, instead of real Jesus Bisbal
numbers, scalars may be complex numbers; instead Departament de Tecnologies de la Informació i les
of arrows, vectors may be functions or frequencies; Comunicacions, Universitat Pompeu Fabra,
instead of matrix multiplication, linear transformations Barcelona, Spain
may be operators such as the derivative from calculus.
These are only a few of countless examples where
eigenvectors and eigenvalues are important. Definition
In such cases, the concept of direction loses its ordi-
nary meaning, and is given an abstract definition. An electronic health record system is the software
Even so, if that abstract direction is unchanged system which provides the functionality needed to
E 650 Electronic Health Records

create, manage, and exploit the information contained Characteristics


in a set of ▶ electronic health records Bisbal and Berry
2011; Dick et al. 1997. Scope
The scope of an electronic health record refers to the
range of information it contains. Ideally, it should
provide a longitudinal, lifelong “cradle to grave”
References
record of all kinds of information about an individual
Bisbal J, Berry D (2011) An analysis framework for (Grimson 2001). Typically, however, this is limited to
electronic health record systems: interoperation and noncontiguous periods of life and includes information
collaboration in shared healthcare. Methods Inf Med like demographics, medical history, medication and
50(2):180–189
allergies, immunization status, laboratory test results,
Dick RS, Steen EB, Detmer DE (eds) (1997) The computer-
based patient record: an essential technology for health and radiology images.
care, Revised editionth edn. Institute of Medicine, National Advances in clinical medicine and biomedical
Academy Press research impact on clinical practice. This, in turn,
increases the expectations of caregivers on the scope
of information to be included within an electronic
health record. For example, genetic data is not yet
Electronic Health Records commonly included in most systems, but it may be
a requirement in the near future if therapies are rou-
Jesus Bisbal tinely based on such information.
Departament de Tecnologies de la Informació i les The adoption of electronic health records is a long and
Comunicacions, Universitat Pompeu Fabra, costly process, both at the technological as well as at the
Barcelona, Spain organizational levels, and there is still no sufficient sci-
entific evidence that this process is indeed cost-effective
(Chaudhry et al. 2006; Price Water House Coopers 2007)
from the point of view of the health-care providers.
Synonyms
Electronic Health Records Organization
Computer-based patient records (CPR); Electronic
Health care generates large volumes of information,
healthcare records (EHCR); Electronic medical
which must be appropriately structured and accessed to
records (EMR); Electronic patient records (EPR)
ensure its efficient exploitation during care delivery.
There are several different ways in which the contents
of an electronic health record can be organized (Bisbal
Definition and Berry 2011). A very common approach in existing
systems is the episode-based EHR, in which informa-
An electronic health record (EHR) is the digital tion is indexed along the time line according to the
representation of health-related information about interactions between a patient and the care provider(s).
an individual. It provides a unified view over all This organization emphasizes the administrative side
relevant information, irrespective of where and of patient information.
how this data is created or managed. Its primary Alternatively, information can be structured
purpose is to improve the delivery of health according to the problem-oriented EHR paradigm
care, including, but not limited to, increased (Weed 1968), in which patient information is indexed
efficiency and reduction of human errors (Kohn by the list of clinical “problems” suffered by the sub-
et al. 2000). ject. Such problems can include, for example, risk
It must be differentiated from an ▶ electronic health factors, symptoms, signs, and fully diagnosed clinical
record system, which is the software system which conditions. This provides a clinically-oriented view of
provides the functionality needed to create, manage, patient information, often preferred by caregivers and
and exploit the information contained in a set of commonly available in primary care, but not yet
electronic health records. widely used in secondary/tertiary care.
Electronic Health Records 651 E
Finally, more flexible approaches acknowledge the difficulties, however. It must address syntactic and
complexity of the health-care domain and the variety semantic integration issues, just like any other approach
of user types that interact with an electronic health (except for the unlikely case of a green field site). It may
record system. No a priori record structure is assumed also suffer from performance limitations, due to repeated
in this case; thus the organization of the information in queries to remote database servers and data integration
the record is defined by each caregiver, according to (transformation) processes that must be performed “on
his/her specific needs. For this reason, it has sometimes the fly.”
been referred to as neutral EHR organization (Bisbal There is a middle ground between the consolidated and
and Berry 2011). This approach is technologically federated approaches, referred to as materialized EHR, in
more challenging to realize. It can be achieved through which all or part of the EHR data may be materialized in
the so-called two-level modeling paradigm (Garde a local data repository (Hull and Zhou 1996). E
et al. 2007). Its first level, the reference model, is
a predefined set of very abstract classes and aggrega- Interoperability of Electronic Health Records
tion rules that provide the flexibility to model An Institute of Medicine’s 2001 report (IOM 2001)
any information concept. Its second level, the identified four stages of health organization evolution
▶ Archetypes, add semantics and constraints to the in the progression from autonomous modes of working
potential instances of the reference model, in order to to fully coordinated shared care with a high degree of
ensure that the desired semantics are captured by the specialization and expertise as follows:
clinical concepts defined by those archetypes. • Stage 1: highly fragmented practice with individuals
functioning autonomously and little specialization.
Electronic Health Records Construction • Stage 2: referral networks of loosely structured
Several mechanisms have been devised to create the multidisciplinary teams.
“unified view” of all health-related information • Stage 3: more patient-centered team-based care but
expected to be provided by an electronic health record focusing primarily on needs and intentions of health-
(Bisbal and Berry 2011). The first and technically most care professionals with some decision support but
feasible approach, referred as consolidated EHR, stores little integration of health information.
all this data in a repository considered part of the EHR • Stage 4: shared multidisciplinary care, evidence-
system itself. It is the approach most commonly used in based and patient-centric practice with strong ser-
practice, due to its conceptual simplicity and because vice coordination between practices and with good
its development relies on sound database methodologies. quality practices and performance measures.
It suffers, however, from a significant limitation. It The latest stage of this evolution characterizes the
assumes that the EHR system’s own data repository health care which is hoped to be provided in most indus-
will be the only source from which data is delivered to trialized countries. Patient care is shared among several
the caregiver. However, clinical decisions are commonly independent caregivers, and a health condition com-
taken based on multisource information (e.g., centralized monly requires collaboration between several clinicians.
medical history, plus departmental specialized investiga- The complete realization of this shared health-care
tions), and the electronic health-care record should scenario clearly requires integrating all the relevant infor-
ideally unify access to all these sources. mation (Dick et al. 1997), potentially from several inde-
As an alternative, it is also possible to provide the pendent institutions. In this context, interoperability
so-called federated EHR (Grimson et al. 1998). This between electronic health records systems becomes
approach provides a true view of the underlying data, essential. This refers to the ability to exchange informa-
querying the underlying data sources which store all tion between systems (Noy 2004). Several levels of inter-
required information, without requiring any changes to operability can be defined (Bisbal and Berry 2011).
existing system (Sheth and Larson 1990). Thus, it lever- Functional interoperability, for example, requires that
ages the investment made in legacy systems (Bisbal et al. communicating systems agree on the exact content, struc-
1999), in contrast to the consolidated approach. Another ture, and meaning of the information to be exchanged.
advantage of the federated approach is that data is never Remarkable efforts like the HL7 industry standard have
out-of-date, as it is at all times taken from its original data initially pursued this approach. However, without
source. The approach is not without its technical a precise definition of the semantics of this information,
E 652 Electronic Healthcare Records (EHCR)

interoperability between systems cannot be guaranteed Grimson J, Grimson W, Berry D, Stephens G, Felton E, Kalra D
(Iakovidis et al. 2007). In that respect, the combination of et al (1998) A CORBA-based integration of distributed elec-
tronic healthcare records using the synapses approach. IEEE
standardized terminologies (e.g., ▶ ontologies) together Trans Inf Technol Biomed 2(3):124–138
with domain concept definitions (i.e., ▶ archetypes, as Grimson J (2001) Delivering the electronic healthcare record for
described above) will raise the level of interoperability. the 21st century. Int J Med Inform 64:111–127
This leads to semantic interoperability (▶ Semantic Garde S, Hovenga E, Buck J, Knaup P (2007) Expressing clinical
data sets with openehr archetypes: a solid basis for ubiquitous
Web, Interoperability), which ensures that the informa- computing. Int J Med Inform 76(3):334–341
tion being exchanged is computer-processable. For that Hull R, Zhou G (1996) A framework for supporting data inte-
reason, international efforts (Eichelberg et al. 2005), like gration using the materialized and virtual approaches. ACM
CEN’s 13606 and HL7’s RIMv3, aim to standardize how SIGMOD Record 25(2):481–492
Iakovidis I, Dogac A, Purcarea O, Comyn G, Laleci GB
domain concepts (i.e., ▶ Archetypes, ▶ Templates) (2007) Interoperability of eHealth systems – selection of
should be defined in order to achieve this level of recent EU’s research programme developments. In: Proceed-
interoperability. ings of the international conference eHealth: combining
health telematics, telemedicine, biomedical engineering and
bioinformatics to the edge, Berlin
Secondary Uses of Electronic Health Records IOM (2001) Commitee on quality of healthcare in america
In addition to improving health care delivery, electronic institute of medicine. Crossing the quality chasm: A new
health records also hold the promise of accelerating the health system for the 21st century. National Academic Press
outcomes and efficiency of biomedical research. The Kohn LT, Corrigan JM, Donaldson MS (eds) (2000) To err is
human: building a safer health system. National Academic
aggregation of vast amounts of patient-level information Press, Washington, DC
can, for example, be the basis upon which to test models/ Kohl P, Noble D (2009) Systems biology and the virtual physi-
theories about disease mechanisms and treatment out- ological human. Mol Syst Biol 5:292
comes, as is done in systems biology and the virtual Noy NF (2004) Semantic integration: a survey of ontology-based
approaches. ACM SIGMOD Record 33(4):65–70, Special
physiological human (Kohl and Noble 2009), or decrease section on semantic integration
patient recruitment cycle times for clinical trials. Price Water House Coopers (2007) The economics of IT and
The realization of this secondary use requires that hospital performance. Price-Water House Coopers’ Health
electronic health records are in wide spread use. In Research Institute, Boston
Sheth AP, Larson JA (1990) Federated database systems for
addition, the level of ▶ interoperability between these managing distributed, heterogeneous, and autonomous data-
systems needs to be sufficiently high as to facilitate the bases. ACM Comput Surv 22(3):183–236, Special issue on
integration of related sources, in order to efficiently heterogeneous databases
exploit this wealth of information. Weed LL (1968) Medical records that guide and teach. New Eng
J Med 278(11): 593–599 and 278(12):652–657. Published
also in MD Comput 10(2):100–114, March–April 1993

References

Bisbal J, Lawless D, Wu B, Grimson J (1999) Legacy informa-


Electronic Healthcare Records (EHCR)
tion systems: issues and directions. IEEE Software
16(5):103–111 ▶ Electronic Health Records
Bisbal J, Berry D (2011) An analysis framework for
electronic health record systems: interoperation and
collaboration in shared healthcare. Methods Inf Med
50(2):180–189
Chaudhry B, Wand J, Wu S, Maglione M et al (2006) Systematic Electronic Medical Records (EMR)
review: impact of health information technology on quality,
efficiency, and costs of medical care. Ann Intern Med
▶ Electronic Health Records
144(10):742–752
Dick RS, Steen EB, Detmer DE (eds) (1997) The computer-based
patient record: an essential technology for health care
(revised edition). Institute of Medicine, National Academy,
Washington, DC
Eichelberg M, Aden T, Riesmeier J, Dogac A, Laleci GB
Electronic Patient Records (EPR)
(2005) A survey and analysis of electronic healthcare record
standards. ACM Comput Surv 37(4):277–315 ▶ Electronic Health Records
Embryonic Induction 653 E
Elementary Flux Mode ELMO

▶ Elementary Mode ▶ Elementary Mode

Elongation Complex
Elementary Mode
▶ Transcription Elongation Complex
Sang Yup Lee E
Department of Chemical and Biomolecular
Engineering and Department of Bio and Brain
Engineering, Korea Advanced Institute of Science and Elongation Factor
Technology (KAIST), Daejeon, Korea
▶ Transcription Elongation Factor

Synonyms

EFM; Elementary flux mode; ELMO


Elongation Phase of Translation

▶ Translation Elongation

Definition

Elementary modes are a minimal set of pathways and EMBL Genome Database
all possible steady-state flux distributions that the net-
work inherently can achieve (Schuster et al. 2002). ▶ EBI Genome Resources
This set of vectors can be calculated from the
stoichiometric matrix of a biochemical network
using convex analysis, and there exists a unique
set of elementary modes for a given network. Embryological Explanation
1. Each elementary mode is a minimal set of pathways
that consists of the minimum number of reactions ▶ Explanation, Developmental
necessary to exist as a functional unit. This property
is called “non-decomposability” or “genetic inde-
pendence.” If any reaction in an elementary mode is
eliminated, the resulting elementary mode cannot Embryonic Induction
operate as a functional unit.
2. The elementary modes are the set of all routes Philippe Huneman
through a given network corresponding with Institut d’Histoire et de Philosophie (IHPST),
property. des Sciences et des Techniques, Université Paris 1
Panthéon-Sorbonne, Paris, France

References Definition
Schuster S, Hilgetag C, Woods JH, Fell DA (2002) Reaction
routes in biochemical reaction systems: algebraic properties,
Before the rise of genetics, Hans Spemann in 1901
validated calculation procedure and example from nucleotide already emphasized the role of inducers at the tissular
metabolism. J Math Biol 45(2):153–181 scale, a role which can be played by many chemical
E 654 Emergence

substances, as Joseph Needham recognized later (Order Jacobson A, Sater A (1988) Features of embryonic induction.
and life), leading to the “paradox of an apparently Development 104:341–359
Kuroda H, Wessely O, De Robertis E (2004) Neural induction in
nonspecific stimulus eliciting a specific developmental Xenopus: requirement for Ectodermal and Endomesodermal
response” (Jacobson and Sater 1988, p. 341). Inducers signals via Chordin, Noggin, b-Catenin, and Cerberus.
trigger the development of sets of cells, morphogenetic PlosBio 2(5):623–634
fields, likely to develop into specific cell fates. The
neural crest, for example, is a transitory structure which
among other things gives rise to the nervous system.
Morphogenetic fields can until some stage be induced
into other fates by proper induction as it has been shown Emergence
by Spemann and Mangold’s sea urchin experiments.
This proves the plasticity proper to early development. Ulrich Krohs
The experimental investigation of the nature of Department of Philosophy, University of M€unster,
inducers as well as the relation between morphogenetic M€unster, Germany
fields and organizers has been a main endeavor
of embryology in the pre-molecular biology era. Cur-
rent developmental theory investigates the genetic Definition
mechanisms of induction in terms of transcription
factors, etc., and the developmental genes, such as Emergence is the appearance of novel properties,
homeogenes, underpinning morphogenetic fields structures, patterns, and processes in a system, which
(De Robertis et al. 1991). are absent from the isolated components of the system.
Inducers cannot induce anything except in Concepts of emergence vary in demanding the novel
a “competent” tissue, that is, a tissue which is already features also being unpredictable from knowledge of
prepared to deal with them and respond to them. This has the components and their possible interactions (weak
been again shown about the induction of the central emergence), or being irreducible (▶ Reduction) to
nervous system, which actually involves more requisites properties of the components (strong emergence).
than the sole induction by a specific center in the meso- Just novelty, without unpredictability and irreducibil-
derm – as one usually thought – but also signals ity, is sometimes called nominal emergence.
expressed in the ectoderm already from the blastula
stage, without which neural induction is inefficient Characteristics
(Kuroda et al. 2004). Therefore, developmental pro-
cesses require both induction and “competence” Taxonomy and History
(as a capacity to respond to specific signals) (Dawid The concept of emergence captures the view that
2004), which is a modern version of the complementarity the whole is more than the sum of its parts. When
between epigenesis (inducers) and preformism (compe- G.H. Lewes introduced the term in 1875, it was linked
tence) as two poles in developmental explanations. to the view of irreducibility of the systems level to the
level of components. This metaphysical view of emer-
gence was most influentially put forward by Broad
(1923) and, in a less strict version, by Alexander
Cross-References
(1920) (see McLaughlin 1992). Today, the concept
comes differentiated into several more specific ver-
▶ Explanation, Developmental
sions (see Bedau 2003), not all of which require irre-
ducibility. The concept of emergence has thus lost its
strong metaphysical implications and is used today
References even within strictly physicalist frameworks.

Dawid I (2004) Organizing the vertebrate embryo – a balance of


Nominal Emergence
induction and competence. PlosBio 2(5):0579–0583
De Robertis E, Morita E, Cho K (1991) Gradient fields and Nominal emergence requires novelty, as any other
homeobox genes. Development 112:669–678 concept of emergence, but neither unpredictability nor
Emergence 655 E
unexplainability or irreducibility. A systemic entity is Strong or Metaphysic Emergence
novel if does not occur – is not instantiated without – Strong emergence, in contrast, is an ontological con-
a system of a particular sort. For example, a stripe pattern cept, which is meant to describe ontological or con-
of a fabric emerges nominally, since no smallest ceptual irreducibility of an emergent property to
component of the fabric, even if colored, can be properties and configuration of components. The case
striped. Similarly, an electric dipole displays elec- was made for the emergence of the mind, or of mental
trical properties that no simple carrier of an ele- properties, on complex neuronal structures (Broad
mentary charge can have. So the property of being 1923). It is all but clear both, whether mental proper-
a dipole emerges nominally in a system of spatially ties do emerge as claimed, and, if so, whether they are
separated charges. Notwithstanding, we can easily irreducible to neuronal properties.
predict the nominally emergent property from The concept of strong emergence conceptualizes E
more basic knowledge about the system and its the idea that a complex system brings entities into
components. being which are irreducible not only for the reason of
The concept of nominal emergence is the least demand- the actual or in-principle limitedness of the human
ing member of this family of concepts, but consequently, it mind, but because of their categorical difference to
captures the least interesting cases of emergence. anything which can possibly be linked theoretically
to the components of the system. This case was what
Weak or Epistemic Emergence the British emergentists had in mind when introducing
The concept of weak emergence takes emergence the concept. The human mind was thought to emerge
as unpredictability or unexplainability of higher- from neuronal structures, being of a quality that is
level properties. It is thus an epistemological con- irreducible to any interplay of material entities.
cept. In this perspective, emergence is theory rela-
tive, because predictability of a systemic property The Dynamical Criterion for Emergence
obviously depends on the theories available for A fourth concept of emergence avoids the spookiness
prediction, as does explainability. What is emergent of strong, metaphysical emergence without making the
today, may no longer be emergent tomorrow, given contingent status of theory building at a certain time
that a theory becomes available that allows for the criterion for emergence. This concept relies on
prediction or explanation of the property in ques- a dynamical criterion for emergence. According to this
tion. Regulatory properties of a feedback loop were concept, any behavior that is either the result of
weakly emergent when first observed. Nowadays, a bifurcation where a structural instability leads to
cybernetics and systems theory allow predicting, a shift in dynamical form, or that establishes a new system
in principle, such loops and their systemic effects. level (Hooker 2011). The latter is the case, e.g., in phase
So, there was a state of knowledge where feedback transitions to the solid state. The former occurs, e.g., in
loops were weakly emergent, while it is nowadays flocking behavior and in a bifurcation that establishes
merely nominally emergent. a limit cycle from a singularity. The concept of dynamical
Weak emergence is scientifically and metaphysi- emergence thus relies on novelty alone and avoids any
cally unproblematic. It makes no presupposition claim about unexplainability or unpredictability, while
about anything falling in principle out of the scope of avoiding at the same time focusing on the more or less
scientific explainability. It is rather the manifestation trivial cases of nominal emergence.
of the limitation of our knowledge at any given time.
The concept may be strengthened by claiming that Emergence and Causality
unpredictability and unexplainability in a particular case Unfortunately, emergence is often tried to be
do hold for principle reasons and will not vanish with any cashed out in terms of causality. The higher-level
possible future development of theorizing. Theory rela- property is said to cause the systems components to
tivity is thus replaced by absolute unpredictability or show certain behavior. This view of so-called down-
unexplainability. However, it is unclear how such in- ward causation (▶ Interlevel Causation) is flawed by
principle unexplainability could be proven. Conse- a major misconception of what a causal relation is. Let
quently, the absolute, not theory-relative concept of us take first the opposite view: components making up
weak emergence may not have any application. a system. The behavior of the system depends on, and
E 656 Enabling Conditions

is perhaps determined by, the behavior of the compo- with theory reduction. Nothing with regard to the
nents. Despite this determination, the relation is not emergent dynamics or level needs to remain
a causal one. It is constitutional instead: The compo- unexplained, and no other than low-level entities are
nents in their relation to each other constitute the assumed to make up the system.
system. Their behavior is at the same time part of the
behavior of the system. To test this intuition, we may
adopt a view of causality as transfer of energy or other Cross-References
conserved parameters. We see immediately that the
components cannot transfer energy to the system of ▶ Causality
which they are proper parts, since their energy is ▶ Complexity
already included in the energy of the system. So the ▶ Mechanism
relation is not causal. Causal processes are diachronic, ▶ Reduction
constitution is synchronic. “Upward causality” is syn- ▶ Supervenience
chronic and thus constitution rather than diachronic
causality. Analogously, a causal relation of “down-
ward causation” does not exist, though obviously References
a part behaves differently whether it is isolated or
integrated in a system. However, the system does not Alexander S (1920) Space, time, and deitiy. Macmillan, London
Bedau MA (2003) Downward causation and autonomy in weak
transfer energy to its parts; instead, parts exchange
emergence. Princ Rev Int Epistemol 6:5–50
energy with other parts. The system merely constrains Broad CD (1923) Scientific thought. Humanities Press,
what the parts can do – and even this is not the precise New York
description of what is going on. In fact, the interacting Hooker C (2011) Conceptualizing reduction, emergence and
self-organization in complex dynamical systems. In:
parts constrain each other – which is most conveniently
Hooker C (ed) Philosophy of complex systems. Elsevier,
described as systems behavior. ‘Consisting of parts’ Amsterdam, pp 195–222
and ‘being constituted by’ are synchronic relations, McLaughlin B (1992) The rise and fall of British emergentism.
while the influence between the components is In: Beckermann A, Flohr H, Kim J (eds) Emergence or
reduction? Essays in the prospects of nonnaturalist physical-
diachronic. Thus, only the latter is a causal relation.
ism. de Gruyter, Berlin, pp 49–93
Emergence is not a diachronic, causal, but a synchronic, Stephan A (1997) Armchair arguments against emergentism.
constitutional relation. This does not rule out that Erkenntnis 46:305–314
emergent properties may have causal effects, e.g., on
other high-level systems (Stephan 1997).

Emergence and Reduction Enabling Conditions


Whether emergence and reduction are opposites or
even necessarily linked to each other depends on the Marie I. Kaiser and Andreas H€uttemann
concepts or emergence and reduction used. Weak or Department of Philosophy, University of Cologne,
epistemic emergence obviously excludes theory reduc- Cologne, Germany
tion, since it is defined exactly via irreducibility.
On the other hand, it is perfectly compatible with
ontologic reduction, namely, the thesis that nothing Synonyms
consists of more than its proper parts. In contrast,
strong emergence in its metaphysical understanding Manifestation conditions; stimulus conditions
is opposed to and thus incompatible with ontologic
reduction. It is also incompatible with theory reduc- Definition
tion. The claim is that there are higher-level entities
and consequently high-level concepts which cannot be Enabling conditions are those conditions, which
translated into the concepts of the lower level. Emer- are – according to the simple conditional analysis
gence according to the dynamical criterion, finally, (SCA) – necessary and sufficient for the occurrence
matches perfectly well with ontological as well as of the manifestation (Choi and Fara 2012).
Endoreplication 657 E
Cross-References components of the DNA replication machinery have
to be loaded onto the replication origins. This licensing
▶ Disposition step can only be accomplished when the CDK activity
is low (i.e., G1 phase of the cell cycle). The licensed
origins can initiate replication during the next step
References
when CDK activity is increasing and promotes repli-
Choi S, Fara M (2012) Dispositions. In: Zalta EN (ed) The cation by phosphorylation. An important consequence
Stanford Encyclopedia of Philosophy (Spring 2012 Edition), of the increasing CDK activity is the inhibition of
http://plato.stanford.edu/archives/spr2012/entries/dispositions/ a new round of licensing. Therefore, CDK activity
inhibits the first step (licensing), but activates the sec-
ond step (initiation) in replication control. Accord- E
ingly, the replication origin can only be relicensed in
Endoreplication the subsequent cell cycle after the CDK activity
dropped to zero. The CDK activity drop occurs at the
Orsolya Kapuy and Béla Novák end of the mitosis and this allows one and only one
Department of Biochemistry, Oxford Centre for DNA replication per cell cycle (Morgan 2006).
Integrative Systems Biology, University of Oxford, Eukaryotic cells employ two different CDKs to drive
Oxford, UK their mitotic cycles. The S-phase and M-phase CDKs
peak during DNA replication and mitosis, respectively.
Both CDKs inhibit the relicensing of the replication
Definitions origin and this inhibition avoids over-replication under
normal conditions. Over-replication means that the cell
Endoreduplication: A type of over-replication. More breaks the rule of alternating DNA replication and mito-
than one copy of DNA is created without intervening sis, and initiates new replication without undergoing
chromosome segregation. This mechanism causes new mitosis. This can happen in the absence of M-phase
rounds of complete S phases without mitosis and each CDK, when the periodic oscillation of S-phase CDK is
replication is a discrete event. Replication origins are able to drive full rounds of complete S phases. This type
not activated more than once in each S phase. The of over-replication is called endoreduplication. Another
DNA content is increasing by doubling (2N, 4N, 8N, possible way for DNA over-replication is when the
etc.) (Arias and Walter 2007; Porter 2008). replication origins fire more than once per S phase
Re-replication: A type of over-replication. Replica- thereby breaking the “once and only once DNA synthe-
tion origins undergo more than one initiation event per sis” rule. This type of over-replication is called re-rep-
S phase leading to increasing DNA content. There are lication. Large increase in the DNA content with some
overlapping replications resulting onion-like struc- exceptions is lethal for the cell (Morgan 2006; Arias and
tures with intermediate N values (e.g., between 2N Walter 2007; Porter 2008).
and 4N) (Arias and Walter 2007; Porter 2008). The molecular requirements of over-replication
mechanisms were first characterized in fission yeast by
the Nurse’s lab. Therefore, in this entry we are focusing
Characteristics on some well-studied examples of endoreduplication and
re-replication in fission yeast. The Cdc2/Cdc13 and
One of the most important roles of the mitotic cell the Cdc2/Cig2 complexes represent the mitotic and the
cycle machinery is to maintain the alternation of S-phase CDKs in this organism. Both complexes are
S and M phases through generations. The cell has to inhibited by the Rum1 stoichiometric inhibitor in G1
duplicate its full set of chromosomes, and then the two phase and both Cdc13 and Cig2 levels are downregulated
copies have to be distributed equally to the daughter by APC/Slp1 at the end of mitosis. The APC/Srw1
cells at mitosis. DNA replication is controlled by promotes Cdc13 degradation in G1 phase of the cell
a two-step mechanism to make sure that one and cycle. The G2/M transition is controlled by both Wee1
only one S phase occurs per each cell cycle. The two kinase and Cdc25 phosphatase regulating the inhibitory
separated steps are the following: first, essential Tyr-phosphorylation of Cdc2/Cdc13. The synthesis of
E 658 Endoreplication

Endoreplication, Fig. 1 The


Cdc18
main regulatory interactions in
the fission yeast cell cycle
regulatory network controlling
DNA replication and mitosis. MBF Rum1
Cdc25
Arrows and blocked end lines
represent stimulation of
synthesis/activation and
degradation/inhibition,
respectively Cdc2 / Cdc2 /
Cig2 Cdc13 Wee1

APC / APC /
Srw1 Slp1

S-phase cyclin is regulated by a transcription factor while its inhibitor Rum1 is high which allows licensing
complex (MBF), which also assists the activation of the of replication origins in G1. The increasing level of
licensing factor, Cdc18, in fission yeast. The degradation Cdc2/Cig2 complex downregulates Rum1 after
of Cdc18 is promoted by CDK-dependent phosphoryla- reaching a threshold and S phase is initiated. However,
tion. Both Cdc2/Cdc13 and Cdc2/Cig2 control the activ- the high Cig2 level cannot be maintained because
ity of their regulators through feedback loops (Fig. 1). Cdc2/Cig2 turns off its own transcription factor
Depletion of Cdc10 (an MBF subunit) blocks the cell in (MBF). This effect results in a decrease of Cig2 level
G1 phase, while cdc18D enters into mitosis without in G2 phase and makes possible the re-accumulation of
DNA replication resulting in a “cut” phenotype Rum1 in the absence of Cdc2/Cdc13. The rise of
(Nishitani and Nurse 1995; Sveiczer et al. 2004; Morgan Rum1/Cig2 ratio resets the cells back to G1 where
2006). relicensing of origins can take place. The boundary
between G2 and G1 phases are experimentally not
Endoreduplication defined yet, but multiple rounds of DNA replication
The first example for endoreduplication in fission yeast are well separated (Hayles et al. 1994; Novak and
came during studying Cdc2 kinase, the catalytic subunit Tyson 1997; Kiang et al. 2009). Cdc13D cells undergo
of both S- and M-phase CDKs (Broek et al. 1991). periodic relicensing shown by cdc13Dcdc10ts and
Cdc2ts mutants blocked in post-replicative G2 phase of cdc13Drum1D double mutants. Cdc13D cells are
the cycle were nitrogen starved and heat-shocked (49 C) arrested in G1 after depletion of the MBF component,
for a short time in order to increase protein degradation. because the synthesis of both the licensing factor and
When cells were released from the cdc2ts block, they Cig2 is blocked. In contrast, Rum1 depletion stops the
were re-replicating their DNA rather than entering into endoreduplication in G2. The Cdc13DCig2D double
mitosis. This suggested that cells “forgot” their cell mutant cannot re-replicate either because Cig2 is
cycle stage as a consequence of these treatments. Later required to initiate S phase (Hayles et al. 1994;
experiments have shown that the “memory” molecule Novak and Tyson 1997; Kiang et al. 2009).
lost is the Cdc2/Cdc13 complex responsible for mitotic Overexpression of the CDK inhibitor (Rum1) leads
initiation and progression. When the Cdc13 expression to a phenotype similar to cdc13D (Moreno and Nurse
was turned off by germinating spores deleted for cdc13 1994). After upregulation of Rum1 level, the first G1
(cdc13D), cells underwent through multiple round of phase is much longer than in wild-type, because the
DNA replications (Hayles et al. 1994). stoichiometric inhibitor has to be eliminated before the
In the absence of Cdc2/Cdc13, cells cannot enter cell can enter into S phase. Rum1 inhibits both Cdc2/
into mitosis but the activity of the S-phase CDK (Cdc2/ Cig2 and Cdc2/Cdc13 complexes, but it is a weaker
Cig2) oscillates and drives G1-S-G2-G1 endoredu- inhibitor for the S-phase CDK than for the Cdc2/
plication cycles. The Cig2 cyclin level is low in G1, Cdc13. Therefore, the high Rum1 level creates
Endoreplication 659 E
a Endoreduplication b Re-replication
cdc25ts block & release with cdc13D cdc25ts block & release with cdc18OP
1.2
1.4
Cdc2 / Cdc13, Cdc13T, Rum1

Cdc2 / Cdc13, Cdc13T, Rum1


1.0
1.2

0.8 1.0

0.8
0.6
0.6
0.4
0.4
0.2
0.2 E
0.0 4 0.0 100
1.4 M G1 S G2 G1 S G2 G1 S M G1 S
2.0
1.2 80
3
Cdc2 / Cig2, Cig2T

Cdc2 / Cig2, Cig2T


1.0 1.5
60

Cdc18

Cdc18
0.8
2
1.0
0.6 40

0.4 1 0.5 20
0.2

0.0 0 0.0 0
0 100 200 300 400 0 100 200 300 400
time (min) time (min)

Endoreplication, Fig. 2 Time course results of two different alternating without mitosis resulting in the proper doubling of
types of over-replication. The relative activity/level of the main the DNA content. (b) Re-replication. Fission yeast cells are
regulatory components are plotted on the “y” axis. The particular blocked and released from cdc25ts then Cdc18 licensing factor
cell cycle phases are indicated on the figure. (a) Endoredu- was overexpressed. The cells enter S phase and have
plication. Fission yeast cells are blocked and released from a continuous DNA replication without any relicensing period
cdc25ts then Cdc13 was deleted from the G1-S-G2 phases are

a complete block of mitosis while Cdc2/Cig2 can be and over again. This continuous DNA replication does
activated periodically and triggers S phases (Moreno not require any Cdc10 function (cdc18OPcdc10ts double
and Nurse 1994; Novak and Tyson 1997). mutant re-replicates) suggesting that cells are not pass-
ing through Start during re-replication. Even more
Re-replication Cdc18-driven over-replication does not depend on any
Overexpression of the Cdc18 licensing factor results in further protein synthesis (Cdc18 overexpression
another type of over-replication called re-replication. followed by cycloheximide addition does not block re-
Cdc18 plays an important role in the initiation of DNA replication) (Nishitani and Nurse 1995).
replication and its protein level is peaking during
S phase. Overproduction of Cdc18 protein is able to Controlling Over-Replication in Higher Eukaryotes
spoil the control of origin firing. Upregulation of Higher eukaryotes have some extra pathways which
Cdc18 level by thiamine repressible nmt1 promoter have to work coordinately to block more than one
blocks mitosis and cells show enlarged nucleus DNA replication per cycle. A protein called geminin
(Fig. 2b). Cdc18-induced over-replication is clearly dif- and the Cul4-Ddb1Cdt2 pathway can suppress over-
ferent than endoreduplication. The FACS profiles are replication by inhibiting the licensing factors. Emi1
much more smeared referring to incomplete rounds of directly can regulate APC activity negatively. How-
DNA replications or multiple replication initiations at ever, the essential element of the system is again CDK,
origins rather than G1-S-G2-G1 type oscillation. The and its level clearly defines that the cell could go to
large amount of Cdc18 keeps the cell in the S phase and M phase or should be blocked in an over-replication
promotes the re-initiation of the replication origins over state (Arias and Walter 2007; Porter 2008).
E 660 Enhancer

Cross-References transcription levels of specific genes. An enhancer can


be located either upstream or downstream to the
▶ Cdk Inhibitors transcriptional start site, and does not need to be
▶ Cell Cycle Checkpoints immediately next to the genes it regulates.
▶ Cell Cycle of Mammalian Cells
▶ Cell Cycle, Fission Yeast
▶ Cell Cycle Modeling, Differential Equation
▶ Cell Cycle, Physiology Ensembl
▶ Cyclins and Cyclin-Dependent Kinases
▶ DNA Replication Jingky Lozano-K€uhne
▶ Mitosis Department of Public Health, University of Oxford,
Oxford, UK
References

Arias EE, Walter JC (2007) Strength in numbers: preventing Definition


rereplication via multiple mechanisms in eukaryotic cells.
Genes Dev 21:497–518 A database that provides genomic information on
Broek D, Bartlett R, Crawford K, Nurse P (1991) Involvement of
p34cdc2 in establishing the dependency of S phase on mito-
human genome, other vertebrates, and model organ-
sis. Nature 349:388–393 isms (Hubbard et al. 2002).
Hayles J, Fisher D, Woollard A, Nurse P (1994) Temporal order of
S phase and mitosis in fission yeast is determined by the state of
the p34cdc2-mitotic B cyclin complex. Cell 78:813–822
Kiang L, Heichinger C, Watt S, B€ahler J, Nurse P (2009)
References
Cyclin-dependent kinase inhibits reinitiation of a normal
S-phase program during G2 in fission yeast. Mol Cell Biol Hubbard T, Barker D, Birney E, Cameron G, Chen Y, Clark L,
29:4025–4032 Cox T, Cuff J, Curwen V, Down T, Durbin R, Eyras E,
Moreno S, Nurse P (1994) Regulation of progression through the G1 Gilbert J, Hammond M, Huminiecki L, Kasprzyk A,
phase of the cell cycle by the rum1+ gene. Nature 367:236–242 Lehvaslaiho H, Lijnzaad P, Melsopp C, Mongin E,
Morgan D (2006) The cell cycle – principles of control. Oxford Pettett R, Pocock M, Potter S, Rust A, Schmidt E, Searle S,
University Press, UK Slater G, Smith J, Spooner W, Stabenau A, Stalker J,
Nishitani H, Nurse P (1995) p65cdc18 plays a major role control- Stupka E, Ureta-Vidal A, Vastrik I, Clamp M (2002)
ling the initiation of DNA replication in fission yeast. Cell Ensembl genome database project. Nucleic Acids Res
83:397–405 30(1):38–41
Novak B, Tyson JJ (1997) Modeling the control of DNA replica-
tion in fission yeast. Proc Natl Acad Sci USA 94:9147–9152
Porter AC (2008) Preventing DNA over-replication: a Cdk per-
spective. Cell Div 3:3
Sveiczer A, Tyson JJ, Novak B (2004) Modelling the fission
Ensemble
yeast cell cycle. Brief Funct Genomic Proteomic 2:298–307
Celine Vens
Department of Computer Science, Katholieke
Universiteit Leuven, Leuven, Belgium
Enhancer

Jianhua Ruan Synonyms


Department of Computer Science, University of Texas
at San Antonio, San Antonio, TX, USA Committee; Multiple classifier system

Definition Definition

An enhancer is a short segment of DNA that can be An ensemble is a collection of learning models
bound by proteins (transcription activators) to enhance (▶ Model Validation, Machine Learning), whose
Ensemble 661 E
Ensemble, Fig. 1 Schematic
representation of ensembles Combination

Model Model ... Model Model

Dataset

outputs are combined to form the output of the attribute (in the case of prediction tasks), by injecting E
ensemble. Ensembles can be used for several randomness into the base learning algorithms, or
machine learning tasks, such as predicting class labels by a combination of several of these techniques.
(▶ Classification), predicting numeric outcomes Many different ensemble learning techniques have
(▶ Regression), or ▶ clustering. Therefore, the indi- been proposed. The most well-known include
vidual models can be combined in several ways. ▶ bagging (Breiman 1996) and boosting (Freund and
Shapire 1997), which both manipulate the training
examples.
Characteristics Ensemble classifiers have been applied in several
bioinformatics domains, such as protein folding (Shen
An ensemble learning algorithm constructs a set of base and Chou 2006), cancer classification (Tan and Gilbert
learners and produces an output by combining the out- 2003), and gene function prediction (Guan et al. 2008).
puts of these base learners, by Fig. 1. While ensembles
have been proposed for unsupervised learning Cross-References
(▶ Learning, Unsupervised) tasks, such as ▶ clustering,
they are especially popular for supervised learning ▶ Bagging
(▶ Learning, Supervised) problems, where individual ▶ Classification
predictions can be combined by taking a (weighted or ▶ Clustering
unweighted) vote for ▶ classification tasks, and by ▶ Learning, Supervised
taking the average in the case of ▶ regression. ▶ Learning, Unsupervised
Ensemble techniques are popular prediction methods ▶ Model Testing, Machine Learning
because of their excellent generalization performance. ▶ Regression
A necessary and sufficient condition for an ensemble of
classifiers to be more accurate than any of its individual
References
members is that the classifiers are accurate and diverse
(Hansen and Salamon 1990). An accurate classifier does Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140
better than random guessing on new examples. Two Dietterich TG (1997) Machine-learning research. AI Magazine
classifiers are diverse if they make different errors on 18(4):97–136
Freund T, Shapire RE (1997) A decision-theoretic generalization
new data points. The underlying principle is that these
of online learning and an application to boosting. J Comput
uncorrelated errors get corrected by the voting or aver- Syst Sci 55(1):119–139
aging procedure. Dietterich (1997) gives a detailed Guan Y, Myers CL, Hess DC, Barutcuoglu Z, Caudy AA,
explanation why ensembles can improve the predictive Troyanskaya OG (2008) Predicting gene function in a hier-
archical context with an ensemble of classifiers. Genome
performance of their base classifiers.
Biol 9(Suppl 1):S3
Ensembles can be constructed either by using Hansen L, Salamon P (1990) Neural network ensembles. IEEE
a different learning algorithm in each iteration, or by Trans Patt Anal Mach Intell 12:993–1001
using the same algorithm, but training it in different Shen HB, Chou KC (2006) Ensemble classifier for protein fold
pattern recognition. Bioinformatics 22(14):1717–1722
ways. The latter can be accomplished by introducing
Tan AC, Gilbert D (2003) Ensemble machine learning on gene
some diversity, e.g., by manipulating the training set, expression data for cancer classification. Appl Bioinform
by manipulating the descriptive attributes or the target 2(3):S75–S83
E 662 Entity Disambiguation

Cross-References
Entity Disambiguation
▶ Named Entity Recognition
▶ Term Disambiguation, Text Mining ▶ Text Mining
▶ Text Mining, Tools
▶ Word Sense Disambiguation

Entity Identification
Entrez
▶ Entity Mention Normalization
Jingky Lozano-K€uhne
Department of Public Health, University of Oxford,
Oxford, UK

Entity Mention Normalization


Definition
J€
org Hakenberg
Department of Computer Science and Department of Entrez is the integrated, text-based search and retrieval
Biomedical Informatics, Arizona State University, system used at the National Center for Biotechnology
Tempe, AZ, USA Information for the major databases, including PubMed,
Nucleotide and Protein Sequences, Protein Structures,
Synonyms Complete Genomes, Taxonomy, and others (NCBI 2011).

Entity identification; Grounding


References

National Center for Biotechnology Information (2011) Data-


Definition bases. World Wide Web. http://www.ncbi.nlm.nih.gov/Data-
base/. Accessed 24 May 2011
In ▶ text mining and data integration, entity mention
normalization refers to the process of mapping
a (biomedical) ▶ named entity, such as a gene, disease,
or species, to a unique identifier provided by an author- Entrez Genome Project
ity. For genes, such authorities are NCBI’s EntrezGene
(http://www.ncbi.nlm.nih.gov/sites/entrez/?db¼gene), ▶ NCBI BioProject Genome Resources
HUGO for human genes (http://www.genenames.org/),
MGI for murine genes (http://www.informatics.jax.org/
); for diseases, identifiers could map to UMLS concepts
(http://www.nlm.nih.gov/research/umls/). Entropy
Challenges for entity mention normalization
arise from synonymity, where an entity can be Daniel Polani
referred to by many names; and in particular from Adaptive Systems Research Group, School of
homonymity, where one and the same name can Computer Science, University of Hertfordshire,
refer to multiple entities. As an example, “Fas” Hatfield, UK
can refer to the “TNF receptor superfamily, mem-
ber 6” gene, as well as to “fatty acid synthase.” In
addition, entity mention normalization has to map Synonyms
the gene to the correct organism, as both are known
for various species. Entropy, Information Theory
Environment 663 E
Definition other outcome, and thus the entropy, that is, uncer-
tainty is maximal.
Uncertainty (or Entropy, in the sense of Shannon’s Maximum Entropy Principle: For an experiment
information theory) is a quantitative measure for where there is no prior knowledge or expectation on the
the uncertainty of the outcome of a random experiment outcome, one assumes a probability of maximal uncer-
modeled by a probabilistic variable. Given a probabi- tainty, that is, maximum entropy. If nothing is known, one
listic variable X with outcome set X and probability p, therefore will assume equiprobable probabilities on the
the entropy H(X) of X is defined as: outcomes. This generalizes to the case when there is prior
knowledge. In that case, one seeks a probability distribu-
X
HðXÞ :¼  pðxÞ log pðxÞ; (1) tion that achieves the maximum possible entropy value
x2X while respecting this prior knowledge (Jaynes 1957). This E
constitutes the least biased prior on the probability distri-
with the convention that 0 log 0  0. The logarithm is bution that is consistent with the prior knowledge.
often taken with basis 2, and the outcome is then
expressed in bits.
References
Extreme Cases: The entropy is never negative. It is
0 if X has only one outcome x0 , that is, p(x0 ) ¼ 1 for Jaynes ET (1957) Information theory and statistical mechanics.
exactly the outcome x0 , and p(x) ¼ 0 for any other Phys Rev 106(4):620–630
outcome x. The entropy H(X) reaches its maximal
value of log | X| (with | X| the number of possible out-
comes) if all possible outcomes x have the equal proba-
bility pðxÞ ¼ jX1 j . Entropy, Information Theory
Entropy of Joint Variables: For joint probabilistic
variables X and Y, the entropy is given by ▶ Entropy

X
HðX; Y Þ ¼  pðx; yÞlog pðx; yÞ
ðx;yÞ2XY
Entropy Reduction
Conditional Entropy: For joint probabilistic vari-
▶ Information
ables X and Y, one defines the conditional entropy of
Y given a particular outcome x by:
X
H ðYjX ¼ xÞ ¼  pðyjxÞlog pðyjxÞ: (2) Entry into Mitosis
y2Y

▶ Cell Cycle Transitions, G2/M


From this, one defines the conditional entropy of Y
given X as the average over all conditional entropies
given particular outcomes x:
X
Environment
HðYjXÞ ¼ pðxÞH ðYjX ¼ xÞ: (3)
x2X C. Vyvyan Howard
Centre for Biomedical Sciences, University of Ulster,
Characteristics: The entropy measures the uncer- Coleraine, UK
tainty about the outcome of the experiment or obser-
vation X in advance. In the special case of only one
outcome, there is no uncertainty about the outcome, Definition
and thus the entropy vanishes. In the other special case,
all outcomes are equally probable. This corresponds to The environment can be defined as the totality of the
the case that no outcome can be a priori favored to any biological and nonbiological influences that act upon
E 664 Environmental Genome

an organism, a population of organisms, or an ecolog-


ical system and that can influence survival. Biological Enzyme Databases
factors include the organisms, their food, and their
interactions. Nonbiological factors can be divided ▶ Metabolic Networks, Databases
into physical and chemical and can act in combina-
tions. Purely physical influences include all electro-
magnetic radiations, temperature, and meteorology.
Purely chemical influences include soil constitution,
water and air quality, and chemical pollution. Organ- Enzyme-Ligand Interaction Databases
isms can respond to environmental changes by evolu-
tionary adaptations. ▶ Specialized Metabolic Component Databases

Cross-References

▶ Cancer and Environmental Influences Epifluorescence Microscope

▶ Fluorescence Microscopy

Environmental Genome

▶ Metagenome
Epifluorescence Microscopy

Xiaohua Wu
Department of Pediatrics, Herman B Wells Center for
Environmental Genomics Pediatric Research, Indiana University School of
Medicine, Indianapolis, IN, USA
▶ Metagenomics

Definition

Epifluorescence microscopy is a method of fluores-


Environmental Metabolomics cence microscopy that is widely used in life sciences.
The excitatory light is passed from above (or, for
▶ Metametabolomics inverted microscopes, from below), through the objec-
tive lens and then onto the specimen instead of passing
it first through the specimen.

Environmental Proteomics
Cross-References
▶ Metaproteomics
▶ Spectroscopy and Spectromicroscopy

Environmental Transcriptomics Epigenetic Modification

▶ Metatranscriptomics ▶ Post-Replication Modification


Epigenetics 665 E
through cell division. Epigenetic markings can be
Epigenetic Regulation considered as signatures for the transcriptional state
of the gene.
Yufei Huang
Picower Institute for Learning and Memory, Characteristics
Massachusetts Institute of Technology, Cambridge,
MA, USA Epigenetics operates through ▶ post-replication mod-
Greehey Children’s Cancer Research Institute, ification of DNA and the ▶ post-translational modifi-
University of Texas at Texas Health Science Center at cation of ▶ chromatin proteins. Unlike DNA sequence
San Antonio, San Antonio, TX, USA which can vary between individuals, epigenetic
Department of Epidemiology and Biostatistics, variations can be detected not only between individuals, E
University of Texas at Texas Health Science Center at but also between tissues and as a function of time in terms
San Antonio, San Antonio, TX, USA of development and aging of an organism. Epigenetics
provides an interface between genetic constitution
(genotype) of an organism and the environment.
Definition DNA in the nucleus of eukaryotic cells is packed
compactly through interaction with specific nuclear
Epigenetic regulation is the inherited regulation of proteins called ▶ histones, leading to the formation of
phenotype or gene expression caused by nongenetic the basic units of chromatin called ▶ nucleosomes
mechanisms other than underlying DNA sequence. (Fig. 1). Chromatin is the template for transcription
and the consequent interaction of proteins that regulate
Cross-References gene expression. Though nucleosomes are present all
along the ▶ genome, the level of compaction differs
▶ Gene Regulation between regions of the genome. This brings about
variation in accessibility of the genome to regulators
of transcription. The more compact regions of chro-
Epigenetics matin are less transcriptionally active and they consti-
tute ▶ heterochromatin; while the more open and
Vani Brahmachari and Shruti Jain active regions form the ▶ euchromatin. The epigenetic
Dr. B. R. Ambedkar Center for Biomedical Research, marking can be either on DNA or on the histones.
University of Delhi, Delhi, India
Epigenetic Marks on DNA
The DNA generally consists of the four bases adenine
Synonyms (A), guanine (G), cytosine (C), and thymine (T). These
can be modified by covalent linkage of methyl groups
Gene regulation at specific positions within their ring structure, after
replication of DNA during cell division. The most
Definition common modification seen in the eukaryotic cells is
the methylation of C at the fifth carbon, leading to the
The term Epigenetics was coined by C. H. Waddington formation of 5-methyl cytosine (Fig. 2). The cytosine
in 1942. Ptashne and Gann (2002) defined it as “. . .a residues with guanine as the 30 neighbor, in the dinu-
change in the state of expression of a gene that does not cleotide CpG are methylated. This methylation status
involve a mutation, but that is nevertheless inherited in can be detected by ▶ methylation-sensitive restriction
the absence of the signal (or event) that initiated the endonucleases or by direct sequencing following
change.” The Greek prefix epi meaning “on the top of” ▶ bisulfite conversion (Frommer et al. 1992). Such
implies that the epigenetic features override the effect analysis has revealed tissue-specific methylation pat-
of the nucleotide sequence of the gene. Epigenetics terns, the unmethylated DNA being associated with
impacts gene expression without changing the nucleo- actively transcribing genes and methylated DNA with
tide sequence of the gene, which can be maintained silenced genes. The unmethylated CG sites, in regions
E 666 Epigenetics

Epigenetics,
Fig. 1 Diagrammatic
representation of packaging of Chromosome*
DNA in the nucleus. Histone H1
Chromatin as shown
represents the “Bead on Nucleus Histone H2A
Cell
a string” structure. *The
chromosome shown is the Histone H2B
structure chromosomes
assume during cell division Chromatin Histone H3
and are referred to as
metaphase chromosomes. The Histone H4
tail of histone H4 is marked, as Nucleosome
shown, and similar extensions
called histone tails are present
in all the histone monomers Histone Octamer + DNA
and these are most frequently
modified by acetylation/
methylation/phosphorylation
as described in the text
Histone tail

DNA

Histone Octamer

NH2 methyltransferase) resulting in conserving the DNA meth-


H3C ylation pattern on the genome through cell division. How-
N ever, demethylation of the DNA can occur only as
a consequence of failure to methylate the C residues in
N O CG dinucleotides on the newly synthesized DNA strand.
H Presently there is no direct evidence for the existence of
Epigenetics, Fig. 2 Structure of 5-methyl cytosine. The enzymes that can remove methyl group from cytosine
methyl group added after replication is marked residue in the DNA (DNA demethylase).

known as CpG islands are observed in constitutively Epigenetic Marks on Chromatin Proteins
expressing genes. The methylated cytosine inhibits The ▶ histone proteins in the chromatin structure are
the trans-acting factors from binding to the regulatory also important targets of epigenetic modifications.
regions producing a closed chromatin structure. Alter- There are organisms like the yeast (Saccharomyces
natively, an inhibitory protein like methylated cytosine cerevisiae) and the fruitfly (Drosophila melanogaster)
binding protein 2 (▶ MeCP2) can also bind to methyl- where DNA methylation has not been detected so far.
ated DNA to produce closed chromatin structure and However histone modifications are known in almost all
hence, regulate the expression of gene (Bogdanović and eukaryotes. Histone modifications are carried out after
Veenstra 2009). Therefore, undermethylation is translation of the protein; hence they are post-transla-
associated with active transcription of the gene. The tional modifications (Latchman 2005). Based on the
palindromic nature of CG dinucleotide ensures the main- three-dimensional structure of histones, the core and
tenance of methylation after DNA replication, thus, the N-terminal tails are distinguished from each other
satisfying the “memory” component of the epigenetic (Fig. 1). Post-translational modifications are known
mechanism. As the replication fork moves, the newly to occur both on the core and the N-terminal tails.
synthesized DNA strand, carrying hemi-methylated Most often the lysine and arginine residues in the
CpG dinucleotides, is methylated by DNMT (DNA histones undergo covalent modifications like
Epigenetics 667 E
P Ac Ub

Histone H2A: NH3-S G RG KQ G G KA RA... AV L LPK KT E S H H K


1 5 119 P-PO43-
Ac Ac Ac Ub Ac-CH3CO

Me-CH3
Histone H2B: NH3-PEP V K S A P V P K K G SK K A I N K...VK Y T SS K
5 12 15 120 Ub-Ubiquitin, a 76
amino acid protein.
Me Me Me Me
Me Me
P Me P E
P
Me Me
Ac Ac Ac Ac Ac Ac

Histone H3: NH3-AR TK QTA R KS T GG K A P R K QL A S K A A R K S A . . . GV K K . . . E F K T D


2 3 4 9 10 14 17 18 23 26 27 28 36 79

Ac
P Me Ac Ac Ac Me

Histone H4: NH3-S G R G K G GK GL GKG G A K R H R K V L R D N I QGI T


1 3 5 8 12 16 20

Epigenetics, Fig. 3 Potential sites of post-translational modi- Peterson 2003). The amino acids most frequently modified are
fications of histone tails of nucleosomes. The alphabets after - S-serine, K-lysine, E-glutamic acid, R-arginine, and T-tyrosine.
NH3 are the single letter symbols for amino acids. The chemical The numbers below the amino acids indicate their position in the
nature of the modifications is shown. Ac acetylation, Me meth- amino acid sequence of the respective histone
ylation, P phosphorylation, Ub ubiquitination (Jaskelioff and

acetylation, methylation, phosphorylation, One of the mechanisms is that the methylated-CpG


ubiquitination, and sumolyation (Fig. 3). These post- binds to specific proteins resulting in a compact
translational modifications in turn can be activating or closed chromatin structure occluding the DNA from
silencing marks, which flag the recruitment of different transcription factors, and thus repressing gene expres-
trans-acting regulatory factors that generally occur as sion and 5-methyl cytosine binding protein 2 (MeCP2) is
protein complexes. Apart from post-translational mod- one such protein as described earlier. These proteins can
ifications, variant histones like H2A.Z and H3.3 coded recruit various other histone-modifying enzymes like
by different genes in the genome are also known to be ▶ histone deacetylases and other complexes. Therefore,
associated with active genes (Henikoff et al. 2004) and conversion of a genomic region into a transcriptionally
H1b with silenced genes (Lee 2004). inactive state is a multistep process with different proteins
Each modification of histones is abbreviated to participating in the process in different genomic contexts.
indicate the histone modified, the chemical nature of The next most important repressive epigenetic
the modification, and the amino acid modified and its regulation is brought about by post-translational
position with in the histone protein. For example, his- histone modifications like deacetylation, ubiquitination,
tone H3 tri-methylated at 27th lysine residue from its sumolyation, and methylation at specific amino acid
N-terminal end is represented conventionally as residues of different histones. Histone deacetylation,
H3K27me3. The repressive and the activating marks brought about by histone deacetylase, is an important
on the histones are listed in Table 1. step in converting an actively transcribing region into
a repressed state. The proteins acting as repressors in fact
Linking Epigenetic Modifications to Gene bring about repression by recruiting histone deacetylase
Expression State to the desired genomic location. Deacetylation results in
The DNA methylation, brought about by DNA methyl the positive charge on the lysine residue of the histone
transferases (DNMT), plays a key role in converting the molecule, leading to a more compact chromatin structure
chromatin structure to either inactive or active state. and transcription repression. Similarly, addition of small
E 668 Epigenetics

Epigenetics, Table 1 Histone modifications and functions


Histone modification Histone (residues modified) Functional regulated
Acetylation H3(K9,K14,K18,K56)H4(K5,K8,K13,K16) Transcriptional activation
H3K56 DNA repair
H4K16 Chromosome condensation
Methylation H3 (K4,K36,K79), H3 (R17,R23), H4(R3) Transcriptional activation
H3 (K9,K27), H4(K20) Transcriptional repression
H4K20 DNA repair
Phosphorylation H3S10 Transcriptional activation
H2AS129, H4S1 DNA repair
H3S10, H3T3 Chromosome condensation
Ubiquitylation H2AK119, H2BK20 Transcriptional activation
H3, H4, H2A DNA repair
Sumoylation H4, H2A, H2B Transcriptional repression

proteins like ▶ ubiquitin or SUMO (small ubiquitin- to describe such states (Bernstein et al. 2006). Thus the
related modifier) on an internal lysine residue in the signature for expression state of genes in differentiated
histone is associated with repression of gene expression. cells is the outcome of a combination of epigenetic
The epigenetic modification can be reversed to modifications.
activate a silenced gene. Actively transcribing regions The observation of different modifications of histones
are devoid of DNA methylation. The conversion of within a given region of the genome has given rise to the
a repressed gene with methylated cytosine residues into concept of “Histone code” akin to the genetic code. The
an active unmethylated state has to be done with DNA concept first put forward by Jenuwein and Allis (2001)
synthesis either during replication or repair and MeCP2- suggests that the pattern of histone modifications in the
like proteins are phosophorylated leading to their genome generates a highly complex pattern that can be
disassociation from DNA and opening up of chromatin correlated with transcriptional status of the gene. The
structure to facilitate entry of transcription machinery. post-translational modifications of the histones confers
The relationship between histone modifications and a pattern of flagging the basic unit of chromatin in the
active gene expression is direct as in the case of acetyla- nucleus to signal either the recruitment of transcription
tion. Acetylation of specific lysine residues K9, 14, 18, activating complexes like the ▶ trithorax complexes or
23, 27 and K5, 8, 12, 16 in H3 and H4, respectively, leads silencing complexes like the ▶ polycomb complexes.
to loss of positive charge on lysine residues and weakens The memory component of epigenetics has
the association between histones and DNA, resulting in a complex mechanism. It is observed that the epigenetic
open chromatin state allowing the entry of transcription marks are always faithfully transmitted through mitosis,
activators. However, violation of this correlation is and sometimes, through meiosis like in cases of
observed in many cases, apparently repressive epigenetic ▶ genomic imprinting and ▶ X-chromosome inactiva-
marking on active genes which are explained by indirect tion. The mechanisms of transcriptional memory are yet
effects. For example, ubiquitination is a repressive mark, to be completely understood.
but ubiquitination of H2B acts as an activating mark as it
is accompanied with methylation on lysine residues at Cross-References
position 4 and 79. The phosphorylation of serine residues
in histone H3 leads to activation of gene expression, and ▶ Bisulfite Conversion
also enhances the ability of histone acetyl transferases ▶ Chromatin
(HATs) to acetylate lysine 14. On the other hand, ▶ Epigenetics, Drug Discovery
dephosphorylation and deacetylation at lysine 10 and ▶ Euchromatin
14 of histone H4 are signatures for repression, but they ▶ Genome
are known to promote methylation at position 9 of H4 ▶ Genomic Imprinting
and hence some of the potentially active genes bear these ▶ Heterochromatin
modifications. The term “bivalent mark” has been used ▶ Histones
Epigenetics, Drug Discovery 669 E
▶ MeCP2 Definition
▶ Methylation-Sensitive Restriction Endonucleases
▶ Nucleosomes The response to treatment varies widely in patients
▶ Polycomb Complexes with reference to beneficial effects as well as adverse
▶ Post-Replication Modification reactions. It is established that there is a genetic basis
▶ Post-translational Modifications for this variability, and ▶ pharmacogenomics is the
▶ Trithorax Complexes study of these underlying genetic variations at
▶ Ubiquitin and SUMO the whole genome level. The outcome of drug treat-
▶ X Chromosome Inactivation ment is a consequence of multiple steps culminating in
the interaction between the therapeutically active mol-
ecule and the target. The genes coding for drug trans- E
References porters and metabolizers and those coding for targets
of drug action are important players in the final out-
Bernstein BE, Mikkelsen TS, Xie X, Kamal M, Huebert DJ, Cuff J, come of treatment for diseases. Therefore, the mecha-
Fry B, Meissner A, Wernig M, Plath K, Jaenisch R, Wagschal
nisms that influence expression of the genes involved
A, Feil R, Schreiber SL, Lander ES (2006) A bivalent chroma-
tin structure marks key developmental genes in embryonic stem in ▶ pharmacokinetics and pharmocodynamics are
cells. Cell 125(2):315–326 important in drug discovery and disease treatment.
Bogdanović O, Veenstra GJC (2009) DNA methylation and In addition to genetic variations, epigenetic marks
methyl-CpG binding proteins: developmental requirements
and function. Chromosoma 118(5):549–565
on these classes of genes will impact drug metabolism.
Frommer M, Mcdonald LE, Millar DS, Collis CM, Watt F, Grigg In the systems biology approach to drug discovery and
GW, Molloy PL, Paul CL (1992) A genomic sequencing optimization of existing drugs, it is important to study
protocol that yields a positive display of 5-methylcytosine not only the genetic variations but also epigenetic
residues in individual DNA strands. Proc Nat Acad Sci USA
variations between populations and individuals.
89:1827–1831
Henikoff S, Furuyama T, Ahmad K (2004) Histone variants,
nucleosome assembly and epigenetic inheritance. Tren
Genet 20(7):320–326 Characteristics
Jaskelioff M, Peterson CL (2003) Chromatin and transcription:
histones continue to make their marks. Nat Cell Biol 5:395–399
Jenuwein T, Allis CD (2001) Translating the histone code. The discovery of novel drug targets and drugs to target
Science 293:1074–1080 them, as well as, discovering novel drugs for known
Latchman DS (2005) Gene regulation: a eukaryotic perspective. targets are the aspects of drug discovery relevant for
Taylor & Francis, London
Lee H (2004) Msx1 cooperates with histone H1b for inhibition of
consideration in this section. Significance of epige-
transcription and myogenesis. Science 304:1675–1678 netic effects in the context of drug discovery could be
Ptashne M, Gann A (2002) Genes and signals. Cold Spring in the following contexts:
Harbor Laboratory Press, Cold Spring Harbor 1. Epigenetic changes in the genome under disease
conditions as novel targets for treatment
2. Epigenetic variation in genes involved in drug metab-
olism in turn resulting in altered response to drugs
3. Drug molecules bringing about alteration in
Epigenetics, Drug Discovery epigenetic profile of tissues and, thus, affecting gene
expression
Vani Brahmachari and Shruti Jain
Dr. B. R. Ambedkar Center for Biomedical Research, Epigenetic Changes in Disease Conditions
University of Delhi, Delhi, India The tools, presently available, have made it possible to
analyze the epigenome. Epigenome is the epigenetic
marks both on DNA and the histones (▶ Epigenetics)
Synonyms at the whole genome level. Epigenome analysis has been
carried out in many disease contexts specially cancers
Drug metabolism; Drug targets, Epigenetics; and significant differences in DNA methylation profile
Epigenome; Gene regulation; Pharmacogenomics between normal and cancer cells from the same
E 670 Epigenetics, Drug Discovery

individual are reported in literature (Lo and Sukumar a therapeutically active drug molecule into a therapeuti-
2008). Such data can be obtained by different cally inactive form or a therapeutically inactive drug
approaches: (1) using methylation-sensitive restriction (referred to as ▶ prodrug) into a therapeutically active
endonucleases (▶ Epigenetics) or (2) ▶ chromatin form. The enzymes involved in drug metabolism can be
immunoprecipitation (ChIP) with antibodies directed divided into two categories based on the reaction they
against 5-methyl cytosine (▶ Epigenetics). These catalyze: Phase I comprising of oxidizing, reducing, and
methods applied on the total genome of the cells under hydrolyzing enzymes; Phase II comprising of methylat-
analysis will yield a pool of methylated DNA sequences ing, sulphonating, and acetylating enzymes. Two most
which are sequenced to derive information on the DNA studied examples of drug-metabolizing enzymes are
methylation pattern of the whole genome. It is observed cytochrome P450 monooxygenase system and alcohol
that the genome of cancer cells have higher methylation dehydrogenase, ADH which are oxidizing enzymes and
(hypermethylation) than normal cells. Since DNA meth- cytosolic N-acetyltransferase (NAT2) which has acety-
ylation leads to repression, there will be inappropriate lating activity. Various isoforms of cytochrome P450
gene silencing in cancer cells compared to normal monooxygenase system are encoded by genes CYP1A1,
cells. Pharmacological intervention by incorporating CYP2A6, CYP2C9, and CYP2E1.
▶ 5-azaCytosine, the non-methylatable analogue of
cytosine has been shown to reverse gene silencing in Where and When Are They Active?
cancer cells (Gilbert et al. 2004). Incorporation of this The principal organ of action of these genes is the liver
analogue also inhibits the activity of DNA methyl trans- as it is the first organ to be perfused by drugs absorbed
ferases (DNMT). The molecules which are used to alter in the gastrointestinal tract and hence, it has very high
the epigenome are designated as epigenetic drugs and concentrations of the drug-metabolizing enzymes.
are currently in use against hematological cancers like Other sites include the kidney, gut, lung, and the skin.
myelodysplastic syndrome (Gilbert et al. 2004). How-
ever, one has to consider the lack of selectivity of such Genetic Variations in the Drug Metabolism Genes
analogues at the cell and the gene level which would The genes coding for these enzymes vary between indi-
have undesirable effects. viduals both at genetic and epigenetic level resulting
The pattern of histone modifications on the whole in variable drug response between individuals. The
genome of cancer cells has also been compared with ▶ genetic polymorphisms in drug-metabolizing genes
that of normal cells using ChIP approach, which has have given rise to distinct subgroups in the population
shown that there is lower acetylation of histones in that differ in drug response. Polymorphisms in ▶ micro-
cancer cells. Acetylated histones that are present in satellite repeats and ▶ single nucleotide polymorphisms
actively transcribed genes are deacetylated by the (SNPs) in genes coding for drug transporters, metaboliz-
action of enzymes called histone decetylases (HDAC, ing enzymes, or drug targets lead to their functional
▶ Epigenetics). Deacetylation will lead to transcrip- variability. Thus, within a given population, individuals
tion repression, thus, altering gene expression profile can be characterized as hypermetabolizers, poor
of cancer cells compared to normal cells. Therefore the metabolizers, and non-metabolizers. Genetic polymor-
effort is to use inhibitors of histone deacetylation as phisms in drug-metabolizing enzymes may have more
therapeutic agents. Phenylbutyrate and buphenyl are subtle, yet clinically important consequences for
inhibitors of histone deacetylases. There are two interindividual variability in drug response.
HDAC inhibitors approved for clinical application in For example, SNPs in the gene NAT2 can distinguish
cutaneous T-cell lymphoma (CTCL; Grant et al. 2010; populations into two distinct subgroups: “slow
Mercurio et al. 2010). acetylators” and “fast acetylators.” NAT2 codes for the
enzyme N-acetyltransferase type 2 that conjugates
Epigenetic Influence on Drug Metabolism Genes substrates like ▶ isoniazid with an acetyl group, which
What Is Drug Metabolism? is subsequently eliminated from circulation. The slow
Drugs are metabolized by complex biochemical path- acetylators, therefore, have decreased ability to metabo-
ways dedicated for metabolism of ▶ xenobiotics. Phar- lize this class of drugs by NAT2 (Roden and George
macological compounds are metabolized by the same 2002). Similarly CYP1A gene polymorphisms can group
enzyme system, leading to the conversion of either individuals in to distinct subgroups. Further, the
Epigenetics, Drug Discovery 671 E
combination of genetic polymorphism in genes coding neuroprotective mechanisms in several neurodegenerative
for different drug-metabolizing enzymes like NAT2 diseases, in addition to its role as antiepileptic and anti-
and isoforms of CYP1A determines the drug response malignancy drug (Monti et al. 2009).
profile of an individual. For example, increased CYP1A, Genetic and epigenetic variation in genes between
coupled with slow acetylation results in reduced antican- individuals is an important issue to be considered in drug
cer activity of amonafide (Ratain et al. 1996). discovery. In addition, the nontarget effect of drugs not
only on protein activity but also in terms of their epige-
Epigenetic Influence on Drug Metabolism Genes netic effects being recognized recently, demands particu-
The role of epigenetic modifications in drug- lar attention and cannot be ignored. Interestingly similar to
metabolizing enzymes is well established in certain drug efficacy differences, adverse effects can be also
cases, for example, drug transporters like ▶ ATP-binding negligible or absent in the background of the genetic and E
cassette B1 transporter (ABCB1) and solute carrier epigenetic variation profile of different populations.
(SLC22A8) and CYP1A. The hypermethylation of CpG
islands of CYP1A1 involved in the metabolism of car-
cinogens and polycyclic aromatic hydrocarbons in cell Cross-References
lines like MCF-7 (human breast carcinoma) and HeLa
(human cervical adenocarcinoma) results in its repres- ▶ 5-azaCytosine
sion (Hirota et al. 2008). The acetylation of histone H3 is ▶ Apoptosis
another chromatin remodeling modification which is ▶ ATP-Binding Cassette B1 Transporter
well characterized in transcriptional activation of ▶ Chromatin Immunoprecipitation
ABCB1 gene (Hirota et al. 2008). ▶ Epigenetics
▶ Genetic Polymorphisms
Drugs Modulating Epigenetic Profile ▶ Isoniazid
Apart from the drugs recently being developed as epige- ▶ Microsatellite Repeats
netic drugs mentioned above, the case of volproic acid as ▶ Pharmacogenomics
an epigenetic drug is unique and creates a new paradigm ▶ Pharmacokinetics and Pharmacodynamics
to be considered in a system biology approach to drug ▶ Phase I Enzymes
discovery. Drugs currently in clinical use for certain ▶ Phase II Enzymes
diseases lead to epigenetic modulation of genes in addi- ▶ Prodrug
tion to their effect on their target; they can have wider ▶ Single Nucleotide Polymorphisms
effect on the transcriptional profile of the target cells. For ▶ Xenobiotics
example, if a drug induces hypermethylation and thus,
inactivation of the gene involved in its metabolism, it
can lead to the accumulation of the drug and therefore
References
toxicity. On the other hand, if the drug leads to
hypomethylation it can result in higher expression of Gilbert J, Gore SD, Herman JG, Carducci MA (2004) The clin-
the gene. Therefore drugs directed at other targets can ical application of targeting cancer through histone acetyla-
cause epimutations (alteration in epigenetic marking) tion and hypomethylation. Clin Cancer Res 10:4589–4596
Grant C, Rahman F, Piekarz R, Peer C, Frye R, Robey RW,
leading to altered gene expression, without changing Gardner ER, Figg WD, Bates SE (2010) Romidepsin: a new
the DNA sequence. One of the drugs known to have therapy for cutaneous T-cell lymphoma and a potential
such an effect is valproic acid (VPA, 2-propylpentanoic therapy for solid tumors. Expert Rev Anticancer Ther
acid). It is widely used as an antiepileptic drug and for the 10:997–1008
Hirota T, Takane H, Higuchi S, Ieiri I (2008) Epigenetic regula-
therapy of bipolar disorders. Its action was initially found
tion of genes encoding drug-metabolizing enzymes and
to be primarily related to neurotransmission and modu- transporters; DNA methylation and other mechanisms. Curr
lation of intracellular pathways. More recently, it is Drug Metab 9:34–38
recognized as an antineoplastic agent as well, acting on Lo P-K, Sukumar S (2008) Epigenomics and breast cancer.
Pharmacogenomics 9(12):1879–1902
cell growth, differentiation, and ▶ apoptosis. It has
Mercurio C, Minucci S, Pelicci PG (2010) Histone deacetylases
been demonstrated that valproic acid directly inhibits and epigenetic therapies of hematological malignancies.
histone deacetylases. This property is important for Pharmacol Res 62:18–34
E 672 Epigenome

Monti B, Polazzi E, Contestabile A (2009) Biochemical, molec-


ular and epigenetic mechanisms of valproic acid Epitope
neuroprotection. Curr Mol Pharmacol 2(1):95–109
Ratain MJ, Mick R, Janisch L, Berezin F, Schilsky RL,
Vogelzang NJ, Kut M (1996) Individualized dosing of Marie-Paule Lefranc
amonafide based on a pharmacodynamic model incorporat- Laboratoire d’ImmunoGénétique Moléculaire,
ing acetylator phenotype and gender. Pharmacogenetics Institut de Génétique Humaine UPR 1142, Université
6(1):93–101
Roden DM, George AL Jr (2002) The genetic basis of variability Montpellier 2, Montpellier, France
in drug responses. Nat Rev Drug Discov 1:37–44

Synonyms

Antigenic determinant
Epigenome

▶ Epigenetics, Drug Discovery


Definition

“Epitope” is a concept of the “▶ SpecificityType”


Epistasis concept of identification (generated from the ▶ IDEN-
TIFICATION Axiom) of ▶ IMGT-ONTOLOGY, the
Christoph Adami global reference in ▶ immunogenetics and ▶ immunoin-
Department of Microbiology and Molecular Genetics, formatics (Giudicelli and Lefranc 1999; Lefranc et al.
Michigan State University, East Lansing, MI, USA 2004, 2005, 2008; Duroux et al. 2008), built by IMGT®,
the international ImMunoGeneTics information sys-
tem® (http://www.imgt.org) (▶ IMGT® Information
Synonyms System). “Epitope,” or “antigenic determinant,” iden-
tifies the part of the antigen (Ag) or of the peptide/
Mutation interactions major histocompatibility (pMH) that is recognized by
the ▶ Paratope, or antigen-binding site, of the variable
(V) domains (▶ Variable (V) Domain) of an immuno-
Definition globulin (IG) or antibody or of a T cell receptor (TR),
respectively.
In genetics, epistasis refers to the situation where the The “Epitope” concept has two leafconcepts
effect of one gene on a particular ▶ mendelian trait is (▶ IMGT-ONTOLOGY, Leafconcept):
modified by another gene. Such genes are said to be • the “B cell epitope” identifies the part of the Ag that
epistatically interacting when producing the trait. is recognized by the paratope of a specific IG or
More generally, epistasis can refer to mutations within antibody
a single gene that interact epistatically in conferring • the “T cell epitope” identifies the part of the pMH
▶ fitness to the gene. In that case, the effect of one that is recognized by the paratope of a specific TR.
mutation on fitness depends on the identity of another An “epitope” is “conformational” or “linear,” based
mutation within the same gene. Epistasis is related to on its structure and its interaction with the paratope.
but distinct from pleiotropy, where a single mutation or A “conformational” epitope (or “discontinuous”
gene affects multiple traits. epitope) comprises amino acids that are close in the
three-dimensional (3D) structure but distant in the
primary amino acid sequence. Most B cell epitopes
Cross-References are conformational.
A “linear” epitope (or “contiguous” epitope) com-
▶ Fitness prises amino acids that are in continuity in the primary
▶ Mendelian Traits amino acid sequence. T cell epitopes are usually
Equilibrium Probability Distribution 673 E
identified as “linear” when referring to the processed receptor/peptide/MHC complexes. In: Schoenbach C,
peptide (p) presented in the groove (G) domains Ranganathan S, Brusic V (eds) Immunoinformatics.
Immunomics reviews, series of Springer Science and Business
(▶ Groove (G) Domain) of the major histocompatibil- Media LLC, Springer, New York, USA, Chap. 2, pp 19–49
ity (MH) proteins. However, in IMGT-ONTOLOGY, Lefranc M-P, Giudicelli V, Ginestoux C, Bosc N, Folch G,
the “T cell epitope” concept is identified as “discon- Guiraudou D, Jabado-Michaloud J, Magris S, Scaviner D,
tinuous” as it comprises amino acids of the peptide and Thouvenin V, Combres K, Girod D, Jeanjean S, Protat C,
Yousfi Monod M, Duprat E, Kaas Q, Pommié C, Chaume D,
amino acids of the helices of the MH that bind to the Lefranc G (2004) IMGT-ONTOLOGY for immunogenetics
TR V domains (▶ Variable (V) Domain). and immunoinformatics. In Silico Biol 4:17–29
Analysis of the paratope/epitope interactions in Lefranc M-P, Clément O, Kaas Q, Duprat E, Chastellan P,
IG/Ag and TR/peptide/MH (TR/pMH) complexes with Coelho I, Combres K, Ginestoux C, Giudicelli V, Chaume D,
three-dimensional (3D) structures are available in
Lefranc G (2005) IMGT-Choreography for immunogenetics E
and immunoinformatics. In Silico Biol 5:45–60
IMGT/3Dstructure-DB (http://www.imgt.org) (Kaas Lefranc M-P, Giudicelli V, Regnier L, Duroux P (2008)
et al. 2008). IMGT, a system and an ontology that bridge biological and
“Epitope” is one of the two major concepts character- computational spheres in bioinformatics. Brief Bioinform
9:263–275
istics of the IG/Ag and TR/pMH interactions, “epitope” is
on the antigen side, whereas the other one, “▶ Paratope,”
is on the antigen receptor (IG and TR) side.

Equilibrium
Cross-References
▶ Life Span, Turnover, Residence Time
▶ Chain Type ▶ Lymphocyte Population Kinetics
▶ Complementarity Determining Region (CDR- ▶ Steady State
IMGT)
▶ Groove (G) Domain
▶ IMGT Collier de Perles
▶ IMGT Unique Numbering Equilibrium Probability Distribution
▶ IMGT-ONTOLOGY
▶ IMGT-ONTOLOGY, IDENTIFICATION Axiom Ruiqi Wang
▶ IMGT-ONTOLOGY, Leafconcept Institute of Systems Biology, Shanghai University,
▶ IMGT-ONTOLOGY, SpecificityType Shanghai, China
▶ IMGT ® Information System
▶ Immunogenetics
▶ Immunoinformatics Synonyms
▶ Paratope
▶ Systems Immunology, Data Modeling and Scripting Steady-state probability distribution
in R
▶ Variable (V) Domain
Definition

In the stochastic framework, it is difficult to define an


References equilibrium configuration in terms of the number of
molecules as in a deterministic framework because any
Duroux P, Kaas Q, Brochet X, Lane J, Ginestoux C,
Lefranc M-P, Giudicelli V (2008) IMGT-Kaleidoscope, reaction, which changes the number of molecules,
the Formal IMGT-ONTOLOGY paradigm. Biochimie takes place in probability. In deterministic differential
90:570–583 equations, an equilibrium means that there is no net
Giudicelli V, Lefranc M-P (1999) Ontology for immunogenet-
change in the number or concentration of molecules. In
ics: the IMGT-ONTOLOGY. Bioinformatics 12:1047–1054
Kaas Q, Duprat E, Tourneur G, Lefranc M-P (2008) IMGT contrast, for a stochastic model, there is an equilibrium
standardization for molecular characterization of the T cell probability distribution, i.e., a set of probabilities that
E 674 eRF

the system has certain number of molecules, even Cross-References


though the number of molecules may change.
Generally, it is difficult to get the equilibrium prob- ▶ Stochastic Modeling of Translation Elongation and
ability distribution of a master equation analyti- Termination
cally. However, for some simple cases, it can be
obtained by recursion relations or Fokker–Planck References
equations.
Gardiner CW (2002) Handbook of stochastic methods. Springer,
Berlin
References

Hasty J, Pradines J, Dolnik M, Collins JJ (2000) Noise-based


switches and amplifiers for gene expression. Proc Natl Acad Error of Type I and Type II
Sci 97:2075–2080
Reddy VTN (1975) On the existence of the steady state in the
stochastic Volterra-Lotka model. J Stat Phys 13:61–64 Martin Carrier
Department of Philosophy, Bielefeld University,
Bielefeld, Germany

eRF
Definition
▶ Release Factor, Translation
Type I error, also known as false positive, concerns the
mistaken acceptance of an incorrect assumption. Type II
error, also known as false negative, designates the
Ergodicity erroneous rejection of a true account.

M. Carmen Romano1,2 and Ian Stansfield2


1 Cross-References
Institute for Complex Systems and Mathematical
Biology, University of Aberdeen, Aberdeen, UK
2 ▶ Non-empirical Values
Institute of Medical Sciences, University of
Aberdeen, Aberdeen, UK

ESCEC
Definition
▶ Experimental Standard Conditions of Enzyme
A system is said to be ergodic if the time average of the Characterizations
variable describing it, is equal to its ensembleaverage.

Suppose that we have measured a signal xSt m at
different points in time t ¼ 1,. . ., T, where Sm denotes
the underlying system.
 SmThe
 time average of the signal Euchromatin
PT Sm
is then denoted by xt t ¼ T t¼1 xt . Now assume
1

that instead of measuring the signal from one single Vani Brahmachari and Shruti Jain
system Sm at different time points, we have a whole Dr. B. R. Ambedkar Center for Biomedical Research,
ensemble {Sm}, m ¼ 1,. .., Mof identical systems, and University of Delhi, Delhi, India
we measure the signal xSt m at a fixed time point t
for every system Sm, from which we then calculate the
PM S m
ensemble
 Sm   average
 xSt m m ¼ M1 m¼1 xt . Then, if Synonyms
xt t ¼ xSt m m , the system is said to be ergodic
(Gardiner 2002). Open chromatin
Eukaryotic Translation Initiation Factor Interactions 675 E
Definition Characteristics

The actively transcribing regions of the genome which Eukaryotic Translation Initiation Process
is in an open chromatin state is called euchromatin. In eukaryotic cells, eIF2 preferentially binds GDP. The
The nomenclature “euchromatin,” in contrast to guanine nucleotide exchange factor eIF2B replaces
▶ heterochromatin, was initially given based on the ligh- this GDP to GTP, but only on the unphosphorylated
ter staining of the chromatin in these regions with dyes form of eIF2 (Fig. 1a: v1), so as to comprise the ternary
that bind to DNA, when the nuclei are observed under complex (i.e., TC, comprising eIF2, GTP, and initiator
the microscope. Euchromatin corresponds to gene-rich methionyl-tRNA, Met-tRNAi). This ternary complex
regions and active transcription. It is also enriched in is essential for translation initiation (Fig. 1a: v2). The
epigenetic marks associated with active transcription. interaction between eIF1A and the 40S ribosomal E
subunit promotes the dissociation of a ribosome and
ensures that the 40S ribosomal subunit is open to
Cross-References receive other eIFs. Subsequently, eIF1, eIF3, eIF5,
and ternary complex are recruited to the 40S ribosomal
▶ Epigenetics subunit thereby generating the 43S preinitiation
▶ Heterochromatin complex, in which the initiator tRNA Met-tRNAi is
held close to the ribosome’s P site (Sonenberg and
Hinnebusch 2009). This may happen via binding of
the individual factors to the 40S ribosomal subunit in
Eukaryotic Translation Initiation Factor separate steps. On the other hand, these factors may
Interactions interact through cooperative binding. There is evi-
dence to suggest that these factors load onto the 40S
Tao You1, George M. Coghill2 and ribosomal subunit in the form of a multifactor complex
Alistair J.P. Brown3 (Fig. 1a: v3, v4, Asano et al. 2000). Also various partial
1
Computational Biology, Innovative Medicines, complexes have been isolated. Therefore, the multifac-
AstraZeneca, Macclesfield, Cheshire, UK tor complex may form via different routes, as shown in
2
Institute for Complex System and Mathematical Fig. 1b (You et al. 2010).
Biology, University of Aberdeen, Aberdeen, UK eIF4E interacts with the 50 cap structure of an
3
Institute of Medical Sciences, University of mRNA (beginning of an mRNA), and poly-A binding
Aberdeen, Aberdeen, UK protein (PABP) binds the poly-A structure at the 30 end
(mRNAs end), as depicted in Fig. 1c. eIF4G circular-
izes the mRNA by binding both eIF4E and PABP.
Synonyms eIF4G also recruits the mRNA helicase eIF4A and its
activating partner eIF4B, which removes mRNA sec-
Eukaryotic translation initiation pathway ondary structures that might impede the movement of
the ribosome during scanning (below). This entire
complex is known as eIF4F (Fig. 1c).
Definition The 43S preinitiation complex loads onto the 50 end
of an mRNA via multiple interactions between eIF3/
Eukaryotic translation initiation is the process that eIF5 and eIF4E/eIF4B complexes (Fig. 1a: v5). Subse-
allows a ribosome to assemble at the start codon on quently, the complex moves down the mRNA in a 30
a messenger RNA. A set of proteins known as eukary- direction, seeking the 50 proximal start codon, in
otic translation initiation factors (eIFs) orchestrates a process known as scanning (Fig. 1a: v6). eIF5 cata-
this process along with the two ribosomal subunits lyzes the hydrolysis of GTP bound to eIF2, which is in
and initiator tRNA. Some of the events in this process equilibrium with GDP plus phosphate (Fig. 1a: v7).
happen in a specific temporal order. The cooperativity Base pairing between the anticodon loop of the initia-
among protein interactions gives rise to some defined tor tRNA Met-tRNAi and the start codon leads to
preinitiation complexes. a change in the conformation of the 43S preinitiation
E 676 Eukaryotic Translation Initiation Factor Interactions

(A)n
a 4E
7G
m 4G
4A
4B

AUG

V2
(A)n
V3
7G
m 4G
V2 3 V4 3 V5 3 4E 4A
2 2 1 5 1 5
4B

1 5
40S 2 1A 40S 2 1A
2
2B AUG
V1

2 5 3 1
V6

c
V9 3 V8 3 V7 3
5 1 5 1 5 PABP
40S 1A 40S 2 1A 40S 2 1A 40S 2 1A 4A
+Pi
m7G AUG (A)n m7G AUG (A)n m7G AUG (A)n m7G AUG (A)n

(A)n
V10 4G 4G 7G
m 4G 4E
4A 4E 4A
1A 4B 4B

AUG
5B
V12 4B
+Pi
5B V11 5B Pi V13
40S 1A 40S 1A 40S 40S 4E-BP
m7G AUG (A)n m7G AUG (A)n m7G AUG (A)n m7G AUG (A)n
m7G AUG (A)n TORC1
60S

3
1 5
5 3
5 2 1 5
2 5
2 2 5
3

3
3 3 5
2

5 5

3 1
3 V21 3 3 3 1
1 1 5 5 1 5
5 1

3
1 5 1 5 3
2 1 5
2

Eukaryotic Translation Initiation Factor Interactions, Fig. 1 Schematic diagram of eukaryotic translation initiation

complex. This triggers release of eIF1 in a mechanism of the translation initiation process, a ribosome is
involving eIF1A and eIF5 (Fig. 1a: v8, Lorsch and poised at the start codon ready for translation elonga-
Dever 2010). This leads to release of the phosphate, tion (Sonenberg and Hinnebusch 2009).
GDP, eIF2, and eIF5, and commits the 40S ribosomal
subunit in an immobilized closed conformation over Multifactor Complex
the start codon (Fig. 1a: v9, Sonenberg and As described above, assembly of the 43S preinitiation
Hinnebusch 2009). complex relies on a multifactor complex. The cooper-
Next, eIF5B∙GTP is recruited to the 40S ribosomal ative binding between the individual factors that com-
subunit (Fig. 1a: v10). The 60S ribosomal subunit joins prise multifactor complex means a partial complex has
the complex after the hydrolysis of this eIF5B-bound higher affinity with the binding partners compared
GTP (Fig. 1a: v11) and the release of the phosphate, with the individual components of the partial complex.
eIF1A and eIF5B (Fig. 1a: v12, v13). Upon completion This stimulation favors the formation of a complete
Eukaryotic Translation Initiation Factor Interactions 677 E
multifactor complex, and hence ensures the proper on protein synthesis rate (Dimelow and Wilkinson
assembly of a functioning 43S preinitiation complex. 2009; You et al. 2010). eIF2 binds initiator tRNA
The multifactor complex contains equimolecular Met-tRNAi as well as the GTP whose hydrolysis is
amounts of the individual factors. However, these central to start codon recognition. Therefore, reducing
eIFs are not equimolar in vivo (von der Haar and eIF2 activity would directly affect the levels of
McCarthy 2002). The optimal operation of the trans- the multifactor complex and consequently the 43S
lation initiation apparatus relies on the complex inter- preinitiation complex. Therefore, sufficient levels of
actions among the eIFs, and overexpression of eIF2 expression are important for efficient translation.
a specific eIF does not necessarily lead to higher pro- On the other hand, eIF1 has a critical role in start codon
tein synthesis activities. For instance, overexpression recognition. Evolution might have led to the mainte-
of eIF5 sequesters eIF2∙GDP in a nonproductive nance of high eIF1 activities in vivo to ensure that E
eIF5∙eIF2∙GDP complex, thereby reducing protein mRNA translation is not impeded at the initiation
synthesis (Singh et al. 2006; You et al. 2010). step by this factor.

eIFs Control Protein Synthesis Modes of Control in Translation Initiation


According to metabolic control theory (▶ Metabolic 4E-BP (eIF4E binding protein) competitively binds
Control Analysis), control over a multistep process is eIF4E, and inhibits its interaction with eIF4G (Fig. 1c).
distributed among the enzymes that catalyze each step, In mammals the activity of 4E-BP is negatively regulated
and there is no specific rate-limiting step. Hence, con- by mTORC1 (mammalian target of rapamycin complex
trol over protein synthesis is not exerted upon the 1). When nutrients are limiting, mTORC1 activity is
activity of only one eIF. Instead, such control is dis- lowered. This increases the activity of 4E-BP and even-
tributed among different factors. A further complica- tually reduces protein synthesis via inhibition of eIF4E
tion is that the topology of the network that gives rise to (Sonenberg and Hinnebusch 2009).
the multifactor complex is not linear. To characterize In addition, the activities of eIF2 and eIF2B are
the control of protein synthesis by a specific eIF in negatively regulated in response to stress conditions
a small perturbation regime, a local flux control coef- (see ▶ Translational Control of GCN4). A reduction in
ficient is introduced: eIF2 activity leads to global translational arrest, while
some specific mRNA species gain higher translation
eIF @J activity in mechanism involving upstream open
J
CeIF ¼  ; (1)
J @eIF reading frames.

where J denotes the protein synthesis rate. This control


coefficient is a normalized value, and it illustrates how
sensitive the protein synthesis is to a small perturbation Cross-References
around a particular eIF activity, while keeping the
activity of other eIFs constant. ▶ Translational Control of GCN4
Experimental work demonstrates that some of
J
the eIFs exert a biphasic mode of control. CeIF is
small at small reduction in eIF, indicating that References
protein synthesis is relatively robust against small
decrease in eIF activities. Below a certain thresh- Asano K, Clayton J, Shalev A, Hinnebusch AG
J (2000) A multifactor complex of eukaryotic initiation fac-
old, CeIF assumes a higher value. This means that
tors, eIF1, eIF2, eIF3, eIF5, and initiator tRNA(Met) is an
protein synthesis becomes sensitive to the activity
important translation initiation intermediate in vivo. Genes
of an eIF once this eIF is reduced below certain Dev 14(19):2534–2546
value (Sangthong et al. 2007). Dimelow RJ, Wilkinson SJ (2009) Control of translation initia-
Computational modeling has provided additional tion: a model-based analysis from limited experimental data.
J R Soc Interface 6(30):51–61
insights. Significantly, modeling predicts that eIF2
Lorsch JR, Dever TE (2010) Molecular view of 43S
has the highest impact on protein synthesis rate, complex formation and start site selection in eukaryotic
while eIF1 is among those factors with the least impact translation initiation. J Biol Chem 285(28):21203–21207
E 678 Eukaryotic Translation Initiation Pathway

Sangthong P, Hughes J, McCarthy JE (2007) Distributed


control for recruitment, scanning and subunit joining Evolution of Elongation Factor 1 Alpha
steps of translation initiation. Nucleic Acids Res
35(11):3573–3580
Singh CR, Lee B, Udagawa T, Mohammad-Qureshi SS, Kazuki Saito and Koichi Ito
Yamamoto Y, Pavitt GD, Asano K (2006) An eIF5/eIF2 Graduate School of Frontier Sciences, University of
complex antagonizes guanine nucleotide exchange by Tokyo, Chiba, Tokyo, Japan
eIF2B during translation initiation. EMBO J 25(19):4537–
4546
Sonenberg N, Hinnebusch AG (2009) Regulation of translation
initiation in eukaryotes: mechanisms and biological targets. Definition
Cell 136(4):731–745
von der Haar T, McCarthy JE (2002) Intracellular translation
initiation factor levels in Saccharomyces cerevisiae and Elongation factor-Tu (EF-Tu, bacterial)/eukaryotic
their role in cap-complex function. Mol Microbiol elongation factor 1a (eEF1a also known as eEF1A) is
46(2):531–544 a critical factor for protein synthesis and is highly
You T, Coghill GM, Brown AJP (2010) A quantitative model for conserved across all three domains of life. EF-Tu/
mRNA translation in Saccharomyces cerevisiae. Yeast
27(10):785–800 eEF1a is a principal member of the homologous trans-
lational GTPase family. Among the members of the
family, eEF1a-related factors, eRF3, HBS1, and EFL,
are the particular interests of this entry. eRF3,
a GTPase subunit of the polypeptide chain release
Eukaryotic Translation Initiation
factors, and HBS1, an mRNA quality control protein,
Pathway
are closely related to EF-Tu/eEF1a, thus comprising
members of the EF1a family. eRF3 is completely and
▶ Eukaryotic Translation Initiation Factor Interactions
HBS1 is almost completely conserved among all
eukaryotes. Their evolution evokes attention to the
molecular mimicry between tRNA and protein factors.
The discovery of EFL in some eukaryotes disrupts
Event Extraction Corpora
the seemingly perfect phylogenetic conservation of
EF-Tu/eEF1a among all domains of life.
▶ Text Corpora, Molecular Event Extraction

Characteristics
Evidence-based Medicine
The Translational GTPase Superfamily
Ravi Iyengar The translational GTPase superfamily consists of protein
Department of Pharmacology and Systems factors that share highly conserved motifs in the G-
Therapeutics, Mount Sinai School of Medicine, domain as well as in adjacent regions. Translational
New York, NY, USA GTPases include protein factors that stimulate GTPase
hydrolysis activity on the large subunit of the ribosome
and play essential roles in the initiation, elongation, and
Definition termination of protein synthesis. Some factors within the
translational GTPase family do not appear to be involved
Utilizing existing clinical trials, studies, and in translation but are engaged in the maintenance or
meta-analysis, as well as biological understanding to biogenesis of ribosomes instead. The members of this
determine treatment choice and effectiveness. family are largely divided into subfamilies according to
their similarities with EF-Tu/eEF1a, EF-G/eEF2, and
IF2/eIF5B (Table 1, see ▶ Translation Initiation and
Cross-References ▶ Translation Elongation). Typically, the genome of
individual species encodes many members of EF-Tu/
▶ Systems Pharmacology eEF1a and EF-G/eEF2 families whereas IF2/eIF5B is
Evolution of Elongation Factor 1 Alpha 679 E
Evolution of Elongation Factor 1 Alpha, Table 1 The trans- interaction between the codons of mRNA and antico-
lational GTPase family of eubacteria and eukaryotes dons of tRNA, EF-Tu/eEF1a hydrolyzes GTP, releases
Ancestor Factor name the aa-tRNA, and then exits the ribosome. These pro-
type Eubacteria Eukaryotes Function cesses lead to the relocation of the aminoacyl-CCA
IF2 IF2 eIF5B Initiation stem of the aa-tRNA toward the peptidyl transferase
EF-Tu EF-Tu eEF1a and/ Elongation center of the ribosome and result in elongation of the
or EFL
peptide chain. Subsequently, GTP-bound EF-G/eEF2
CysNa n.i.∗ Adenosine-50 -
enters the ribosome and translocates the peptidyl-
phosphosulfate synthesis
n.i. eRF3b Termination
tRNA along with mRNA toward the P site, leaving
n.i. HBS1b mRNA surveillance the A site ready for the next round of reactions cata-
lyzed by EF-Tu/eEF1a (Agirrezabala and Frank 2009). E
n.i. Ski7b mRNA surveillance
n.i. eIF2gammac Initiation The molecular functions of EF-Tu/eEF1a and EF-G/
SelBa EEFSEC Incorporation of eEF2 are the essential characteristics of protein syn-
selenocysteine thesis, and these two proteins are almost (see below)
EF-G EF-G EF2 Elongation fully conserved among all living organisms.
RF3a n.i. Termination
LepAa n.i. Unclear EFL
Teta n.i. Tetracycline resistance Despite the critical role of eEF1a in protein syn-
TypeAa n.i. Unclear thesis, there are certain eukaryotic species, such as
n.i. SNU114d mRNA splicing
green algae and chromalveolates, that seem to lack

n.i. indicates “not identified” eEF1a orthologs and instead encode an elongation
For derived factors, references are as follows:
a
Margus et al. 2007, factor-like protein (EFL). Although the amino acid
b
Atkinson et al. 2008, sequence of EFL is clearly divergent from that of
c
Keeling et al. 1998, eEF1a, the sequences are similar enough to suggest
d
Fabrizio et al. 1997 that EFL and eEF1a are functionally equivalent. In
most cases, the distribution of EFL and eEF1a
among species is mutually exclusive. Only one
encoded by a single gene. Presumably, in relation with species has so far been identified to possess both
numbers of genes, EF-Tu/eEF1a and EF-G/eEF2 have eEF1a and EFL, and no species has been found
many more derivatives than IF2/eIF5B. Interestingly, which possess neither of them. This mutual exclu-
EF-Tu/eEF1a has more derivatives in eukaryotes, and siveness makes EFL different from the other EF1a
EF-G/eEF2 has more derivatives in bacteria (Fig. 1). family members, eRF3 and HBS1.
Also interestingly, with the exception of SelB The distribution of EFL is not vertical but is
(▶ Translational Control by Cis RNA Elements, unevenly scattered in a wide range of phylogenic
Bacteria; ▶ Selenocysteine-specific EF-Tu/eEF1a) trees, i.e., close relatives of organisms with EFL have
specific EF-Tu/eEF1a (Keeling et al. 1998), derivatives eEF1a instead of EFL in multiple branches. This situ-
of these two factors are unique to the different domains ation suggests distribution of EFL by lateral gene
of life (Table 1 and Fig. 1). transfer. Direct evidence for lateral gene transfer of
EFL has been reported in the phylogenetic analyses of
EF-Tu/eEF1a: The Key GTPase Elongation Factor EFL in Rhizaria and algae (Kamikawa et al. 2008).
eEF1a is the eukaryotic ortholog of the bacterial elon- Despite this, it is not clear if lateral transfer has
gation factor EF-Tu. The fundamental function of occurred in other phylogenetic groups containing spe-
eEF1a is equivalent to that of EF-Tu. In polypeptide cies with EFL. The gene-loss scenario is entirely pos-
elongation cycles, eEF1a and eEF2, another GTPase sible (Cocquyt et al. 2009). Namely, the common
that is the ortholog of bacterial EF-G, sequentially bind eukaryotic ancestor was assumed to have both EFL
the ribosome. EF-Tu/eEF1a forms a complex with an and eEF1a, of which EFL was lost in most lineages.
aminoacyl-tRNA (aa-tRNA) in a GTP-binding state Further analysis is required to elucidate the evolution-
and delivers the aa-tRNA to the ribosomal A site. In ary processes that underlie the distributions of EFL and
the ribosomal A site, upon cognate base-pairing eEF1a within and among different taxonomic groups.
E 680 Evolution of Elongation Factor 1 Alpha

Evolution of Elongation
Bacteria
Factor 1 Alpha,
Fig. 1 Evolutionary EF-Tu Elongation
relationship of EF1a and EF2
family proteins found in the
three domains of life on Earth. EF-G Elongation
LUCA, the last universal
RF3 Termination
common ancestor (Diversified)

LUCA

EF-Tu

EF-G Archaea
alF2γ Initiation

Elongation
aEF1A multipotent Termination
Surveillance

aEF2 Elongation

Eukaryotes

elF2γ Initiation

eEF1A Elongation
eRF3 Termination
HBS1 Surveillance (Diversified)

eEF2 Elongation

The EF1a Family Members – eRF3 and HBS1 – and complex in appearance, and this complex is thought
Mimicry of the EF-Tu/eEF1a-aa-tRNA Complex to function in mRNA surveillance or mRNA quality
Two family members of eEF1a in eukaryotes have control (Doma and Parker 2006). However, the precise
similar modes of action as eEF1a. Instead of forming molecular mechanism underlying this putative func-
complexes with RNA factors like tRNA, these proteins tion is not well understood.
form complexes with protein factors that mimic the The tRNA mimicking factors, eRF1 and Dom34,
structure and function of tRNA. One of these proteins are conserved in archaea as well as in eukaryotes.
is the class II eukaryotic release factor (eRF3) that However, their interacting GTPases, eRF3 and HBS1
forms a complex with class I eukaryotic release factor respectively, have not been identified in archaeal
(eRF1), which mimics tRNA in functional as well as genomes despite extensive research (Atkinson et al.
structural aspects to catalyze the decoding of stop 2008). A study has recently shown that canonical
codons in a way similar to tRNAs. Another family archaeal EF1a can form functional tRNA mimicking
member, HBS1, forms a complex with Dom34, complexes with any of the archaeal eRF1 or Dom34
which is partly homologous to eRF1. The HBS1– factors (Saito et al. 2010). Archaeal EF1a (aEF1a)
Dom34 complex resembles the eEF1a-aa-tRNA may thus function as a versatile carrier protein for
Evolution of Elongation Factor 1 Alpha 681 E
Interaction Evolutional event Interaction

tRNA tRNA

EF1α

aRF1 eRF1

Cenancestor eRF3
aEF1α EF1α
aPelota (aEF1α-like) Pelota E

HBS1
Archaea Eukaryotes

Evolution of Elongation Factor 1 Alpha, Fig 2 The special- termination, and mRNA surveillance. In archaea (toward the
ized eukaryotic EF1a and the related proteins eRF3 and HBS1 left), aEF1a remains a versatile translational GTPase. In eukary-
and the multipotent archaeal EF1a. Black arrows indicate the otes (toward the right), the cenancestoral EF1a differentiated
evolutionary events and gray arrows indicate interactions. into multiple proteins with specific functions and interacting
Archaeal and eukaryotic EF1a and eukaryotic eRF3 and HBS1 partners
presumably share a cenancestor that functioned in elongation,

tRNA and tRNA-like protein adaptors. The ▶ Translation Termination


cenancestoral EF1a for archaea and eukaryotes is ▶ Translational Control by CIS RNA Elements,
thought to function equally in translation termination Bacteria
and mRNA surveillance, as well as in elongation, by
switching interacting partners as in the current archaea.
This suggests that a gene duplication/triplication event References
may have occurred in the cenancestral eukaryote and
Agirrezabala X, Frank J (2009) Elongation in translation as
that the resulting gene copies subsequently diversified
a dynamic interaction among the ribosome, tRNA, and elon-
with their respective binding partners to facilitate dif- gation factors EF-G and EF-Tu. Q Rev Biophys 42:159–200
ferential regulation (Figs. 1 and 2). Atkinson GC, Baldauf SL, Hauryliuk V (2008) Evolution of
nonstop, no-go and nonsense-mediated mRNA decay and
their termination factor-derived components. BMC Evol
Bacterial RF3 is Not an EF1a Family Member
Biol 8:290
Unlike eRF3 or aRF3, the bacterial class 2 release Cocquyt E, Verbruggen H, Leliaert F, Zechman FW, Sabbe K,
factor, RF3, is not an EF1a family member. Instead, De Clerck O (2009) Gain and loss of elongation factor genes
it is an EF2 family member, a striking example of in green algae. BMC Evol Biol 9:39
Doma MK, Parker R (2006) Endonucleolytic cleavage of
convergent evolution (Kisselev et al. 2003) (Table 1).
eukaryotic mRNAs with stalls in translation elongation.
This explains marked difference in mechanisms under- Nature 440:561–564
lying eRF3 versus RF3 functions during translation Fabrizio P, Laggerbauer B, Lauber J, Lane WS, L€ uhrmann R
termination (▶ Translation Termination). (1997) An evolutionarily conserved U5 snRNP-specific pro-
tein is a GTP-binding factor closely related to the ribosomal
translocase EF-2. EMBO J 16:4092–4106
Kamikawa R, Inagaki Y, Sako Y (2008) Direct phylogenetic
Cross-References evidence for lateral transfer of elongation factor-like gene.
Proc Natl Acad Sci USA 105:6965–6969
Keeling PJ, Fast NM, McFadden GI (1998) Evolutionary rela-
▶ Selenocysteine
tionship between translation initiation factor eIF-2gamma
▶ Translation Elongation and selenocysteine-specific elongation factor SELB: change
▶ Translation Initiation of function in translation factors. J Mol Evol 47:649–655
E 682 Evolution of Eukaryotic Translation Initiation Factors

Kisselev L, Ehrenberg M, Frolova L (2003) Termination of and load Met-tRNAiMet onto the P-site (▶ Translation
translation: interplay of mRNA, rRNAs and release factors. Initiation). The direct partner of Met-tRNAiMet is eIF2
EMBO J 22:175–182
Margus T, Remm M, Tenson T (2007) Phylogenetic distribution made of a, b, and g subunits. The GTP-binding
of translational GTPases in bacteria. BMC Genomics 8:15 subunit, a/eIF2g, is a close structural homologue of
Saito K, Kobayashi K, Wada M, Kikuno I, Takusagawa A, EF-Tu/eEF1A (Schmitt et al. 2002) (Fig. 1b) (▶ Evo-
Mochizuki M, Uchiumi T, Ishitani R, Nureki O, Ito K lution of Elongation Factor 1 Alpha). Similar to the
(2010) Omnipotent role of archaeal elongation factor 1
alpha (EF1{alpha}) in translational elongation and termina- latter, GTP-bound eIF2 forms a ternary complex (TC)
tion, and quality control of protein synthesis. Proc Natl Acad with Met-tRNAiMet off the ribosome and delivers the
Sci USA 107:19242–19247 aa-tRNA to it. The Ser-51 of eIF2a is the conserved
site of phosphorylation by various stress-activated
kinases (▶ Translation Initiation, ▶ Translational
Control of GCN4). Interestingly, e/aIF2a is a fusion
Evolution of Eukaryotic Translation between the N-terminal OB-fold domain, containing
Initiation Factors Ser-51, and the C-terminal domain (CTD) similar to
eEF1Ba, the catalytic subunit of eEF1B homologous
Katsura Asano to EF-Ts (▶ Translation Elongation) (Ito et al. 2004).
Molecular, Cellular, and Developmental Biology The aIF2a-CTD interacts with the domain II of aIF2g,
Program, Division of Biology, Kansas State just as eEF1Ba binds the domain II of eEF1A during
University, Manhattan, KS, USA the guanine nucleotide exchange reaction, although
their orientations relative to the domain II of the
GTP-binding protein are different (Yatime et al.
Definition 2007). The evolutionary relationship between the
OB-fold domains of e/aIF1A and e/aIF2a is yet
The current repertoire of eukaryotic translation initia- unclear.
tion factors (eIFs) evolved on the basis of the set of eIF1 and eIF2b-CTD have a similar eIF1/2/5-fold,
initiation factors found in archaea (aIFs) and the addi- suggesting their common origin (Conte et al. 2006).
tion of new proteins belonging to different protein Because a/eIF1’s function in start site fidelity appears
superfamilies including those specifically found in to be fundamental to the ribosome function, it is tempt-
eukaryotes. In this entry, the evolutionary relationship ing to hypothesize that this was the precursor of the
and origin of these eIFs are described. proteins of this family. In agreement with this, some
bacteria encode an eIF1 relative, termed YciH (named
in E. coli), which can replace eIF1 function in mam-
Characteristics malian reconstitution assays (Lomakin et al. 2006).
Based on this result, Lomakin et al. proposed that
Of ten eIFs involved in eukaryotic translation initia- eIF1/YciH was the major fidelity factor in the common
tion, two of them, eIF1A and eIF5B are universally ancestor and replaced with the newer protein, IF3, in
conserved (Fig. 1a) (for initiation factors, see bacterial lineages. However, since only a few bacterial
▶ Translation Initiation and its Table 1). eIF5B is species encode YciH (Kyrpides and Woese 1998), it
a member of translational GTPase superfamily and its cannot be ruled out that this was transferred from
bacterial counterpart is IF2 (▶ Evolution of Elongation Archaea by horizontal gene transfer (“?” in Fig. 1).
Factor 1 Alpha). eIF1A is an OB-fold protein homol-
ogous to bacterial IF1; OB-fold proteins include many Evolution of HEAT-Domain-Containing Factors
ribosomal proteins and are generally characterized as and Regulators
nucleic acid-binding proteins (Theobald et al. 2003). The mRNA-binding factor, eIF4G, in most metazoan
species carry three HEAT domains, MIF4G/HEAT1,
Evolution of eIF1 and eIF2 Conserved in Archaea MA3/HEAT2, and W2/HEAT3 as eIF4A- and eIF4E
eIF1, eIF1A, and eIF2 directly interact with the 40S kinase-binding sites (Fig. 2a). eIF4G identified in other
subunit, the eukaryotic ribosomal small subunit (SSU), species, such as plants and fungi, lack the W2 domain
prevent the SSU reassociation with the large subunit, or MA3 and W2 domains together (see Fig. 2b for
Evolution of Eukaryotic Translation Initiation Factors 683 E

Evolution of Eukaryotic Translation Initiation Factors, conserved universally (a), functionally conserved with archaeal
Fig. 1 Evolutionary relationship between translation initiation factors (b), with HEAT domains (c), and with their precursors
factors found in the universal ancestor (LUCA, last universal identified in archaea (d) are indicated by bracket or thin arrow.
common ancestor), archaea and eukaryotes. Each vertical col- Proteins, which appeared during eukaryotic evolution, are
umn with a distinct color represents the group of proteins in listed between the columns representing archaea and eukaryotes.
the hypothetical genome of LUCA (left column), or the proto- RRM superfamily is found in all three domains of life, so this
typical genome of archaea (middle) or eukaryotes (right). Pro- is listed to the left of the archaeal column. eIF2a, eIF2Be,
teins belonging to the same protein superfamily are described in and eIF5 are proposed to be generated by fusion between
the same color as defined in the blue box at bottom left. Proteins two distinct domains, and their origins are listed separately by
connected by solid horizontal arrows are homologous proteins. the color code and two arrows that merge together as defined in
Dotted horizontal arrows connect proteins under the same super- the blue box (except eIF2a-CTDEF1B derived from EF-Ts/
family but without direct homology. Eukaryotic factors eEF1Ba)

yeast eIF4G). Marintchev and Wagner identified of the nuclear cap-binding protein 80 (CBP80)
a significant homology between each of the three dis- (Marintchev and Wagner 2005). CBP80 is the larger
tinct HEAT domains of the metaozan eIF4G and those subunit of nuclear cap-binding complex (CBC). This is
E 684 Evolution of Eukaryotic Translation Initiation Factors

1 500 1000 1500

PABP elF4E elF4A elF3 elF4A Mnk1


a elF4G1
MIF4G
(human) MA3 W2
AA-1 AA-2

PABP elF4E elF4A


b elF4G2
(Yeast) MIF4G

elF4A and elF3


elF2
c p97/NAT1/DAP5 MIF4G MA3 W2
AA-1 AA-2
Other elF2B subunits
elF2
d elF2Bε W2
AA-1 AA-2

elF2
e elF5 GAP W2
AA-1 AA-2

elF2
f 5MP/BZW MA3 W2
AA-1 AA-2

Evolution of Eukaryotic Translation Initiation Factors, listed across the top. Bracket indicates an approximate area of
Fig. 2 HEAT domain-containing translation initiation factors interaction with indicted partners. Green boxes indicate HEAT
and regulators. Primary structures of human eIF4G1, yeast (S. domains. The C-terminal HEAT domain is the W2 domain with
cerevisiae) eIF4G2, human p97/NAT1/DAP5, human 5MP1/ short thick lines representing the location of AA-boxes 1 and 2
BZW2, and yeast eIF2Be and eIF5 are drawn to scale with filled (AA-1, AA-2, respectively)
boxes indicating segments known to interact with their partners

a highly conserved protein and this protein from all understood as having lost the W2/HEAT3 domain, or
eukaryotes has the three HEAT domain structure, in subsequently the MA3/HEAT3 domain in yeast, by
contrast to eIF4G. The smaller subunit CBP20 of CBC, intron or stop codon mutations (Fig. 3g–i).
carrying a common and ancient RNA recognition eIF5 and eIF2Be, the GTPase activating protein
motif (RRM), directly binds the mRNA cap. The cyto- (GAP) and the catalytic subunit of the guanine nucle-
plasmic cap-binding protein, eIF4E, has a distinct otide exchange factor (GEF) for eIF2, carry the W2
unique fold, suggesting that the cytoplasmic complex domain at their C-termini as eIF2-binding sites
(eIF4F) is a newer invention in terms of the cap- (Fig. 2d, e) (▶ Translation Initiation). Translational
binding function. Together, these findings led regulators NAT1/p97/DAP5 and eIF5-mimic protein
Marintchev and Wagner to propose that the metazoan (5MP)/BZW also carry HEAT domains, including W2,
eIF4G was the first HEAT-domain-containing transla- as found in metazoan eIF4G (Fig. 2c, f; reviewed in
tion factor, which evolved by gene duplication of pri- Singh et al. 2011). Because the W2 domains in these
mordial cap-binding protein (Fig. 3a–b). The extant proteins are distinct from the HEAT3 of CBP80,
form of eIF4G was assumed to have evolved when it lacking the last HEAT repeat found in the latter,
acquired a long N-terminal unstructured extension these W2 domains are likely to be derived from the
containing the binding sites for eIF4E and PABP archaic eIF4G with the three HEAT domains.
(Fig. 3c; see Fig. 2a–b for eIF4E and PABP binding Because NAT1 family is limited to Metazoa and is
sites). The diversity of eIF4G structure in eukaryotes is more similar to eIF4G proteins therein than those from
Evolution of Eukaryotic Translation Initiation Factors 685 E
(a) Eukaryotic H1 H2 H3 Primordial cap binding protein
ancestor

CBP80 (Nucleus) eIF4G (Cytoplasm)


(b) H1 H2 H3 H1 H2 H3
(b-c)
Formation of
Nucleus
(c) H1 H2 H3 H1 H2 H3 (+elF4E)

5MP
(d) elF2- H1 H2 H3 H2 H3 H1 H2 H3
binding E
proteins (g) Plants
H1 H2
Amoebozoa
H1 H2 H3 H2 H3 H1 H2 H3

(e) LECA H1 H2 (h) Other fungi


SNT AT H3 NTD Zn H3

elF2Be elF5 H1 (i) Yeasts

H1 H2 H3 H1 H2 H3
H2 H3
(f) Metazoa
except H1 H2 H3
Nematodes SNT AT H3 NTD Zn H3
NAT1

Evolution of Eukaryotic Translation Initiation Factors, versions by gene duplication. (c) The cytoplasmic version
Fig. 3 The model of the evolution of HEAT domain-containing obtained N-terminal extension to bind eIF4E and became
translation initiation factors and regulators. The proposed eIF4G. (d) eIF5-mimic protein (5MP) evolved by truncation of
course of evolution of HEAT domain-containing proteins, the second eIF4G copy and became eIF2-binding translational
starting with the primordial cap-binding protein with three regulator. (e) eIF5 and eIF2Be acquired the W2 domains from
HEAT domains (three boxes representing HEAT1, HEAT2, 5MP in the last eukaryotic common ancestor (LECA). (f) NAT1
and HEAT3), is depicted in each hypothetical step of eukaryotic evolved from eIF4G more recently in metazoa (f). Because
evolution. (a) The early eukaryotic ancestor without a nucleus eIF4G has transferred the function carried by W2 to 5MP, this
contained a single cap-binding protein. (b) Formation of the domain of eIF4G tends to be lost during the course of eukaryotic
nucleus allowed the evolution of nuclear and cytoplasmic evolution (g–i)

outside (Marintchev and Wagner 2005), this protein The emergence of eIF5 and eIF2Be is understood as
evolved in the ancestor of Metazoa (Fig. 3f). On the the gene fusion of a duplicated copy of aIF2b (eIF1/2/5
other hand, nearly universal distribution of 5MP with fold) and the ancestor of the unique protein eIF2Bg
MA3 and W2 domains in eukaryotes including the (see below), respectively, with the C-terminal portion
primitive species Giardia lamblia (Marintchev and of 5MP (Figs. 1c and 3d–e).
Wagner 2005) argues for its ancient origin and sug-
gests that this was the evolutionary precursor for the Evolution of eIF4A and eIF2B
W2 domains of eIF5 and eIF2Be (Fig. 3d–e). Impor- Proteins with the DExD/H-box are ATP-dependent
tantly, this protein lacks the MIF4G domain essential RNA helicases, which have diversified to carry out
for eIF4A binding. Instead, this has the W2 domain, a variety of biological processes in eukaryotes (Tanner
which serves as the eIF2-binding site for all the extant and Linder 2001). These include transcription, ribo-
W2 domains except the eIF4G-W2 domain (reviewed some biogenesis, RNA processing and export, and
in Singh et al. 2011). Thus, the original function of translation initiation and termination. eIF4A is the
5MP might have been to bind eIF2 and regulate its major DExD/H-box RNA helicase involved in transla-
guanine nucleotide binding, similar to the GDP disso- tion initiation. Although eIF4A requires the associa-
ciation inhibition activity found for yeast eIF5-W2 and tion with eIF4G for its activity, some of the DExD/H
the adjacent linker peptide (Jennings and Pavitt 2010). helicases, such as yeast Ded1p and human DHX29 are
E 686 Evolution of Eukaryotic Translation Initiation Factors

involved in translation initiation, but do not require Unstructured Segments of eIFs Defining Functions
or bind eIF4G for their activities. Interestingly, archaea Unique to Eukaryotic Initiation
contain a prototypical member of this family, Similar to other eukaryotic proteins, more than half, on
MjDEAD (named in M. jannaschii). The original func- average, of the entire length of each eIF polypeptide is
tion of MjDEAD in archaeal biology is unknown. unfolded, yet carrying important functions. Examples
About half of the sequenced archaeal genomes include the N-terminal segment of eIF4G containing
encode a protein termed aIF2B more similar to PABP- and eIF4E-binding sites and the N-terminal
eIF2Ba/b/d than distinct archaeal enzymes with similar segment of eIF2b carrying conserved lysine-rich
folds (RBPI, ribose-1,5-bisphoshate isomerase, and segments (K-boxes) mediating eIF2 binding to eIF5
MTNA, methylthioribose-1-phosphate isomerase). and eIF2Be (▶ Eukaryotic Translation Initiation Fac-
aIF2B copurifies with aIF2a and a sugar-phosphate tor Interactions). eIF1A, the universally conserved
nucleotidyltransferase-like protein (SP-NUCT) from factor, also carries eukaryote-specific N-terminal and
T. kodakaraensis (Dev et al. 2009). Interestingly, bioin- C-terminal tails, which play important roles in regulat-
formatics analysis indicates that eIF2Bg/e has motifs ing ribosome conformation in response to AUG selec-
found in SP-NUCT. Because the SP-NUCT does not tion (Saini et al. 2010). Thus, the additional proteins or
carry the W2 domain essential for GEF catalysis, protein segments found in eukaryotic initiation sys-
this archaeal protein would rather be homologous to tems confer mRNA recruitment by unique terminal
non-catalytic subunit eIF2Bg. These results suggest modifications (m7G-cap and polyA tail), the accuracy
that aIF2B and the SP-NUCT originally evolved as in start codon selection and (via phospho-eIF2-
eIF2-binding proteins, which might regulate eIF2 in regulated eIF2:eIF2B interaction) the translational
response to sugar or nucleotide binding. The extant regulation unique to this domain of life.
eIF2B is hypothesized to have arisen by gene triplication
and duplication, respectively, of aIF2B and SP-NUCT,
heteropentamer formation by their gene products and Cross-References
a subsequent gene fusion between a SP-NUCT copy and
5MP, generating the catalytic subunit eIF2Be (Fig. 1d). ▶ Eukaryotic Translation Initiation Factor Interactions
▶ Evolution of Elongation Factor 1 Alpha
Evolution of eIF3 and RRM-containing Factors ▶ Translation Elongation
Human eIF3 has 13 distinct subunits (a-m), of which ▶ Translation Initiation
six of them contain a PCI domain whereas two of them ▶ Translational Control of GCN4
contain a MPN domain (Zhou et al. 2008). Six PCI
proteins and two MPN proteins are also found in
the 26S proteasome lid and the COP9 signalosome
deNEDDylase complexes (yellow box in Fig. 1). References
Cryo-EM analysis of the 26S proteasome lid suggests
Conte MR, Kelly G, Babon J, Sanfelice D, Youell J, Smerdon SJ,
that the PCI and MPN proteins form a ring structure. Proud CG (2006) Structure of the eukaryotic initiation factor
Thus, it appears that these motifs were used to build 5 reveals a fold common to several translation factors. Bio-
a large scaffold. Besides these subunits, eIF3 contains chemistry 45:4550–4558
Dev K, Santangelo TJ, Rothenburg S, Neculai D, Dey M, Sicheri F,
a WD40 subunit (eIF3i) (WD40 family is uniquely
Dever TE, Reeve JN, Hinnebusch AG (2009) Archaeal
eukaryotic and forms a b-propeller structure), aIF2B interats with eukaryotic translation initiation factors
two RRM subunits (eIF3b and g) and two other sub- eIF2alpha and eIF2Balpha: implications for aIF2B function
units (eIF3d and j) unique to eIF3. As expected for and eIF2B regulation. J Mol Biol 392:701–722
Ito T, Marintchev A, Wagner G (2004) Solution structure of
a reaction involving multiple mRNA binding steps,
human initiation factor eIF2alpha reveals homology to the
additional factors eIF4B and eIF4H contain an RRM. elongation factor eEF1B. Structure 12:1693–1704
A structural homology between CBP20 (see above), Jennings MD, Pavitt GD (2010) eIF5 has GDI activity necessary
the cap-binding subunit of CBC, and eIF4H has been for translational control by eIF2 phosphorylation. Nature
465:378–381
suggested (Marintchev and Wagner 2005), but its sig-
Kyrpides NC, Woese CR (1998) Universally conserved
nificance in the mechanism of translation initiation is translation initiation factors. Proc Natl Acad Sci USA
unknown. 95:224–228
Evolution of Metabolism, Amino Acid Biosynthesis Pathways 687 E
Lomakin IB, Shirokikh NE, Yusupov MM, Helln CUT, Pestova and use free energy to perform a wide diversity of
TV (2006) The fidelity of translation initiation: reciprocal functions. This process couples exergonic reactions
activities of eIF1, IF3 and YciH. EMBO J 25:196–210
Marintchev A, Wagner G (2005) eIF4G and CBP80 share of nutrient oxidation to the endergonic processes
a common origin and similar domain organization: implica- required to maintain the vital state of the cell, such as
tions for the structure and function of eIF4G. Biochemistry mechanical work, active transport, and biosynthesis
44:12265–12272 of molecules. Therefore, the origin and evolution of
Saini AK, Nanda JS, Lorsch JR, Hinnebusch AG (2010) Regula-
tory elements in eIF1A control the fidelity of start codon enzyme-driven metabolism are based on the idea of
selection by modulating tRNAiMet binding to the ribosome. gene duplication, followed by divergence. In this
Genes Dev 24:97–110 regard, two main hypotheses have been proposed to
Schmitt E, Blanquet S, Mechulam Y (2002) The large subunit of explain how the metabolic pathways actually origi-
initiation factor aIF2 is a close structural homologue of
nated: The ▶ retrograde hypothesis suggests that, in E
elongation factors. EMBO J 21:1821–1832
Singh CR, Watanabe R, Zhou D, Jennings MD, Fukao A, Lee the case where a substrate tends to be depleted, gene
B-J, Ikeda Y, Chiorini JA, Fujiwara T, Pavitt GD et al duplication can provide an enzyme capable of supply-
(2011) Mechanisms of translational regulation by a human ing the exhausted substrate, giving rise to homologous
eIF5-mimic protein. Nucl Acids Res 39:8314–8328
Tanner NK, Linder P (2001) DExD/H Box RNA helicases: from enzymes catalyzing consecutive reactions. In other
generic motors to specific dissociation functions. Mol Cell words, if a compound A was essential for the survival
8:251–262 of primordial cells, when A became depleted from the
Theobald DL, Mitton-Fry RM, Wuttke DS (2003) Nucleic acid primitive soup, this should have imposed a selective
recognition by OB-fold proteins. Annu Rev Biophys Biomol
Struct 32:115–133 pressure allowing the survival and reproduction of
Yatime L, Mechulam Y, Blanquet S, Schmitt E (2007) Structure those cells that were become able to perform the trans-
of an archaeal heterotrimeric initiation factor 2 reveals formation of a chemically related compound “B” into
a nucleotide state between the GTP and the GDP states. “A” catalyzed by enzyme “a” that would have led to
Proc Natl Acad Sci USA 104:18445–18450
Zhou M, Sandercock AM, Fraser CS, Ridlova G, Stephens E, a simple, one-step pathway. On the other hand, the
Schenauer MR, Yokoi-Fong T, Barsky D, Leary JA, ▶ patchwork hypothesis postulates that duplication of
Hershey JWB et al (2008) Mass spectrometry reveals genes encoding primitive and promiscuous enzymes
modularity and a complete subunit interaction map of the (capable of catalyzing various reactions) allows each
eukaryotic translation factor eIF3. Proc Natl Acad Sci USA
105:18139–18144 descendant enzyme to specialize in one of the ancestral
reactions. These hypotheses could explain the actual
anatomy of the metabolical pathways. Recently,
another mechanism associated to the metabolical
growth suggests the existence of ▶ alternative routes
Evolution of Metabolism, Amino Acid or alternologs, defined as branches or enzymes that,
Biosynthesis Pathways proceeding via different metabolites, converge in
a common end product. These alternative branches con-
Georgina Hernández-Montes1, Dagoberto Armenta- tribute to genetic buffering similar to gene duplication. It
Medina2 and Ernesto Pérez-Rueda2 has been proposed that the origin of alternative branches
1
Departamento de Medicina Molecular y Bioprocesos, is closely related to different environmental metabolite
Universidad Nacional Autónoma de México, sources and life styles among species (Conant and
Cuernavaca, Morelos, México Wagner 2003; Hernandez-Montes et al. 2008; Horowitz
2
Departamento de Ingenierı́a Celular y Biocatálisis, 1945).
Instituto de Biotecnologı́a, Universidad Nacional
Autónoma de México, Cuernavaca, Morelos, México
Characteristics

Definition The study of the evolution of metabolism is central


to understanding the adaptive processes of cellular
Metabolical Networks and Evolution Models life, the emergence of high levels of organization
In classical terms, metabolism has been defined as (multicellularity), the diversity of cellular organization
a global process through which living systems acquire in the three major domains of life, Archaea, Bacteria,
E 688 Evolution of Metabolism, Amino Acid Biosynthesis Pathways

and Eucarya and the complexity of the living metabolic networks exhibit a high retention of dupli-
world (Caetano-Anolles et al. 2007). At present the cates within functional modules, and a preferential
large-scale information derived from genomics and biochemical coupling of reactions. This retention of
proteomics studies has allowed the development of duplicates may result from the biochemical rules
databases devoted to understand the metabolic pro- governing substrate-enzyme-product relationships.
cesses, such as KEGG and MetaCyc. The information The retention of duplicates between chemically dis-
contained in these databases can be analyzed from similar reactions is, however, also greater than
a network perspective (Diaz-Mejia et al. 2007) giving expected by chance as suggested Diaz-Mejia et al.
a more integrative view of cellular functioning and for (2007). Finally, a significant retention of duplicates
the global understanding of genomes properties and as groups, instead of single pairs, has been also
dynamics. In this regard, cell metabolism can be con- reported. In brief, the duplication events were evalu-
sidered as one of the most ancient biological networks, ated among modules using a hierarchical clustering
where the nodes represent substrates and/or enzymes algorithm. A paired measure of evolutionary distance
and the edges represent the relationships among them (ED) for all-against-all metabolic pathways was then
(Fig. 1). From this perspective, the study of metabolic calculated. In brief, the ED is a measure of the reten-
networks has focused on describing topological prop- tion of duplicates between pathways within and
erties such as the existence of functional modules, between modules, and is calculated as follows:
giving special relevance to the clustering and motif (ED) ¼ A0 /(A0 + AB), where A0 is the number of
formation, and showing the existence of similar attri- enzymes of the smaller pathway (pA) without homologs
butes to the small-world and scale-free networks in the second pathway (pB). AB is the number of
(Barabasi and Oltvai 2004). A small-world network is enzymes of pA with homologs in pB. At one extreme,
a type of graph in which most nodes are not neighbors when all the enzymes of pA have homologs in pB, the
of one another, but most nodes can be reached from evolutionary distance converges on 0. In contrast,
every other by a small number of steps, and where the when the two pathways share no homologs, the value
time required for a spreading of perturbation is close to of evolutionary distance converges on 1 (Diaz-Mejia
the theoretically possible minimum for a graph with et al. 2007). This analysis shows that metabolic path-
the same number of nodes and edges. This property ways of the same module tend to have a lower ED,
allows metabolism to react faster to perturbations. In suggesting a greater retention of duplicates within
addition, the scale freeness property (few nodes called modules than among them. Based on these data, the
“hubs,” having many more connections than the most capability of metabolic networks to grow modularly by
of others nodes in the network) is related with the gene duplication may be a consequence of two factors:
robustness of a metabolic networks, where the fraction the closeness between reactions, and the kind of sub-
of nodes with low connectivity in a small-world net- strate(s) participating within each module (Fig. 1b).
work is much higher than the fraction of hubs, in
consequence the probability of deleting an important Retention of Duplicates as Groups and Single
node is very low. Another relevant feature of metabolic Entities
networks is its modularity, where each module is It was observed that a high frequency of duplicates
a discrete entity of elementary components (enzymes were retained as groups (those defined here as pairs
and/or substrates) that perform a certain task, separable of consecutive reactions), instead of single entities.
from the functions of other modules. The elements of Based on this approach, it has been determined that
each module are similar to each other and may have enzymes catalyzing chemical similar reactions (CSRs,
been subjected to the same evolutionary process, such or those reactions that share a common enzymatic E.C.
as the amino acid biosynthesis pathways and lipid and number (preferentially the first two digits). Therefore,
nucleotide metabolisms (Fig. 1a). despite having similar reaction chemistry, enzymes
could exhibit different substrate specificities) in the
Metabolic Evolution by Recruitment of Modules fatty-acid metabolism have been originated by gene
Based on a network-based approach, an increased duplication (Diaz-Mejia et al. 2007). Thus, an ances-
retention of duplicates for enzymes catalyzing consec- tral pathway catalyzed both fatty-acid degradation and
utive reactions has been identified. As a consequence, biosynthesis could have originated in a first step.
Evolution of Metabolism, Amino Acid Biosynthesis Pathways 689 E
a

b ED
1.00
0.67
0.33
0.00

Evolution of Metabolism, Amino Acid Biosynthesis Pathways, Fig. 1 (continued)


E 690 Evolution of Metabolism, Amino Acid Biosynthesis Pathways

Evolution of Metabolism, Amino Acid Biosynthesis Path- Modules comprising related branches are indicated by color as
ways, Fig. 1 The metabolism can be analyzed as a network, in (a). A value of ED closer to zero (the darker squares) implies
where enzymes are nodes and substrates are edges, allowing the a greater retention of duplicates between the two given path-
evaluation of the influence of network modularity on the reten- ways. From this tree the amino acid pathways (c) exhibit a large
tion of duplicates: (a) A hierarchical clustering showing the fraction of duplicated events. Posterior analyses showed that the
modules in metabolic networks (in colors) (b) Metabolic path- amino acid biosynthetic network exhibits diverse alternative
ways (branches in the trees) within and across modules were pathways, defined as alternative or alternolog, whereas ancient
compared using a measure of evolutionary distance (ED). pathways are more connected among them
Evolution of Metabolism, Amino Acid Biosynthesis Pathways 691 E
The direction of this ancestral pathway would be of increasing genome size, protein structural complexity,
dependent on the acyl carriers and fatty acids available, and selective pressures in changing environments. In the
evidencing the existence of two different pathways. evolution of amino acid biosynthesis, for instance, diverse
Diaz-Mejia et al. and Light et al. (Diaz-Mejia et al. studies have suggested that they could be among the
2007; Light et al. 2005) using similar approaches earliest metabolic compounds and that their growth has
evaluated the generality of this observation, using an been affected by gene duplication. Based on this analysis,
all-against-all comparison of the enzymes catalyzing a core of 11 pathways that probably occurred in the last
consecutive CSRs. They identified that around 15% of common ancestor was identified using a combination of
enzymes have at least one homolog in a metabolic path- genomic tools with a network approach (Hernandez-
way. Of these, two thirds are retained as isolated dupli- Montes et al. 2008). Figure 1c. One of the most important
cates and a third is retained as groups, for instance, the results from this analysis is that diverse enzymes synthe- E
amino acid metabolic pathways. Alternatively, an esti- sizing 16 out of the 20 standard amino acids were identi-
mation of retention of duplicates in metabolism using an fied in the three cellular domains. It is important to
enzyme-centric network approach showed significant highlight that the full biosynthetic routes for these 16
results highlight the influence of both distances apart in amino acids, such as tryptophan, arginine, and proline,
the network and chemical similarity of reactions on the among others amino acid pathways appeared early in
retention of duplicates. Specifically, an increased reten- evolution. Therefore, when a network approach is used
tion of duplicates between consecutive reactions was to identify functional relationships among all pathways
observed and showed that this bias can be partially (Fig. 1c), the universal branches are connected to each
attributed to the preferential biochemical coupling of other, allowing the possibility that they feedback and
reactions. In addition, these analyses show a significant they complete a minimal set of enzyme-driven reaction
retention of duplicates acting on both CSRs and chemical for the biosynthesis of amino acids. In addition, path-
different reactions (CDRs, defined by the mismatch in ways, as lysine, methionine, and valine, among others,
the first two digits of the sequence enzymatic number. are partially distributed across the three domains of life,
Therefore, despite having different reaction chemistry, with branches identified in the LCA. This data suggest
enzymes could have similar substrate specificities), that partial branches may be intimately connected to the
supporting the idea that gene duplication is important main branches identified as the core, previously
in generating innovations as well as metabolic variants. described. In addition, in the same analysis, the authors
A synergy between closeness in the network and chem- identified the contribution of paralogs (duplicated genes
ical similarity between reactions explains the high reten- within an organism to occupy two different positions in
tion of duplicates between consecutive CSRs, suggesting the same genome, then copies evolve new functions,
that duplicates are significantly retained as groups that even if these are related to the original one), analogs
can be extended to several series of reactions. (genes that perform the same function but their
sequences do not have a common evolutionary history
New Insights on the Evolution of Metabolic and therefore come from independent origins) and
Pathways alternologs (see Definition section) in the evolution of
The modern perspective of metabolic processes has biosynthesis of amino acids. Eleven out of the 20 amino
shown that evolutionary studies must include not only acid biosynthetic routes revealed an important contribu-
phylogenetic relationships among enzymes, but also the tion of paralogs to the generation of diversity, 8 out of the
influence of some topological properties of metabolic 20 amino acids routes show contribution of analog
networks. In evolutionary terms, one can assume that the enzymes, while alternologs routes participate in 9.
universal occurrence of some pathways and branches in
modern species would suggest that they existed in the Concluding Remarks
last common ancestor (LCA). The evolution of these It is well accepted that biological networks exhibit sim-
pathways and the emergence of diverse homologues ilar topological properties, such as their scale-free behav-
reflect an increased metabolic diversity as a consequence ior and hierarchical modularity. Based on recent studies,
E 692 Evolution of Regulatory Circuits

diverse authors have suggested that the origin and evo-


lution of networks must consider not only the topological Evolution of Regulatory Circuits
properties they share but also those that differentiate
them (Artzy-Randrup et al. 2004). In this regard, the Xiaojuan Sun and Jinzhi Lei
modeling of metabolic networks by including the pref- Zhou Pei-Yuan Center for Applied Mathematics,
erential biochemical coupling of reactions explains more Tsinghua University of Beijing, Beijing, China
appropriately how metabolism evolves. A more detailed
analysis looking at other functional constraints, such
as metabolite similarity and binding versus catalytic
Definition
enzyme properties, as well as massive gene duplications
Evolution is the genetic change in a population of species
and horizontal gene transfer, could increase our under-
over generations. The certain changes that aid survival
standing of the influence of metabolic versatility in the
evolution of species. In this regard, Shuster et al. and reproduction will be selected in successive genera-
tions of population. Evolution with environmental
(Schuster et al. 2008) showed coevolution of increasing
changes contributes ▶ adaptation, a process in which
specific group-transfer metabolite coenzymes with their
specific enzymes, in a group-transfer network, selected an organism becomes better suited to its habitat. The
gradual modification of transcription circuits over evo-
for growth rate, suggesting a coupling between enzyme
lutionary time scale is an important source of diversity of
evolution and cell level fitness subject to reasonable
environmental conditions. Finally, genetic and genomic life (Tuch et al. 2008).
Evolution of regulatory circuits occurs through mod-
data have revealed that genetic dynamics, such as hori-
ifications either in transcription factors or in promoter
zontal gene transfer and gene duplications, in all organ-
isms has also influenced the evolution of metabolic structure. In the evolution of bacterial regulator circuits,
for example, there are four classes of changes (Perez and
processes, and given evidence that older enzymes are
Groisman 2009): (1) transcriptional rewiring whereby
more highly connected (duplicated and diverged enzyme
may preserve past reagents). the promoters of orthologous genes in related species
differ in the presence or absence of a binding site(s) for
a conserve transcription factor(s), (2) embedding of hor-
References izontally acquired genes under regulation of an ancestral
transcription factor, (3) restructuring of the promoters
Artzy-Randrup Y, Fleishman SJ, Ben-Tal N, Stone L (2004)
Comment on “network motifs: simple building blocks of controlled by a transcription factor, and (4) modifications
complex networks” and “superfamilies of evolved and in the transcription factors.
designed networks”. Science 305:1107; author reply 1107
Barabasi AL, Oltvai ZN (2004) Network biology: understanding
the cell’s functional organization. Nat Rev Genet 5:101–113 References
Caetano-Anolles G, Kim HS, Mittenthal JE (2007) The origin of
modern metabolic networks inferred from phylogenomic
Perez JC, Groisman EA (2009) Evolution of transcriptional
analysis of protein architecture. Proc Natl Acad Sci USA
regulatory circuits in bacteria. Cell 138:233–244
104:9358–9363
Tuch BB, Li H, Johnson AD (2008) Evolution of eukaryotic
Conant GC, Wagner A (2003) Asymmetric sequence divergence
transcription circuits. Science 319:1797–1799
of duplicate genes. Genome Res 13:2052–2058
Diaz-Mejia JJ, Perez-Rueda E, Segovia L (2007) A network
perspective on the evolution of metabolism by gene duplica-
tion. Genome Biol 8:R26 Evolution Programming
Hernandez-Montes G, Diaz-Mejia JJ, Perez-Rueda E,
Segovia L (2008) The hidden universal distribution of
amino acid biosynthetic networks: a genomic perspective Feng-Sheng Wang and Li-Hsunan Chen
on their origins and evolution. Genome Biol 9:R95 Department of Chemical Engineering, National Chung
Horowitz NH (1945) On the evolution of biochemical syntheses. Cheng University, Chiayi, Taiwan
Proc Natl Acad Sci USA 31:153–157
Light S, Kraulis P, Elofsson A (2005) Preferential attachment in
the evolution of metabolic networks. BMC Genomics 6:159
Schuster S, Pfeiffer T, Fell DA (2008) Is maximization of molar
Synonyms
yield in metabolic networks favoured by evolution? J Theor
Biol 252:497–504 Evolutionary algorithm; Evolutionary computation
Evolutionary Algorithm, Transcription Regulatory Network Construction 693 E
Definition
Evolution Programs
Evolution programming is a population-based
stochastic optimization technique that is used ▶ Optimization and Parameter Estimation, Genetic
in some mechanisms in artificial intelligence Algorithms
by simulated biological evolution to determine
a global optimal solution. The algorithm employs
real-coded variables and, in its original form,
it relied on mutation as the search operator. It Evolutionary Algorithm
starts with a set of finite state machines (FSMs)
FSM  fFSMt jFSMt ðInputÞ  Outputt g and a fit- ▶ Evolution Programming E
ness function to evaluate how well each finite ▶ Genetic Algorithms
state machine survives. The evolution drops the
FSMs with lower grades in fitness function and
replaces them with new FSMs. The new FSMs are
generated from the survivors with small differ- Evolutionary Algorithm, Transcription
ences, or called mutation. The evolution repeats Regulatory Network Construction
until a FSM with required grades in fitness func-
tion is found or the limit number of iterations is Xiujun Zhang
reached. Institute of System Biology, Shanghai University,
Shanghai, China

Characteristics
Synonyms
Pseudocode :

t=0 Genetic algorithm


Initpopulation FSM(0) // initialize random population of finite state machines
Evaluate FitFSM(0) // evaluate fitness of each FSM
Do
FSM(t+1)=survive(FSM(t)), mutate(survive(FSM(t)))
Definition
// keep the FSMs with good fitness
// generate new FSMs from the survivors with some Evolutionary algorithms (EAs) are a group of newly
perturb
t=t+1
developed heuristic optimization algorithms, which
While maximum iterations or required fitness is not attained are inspired by natural selection and evolution in
biological systems. EAs differ from other traditional
Example. Evolutionary programming has been optimization techniques, and they aim to find the
applied to determine network models of gene regula- optimal solution from a “population” of solutions
tory networks using synthetic and gene expression data rather than from a single initial solution so that EAs
from DNA microarrays (Alina et al. 2010; Martin et al. can avoid being trapped into the local optimal
2005). solutions to some extent.
In general, an EA consists of four operations,
including reproduction, mutation, recombination, and
References selection, and all these operations are repeated until the
algorithm converges to a certain point with some
Alina S, Heather JR, Martin C (2010) Comparison of evolution- criteria satisfied. In an EA, each candidate solution is
ary algorithms in gene regulatory network model inference. represented as a chromosome. In each step of EA, the
BMC Bioinformatics 11(59):1471–2105 chromosomes compete against each other and those
Martin S, Thomas H, Werner D, Johannes M, Niall P (2005)
Reverse-engineering gene-regulatory networks using evolu-
representing poor solutions will be kicked out before
tionary algorithms and grid computing. J Clin Monit Com next step. To evaluate the chromosomes, a fitness func-
19(4):329–337 tion is defined as the objective function. The solutions
E 694 Evolutionary Algorithm, Transcription Regulatory Network Construction

with high “fitness” are treated as good solutions, and In transcription regulatory network construction, the
will be “reproduced” from last step and “recombined” fitness function can be the difference between the
with other solutions by swapping parts of the chromo- observed gene expression data and the calculated
some. Furthermore, the chromosomes are “mutated” expression data. Another possible fitness function is
with small changes made to the elements of the the Akaike’s information criterion (AIC) (▶ Akaike’s
chromosomes. Some new chromosomes are generated Information Criterion (AIC)) which is a measure of the
through recombination and mutation operations. Only goodness of fitting an estimated statistical model with
those chromosomes with high fitness will enter next the data and can be used as the fitness function in
step. evolutionary algorithm–based transcription regulatory
Evolutionary algorithms have been successfully network construction (Shin and Iba 2003).
applied to various fields, such as pattern recognition,
biomedical engineering, and transcription regulatory The General Steps of EA
network construction. Especially, evolutionary algo- The general steps of evolutionary algorithm are listed
rithms can be used to calculate the ligand-receptor below:
binding affinities (▶ Binding Affinity) (Morris et al. 1. Initialize a population of chromosomes.
1998). The transcription regulations can be represented 2. Apply recombination and mutation to generate new
as a network, where nodes represent proteins or genes chromosomes.
and edges represent direct regulatory interactions (e.g., 3. Evaluate each chromosome according to the fitness
the binding of a transcription factor to the promoter of function.
its target genes). The transcription regulatory 4. Select chromosomes to reproduce and replace the
network (TRN) describes the TF-gene interactions as old chromosomes.
a graph or network, and gene expression is regarded as 5. Repeat steps 2–4, until the specified numbers of
a function of regulatory inputs specified by interactions generations are completed or special condition is
between proteins and DNA (Blais and Dynlacht satisfied.
2005). Transcription regulation has influence on most
physiochemical activities in a cell (Thomas 1998; An Example
Albert and Barabási 2002). Therefore, it is important Here, an example is used to explain the workflow to
to construct transcription regulatory network. Com- infer the transcription regulatory network by
pared with other methods for constructing transcrip- employing evolutionary algorithm. Suppose there is
tion regulatory networks, the evolutionary algorithms a microarray dataset with n genes measured at m time
are able to infer smaller regulatory networks with good points. The expression of genes at time point t + 1
sensitivity and precision. However, much more com- should be the regulatory result of the expression of
putation power is required to infer a large regulatory regulators at time point t. The regulatory relationship
network from the experimental data. of every gene with i ¼ 1, 2,  , n and t ¼ 1, 2,  , m
can be expressed as:

Characteristics a1i x1t þ a2i x2t þ    þ ani xnt ¼ xitþ1 (1)

The initial population of chromosomes that represent where aji is the regulatory strength of gene j on gene i, xjt
the regulatory relationships between genes can be is the expression data of gene j at time point t, and xit+1 is
produced randomly or according to user-defined the expression data of gene i at time point t + 1.
restriction. Sequentially, these chromosomes can be Since it is not known which genes regulate gene i, all
interpolated by recombination and mutation. Then the genes are possible candidate regulators for gene i.
fitness function (▶ Fitness Function) is employed to The regulatory strength aji can tell which genes regulate
evaluate these recombined and mutated chromosomes gene i, and those genes with aji 6¼ 0 are possible
and only those chromosomes with high fitness are regulators. Since there are usually only a few measure-
selected to reproduce the next generation population. ments (i.e., m) but with many variables (i.e., n) in the
One important issue in EA is the design of the fitness model, it is not easy to solve the problem. An initial
function which is the criteria for evolution. population of N solutions can be randomly generated.
Evolvability 695 E
The difference between the true expression data and the the quality of the solution. In order to identify better
calculated data is regarded as a fitness function which and better solutions, EAs apply some operators
can be used to evaluate the chromosomes. For each inspired by evolution of living organisms (mutation,
chromosome k, its fitness is defined as follows: recombination, and selection) to the population of
individuals.
 
Fn ¼ min xit  xk  (2) There are different types of EAs, such as Genetic
it
1kN
Algorithms, Evolutionary Programming and Evolution
Strategies, which differ mainly by the encoding of the
individuals and the operators used during the
evolution.
Cross-References E

▶ Akaike’s Information Criterion (AIC) References


▶ Binding Affinity
▶ Fitness Function Eiben AE, Smith JE (2003) Introduction to evolutionary
computing (natural computing series). Springer, Berlin

References

Albert R, Barabási AL (2002) Statistical mechanics of complex Evolutionary Capacity


networks. Rev Mod Phys 74(1):47–97
Blais A, Dynlacht BD (2005) Constructing transcriptional regu-
latory networks. Genes Dev 19:1499–1511 ▶ Evolvability, Generalized Biology
Morris GM, Goodsell DS, Halliday RS, Huey R, Hart WE,
Belew RK, Olson AJ (1998) Automated docking using
a Lamarckian genetic algorithm and an empirical binding
free energy function. J Comput Chem 19:1639–1662
Shin A, Iba H (2003) Construction of genetic network using Evolutionary Computation
evolutionary algorithm and combined fitness function.
Genome Inform 14:94–103 ▶ Evolution Programming
Thomas R (1998) Laws for the dynamics of regulatory networks.
Int J Dev Biol 42:479–485

Evolvability

Evolutionary Algorithms Marie I. Kaiser and Andreas H€uttemann


Department of Philosophy, University of Cologne,
Ettore Mosca, Ivan Merelli and Luciano Milanesi Cologne, Germany
Institute for Biomedical Technologies – CNR (Consiglio
Nazionale delle Ricerche), Segrate, Milan, Italy
Definition

Definition Two definitions of evolvability can be distinguished


(Love 2003): (1) EvolvabilityU is understood as the
In artificial intelligence, evolutionary algorithms (EA) ability of a population to respond to natural selection.
are generic population-based metaheuristic optimiza- This understanding of evolvability is especially found in
tion algorithms (Eiben and Smith 2003). In the context quantitative genetics. (2) A more restricted definition of
of EAs, each candidate solution of an optimization evolvabilityR was introduced to explain differential evo-
problem plays the role of an individual of a finite lutionary success. Contrary to the notion of fitness,
population of individuals, and is associated to a cost evolvabilityR is an exclusively population-level trait.
function (designated as fitness function) which defines EvolvabilityU is the capacity of a population for
E 696 Evolvability, Generalized Biology

evolutionary change and by this a function of the Characteristics


amount of variation available on which natural selection
can act. In biology and other realms, evolvability is a concept
applying to any Darwinian evolutionary system
(Wagner & Altenberg 1996). Here, “Darwinian evolu-
Cross-References tionary system” can be understood axiomatically (e.g.,
Nehaniv 2005): A (generalized) Darwinian evolution-
▶ Disposition ary system is taken to be any dynamical system in
which a population of individuals (a term that will
not be defined here, cf. (Michod 1999)) is character-
ized as satisfying certain axioms: individuals give rise
References
somehow to other individuals in a manner exhibiting
Love A (2003) Evolvability, dispositions, and intrinsicality. (1) heredity, (2) variability, (3) differential reproduc-
Philos Sci 70:1015–1027 tive success (depending at least in part on inherited
characteristics), and (4) finite resources and a turnover
of generations. The classical reasoning of Darwin and
Wallace (1858) then shows that in such a setting there
always is a struggle for existence that can lead to
Evolvability, Generalized Biology increasingly fit individuals.
However, many instances of Darwinian evolution
Chrystopher L. Nehaniv can be studied or constructed and these can vary
Royal Society/Wolfson Bio Computation Research greatly in their capacity to produce fitter individuals.
Laboratory, School of Computer Science, University Instances where the concept of evolvability can be
of Hertfordshire, Hatfield, UK applied include evolving biological populations on
earth; populations of computer programming as
in genetic programming; candidate solutions to opti-
Synonyms mization problems encoded as binary strings as
in typical genetic algorithms; or as vectors of real
Capacity to evolve; Evolutionary capacity numbers in evolution strategies; or populations of liv-
ing or life-like entities elsewhere in the universe
(cf. astrobiology); evolving populations of machines,
Definition RNA molecules, pharmaceuticals under design; artifi-
cial genetic regulatory networks, or forms of artificial
Evolvability in generalized biology refers to the life; or other as yet undiscovered forms of life.
capacity of a Darwinian evolutionary system to
evolve fit individuals. The evolvability of different Evolution of Evolvability
instances of Darwinian evolution can be quantified The origin and maintenance of evolvability in Darwin-
as the capacity of the evolutionary system (includ- ian evolution is a topic inviting controversy since evi-
ing mechanisms of heredity, interaction with the dently evolvability is not a property of an individual on
environment and other entities, etc.) to produce which natural selection can act. However, relating
individuals fitter than any that have yet existed evolvability to lineage selection (e.g., in any of the
(Altenberg 1994), or alternatively, fitter than any above-mentioned instances of Darwinian examples),
currently existing (Nehaniv 2005). one may attempt to treat a lineage as an abstract,
In addition, sometimes “evolvability” refers to the higher-level individual in the Darwinian evolution
capacity to evolve adaptive, complex structure often axioms. Populations whose “individuals” are (possibly
in an open-ended way in which the parameters of overlapping) lineages, lead to consideration of the
evolutionary space, such as the genotype-phenotype struggle for existence among lineages, and this may
relationship, are neither fixed nor circumscribed at lead to investigation of the evolution of evolvability in
the outset. terms of a different, higher level Darwinian dynamic.
Exact Test for Independence 697 E
Limitations
In more general settings, such as in discussions of Ex Vivo Imaging
“software evolution” or the “evolution” of culture, of
behavior, or of artifacts, one can try to analyze these ▶ Live Cell Imaging
systems in Darwinian terms and attempt to investigate
their evolvability; however, since in these realms it is
difficult to identify well-defined “individuals,” these Exact Test for Independence
concepts are challenging to apply (Nehaniv et al.
2006). Nevertheless, one can identify “descent with Larissa Stanberry
modification,” and persistence and spread analogous Bioinformatics and High-throughput Analysis
to survival and fecundity of individuals. However, Laboratory, Seattle Children’s Research Institute, E
fitness is then also difficult to define in such settings Seattle, WA, USA
when there are no clearly defined individuals, so the
capacity to produce fitter variants is similarly vague.
Definition

Cross-References Fisher’s exact test evaluates the hypothesis of indepen-


dence between two categorical random variables.
▶ Artificial Evolution
▶ Artificial Life
▶ Evolvability Characteristics
▶ Fitness
▶ Gene Regulatory Networks Fisher’s Exact Test for 2  2 Tables
▶ Genetic Algorithms Consider a 2  2 contingency table summarizing the
observed values of two random binary variables X and Y
0 1 Total
References
0 n00 n01 n00 + n01
Altenberg L (1994) The evolution of evolvability in
genetic programming. In: Kinnear KE (ed) Advances in 1 n10 n11 n10 + n11
genetic programming. MIT Press, Cambridge, MA, pp 47–74
Darwin C, Wallace A (1858) On the tendency of species to form Total: n00 + n10 n01 + n11 n
varieties; and on the perpetuation of varieties and species by
natural means of selection. J Proc Linn Soc Zool 3:45–62 If X and Y are independent, then the probability of
Michod RE (1999) Darwinian dynamics: evolutionary transi- observing any particular combination of frequencies
tions in fitness and individuality. Princeton University
Press, Princeton
given marginal totals is given by the hypergeometric
Nehaniv CL (2005) Self-replication, evolvability, and distribution:
asynchronicity in stochastic worlds. In: Lupanov OB,
Kasim-Zade OM, Chaskin AV, Steinh€ ofel K (eds) pðtÞ ¼ Pðn00 ¼ tÞ
Proceedings of the 3rd symposium on stochastic algorithms,
foundations and applications (Symposium stochastische ðt þ n01 Þ!ðn10 þ n11 Þ!ðt þ n10 Þ!ðn01 þ n11 Þ!
algorithmen, grundlagen und anwendungen) – SAGA’05, ¼
n!t!n01 !n10 !n11 !
Moscow, 20–22 Oct 2005. Lecture Notes in Computer
Science, vol 3777, Springer, 2005, pp 126–169. Corrected
version on-line at URL http://homepages.feis.herts.ac.uk/
When the marginal totals are fixed, n00 determines
comqcln/nehaniv-SAGA05-withcorrections.pdf
Nehaniv C, Hewitt H, Christianson B, Wernick P (2006) What the remaining cell counts. For 2  2 contingency
software evolution and biological evolution don’t have in tables, independence is equivalent to the unity odds
common. In: Proceedings of the 2nd international IEEE ratio, i.e.,
workshop on software evolvability (SE’06). IEEE Computer
Society Press, Philadelphia, pp 58–65
n00 n11
Wagner GP, Altenberg L (1996) Complex adaptations and the ¼1
evolution of evolvability. Evolution 50(3):967–976 n01 n10
E 698 Exon

Departure of the odds ratio from 1 represents evi- i.e., Pðjn00  Eðn00 Þj jt  Eðn00 ÞjÞ. The criteria
dence of association between the two variables. When can provide different results due to discreteness and
marginal totals are fixed, tables with larger n00 have potential skewness.
larger odds ratio providing evidence in favor of the
alternative hypothesis. The p-value of the test is given Conclusions
by the probability Pðn00 t Þ, where t∗ is the Fisher’s test of independence for 2  2 tables is
observed value of n00 (Agresti 2002). applicable for small sample sizes. The test uses exact
distributions rather than large-sample approximations.
Fisher’s Tea Tasting Experiment Note that in small samples it is not always possible
Fisher’s colleague claimed that when testing the tea, to achieve an exact significance level because the
she could distinguish the order in which milk and tea hypergeometric distribution is discrete and the p-value
were added to the cup. To test her claim, Fisher devised can assume only a few values. Consequently, the Fish-
a small tasting experiment in which four cups had milk er’s test is conservative as the true type I error would be
added first and another four cups had tea poured first. less than the pre-specified one.
The cups were presented in random order for tasting.
A colleague was asked to predict which four cups had
milk added first. The table below shows the results Cross-References
Truth/Guess Milk Tea Total
▶ Fisher’s Test
Milk 3 1 4

Tea 1 3 4 References
Total: 4 4 8
Agresti A (2002) Categorical data analysis. Wiley, Hoboken
Truly distinguishing the order of pouring as
opposed to simply guessing corresponds to testing
whether odds ratio is greater than one. Under the Exon
null hypothesis, the observed table has a probability
pð3Þ ¼ 0:229. The p-value is given by the probabil- Luiz O. F. Penalva
ity Pðn00 3Þ ¼ Pðn00 ¼ 3Þ þ Pðn00 ¼ 4Þ ¼ 0:243. Department of Cellular and Structural Biology,
Hence, there is no evidence to confirm that predic- Greehey Children’s Cancer Research Institute,
tions were more accurate than a simple guess. University of Texas Health Science Center,
The sample size, however, is rather small, so San Antonio, TX, USA
the lack of association cannot be deduced with
certainty. Fisher in reality was convinced of his
Definition
colleague’s ability.
Exons are short (50–200 bp in length) RNA sequences
Testing Two-sided Alternatives
that are found in pre-mRNAs. Exons contain the
To test for one-sided alternative, we can order the
sequences found in a mature mRNA. When a gene is
tables by their odds ratios or by their first cell fre-
transcribed, the precursor mRNA is composed of both
quency n00. The resulting p-value will be exactly the
introns and exons. After the removal of introns by the
same, irrespective of the choice of the ordering criteria.
spliceosome, the precursor mRNA matures into
In practice, the two-sided tests are a lot more
mRNA which is then exported into the cytoplasm.
common. However, when testing a two-sided alter-
This mature mRNA can now be translated into protein.
native, different criteria may result in different
p-values. To test for a two-sided alternative, one can
calculate the p-value by summing over Pðn00 Þ ¼ t Cross-References
for all t with pðtÞ  pðt Þ. Another criterion sums p(t)
for tables that are farther from the null, ▶ Post-transcriptional Regulatory Networks
Experiment 699 E
Expectation Experiment

▶ Prognosis Jutta Schickore


Department of History and Philosophy of Science,
Indiana University, Bloomington, IN, USA

Expectation-Maximization Algorithm
Definition
Kejia Xu
Institute of Systems Biology, Shanghai University, A procedure, method, or setting of devices and tools E
Shanghai, China designed to manipulate objects and study their features
and working.
Definition
Characteristics
An expectation-maximization (EM) algorithm is
a method for finding maximum likelihood estimates of
For the most part of the twentieth century, philosophy
parameters in statistical models where the available data
of science focused on scientific theories and concep-
set is incomplete. The EM algorithm consists of two steps,
tual foundations of science, with a special emphasis on
firstly guessing a probability distribution over comple-
physics. Only in the last 30 years or so, experiments
tions of missing data given the current model (known as
and biology have become themes for philosophical and
the E-step) and secondly reestimating the model parame-
historical analysis. In the following, I present some key
ters using these completions (known as the M-step).
general issues and questions for philosophy of experi-
Given a likelihood function L(y; X, Z), where y is
ment as well as a number of methodological, episte-
the parameter vector, X is the observed data, and
mological, and ethical challenges that arise specifically
Z represents missing values, the maximum likelihood
from experimenting with living, evolving things:
estimate (MLE) is determined by the marginal likelihood
What are the distinctive features and epistemic roles
of the observed data L(y; x), which is often intractable.
of experiments? What do experiments tell us about
Therefore, the EM algorithm maximizes the expectation
nature? When is an experimental result valid and infor-
of the log-likelihood function, based on the observed
mative? What ethical and sociopolitical concerns may
samples and the current estimate of y. The two steps of
arise from biological experimentation?
the algorithm are:
E-step: At the (t)th step of the iteration, where y(t) is
Analytic Descriptions of Biological Experiments
available, compute the expected value of the
and Their Epistemic Roles in Science
log-likelihood function.
One of the key tasks for philosophy of biological
experimentation is the identification of the distinctive
QðyjyðtÞÞ ¼ E½log LðyjX; ZÞjX; y ; (1)
features of biological experiments, their aims, and the
nature of experimental reasoning. Most of this work
M-step: Compute the next (t + 1)th estimate of y by
has been done in response to the kind of philosophy of
maximizing QðyjyðtÞÞ, that is,
science that dominated the early twentieth century.
Until the late twentieth century, philosophy of science
@QðyjyðtÞÞ
yðt þ 1Þ : ¼ 0; (2) was almost exclusively concerned with physics, and
@y
for the most part with physical theories, their structure,
and the dynamics of theory change. Experimental
References practice was often regarded as part of the “context of
Dempster AP, Laird NM, Rubin DB. Maximum likelihood from
discovery” and thus not amenable to philosophical
incomplete data via the EM algorithm. J Roy Stat Soc B. analysis. Also, for a good portion of the twentieth
1977;39(1):1–38. JSTOR 2984875. MR0501537. century, philosophers generally recognized only one
E 700 Experiment

epistemic role for experimental evidence: It serves as of the system itself. Following François Jacob,
test to support or reject theories or to decide between Rheinberger emphasizes that experimental systems are
competing theories. Arguably, it was the biological “machines to create the future.” Because they have
sciences that helped challenge the received view the power to produce unexpected and surprising out-
about the nature of scientific experimentation and the comes, experimental systems influence the direction of
roles of experiments in science. Historians’ and phi- research. Rheinberger’s concept “experimental system”
losophers’ increased attention to biology in the 1980s thus redirects the focus of the analysis from the role of
and 1990s highlighted the diversity of scientific prac- experiments as tests for well-developed theories to the
tices and thus the limitation of physics-based philoso- role of experimental systems as generators of new
phy. They showed that experiments play a number of knowledge. The point of Rheinberger’s conception of
epistemic roles in science. Three conceptions have experimental systems is that it explicitly acknowledges
been particularly fertile for the study of experiments: the material, technical, and institutional setting of exper-
“discovering mechanisms,” “experimental systems,” iments. The experimental system with all its technical,
and “exploratory experimentation.” instrumental, institutional, social, and epistemic aspects
The notion that biological research can be described is the site of knowledge generation. This notion has
as “discovering mechanisms” has been very influential made it easy for historians and sociologists to connect
for recent thought about experimentation. This to Rheinberger’s work, while philosophers have been
approach has been developed to characterize the nature slower to take it up.
of experimental reasoning in biology, thereby chal- The third new conception that has become fruitful
lenging the view that reasoning practices are outside for philosophy of experiment more generally is
the scope of philosophical analysis. The basic idea “exploratory experimentation” (see also “▶ Explor-
originates in the study of molecular biology. Molecular atory Experimentation”). Episodes of experimentation
biology does not have the kind of grand theory based that can be characterized as “exploratory” have been
around a set of laws or a set of mathematical models identified in the history of physics as well as biology.
that is familiar from the physical sciences. Rather, The term highlights a mode of experimentation where
general knowledge in the field takes the form of explicit theoretical principles, guidelines, or concrete
descriptions of the interaction of entities and activities theoretical expectations are absent. In other words,
in a biological system (such as mechanisms of DNA exploratory experimentation is the opposite of experi-
replication or protein synthesis). Several philosophers mentation for the purpose of testing well-developed
of biology have argued that these mechanisms – theories. To say it with Richard Burian, exploratory
specific collections of entities and their distinctive experimentation is “data driven,” not “hypothesis
activities – are the fundamental unit of scientific dis- driven” (Burian 2007). Large-scale sequencing tech-
covery and scientific explanation (Machamer et al. nologies are a striking example of exploratory
2000). Experiments serve to identify the components research: These technologies generate hundreds of
and interactions of the mechanisms. This conception results, which can be considered robust even though
has proven valuable not only in molecular biology the theoretical interpretation remains unclear.
but in a wide range of special sciences, such as In recent studies of biological experimentation, the
neuroscience. once-common view that the epistemic role of experi-
A very fruitful analytic term to describe biological ments is to be a “test” of theories has thus come under
experiments is Hans-J€ org Rheinberger’s notion “exper- attack from various angles. Of course, biological
imental system,” a term that is taken directly from experiments may function as tests, but they may also
experimental biologists’ parlance. An experimental sys- fulfill a number of other functions. They may be
tem is the working unit of both scientists and epistemol- motors for the reorientation of research activity, aids
ogists. The basic idea is that investigators never perform for discovering mechanisms, and devices for creating
isolated experiments but series of interlocking experi- robust results where no theory is involved.
mental trials, and that the major driving force for mod-
ifications and changes in the experimental settings Reality and Reliability
and for reorientations of questions and goals is not General philosophy of scientific experimentation has
a preconceived theory or hypothesis but the behavior concentrated on two issues: the reality of the objects of
Experiment 701 E
our theories and experiments and the reliability of standardized experimental objects (see “▶ Model
experimental results. The long-standing debate in phi- Organism”). They are selected because they are very
losophy of science about realism and antirealism con- simple, readily available, hardy, with short gestation
cerns the interpretations of the relation to reality of periods, and a large number of offspring, such as the
scientific theories and the entities that these theories nematode C. elegans or the fruit fly. These objects
are about, such as electrons and quarks. The philosoph- are studied as stand-ins for others. Features that have
ical question is the following: Do these things really been uncovered in detail in one model organism are
exist in the world, or are they merely theoretical con- taken to be “exemplars” for others that cannot be
structs (useful fictions, conceptual tools) that scientists easily studied.
use to explain observations and experimental results? Philosophical discussion has focused on a number
Like physics and chemistry, biological research also of issues such as the sense in which model organisms E
concerns processes, phenomena, and objects that can- are “models” (Ankeny 2001). Are they exemplars or
not be directly observed, such as genes, neurons, and abstractions, typical, ideal, homologous, or analogous
proteins. What reasons do we have for believing models? And is it really appropriate to transfer the
that the objects and phenomena of biological thought insights gained about model organisms to organisms
really exist? in their natural habitat, especially given that in some
In his influential book Representing and Interven- cases, the laboratory environment has changed the
ing, Ian Hacking makes this debate a topic for philos- model organisms currently being used to an extent
ophy of experiment. Hacking argues that the dispute that they no longer resemble organisms in the wild?
between realists and antirealists can be brought to Also, to what extent is it legitimate to transfer these
a close if we take into account that experimenters insights to other, infinitely more complex organisms,
often use experimental objects to manipulate others – especially humans, given that model organisms are
particle accelerators, for instance, utilize subatomic usually chosen for practical purposes, such as easy
particles to interfere with other particles. In these tractability and general availability? Philosophers
cases, Hacking claims, one may assume that the parti- have addressed these questions through examining
cles that are used as tools are real. Biologists base their modeling practices and the usages of the information
interpretations and explanations of biological things gained from experimenting with model organisms
and phenomena on the data and images produced by (Weber 2005, Chap. 6).
instruments in often highly complex experimental Another central task for philosophy of experimen-
settings. But like physicists, they routinely interfere tation is the assessment of the empirical information
with the research objects and assess their results by generated in experiments. Experiments are complex
assessing the effects of their interventions. Like phys- affairs involving intricate settings and recalcitrant
icists, they thus have good reason to assume that the objects, and it is a struggle to get experiments to
subvisible things and phenomena they investigate work. When is an experiment(al result) reliable and
really exist. informative? Recent philosophers of experiment have
But even if we follow Hacking and adopt a realist put together sets of strategies that may be used to
stance vis-à-vis experimental objects in biology, other assess the validity of experimental results, including
questions arise. Experimental settings in the laboratory experimental checks and calibration; elimination of
are artificial surroundings very different from natural plausible sources of error and alternative explanations
habitats, and in this sense, experiments create effects of the result; using an apparatus based on a well-
that do not exist in nature. This raises issues about corroborated theory; and statistical validation of the
the scope of the knowledge generated by laboratory evidence (Franklin 1986).
experiments. What, if anything, can such experiments Several philosophers of biology have argued that
tell us about objects and phenomena outside the the concept of robustness plays a major part in the
laboratory? validation of experimental results. If multiple different
This question is especially pressing in the context of experiments are robust, that is, if they agree in their
biological experimentation because today much exper- outcomes, we may assume that the experimental result
imental work is carried out on a small number of model is reliable. In its most general characterization, the
organisms. Model organisms are selected and concept of robustness straddles epistemology and
E 702 Experiment

ontology. Robustness is regarded as the decisive crite- problem at hand and clarifying the meaning and moral
rion in separating what is real from what is unreal and significance of key concepts such as harm and
what is reliable from what is unreliable (Wimsatt coercion, but without appeal to high-level ethical
1981). In recent years, philosophers and historians of theory. Philosophers of experiment may contribute to
biology have probed a number of scientific episodes to these discussions through the analysis of potential
establish whether robustness really is the main crite- conflicts between methodological standards of valid
rion for the assessment of experimental outcomes, and experimentation and ethical concerns about responsi-
what other criteria have been used, and should be used, ble experimentation.
to evaluate experimental results. This discussion is Two fields of biomedical research in particular have
still ongoing. been subject of heated ethical as well as political
debates: stem cell research and the human genome
Sociopolitical and Ethical Dimensions of Biological project. Stem cell research aims to identify the mech-
Experimentation anisms that govern cell differentiation. The ultimate
Scientific experimentation raises important ethical and goal is to turn stem cells into specific cell types that can
sociopolitical issues related to the use of experimen- be used for treating debilitating and life-threatening
tally generated knowledge and the responsibility of diseases and injuries. While this research promises to
investigators. These are vast topics, and the discus- have extremely important therapeutic implications, it
sions surrounding them have grown extremely com- is also strongly contested because stem cells are
plex and controversial, involving experts and obtained from human embryos. The very practice of
arguments from ethics, politics, religion, and law. experimenting raises intricate ethical questions. Is it
Here I can only touch upon some of the questions and ethical to destroy human embryos for the purposes of
issues that need to be addressed. clinical research? Here, the discussions focus on the
Most sensitive ethical questions arise of course in beginning of human life, the embryo’s right to life and
experiments with living beings, especially in biomed- the relative values of embryonic life and the life of
ical contexts. What ethical codes are and should be a mature human being. At present, each of these points
applied in biological and biomedical experimentation? is still under discussion.
What are the limits of ethically responsible experimen- The human genome project (HGP) – the mapping
tation? On the most general level, experimenters are and sequencing of the human genome – has also
confronted with the question: When is it acceptable to generated much discussion among philosophers of sci-
expose some to risks of harm for the benefit of others? ence, bioethicists, political philosophy, and philosophy
This question has been hotly debated ever since the of law, but here the discussion has focused more on the
first experiments were performed on humans and ani- application and use of the knowledge generated by the
mals (cf., LaFollette and Shanks 1997). The general HGP. As the HGP assists scientists in the identification
ethical principles that should guide human experimen- of genes and their functions, genetic testing made
tation – the principles of beneficence, respect, and possible by the human genome project has raised
justice – are stipulated in official documents and policy a number of concerns and worries, for instance,
statements, as are the regulations for the use of animals about access to genetic information by insurers and
in research facilities. But the interpretation of these employers, and the possibility of genetic discrimina-
guidelines and their application to concrete cases are tion. Prenatal genetic testing raises concerns about
often extremely difficult. For a long time, it was com- reproductive rights and eugenics. The rapid develop-
mon to assume that the appeal to ethical theory such as ment of new technologies poses additional challenges
utilitarianism, deontology, or virtue ethics should to the discussions.
guide the practice of bioethics. But philosophical Finally, the commercialization of biomedical
discussions often proved unhelpful in concrete cases. research raises challenging questions for philosophers
In recent philosophical debates about the relation of biology. Arguably, in recent decades, the influence
between (applied) bioethics and philosophical theory, on biomedical research of business and industry has
many bioethicists and several philosophers have led to changes in the ethical and methodological norms
become skeptical of applying ethical theory to practi- of biomedical practice (Krimsky 2003). It is an impor-
cal problems. Instead, they advocate starting with the tant task for philosophy of experiment to assess and
Experimental Design 703 E
critique the impact on the integrity of biomedical
experimentation of the privatization and commercial- Experimental Design
ization of biomedicine.
Philosophy of experiment is still a vibrant branch of Clemens Kreutz1 and Jens Timmer2,3,4
1
philosophy of science, and each of the aspects Institute for Physics, Freiburger Center for Data
discussed in this entry invites further study. As the Analysis and Modeling (FDM), University of
recent development of the field shows, philosophy of Freiburg, Freiburg, Germany
2
experiment has considerably broadened its scope, Institute for Physics, University of Freiburg, Freiburg,
seeking to make contact with the biological sciences Germany
3
as well as history and sociology of science and thus to BIOSS Centre for Biological Signalling Studies and
increase its social relevance. Freiburg Institute for Advanced Studies (FRIAS), E
Freiburg, Germany
4
Department of Clinical and Experimental Medicine,
Cross-References Link€oping University, Link€oping, Sweden

▶ Exploratory Experimentation
▶ Model Organism Definition

An experimental design or shortly a design is defined


References as the set of experimental conditions which can be
chosen by the experimenter. In statistics, the exper-
Ankeny R (2001) Model organisms as models: understanding the imental conditions are also called independent vari-
‘Lingua Franca’ of the human genome project. Philos Sci
ables or design variables. In contrast, the measured
(Proc) 68:S251–S261
Burian RM (2007) On MicroRNA and the need for exploratory variables are called dependent variables because
experimentation in post-genomic molecular biology. Hist the realizations depend on the independent variable,
Philos Life Sci 29:285 i.e., on the design, and on the system’s behavior.
Franklin A (1986) The neglect of experiment. Cambridge
The design D usually covers all variables of
University Press, Cambridge
Hacking I (1983) Representing and intervening. Cambridge a model F(D, y) except the parameters y.
University Press, Cambridge In systems biology, an experimental design D usu-
Krimsky S (2003) Science in the private interest. Has the lure of ally specifies the choice of the external perturbations,
profits corrupted biomedical research? Rowman-Littlefield
the observables, as well as the number and time points
Publishing, Lanham
LaFollette H, Shanks N (1997) Brute science: of the measurements (Kreutz and Timmer 2009). In
dilemmas of animal experimentation. Routledge, London/ mathematical terms, the design D ¼ {ti,ui,oi} is the set
New York of all measurement times ti, perturbations ui, and
Machamer P, Darden L, Craver C (2000) Thinking about mech-
observables oi for all data points i.
anisms. Philos Sci 67:1–25
Rheinberger H-J (1997) Toward a history of epistemic things. The procedure of finding optimally informative
Stanford University Press, Stanford experimental designs, i.e., to choose the number of
Weber M (2005) Philosophy of experimental biology. data points N as well as ti,ui,oi for all measurements
Cambridge University Press, Cambridge
i ¼ 1,. . ., N, is termed optimal experimental design or
Wimsatt WC (1981) Robustness, reliability, and overdeter-
mination. In: Brewer MB, Collins BE (eds) Scientific inquiry design of experiments (DoE).
and the social sciences. Jossey-Bass, San Francisco,
pp 124–163
Cross-References

▶ Active Learning
Experiment Design ▶ Experimental Design, Variability
▶ Input
▶ Model-based Experimental Design, Global Sensitiv- ▶ Observable
ity Analysis ▶ Optimal Experimental Design
E 704 Experimental Design, Variability

References Experimental Design, Variability, Table 1 Sample ANOVA


table showing sources of variability for a medical study utilizing
Kreutz C, Timmer J (2009) Systems biology: experimental proteomics analysis
design. FEBS J 276:923–942 Source DF
Hospitals m–1
Treatment 1
Patients (hospitals) n(m – 1)
Treatment x hospitals m–1
Experimental Design, Variability
Treatment x patients (hospitals) n(m – 1)
Samples (patients) kn(m – 1)
Roger Higdon
Treatment x samples (patients) lkn(m – 1)
Seattle Children’s Research Institute, LC fractions (samples) jlkn(m – 1)
Seattle, WA, USA Treatment x LC fractions (samples) jkln(m – 1)
MS runs (LC fractions) ijkln(m – 1)
Treatment x MS runs ijkln(m – 1)
Synonyms Total N – 1 ¼ 2mnklji – 1

Analysis of variance (ANOVA) tables; Components of


variance; Sources of variability of variation. Terms higher up in the table contain
variability from the terms below. Parentheses explic-
itly mark terms that are nested within others while
Definition crossed terms indicate interactions between different
levels of variation. As can be seen in Table 1, the
Identifying and accounting sources of variability is one of potential sources of variation in systems biology can
the key aspects of statistical experimental design. These get very large even in a relatively simple study.
can correspond to experimental factors under study, sub- Not replicating sources of variability can lead to
jects or biological specimens, samples for subjects, time confounding between different sources of variability.
points, technical replication of experimental protocols, For example, in Table 1 if there is only one patient per
and blocking or grouping of subjects or samples. treatment group and even if samples or MS runs are
replicated, one will not be able to differentiate vari-
ability due to differences in treatments from the vari-
Characteristics ability due to differences between patients. This would
be an example of pseudo-replication. Even if sources
Sources of variability in the experimental design of of variability are replicated, insufficient replication can
biological study are often divided into two categories: lead to poor power for statistical tests. This can occur
biological variability (variability due to subjects, even when many repeated runs of a sample are done
organisms, and biological samples) and technical var- with little replication of biological samples or subjects.
iability (variability due measurement, instrumentation, Sources of variability need to be accounted for in
and sample preparation). However, within biological the analysis and experimental design of a study.
and technical variability there may be many levels and Historically, R. A. Fisher was among the pioneers in
sources of variability, including experimental factors, utilizing statistical methods to account for variability
time points, subsampling, repeated measurements, and in experimental design (Fisher 1918). There are
blocking. In order to design and analyze biological several approaches to accounting for variability within
experiments, it is necessary to properly account for the experimental design. Holding factors constant by
the different sources of variation (Box et al. 2005; utilizing standard protocols, the same instrumentation
Cox and Reid 2000). A useful tool for keeping track and settings, the same operator, and so on can elimi-
of the sources of variability is the analysis of variance nate the sources of variability. Using blocking can
(▶ ANOVA) table. Table 1 provides a simple exam- reduce other sources of variability. Blocking is group-
ple; it shows the sources of variation and the levels of ing experimental samples within similar experimental
replication (degrees of freedom) and indicates the level or environmental conditions, for example, grouping by
Experimental Standard Conditions of Enzyme Characterizations 705 E
time, geography, or batch. If all experimental factors Fisher R (1918) Studies in crop variation. I. An examination of
are represented within the same block, comparisons the yield of dressed grain from Broadbalk. J Agr Sci 11:107–135
Kreutz C, Timmer J (2009) Systems biology: experimental
can be made within the block; if not, experimental design. FEBS J 276(4):923–942
variability may become confounded with the blocking
factor. Randomizing experimental factors to biological
samples and biological samples to blocks can convert
other sources of variation to random variation that Experimental Methods for Studying
can then be accounted for by the statistical model. Immune Signaling Network Dynamics
Replication of biological samples and blocks can
reduce the amount of biological variability. ▶ Immune Pathway Dynamics, Biological Methods
There are a number of approaches that can be E
applied to the analysis stage to reduce or eliminate
sources of variability. Normalization can reduce sys-
tematic and technical sources of variability that are Experimental Organism
part of the measurement process. The use of covariates
such as measurements of the initial state of a biological ▶ Model Organism
sample such as weight or age can correct for variability
in the initial conditions or other properties of biologi-
cal samples.
There are examples of such approaches in systems Experimental Planning
biology (Kreutz and Timmer 2009). In relative expres-
sion analysis, multichannel microarray studies implicitly ▶ Designing Experiments for Sound Statistical
use the array as a blocking factor to eliminate the mea- Inference
surement variability between arrays (Churchill 2002).
Other types of blocking are common in high-throughput
studies: the day the samples were run, the cage the mice
were kept, the instrument used, the lab that ran the assay, Experimental Standard Conditions of
or the 96 well plate containing the sample. Normalization Enzyme Characterizations
is commonly used in high-throughput data studies such as
microarrays or proteomics to eliminate systematic vari- Carsten Kettner
ability. Mixed and multilevel models are an effective Beilstein-Institut zur F€orderung der Chemischen
statistical analysis approach to deal with multiple sources Wissenschaften, Frankfurt am Main, Germany
of variability.

Synonyms
Cross-References
ESCEC
▶ Mixed and Multi-Level Models
▶ Relative Expression Analysis
Definition

The Experimental Standard Conditions of Enzyme Char-


References
acterizations (ESCEC) is a conference series biennially
Box GE, Hunter WG, Hunter JS, Hunter WG (2005) Statistics hosted by the Beilstein-Institut in R€udesheim, Germany
for experimenters: design, innovation, and discovery, (Hicks and Kettner 2010). The ESCEC symposia serve
2nd edn. Wiley, New York as a platform for the exchange of ideas and link to
Churchill GA (2002) Fundamentals of experimental design for
the scientific community. Experts from all fields of
cDNA microarrays. Nat Genet 32:490–495, Supplement
Cox DR, Reid NM (2000) The theory of design of experiments. experimental, theoretical, and bioinformatics enzymology
Chapman & Hall/CRC, Boca Raton and metabolic network investigation present and discuss
E 706 Experimental Unit

new results, approaches, and methodologies as well as


pitfalls and problems of data generation and reproduction. Explanation in Biology
Suggestions from the STRENDA Commission form
the basis of subsequent discussions in following Arno Wouters
symposia where they were are improved, rejected, or Department of Philosophy, Erasmus University
replaced by alternatives. Rotterdam, Rotterdam, The Netherlands

Cross-References Definition

▶ STRENDA An explanation of a characteristic of a system is


an account of why that system has that characteristic.

References
Characteristics
Hicks MG, Kettner C (2010) Experimental standard conditions
of enzyme characterizations. Logos, Berlin
Explanations in biology, unlike those in physics, rarely
proceed by filling out the parameters in mathematical
equations (laws) that describe the general characteris-
tics of the relevant form of organization. With the
Experimental Unit notable exception of population genetics and theoreti-
cal ecology, biologists typically employ mechanistic,
Olga Vitek functional, and historical strategies of explanation,
Department of Statistics, Department of Computer rather than mathematical ones. To some scientists,
Science, Purdue University, West Lafayette, IN, USA this is a sign of the immaturity of biology, perhaps
resulting from the difficulty to take into account the
extraordinary high number of parameters and variables
Definition relevant to the phenomena that interest biologists.
Structuralist biologists such as Brian Goodwin (e.g.,
Experimental unit is the basic unit of experimental 1994) have pointed to the dominance of gene-centered
design and is the subject, sample, or object that carries and historical approaches as the main source of imma-
the condition or the treatment. turity. They hope that in the future, more research,
more data, more computational power, better mathe-
matics, and above all, more insight into the proper way
Cross-References of explaining in the natural sciences and more willing-
ness to pursue the structuralist strategy will enable
▶ Designing Experiments for Sound Statistical what is thought to be a more scientific approach.
Inference Many others reject the view that the minor role of
general laws in biological explanation is a sign of its
immaturity. They argue that the objects of study in
biology have characteristics that justify approaches dif-
Expert Elicitation ferent from those suitable in physics. One such view
points to the process by which life gets its shape: evolu-
▶ Prior Elicitation tion by natural selection. For example, the evolutionary
biologist Ernst Mayr (1904–2005), one of the founding
fathers of the modern synthesis, argues that it makes
Expert System sense to ask why questions in biology but not in physics
and chemistry (e.g., Mayr 1997, Chap. 6). The reason is
▶ Knowledge-based System that organisms owe their characteristics to their history,
Explanation in Biology 707 E
whereas history is, in general, not important to the char- explain why a certain mechanism has the characteris-
acteristics of physical and chemical systems. So whereas tics it has.
physics and chemistry must limit themselves to how A better justification starts with the observation that
questions, biologists should address why questions in organisms are highly organized (see also the essay on
addition to how questions. Within biology, how ques- ▶ Organization): their capacities and characteristics
tions and why questions are, according to Mayr, the critically depend not only on the characteristics of
subject of two “largely separate fields”: functional biol- their parts but also on the spatial arrangement of
ogy and evolutionary biology, corresponding with two those parts and on the order and timing of their activ-
modes of explanation: functional explanation and ities. Explanations in physics and chemistry are typi-
evolutionary explanation. Functional explanations cally aggregative: they derive the behavior of a system
answer how questions by describing the operation of (such as the behavior of a volume of gas as described E
mechanisms. Evolutionary explanations answer why by the Boyle-Charles’ law) by aggregating the behav-
questions by revealing the evolutionary history of those ior of the parts of that system (the molecules of which
mechanisms. Both kinds of explanation are legitimate the gas is composed as described by Newton’s laws of
and needed to understand the living world, but evolu- motion). As philosopher William Wimsatt (e.g., 2007,
tionary explanations provide the distinctive biological Chap. 12) has argued, very few system properties are
perspective. aggregative under all possible decompositions of
Whereas Mayr seeks the justification of the biolo- a system into parts. The list is pretty much exhausted
gist’s concern with why questions merely in the impor- by the quantities that appear in conservation laws:
tance of history for understanding the characteristics of mass, energy, momentum, and net charge. All other
organisms, many philosophers (e.g., Dennett 1995) system properties are more or less organized in the
refer to the specific character of that history as being sense that they break down under at least one of the
a selection history. Unlike physical objects, organisms four conditions for aggregativity identified by Wimsatt
have the character they have because ancestral variants (invariance under rearrangement and substitution, size
with those characteristics were favored over variants scaling, decomposition and reaggregation, and linear-
with other characteristics. For that reason in biology, ity). By carefully choosing a decomposition, by limit-
but not in physics and chemistry, the answer to the ing explanations to certain forms of organization
question how a certain characteristic is produced must and by introducing idealizations and approximations
involve an answer to the question why (e.g., for what (in the description of the system and in the derivation
effects) that characteristic was favored in the selection of the system’s behavior), the power of aggregative
process. explanation can be expanded immensely. However, if
In my view, attempts to justify types of explanation a system is very highly organized, there comes a point
in biology that differ from explanations in physics and where aggregative explanation is no longer possible
chemistry by appeal to the importance of history for and where the system’s organization must be taken into
understanding organisms or by appeal to the special account.
character of that history are unsatisfactory. Such In biology, this is done by viewing organisms as
attempts take the importance of history for granted, mechanisms (▶ Mechanism) for being alive (Wouters
whereas they should explain it. Moreover, they fail 2005). For the purpose of this essay, a mechanism for
to address the many differences between functional a certain behavior can be defined as a complex system
biology and physics and chemistry (see Fox Keller that produces that behavior by the organized interac-
2002 to get a feeling for the weirdness of mechanistic tion of its parts (cp. Glennan 1996; Machamer et al.
explanation in biology in the eyes of a theoretical 2000). Because the behavior of a mechanism results
physicist). The most notable differences are a prefer- from its organization, a mechanism can be seen as
ence for mechanistic models that visualize interactions a solution to the problem of how to organize the parts
over abstract mathematical system descriptions and their interaction in such way that this behavior is
(i.e., equations that do not bear an obvious relation to generated. Note, that this problem need not be experi-
the physical properties of the parts of the system), enced by the mechanism or some other system. The
appeal to role functions to explain how mechanisms difficulty (i.e., the amount of organization needed) to
work and the appeal to advantages and requirements to produce or maintain that behavior in the circumstances
E 708 Explanation in Biology

in which it occurs suffices to talk of problems. As explanations) but also why that mechanism’s organi-
George Cuvier (1769–1832), the founding father of zation solves the problem, whereas other forms of
functional zoology, already noted, the very existence organization (often called “designs”) do not. This
of organisms means that they have solved the problem opens up the possibility and the need to explain why
of how to stay alive (cf. Reiss 2009). Organisms exist a mechanism has the characteristics it has on the basis
far from thermodynamic equilibrium and can exist of the requirements it has to satisfy: the constraints
only by actively maintaining their organization imposed on it by the problems it must solve, the other
(Prigogine and Stengers 1984). To understand how characteristics of that mechanism, and the conditions
this form of existence is possible, a combination of under which it works. This is what functional explana-
explanations of four kinds must be employed, as many tions (▶ Explanation, Functional) do.
biologists have recognized (e.g., Tinbergen 1963): Furthermore, because, by definition, one does not
mechanistic explanation, functional explanation, devel- get a mechanism by throwing its parts together, the
opmental explanation, and evolutionary explanation. question of how the organization that solves the prob-
Mechanistic explanations explain how the behavior lems arises in the course of time needs consideration
of a mechanism arises from the properties of the parts, too. As became clear in the wake of Charles Darwin’s
their interaction, and the way in which this interaction theory of evolution, in the case of living mechanisms,
is organized. Biology’s functional perspective glues the answer requires a two-staged explanation. Devel-
explanations of different mechanisms together by opmental explanations (▶ Explanation, Developmen-
viewing those mechanisms in terms of their role in tal) explain how the organization arises in the course
maintaining the state of being alive (i.e., in terms of of individual development. Evolutionary explanations
their biological role). The different organ systems have (▶ Explanation, Evolutionary) explain how the machin-
specific roles in bringing about the living state. Each ery to develop the required organization arose in the
organ system consists of organs with specific roles in course of evolutionary history.
bringing about the properties of the organ system that So, the view of organisms as solutions to the prob-
enable that organ system to perform its role in the lem of how to maintain the living state provides
maintenance of the living state. Each organ in turn is a unifying perspective in biology that explains and
divided into subsystems, each with a specific role in justifies the importance of the four kinds of explanation
bringing about the properties relevant to that organ’s employed in biology, the relation between those expla-
biological role. And so on, until a level is reached in nations, and the main differences between biology on
which the relevant system properties arise out of unor- the one hand and physics and chemistry on the other.
ganized components. Attributions of biological roles
(often called “function ascriptions”) (see ▶ Function,
Biological) situate the different mechanisms in this Cross-References
encompassing hierarchical organization, making it
possible to see different mechanistic explanations as ▶ Explanation, Developmental
part of the larger project to understand how organisms ▶ Explanation, Evolutionary
are able to stay alive. Attributions of biological roles ▶ Explanation, Functional
are, hence, the key to explanation in biology. ▶ Function, Biological
The very fact that the behavior of mechanisms ▶ Mechanism
(including organisms) results from the organization ▶ Organization
of their parts (in addition to their composition) means
that in order to understand a mechanism, it does not
suffice to explain how it works. The mechanism’s References
solution to a certain problem need not be the only
possible solution to that problem. On the other hand, Dennett DC (1995) Darwin’s dangerous idea. Simon & Schuster,
by definition, not any form of organization will solve New York
Fox Keller E (2002) Making sense of life. Harvard University
any problem. So in order to understand a mechanism,
Press, Cambridge
one must not only explain how its organization results Glennan SS (1996) Mechanisms and the nature of causation.
in a certain behavior (as is done in mechanistic Erkenntnis 44(1):49–71
Explanation, Developmental 709 E
Goodwin BC (1994) How the leopard changed its spots. Phoenix narrowly, it means the process leading from the zygote
Books, London to the adult form. The larger meaning covers the whole
Machamer PK, Darden L, Craver CF (2000) Thinking about
mechanisms. Philos Sci 67:1–25 life of the organism, which includes episodes such as
Mayr E (1997) This is biology. Harvard University Press, reproduction and senescence (West Eberhardt 2003;
Cambridge Gilbert 2003). Developmental explanations pinpoint
Prigogine I, Stengers I (1984) Order out of chaos. Heinemann, events in the lifetime accounting for an individual
London
Reiss JO (2009) Not by design: retiring Darwin’s watchmaker. carrying a trait, whereas evolutionary explanations
University of California Press, Berkeley (▶ Explanation, Evolutionary) often stand at the pop-
Tinbergen N (1963) On aims and methods of ethology. ulation level.
Z Tierpsychol 20:410–433
Wimsatt WC (2007) Re-engineering philosophy for limited
What Development Explains E
beings. Harvard University Press, Cambridge, MA
Wouters AG (2005) The functional perspective of organismal The general explanandum of developmental explana-
biology. In: Reydon TAC, Hemerik L (eds) Current themes tions is twofold: to explain why and how a zygote
in theoretical biology. Springer, Dordrecht, pp 33–69 develops into an adult of its species, with its species-
typical traits; and to understand why and how systematic
deviations from the normal type occur, which usually
concerns “teratology.” Often, a back and forth move-
Explanation, Developmental ment between teratology and normal development char-
acterizes developmental explanation, because knowing
Philippe Huneman which alterations (e.g., silencing genes; injecting pro-
Institut d’Histoire et de Philosophie (IHPST), teins, etc.) produce a specific abnormal development
des Sciences et des Techniques Université Paris 1 allows understanding what is necessary to develop the
Panthéon-Sorbonne, Paris, France corresponding normal trait; therefore, many experimen-
tal developmental theorists proceed by such controlled
perturbations of development, at any hierarchical level
Synonyms (genes, cell, tissues).
Developmental explanations target ▶ model organ-
Embryological explanation isms such as nematodes, sea urchins, frogs, chicken;
through those models they aim at understanding what
is common across many species and clades. Develop-
Definition ment has many varieties: some species hatch, some
have larval stages, some develop from a unicellular
A developmental explanation explains how a trait zygote as a sort of bottleneck between generations –
came into being through a process starting at the sin- whereas others do not. Lowest level mechanisms
gle-celled stage of the existence of the organism. Since involving genes and gene transcription are likely to
developmentally explaining a trait is an explanation of be very general across multicellular organisms; but
its development, it requires explaining some detail of some higher level traits typical to a clade (i.e., eyes,
the developmental processes which were involved; hearts, etc.) require targeting an appropriate model
“developmental explanation” secondarily means the organism.
explanation of a particular feature, moment, or process In principle, any trait in an individual organism can
of development itself, for example, cell division, apo- receive a developmental explanation. However, the
ptosis, and specific pattern formations. focus is often put on species(or clades)-typical traits:
heart or central nervous system in vertebrates, tetrapod’s
limb, butterflies’ wings striped, etc. Some theoreticians
Characteristics view the scope of developmental explanation as in prin-
ciple limited to species-typical traits and the genealogy
Development has narrow and large meanings: very of a species type, then of a family type, etc., as in early
narrowly, it is the embryogenesis, that is, the process descriptive embryology. This conflicts with the core
leading from conception to hatching or birth; less Darwinian idea that variation is proper to species.
E 710 Explanation, Developmental

Inversely, variants in a population can often be Alternative Epistemological Options in


explained by a same core mechanism, whose various Developmental Studies
outputs depend on the initial conditions, and which will Embryology emerged in a context where debates
yield both statistically typical and also eccentrically opposing ▶ preformation and epigenesis were salient.
varying forms (e.g., stripes on echinoderms). But this The current idea that genes and ▶ epigenetics both
idea of a developmental matrix of variation seems not contribute to development sounds like a synthesis of
enough to solve the issue of the individual versus typical preformationism and epigeneticism. More generally,
scope of developmental explanations. developmental explanations can be divided into two
Developmental explanations explain the generation styles (often combined): those which pinpoint pro-
of organic form and therefore, given that individuals of cesses occurring step after step within development,
a same species are typically like their parents, contrib- and those which specify some informational content or
ute to explain the conservation of form across time. specified dispositions present along development.
The transmission of traits across generation is “inher- Two other general stances in biology have been
itance”; for biologists of the nineteenth century, conflicting within embryology: vitalism and mechanism.
explaining development was supposed to explain Vitalists hold that development presents us with the
inheritance, whereas the Modern Synthesis (Darwin- clearest expression of vital forces, what Driesch called
ian) dissociated inheritance (explained by Mendelian “entelechies,” supposed to make sense of the classical
transmission of characters from parents to offspring) sea urchin experiments (▶ Developmental Biology,
from development, as the process of growing individ- Classical Sea Urchin Experiments). Mechanists claim
ual forms. The integration of Mendelian genetics with that physical and chemical forces are alone at work in
Darwinism from the 1930s underpinned the split development, a process on a par with other chemical
between developmental and evolutionary theories, processes like fermentation or corruption. Famously,
and the current call for a new synthesis between devel- the school of Entwicklungsmechanik (Roux, His), in
opment and evolution (Gilbert et al. 1996). In a radical late 1800s, studied development by experimental con-
developmentalist perspective, the constancy of the trolled perturbation, in order to characterize the condi-
main developmental processes (and not the transmis- tions under which physical processes result in an
sion of traits) explains the reproduction of form across embryogenesis. Nowadays, nobody endorses vitalism,
generation and then contributes to explain the conser- yet there is an informal consensus about the fact that
vation of form. some “▶ emergence” occurs in development, even if
Besides physiology – the science of the functioning what this means is still debated. But what shaped modern
of adult organisms – embryology has been constituted developmental explanations, in its difference with initial
as the science of study of structure and function of embryology initiated by Wolff (1764) and culminating
developing organisms, including the function of with von Baer (1828), is cell theory and Mendelian
developmental processes. In this sense, and given genetics, cytology, and molecular biology.
that functionality as well as persistent form meta-
physically characterize organisms, their difference Cells, Genes, and Epigenetics
exemplified the two poles of biology, identified in Elaborated by Schwann, Schleiden, and then Virchow
1911 by E.S. Russell in Form and function: the in the late 1800s, cell theory rooted the concepts of
science of function and the science of form. Devel- embryologists – layers, tissues, organs – within their
opmental explanations typically instantiate the substrate, namely, cells (Mazzarello 1999). Cell prolif-
explanatory style proper to questions about biolog- eration proved to be the basic operation which
ical form. Many of the current controversies about underpinned embryogenetic development as well as
the purported absence of development within evo- reproduction (the zygote results from cells from the
lutionary biology express this conflict between mother and the father). Genes, postulated by Mendelian
these two stances on biology (Amundson 2005). theory as the carriers of inheritance, turned out to be
Especially, ▶ adaptationism in evolutionary explana- localized on chromosomes and finally identified to
tion neglects the coherence of organismal form (in favor their material substrate, DNA in 1953. (However, the
of explaining isolated traits as optima), whereas devel- assumed identity between genes of molecular biology
opmental explanation aims at accounting for it. and the genes of classical transmission genetics is
Explanation, Developmental 711 E
highly controversial (Moss 2003)). Genes have then are sought, which are able to switch off and on genes.
been central both in evolutionary (▶ Explanation, Evo- While some genes are precisely acting upon other genes
lutionary) and in developmental explanations because (Regulatory Genes), others are expressing proteins
they have been seen both as substrate for inheritance required for building up the body. Developmental expla-
(which pertains to evolution) and as causes of develop- nations are thereby in principle multiscale explanations
ment (as they code for traits). The extreme view on the because causal factors intervene at each level (signals in
role of genes in development consists in claiming that the tissues, e.g., gradients of a substance, signals within
they are a program (▶ Evolution Programming) for the cell and genetic instructions, signaling inter-tissues).
development (or a “recipe,” as Dawkins says), which is A specific set of genes, developmental genes, con-
undoubtedly the modern form of preformationism, dition across many eukaryote phyla the anteroposterior
whereas this has been challenged by recent advances in and dorsoventral axes, as well as many other feature E
our knowledge of ▶ epigenetics, as well as by the ongo- which seemed previously to be analogous (such as
ing understanding that “genes” are much more complex eyes). Understanding the specific timing of the activa-
than a mere linear continuous strain of coding DNA. tion of those genes is crucial to understand the process
Each cell in a eukaryotic multicellular organism has of development. Some biologists (e.g., Carroll 2005)
the same DNA in its nucleus, which conditions pro- would reconcile evolutionary and developmental per-
teins and then organismal traits. The genesis of the spectives by seeing developmental theory as the genet-
organisms is precisely the process through which the ics and epigenetics of developmental genes, yet other
zygote multiplies and differentiates into various cell Evo-Devo scientists (e.g., M€uller and Newman 2005)
types which in their turn combine into various tissues view this option as still too much gene-centered.
and organs, stemming from the various layers, which
emerged in the first stage (blastula) of the process, and Cascades, Signals and Networks
then brings about organogenesis. ▶ Mitosis explains Models of development involve various kinds of infor-
cell multiplication; however, there remains the ques- mation: instructions coded in genome, and signals
tion of why and how it is the case that, for example, exchanged between cell environments and receptors in
cells in the brain develop into neurons whereas cells in cells (Gilbert 2003). The role of the latter has been
the skin develop into epithelial cells; hence develop- modeled by the “French flag model” proposed by
mental explanations first aim at uncovering the reasons Lewis Wolpert, which provides a general view on pattern
why each cell expresses distinct genetic resources. formation (e.g., veins in insect wings, cartilages in ver-
Morphogenesis, or pattern formation, assuming cell tebrate limbs) (Wolpert 1994). Cell types develop
differentiation, is the second issue to be solved. according to the gradient of a “morphogen” produced
Genes are either expressed (active) or repressed. by the cells, a threshold in the gradient triggering
Their expression is mediated by signals from external a specific gene expression in some cells, and then a cell
environment factors, which allow a cell to express the type; the continuity of the diffusion of the morphogen
“right” genes in the place it is, and therefore, produce through the gradient translates into the discontinuity of
proteins, change this environment and contribute to the cell types, the concentration of the product due to its
extant signals around it, so that it will trigger other position informs on the cell fates (Fig. 1). This dialectics
events of gene expression; reciprocally, this cellular between genetic instructions and “positional” informa-
environment contains products like transcription fac- tion provides a scheme of genes-cells-tissues interaction
tors already synthesized by some genes. ruling morphogenesis. Philosophers should note the per-
Essential in developmental explanation is the identifi- vasiveness of a vocabulary such as: cells interpret, read
cation of episodes of ▶ embryonic induction, where a cell their position, etc. Even if those words are rather theo-
is instructed to a specific cell fate. Identifying morphoge- retical reformulations than metaphors, one can still ques-
netic fields as set of cells likely to undergo a specific fate tion whether they are reformulated according to a unique
(limb, eye, etc.) is a crucial step in such explanation, even information-theoretical framework.
if morphogenetic fields can be transformed. A developmental explanation therefore can appear
The (study of the) set of nongenetic factors, intra- or as a cascade of events involving signaling, production
extracellular, which conditions gene expression, is of morphogens, expression and repression of gene
often called ▶ epigenetics. Hence, epigenetic factors products. Modeling those cascades and identifying
E 712 Explanation, Developmental

a the morphogens as well as their propagation dynamics


and the cells’ receptivity mechanisms yields an expla-
nation of morphogenesis, exemplified by the emer-
gence of asymmetry in the heart (▶ Asymmetry of
the Heart, Development).
The nonlinear, complex relationships between thou-
sands of regulatory genes in a cell, transcription factors,
and finally a cell lineage involved in morphogenesis are
likely to be mathematically modeled by networks
(Davidson 1986). The topology of the network represents
the signaling, inducing, repressing, and expressing path-
b Each cell has the potential to
ways, whereas its dynamics captures the variation in
develop as bule, white, or red quantitative variables (amount of genes products, etc.)
according to some equations (e.g., Michaelis-Menten,
etc.). ▶ Gene Regulatory Networks (GRN), since the
initial investigation of the GRN proper to Endo 16 in
sea urchins by Davidson, have proved therefore to be
crucially involved in the formation of patterns during
Position of each cell is defined development (Davidson et al. 2003). The network’s view
by the concentration of
morphogen is powerful but, while it avoids a crude idea of a genetic
program ran by development, it still gives genes and
transcripts the leading role in developmental processes.
An alternative view emphasizes the fundamental role of
Concentration
of morphogen

a limited set of molecules (e.g., cadherin) involved in


many developmental episodes that is pervasive across
many clades, finally compacted into developmental
building blocks (DPM, “Dynamical patterning mod-
ules,” see Forgacs and Newman 2005; Gilbert et al.
1 2 3 4 5 6 1996), irrespective of which genes will be recruited to
supervise development. GRN are clearly developmental
modules. At a higher level, endoderm and mesoderm are
Position value is interpreted by also developmental modules, as well as Newman’s DPN.
the cells which differentiate
to form a pattern
Plausibly, alternative interpretations of developmental
explanation – for example, more or less gene-oriented –
are also distinguishable through the way they identify
Threshold developmental modules. This is because they do not use
Concentration
of morphogen

the same criteria to pinpoint them.


Nevertheless, GRN are also likely to account for
many features of developmental processes, especially
▶ canalization (because they offer multiple pathways
between a given output and input, so that the failure of
some genes does not often affect the final product) and
▶ plasticity, two key issues for Evo-Devo.

Explanation, Developmental, Fig. 1 (a) The French flag Articulating Explanatory Regimes
model pattern. (b) The causal underpinning of the flag pattern “Developmental explanation” in general means
(After Kerszberg and Wolpert 2007) explaining a trait by unraveling a developmental process.
Explanation, Evolutionary 713 E
When and how are developmental explanations needed? ▶ Plasticity
According to Ernst Mayr (1961), developmental expla- ▶ Preformation and Epigenesis
nations pertain to proximate causes of phenomena, and
evolutionary explanations (▶ Explanation, Evolution-
References
ary) to their ultimate causes. However, given that natural
selection requires variation and that some variation relies Amundson R (2005) The changing role of embryo in evolutionary
upon possibilities given by developmental processes theory. Cambridge University Press, Cambridge
(and not mere mutations and recombinations), develop- Carroll S (2005) Endless forms most beautiful: the new science
mental explanations may be implied by evolutionary of evo devo and the making of the animal kingdom. Norton,
New-York
questions. In general “developmental ▶ constraints” Davidson EH (1986) Gene activity in early development.
condition the generation of variation, through which Academic, Orlando
E
selection shapes traits and organisms. Therefore, evo- Davidson E, McClay D, Hood L (2003) Regulatory gene net-
lutionary and developmental explanations seem com- works and the properties of the developmental process.
PNAS 100(4):1475–1480
plementary, the former specifying which variations Forgacs G, Newman S (2005) Biological physics of the developing
are available and the second explaining which actual embryo. Cambridge University Press, Cambridge
variants will emerge on the basis of these variations. Gilbert S, Opitz G, Raff R (1996) Resynthesizing evolutionary
However, it may be that sometimes a developmental and developmental biology. Development and evolution
173:357–372
explanation is sufficient because it unravels a physico- Gilbert S (2003) Developmental biology. Sinauer, Sunderland
chemical structure involved in a large set of phyla, and Kerszberg M, Wolpert L (2007) Specifying positional information
therefore explains the pervasiveness of some traits in the embryo: looking beyond morphogens. Cell 130:205–210
without a need to appeal to selective pressures. The Mayr E (1961) Cause and effect in biology. Science
134(3489):1501–1506
self-organizing gene networks investigated by Mazzarello P (1999) A unifying concept: the history of cell
Kauffmann’s Origins of Order (1993) provide devel- theory. Nat Cell Biol 1:E13–E15
opmental explanations of this type. An example would Moss L (2003) What genes can’t do. MIT Press, Cambridge, MA
be the modularity of cell metabolic networks: even if it M€uller G, Newman S (2005) The innovation triad: an evo-devo
agenda. J Exp Zool 304B(6):487–503
seems that modularity confers a selective advantage Solé R, Valverde S (2008) Spontaneous emergence of modular-
(through ▶ robustness), so that natural selection seems ity in cellular networks. J R Soc Interface 5:129–133
responsible for the fact that all cell types in many clades von Baer KE (1828) Entwickelungsgeschichte der Thiere:
exhibit modular networks, the mere topology of net- Beobachtung und Reflexion, Borntr€ager, K€ onigsberg
West Eberhard MJ (2003) Developmental plasticity and evolu-
works will indeed promote mostly modular networks tion. Oxford University Press, Oxford
without the need for natural selection to prune a set of Wolff CF (1764) Theorie von der generation. Birnstiel, Berlin
randomly varying networks (Solé and Valverde 2008). Wolpert L (1994) Positional information and pattern formation
In cases like this, the developmental explanation seems in development. Dev Genet 15:485–490
enough to explain the generality of a trait, without
hypothesizing a fixation through natural selection. The
final status of Evo-Devo seems is tied to a decision about Explanation, Evolutionary
whether such cases are exceptions, or not.
Philippe Huneman
Institut d’Histoire et de Philosophie (IHPST), des
Cross-References Sciences et des Techniques, Université Paris 1
Panthéon-Sorbonne, Paris, France
▶ Canalization
▶ Emergence
▶ Epigenetics Definition
▶ Explanation, Evolutionary
▶ Gene Regulatory Networks Evolutionary theory is the general framework for mod-
▶ Model Organism ern biology, in the sense that all living phenomena have
E 714 Explanation, Evolutionary

an evolutionary history which somehow accounts for process of differential reproduction of individuals carry-
them being the way they are. Ernst Mayr usefully distin- ing traits which give them different chances of having
guished two sets of inquiries in biology: the “functional their traits represented in the next generation. As soon as
biology,” looking for “proximate causes” of a trait in an a population of individuals has traits which are varying
organism, that is, causes pertaining to the lifetime of the and heritable, and if those traits causally influence their
individual, and the “evolutionary biology,” looking for chances of reproduction, then we have natural selection.
“ultimate causes,” namely, causes at the level of the This third property means that those traits have, or
history of the species to which belongs the individual. contribute to, ▶ fitness. Finally, drift is the statistical
The former includes physiology, molecular biology, effect due to the fact that generation after generation
developmental biology, etc., whereas the latter includes a stochastic sampling of the population takes place.
paleontology, population genetics, behavioral ecology, The smaller the population, the higher the expected
systematics, etc. Evolutionary explanation is the set of amount of drift; whereas in an infinite population, selec-
explanatory styles to be met in this field (Ridley 2001). tion would be the sole causal factor, hence the fittest traits
or alleles would in general go to fixation.
Thus, evolutionary explanation is always
Characteristics a population level explanation. All factors are differen-
tial, so the whole process needs a population of varying
Explananda and Explanantia individual; it contrasts with an explanation which would
Evolutionary theory, as it was formulated by Darwin, consider an individual and explain its traits by investi-
holds two key ideas: all living forms share a common gating how they developed. To this extent, population
history and (as the naturalists hypothesized) a common level and individual level explanations are rather com-
origin; natural selection is a process responsible for many plementary than competing: “why does individual
of the features of this Tree, and of the species and organ- A have trait X?” can be answered by considering its
isms it includes. In the usual theory, called the “Modern history, whereas population level explanations answer
Synthesis,” namely, a synthesis of Darwinism and Men- questions such as: “why does each individual in the
delism on the basis of population genetics in the 1930s, population have trait X?,” or “why do many of them
evolution is related to the dynamics of gene frequencies. (or: a minority) have trait X?” (e.g., why are some people
Let us first specify what evolutionary explanations left-handed?)
are intended to explain. First, they should explain
evolution; then, they should explain the diversity of Two Evolutionary Questions
living beings; the adaptedness of organisms to their Evolutionary explanations often build models, analytic
environments – on the basis of the fact that they display or simulated, which on plausible assumptions describe
features which suit them to their environment, for the evolutionary dynamics at stake or the way it may
example, the webbed feet of ducks are suited to swim- have produced the explanandum; model-building and
ming. So they explain the traits themselves, which comparing with data are the two stages of the expla-
are adaptations. Given that the Modern Synthesis nation. ▶ Fitness and population size influence evolu-
has defined evolution as the “change in allelic tionary dynamics (as well as migration and mutation
frequencies,” a first explanandum is the change in the rates). Yet ▶ fitness is a probabilistic term, and has
relative frequency of genes. The existence of a trait in causes: all the interactions within which an organism
this context is explained when the fixation of a gene can engage, and into which the trait of interest is
underlying it is explained. involved, determine in the end the fitness value of the
The set of explanantia mostly used to explain those trait or of the organism. This entails a difference
explananda are natural selection, genetic drift, and “con- between two kinds of questions:
straints.” The last of these is loosely defined. It includes (a) How does selection proceed? With fitness as
“phylogenetics inertia,” that is, the fact that a same trait a variable, one can build a model describing the
in a population is indeed a character state of an ancestor dynamics of a population of alleles with specific
and did not happen through natural selection even if it is genetic structure (epistasis, pleiotropy, dominance,
adaptive. It also means “developmental constraints” (see etc.) and fitness values. This is what population
▶ Explanation, Developmental). Natural selection is the genetics does.
Explanation, Evolutionary 715 E
(b) Why is/was there selection? One can investigate the
causes of the fitness values, which implies consider-
ing the physiology of individuals and the ecology of
the population. Behavioral ecology, asking about the
▶ function of various traits – that is, the effect which E
led them to spread into the population and allows for
its maintenance – exemplifies such research pro-
gram, as well as paleontology.

In (a), population genetics explicates the process of


natural selection and shows which combinations of
S
t*
t E
fitness values, size, and mutation rates will plausibly
Explanation, Evolutionary, Fig. 1 Optimality model for
lead a trait to fixation. In this context, an evolutionary time t spent in patches when foraging (E is calorific intake, s is
explanation of the frequency of traits in a population the mean time spent to move between patches); t* is optimal
would start from equilibrium considerations: the because of the law of diminishing returns, which makes increasing
▶ Hardy-Weinberg law for two alleles with initial t less and less beneficial (After Charnov 1976)
frequencies p and q in a sexual panmictic population
defines the equilibrium frequencies that each allele selection, being the over-reproduction of the fittest,
would have with no selection in a panmictic infinite optimized the traits. Comparisons between species
isolated population. This is a null hypothesis for the are also used in addition, or independently.
system, like the inertia principle in mechanics (Sober
1984). Then, one compares the actual population to the Optimality Modeling
model; if there is deviation from this equilibrium, one If one can show how a trait makes the highest contri-
will hypothesize drift and selection as causes. Factor- bution to fitness among all the actual or possible var-
ing in the fitness values as they are measured in nature iants, then it is highly plausible that it results from
in the equations, will give predictions about the natural selection. Often, one picks up a proxy for
frequencies; differences with such frequencies in fitness, such as energy intake, chances of survival at
the actual population are therefore caused by drift some age, etc., not necessarily correlated to fitness in
(Gillespie 2004). In several cases, an explanation can a linear way. An evolutionary explanation would
prove that a trait is here because of selection, while not therefore model one of these values as a function of
being able to show what selection was for – because the trait, given some parameters representing the selec-
the causes of fitness or selection are unknown, ecolog- tive pressures in the environment (Fig. 1): for example,
ical context being insufficiently understood. the scarcity of resources, the chances of meeting
With question (b) one is interested in knowing why a predator, and so on. If the extant trait values in
a trait is there, whether it is by selection and for which various environments are the maxima of the function,
selective pressures. The pervasiveness of altruism (i.e., this is evidence for the trait being selected for meeting
behaviors increasing other’s fitness at the cost of one’s those environmental demands. The optimal value for
fitness), whereas selfishness by definition seems one selective pressure may not be the same as the
privileged by natural selection, compelled biologists optimum for another; hence the result of natural selec-
to consider selection at the level of genes, where altru- tion should be a trade-off between those demands,
ism toward kin appear evolutionarily favored since the for example, foraging and looking for mates. As
▶ relatedness of the beneficiary of the act mitigates the a methodology, this ▶ adaptationism is extensively
cost of altruism. Thus “kin selection” (West et al. used in behavioral ecology. If reality does not match
2007), together with ▶ sexual selection, is another an optimality model, one can say either that the selec-
form of natural selection to which evolutionary expla- tive pressures have not been correctly identified, or that
nations must appeal. genetic structure, amount of genetic variation, drift, or
Several kinds of explanations are therefore possi- other factors affect the evolutionary process, so that the
ble, which need not take the genetic makeup into models only show what the trait would be if selection
account, and which involve basically the idea that were acting alone.
E 716 Explanation, Evolutionary

Game-Theoretic Modeling Experimental tests may be used, for instance,


Often, the fitness of traits depends on the frequency changing the trait value enables one to check whether
of traits-carriers in a population: for example, selective pressures exist which will reestablish the
a camouflage is all the more efficient when it is not original value of the trait – a strong argument for its
so frequent. Frequency dependence is pervasive when being maintained by selection (Williams 1966).
it comes to social interactions, where the payoffs of the A complete explanation of a trait therefore relies on
possible traits depend upon what the others will do, so ecological knowledge (of the environment), phyloge-
that the evolutionary dynamics at one moment in turn netic knowledge (for comparisons), and genetics (the
depends upon the chances of meeting an interactor of underlying genetic makeup, and the heritability of
a given type (e.g., altruist, selfish, aggressive, etc.). traits) (Brandon 1990).
Evolutionary explanation of social behavior therefore
relies on Evolutionary Stable Strategy models Formal Justifications
(Maynard-Smith 1982), an ESS being a strategy such Evolutionary biology is a historical science – hence
that if all the population adopts it, it is not vulnerable to acknowledging an important role for contingency – but
invasions. Often, ESS is a mixed strategy (e.g., coop- heavily relies on mathematical modeling (e.g.,
erate at a chance 0.7, defect at a chance 0.3). The ▶ Hamilton’s rule for the evolution of altruism). Adap-
distribution of traits in a population matching what an tive dynamics provide for those models a “canonical
ESS model would predict is clear evidence that the equation” of change of fitnesses, yet the equation is
maintenance of those traits results from natural selec- often not solvable (Metz 2008). Many equations of the
tion. Often, ESS models are devised in the frameworks population genetics models can be derived from the
of ▶ replicator dynamics. Recently, “adaptive dynam- ▶ Price equation. Other models of natural selection are
ics” (Metz 2008) has provided a specific way of model- stated in terms of ▶ replicator dynamics (especially
ing evolutionary dynamics on the basis of “invasion with ESS). Many models are indeed simulated rather
fitness,” that is, the probability of each mutant (individ- than analytically solved.
ual or species) to invade a population, therefore synthe- Optimization as an assumption is supposedly
sizing population genetics, behavioral, and community derived from Fisher’s fundamental theorem of natural
ecology. selection, which states that the mean fitness of
Those explanations concern the maintenance of a population increases equally to the genetic additive
traits, rather than their origin; ESS models are built variance (by nature always positive). Fisher’s result
on the assumption of a strategy set but the evolutionary has been discussed and is indeed very restricted,
origins of strategies do not matter. Explaining origins concerning only the evolutionary change in fitness
of traits is another evolutionary question, which one due to the direct action of natural selection (Frank
undertakes, for example, in paleontology, where the and Slatkin 1992). It can be derived from the
environment of the organisms is not known. In such Price equation, when the value z considered is the
context, an available method is reverse engineering: fitness itself. Optimality assumptions of behavioral
given the traits and their mechanisms, reconstruct the ecologists seem at odds with results of population
environmental demands to which they may have genetics emphasizing the large restrictions on
responded. optimization (especially, in cases of frequency-
dependence selection, e.g., social interactions).
Comparisons However, there is now (Grafen 2007) a formal
Comparing a trait in an environment with one in a sister proof of an isomorphism between population genet-
or ancestor species, and whose origin is already known, ics models of natural selection dynamics and opti-
allows one to check whether it is simply inherited, or mization, which provides a mathematical basis to
whether it is an adaptation for analogous selective pres- the implicit pervasive intuition that natural selec-
sures (Endler 1986). Those comparisons may justify tion is an optimizing process. The maximand of
a claim that some trait results from selection, but also such a process is inclusive ▶ fitness. Therefore,
supplement other evolutionary models, for example, in optimality modeling and ESS receive a formal
order to exclude rival nonselective hypotheses. Compar- justification from the mathematics of allele
ative methods are therefore pervasive. frequencies.
Explanation, Functional 717 E
Cross-References Definition

▶ Adaptation Functional explanations explain why certain organisms


▶ Fitness have certain traits rather than some conceivable alterna-
▶ Function tives by appealing to the advantages for those organisms
▶ Price Equation of having those traits rather than the alternatives.
▶ Relatedness
▶ Replicator Dynamics
▶ Sexual Selection Characteristics

The term “functional explanation” is sometimes used in E


References a very wide sense, meaning any explanation in functional
biology or even any explanation that refers to functions
Brandon R (1990) Adaptation and environment. MIT Press, in any sense of “function” (▶ Function, Biological). This
Cambridge
essay discusses functional explanations in a narrower
Charnov EL (1976) Optimal foraging: the marginal value
theorem. Theor Popul Biol 9:129–136 sense, namely, reasonings that purport to explain why
Endler J (1986) Natural selection in the wild. Princeton Univer- certain organisms have a certain trait by elucidating why
sity Press, Princeton that trait is advantageous to or needed by those organ-
Frank S, Slatkin M (1992) Fisher’s fundamental theorem of
isms. An example is the well-known explanation of why
natural selection. TREE 7(3):92–95
Gillespie J (2004) Population genetics – a concise guide. The many organisms have a circulatory system by Nobel
John Hopkins University Press, Baltimore Prize winner August Krogh in 1919 (see Krogh 1941).
Grafen A (2007) The formal Darwinism project: a mid-term Applying Fick’s law of diffusion, Krogh calculated that
report. J Evol Biol 20:1243–1254
diffusion (passive transport) is too slow to provide the
Maynard-Smith J (1982) Evolution and the theory of games.
Cambridge University Press, Cambridge inner organs with oxygen at the rate needed to stay alive
Metz JAJ (2008) Fitness. In: Jørgensen S, Fath B (eds) Evolu- if the distance between the inner cells and the outside is
tionary ecology. Encyclopedia of ecology II. Elsevier, more than 0.5 mm. The presence of a circulatory system
Oxford, pp 1599–1612
solves this problem by providing an additional and faster
Ridley M (2001) Evolution. Oxford University Press,
Oxford means of transport. To avoid terminological confusion,
Sober E (1984) The nature of selection. MIT Press, philosopher Arno Wouters introduced the terms “viabil-
Cambridge ity explanation” (Wouters 1995) and “design explana-
West S, Griffin A, Gardner A (2007) Social semantics: altruism,
cooperation, mutualism, strong reciprocity and group selec-
tion” (e.g., Wouters 2007) to refer to functional
tion. J Evol Biol 20:415–432 explanations in this sense. Behavioral biologists and
Williams GC (1966) Adaptation and natural selection. Princeton evolutionary biologists often call them “ecological
University Press, Princeton explanations.”
Functional explanations are part and parcel of
biology, but they are not considered legitimate in other
natural sciences such as physics and chemistry. Reduc-
tionist and structuralist schools in biology tend to reject
Explanation, Functional functional approaches in biology because they would
rest on illegitimate teleological assumptions. Another
Arno Wouters worry concerns the possibility to provide evidence for
Department of Philosophy, Erasmus University counterfactual claims concerning what would happen if
Rotterdam, Rotterdam, The Netherlands an organism lacked the trait to be explained.
Functional explanations are concerned with the
way in which the different traits of a living
Synonyms system (organism) functionally depend on each other.
Functional explanations point out that because an organ-
Design explanation; Ecological explanation; Viability ism has certain traits (in the example of Krogh: a certain
explanation size and a certain level of activity), its ability to maintain
E 718 Explanation, Functional

the living state would diminish if the trait to be explained the rate of oxygen delivery functionally depend on
(the presence of a circulatory system) was replaced active transport (follows from Fick’s law of diffusion,
by a specific alternative (no active transport). The a well-known law of physical chemistry), (b) the rate
explaining traits are functionally dependent on the trait of oxygen consumption functionally depends on the
to be explained in the sense that the ability to maintain rate of oxygen delivery (follows from the obvious
the living state of an organism with the explaining traits invariance that the rate of consumption cannot be higher
would diminish if the trait to be explained was replaced than the rate of delivery), and (c) the rate of activity of the
by the alternative, whereas replacing the trait to be organism functionally depends on the rate of oxygen
explained would not make much difference if the organ- consumption (follows from a well-established invariant
ism lacked the explaining traits. For example, having relation (very roughly: the more active the organism, the
a certain size and activity is functionally dependent on more oxygen it consumes) that results from the way in
the presence of active transport because removal of which those organisms generate the energy needed for
active transport would diminish the life chances of their activities).
organisms with that size and activity, whereas the Several types of evidence support or undermine
replacement would have little effect on that organism’s claims about the existence of a relation of functional
capacity to maintain itself if it were small enough and not dependence. Comparisons provide an important clue. It
very active. is necessary that all organisms with the dependent traits
Functional dependence is a synchronic relation at have the trait to be explained or a functional equivalent,
the level of the individual (the size and activity of an but the support provided by comparisons for the conclu-
organism are dependent on that organism having sion that a certain trait is dependent on others is rather
a system of active transport). The relation is not weak. Experimental manipulation provides better evi-
of a causal nature (see ▶ Causality), but rather dence. By producing organisms that are in relevant
a ▶ constraint on what can be alive: our universe is aspects similar to the real organism, except that the trait
such that living systems of a certain size and activity to be explained is replaced by an alternative, one gets
cannot exist without mechanisms for active transport. good evidence for the advantageousness of that trait. In
There may, of course, exist causal relations between order to provide evidence for or against the thesis that
the trait to be explained and its dependents (perhaps the the need is due to the presence of the dependent traits,
transport system was maintained in the lineage because the dependent traits should be modified too. Techniques
variants with a less well-developed transport system of genetic manipulation have greatly increased the possi-
were less active and for that reason eliminated bilities to provide this kind of evidence. The explanation
by natural selection or perhaps the activity of the of a functional dependency on the basis of invariances is
developing organism influences the development of strong evidence for the existence of that dependency,
a transport system in the course of the ontogeny), but provided that both the invariances and the assumptions
the functional dependence exists independent of the on which the explanation rests are well established.
history of the organism and hence independent of Finally, the development of simulation models in systems
these causal relations (it would exist also if the same biology offers a new and powerful way to provide evi-
organism arose out of different ontogenetic and dence for or against functional claims because they allow
evolutionary processes). for precise modifications of large numbers of relevant and
Functional explanations not merely identify the traits potentially relevant variables in silico.
that are functionally dependent on the trait to be We can now see that functional explanations do not
explained, but they also account for the dependency assume ▶ teleology. Functional explanations are
itself. This is done by relating the dependency to invari- concerned with what is needed or advantageous for
ant relations that result from the way the organism is staying alive, but they do not tell us how the
wired and from more general invariant relations that required traits are brought about and, in fact, make no
scientists call “laws.” Such an explanation often intro- assumptions at all about how or why living systems and
duces a structure of functional dependencies intermedi- their traits come into being. For that very reason, they do
ate between the trait to be explained and the dependent not assume that traits are brought about because of their
traits. For example, the structure of functional dependen- advantages, in order to fit the requirements or because
cies in Krogh’s explanation is as follows: (a) the size and something or someone had a certain goal.
Explanation, Reductive 719 E
Functional explanations are, nevertheless, crucial evidence about the variants that occur in the
to understand life because they show us how the population, the occurrence of selection, the heritability
characteristics of an organism fit into the requirements of the trait, the structure of the population, and
for being alive. Living systems exist far from phylogenetic polarity, in addition to evidence about the
thermodynamic equilibrium and can exist only because advantageousness of the trait) (see ▶ Explanation, Evo-
they are able to maintain themselves actively (i.e., by lutionary). It is also unfortunate that this way of
using energy). Although there are many ways to stay presenting functional explanations has led some critics
alive, not all combinations of matter will do. The ability to reject functional explanations as mere speculation,
of a system to maintain the living state is critically depen- thus ignoring the valid insights provided by functional
dent on its organization, that is, on the composition, explanations together with the unsubstantiated evolu-
character, and arrangement of its parts and the order and tionary conclusions drawn from them. E
timing of its activities. Causal explanations can tell us
how a certain form of organization came into being and
how that form of organization brings about the ability to Cross-References
stay alive, but not why certain forms of organization are
able to stay alive and others not nor why certain forms of ▶ Causality
organization are better in staying alive than others. This is ▶ Constraint
the kind of understanding provided by functional expla- ▶ Explanation in Biology
nations, and this is why a combination of functional and ▶ Explanation, Evolutionary
causal explanations is needed to understand life (see ▶ Function, Biological
▶ Explanation in Biology). ▶ Teleology
The failure to understand the noncausal nature of
functional explanations and the failure to understand
how noncausal explanations can contribute to scientific References
understanding have instigated or strengthened many
misconceptions. If it is, erroneously, assumed that all Krogh A (1941) The comparative physiology of respiratory
explanations purport to explain how the phenomenon mechanisms. University of Pennsylvania Press, Philadelphia
Wouters AG (1995) Viability explanation. Biol Philos
to be explained was brought about, one might easily,
10(4):435–457
but erroneously, take functional explanations as resting Wouters AG (2007) Design explanation: determining the
on the teleological assumption that traits of organisms constraints on what can be alive. Erkenntnis 67(1):65–80
are brought about because they perform a certain func-
tion. In response to this misconception, one might reject
functional explanations, erroneously, as illegitimately
teleological, as both structuralist and reductionist Explanation, Reductive
biologists tend to do, or look for a place for teleology
within the Darwinian theory of evolution, as many nat- Marie I. Kaiser
uralistic philosophers tend to do. The idea that explana- Department of Philosophy, University of Cologne,
tions should explain how the trait to be explained is Cologne, Germany
brought about might also lie behind the tendency to
present the conclusion of functional explanations in evo-
lutionary terms. Several books advertise a functional Synonyms
approach as an evolutionary one, and an increasing num-
ber of articles present functional explanations as showing Explanatory reduction
that a certain trait “evolved as an adaptation for” or “was
selected for” some advantage, need, or dependent trait.
This is unfortunate not only because of the teleological Definition
odor of this kind of conclusion but also (and more
importantly) because it suggests much more than the Reductive explanations (or explanatory ▶ reduction)
evidence allows (claims about selection require are a special kind of scientific ▶ explanation. It has
E 720 Explanatory Reduction

been argued that three central features distinguish reduc- Kaiser MI (2011) Limits of reductionism in the life sciences.
tive explanations in biology from non-reductive ones Hist Philos Life Sci 33:389–396
Sarkar S (1998) Genetics and reductionism. Cambridge Univer-
(Kaiser 2011; for other analyses see Sarkar 1998; sity Press, Cambridge
H€uttemann and Love 2011). The first and third are nec-
essary conditions for a biological explanation to be
reductive, whereas the second is only a feature that is
typical for most (but not for all) reductive explanations in
biology. Explanatory Reduction
In reductive explanations, the behavior of
a biological system S is explained ▶ Explanation, Reductive
1. by referring only to factors that are located on
a lower level of organization than S,
2. by focusing on factors that are internal to S (i.e.,
that are genuine parts of S), Exploratory Experimentation
3. by describing parts of S only as being parts in
isolation (i.e., by appealing only to those relational Richard M. Burian
properties of and interactions between parts that can Department of Philosophy, Virginia Tech, Blacksburg,
be studied in other contexts than in situ). VA, USA
For instance, the contraction of a muscle
fiber is explained by appealing to certain molecules
(e.g., actin, myosin, calcium ions), cell organelles Definition
(e.g., sarcoplasmic reticulum), and by describing the
interactions between them. This explanation possesses Experiments count as exploratory when the con-
a reductive character because all factors mentioned in the cepts or categories in terms of which results should
explanation are internal to the muscle fiber (condition 2). be understood are not obvious, the experimental
External factors like the incoming neuronal signal are, if methods and instruments for answering the ques-
mentioned at all, simplified as being mere input- or tions are uncertain, or it is necessary first to estab-
background conditions. Furthermore, all explanatorily lish relevant factual correlations in order to
relevant factors (i.e., actin and myosin molecules, cal- characterize the phenomena of a domain and the
cium ions, sarcoplasmic reticulum) are located on regularities that require (perhaps causal) explana-
a lower level of organization than the muscle fiber itself tion. Elliot (2007, 322 ff.) offers a useful taxonomy
(condition 1). Finally, only those interactions between of exploratory experimentation. Rather than testing
the muscle fiber’s parts are represented in the explana- hypotheses; it varies parameters or circumstances to
tion that can be investigated by taking the parts out of the see what will happen; it utilizes background knowl-
original system (i.e., the muscle fiber) and studying their edge (including theories and experimental lore) to
properties in other contexts than in situ (condition 3). establish novel correlations, follow anomalies, seek
improvements in instrumentation and experimental
protocols, and the like; and it employs a variety of
Cross-References systematic strategies to govern appropriate variation
of parameters and appropriate orientation to the
▶ Explanation in Biology primary questions in the background. Thus, some
▶ Reduction of the investigations well suited for exploratory
experimentation
• Aim at developing new experimental technologies
for use in data-intensive research
References • Seek to track causal interactions that cross hierar-
chical levels
H€uttemann A, Love AC (2011) Aspects of reductive explanation
in biological science: intrinsicality, fundamentality, and tem- • Deal with cases in which the state space of relevant
porality. Br J Philos Sci 62:519–549 variables is not well delimited
Exploratory Experimentation 721 E
• Aim at developing new concepts for “fixing phe- Of course, studies of experimental methods and
nomena” calling for explanation by establishing practices always recognized that experiments may
relevant regularities or classifying relevant entities serve other purposes – e.g., to help design or calibrate
(e.g., regulatory RNAs, compounds that act as instruments, standardize reagents, refine key measure-
endocrine disruptors, and protein networks) that ments, establish correlations between changes in
have not yet been well characterized experimentally controlled parameters and dependent
• Seek to resolving anomalies, especially those aris- variables (▶ Experiment). Ian Hacking (1983), helped
ing across disciplinary boundaries in the handling spark a philosophical reexamination of experimental
of complex systems practice in its own right and reinforced recognition
Exploratory experimentation is often a key compo- within philosophy of science that experiment “has
nent in investigations of the importance of potentially a life of its own” and is not simply a handmaiden of E
relevant variables, parameters, and sources of error. theory. Philosophers supporting the “new experimen-
It also facilitates the discovery of unrecognized pat- talism” generally agree that experimental protocols
terns and regularities (▶ Pattern Mining) and the can be freed from pernicious dependence on the
design of instrumentation for investigating open ques- hypotheses or theories under investigation (Arabatzis
tions and for analyzing complex phenomena, particu- 2008; Hacking 1992; Weber 2004) and that experi-
larly when new or revised concepts and classifications ments serving exploratory and inductive purposes
are needed to help explain novel phenomena. It may serve different purposes and should meet different
help to establish system boundaries and resolve anom- evaluative criteria than those designed for hypothesis
alies that have turned up in previous investigations testing.
(Elliott 2007).
Conditions that Exploratory Experiments Should
Meet
Characteristics Several major steps are required for success in explor-
atory experiments. These include:
Philosophical Background • Establishing conditions in which experimental
Until about 1980, mainstream philosophers of science findings are stable (Hacking 1992) and/or system
focused mainly on hypotheses, theories, and their jus- behavior robust to various perturbations (Ferrell
tification rather than discovery and hypothesis genera- 2009)
tion, holding that the latter can neither be analyzed • Establishing which downstream effects follow on
philosophically nor made subject to philosophical small changes in the location, timing, or order of
norms. They paid far more attention to the events
hypothetico-deductive method as opposed to inductive • Iteratively adjusting experimental conditions
methods for inference from data to hypotheses: if jus- and technologies to reduce noise in experimental
tification depends on deduction from true or justified findings
premises, inductive inferences (which are deductively • Rethinking the identities of the “players” (boundary
invalid!) are suspect and, arguably, even illegitimate conditions, entities, mechanisms, parameters,
(▶ Deduction; ▶ Induction). This centrality of theory processes, and unknowns) relevant to the findings
and of hypothetico-deductive methodology is nicely • Iteratively deploying revised or new simulations
exemplified in the philosophy of Sir Karl Popper, and computational and theoretical models of the
who held that science is composed of conjectures experimental systems and situations
from which testable hypotheses are deduced with the These steps are intensively experimental. Satisfactory
aid of further hypotheses (e.g., about boundary condi- results are dependent on
tions, experimental setups, and the behavior of exper- • Careful tuning of experimental techniques and skills
imental instruments): • Integration of appropriate calculational tools and
theoretical commitments
The theoretician puts certain definite questions to the
experimenter, and the latter, by his experiments tries to
• Reworking of the experimental conditions, proto-
elicit a decisive answer to these questions and to no cols, instrumentation, and analytical categories in
others. . . (Popper 1968) terms of which the experiments are described
E 722 Exploratory Experimentation

• Iterative reexamination of the scope of tentative find- to cope with the high dimensionality of the interrelation-
ings and the applicability of the techniques and con- ships involved. Such work often requires innovative
cepts to additional research problems and systems statistical and computational tools as well as significant
• Interactive stabilization of the protocols, descrip- adjustment of terminologies and conceptual apparatus
tive terminology, and experimental findings, with from different disciplines and organisms. A common
iterative procedures for refining the experimental issue in analyzing pathway and networks is, therefore,
apparatus, the calculational tools, and of the ana- managing and reducing the semantic difficulties arising
lyses of patterns in the phenomena and causal path- from differing disciplinary commitments.
ways at issue (Kelder et al. 2010) An important type of investigation drawing on
exploratory experimentation is the interactive revision of
classifications and analyses to discover empirically mean-
Exploratory Experimentation in Systems Biology ingful patterns that can facilitate the production of coher-
Because systems biology is concerned with establishing ent and reliably reproducible results, transportable from
the strength and scope of interactions across many one problem or database to another. Thus the recognition
scales in complex systems that often have partially of novel patterns exhibited in data and/or turned up while
specified architectures and boundaries, exploratory developing novel explanatory concepts is a major purpose
experimentation typically serves different purposes of exploratory experimentation. Pattern recognition of this
and employs different modalities than conventional sort encounters many obstacles in the interdisciplinary
hypothesis testing (e.g., high-throughput technologies approaches to complex problems now prevalent in sys-
and data-intensive research with high levels of noise) tems biology and beyond. An extreme example illustrates
(Franklin 2005; O’Malley 2007). Recent debates about the difficulty: consider recent debates about the (partly
the structure and value of “data-driven research” and anthropogenic) causes and effects of climate change in
“discovery science” are therefore relevant (▶ Data- relation to ecological change and the spread of various
Intensive Research; ▶ Data Mining; ▶ De Novo Com- pathological organisms. The resolution of the issues about
putational Discovery of Motifs; ▶ Functional/Signature which patterns are salient and involve anthropogenic
Network Module for Target Pathway/Gene Discovery; causes must reconcile perspectives from biogeography,
▶ Rule Discovery). Furthermore, much research in sys- biological systematics, ecology, economics, epidemiol-
tems biology is comparative and classificatory, e.g., ogy, genomics, meteorology, molecular biology, paleocli-
comparing the distinctive functions of genes or proteins matology, pathology, and many other disciplines. It is
in different systems, different developmental stages of hardly a surprise that the integration and interpretation of
a developing system, or the networks and pathways of data to attempt a systematic approach to some of the issue
different systems. Such comparisons often require use of involved is, by itself, an enormously difficult problem.
data from multiple databases and disciplines. Ensuring Problems of this sort, though not always as dramatic, arise
robustness of findings and comparability of data from often enough when exploratory experimentation is
diverse sources is often problematic, fraught with the employed in systems biology.
difficulties of developing or meeting the constraints of
curated databases and/or standardized “ontologies” such Conclusion
as the Gene Ontology (▶ Bio-Ontologies; ▶ Disease The most important points of this entry are the distinc-
Ontology; ▶ Gene Ontology; ▶ Ontology Analysis of tiveness of exploratory experimentation and the inade-
Biological Networks; ▶ Ontology Structure). quacy of the characterizations it has received in standard
Again, the relationship between high-throughput accounts of experimental methods. Philosophers and
exploratory experiments and well-controlled hypothesis- methodologists are only beginning to develop models
testing experiments is crucial to evaluation of systems of exploratory experimentation. There has been little
biological experiments. Recently, protocols have been appreciation of the relatively indirect role of hypothesis
developed that combine the two with means of calibrating testing in exploratory experimentation. For example,
findings (e.g., Pfeifer et al. 2010); the resulting mutual standard reviews of experiment in systems biology
reinforcement is potentially very powerful. Often (as (e.g., Kreutz and Timmer 2009) retain strong rhetoric
illustrated by Pfeifer et al.), data from several sources about the need to specify exact hypotheses even when
need to be reconfigured to make the data comparable and they concentrate mainly on issues described here as
Exposure 723 E
pertinent to exploratory experiments. Unlike hypothesis ▶ Gene Ontology
testing, exploratory experiments do not begin with a null ▶ Induction
model and are not able to specify in detail the expected ▶ Ontology Analysis of Biological Networks
outcome of key experiments. They must focus, instead, ▶ Ontology Structure
on reducing noise in the data, on how best to perturb ▶ Pattern Mining
systems to uncover relevant parameters or the triggers of ▶ Rule Discovery
threshold transitions, and on ensuring robustness of
findings. Exploratory experiments do, however, have
a strong basis in (theoretical) background knowledge of References
relevant mechanisms and comparative knowledge of
related systems. One analytical approach of great interest Arabatzis T (2008) Experiment. In: Psillos S, Curd M (eds) The E
Routledge companion to the philosophy of science.
compares the ways in which exploratory experiments
Routledge, London/New York, pp 159–170
utilize complex data sets and revision of concepts and Elliott KC (2007) Varieties of exploratory experimentation in
classifications with the methods utilized by natural his- nanotoxicology. Hist Philos Life Sci 29(3):313–336
tory in dealing with complex data, revision of concepts, Ferrell JE Jr (2009) What is systems biology? J Biol 8:2
Franklin LR (2005) Exploratory experiments. Philos Sci
and revision of classifications (Strasser 2010). The point
72:888–899
is partly that the highly experimental work discussed Hacking I (1983) Representing and intervening: introductory
above seeks to understanding of complex interactions topics in the philosophy of natural science. Cambridge Uni-
in natural conditions among “natural assemblages” of versity Press, Cambridge
Hacking I (1992) The self-vindication of the laboratory sciences.
molecules, genomes, organisms, and a welter of
In: Pickering A (ed) Science as practice and culture. Univer-
interacting mechanisms, “players” and processes – and sity of Chicago Press, Chicago, pp 29–64
this is quite like what is done in natural history except Kelder T, Conklin BR, Evelo CT, Pico AR (2010) Finding the right
that it is in carried out in highly experimental contexts. questions: exploratory pathway analysis to enhance biological
discovery in large datasets. PLoS Biol 8(8):e1000472
Natural historical research is often characterized as
Kreutz C, Timmer J (2009) Systems biology: experimental
“descriptive.” Fair enough. But it should be recognized design. FEBS J 276:923–942
that descriptive findings are equally essential to inten- O’Malley MA (2007) Exploratory experimentation and scien-
sively experimental laboratory research required to tific practice: metagenomics and the proteorhodopsin case.
Hist Philos Life Sci 29(3):337–360
characterize the systems and the processes of interest in
Pfeifer AC, Kaschek D, Bachmann J, Klingm€ uller U, Timmer J
systems biology. This is why broadly inductive uses (2010) Model-based extension of high-throughput to high-
of exploratory experimentation and natural-historical content data. BMC Syst Biol 4:106
methods employing high-throughput technologies are Popper KR (1968) The logic of scientific discovery. Harper and
Row, New York
so central in systems biology. And it is why iterative
Strasser BJ (2010) Laboratories, museums, and the comparative
interaction between the methods of exploratory experi- perspective: Alan A. Boyden’s serological taxonomy,
mentation and those of hypothesis testing is crucial to the 1925–1962. Hist Stud Nat Sci 40(2):149–182
ongoing development of systems biology (Kelder et al. Weber M (2004) Philosophy of experimental biology.
Cambridge University Press, Cambridge and New York
2010).

Cross-References Exposure

▶ Bio-Ontologies Catherine M. Lloyd


▶ Data Mining Auckland Bioengineering Institute, University of
▶ Data-Intensive Research Auckland, Auckland, New Zealand
▶ De Novo Computational Discovery of Motifs
▶ Deduction
▶ Disease Ontology Definition
▶ Experiment
▶ Functional/Signature Network Module for Target An exposure in the CellML model repository refers to
Pathway/Gene Discovery the published view of a specific revision of a CellML
E 724 Expression Profiling

model. It can be regarded as the “display page” of that Definition


particular version of a model. It is important to note
that a model file can exist in a workspace without being A Gaussian Image is the mapping of all the normals of
exposed. That model is still accessible if it has been an object on a sphere of unitary radius (Gaussian
made publicly available; in such a case, it has to be Image): all the tails of the vectors are on the center of
downloaded directly from the workspace. the sphere while the heads lie on the surface. The
Extended Gaussian Images (EGI) (Horn 1984) include
Cross-References the associate area: each point on the sphere has
a mass proportional to the area (Fig. 1). The EGI
▶ CellML Model Repository is exploited for characterizing the objects shape.
Herman Minkowski in 1910 demonstrated that
a convex polyhedron is fully described by the area
and orientation of its faces, that is, if two different
Expression Profiling objects are convex, their EGI representations are
necessarily different.
▶ Cell Cycle Analysis, Expression Profiling The EGI can be easily built from needle or depth
maps generated by range or stereo devices and, most
importantly, they can also be determined from the 3D
Expression Signature model of an object.
The EGIs can be used in machine vision also
▶ Functional/Signature Network Module for Target for determining the attitude in space of an object.
Pathway/Gene Discovery The EGI is particularly suited to deal with the
varying attitude of a 3D object in space: it is
invariant to the object position and, in case of
convex objects, there is a bijective correspondence
Extended Gaussian Image
with their EGI.
The basic idea is that each surface patch is mapped
Virginio Cantoni, Alessandro Gaggia and Luca
to a weighted point on the Gaussian sphere according
Lombardi
to its surface normal and the value assigned to each
Department of Computer Engineering and Systems
orientation is the sum of area of all the surface patches
Science, University of Pavia, Pavia, Italy
having a common normal (Eq. 1, Fig. 1).

Synonyms X
Nd
!

W!d ¼ Al;!d (1)


Orientation histogram l¼1

Extended Gaussian Image,


Fig. 1 Extended Gaussian
image
Extended Gaussian Image for Pocket-Ligand Matching 725 E
Extended Gaussian Image for
Pocket-Ligand Matching

Virginio Cantoni1,2, Alessandro Gaggia1 and


Luca Lombardi1
1
Department of Computer Engineering and Systems
Science, University of Pavia, Pavia, Italy
2
Computational Biology, KTH Royal Institute of
Technology, Stockholm, Sweden
E

Synonyms

Orientations histogram

Definition

The ▶ Extended Gaussian Image (EGI) (Horn 1984)


can be useful for a preliminary evaluation of pocket-
Extended Gaussian Image, Fig. 2 EGI with a triangular ligand matching likelihood. This is supported by the
tessellation
effectiveness in representing object mainly convex
(i.e., a pocket and a ligand). The analysis, which con-
where d~ is the direction associated with a point on the sists of a kind of necessary condition for matching,
Gaussian sphere, N d~ the total number of surface follows a sequence of three steps:
patches with normal d~, and Al;d~ the area of the lth 1. The solvent-excluded surface (SES) is analyzed and
surface patch with normal d. ~ a number of pockets and tunnels sites are identified
For a useful computer representation the Gaussian (Fig. 1).
sphere must be discretized, usually adopting 2. The candidate binding sites are detected through
a triangular tessellation, and this is called geodesic a structural matching of pockets and ligand, both
dome. Starting with a regular polyhedron (e.g., the represented through a suitable EGI modality.
Icosahedron), for a more detailed description each 3. The loci of compatible positions of the ligand are
triangle is split (iteratively) into four smaller triangles identified through standard tools of ▶ mathematical
(Fig. 2). This representation is usually called orienta- morphology operators.
tion histogram.

Characteristics
Cross-References
The aim, for docking applications, is the search for
▶ Extended Gaussian Image for Pocket-Ligand subregions that are complementary between different
Matching molecules. When we have a large molecule (receptor)
and a small molecule (▶ Ligand), docking takes place
in a protein cavity. In this connection the first subprob-
References lem to be solved, in protein-ligand interfaces, is to
develop the representations and the data structures
Hermann M (1910) Geometrie der Zahlen, Leipzig and Berlin:
suitable to support the computational methods that
R. G. Teubner, MR0249269
Horn BKP (1984) Extended Gaussian images. Proc IEEE allow a quantitative evaluation of the protein-ligand
72:1671–1686 matching on the basis mainly of their 3D structure and
E 726 Extended Gaussian Image for Pocket-Ligand Matching

morphology. The EGI is a possible representation for A given 3D molecule, modeled through its SES in
the ligand molecule, with the correspondent data a triangular mesh, is described by the set of triangles:
structure based on a first-order statistic of the surface
orientations. After the segmentation of the protein SES T ¼ fT 1 ; :::; T m g; T l R3 (1)
(Cantoni et al. 2010), the interface regions, which
potentially can be active sites, are represented by an where each Tl consists of a set of three vertices (2):
EGI (Fig. 2).  
Tl ¼ PA;l ; PB;l ; PC;l (2)

Center, normal, and area of each triangle Tl, namely


gl, d~l and Al, respectively, can be computed by (3), (4),
and (5):

gl ¼ ðPA;l þ PB;l þ PC;l Þ=3 (3)

d~l ¼ ðPC;l  PA;l Þ  ðPB;l  PC;l Þ (4)

Al ¼ jðPC;l  PA;l Þ  ðPB;l  PC;l Þj=2 (5)

The total area of the mesh A is given by cumulating


the area of each single triangle (6):
Xm
A¼ l¼1
Al (6)

where the Gaussian sphere is partitioned into a number


of cells m.
Then all the triangles T of the target molecule are
mapped onto the corresponding cells on the basis of the
discretized orientation d~l . Dealing with pockets and
external molecules that must match the cavity surface,
Extended Gaussian Image for Pocket-Ligand Matching, the approximation that the small target molecule is
Fig. 1 Pockets and tunnels extracted from a test protein mainly convex (at least for the segment involving the

Extended Gaussian Image


for Pocket-Ligand
Matching,
Fig. 2 (a) Wireframe surface
representation of a ligand.
(b) The correspondent EGI
representation
Extended Gaussian Image for Pocket-Ligand Matching 727 E
matching of the pocket concavity) is usually suitable. • The Minkowski distance:
Nevertheless, if the concave parts of the target mole-
cules become critical other approaches that reduce the qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
Xm p
M¼ j Al;cav  Al;lig j
p

known ambiguities of the EGI representation must be l¼1


(8)
followed. In this connection new representations such
as the complex EGI or the enriched complex EGI in for p ¼ 1 and p ¼ 2 we obtain the Manhattan and the
(▶ Complex EGI and Enriched Complex EGI) (Hu Euclidean distances respectively
et al. 2010) even if do not eliminate the ambiguities • The Bray Curtis distance:
certainly in practical cases results very effective. Here
Pm
the ECEGI solution is considered. If the object is j Al;cav  Al;lig j
B ¼ Pl¼1
m (9) E
l¼1 j Al;cav þ Al;lig j
convex, the mass center of the three Wx, Wy, and Wz
distributions on the Gaussian sphere coincides with the
center of the sphere. In fact, this is true also for the EGI obviously 0  B  1
(and for the CEGI), as by (7): • The Hausdorff distance:
X X X   
d
j W x;d~ jd~ ¼ d
j W y;d~ jd~ ¼ d
j W z;d~ jd~ H ¼max max8l Al;cav min8l Al;lig ; max8l Al;lig min8l Al;cav 
X (10)
¼ d
Ad~ d~ ¼ 0 (7)
• Another distance that can be defined for the geodesic
Since for a convex object |Wx,d| ¼ Wy,d| ¼ Wz,d| ¼ dome is E=n/m where n is the number of triangles
Ad, being Ad the area of the surface patch with normal satisfying this threshold criteria: (11) and m is the
~ A tessellated sphere with uniform and isotropic
d. total number of triangles of the considered mesh.
subdivision is needed. These properties are obviously " ! #m
satisfied by the projection of a regular polyhedron onto jAl;cav Al;lig j [
the sphere. Adopting the highest order regular polyhe- ry max Al;cav ;Al;lig ¼ 0
max Al;cav ;Al;lig
l¼1
dron, the icosahedron with 20 triangular cells as a basis
(11)
(but it provides a too coarse sampling of the orienta-
tions), and proceeding further by dividing iteratively the distance is given by E ¼ n/m, i.e., the percentage of
each triangular cells into four smaller triangles the triangles satisfying the threshold criteria.
according to the well-known geodesic dome (Horn
1984) constructions, the required level of resolution
can be achieved: being n the number of iterative sub- Cross-References
division steps, the cells number is m ¼ 10*22n+1, and
the area (solid angle) of the single cells is p/10*22n1 ▶ Complex EGI and Enriched Complex EGI
respectively. The corresponding data structure is con- ▶ Extended Gaussian Image
sequently a hierarchical one (in which each cell of one ▶ Ligand
level contains, other than the specific orientation, the ▶ Mathematical Morphology Operators
four pointers to cells of the subsequent level) and the
searching strategy of the orientation histogram values
becomes a hierarchical process.
Given two candidates molecules dual parts (i.e., References
a cavity and a ligand) the aim is to find if they are
Cantoni V, Gatti R, Lombardi L (2010) Segmentation of SES for
morphologically (or geometrically) compatible, in
protein structure analysis. In: 1st international conference on
other words we look for the rigid motion that could bioinformatics. BIOSTEC 2010. Valencia (ES), pp 83–89
bring the protrusion into the cavity. On this purpose, Horn BKP (1984) Extended gaussian images. In: Proceedings of
a preliminary coarse constraint is given by the mass of the IEEE. IEEE Computer Society Press, Los Alamitos,
vol 72, pp 1671–1686
the ECEGI of the cavity and the ligand: Acav > Alig.
Hu Z, Chung R, Fung KSM (2010) EC-EGI: enriched complex
Some matching indices normally adopted in EGI EGI for 3D shape registration. In: Machine vision and appli-
applications are: cations. Springer Berlin/Heidelberg, vol 2, pp 177–188
E 728 Extensible Markup Language

large as are the modifications, variations, and binding


Extensible Markup Language factors, making the insoluble constituents of a tissue
important in not only structural integrity but also crit-
▶ XML ical for proper tissue function (Lukashev and Werb
1998). ECM degradation and composition is central
to many degenerative disease processes and cancer.
Collagen is the most abundant protein in animals
Extracellular Matrix and can be grouped according to types that include
fibrillar, facit, short chain, and reticular (Orgel et al.
Mary Helen Barcellos-Hoff 2011). There are at least 30 collagens in humans
Department of Radiation Oncology and Cell Biology, (Table 1). The three most abundant collagens form
New York University School of Medicine, fibers that are important in tissues structure.
New York, NY, USA Type I collagen forms that self-assemble in the extra-
cellular space into triple-helical cross-linked fibers up
to 300 nm long. Type II collagen constitutes 90% of the
Synonyms structural protein of cartilage. Type III collagen is
important during embryogenesis and in wound
Basal lamina; Basement membrane; ECM; Ground healing. In contrast, type IV collagen organizes in
substance a reticular manner present in basement membranes.
Other collagens not as abundant are important in deter-
mining tissue architecture and cell behaviors.
Definition Non-collagen ECM proteins are highly
glycosylated (Table 2). A notable exception is elastin,
Extracellular matrix is the complex protein and carbo- which, as its name implies, is important for providing
hydrate material in which cells of multicellular organ- elasticity and forms sheets and fibers by cross-linking
isms are embedded. on a scaffold of fibrillin microfilaments. These are
present around blood vessels and airways in particular.
Adhesion is another function of ECM. There are
Characteristics 15 known laminin, a major protein of the basement
membrane that is essential for epithelial cell
The extracellular matrix (ECM) is distinct according to adhesion by way of integrins, integral membrane
the developmental stage, organ, and tissue type ECM receptors. Laminins are heterotrimeric, cross-
(Hynes and Naba 2012). The core “matrisome” com- like structures with 3 short and 1 long arms. In the
prises about 300 proteins, whose structure and function interstitial ECM, fibronectin is the major adhesion
are affected by ECM-modifying enzymes, ECM- protein. Fibronectins are dimers of 2 similar peptides
binding growth factors, and other ECM-associated chain that are 60–70 nm long and 2–3 nm thick. A sin-
proteins. These different categories of ECM and gle fibronectin gene gives rise to at least 20 different
ECM-associated proteins cooperate to assemble and fibronectin chains by alternative RNA splicing. The
remodel extracellular matrices and bind to cells ECM of developing or fetal organisms contains
through ECM receptors. Together with receptors for specific fibronectin isoforms that enable and stimulate
ECM-bound growth factors, they provide multiple motility.
inputs into cells to control survival, proliferation, Proteoglycans are made up of more carbohydrate than
differentiation, shape, polarity, and motility of cells. protein. The GAGs chondroitin sulfate (CS), dermatan
There are two major ECMs: The basal lamina or sulfate (DS), keratan sulfate (KS), and heparan sulfate
basement membrane is a highly structured ECM layer (HS) are linear polysaccharides composed of an amino
on which epithelia rest. The interstitial ECM is that sugar and an uronic acid. These basic components
between tissues generated by the stroma. are further varied by epimerization, sulfation, and
ECM is composed of proteins, glycosaminoglycans deacetylation. The order of the carbohydrate chain and
(GAGs), and carbohydrates. The range of proteins is the other chemical modifications determine their
Extracellular Matrix 729 E
Extracellular Matrix, Table 1 Collagen types
Collagen type Gene symbol Characteristics Primary source
I COL1A1, COL1A2 300 nm, 67 nm banded fibrils Skin, tendon, bone, etc.
II COL2A1 300 nm, small 67 nm fibrils Cartilage, vitreous humor
III COL3A1 300 nm, small 67 nm fibrils Skin, muscle, frequently with type I
IV COL4A1 thru COL4A6 390 nm C-term globular domain, nonfibrillar All basal lamina
V COL5A1, COL5A2, COL5A3 390 nm N-term globular domain, small fibers Most interstitial tissue, assoc. with
type I
VI COL6A1, COL6A2, COL6A3 150 nm, N + C term, globular domains, Most interstitial tissue, assoc. with
microfibrils, 100 nm banded fibrils type I
VII COL7A1 450 nm, dimer Epithelia E
VIII COL8A1, COL8A2 Some endothelial cells
IX COL9A1, COL9A2, COL9A3 200 nm, N-term globular domain, bound Cartilage, assoc. with type II
proteoglycan
X COL10A1 150 nm, C-term globular domain Hypertrophic and mineralizing
cartilage
XI COL11A1, COL11A2 300 nm, small fibers Cartilage

Extracellular Matrix, Table 2 Non-collagen ECM proteins


Family Proteoglycans Core protein kDA GAG type Tissue
Small interstitial proteoglycans Decorin 36 kDa CS Secreted, connective tissue
Biglycan 38 kDa CS Secreted, connective tissue
Aggrecan family of matrix proteoglycans Aggrecan 208–220 kDa CS, HA, KSII Secreted, cartilage
Brevican 96 kDa CS Secreted, brain
Neurocan 145 kDa CS Secreted, brain
Versican 265 kDa CS Secreted, fibroblasts
HS proteoglycans Periecan 400 kDa HS Basement membrane
Agrin 212 kDa HS Basement membrane
Syndecans 1–4 31–45 kDa HS, CS Epithelial cells, fibroblasts
Betaglycan 110 kDa HS Fibroblasts
Glypicans 1–5 60 kDa HS Epithelial cells, fibroblasts
Serglycin 10–19 kDa CS Mast cells
KS proteoglycans Lumican 37 kDa KS 1 Broad
Keratocan 37 kDa KS 1 Broad
Fibromodulin 59 kDa KS 1 Broad
Mimecan 25 kDa KS 1 Broad
SV2 80 kDa KS 1 Synaptic vesicles
Claustrin 105 kDa nr CNS

specificity and functionality. Hyaluronan is non-sulfated reticular epithelial cells of the thymus and the
and is composed of an unmodified disaccharide repeat surface of brain and spinal cord. Basement membranes
that is not covalently linked to any protein (Toole 2004). usually contain type IV collagen, heparan sulfate pro-
The basement membrane is a permeable, thin ECM, teoglycan, chondroitin sulfate proteoglycan, entactin,
evident as a three-layered basal lamina when using hyaluronic acid, collagen VII, and laminin. Each of the
electron microscopy, placed between epithelial cells components of the basal lamina is synthesized by the
or endothelial cells and adjacent connective tissue. cells that rest upon it and are modified by adjacent
It surrounds smooth and striated muscle cells, fat stromal cells beneath. Basement membrane is both
cells, Schwann’s cells, cells of adrenal medulla, and structural and instructional, providing key positional
E 730 Extrinsic DNA Damage Checkpoints

and behavioral information to the attached cells. Epi- Cross-References


thelial functions mediated by the basement membrane
include polarity, morphogenesis, cell cycle regulation, ▶ Cancer
and differentiation. Epithelial stem cells are thought ▶ Fibroblasts
to reside in a specialized ECM that forms the niche. ▶ Wound Healing
ECM composition is context and process dependent
in a manner that propagates fundamental information.
The dynamic ECM compositional changes during References
embryogenesis and rapid ECM remodeling during
wound healing are essential for developing and Hynes RO, Naba A (2012) Overview of the matrisome—an
inventory of extracellular matrix constituents and functions.
restructuring tissues. An important distinction of fetal
Cold Spring Harb Perspect Biol 4(1):doi:10.1101/
ECM is that scars do not form around skin wounds, cshperspect.a004903
which is due to microenvironmental cues and specific Lukashev ME, Werb Z (1998) ECM signalling: orchestrating
cytokines (Shah et al. 1992). Scars contain disordered cell behaviour and misbehaviour. Trends Cell Biol
8:437–441
collagen I fibrils. Abnormal ECM composition or
Orgel JPRO, San Antonio JD, Antipova O (2011) Molecular and
remodeling contributes to pathogenesis in many structural mapping of collagen fibril interactions. Connect
organs. Overproduction of interstitial collagens is Tissue Res 52(1):2–17. doi:10.3109/03008207.2010.511353
characteristic of fibrosis, which may ultimately Schor AM, Schor SL (2009) Angiogenesis and tumour progres-
sion: migration-stimulating factor as a novel target for
compromise vital organ function. Fetal and ▶ wound
clinical intervention. Eye 24(3):450–458
healing ECM proteins are also often reexpressed in Shah M, Foreman DM, Ferguson MWJ (1992) Control of scar-
cancers. An example is migration-stimulating factor ring in adult wounds by neutralizing antibody to
(MSF), a truncated fibronectin monomer. MSF transforming growth factor b. Lancet 339:213–214
Toole BP (2004) Hyaluronan: from extracellular glue to
message and protein are expressed by fibroblasts,
pericellular cue. Nat Rev Cancer 4(7):528–539
keratinocytes, and vascular endothelial cells in fetal
skin but not in the majority of healthy adult skin (Schor
and Schor 2009). MSF message and protein are also
expressed by tumor cells and associated stromal
▶ fibroblasts and endothelial cells. Thus, ECM is Extrinsic DNA Damage Checkpoints
both a response to and a regulator of homeostasis and
pathology. ▶ Cell Cycle Arrest After DNA Damage
F

Faceted Browsing multiple comparisons. It is typically used in high-


throughput experiments in order to correct for random
▶ Search Engines with Faceted Search events that falsely appear significant. When testing
a null hypothesis to determine whether an observed
score is statistically significant, a measure of confi-
dence, the p-value, is calculated and compared to
Faceted Navigation a confidence threshold a. When k hypotheses are tested
simultaneously with a confidence level a, the chances
▶ Search Engines with Faceted Search of occurrence of false positives (i.e., rejecting the
null hypothesis when in fact it is true) is equal to
1  (1  a)k, which can lead to a high error rate in
the experiment. Therefore, a multiple testing correc-
Fact Extraction tion, such as the FDR, is needed to adjust our statistical
confidence measures based on the number of tests
▶ Information Extraction performed.
The FDR is defined as the expected proportion
of false discoveries, i.e., incorrectly rejected null
hypothesis, among all discoveries (Benjamini and
Factors Modulating Nucleosome Hochberg 1995). Consider the problem of testing
Structure simultaneously m null hypothesis H0 versus the
alternative hypothesis H1, of which m0 are true.
▶ Nucleosome Acting Factors We define the following random variables (see
Table 1): TN is the number of true negatives;
FP is the number of false positives (or type
I errors); FN is the number of false negatives; TP
False Discovery Rate (FDR) is the number of true positives; R is the number
of hypotheses rejected. R is an observable
Sigrid Rouam random variable, while TN, FN, TP, and FP are
Université Paris-Sud, Villejuif, France unobservable random variables.
The FDR is given by

Definition    
FP FP
FDR ¼ E ¼E if
FP þ TP R
The false discovery rate (FDR) is a statistical approach
used in multiple hypothesis testing to correct for R > 0; 0 otherwise: (1)

W. Dubitzky et al. (eds.), Encyclopedia of Systems Biology, DOI 10.1007/978-1-4419-9863-7,


# Springer Science+Business Media LLC 2013
F 732 False Positive Rate

False Discovery Rate (FDR), Table 1 Number of errors com- False Positive Rate, Table 1 Sample confusion matrix
mitted when testing m null hypothesis H0 versus H1
Predicted
Accept H0 Reject H0 Total A Non-A
H0 true TN FP m0 Actual A TP FN
H1 true FN TP m  m0 Non-A FP TN
Total mR R m

The FDR is the proportion of the rejected null false positive rateðaÞ ¼ freject H0 jH0 trueg
hypotheses which are erroneously rejected. In practice,
the level of acceptable FDR is fixed by the investigator, In machine learning (▶ Model Validation, Machine
depending on the desired stringency. Learning), the false positive rate is closely related to the
An example of the practical use of the FDR is in notion of specificity, one of statistical measures widely
genomic association studies, for which a large number used to assess the performance of prediction models.
of statistical tests are performed simultaneously. In Let TP be true positives (samples correctly classi-
such studies, the objective is to identify genomic fac- fied as class A), FN be false negatives (samples incor-
tors worthy of further analysis. If the multiplicity of the rectly classified as not belonging to class A), FP be
data is not taken into account, the probability that false positives (samples incorrectly classified as
a false identification (type I error) is committed can class A), and TN be true negatives (samples correctly
increase sharply when the number of tested genes classified as not belonging to class A). The relationship
becomes large. The FDR controlling approach is between these prediction outcomes can then be sum-
widely used to avoid such errors. marized using a confusion matrix (Kohavi and Provost
1998) as illustrated Table 1.
The false positive rate is the proportion of samples
References not belonging to class A that were incorrectly classified
as class A, i.e.,
Benjamini Y, Hochberg Y (1995) Controlling the false discovery
rate: a practical and powerful approach to multiple testing.
J Royal Stat Soc Ser B 57(1):449–518 false positive rate ðaÞ ¼ FP=ðFP þ TNÞ
¼ 1  specificity

False Positive Rate As an example, suppose we want to build


a ▶ classification model using gene expression micro-
Haiying Wang and Huiru Zheng array data to predict whether a subject has a disease or
School of Computing and Mathematics, Computer not. In a total of 100 subjects known to be free
Science Research Institute, University of Ulster, of a disease, the model actually predicts 10 subjects
Jordanstown, UK having the disease. In this scenario, FP ¼ 10 and
TN ¼ 90. Thus, the false positive rate is 10%.

Synonyms
Cross-References
Type I error rate
▶ Model Validation, Machine Learning

Definition
References
In statistical analysis, the false positive rate of a test is
defined as the probability of rejecting the null hypoth- Kohavi R, Provost F (1998) Glossary of terms. Mach Learn
esis H0 when it is true, which can be denoted as: 30:271–274
FBA Analysis, Plant-Pathogen Interactions 733 F
Nevertheless, this type of modeling lacks the
FBA Analysis, Plant-Pathogen capacity to represent the underlying biological
Interactions processes that take place during an infection, for
instance, those that modify infected cell behavior at
Andrés Mauricio Pinzón Velasco1, Silvia Restrepo2 the molecular level. Recently, due to the availability
and Andrés Fernando González Barrios3 of high-throughput biological data as well as the
1
Department of Systems Engineering, Universidad development of sophisticated computational tools,
Jorge Tadeo Lozano, Bogotá, Colombia there is a growing interest in the modeling of these
2
Department of Biological Sciences, Universidad de underlying molecular processes.
los Andes, Bogotá, Colombia
3
Department of Chemical Engineering, Universidad de Computational Modeling of Metabolism
los Andes, Bogotá, Colombia Although the way plants resist a particular pathogen
F
attack vary according to plant species and specific path-
ogen characteristics, in general terms, plants face
Synonyms a pathogen attack by shifting their defense mechanisms.
These mechanisms consists of a complex and highly
Flux balance analysis; Pathosystem interconnected set of networks, in which host defense
genes interact with each other as well as with pathogen
proteins present in the cell. Metabolic and regulatory
Definition networks are of particular interest when studying this
kind of biological processes. It is at this level that one
Flux Balance Analysis (FBA) linear programming expects that pathogen manipulation lead to phenotypic
based optimization approach that departs from an objec- and behavioral differences among plants under attack. In
tive cell or system function restricted by mass and this context, a particular approach for the in silico repre-
energy conservation, which has been widely used for sentation and analysis of the set of biochemical reactions
the analysis of biochemical fluxes in metabolic net- that take place in a given organism, known as metabolic
works. This approach allows researchers to determine reconstructions (MRs), is of particular interest.
changes in metabolic phenotypes given a set of Typically MRs are based on genomics information
biochemical constraints. In the case of plant-pathogen and therefore known as GEMRs or Genome Scale
interactions, or any host-pathogen model, this method- Metabolic Reconstructions (Thiele and Palsson
ology is particularly promising since it offers the oppor- 2010). This type of reconstructions are based on the
tunity to analyze the structure, dynamics, and complex gene content of a complete genome, from where the set
behavior of metabolic networks during a particular of genes that code for enzymes is selected and the
infective process. reactions they belong to are gathered.
Another approach for metabolic reconstruction is
known as TMRs or Targeted Metabolic Reconstruc-
Characteristics tion (Pinzón et al. 2010), which are not based on
genome content but in transcriptomics information.
Mathematical modeling of plant-pathogen interactions Due to its main characteristic TMRs do not cover the
has been carried out since the 1950s (Pinzón et al. complete set of potential genes present in an organism
2009). Typically, plant pathologists have approached genome, but they do provide a way to analyze active
the quantitative and computational modeling of this metabolic networks, as the set of enzymes expressed at
type of interactions focusing on processes at high a given time point and under particular conditions.
levels of resolution, such as the description of temporal Overall, the main aim of metabolic reconstructions
dynamics of crops diseases and spatial patterns of is the identification of suitable targets for metabolic
dispersion. From its very beginning these models engineering, the improvement of production yields and
belonged to the family of logistic equations and its nutritional value of crops, as well as the understanding
application to a broad spectrum of plant diseases has of certain processes such as resistance or defense from
been a constant since then (Pinzón et al. 2009). pathogens, for instance, through the study of time
F 734 FBA Analysis, Plant-Pathogen Interactions

FBA Analysis, Plant-Pathogen Interactions, Fig. 1 FBA is growth is used as OF, and it is represented in the form of
a constraint-based optimization method that facilitates the com- biomass. However, under particular conditions for some meta-
putational prediction of systemic phenotypes in the form of bolic reconstructions in plants, other OFs can be described. For
fluxes of reactions. For instance, given a set of available nutri- example, in S. tuberosum and other tuber-like plants it is feasible
ents for an organism, FBA allows for the prediction of the set of to use starch production and storage as OF instead of a typical
fluxes of metabolic reactions that optimize the growth for that biomass set of reactions, given that starch storage could be seen
organism. FBA requires the conversion of the metabolic system as an indicator of growth. Part B shows a standard form of a FBA
(A) into a stoichiometric matrix “S” (C). In this matrix rows problem where flux through the OF is maximized subject to
represent metabolites and columns reactions. In R1, metabolite some constraints such as reactions stoichiometric and uptake.
A is consumed (1) and metabolite B is produced (1), metabo- In this way reaction fluxes (Vj) are between lower (lb) and upper
lite C does not participate in the reaction (0). The FBA model (ub) bounds. Since FBA assumes a steady state for all reactions,
usually optimizes for a particular characteristic in the organism, metabolite concentrations are fixed (S.v ¼ 0)
which is commonly called the Objective Function (OF). Usually

evolution of a host network under pathogen attack. Using FBA it is possible to determine how
One of the most important characteristics of MRs is microorganisms utilize their metabolism, and predict
that they can be interrogated by means of computa- their changes under environmental perturbations. For
tional modeling and therefore derive hypothesis based instance, it is possible to assess the effect that a single
on model predictions. gene deletion can have over the flux of all reactions in
a metabolic network (Fig. 2). These fluxes of reactions
Flux Balance Analysis (FBA) in the Context of correspond to intracellular biochemical networks that,
Plant-Pathogen Interactions due to the lack of kinetics information, are assumed to
There are several methodologies than can be used to operate under pseudo steady-state conditions, which
interrogate a metabolic reconstruction (Oberhardt et al. seems to be in agreement with experimental data. For
2009). A common constraint-based method, known as instance, this approach showed an 85% consistency of
Flux Balance Analysis or FBA (Becker et al. 2007), gene essentiality for the genome scale reconstruction
has been widely used for this purpose (Fig. 1). of Escherichia coli and 70% consistency for gene
FBA Analysis, Plant-Pathogen Interactions 735 F

FBA Analysis, Plant-Pathogen Interactions, Fig. 2 (a) met- SAGE or protein-protein interaction data, it is possible to silence
abolic profile before virtual knockout. (b) Metabolic profile after some reactions and let FBA to predict flux change under these
virtual knockout of reactions between B and C and between new conditions. In the case of plant-pathogen interactions, it is
F and E. OF: Objective function. In part (a) of the figure it is possible to use data from known host R-proteins and pathogen
clear how a different flux of reactions is present for OF optimi- effectors interaction, as well as genes with differential expres-
zation. Based on biological information, such as microarray, sion obtained by essays of plants challenged with the pathogen

essentiality for the reconstruction of Pseudomonas between host R-proteins and pathogen effectors is
aeruginosa (Reviewed in Pinzón et al. 2009). effective and then the pathogen is able to suppress
This characteristic of FBA is of particular interest in host recognition and the plant defense response
the study of plant-pathogen interactions, but before derived from it, leading to a disease known as Late
trying to clarify how this approach can be used in this blight of potato.
context, some background on the pathogenicity and This pattern of interaction is also typical for many
host defense response is necessary. other pathosystems such as Arabidopsis thaliana and
Plant-pathogen interactions can be divided in two Pseudomonas syringae or A. thaliana and Hyaloper-
main mechanisms. During the first one, known also as onospora parasitica, among many others.
basal defense, the plant recognizes general features in Although for many of these pathosystems some
pathogens that are universally associated with microbes. molecular details are known, this is not the reality for
These features are known as MAMPs (Microbe- most of them. In the case of P. infestans and
Associated Molecular Patterns) which are recognized S. tuberosum, some phenotypical responses to plant
by membrane receptors in plants. The recognition of attack are recognized to date, such as a decrease
these MAMPs triggers the first line of defense, which in plant’s photosynthetic capacity several hours after
for most plants it is enough to stop pathogen attack. infection, but the precise biochemical mechanisms that
However, some plant pathogens are specific to a host lead to that phenotype are not known. Although typical
and have developed special strategies for the manipula- molecular approaches (proteomics, transcriptomics,
tion of host defenses. This is the case for the pathogen etc.) have shown to be effective in revealing
Phytophthora infestans in Solanum tuberosum. During how some of these mechanisms work, the methodolo-
the pathogenicity process P. infestans releases an arse- gies are not tractable when trying to understand
nal of proteins known as effectors, which are injected a complex network of biochemical interactions. There-
into the host plant through specialized secretion sys- fore, FBA over a metabolic reconstruction can be an
tems, suppressing the first line of defense. S. tuberosum ideal alternative for the comprehension of this type of
in turn, have evolved to recognize these effectors using networks.
Resistance proteins (R-proteins), which directly or indi- In general, what a FBA analysis describes is the set
rectly interact with them. Not always this interaction of fluxes of reactions that take place in a metabolic
F 736 Feasibility

network under a particular environment. Therefore,


if we take into consideration different environment Feasibility
conditions, it is possible to evaluate changes over the
same network given those new conditions. If we know, Frank Allg€ower, Jan Hasenauer and Steffen Waldherr
for instance, that a particular effector protein acts as Institute for Systems Theory and Automatic Control,
repressor for the expression of a host gene involved in University of Stuttgart, Stuttgart, Germany
metabolism, we can represent this situation by a virtual
knockout of this gene, for which different software
tools are available (Hoppe et al. 2011; Rocha et al. Definition
2010; Becker et al. 2007). By the study of the network
after and before this knockout it is possible to obtain A feasibility problem denotes the mathematical
a different profile of the reactions implied in both problem of showing that there exist values for an
stages and derive important biological conclusions unknown variable which satisfy specific constraints.
from their analysis (Fig. 2). In an n-dimensional vector space over the real
Although metabolism is crucial for most cellular numbers, the problem is to find x 2 n such that
activities, it is also important to take into account that fi(x)  0, i ¼ 1,. . ., m, for a given list of m
metabolism response is highly integrated to signaling scalar-valued functions fi. This problem is mathe-
that comes from different routes. In the case of plant- matically written as
pathogen interactions, it is then possible to integrate
into the FBA information regarding typical signal find x 2 n
defense routes, such as ethylene, jasmonic, and
subject to fi ðxÞ  0; i ¼ 1; . . . ; m:
salicylic acids pathways. The information related to
this signaling process can be integrated into FBA as
a set of restrictions, similar to the virtual knockouts Feasibility problems are tightly linked to optimiza-
described before. tion. Any feasibility problem can be formulated as an
optimization problem with a constantly zero objective
function:

References minimize 0
n
x2
Becker SA, Feist AM, Mo ML, Hannum G, Palsson BØ, subject to fi ðxÞ  0; i ¼ 1; . . . ; m:
Herrgard MJ (2007) Quantitative prediction of
cellular metabolism with constraint-based models: the
COBRA toolbox. Nat Protoc 2(3):727–738. doi:10.1038/ If the constraints cannot be satisfied, the optimal
nprot.2007.99 value is 1 by definition. On the other hand, if an x
Hoppe A, Hoffmann S, Gerasch A, Gille C, Holzhutter H-G
exists that satisfies the constraints, the optimal value
(2011) FASIMU: flexible software for flux-balance compu-
tation series in large metabolic networks. BMC Bioinformat- will be 0.
ics 12(1):28. doi:10.1186/1471-2105-12-28 The formulation of a feasibility problem as
Oberhardt MA, Palsson BØ, Papin JA (2009) Applications of optimization problem is particularly attractive for
genome-scale metabolic reconstructions. Mol Syst Biol
constraint classes where a solver exists which
5:320. doi:10.1038/msb.2009.77
Pinzón A, Barreto E, Bernal A, Achenie L, Barrios is guaranteed to find the optimal value, e.g.,
AFG, Isea R et al (2009) Computational models in ▶ semidefinite programs. In this case, the optimi-
plant-pathogen interactions: the case of Phytophthora zation algorithm can directly be used to decide the
infestans. Theor Biol Med Model 6:24. doi:10.1186/
feasibility problem.
1742-4682-6-24
Rocha I, Maia P, Evangelista P, Vilaca P, Soares S, Pinto JP,
Nielsen J et al (2010) OptFlux: an open-source software
platform for in silico metabolic engineering. BMC Syst Cross-References
Biol 4(1):45. doi:10.1186/1752-0509-4-45
Thiele I, Palsson BØ (2010) A protocol for generating a high-
quality genome-scale metabolic reconstruction. Nat Protoc ▶ Model Falsification, Semidefinite Programming
5(1):93–121. doi:10.1038/nprot.2009.203 ▶ Semidefinite Program
Feed Forward Loop 737 F
References terms of prediction accuracy, but are generally com-
putationally more intensive. In summary, filter and
Boyd S, Vandenberghe L (2004) Convex optimization. wrapper methods can complement each other.
Cambridge University Press, Cambridge

References
Feasible Parameter Space
Dash M, Liu H (1997) Feature selection for classification, intel-
ligent data analysis. An Int’l J 1(3):131–156
▶ Dynamic Metabolic Networks, k-Cone Kim YS, Street WN, Menczer F (2003) Feature selection in data
mining. In: Wang J (ed) Data mining: opportunities and
challenges. Idea Group, Hershey, pp 80–105

Feature Reduction
F
▶ Feature Selection Feed Forward Loop

Guangxu Jin
Feature Selection Systems Medicine and Bioengineering,
Bioengineering and Bioinformatics Program, The
Chenglei Sun Methodist Hospital Research Institute, Weill Medical
Institute of Systems Biology, Shanghai University, College, Cornell University, Houston, TX, USA
Shanghai, Shanghai, China

Synonyms
Synonyms
Feed forward loop motif, FFL
Attribute selection; Feature reduction; Variable selec-
tion; Variable subset selection
Definition

Definition Feed forward loop (FFL) motif is one of the most


significant one in both E. coli and yeast. The FFL is
Feature selection is a must in data mining. The main composed of a transcription factor X, which regulates
idea of feature selection is to choose a subset of input a second transcription factor Y. X and Y both bind the
variables by eliminating the noisy or redundant features regulatory region of target gene Z and jointly modulate
and keeping the quality patterns. Feature selection can its transcription rate. The FFL has three transcription
significantly improve the effectiveness and robustness interactions. Each of these can be either positive (acti-
of the resulting classifier or regression models. More vation) or negative (repression). There are therefore
importantly, feature selection can help to discover the eight possible structural configurations of activator and
features that are really important in the classification. repressor interactions. Four of these configurations are
There are two types of feature selections: filter and termed “coherent”: the sign of the direct regulation
wrapper. Filter methods evaluate the goodness of the path (from X to Z) is the same as the overall sign of
feature subset by using the intrinsic characteristic of the indirect regulation path (from X through Y to Z).
the data. They are relatively computationally cheap, The other four structures are termed “incoherent”: the
since they do not involve the induction algorithm. signs of the direct and indirect regulation paths are
However, they also take the risk of selecting subsets opposite. Mathematical modeling indicates that FFLs
of features which may not match the chosen induction can serve as a novel mechanism for accelerating the
algorithm. Wrapper methods, on the contrary, directly expression of the target genes. Both coherent and inco-
use the induction algorithm to evaluate the feature herent FFL behaviors are sign sensitive: they acceler-
subsets. They generally outperform filter methods in ate or delay responses to stimulus steps (Fig. 1).
F 738 Feed Forward Loop Motif, FFL

Feed Forward Loop, a


Fig. 1 The structures for
coherent and incoherent FFL

References
Feedback Regulation
Mangan S, Alon U (2003) Structure and function of the feed-
forward loop network motif. Proc Natl Acad Sci USA Yufei Huang
100:11980–11985 Picower Institute for Learning and Memory,
Shen-Orr SS, Milo R, Mangan S, Alon U (2002) Network motifs
in the transcriptional regulation network of Escherichia coli. Massachusetts Institute of Technology, Cambridge,
Nat Genet 31:64–68 MA, USA
Greehey Children’s Cancer Research Institute,
University of Texas at Texas Health Science Center at
San Antonio, San Antonio, TX, USA
Feed Forward Loop Motif, FFL Department of Epidemiology and Biostatistics,
University of Texas at Texas Health Science Center at
▶ Feed Forward Loop San Antonio, San Antonio, TX, USA

Definition
Feedback Loops
Feedback regulation describes the particular type of
▶ MicroRNA Regulation, Feedback Loop gene regulation, where the current status of gene
Fibroblasts 739 F
expression will influence the future status of the same Characteristics
set of genes. In a cell, feedback regulation is an essen-
tial mechanism to ensure the stability of the cellular Fibroblasts reside in the stroma, are derived from the
system under changing environmental conditions. mesoderm, and are members of the cell type known as
mesenchymal. Dormant fibroblasts become fibrocytes
that can be easily identified along the margins of many
Cross-References types of connective tissue.
Fibroblasts are important to the structural integrity
▶ Gene Regulation of tissues and organisms, in part because they generate
type I collagen, the most abundant protein in animals.
Collagen I is a triple-helical ECM protein that provides
References mechanical strength. Fibroblasts are also a major
F
source of fibronectin, which serves as a scaffold for
Campbell N, Mitchell L et al (2003) Biology: concepts and cell adhesion, and generate many minor components of
connections. Benjamin Cummings, New York
the ▶ extracellular matrix. Fibronectin production by
Reik W (2007) Stability and flexibility of epigenetic gene regula-
tion in mammalian development. Nature 447(7143):425–432 fibroblasts is highly responsive to stress and is alterna-
tively spliced in response to specific signals resulting
in a highly variable protein with a multiplicity of
functions.
Feed-Forward Loops Fibroblasts are described by their common
characteristics that include spindle cell morphology,
▶ MicroRNA Regulation, Feed-Forward Loops vimentin cytoskeleton, and (relatively) proliferative
quiescence; yet fibroblasts are highly heterogeneous.
The distribution of fibroblasts varies widely between
organs, from single cells tucked between epithelium
FFL and endothelium to densely packed. Another feature of
fibroblasts is that they are readily cultured because
▶ Canonical Network Motifs they attach to tissue-culture plastic and proliferate in
response to serum used in most culture media. Because
of this common trait, many experimental studies have
used fibroblasts as “garden-variety” cells. Yet genome-
Fibroblasts wide patterns of gene expression in cultured fetal and
adult human fibroblasts derived from skin at different
Mary Helen Barcellos-Hoff anatomical sites revealed that fibroblasts from each
Department of Radiation Oncology and Cell Biology, site displayed distinct and characteristic transcriptional
New York University School of Medicine, New York, patterns (Chang et al. 2002).
NY, USA A broad classification discriminate fibroblasts
derived from visceral organs and cutaneous origin. It
was suggested that the site specificity of the molecular
Synonyms
signals and extracellular proteins expressed by fibro-
blasts provide “home addresses” that are monitored by
Fibrocytes; Interstitial cells; Mesenchymal cells
epithelial cells to restrict their migration, survival, and
proliferation (Chang et al. 2002). Fibroblasts isolated
Definition from different anatomical sites and from the same ana-
tomical site but of different diseases display topographic
Fibroblasts are cells that reside in connective tissues differentiation and positional memory. Although this
and produce extracellular matrix (ECM). Quiescent suggests fibroblasts at different locations in the body
fibroblasts, known as fibrocytes, are abundant in the could be considered distinct differentiated cell types,
interstitial tissues of all organs. there has been little success if classification schemes
F 740 Fibrocytes

based on exclusive markers. Few genes are uniquely Cross-References


expressed in fibroblasts from any particular site; rather,
combinatorial patterns of large groups of genes define ▶ Bone Marrow-derived Cells
fibroblasts from different sites. ▶ Extracellular Matrix
Fibroblasts play important roles in tissue repair and ▶ Wound Healing
wound healing, while aberrant fibroblasts contribute to
disease processes and cancer (Kalluri and Zeisberg
2006). In response to wounding, dermal fibrocytes References
undergo a phenotypic change to myofibroblasts that
serve to actively contract during wound closure. How- Buchanan EP, Longaker MT, Lorenz HP (2009) Chapter 6 Fetal
skin wound healing. In: Gregory SM (ed) Advances in
ever, ▶ wound healing differs in adults and fetuses; the
clinical chemistry, vol 48. Elsevier, Amsterdam, pp 137–161
latter do not undergo myofibroblasts differentiation Buckley CD (2011) Why does chronic inflammation persist: an
and also do not form scars. In the adult wound, fibro- unexpected role for fibroblasts. Immunol Lett 138(1):12–14
blast synthesis of collagen is delayed while fibroblasts Chang HY, Chi J-T, Dudoit S, Bondre C, van de Rijn M,
Botstein D, Brown PO (2002) Diversity, topographic
proliferate, but fetal fibroblasts simultaneously prolif-
differentiation, and positional memory in human fibroblasts.
erate and synthesize collagen (Buchanan et al. 2009). Proc Natl Acad Sci 99(20):12877–12882. doi:10.1073/
Fibroblasts also induce specific transition from acute pnas.162488599
inflammation to acquired immunity, while inappropri- Kalluri R, Neilson EG (2003) Epithelial-mesenchymal transition
and its implications for fibrosis. J Clin Invest 112(12):
ate production of chemokines and matrix components
1776–1784
by fibroblasts leads to the establishment of chronic Kalluri R, Zeisberg M (2006) Fibroblasts in cancer. Nat Rev
inflammation (Buckley 2011). Cancer 6(5):392–401
Fibroblasts rarely undergo spontaneous apoptosis Tarin D (2011) Cell and tissue interactions in carcinogenesis and
metastasis and their clinical significance. Sem Cancer Biol
or in response to DNA damage. A response of
21:72–82
primary fibroblasts to stress and aging is a permanent
arrest called replicative senescence. The senescence-
associated secretory phenotype (SASP) is distinct from
that of replication competent fibroblasts that is rich in
cytokines and factors that may mediate aging processes. Fibrocytes
Fibroblasts are also culprits in tissue-compromising
fibrosis that can starve, restrict, and ultimately replace ▶ Fibroblasts
functional parenchyma. In this case, exuberant collagen
production is a major characteristic, often mediated by
response to elevated TGFb, a cytokine both produced by
fibroblasts and to which fibroblasts are exquisitely sen- Fick’s Second Law
sitive. An novel source of fibroblasts in fibrotic condi-
tions is the epithelium via epithelial to mesenchymal ▶ Parabolic Differential Equations, Diffusion
transition (EMT) (Kalluri and Neilson 2003). The Equation
parenchymal response to injury, as occurs in ▶ wound
healing, causes EMT, but in the disease state, cells fail to
revert to type due to an imbalance in cytokine signaling,
leading to accumulation in the mesenchymal compart- Figurate Networks
ment (the significance and occurrence of EMT is the
subject of a controversy [Tarin 2011]). Similarly, cancer ▶ Stem Cell Networks
cells can recruit and induce a highly activated fibroblast
phenotype called cancer-associated fibroblasts (CAF).
Together, these examples underscore the remarkably
plasticity and diversity of fibroblasts, whose nature File Standard
should be explicitly evaluated when considering tissue
as systems. ▶ Proteomics Data Formats
Fitness 741 F
Cross-References
Final Cause
▶ Exact Test For Independence
▶ Teleology

References

Fisher’s Linear Discriminant Fisher RA. On the interpretation of w2 from contingency tables,
and the calculation of P. J Roy Stat Soc. 1922;85(1):87–94.

▶ Linear Discriminant Analysis

F
Fitness
Fisher’s Test
Philippe Huneman
Kejia Xu Institut d’Histoire et de Philosophie (IHPST), des
Institute of Systems Biology, Shanghai University, Sciences et des Techniques Université Paris 1
Shanghai, China Panthéon-Sorbonne, Paris, France

Definition Definition

Fisher’s exact test is a statistical test used to exam- The word comes from the phrase “survival of the
ine the significance of the association between two fittest,” which Darwin borrowed from Spencer in the
categorical variables. The test is useful for categor- last edition of the Origin of species due to his own
ical data that result from classifying objects in two hand. Fitness has two components, survival and repro-
different ways, and it is used to examine the sig- duction. If an organism is very well adjusted to its
nificance of the association between the two kinds milieu, but does not reproduce, it has no evolutionary
of classification. impact; hence, reproduction is often taken as crucial,
For an example application of the 2  2 and survival considered as a proxy for reproduction
test, let X be a journal, either a mathematics (the longer X survives, the higher the chances it has
magazine or a science magazine, and let Y be the offspring). However, even if often correct, those
number of articles on the topics of mathematics and approximations prove to be controversial, for example,
biology. when some organisms grow rather than reproduce. In
some contexts, one has to model the two components
Mathematics Science of fitness separately, for instance, when the issue is to
magazine magazine Total
understand how individual resources are partitioned
Mathematics a b a+b
between reproduction and survival.
Biology c d c+d
More formally, fitness can be defined as the probabil-
Total a+c b+d a+b+c+d
ity distribution of the representation of a gene, a trait or
Fisher calculates the conditional probability of get- an individual at the next generation; yet often equations
ting the actual matrix given the particular row and can consider only the expectancy (fitness meaning
column sums: expected offspring number of an individual). Because it
concerns expected rather than actual offspring, fitness
     is often metaphysically considered as a propensity
aþb cþd aþbþcþd
P¼ (sometimes called “expected fitness”) rather than as
a c aþc
a categorical property (then called “realized fitness”).
ða þ bÞ!ðc þ dÞ!ða þ cÞ!ðb þ dÞ! Sometimes, several generations have to be taken
¼ ; (1)
a!b!c!d!ða þ b þ c þ dÞ! into account to understand the evolutionary dynamics
F 742 Fitness Function

(e.g., when explaining the constancy of sex ratio, which computed a priori by estimating the performances of
involves considering the effect on grandchildren) (Sober various trait types (race speed, rate of metabolism,
2002). The case of altruism compels biologists to con- visual acuity, etc.), provided that one has an idea
sider selection at the level of genes. According to about the relative importance of all factors for survival
▶ Hamilton’s rule, the ▶ relatedness between actor and reproduction.
and beneficiary may account for the selection of
the altruistic act because the degree of relatedness
mitigates the cost: if the act increases the number Cross-References
of altruistic alleles at the next generation (as com-
pared to the selfish alleles), be they directly alleles ▶ Explanation, Evolutionary
of the offspring of the actor, or alleles of the
offspring of the beneficiaries of its altruistic acts,
then altruism evolves. One can therefore reason by References
including within fitness all those alleles due to the
altruist activities of the focal individual. Inclusive Bouchard F, Rosenberg A (2008) Fitness. In: Stanford on-line
encyclopedia of philosophy
fitness is therefore the number of genes directly
Grafen A (2009) Formalizing Darwinism and inclusive fitness
passed on to the next generation by a focal indi- theory. Phil Trans R Soc B 364(1):3135–3141
vidual, plus the ones that are passed on by its kin. Sober E (2002) The two faces of fitness. In: Krimbas R (ed)
What is therefore increased by selection is rather Thinking about evolution: historical, philosophical, and
political perspectives. Cambridge University Press,
inclusive than individual fitness, even though
Cambridge
calculating inclusive fitness may be difficult in
practice (Grafen 2009).
Even if mathematically speaking the construal of
fitness is clear and how to construe it in a given prob- Fitness Function
lem is often straightforward, the issue of the bearer of
fitness is difficult. Originally, only organisms had fit- Ke-Qin Liu
ness, which was often computed as the number of Institute of Systems Biology, Shanghai University,
offspring; with Modern Synthesis and the formulation Shanghai, China
of evolution in terms of gene frequencies in the context
of population genetics, fitness is also ascribed to genes
and genotypes. These values are interdependent of Synonyms
each other because the fitness of a gene can be seen
as the contribution it makes to the fitness of the organ- Objective function
ism, but only in the case of asexual organisms the
number of offspring equals the number of copies of
a given gene. Moreover, fitness is often seen as lifetime Definition
fitness, that is, computed along the whole life of the
organism; yet in behavioral ecology, one mainly In evolutionary algorithm, fitness function is actually
considers individually each act and ascribes fitness to an objective function that is used to determine which
it (e.g., costs and benefits of various strategies are solution within a population is better when solving
measured in terms of fitness). a particular problem (Holland 1975; Nelson et al. 2009).
Often, absolute fitness cannot be measured but rel- In evolution algorithm, the solution space consists
ative fitness can, and only the latter is evolutionarily of a population of chromosomes where each chro-
important (individuals with identical fitnesses do not mosome is one solution that can be evaluated by
undergo natural selection). Sometimes, one measures the fitness function. Finally, the chromosomes can
a posteriori the fitness of types of organisms by be ranked by calculating the fitness function value
counting the number of offspring. Otherwise, one can (Holland 1975). A proper fitness function is very
consider fitness as strictly correlated to the way indi- important for the speed of the search and the
viduals face environmental demands, and then it can be solution of optimization.
Flow Cytometry 743 F
References This mapping FðtÞ ¼ QðtÞetS gives rise to a time-
dependent change of coordinates (y ¼ Q1 ðtÞx), under
Holland JH. Adaptation in natural and artificial systems: which the original system becomes a linear system
an introductory analysis with applications to biology,
control and artificial intelligence. (2nd edition. 1992,
with real constant coefficients dy dt ¼ Sy. Since QðtÞ is
MIT) Ann Arbor: University of Michigan Press; 1975. continuous and periodic it must be bounded. Thus, the
Nelson AL, Barlow GJ, Doitsidis L. Fitness functions in evolu- stability of the zero solution for yðtÞ and xðtÞ is deter-
tionary robotics: a survey and analysis. Robot Auton Syst. mined by the eigenvalues of S. The representation
2009;57(4):345–370.
FðtÞ ¼ PðtÞetB is called a Floquet normal form for
the fundamental matrix FðtÞ.
The eigenvalues of the matrix eTB are called the
Fitting of Continuous and Deterministic Floquet multipliers of the system (some called charac-
Models teristic multipliers). They are also the eigenvalues of the
F
(linear) Poincaré maps xðtÞ ! xðt þ T Þ. A Floquet
▶ Grid Computing, Parameter Estimation for Ordinary exponent (sometimes called a characteristic exponent)
Differential Equations is a complex m such that emT is a Floquet multiplier of the
system. Notice that Floquet exponents are not unique
since eðmþð2pik=TÞÞT ¼ emT (where k is an integer), but
Floquet multipliers are unique. The real parts of the
Flemming Body Floquet exponents are called Lyapunov exponents. The
zero solution is asymptotically stable if all Lyapunov
▶ Midbody exponents are negative, Lyapunov stable if the Lyapunov
exponents are nonpositive, and unstable otherwise.

Floquet Multiplier
Flow Cytometry
Tianshou Zhou
School of Mathematics and Computational Sciences, Xiaojun Liu
Sun Yet-Sen University, Guangzhou, Guangdong, China Internal Medicine, The Second Hospital of Hebei
Medical University, Shijiazhuang, Hebei, China

Definition
Definition
Consider a linear differential equation of the form:
Flow cytometry is a technology that simultaneously
x_ ¼ AðtÞx measures and then analyzes multiple physical charac-
teristics of single particles, usually cells, as they flow in
where AðtÞ is a continuous periodic function with a fluid stream through a beam of light. The properties
period T. If FðtÞ is a fundamental matrix solution to measured include a particle’s relative size, relative gran-
this system, then for all t 2 R, ularity or internal complexity, and relative fluorescence
intensity. These characteristics are determined using an
Fðt þ T Þ ¼ FðtÞF1 ð0ÞFðTÞ optical-to-electronic coupling system that records how
the cell or particle scatters incident laser light and emits
In addition, for each matrix B (possibly complex) fluorescence.
such that eTB ¼ F1 ð0ÞFðTÞ, there is a periodic
(period T) matrix function t7!PðtÞ such that
FðtÞ ¼ PðtÞetB for all t 2 R. Also, there is a real matrix Cross-References
S and a real periodic (period-2T) matrix function
t7!QðtÞ such that FðtÞ ¼ QðtÞetS for all t 2 R. ▶ Cell Sorting
F 744 Flow Cytometry, Flow Microfluorimetry

Cross-References
Flow Cytometry, Flow Microfluorimetry
▶ Fluorescence Microscopy
▶ Cell Cycle Analysis, Flow Cytometry ▶ Spectroscopy and Spectromicroscopy

Fluorescence Fluorescence Microscopy

Xiaojun Liu Yi Zeng


Internal Medicine, The Second Hospital of Hebei Department of Pediatrics, University of Arizona,
Medical University, Shijiazhuang, Hebei, China Tucson, AZ, USA

Synonyms
Definition
Epifluorescence microscope; Fluorescence
Fluorescence occurs when a fluorescent compound microscope
absorbs light energy over a range of wavelengths
that is characteristic for that compound. It is
a transition of energy produced by the electron Definition
when it decays from higher energy level raised
by the absorption light to its ground state. In Fluorescence microscopy utilizes fluorescence as
Fluorescence-activated cell sorting fluorescence is a means of detecting objects through a microscope.
often used to show the particles’ characters
according to the specific antibody conjugated by
the fluorescent compound. Characteristics

Fluorescence microscopy (FP) has been extensively


Cross-References used in biomedical research. It allows users to observe
the structure and dynamics of the cell or tissue with
▶ Cell Sorting great granules. FP is designed to obtain temporal
and spatial information of objects that are either
autofluorescent, or have been labeled with extrinsic
fluorescent molecules. The combination of fluorescent
Fluorescence Microscope specificity and the sensitivity of the latest optical
instruments has enabled the detection of small amount
Xiaohua Wu of materials with high resolution and precision.
Department of Pediatrics, Herman B Wells Center for Furthermore, the application of the FP offers not just
Pediatric Research, Indiana University School of the qualitative description, but also the quantitative
Medicine, Indianapolis, IN, USA attributes of the specimens.
The entire procedure of fluorescence microscopy
starts with specimen preparation. The specimen is
Definition then irradiated by excitation light of specific wave-
length. The much weaker emitted ▶ fluorescence is
Fluorescence microscope is an optical microscope separated from the brighter excitation light and reaches
used to study properties of organic or inorganic sub- human eyes or other detectors usually a digital or
stances using the phenomena of fluorescence conventional film camera. The fluorescent areas shine
and phosphorescence, instead of, or in addition to, brightly against a dark background with sufficient con-
reflection and absorption. trast to permit detection.
Fluorescence Microscopy 745 F
Fluorescence microscopy has many advantages that fluorescence microscopy. Commonly used objectives
are not readily available in other optical microscopy can be classified into transmitted-light and reflected-
techniques: light categories. Transmitted-light objectives are
• Specificity. Fluorescence excitation and emission designed to be used with coverslips. Reflected-light
spectra are usually inherent characteristics of objectives feature specially collated glass surfaces to
a molecule. This property is the foundation for avoid reflection in the optics. Detectors allow visualiza-
selective analysis of complex mixtures of molecular tion of low levels of emitted ▶ fluorescence without
species. photobleaching or photodamage to the specimen. They
• Sensitivity. By distinguishing autofluorescence also allow real time recording of living cell and tissue
from specific fluorescence, fluorescence micros- physiology. A variety of specimen chambers are
copy can reveal the presence of fluorescent mate- available to allow analysis of live or fixed cells and
rial with a scale of even single fluorescent tissues.
F
molecule. One of the most important applications of fluores-
• Environmental sensitivity. The high sensitivity of cence microscopy is in the field of ▶ immunofluores-
fluorescence to physical and chemical environ- cence, which combines the sensitivity of fluorescence
ment enables the investigation of pH, viscosity, microscopy and the high degree of specificity exhibited
refractive index, ionic concentrations, membrane by antigen-antibody binding. Two commonly used
potential, and solvent polarity in living cells and ▶ immunofluorescence techniques are direct immuno-
tissues. fluorescence and indirect immunofluorescence. Direct
• High temporal resolution. Fluorescence measure- immunofluorescence uses a single antibody labeled with
ments can be used to track fast chemical and molec- a ▶ fluorochrome to bind the target ▶ antigen in the
ular changes in specimens. specimen. The reaction of the chemically attached fluo-
• High spatial resolution. Fluorescence can be mea- rescent conjugate and antigen is demonstrated when the
sured from single molecules if the molecules con- fluorochrome is excited. The subsequent emission inten-
tain a sufficient number of ▶ fluororphores. The sity at various wavelengths can then be observed visu-
interactions of cellular components whose dimen- ally or captured by a detector system. In indirect
sions are below the diffraction-limited resolution of immunofluorescence, an unlabeled primary antibody
the microscope can be visualized using fluores- first incubates with the target antigen. A secondary fluo-
cence resonance energy transfer techniques. rochrome-labeled antibody then incubates with the pri-
• Quantitation. Direct correlation between emitted mary antibody because it recognizes the primary as an
▶ fluorescence and the fluorescence quantum antigen. Subsequently, the labeled complex of antigen
yield (the ratio of photon absorption to emission) and antibodies is excited at the peak wavelength inten-
allows quantitative measurements by fluorescent sity of the fluorochrome, and any resulting emission is
microscope. Quantification can be achieved at observed. The fact that each antigen is able to bind to
relatively low concentrations due to the greater multiple antibodies allows indirect immunofluorescence
sensitivity of emission when compared to absorp- to produce greater fluorescence intensity compared to
tion processes. direct immunofluorescence. In addition, this approach
Fluorescence microscope has five basic compo- reduces the necessity of keeping in stock large numbers
nents: excitation light sources, wavelength selection of fluorochrome-labeled antibodies.
devices, objectives, detectors, and stages and specimen Other popular fluorescence microscopy applications
chambers. Powerful compact light sources are needed include: fluorescence in situ hybridization, fluorescence
to generate sufficient excitation light intensity to pro- differential interference contrast, automated fluores-
duce detectable emission. Tungsten or halogen lamps cence image cytometry, fluorescence recovery after
are used in transmitted or incident illumination. The photobleaching, total internal reflectance fluorescence
most common lamps for epifluorescence microscopes microscopy, fluorescence resonance energy transfer
are mercury, xenon, or metal halide arc lamps. In microscopy, digitized fluorescence polarization micros-
recent years, there has been increasing use of lasers copy, fluorescence lifetime imaging microscopy, Fourier
as light sources. Lasers are most widely used for spectroscopy/spectral dispersion microscopy, delayed
confocal microscopy and total internal reflection luminescence microscopy.
F 746 Fluorescence Spectroscopy

Cross-References Cross-References

▶ Antigen ▶ Spectroscopy and Spectromicroscopy


▶ Fluorescence
▶ Fluorochrome
▶ Fluorophore
▶ Immunofluorescence Fluorescence-activated Cell Sorting

Steven D. Rhodes
References School of Medicine, Indiana University, Indianapolis,
IN, USA
Herman B (1998) Fluorescence microscopy. Microscopy hand-
books. Bios Scientific, Oxford. ISBN 9780387915517
Definition
Huang K, Murphy RF (2004) From quantitative microscopy to
automated image understanding. J Biomed Opt
9(5):893–912, ISSN 1083-3668 Fluorescence-activated cell sorting, or FACS for short,
MedlinePlus. Medical encyclopedia, August 2011. URL is a specialized type of cell sorting that utilizes flow
http://www.nlm.nih.gov/medlineplus/ency/encyclopedia_
cytometry. Fluorescently tagged cells are isolated into
A.htm
Molecular expressions. Fluorescence microscopy, August 2011. charged droplets which are separated based on their
URL http://micro.magnet.fsu.edu/primer/techniques/fluores- deflection between two electrodes.
cence/fluorhome.ht%ml
Rost FWD (1991) Quantitative fluorescence microscopy. Cam-
bridge University Press, Cambridge/New York. ISBN
Cross-References
9780521394222
Spring KR (2003) Fluorescence microscopy. In: Driggers RG ▶ Single Cell Assay, Mesenchymal Stem Cells
(ed) Encyclopedia of optical engineering, Dekker encyclo-
pedias series. Marcel Dekker, New York
Wolf DE (2007) Fundamentals of fluorescence and fluorescence
microscopy. In Greenfield Sluder, DE Wolf (eds) Digital
microscopy. Methods in cell biology, 3rd edn, vol 81. Aca- Fluorescent Dye
demic, pp 63–91
Wouters FS (2006) The physics and biology of fluorescence
▶ Fluorochrome
microscopy in the life sciences. Contemp Phys 47(5):
239–255, ISSN 0010-7514

Fluorescent Marker
Fluorescence Spectroscopy
▶ Fluorescent Markers
Xiaohua Wu
Department of Pediatrics, Herman B Wells Center for
Pediatric Research, Indiana University School of
Medicine, Indianapolis, IN, USA Fluorescent Markers

Definition Yongzheng He and Yan Li


Department of Pediatrics, Indiana University School of
Fluorescence spectroscopy uses higher-energy pho- Medicine, Indianapolis, IN, USA
tons to excite a sample, which will then emit lower-
energy photons. This technique has become popular
for its biochemical and medical applications, and Synonyms
can be used for confocal microscopy, fluorescence
resonance energy transfer, and fluorescence lifetime Cell marker; Cluster of differentiation (CD); Fluores-
imaging. cent marker; Fluorophore; Multicolor flow cytometry
Fluorescent Markers 747 F
Definition (monocytes and macrophages, neutrophils, basophils,
eosinophils, erythrocytes, megakaryocytes/platelets,
A fluorophore is a functional component of a molecule dendritic cells), and lymphoid lineages (T-cells,
which can cause a molecule to be fluorescent by the B-cells, NK-cells). In the embryo, hematopoietic
means of absorbing energy of a specific wavelength stem cells are formed from the mesoderm during
and emitting energy at a different wavelength. Fluo- embryogenesis and are deposited in specific hemato-
rescein isothiocyanate (FITC), rhodamine, and other poietic sites. In adults, HSCs are found in the bone
derivatives are common fluorophores used for a variety marrow and peripheral blood following pretreatment
of applications. with cytokines, such as G-CSF (granulocyte colony-
Fluorescent markers are specific molecules, like stimulating factors), that induce cells to be released
protein, which are covalently bound fluorophores that from the bone marrow compartment. Other sources of
selectively bind to a functional group of the target for HSCs include the placenta and umbilical cord blood.
F
detection. The most commonly used fluorescent mol- There is no single specific marker for stem cell.
ecules are antibodies. Generally, HSCs are characterized by the absence of
Cell markers are specified protein on the surface lineage-specific marker expression and expression of
of every cell, called receptors, which can selectively a combination of cell markers.
bind or adhere to other “signaling” molecules. The Human HSCs are determined as CD34+, CD59+,
biological uniqueness of the receptors and chemical CD90/Thy1+, CD38low/, c-Kit/low, and Lin.
properties of certain compounds are used to mark Murine HSCs are determined as CD34low/, Sca-1+,
cells. CD90/Thy1+/low, CD38+, c-Kit+, and Lin(Geraerts
Cluster of differentiation (CD) molecules are and Verfaillie 2009). Alternative methods that allow
markers on the cell surface, which can be recognized for more efficient identification of stem cells, such as
by specific sets of antibodies. Cluster of differentiation SLAM (Laje et al. 2010) family of cell surface mole-
systems can be used to identify the cell type, stage of cules, are now emerging.
differentiation, and activity of a cell.
Multicolor flow cytometry is a technique in which Mesenchymal Stem Cell Markers
cells or cell components (such as DNA) are stained Mesenchymal stem cells (MSCs) are multipotent stem
with multiple fluorescent dyes to detect the fluores- cells which are capable of differentiating into a variety
cence by laser beam illumination using up to 18 inde- of cell types, such as osteoblasts, chondrocytes, and
pendently measurable colors for the identification and adipocytes. Traditionally, MSCs are found in the bone
sorting of cells. Multicolor flow cytometry offers marrow. Alternatively, other tissues can be a more
a platform to acquire detailed information on specific accessible source for mesenchymal stem cell isolation,
cells within a mixed population, which is critical to including adipose tissue, cord blood, and peripheral
maximize the data that can be obtained from a small or blood.
limited sample. Due to the lack of unique single marker for both
human and murine MSCs so far, a combination of cell
markers is commonly used to identify MSCs. Human
Characteristics MSCs are determined as CD11c, CD14, CD31,
CD34, CD45, CD117, CD29+, CD44+, CD90+,
Multicolor Flow Cytometry CD73+, CD105+, CD106+, and CD166+ (Delorme
A combination of fluorescence conjugated antibodies et al. 2006; Parekkadan and Milwid 2010; Aicher
in flow cytometry is utilized for the determination and et al. 2011); Murine MSCs are determined as CD34,
sorting of certain cell population. CD45, CD44+, CD90+, CD105+, CD117 SCA1+
(Schrepfer et al. 2007; Soleimani and Nadri 2009;
Hematopoietic Stem Cell Markers Zhu et al. 2010).
Hematopoietic stem cells (HSCs), which constitute
1:10,000 of cells in myeloid tissue, are multipotent Cell Cycle
stem cells that give rise to blood cell types of the Flow cytometry can be utilized in cell cycle analysis to
myeloid and lymphoid lineages, including myeloid distinguish different phases of the cell cycle.
F 748 Fluorescent Probe

After being permeablized by detergent like Triton References


X-100 or NP-40, or fixed with ethanol, cells are incu-
bated with fluorescent DNA dyes for DNA quantifica- Aicher WK, Buhring HJ et al (2011) Regeneration of
cartilage and bone by defined subsets of mesenchymal
tion in single cell. Because fluorescent DNA dye also
stromal cells – potential and pitfalls. Adv Drug Deliv
stains RNA, cells are usually treated with RNase A to Rev 63(4–5):342–351
remove RNAs. The fluorescence intensity of each Darzynkiewicz Z, Juan G et al (2001) Determining cell cycle
stained cell at a certain wavelength correlates linearly stages by flow cytometry. Curr Protoc Cell Biol Chapter 8:
Unit 8 4
with amount of DNA content in it. In this way, the
Delorme B, Chateauvieux S et al (2006) The concept of mesen-
G0/G1 phase, S phase, G2/M phase (DNA duplicates) chymal stem cells. Regen Med 1(4):497–509
can be distinguished by the intensity of fluorescent Geraerts M, Verfaillie CM (2009) Adult stem and progenitor
DNA dye according to the amount of DNA the cell cells. Adv Biochem Eng Biotechnol 114:1–21
Laje P, Zoltick PW et al (2010) SLAM-enriched hematopoietic
contains.
stem cells maintain long-term repopulating capacity after
Besides Propidium iodide, 7-Aminoactinomycin lentiviral transduction using an abbreviated protocol. Gene
D (7AAD), DAPI, and Hoechst are frequently used as Ther 17(3):412–418
fluorescent DNA dye. Muppidi J, Porter M et al (2004) Measurement of apoptosis and
other forms of cell death. Curr Protoc Immunol Chapter 3:
Unit 3 17
Cell Apoptosis Parekkadan B, Milwid JM (2010) Mesenchymal stem cells as
Annexin V affinity assay is a method in molecular therapeutics. Annu Rev Biomed Eng 12:87–117
biology that employs flow cytometry to quantify the Parish CR, Glidden MH et al (2009) Use of the
intracellular fluorescent dye CFSE to monitor lymphocyte
number of cells undergoing apoptosis, which uses
migration and proliferation. Curr Protoc Immunol
the protein Annexin V to tag apoptotic and dead Chapter 4:Unit 4 9
cells. Schrepfer S, Deuse T et al (2007) Simplified protocol to isolate,
The fluorescence conjugated Annexin V protein purify, and culture expand mesenchymal stem cells. Stem
Cells Dev 16(1):105–107
binds to negative charged phosphatidylserine (PS) on
Soleimani M, Nadri S (2009) A protocol for isolation and culture
the membrane of cells undergoing apoptosis. By con- of mesenchymal stem cells from mouse bone marrow.
jugating FITC to Annexin V it is possible to identify Nat Protoc 4(1):102–106
and quantitate apoptotic cells on a single-cell basis by Zhu H, Guo ZK et al (2010) A protocol for isolation and culture
of mesenchymal stem cells from mouse compact bone.
flow cytometry.
Nat Protoc 5(3):550–560
When cells were stained with Annexin V (AV) and
propidium iodide (PI) simultaneously, it is easy to
distinguish intact cells (AVPI), early apoptotic
(AV+PI), and late apoptotic or necrotic cells
(AV+PI+) (Muppidi et al. 2004).
Fluorescent Probe
Proliferation Assay
Cell proliferation may be assessed by flow cytometry ▶ Fluorochrome
by labeling cells with the dye CFSE, which readily
crosses intact cellular membranes and irreversibly cou-
ples to both intracellular and cell surface proteins.
When cells divide, the CFSE labeling is then distrib- Fluorochrome
uted into the two daughter cells equally (Parish et al.
2009). Yi Zeng
Department of Pediatrics, University of Arizona,
Tucson, AZ, USA
Cross-References

▶ Cell Cycle Analysis, Flow Cytometry Synonyms


▶ Cell Sorting
▶ Flow Cytometry Fluorescent dye; Fluorescent probe
Flux Balance Analysis 749 F
Definition Cross-References

Fluorochromes are molecules capable of exhibiting ▶ Fluorescence Microscopy


fluorescence. ▶ Fluorescent Markers

Cross-References References

Herman B (1998) Fluorescence microscopy, Microscopy hand-


▶ Fluorescence Microscopy books. Bios Scientific, Oxford. ISBN 9780387915517
Huang K, Murphy RF (2004) From quantitative microscopy
to automated image understanding. J Biomed Opt 9(5):
References 893–912, ISSN 1083–3668
MedlinePlus (2011) Medical encyclopedia. http://www.nlm.nih. F
gov/medlineplus/ency/encyclopedia_A.htm
Herman B (1998) Fluorescence microscopy, Microscopy
Molecular Expressions (2011) Fluorescence microscopy.
handbooks. Bios Scientific, Oxford. ISBN 9780387915517
http://micro.magnet.fsu.edu/primer/techniques/fluorescence/
Huang K, Murphy RF (2004) From quantitative microscopy
fluorhome.html
to automated image understanding. J Biomed Opt 9(5):
Rost FWD (1991) Quantitative fluorescence microscopy. Cam-
893–912, ISSN 1083–3668
bridge University Press, Cambridge. ISBN 9780521394222
MedlinePlus (2011) Medical encyclopedia. http://www.nlm.nih.
Spring KR (2003) Fluorescence microscopy. In: Driggers RG
gov/medlineplus/ency/encyclopedia_A.htm
(ed) Encyclopedia of optical engineering, Dekker encyclo-
Molecular Expressions (2011) Fluorescence microscopy.
pedias series. Marcel Dekker, New York
http://micro.magnet.fsu.edu/primer/techniques/fluorescence/
Wolf DE (2007) Fundamentals of fluorescence and fluorescence
fluorhome.html
microscopy. In: Sluder G, Wolf DE (eds) Digital microscopy,
Rost FWD (1991) Quantitative fluorescence microscopy. Cam-
vol 81, 3rd edn, Methods in cell biology. Academic, Amster-
bridge University Press, Cambridge. ISBN 9780521394222
dam, pp 63–91
Spring KR (2003) Fluorescence microscopy. In: Driggers RG
Wouters FS (2006) The physics and biology of fluorescence
(ed) Encyclopedia of optical engineering, Dekker encyclo-
microscopy in the life sciences. Contem Phys 47(5):
pedias series. Marcel Dekker, New York
239–255, ISSN 0010-7514
Wolf DE (2007) Fundamentals of fluorescence and fluorescence
microscopy. In: Sluder G, Wolf DE (eds) Digital microscopy,
vol 81, 3rd edn, Methods in cell biology. Academic,
Amsterdam, pp 63–91
Wouters FS (2006) The physics and biology of fluorescence Flux Balance Analysis
microscopy in the life sciences. Contem Phys 47(5):
239–255, ISSN 0010-7514
Meghna Rajvanshi and Kareenhalli V. Venkatesh
Department of Chemical Engineering, Indian
Institute of Technology Bombay, Powai, Mumbai,
Maharashtra, India

Fluorophore
Synonyms
Yi Zeng
Department of Pediatrics, University of Arizona, Metabolic flux balancing
Tucson, AZ, USA

Definition
Definition
Flux Balance Analysis (FBA) is a method in ▶ meta-
Fluorophore is the structural domain or specific region bolic pathway modeling to quantify a metabolic state
of a molecule that is capable of exhibiting fluores- of a cell. It is a constraint-based approach, which uses
cence. Fluorochromes that are conjugated to a larger linear programming, subject to constrains imposed by
macromolecule through absorption or covalent bonds the stoichiometry of the metabolic network, thermody-
are termed “fluorophores.” namics, and the measured rates (rm). The fluxes of
F 750 Flux Balance Analysis

the metabolic network are computed under these a Network c Stoichiometric Matrix
constraints by optimizing an objective function under Ax r1 r2 r3 r4 r5 r6 r7 r8 r9
▶ steady state operation of the metabolic network r1 A 1 −1 0 0 0 0 0 0 0
(Edwards et al. 2002). FBA is used when limited A B 0 1 −1 0 −1 0 0 0 0
r2 C 0 0 1 −1 0 0 0 0 0
information about the accumulation rates, typically of r3 r4 D 0 0 0 0 1 −1 0 −1 0
an extracellular metabolite (i.e., measured rates, rm), B C Cx E 0 0 0 0 0 1 −1 0 0
is known and the stoichiometric matrix with coeffi- r5 F 0 0 0 0 0 0 0 1 −1
cients of all the unmeasured reactions (denoted by D
r6 r8
Su) is noninvertible to provide a unique solution. d Constraints
E F
Objective function = Maximize (r7)
r7 r9 0 ≤ r1 < 1
Characteristics Ex Fx 0.2 ≤ r4 ≤ 1
0.3 ≤ r9 ≤ 1
b Mass Balance equations r1−9 ≥ 0
The first step in FBA is reconstruction of the metabolic
S.r = 0
network. Once all the internal and transport reactions dA = r − r = 0
are identified, dynamic mass balance equations for all dt 1 2 e Solution
dB = r − r − r = 0
the metabolites are formulated. These mass balances dt 2 3 5 r1 1
dC = r − r = 0 r2 1
can also be represented in a matrix form representing 3 4 r3 0.2
dt
the stoichiometric matrix (S), and the flux vector (r). dD = r − r − r = 0 r4 0.2
5 6 8 r = r5 = 0.8
FBA analyzes the metabolic network assuming dt
dE = r − r = 0 r6 0.5
a steady-state condition, therefore the dynamic mass 6 7 r7 0.5
dt
dF = r − r = 0 r8 0.3
balance equations are set to zero, as given below: 8 9 r9 0.3
dt

dc Flux Balance Analysis, Fig. 1 Flux Balance Analysis (FBA)


¼ S:r ¼ 0 (1) of a simple metabolic network. The solution is obtained for
dt
maximizing the accumulation rate of “Ex,” that is, maximizing
the rate, r7
The above equation ðS:r ¼ 0Þ forms the constraint
for the linear optimization problem. Since, in
most biological systems, the number of reactions (n) is Mathematically, the maximization problem can be
usually more than the number of metabolites (m), that is, stated as:
ðn > mÞ, the system is underdetermined and has
ðn  mÞ degrees of freedom and therefore, cannot be max Z ¼ c:r (2)
solved algebraically. In order to solve the system,
additional constraints are specified, which constrict the Subject to the constraints: S:r ¼ 0; r  0; r  rmax ;
solution space. These constraints are, as mentioned pre- rm: min  rm  rm:max
viously, thermodynamic (defining reversibility or irre- Here, Z denotes the linear objective function and
versibility of the reactions) and capacities of enzymes c is a row vector of weights (coefficients) on the fluxes
and transporters (defines maximum uptake or reaction “r” used to define an objective function. Weights indi-
rates) (Edwards et al. 2002). Additionally, certain flux cate the contribution of reaction “r” toward an objec-
values are also specified, which are experimentally mea- tive function. rmax is the maximum flux derived from
sured. These constraints define a range of feasible values an enzyme or it corresponds to the maximum transport
in the solution space. However, to obtain a unique solu- capability of an enzyme. rm.min and rm.max are the
tion, the space is further constricted by an objective minimum and maximum rates achievable for
function. The solution thus obtained is specific to a particular reaction, respectively.
a metabolic state that optimizes for a specific objective. The methodology of FBA can be elucidated by
Therefore, solving an underdetermined system trans- solving an example by using a simple hypothetical
lates to an optimization problem for a defined objective metabolic network. Figure 1a illustrates an example
function (Z), wherein linear programming methodolo- of a hypothetical metabolic network. This system has
gies may be applicable (Edwards et al. 2002). nine reactions and ten metabolites. Four of these
Flux Balance Analysis 751 F
metabolites (Ax, Cx, Ex and Fx) are external and the rest solutions for a given set of constraints. However, pre-
are internal metabolites. All reactions in this system diction of the metabolic fluxes can be improved by
are irreversible, out of which r1, r4, r7, and r9 are introducing more number of measured fluxes in the
exchange reactions and r2, r3, r5, r6, and r8 are internal analysis by which the search space is further
reactions. For each of the internal metabolites, mass constrained.
balance equations can be written as shown in Fig. 1b.
These equations can be represented in the form of Applications of FBA
matrix (S), where rows of the matrix correspond to FBA is widely used for characterizing cellular
internal metabolites and column corresponds to reac- metabolism and has given solutions which are
tions (Fig. 1c). A unique flux distribution for the net- consistent with experimental findings. In this regard,
work is obtained for an assumption that the system is some of the most studied organisms are Escherichia
optimized for the accumulation rate of metabolite coli, Helicobacter pylori, Haemophilus influenzae, Sac-
F
“Ex.” Further the solution is also dependent on addi- charomyces cerevisae, and Methanosarcina barkeri.
tional constraints of the system under study, which are FBA is extensively used in determining metabolic net-
imposed by the stoichiometric coefficients of the work properties, identifying redundancies (existence of
metabolites, flux carrying capabilities of various reac- alternate optimal solutions) (Papin et al. 2002) and
tions, irreversibility, and measured accumulations robustness in the metabolic network. Robustness can
rates (Fig. 1d, e). be examined by introducing environmental perturba-
Since the solution to characterize metabolism is tions and analyzing its effect with respect to optimal
dependent on the optimization problem, defining growth rate. Analysis of perturbations in the network by
a precise objective function is critical. The objective deleting or inserting genes are important as they yield
function should closely represent the cellular metabo- information regarding the essentiality of the gene for
lism for a given condition. Typically, the basis of survival of an organism and thereby allowing identifi-
defining an objective function is the assumption that, cation of potential drug targets (Raman et al. 2005).
due to evolutionary pressure, cells have evolved to an Metabolic engineering is another area where FBA is
optimal behavior and therefore, the phenotypic state extensively used. It helps in identifying target genes,
yields an optimal flux distribution. The most com- whose manipulation leads to improved production of
monly used objective function is the maximization of metabolite of interest (Wang et al. 2006). Fongs et al.
growth or formation of biomass. Prediction of fluxes (2005) observed that strains engineered using findings
with this objective function has yielded results consis- of FBA behave suboptimally but after undergoing adap-
tent with several experimental findings, such as 86% tive evolution, the cell can reach a state of metabolic
cases for Escherichia coli, 85% for Pseudomonas optimality (Ibarra et al. 2002; Fong et al. 2005). FBA has
aeruginosa, 70% for Leishmania major, and 60% for also been used in analyzing the growth of an organism
Helicobacter pylori (Gianchandani et al. 2010). Other on different carbon sources (Covert and Palsson 2002).
objective functions employed include optimizing ATP Recently, FBA has been applied in the area of
production (to determine optimal energy efficient con- medicine to investigate the metabolic network of
dition), production of a metabolite, or rate of nutrient human mitochondria and the effect of treatment on
uptake (Raman and Chandra 2009). the metabolism. FBA has been applied to optimize
Although FBA has been useful in quantifying the the metabolism of cultured hepatocytes to be used in
metabolic state consistent with experimental measure- bio-artificial liver devices (Thiele et al. 2005). FBA has
ments, the formulation of the objective function is the also been used to study plant metabolism, but the imple-
limitation of this approach. Selection of an inappropriate mentation of FBA is difficult due to the structural
objective function will lead to a solution which will not complexities in plants.
be an accurate representation of the cellular metabolism Thus, FBA is a powerful modeling approach
(Edwards et al. 2002). Apart from identification of for quantitative simulation of microbial metabolism.
a precise objective function, it is also essential to repre- It assumes optimal cell behavior with respect to
sent the objective function in a precise mathematical a known objective function. Its applicability to genome
form. Further, FBA yields only one optimal solution scale metabolic networks makes FBA popular in
though there might be other optimal or suboptimal analyzing cellular physiology in such systems.
F 752 Flux Control Analysis

Cross-References
Flux Control Coefficient
▶ FBA Analysis, Plant-Pathogen Interactions
▶ Network Modeling of Biochemical Transport Emma Saavedra and Rafael Moreno-Sánchez
Phenomena Department of Biochemistry, National Institute for
▶ Pathway Modeling, Metabolic Cardiology “Ignacio Chávez”, Mexico City, Mexico
▶ Optimization Algorithms for Metabolites
Production
▶ Steady State Definition

It is the degree of control that each enzyme exerts on


References a metabolic pathway flux. A practical definition is the
percentage of change in the pathway flux when a 1%
Covert MW, Palsson BO (2002) Transcriptional regulation in change in the activity of a pathway enzyme is achieved.
constraints-based metabolic models of Escherichia coli.
The corresponding written description is CJai, where J is
J Biol Chem 277(31):28058–28064. doi:10.1074/jbc.
M201691200 flux (or cellular function) and a is the activity of an
Edwards JS, Covert M, Palsson B (2002) Metabolic modelling of enzyme i (or protein, transporter, or cellular process).
microbes: the flux-balance approach. Environ Microbiol The CJai as well as the ▶ concentration control coeffi-
4(3):133–140
cients (see next description) are systemic properties of
Fong SS, Burgard AP, Herring CD, Knight EM,
Blattner FR, Maranas CD, Palsson BO (2005) In silico the enzymes when they are working together in the
design and adaptive evolution of Escherichia coli for pathway or cellular network.
production of lactic acid. Biotechnol Bioeng 91(5):
643–648
Gianchandani EP, Chavali AK, Papin JA (2010) The application
of flux balance analysis in systems biology. Wiley Interdiscip Cross-References
Rev Syst Biol Med 2(3):372–382
Ibarra RU, Edwards JS, Palsson BO (2002) Escherichia ▶ Metabolic Control Theory
coli K-12 undergoes adaptive evolution to achieve in
silico predicted optimal growth. Nature 420(6912):
186–189
Papin JA, Price ND, Edwards JS, Palsson BO (2002) The
genome-scale metabolic extreme pathway structure in
Haemophilus influenzae shows significant network redun-
dancy. J Theor Biol 215(1):67–82
Fokker–Planck Equation
Raman K, Chandra N (2009) Flux balance analysis of biological
systems: applications and challenges. Brief Bioinform Ruiqi Wang
10(4):435–449. doi:10.1093/bib/bbp011 Institute of Systems Biology, Shanghai University,
Raman K, Rajagopalan P, Chandra N (2005) Flux balance anal-
Shanghai, China
ysis of mycolic acid pathway: targets for anti-tubercular
drugs. PLoS Comput Biol 1(5):e46
Thiele I, Price ND, Vo TD, Palsson BO (2005) Candidate
metabolic network states in human mitochondria impact Definition
of diabetes, ischemia, and diet. J Biol Chem 280(12):
11683–11695
Wang Q, Chen X, Yang Y, Zhao X (2006) Genome-scale in Fokker–Planck equations are a special type of master
silico aided metabolic analysis and flux comparisons of equation and are often used as a continuous approxi-
Escherichia coli to improve succinate production. Appl mation to master equations. By the Taylor expansion
Microbiol Biotechnol 73(4):887–894
of master equation to order two, we obtain the Fokker–
Plank equation.
The Fokker–Planck equation is beneficial in
the sense that some theoretical analysis can be
Flux Control Analysis conducted. For example, for some simple cases,
the equilibrium probability distribution can be
▶ Metabolic Control Analysis obtained.
Forward and Inverse Parameter Estimation for Metabolic Models 753 F
References 1. Identification of network structure and regulation
2. Selection of a mathematical modeling framework
van Kampen NG (1981) Stochastic processes in physics and 3. Estimation of parameter values
chemistry. Elsevier, Amsterdam
4. Model diagnostics
Risken H (1989) The Fokker-Planck equation. Springer, Berlin
5. Model validation
6. Applications and uses of the model
The first phase is dedicated to identifying the net-
Foreign Substance work structure and regulation of the system. This phase
relies on a combination of available information and
▶ Xenobiotics assumptions or hypotheses, which are derived from the
literature or from de novo experiments. Based on the
available body of knowledge, the modeler needs to
F
decide which components and interactions to include
Formation of Phosphodiester Bond in the model and which to omit in order to keep the
model manageable, while retaining the integrity of the
▶ Phosphodiester Bond Formation system. Additional assumptions and simplifications
are usually needed in order to fill gaps in information.
The result of this model design phase is often visual-
Formerly Known as the Physiome CellML ized as a diagram with nodes for the components
Environment (metabolites) and arrows for interactions between
them (fluxes). The second phase is dedicated to the
▶ OpenCell choice of the best-suited mathematical framework and
the corresponding formulation of a ▶ symbolic model,
that is, a model in which no parameter values are
specified yet. This process usually starts with
Forward and Inverse Parameter converting the diagram of the system topology and
Estimation for Metabolic Models regulation into mathematical equations. These may
be linear or nonlinear, stochastic or deterministic,
I-Chun Chou, Zhen Qi, Melissa L. Kemp and static or dynamic, and consist of explicit functions,
Eberhard O. Voit differential equations, or difference equations (Voit
The Wallace H. Coulter Department of Biomedical et al. 2008). Metabolic systems are usually described
Engineering, Georgia Institute of Technology and with a set of differential equations that represent the
Emory University, Atlanta, GA, USA dynamics of the system variables and are composed of
sums and differences of the metabolic fluxes driving
the system. Once the symbolic model is formulated,
Definition the next phase of parameter estimation consists of
numerically configuring the symbolic model, which
One of the most challenging bottlenecks of mathemat- requires the determination of parameter values that
ical modeling is the identification of optimal parameter render the model consistent with experimental obser-
values with which the model matches experimental vations. It is seldom expected that the model will
data well. This essay discusses distinct strategies for exactly match all available data points, and the desired
approaching this identification step. or required degree of consistency is therefore a matter
of judgment. In addition to fitting the data, the numer-
ical model needs to satisfy other characteristics inher-
Characteristics ent in biological systems, such as stability, robustness,
and possibly different transient behaviors. The diagno-
Modeling Process sis of the model with respect to such features is
The generic process of modeling metabolic systems followed by the validation of the model through testing
consists of six phases (Chou and Voit 2009): against experimental data and biological, clinical,
F 754 Forward and Inverse Parameter Estimation for Metabolic Models

pharmacological, or other information that had not or Hill function, or one of their generalizations; how-
been used for the design of the model. If the model ever, other options include power-law functions,
has successfully gone through these design and testing lin-log representations, or more complex mathematical
phases, one may cautiously deem it appropriate for the descriptions (Voit and Chou 2009). The computational
purposes that initially led to the model design. In aspects of estimating parameters for kinetic rate func-
particular, one may use the model for explanations, tions are rather simple, as these functions are explicit
predictions, and applications, the generation of new and usually contain only a few parameters. For
hypotheses, assistance in the design of new biological instance, in the case of MM rate laws, the parameters
experiments, possibly the development of treatment can be estimated with methods of linear regression.
strategies for diseases, and the manipulation and opti- The data characterizing the kinetic properties of an
mization of the model toward specific goals, such as enzyme or transporter are usually measured on isolated
the production of organics in metabolic engineering. enzymes in vitro and account for optimal temperature
The phasing of the modeling process may give the and pH ranges, cofactors, modulators, and secondary
impression that modeling is straightforward. However, substrates. Once sufficient information on individual
in most cases it is an iterative process requiring the rate processes has been assembled, the modeler attempts
repeated return to earlier phases (Voit et al. 2008). to merge this information into an integrative mathemat-
ical model of the entire system, which typically consists
The Challenge of Parameter Estimation of ordinary differential equations.
The most challenging among the phases of model devel- Though this forward approach is theoretically
opment is usually the estimation of parameter values, straightforward, it has several disadvantages. First, it
especially when the investigated system is moderately requires that a considerable amount of local kinetic
large. Addressing this challenge successfully is crucial, information is at hand. Second, even if this information
because it is clear that the parameter values qualitatively is available in the literature, it had often been obtained
and quantitatively characterize the metabolic model and from different organisms or with experiments under
distinguish it from others. Furthermore, most model various conditions. As a consequence, the ultimately
analyses can only be executed when all parameter emerging integrated model is often not internally
values are specified, and most predictions and many consistent or does not match biological observations,
explanations made with the model are of a numerical thus mandating numerous rounds of refinements,
nature that requires parameterization. restructuring, and reparameterization. These iterations
The development of parameter estimation methods is are very labor intensive and time consuming and greatly
driven by the availability and characteristics of experi- benefit from a combination of biological and computa-
mental data. Correspondingly, estimation methods can tional expertise which, however, is still rare.
be very distinct, thereby reflecting the variety of exper-
imental data types (Voit 2004; Goel et al. 2006). The Inverse Estimation
currently available methods can be classified as: Instead of first representing one process at a time and
1. Forward (bottom-up) approaches and subsequently merging all representations into a system
2. Inverse (top-down) methods using steady-state or of differential equations, it is also possible to infer
time series data quantitative information directly from data that char-
acterize the entire system. These inverse approaches
Forward Estimation fall into two categories that correspond either to
Before molecular high-throughput data were available, ▶ steady-state data or to time series data.
essentially all metabolic models were developed Steady-state data characterize a metabolic system
according to the first strategy, and most models are under a condition where all concentrations have
still generated in this fashion today. Specifically, a reached constant levels and all influxes and effluxes
model is designed based on local kinetic information are in balance. The most common approaches for
that is used to formulate functions for individual bio- parameter estimation from such data use stoichiomet-
chemical processes, such as enzyme-catalyzed reactions ric analysis (▶ Conservation Analysis) and ▶ flux bal-
and transport steps. The default choice for these func- ance analysis (FBA) (e.g., Palsson 2006). In both
tions or rate laws is typically a Michaelis–Menten (MM) methods, the stoichiometry is assumed to be known,
Forward and Inverse Parameter Estimation for Metabolic Models 755 F
some influxes and effluxes are measured as they enter sometimes even in vivo, and that insights gained from
or leave the system under steady-state conditions, and their analysis are therefore as close to reality as is
all other fluxes are inferred with computational currently feasible.
methods. In addition, FBA accounts for physico- The concept of these methods is intuitively simple:
chemical constraints on the fluxes in the system and Use the chosen symbolic model and an optimization
furthermore assumes that the cell attempts to optimize algorithm to determine values for all model parameters
an overall objective, such as maximal growth. This such that the dynamics of the model matches the
objective and all constraints are reformulated as observation data as closely as possible. The attraction
a constrained linear optimization task, which permits of this approach is that time series data, if obtained
an effective determination of a unique, optimal flux from in vivo perturbation or stimulus–response exper-
distribution that satisfies all constraints and maximizes iments, directly or indirectly account for all processes
the selected objective. These stoichiometrically based associated with the reaction of the system to the
F
methods solely characterize a steady-state flux distri- perturbation.
bution and do not provide information on concentra-
tions, regulation, or parameters associated with Algorithms for Inverse Estimation
dynamic features of the system. Many optimization algorithms have been proposed for
To a limited degree, parameter values can also be inverse estimation tasks in biology. Most of them
obtained from data close to a steady state. In this case, attempt to minimize the difference between the exper-
the data are obtained from experiments that measure imental data and a model response that is obtained per
the responses of a biological system to small perturba- computer simulation. However, in spite of substantial
tions (Sorribas and Cascante 1994). Such experiments efforts, none of these algorithms is truly satisfactory in
may use biochemical inhibitors, artificial ligands, realistic situations of moderately large systems or
genetic mutations, or a variety of other methods. where time series data are sparse, incomplete, or
The second category of inverse estimation corrupted by noise.
approaches is very different from these methods at, or The most prominent algorithms for inverse tasks may
close to, a steady state. Namely, they are based on time be divided into three groups. The first group consists
series data that characterize the full dynamic response of gradient-based, steepest-descent, or hill-climbing
of a system to some stimulus, such as an environmental methods. Best known among these are the Gauss–New-
stress or the sudden availability of food. These types of ton and Levenberg–Marquardt algorithms, which are
data are becoming more prevalent, thanks to recent included in all major software packages of the field,
advancements in molecular high-throughput tech- such as Mathematica and Matlab. The second group
niques at the genomic, proteomic, and metabolomic consists of stochastic search algorithms (▶ Evolution
levels. At least in principle, these data have the capa- Programming). These include evolutionary computa-
bility of characterizing global responses in a dynamic tion (EC) (▶ Stochastic Simulation Algorithm), ▶ sim-
manner, which subsequently permits the estimation of ulated annealing (SA), adaptive stochastic methods,
parameter values and even the identification of the ▶ clustering methods, and other meta-heuristics, such
topology and regulation of a system in a “top-down” as ant colony optimization (ACO) and ▶ particle swarm
manner. optimization (PSO). Some of these algorithms are bio-
The experimental tools that can generate dynamic logically inspired and have been applied to parameter
data of metabolites include nuclear magnetic resonance estimation tasks with the goal of finding global solu-
(NMR), mass spectrometry (MS) coupled with high- tions, especially in the context of identifying the struc-
performance liquid chromatography (HPLC), and flow tures of gene regulatory networks (Moles et al. 2003).
cytometry. The experiments can be performed under The best-known evolutionary method among these is
many different conditions, such as different initial set- the genetic algorithm (GA), which is now widely avail-
tings, various gene knock-outs or knock-downs, or dif- able in many variants that have proven useful and prac-
ferent types of enzyme inhibition, and shed light on the tical in a variety of biological applications. The third
system from different angles. Clear advantages of such group of methods accounts for the fact that observed
“global” data include that their information content is data are affected by random events. Thus, these
rich, that they can be collected from the same organism, approaches consider parameter estimation as a branch
F 756 Forward and Inverse Parameter Estimation for Metabolic Models

of statistics and use statistical or machine-learning tech- biological phenomena are usually also very robust, so
niques as solution strategies. These techniques include that modest alterations in parameter values often do not
maximum likelihood estimation, Markov chain Monte change their behavior. One contributor to this robustness
Carlo methods, and Kalman filtering. is redundancy, which allows the compensation of fail-
One significant issue with time series analysis is ures in one component with changes in other compo-
that the differential equations of the model have to be nents. This compensation directly translates into
solved thousands of times. Approximation methods different sets of parameter values, which all perform
have been developed to avoid this numerical solution with similar effectiveness and create biological sloppi-
step. While they save an enormous amount of comput- ness that may or may not be related to the sloppiness in
ing time, they face their own challenges, which are not optimized parameter sets. Thus, precise solutions,
always easily overcome (Chou and Voit 2009; Voit consisting of unique sets of parameter values, may not
and Almeida 2004; Ramsay et al. 2007). even be the ultimate criterion for a quality fit, and instead
of aiming for a model with the smallest residual error,
Quality of Fit one might set as the goal to identify all sets of different
Comparisons between parameter estimation algorithms parameterizations that are consistent with the data,
are usually based on the goodness of their fits to exper- within some acceptable error. Some of these may be
imental data, which is typically assessed as the sum of discarded if they correspond to models with high sensi-
squared errors (SSE) between model results and the tivities, a low degree of stability, or other undesirable
observed data (Voit 2011). However, the best fit should properties. The remaining set of parameter values may
not be the only criterion, whether the estimation occurs turn out to be clustered tightly, consist of several distinct
in the forward or inverse direction. In many actual cases, clusters, or be scattered throughout a large domain within
the “best” model, as judged by the smallest SSE, actu- the search space. In each case, the characteristics of this
ally tends to run into ▶ overfitting problems, which set, combined with comparative analyses of the candi-
means that the model contains too many parameters. date models, may identify one model as more likely than
As a consequence, its extrapolation and prediction its alternatives or suggest hypotheses and critical labora-
power with respect to untested conditions is low. tory experiments that may ultimately yield deeper
Related to this issue is the observation that the same insights into the system under investigation.
dataset can often be fitted by distinctly different param-
eter sets or even different model structures with similar
accuracy. This issue of unidentifiability and sloppiness Cross-References
has received much attention in recent times (Gutenkunst
et al. 2007; Srinath and Gunawan 2010). ▶ Linear Regression
The insistence on a good fit is not necessarily the ▶ Metabolic Networks, Reconstruction
best strategy due to the nature of biological phenomena. ▶ Optimization and Parameter Estimation, Genetic
First, biological data are often affected significantly by Algorithms
inter-individual variability, as well as uncertainties ▶ Ordinary Differential Equation (ODE)
due to slight variations in experimental conditions. As ▶ Pathway Modeling, Metabolic
a consequence, biological observations are not as exactly
replicable as experiments in physics. For instance, the
KM value is often seen as a genuine property of an
enzyme, and in vitro characterizations are taken as true. References
However, the same enzyme might slightly vary between
Chou I-C, Voit EO (2009) Recent developments in parameter
species and strains, and even a high degree of sequence
estimation and structure identification of biochemical and
homology may not prevent differences in the affinity genomic systems. Math Biosci 219:57–83
between enzyme and substrate, which directly affects Goel G, Chou I-C, Voit EO (2006) Biological systems modeling
the KM value. In other areas of science, the intrinsic and analysis: a biomolecular technique of the twenty-first
century. J Biomol Tech 17:252–269
variability among items can often be characterized with
Gutenkunst RN, Casey FP, Waterfall JJ, Myers CR, Sethn AJP
large numbers of measurements, but this strategy is not (2007) Extracting falsifiable predictions from sloppy models.
always feasible in biology. While variability is natural, Ann N Y Acad Sci 1115:203–211
Founders Effect 757 F
Moles CG, Mendes P, Banga JR (2003) Parameter estimation in
biochemical pathways: a comparison of global optimization Foundational Model of Anatomy
methods. Genome Res 13:2467–2474
Palsson BØ (2006) Systems biology: properties of reconstructed
networks. Cambridge University Press, Cambridge, UK Mark A. Musen
Ramsay JO, Hooker G, Cao J, Campbell D (2007) Parameter Stanford Center for Biomedical Informatics Research,
estimation for differential equations: a generalized smooth- Stanford University, Stanford, CA, USA
ing approach. J Roy Stat Soc B 69:741–796
Sorribas A, Cascante M (1994) Structure identifiability
in metabolic pathways: parameter estimation in models
based on the power-law formalism. Biochem J 298:
Definition
303–311
Srinath S, Gunawan E (2010) Parameter identifiability of The most thorough, thoughtful, and comprehensive
power-law biochemical system models. J Biotechnol ▶ ontology of human anatomy in existence, initiated by
149(3):132–140
Voit EO (2004) The dawn of a new era of metabolic systems
Prof. Cornelius Rosse at the University of Washington in F
analysis. Drug Discov Today BioSilico 2:182 the 1990s.
Voit EO (2011) What if the fit is unfit? In: Dehmer M, Emmert-
Streib F, Salvador A (eds) Applied statistics for biological
networks. Wiley, New York Cross-References
Voit EO, Almeida JS (2004) Decoupling dynamical systems for
pathway identification from metabolic profiles. Bioinformat- ▶ Protégé Ontology Editor
ics 20(11):1670–1681
Voit EO, Chou I-C (2009) Parameter estimation in canon-
ical biological systems models. Int J Syst Synth Biol 1:
1–19
Voit EO, Qi Z, Miller GW (2008) Modeling complex biological
systems, one step at a time. Pharmacopsychiatry 41(suppl 1):
S78–S84 Founders Effect

Sabina Leonelli
ESRC Centre for Genomics in Society, University of
Exeter, Exeter, Devon, UK
Forward-scattered Light (FSC)

Xiaojun Liu Definition


Internal Medicine, The Second Hospital of Hebei
Medical University, Shijiazhuang, Hebei, China Founders effect is the narrowing of experimentation to
a few well-studied organisms, usually portrayed as the
opposite of bestowing resources on comparative
Definition research among organisms (Joergensen 2001 and
Krebs 1975).
FSC is a measurement of mostly diffracted light.
It is the part of the laser light that is detected just off
the axis of the incident laser beam in the forward
Cross-References
direction by a photodiode. It is proportional to cell-
▶ Model Organism
surface area or size. FSC provides a suitable method of
detecting particles greater than a given size indepen-
dent of their fluorescence and is, therefore, often used References
in immunophenotyping to trigger signal processing.
Jørgensen CB (2001) August Krogh and Claude Bernard on
basic principles in experimental physiology. Bioscience
51:59–61
Cross-References Krebs HA (1975) The August Krogh Principle: ‘for many prob-
lems there is an animal on which it can be most conveniently
▶ Cell Sorting studied. J Exp Zool 194:221–226
F 758 Fourier Transform Infrared Microspectroscopy (FTIR)

Fourier Transform Infrared Framework Region (FR-IMGT)


Microspectroscopy (FTIR)
Marie-Paule Lefranc
Xiaohua Wu Laboratoire d’ImmunoGénétique Moléculaire, Institut
Department of Pediatrics, Herman B Wells Center for de Génétique Humaine UPR 1142, Université
Pediatric Research, Indiana University School of Montpellier 2, Montpellier, France
Medicine, Indianapolis, IN, USA
Synonyms
Definition
Framework region; FR-IMGT
FTIR spectroscopy is a long-established and invalu-
able technique, which is based on the principle that Definition
molecules absorb mid-IR radiation, yielding richly
structured IR absorption spectra. A Framework region (FR-IMGT) is a region made of
one or several beta strands, part of the framework of
a V-DOMAIN (▶ Variable (V) domain), and delimited
Cross-References
according to the ▶ IMGT unique numbering for
V domain (Lefranc et al. 2003). There are four
▶ Spectroscopy and Spectromicroscopy
FR-IMGT in a V-DOMAIN: FR1-IMGT (beta strands
A and B), FR2-IMGT (beta strands C and C0 ),
FR3-IMGT (beta strands C00 , D, E and F), and
Frame Language FR4-IMGT (beta strand G).
In a V-DOMAIN (V domain of the immunoglobu-
Mark A. Musen lins (IG) or antibodies and T cell receptors (TR)),
Stanford Center for Biomedical Informatics Research, the first three FR-IMGT are part of the V-REGION
Stanford University, Stanford, CA, USA (encoded by a ▶ variable (V) gene), whereas the FR4-
IMGT corresponds to part of the J-REGION (encoded
by a ▶ joining (J) gene).
Definition By comparison, in a V-LIKE-DOMAIN (V domain
of ▶ immunoglobulin superfamily (IgSF) other than
A knowledge representation based on prototype clas- IG and TR), the nine strands that correspond to the
ses, where subclasses of each superclass specialize the four FR-IMGT of a V-DOMAIN are usually encoded
superclass in some way. Frames are like objects in by one exon of a gene.
object-oriented programming. Frame classes have Amino acid positions of the FR-IMGT of a
slots (or properties or attributes) that take on values V-DOMAIN have always the same number according
when the frame is instantiated. to the ▶ IMGT unique numbering for V domain (Lefranc
et al. 2003). This allows to define, in an ▶ IMGT Collier
de Perles, the lengths of the FR-IMGT, the anchor posi-
Cross-References tions that support the complementarity determining
regions (▶ Complementarity Determining Region
▶ Protégé Ontology Editor (CDR-IMGT)) and the five conserved amino acids of a
V-DOMAIN (by comparison, four for a V-LIKE-
DOMAIN). These characteristics, based on the
▶ IMGT-ONTOLOGY concepts and managed in the
Framework Region ▶ IMGT® information system, are reported in Table 1.
Starting from amino acid sequences, the FR-IMGT
▶ Framework Region (FR-IMGT) lengths are obtained using the IMGT/DomainGapAlign
Frequency 759 F
Framework Region (FR-IMGT), Table 1 Characteristics of the framework regions (FR-IMGT) of a V-DOMAIN
FR-IMGT lengths of a basic
FR-IMGT Beta strands V-DOMAIN without gaps Anchors and conserved amino acidsa
FR1-IMGT A, B 26 1st-CYS 23
Anchor 26
FR2-IMGT C, C0 17 Anchor 39
CONSERVED-TRP 41
Anchor 55
FR3-IMGT C00 , D, E, F 39 Anchor 66
Conserved hydrophobic amino acid 89
2nd-CYS 104 (Anchor 104)
FR4-IMGT G 11 Anchor 118 (J-TRP or J-PHE in V-DOMAIN)a
Total 93 F
a
By comparison, the amino acid at position 118 is not conserved in a V-LIKE-DOMAIN. Otherwise, the anchors and conserved
amino acids are identical between a V-DOMAIN and a V-LIKE-DOMAIN (▶ Variable (V) Domain).

tool that compares the V-DOMAIN with the closest References


germline V and J genes (http://www.imgt.org) (Lefranc
2009; Ehrenmann et al. 2010). The lengths of the indi- Ehrenmann F, Duroux P, Giudicelli V, Lefranc M-P
(2010) Standardized sequence and structure analysis
vidual beta strands (for both V-DOMAIN and V-LIKE-
of antibody using IMGT ®. In: Kontermann R, D€ ubel
DOMAIN) are provided with amino acid changes S (eds) Antibody engineering, Ch 2, vol 2, 2nd edn.
according to the IMGT physicochemical classes Springer-Verlag, Berlin, Heidelberg, pp 11–31
(Pommié et al. 2004). Lefranc M-P (2009) Antibody databases and tools: The IMGT ®
experience. In: Zhiqiang An (ed) Therapeutic monoclonal
The lengths of the four FR-IMGT of a V-DOMAIN
antibodies: from bench to clinic, Ch 4 Wiley, Hoboken,
are shown between brackets and separated with dots. New Jersey, pp 91–114
For a basic V-DOMAIN without gaps, the lengths of Lefranc M-P, Pommié C, Ruiz M, Giudicelli V, Foulquier E,
the FR-IMGT are [26.17.39.11] with a total of 93 Truong L, Thouvenin-Contet V, Lefranc G (2003) IMGT
unique numbering for immunoglobulin and T cell receptor
positions in the ▶ IMGT Collier de Perles. For
variable domains and Ig superfamily V-like domains. Dev
a human VH (V-DOMAIN of an IG-Heavy chain), Comp Immunol 27:55–77
the FR-IMGT lengths are [25.17.38.11] with a total Pommié C, Levadoux S, Sabatier R, Lefranc G, Lefranc M-P
of 91 amino acids, whereas for a human VL (2004) IMGT standardized criteria for statistical analysis of
immunoglobulin V-REGION amino acid properties. J Mol
(V-DOMAIN of an IG-Light chain), the FR-IMGT
Recognit 17:17–32
lengths are [26.17.36.10] with a total of 89
amino acids.

Frequency

Cross-References Tianshou Zhou


School of Mathematics and Computational Sciences,
▶ Complementarity Determining Region Sun Yet-Sen University, Guangzhou, Guangdong,
(CDR-IMGT) China
▶ IMGT Collier de Perles
▶ IMGT Unique Numbering Definition
▶ IMGT ® Information System
▶ IMGT-ONTOLOGY For cyclical processes, such as rotation, oscillations, or
▶ Immunoglobulin Superfamily (IgSF) waves, frequency is defined as a number of cycles per
▶ Joining (J) Gene unit time. In physics and engineering disciplines, such
▶ Variable (V) Domain as optics, acoustics, and radio, frequency is usually
▶ Variable (V) Gene denoted by a Latin letter f or by a Greek letter n (nu).
F 760 Frequent Pattern Mining

In SI units, the unit of frequency is the hertz (Hz), a database. Frequent itemsets play an essential role in
named after the German physicist Heinrich Hertz. 1 Hz many ▶ Data Mining tasks and are related to interesting
means that an event repeats once per second. patterns in data, such as ▶ Association Rules. Some
A previous name for this unit was cycles per second. concepts are necessary in order to understand this
A traditional unit of measure used with rotating definition:
mechanical devices is revolutions per minute, abbre- 1. Transaction: Let X ¼ {x1, x2, . . ., xm} be a set of m
viated RPM. 60 RPM equals 1 Hz. The period, usually elements called items and let T ¼ {t1, t2, . . ., tn} be
denoted by T, is the length of time taken by one cycle, a set of n subsets of items called transactions. Each
and is the reciprocal of the frequency f: transaction in T identifies a subset of items.
2. Frequent itemset: Given a set of items X ¼ {x1, x2,. . .,
1
T¼ xm} and a set of transactions T ¼ {t1, t2, . . ., tn},
f
a subset of X, S, is called a frequent itemset if S
The SI unit for period is the second. occurs in a percentage of all transactions in T that
Calculating the frequency of a repeating event is exceeds a threshold, named support.
accomplished by counting the number of times that 3. Support: The support of an itemset Y, support (Y),
event occurs within a specific time period, then dividing is defined as the number of transactions in T which
the count by the length of the time period. For example, contain the itemset Y.
if 71 events occur within 15 s the frequency is: What exactly constitutes an item or a transaction
depends on the application and on the type of infor-
71
f ¼  4:7ðHzÞ mation to be extracted. In a ▶ Gene Association
15 s
Analysis context, the meaning of transaction is usu-
If the number of counts is not very large, it is ally associated with “Over-expression,” that is, only
more accurate to measure the time interval for those “Over-expressed” genes will be understood to
a predetermined number of occurrences, rather than be included in the transaction. Equivalently, the
the number of occurrences within a specified time. The term frequent itemset is related to frequent subset
latter method introduces a random error into the count of of genes.
between 0 and 1, so on average half a count. This is Frequent Pattern Mining (FPM) techniques provide
called gating error and causes an average error in the methods to extract automatically all the frequent
calculated frequency of Df ¼ 1/(2 Tm), or a fractional itemsets from a dataset, being an extremely costly
error of Df / f ¼ 1/(2 f Tm), where Tm is the timing task (Alves et al. 2010). In ▶ Microarray data analysis,
interval and f is the measured frequency. This error the specific ▶ Gene Expression dataset structure
decreases with frequency, so it is a problem at low (thousands of genes against only hundreds of exper-
frequencies where the number of counts N is small. imental conditions) increases the frequent itemsets
mining process complexity. Due to this fact, devel-
oping efficient FPM techniques to be applied to
▶ Genomic studies has been an important challenge
Frequent Pattern Mining during the last years.
Next, in order to provide a better understanding of
Jesús Aguilar-Ruiz1, Domingo Rodrı́guez -Baena1 and Frequent Pattern Mining methods, a simple APRIORI
Ronnie Alves2 (see the review paper of Han et al. 2007 for further
1
School of Engineering, Pablo de Olavide University, information) example is illustrated in Fig. 1.
Escuela Politécnica Superior, Seville, Spain Let M be a discretized matrix, where 1 and 0 mean
2
Instituto de Informática, Universidade Federal do Rio “Over-expressed” and “Under-expressed,” respec-
Grande do Sul, Porto Alegre, Brazil tively. Table T represents the transactions and their
items. The APRIORI algorithm generates, iteratively,
Definition candidate itemsets. In every iteration, the support of
every candidate itemset is calculated, eliminating
Frequent Pattern Mining is a ▶ Data Mining subject those itemsets with a support value under a threshold
with the objective of extracting frequent itemsets from (set to 2/4 in this example). Based on the idea that an
Frequentist Approach 761 F
Iteration 1
g1 g2 g3 g4 g5 Itemset Sup.
Itemset Sup.
Tid Items Itemset {g1,g2} 0.25
c1 1 0 1 1 0 {g1} 0.5
c1 g1,g3,g4 {g1} {g1,g3} 0.5
c2 0 1 1 0 1 {g2} 0.75
c2 g2,g3,g4 {g2} {g1,g5} 0.25
c3 1 1 1 0 1 c3 {g3} 0.75
g1,g2,g3,g5 {g3} {g2,g3} 0.5
c4 0 1 0 0 1 {g4} 0.25
c4 g2,g5 {g5} {g2,g5} 0.75
{g5} 0.75
M T {g3,g5} 0.5

Iteration 3 Iteration 2

Itemset Sup. Itemset


{g1,g2,g3} 0.25 {g1,g3}
Itemset F
{g1,g2,g3,g5} 0 {g2,g3}
{g2,g3,g5}
{g1,g3,g5} 0.25 {g2,g5}
Final frequent itemsets {g2,g3,g5} 0.5 {g3,g5}

Frequent Pattern Mining, Fig. 1 Example of the Apriori algorithm with support set to 2/4, that is, every itemset, to be considered
as a valid candidate, has to appear in at least two of the four transactions

itemset is candidate if all its subsets are known to be Characteristics


frequent, the resulting itemsets are combined to create
new candidate itemsets. The algorithm ends when no In statistics, the goal is to make inference about
new candidate group can be generated. In Fig. 1 , after parameters of the population. In frequentist infer-
three iterations the final frequent itemset is composed ence, the parameters are constant, unknown values
of the genes g2, g3, and g5. and data are viewed as repeatable random samples
from the distribution. The parameters are estimated
using point estimates and confidence intervals. Both
References point and interval estimators vary from sample to
sample.
Alves R, Rodriguez-Baena D, Aguilar-Ruiz J (2010) The point estimator is chosen with respect to its
Gene association analysis: a survey of frequent pattern
mining from gene expression data. Brief Bioinform
properties including unbiasedness, consistency, and
11(2):210–224 efficiency. The unbiased estimator has its expecta-
Han J, Cheng H, Xin D, Yan X (2007) Frequent pattern mining: tion equal to the population parameter, the consis-
current status and future directions. Data Min Knowl Discov tent estimator converges in probability to the
15(1):55–86
parameter value, and the relative efficiency of two
estimators is defined by the ratio of their mean
squared errors.
Frequentist Approach For example, consider data sampled from the
univariate normal distribution with mean m and
Larissa Stanberry variance s2. The sample mean, the sample median,
Bioinformatics and High-throughput Analysis and the midrange of a sample are the examples
Laboratory, Seattle Children’s Research Institute, of a point estimator of the mean m. All three estimators
Seattle, WA, USA are consistent and converge to the true population
mean as the sample size increases, but it is the sample
mean that is efficient.
Synonyms In the above example, the sample mean has
desirable theoretical properties. A single number,
Frequentist inference however, gives little information as to how close
F 762 Frequentist Hypothesis Testing

the estimated mean is to the true mean.


A confidence interval gives an interval estimate FuGE
of the population parameter. The length of the
confidence interval indicates how well sample Peter Wilkinson1 and Andrew R. Jones2
1
statistic estimates the population parameter. Vaccine and Gene Therapy Institute, Port St. Lucie,
Confidence intervals are constructed at a certain FL, USA
confidence level (1  a), where a is the type 2
Institute of Integrative Biology, University of
I error rate, i.e., the probability of rejecting Liverpool, Liverpool, UK
null when it is true. The 95% confidence interval
means that if one repeatedly samples data from
the population, 95% of the corresponding Synonyms
confidence intervals would contain the true popu-
lation parameter. FuGE-OM; Functional genomics experiment model
In the frequentist approach, the hypothesis test-
ing about the parameters uses the distribution of
the test statistics under the null hypothesis. When Definition
the exact distribution is not available, the hypoth-
esis testing relies on large-sample theory to con- The Functional Genomics Experiment (FuGE) model is
struct the approximate null distribution. Using the a set of computational resources, designed to facilitate
sampling distribution of the test statistic, the p- rapid prototyping of new data exchange formats and
value is computed as the probability of observing corresponding relational database schemas for the life
a statistic as or more extreme than the given one. sciences. It comprises a Unified Modeling Language
The decision is guided by comparing the p-value (UML) object model with software for automatically
to the selected significance level a. The p-value generating different implementations, including an
should not be overinterpreted but rather it should XML Schema (XSD) for the exchange standard,
be considered with respect to a given study a relational database schema, and a programming layer
design, sample size, and wider knowledge of the in Java to interface between the XSD and the database.
subject. Bioinformatics groups can develop extensions of FuGE
through two modes – either by extending the object
model in free graphical editing software, or by working
directly with a cut-down version of the XSD (FuGE-
light), to create new models for the biological domain
Frequentist Hypothesis Testing of interest. FuGE is particularly useful for “omics” inves-
tigations and systems biology, where numerous hetero-
▶ Hypothesis Testing geneous data types may describe the biological system.
The use of FuGE ensures that different data models have
the same underlying core, allowing rapid development of
new models and supporting software, and facilitating the
integration of the resulting data.
Frequentist Inference

▶ Frequentist Approach Characteristics

FuGE Overview
The core resource in the FuGE project is an object model,
comprising 10 packages (Fig. 1), each of which pertains
FR-IMGT to a particular type of data or metadata that is generically
required for digitally representing any type of life sci-
▶ Framework Region (FR-IMGT) ences experiment (Jones et al. 2007). An example is the
FuGE 763 F
Audit Contacts, auditing and security settings for all objects.

Additional annotations and free-text descriptions for all objects.


Description

Classes for providing measurements within FuGE, including slots for


Measurement the measurement value, the unit and the data type.

A mechanism for referencing external ontologies


Ontology or terms from a controlled vocabulary.
A model of procedures, software, hardware and parameters. The
Protocol package can define workflows by relating input and output materials
Common and/or data to the protocols that act on them.

Reference External bibliographic or database references that can


be applied to many objects across the FuGE model.
F
FuGE
Defines the dimensions of data and storage matrices,
Data or references to external data formats.

An overview of the investigation, capturing the overall design, the


Bio Investigation experimental variables and the associations to related data.

Material Models material types such as organisms, samples or solutions,


characterized by ontology terms or by extension of the package.

Conceptual Captures database entries of biological molecules such as DNA, RNA or


Molecule proteins and an extension point for other types, such as metabolites.

FuGE, Fig. 1 An overview of the FuGE package structure

Protocol package (Fig. 2), which contains a set of UML method. For example, extensions built for proteomics
classes and diagrams showing the relationships between capture the essential parameters of protein separation
objects. The key components in the Protocol package are techniques (Gibson et al. 2010) as defined by the
the Protocol class – a description of what should be done, associated minimum reporting guidelines documents –
for example, a standard operating procedure – and the MIAPE (Taylor et al. 2007). In this way, the format
ProtocolApplication class – a description of what was developer does not have to recreate all the models for
done, that is, with any runtime parameter values differing the basic core; FuGE provides all the elements for
from the defaults defined in the Protocol. Similarly, the handling protocols, data files, samples, and ontologies.
Data package contains structures for capturing data New data models can be developed quickly, and the
values in regular multidimensional matrices or as exter- associated project software can be used to create
nal files. ProtocolApplication is used to map between the a relational database or format an editing software auto-
input and output data (or biological samples) used at each matically (Belhajjame et al. 2008; Swertz et al. 2010).
stage of an experimental or data analysis pipeline, and Systems biology investigations typically use a range
thus can track the full audit trail for how the final results of experimental approaches to understand cellular
were generated through each stage in the process. processes. In the absence of using an extensible model,
developing a new data format or exchange standard can
FuGE for Data Format Development be a challenging process, and resulting models for differ-
One of the primary uses of FuGE is to develop new data ent domains tend to represent similar concepts (such as
exchange formats or standards (Jones et al. 2009). protocols and parameters) using different terminologies
A user interacts with the FuGE model with a UML and different levels of detail, thus making data integra-
editing tool or an XSD editing tool if working in the tion more difficult. Furthermore, some models do
FuGE-light mode. Extensions are built to capture the not allow for audit trails to capture all the processes
data types specific to the experimental or analysis that have been applied to the sample or data file.
F
764

Parameter
1 Parameter 0..* ParameterValue
+isInputParam : Boolean [0..1]
0..1 0..*
0..* 0..1
DefaultValue Value
0..1 1
ContactRole Parameters ParameterValues
Measurement
0..*
Provider
0..1 0..1 1
Parameterizable ParameterizableApplication

Protocol
Protocol 0..* ProtocolApplication
1
0..* 0..* +activityDate : DateTime [0..1]
Software Equipment 1 0..1 0..* 0..1

0..* 0..* InputMaterials


OutputData
Software Equipment 0..*
+version : String [0..1] MaterialMeasurement

InputData
0..* OutputMaterials

MeasuredMaterial 0..*

1 Data
0..*
0..*
Material

FuGE, Fig. 2 A component of the Protocol package, showing how parameter values are provided by a ProtocolApplication
FuGE
Function 765 F
Adoption of the FuGE methodology for model develop- management of high-throughput systems biology
ment should encourage developers to capture this level investigations. FuGE captures the necessary object
detail where appropriate, which will help with unambig- abstractions that can be implemented across three layers
uous interpretation of the results by the diverse commu- of data management: (1) data persistence, (2) in silico
nity of potential consumers of the data. and bench protocol management, and (3) data exchange.

FuGE for Database Development


Another use of FuGE is to develop an SQL References
or object-based database schema for data storage. Data-
base systems are designed to meet specific project Belhajjame K, Jones AR, Paton NW (2008) A toolkit for capturing
and sharing FuGE experiments. Bioinformatics 24:2647–2649
requirements and are generally purpose built for opti-
Gibson F, Hoogland C, Martinez-Bartolomé S, Medina-Aunon JA,
mized query execution or data entry. Fundamentally, Albar JP, Babnigg G, Wipat A, Hermjakob H, Almeida JS, F
the data model is the most critical aspect of system design Stanislaus R, Paton NW, Jones AR (2010) The gel electropho-
and function, and the model should reflect “real world” resis markup language (GelML) from the proteomics standards
initiative. Proteomics 10:3073–3081
objects and their relationships to ensure durability that
Jones A, Miller M, Aebersold R, Apweiler R, Ball C, Brazma A,
can outlast any application, many of which are not Degreef J, Hardy N, Hermjakob H, Hubbard S, Hussey P,
known when the system is first put into production. Igra M, Jenkins H, Julian R, Laursen K, Oliver S, Paton N,
These assertions are important in addressing the needs Sansone S-A, Sarkans U, Stoeckert C, Taylor C, Whetzel P,
White J, Spellman P, Pizarro A (2007) The functional geno-
of systems biology investigations because investigations
mics experiment model (FuGE): an extensible framework for
are large, lengthy, multi-institutional, and often need to standards in functional genomics. Nat Biotech 25:1127–1133
be integrated with one another, and, as such, require Jones AR, Lister AL, Hermida L, Wilkinson P, Eisenacher M,
long-term data persistence, protocol management, and Belhajjame K, Gibson F, Lord P, Pocock M, Rosenfelder H,
Santoyo-Lopez J, Wipat A, Paton NW (2009) Modeling and
data exchange. FuGE is a “real world” model of
managing experimental data using FuGE. OMICS: A J Integr
functional genomics experiments and can thus be Biol 13:239–251
used as a reference to derive a suitable database persis- Swertz M, Velde KJ, Tesson B, Scheltema R, Arends D, Vera G,
tence layer. SyMBA (http://symba.sourceforge.net/), for Alberts R, Dijkstra M, Schofield P, Schughart K, Hancock J,
Smedley D, Wolstencroft K, Goble C, de Brock E, Jones A,
example, provides a persistence layer directly based on
Parkinson H, members of the Coordination of Mouse
the FuGE XML Schema. FuGE can also be used to create Informatics, R, Genotype-To-Phenotype, C, Jansen R (2010)
an efficient, relatively generic, SQL-based schema that XGAP: a uniform and extensible data model and software
can be queried based on description or ontology types as platform for genotype and phenotype experiments. Genome
Biol 11:R27
consistent with the FuGE model, as described at the
Taylor CF, Paton NW, Lilley KS, Binz P-A, Julian RK, Jones AR,
project website (http://fuge.sourceforge.net/). Zhu W, Apweiler R, Aebersold R, Deutsch EW, Dunn MJ,
Heck AJR, Leitner A, Macht M, Mann M, Martens L,
FuGE for Pipeline Management Neubert TA, Patterson SD, Ping P, Seymour SL, Souda P,
Tsugita A, Vandekerckhove J, Vondriska TM, Whitelegge JP,
Systems Biology experiments also provide another
Wilkins MR, Xenarios I, Yates JR, Hermjakob H (2007) The
challenge in terms of analysis execution. Using ad hoc minimum information about a proteomics experiment
analysis, similar challenges to data persistence emerge (MIAPE). Nat Biotech 25:887–893
that include a reduced ability to audit and manage large
amounts of data and reporting. The FuGE Protocol
Package provides a suitable model to describe in silico
protocols and applications of protocols. For example, it FuGE-OM
is straightforward to describe an analysis workflow as
a pipeline by using the ProtocolApplication to describe ▶ FuGE
the sequence of steps with their respective parameter
values, inputs, and outputs. Furthermore, the results of
such a workflow can be moved to a persistence layer,
and exchanged later on. Function
The FuGE model is a real world description of omics
experiments that also provides a means for tractable ▶ Function, Distributed
F 766 Function Type

an organism to produce the effects for which that trait


Function Type was maintained in the process of natural selection in
the (possibly recent) past of that organism’s population
▶ IMGT-ONTOLOGY, FunctionType (Wright 1976; Millikan 1989; Neander 1991). In the
▶ IMGT-ONTOLOGY, SpecificityType philosophical debate that emerged in reaction to these
theories, many different understandings of function
and functional explanations have been developed (see
Wouters [2005] and Garson [2008] for overviews).
Function, Biological In biology, the connotation of “function” is usually
not purpose but activity, in a broad sense of that term,
Arno Wouters including “what it does,” “how it works,” and “how it
Department of Philosophy, Erasmus University is used.” For example, “functional morphology” is
Rotterdam, Rotterdam, The Netherlands typically defined as the study of the form of organisms
and their parts in relation to their activity and use. The
many articles yielded by a Google Scholar search on
Definition “structure and function” typically discuss both the way
in which a part of an organism is built (its structure)
The activity, role, value or purpose of a part, activity, and the way it works (its function). Within this broad
or trait of an organism. sense of function as activity, two uses of the term
function can be distinguished: function as activity in
a stricter sense (what it does and how it works) and
Characteristics function as biological role (how it is used).
“Function as activity” refers to what a system does
Terms like “function,” “functions,” and “functional” by itself (in abstraction of its effects on its environ-
are used in many different ways. The 2005 edition of ment) and the way it works – internally (e.g., the way
the New Oxford American Dictionary gives the follow- in which the activity is generated) or externally (e.g.,
ing as the first meaning of “function”: “an activity or the order of its changes). The notion of function as
purpose natural to or intended for a person or thing” what it does is typically used to distinguish form
with “Vitamin A is required for good eye function” as (or structural) characteristics from functional charac-
an example. This definition is suitable as a general teristics. The form characteristics of a system concern
characterization of the term “function” and at the its appearance (shape, volume, color, pattern, texture,
same time it contains the seeds of many confusions etc.), structure (composition, size, and spatial arrange-
about the notion of biological function, especially ment of the parts, e.g., amino acid sequences), and
because it talks about “activity or purpose” statics (hardness, weight, mass, etc.); the functional
and”natural to or intended for.” characteristics of a system concern its activity
The idea that biological function is somehow (frequency, order, velocity, momentum, reaction rates,
related to purposes and the idea that there can be oxygen consumption, kinetic energy, etc.). For example,
natural purposes in addition to intended ones has talk of “functional homology” might refer to a common
been a source of inspiration for philosophical discus- pattern in muscle movement, whereas talk of “structural
sion. In the 1950s and 1960s, philosophers of science, homology” might refer to a common pattern in the
of a logical positivist inclination, searched for ways to spatial arrangement of the muscles. The notion of func-
define the notion of biological function without appeal tion as how it works is typically used to make compar-
to purpose. Since the 1980s, many philosophers isons. For example, when it is said that the heart’s
think that evolutionary theory provides us with ventricle functions as a pressure pump and the atrium
a notion of natural purpose that can be used to develop as a suction pump, one compares the way in which these
a naturalized account of purposes, norms, and meaning two systems work.
in the philosophy of mind and language. According to The notion of function as biological role refers to
these “etiological theories,” it is the natural purpose the role of a system in enabling life. In general, role
and, for that matter, the “proper function” of a trait of functions concern the role of a system or activity in
Function, Biological 767 F
bringing about an organized characteristic of an The hypothetical organisms with which the real organ-
encompassing system (a role function of a brake is to isms are compared need not be real. Quite often,
enable the driver to stop the car because stopping the a comparison is made between a real organism and
car is how the brake contributes to the car’s organized a hypothetical organism that cannot possibly exist and
ability to transport people). The biological role of the point of the comparison is precisely that: to show that
a part of an organism is the role of that part in bringing it cannot exist (because it lacks an essential ability).
about the organism’s state of being alive. For example, Advantages differ from role functions in many ways.
the main biological role of the glycolysis is the Advantages are abilities to solve certain problems, not
production of ATP because that is how the glycolysis positions in an organization. Advantages are, unlike role
contributes to an organism’s ability to stay alive. Note functions, relative to an environment and to the traits
that role functions are positions in an organization used for comparison. In addition, role functions are typ-
rather than measurable properties. ically attributed to parts or activities, whereas advantages
F
Ascriptions of biological roles are the handle to are effects of traits (i.e., of the properties of systems or
understand life. Just as it is possible to explain how activities, including the presence of certain items or the
a company works by means of an organization chart performance of certain activities). It is, for example, the
that outlines the tasks of the different functionaries and biological role of the heart (a part of an organism) to
departments and the way in which they interact, the pump the blood around, whereas pumping blood by
ability of an organism to stay alive can be explained by means of a heart (a trait) is advantageous relative to
outlining the roles the different organ systems play in pumping blood by means of beating blood vessels (the
bringing about the living state. The ability of each trait for comparison) in environments with certain types
organ system to perform its biological role, in turn, of prey and predators because this allows for faster
can be explained by describing the roles the different oxygen transport (an ability resulting from the presence
parts of that system play in bringing about that ability, of a heart), which allows the organism to be more active
and so on, until a level is reached at which the relevant and, hence, to escape from predators or to catch prey
subsystems can be explained in terms of the physical in situations where an organism with beating blood ves-
and chemical characteristics of the molecules that sels would not be able to do so (more distal abilities
make up that subsystem (cf. Cummins 1975). Such an resulting from that trait).
organization chart provides a unifying framework Functional biology can be defined as the study of
for biology that relates detailed studies of specific how living systems (organisms) and their parts work.
mechanisms at different levels to the general aim of Functional biologists are concerned with two kinds of
understanding life. explanations that deal with synchronic relations
Yet, another use of the term “function” stems from between the different parts and activities of organisms
behavioral biology. In this area of study, “function” and the environment in which they live: mechanistic
often refers to the advantages of behaving in one way explanations (also called “causal explanations”) and
rather than another. More generally, the notion of functional explanations (also called “ecological expla-
function as biological advantage (also called “survival nations” or “design explanations”). The ascription of
value,” “adaptive value,” or “biological value”) is used role functions is central to explanations of both kinds
to refer to the way in which a certain trait influences the (see ▶ Explanation in Biology).
life chances of an organism in a certain environment as Mechanistic explanations address questions about
compared to other traits that might replace it. An how a certain biological role is performed (e.g., “how
advantage of a trait in a certain environment is an does the glycolysis generate ATP?”), by describing
ability resulting from that trait due to which the life a mechanism that produces the behavior that enables
chances of organisms with that trait are higher than the that system to perform this role. Because of their
life chances would be of organisms in which that trait concern with biological roles, mechanistic explana-
were replaced by another one (Canfield 1964; Bigelow tions in biology are sometimes called “functional
and Pargetter 1987). explanations” or “functional analyses” (especially by
Advantage articulations compare organisms with philosophers) (Cummins 1975). This kind of explana-
a certain trait with similar organisms in which that tion is discussed in the entry on mechanistic
trait is replaced by another one (or removed). explanation.
F 768 Function, Distributed

Functional explanations address questions about why Millikan RG (1989) In defense of proper functions. Philos Sci
a biological role is performed the way it is (e.g., “why do 56:288–302
Neander K (1991) The teleological notion of ’function’. Australa
many pathways that generate ATP start by activating J Philos 69:454–468
their substrate?”) by pointing to the advantages of Wouters AG (2005) The function debate in philosophy. Acta
performing the role in that way rather than in some Biotheor 53:123–151
conceivable alternative way (Wouters 2007). This kind Wouters AG (2007) Design explanation: determining the con-
straints on what can be alive. Erkenntnis 67:65–80
of explanation is often called “functional explanation” Wright L (1976) Teleological explanations: an Etiological
(especially by biologists) because it is concerned with analysis of goals and functions. University of California
the advantages of certain forms of organization rather Press, Berkeley
than with the question of how those forms are brought
about. It is discussed in the entry on functional explana-
tion (▶ Explanation, Functional).
Biological roles also play an important role in certain Function, Distributed
explanations in evolutionary biology (the study of
the history and dynamics of lineages of organisms), Jonathan F. Davies
especially in adaptation explanations. Adaptation expla- Fondazione Bruno Kessler, Trento, Italy
nations (also called “selection explanations”) are evolu-
tionary explanations that explain certain characteristics
of a population as the result of past interaction in that Synonyms
population between variants that differed in their fitness.
If a certain trait evolved because the fitness of past Delocalized; Function
variants having that trait was higher than that of their
competitors lacking that trait because the presence of that
trait improved the performance of a certain role function, Definition
one might say that the trait evolved as an adaptation for
performing that role function. For this reason, adaptation A distributed function is a feature of mechanistic
explanations are sometimes called “functional explana- explanations of some of the properties of complex
tions” (especially by philosophers) (Brandon 1990). This systems (▶ Complex System). It is a component
kind of explanation is discussed in the entry on evolu- causal-role ▶ function that cannot be localized onto
tionary explanation (▶ Explanation, Evolutionary). a readily identified component structure. A function
may be distributed across multiple nonlinearly
interacting elements or processes, across the whole sys-
tem or even transcend systemic boundaries.
Cross-References

▶ Explanation in Biology
▶ Explanation, Evolutionary
Characteristics
▶ Explanation, Functional
Central strategies in the mechanistic explanation of the
properties and behaviors of biological systems are
decomposition and functional localization. Decomposi-
References tion depends on the assumption that the behavior of
a system is a product of a set of subordinate functions,
Bigelow J, Pargetter R (1987) Functions. J Philos 84:181–196
and that the interactions between the functional ele-
Brandon RN (1990) Adaptation and environment. Princeton
University Press, Princeton ments are minimal and can be handled additively
Canfield JV (1964) Teleological explanation in biology. Br (Bechtel and Richardson 2010). Functional localization
J Philos Sci 14:285–295 is achieved when a structural decomposition of the
Cummins R (1975) Functional analysis. J Philos 72:741–765
system – in simple cases achieved through observation,
Garson J (2008) Function and Teleology. In: Sarkar S,
Plutynski A (eds) A companion to the philosophy of biology. and in more complex cases by the identification, of
Blackwell, Cambridge, pp 525–549 regions of maximal causal interaction, bounded with
Function, Distributed 769 F
regions of low interactivity (see ▶ Top-down Decom- tested, for example, through the use of targeted ▶ per-
position of Biological Networks) – yields identifiable turbation experiments. Successful explanations are
components that are held to be responsible for these achieved when functional decompositions map onto
functions (▶ Functional Modules and Complexes). structural decompositions. Problems arise, however,
The full explanation is achieved by the localization of when this mapping is not achieved or when the pertur-
a complete set of bottom-level component functions bation of components yields counterintuitive results.
(activities) in particular structures (entities) possessing Some features of biological systems do not appear to
the requisite characteristics or capacities that enable be analyzable into functionally discrete components and
them to be the bearer of these functions (Machamer the functionally relevant properties of many compo-
et al. 2000). Although it is not necessary to assume that nents are context sensitive to some degree. Bechtel and
a single component (in the sense of an individual, spa- Richardson (2010) identify systems in which the struc-
tially localized entity or structure identified as tural components seem to perform tasks that would not
F
a component of the system) is responsible for a specific appear in an intuitively plausible functional decomposi-
activity, this assumption is often made, if only as a first tion of the system. This makes the localization of com-
approximation. Even in cases where components interact ponent functions onto identified discrete structures
(a feature of what Herbert Simon (1996) called near- impossible and has given rise to the suggestion that the
decomposable systems), the primary explanatory burden emergent behavior of the system is inexplicable in
rests on the intrinsic (context-independent) properties of mechanistic terms (▶ Emergence).
the components. Interactions are usually modeled in Kauffman’s (1993) explanation of the stability of
a linear, additive fashion and organizational factors are ▶ gene regulatory networks on the basis of their con-
relegated to a secondary role, as constraints. nectivity is an extreme example of an explanation of
A well-known example of this explanatory strategy a systemic feature that does not make use of functional
in molecular biology is Jacob and Monod’s (1961) localization. His network model consists of simple
operon model of ▶ gene regulation. They posited regu- nodes, each node being in one of two states (on, off).
latory genetic elements as a solution to the problem of The interactions between the nodes are simple activa-
accounting for changes in gene expression. The details tion or repression, so making transitions between suc-
of this model include the functionally defined ▶ repres- cessive states of the network Boolean operations
sor. This consists of a protein that exhibits certain struc- (▶ Boolean Networks). Kauffman argues that networks
tural features allowing it to bind to the operator and so of this sort exhibit behavior that encounters stable cycles
prevent the expression of the structural genes. The shape (▶ Stability) even in the face of perturbations and that
and molecular constitution of the regulatory protein a gene network characterized by a similar architecture
ensures that it is able to perform the bottom-level activ- will be robust. The explanatory properties for the
ities (in this case, geometrical/mechanical and chemical ▶ robustness of the network are distributed across
bonding) that constitute an instance of gene regulation. the entire system and the intuitively satisfying parsing
Other component functions (structural genes, ▶ Pro- of the systemic behavior into functionally discrete
moter, operator, etc.) are performed by identified struc- components is not possible. Any explanatory properties
tures each constituted in such a way as to make their of individual nodes are dependent on their relations to
individual contribution to the behavior of the system other nodes rather than on their intrinsic properties; for
intelligible. The adequacy of the explanation is a result example, a ▶ hub is a hub in virtue of its high number of
of the gross systemic behavior being the sum of the connections and if it is functionally significant, it is so
linear sequence of sub-tasks (functions), which in turn because of these connections.
are the sum of their component sub-tasks until we reach A less radical example of a process of the distribution
the lowest level of entities and activities of concern to of a functional component of a biological system is the
the field (Machamer et al. 2000). notion of a eukaryotic gene. Genes are made up of
Standard strategies for the development of this kind non-contiguous exons (▶ Exon) interspersed with non-
of mechanistic explanation often involve the intuitive coding introns (▶ Intron), which, to complicate matters
parsing of the systemic behavior into component func- further, are utilized in differing ways in different devel-
tions giving rise to a plausible ▶ mechanism sketch or opmental processes. The localization of the functional
schema (Craver 2007). Competing schemas are then unit (“gene”) onto a discrete molecular structure
F 770 Function, Distributed

(i.e., a contiguous section of DNA) does not appear to be ascription of functional roles to individual modules,
possible. This, and further complications with regard to combinations of modules, and other network features
the physiological roles of the gene products, has led (e.g., connectivity, small world architecture; see
Moss (2003) to argue that the functional unit (called ▶ Small-World Property and ▶ Functional Modules
“gene-P” to signify its connection to phenotype) cannot and Complexes) can then be attempted by coordinated
be mapped onto any recognizable molecular structure perturbation experiments, conducted in vitro or in silico
(called “gene-D” to signify status as a developmental (or even in vivo) on the basis of molecular data and
resource). The phenotypic (functional) role of any par- theoretical work.
ticular molecular gene (gene-D) is not fixed by its intrin- The assumption underlying these methodological sug-
sic structural properties, but is rooted in features gestions is that functionality in biological systems is not
distributed throughout a dynamic network of regulatory always readily localizable onto discrete structural com-
and developmental resources. The extent of this distri- ponents. An individual function (which is defined in
bution will depend on the particular process and the relation to the concerns of the investigator) may be
context in which it is taking place. located onto a single molecular component, a discrete
It is argued that biological systems are characterized structure, a network module or complex, a chemical path-
by a spectrum from highly localized to radically way, a dynamic process, an architectural feature of the
distributed functional elements and that neither the whole system or may even interfere in the interaction
“reductionistic” (▶ Reduction) localization of bottom- between the system and its environment. One of the
level component functions in discrete structures nor the novel features of systems biology is thought to be that it
“holistic” (▶ Holism) representation of system dynamics facilitates the empirical investigation into the extent of
constitute adequate explanatory strategies (Krohs and distribution of any particular function in cases in which
Callebaut 2007). Instead a plurality of explanatory our intuitions and mental capacities let us down. Even
approaches, which depend on the nature of the subject in cases in which the function bearer is a dynamic process
matter and the interests of the investigator, should be (so involving multiple structural components over
pursued (Mitchell 2003). In systems biology, this means a period of time), it may be possible to construct
dealing with hierarchically organized dynamic systems a simulation that accounts for its role in the overall system
and networks (see ▶ Hierarchy and ▶ Organization), and ascertain to what extent it is spatiotemporally distrib-
the features of which are explicable in terms of combi- uted. It remains to be seen to what extent the computa-
nations of network and system dynamics and the prop- tional and theoretical tools available to systems biologists
erties of components structures, sub-networks, and will overcome the obstacles to achieving such ambitious
pathways (Kitano 2002). Addressing such ▶ complexity, aims. However, the developments in the capacity of
non-linearity, and heterogeneity is only possible with systems biologists to account for the behaviors of com-
huge quantities of data across the full range of omic plex biological systems are already throwing new light on
levels and the capacity to analyze these data in parallel. the roles of decomposition and functional localization
Traditional techniques for localizing functions through, and encouraging philosophers of science to think again
for example, single gene knockout experiments cannot about the limits of mechanistic explanatory strategies.
deal with robust systems or organizational and dynamic
properties. One suggestion is to attempt simulations in Cross-References
silico that integrate network and dynamical systems
models with molecular data concerning known compo- ▶ Boolean Networks
nents (Kitano 2002). Another suggestion, which relates ▶ Complex System
directly to the problems of explaining systems ▶ Complexity
containing distributed functional components, is to ▶ Emergence
focus on the “unbiased” structural decomposition of the ▶ Exon
system. Making use of quantitative methods for delin- ▶ Function
eating network modules, through the identification of ▶ Functional Modules and Complexes
local maxima of interaction, avoids the need for the ▶ Gene Regulation
intuitive decomposition of the system into functional ▶ Gene Regulatory Networks
components (Krohs and Callebaut 2007). The secondary ▶ Holism
Functional 771 F
▶ Hub the ▶ IDENTIFICATION Axiom) of ▶ IMGT-
▶ Intron ONTOLOGY, the global reference in ▶ immunogenet-
▶ Mechanism ics and ▶ immunoinformatics (Giudicelli and Lefranc
▶ Organization 1999; Lefranc et al. 2004, 2005, 2008; Duroux
▶ Perturbation et al. 2008), built by IMGT®, the international ImMu-
▶ Promoter noGeneTics information system® (http://www.imgt.
▶ Reduction org) (▶ IMGT® Information System). “Functional”
▶ Repressor identifies, whatever the molecule type (▶ MoleculeType),
▶ Robustness the functionality of ▶ Molecule_EntityType leafconcepts
▶ Small-World Property in undefined or germline configuration (▶ Configuration
▶ Stability Type), whose coding region has an open reading frame
▶ Top-down Decomposition of Biological Networks without stop codon and for which there is no described
F
defect in the splicing sites, recombination signals
(▶ Recombination Signal (RS)) and/or regulatory
References elements.
“Functional” is one of the three leafconcepts (the
Bechtel W, Richardson RC (2010) Discovering complexity: other two being “▶ ORF” and “▶ Pseudogene”) that
decomposition and localization as strategies in scientific
identify the functionality of Molecule_EntityType
research. MIT Press, Cambridge
Craver CF (2007) Explaining the brain: mechanisms and the leafconcepts in undefined configuration (▶ Configura-
mosaic unity of neuroscience. Clarendon Press, Oxford tion Type) conventional genes (▶ Conventional Gene)
Jacob F, Monod J (1961) Genetic regulatory mechanisms in the and immunoglobulin (IG) and T cell receptor (TR) con-
synthesis of proteins. J Mol Biol 3:318–356
stant (C) genes (▶ Constant (C) Gene), or in germline
Kauffman S (1993) The origins of order: self organization and
selection in evolution. Oxford University Press, Oxford configuration (▶ Configuration Type) (IG and TR vari-
Kitano H (2002) Looking beyond the details: a rise in system- able (V), diversity (D), and joining (J) (▶ Variable (V)
oriented approaches in genetics and molecular biology. Curr Gene, ▶ Diversity (D) Gene, ▶ Joining (J) Gene) genes
Genet 41:1–10
before DNA rearrangements) (Giudicelli and Lefranc
Krohs U, Callebaut W (2007) Data without models merging with
models without data. In: Boogerd F et al (eds) Systems biology: 1999; Lefranc et al. 2004; Duroux et al. 2008).
philosophical foundations. Elsevier, Amsterdam, pp 181–213
Machamer P, Darden L, Craver CF (2000) Thinking about
mechanisms. Phil Sci 67:1–25
Mitchell SD (2003) Biological complexity and integrative
Cross-References
pluralism. Cambridge University Press, Cambridge
Moss L (2003) What genes can’t do. MIT Press, Cambridge ▶ Configuration Type
Simon H (1996) The sciences of the artificial. MIT Press, ▶ Constant (C) Gene
Cambridge
▶ Conventional Gene
▶ Diversity (D) Gene
▶ FunctionalityType
▶ IMGT-ONTOLOGY
Functional ▶ IMGT-ONTOLOGY, IDENTIFICATION Axiom
▶ IMGT-ONTOLOGY, Leafconcept
Marie-Paule Lefranc ▶ IMGT ® Information System
Laboratoire d’ImmunoGénétique Moléculaire, Institut ▶ Immunogenetics
de Génétique Humaine UPR 1142, Université ▶ Immunoinformatics
Montpellier 2, Montpellier, France ▶ Joining (J) Gene
▶ Molecule Entity Type
▶ MoleculeType
Definition ▶ Open Reading Frame (ORF)
▶ Pseudogene
“Functional” is a ▶ leafconcept of the “▶ Functiona- ▶ Recombination Signal (RS)
lityType” concept of identification (generated from ▶ Variable (V) Gene
F 772 Functional Differential Equations

References References

Duroux P, Kaas Q, Brochet X, Lane J, Ginestoux C, Lefranc Al-Shahrour F, et al. FatiGO: a web tool for finding significant
M-P, Giudicelli V (2008) IMGT-Kaleidoscope, the Formal associations of gene ontology terms with groups of genes.
IMGT-ONTOLOGY paradigm. Biochimie 90:570–583 Bioinformatics. 2004;20:578–80.
Giudicelli V, Lefranc M-P (1999) Ontology for immunogenet- Reimand J, Kull M, Peterson H, Hansen J, Vilo J. g:Profiler—a
ics: the IMGT-ONTOLOGY. Bioinformatics 12:1047–1054 web-based toolset for functional profiling of gene lists from
Lefranc M-P, Giudicelli V, Ginestoux C, Bosc N, Folch G, large-scale experiments. Nucleic Acids Res. 2007;35(Web
Guiraudou D, Jabado-Michaloud J, Magris S, Scaviner D, Server Issue):W193–200.
Thouvenin V, Combres K, Girod D, Jeanjean S, Protat C,
Yousfi MM, Duprat E, Kaas Q, Pommié C, Chaume D,
Lefranc G (2004) IMGT-ONTOLOGY for immunogenetics
and immunoinformatics. In Silico Biol 4:17–29
Lefranc M-P, Clément O, Kaas Q, Duprat E, Chastellan P,
Coelho I, Combres K, Ginestoux C, Giudicelli V, Chaume D,
Functional Genomics Experiment Model
Lefranc G (2005) IMGT-Choreography for immunogenetics
and immunoinformatics. In Silico Biol 5:45–60 ▶ FuGE
Lefranc M-P, Giudicelli V, Regnier L, Duroux P (2008) IMGT, a
system and an ontology that bridge biological and computa-
tional spheres in bioinformatics. Brief Bioinform 9:263–275

Functional GO and Pathway-Based


Network

Functional Differential Equations ▶ Functional/Signature Network Module for Target


Pathway/Gene Discovery
▶ Dynamical Systems Theory, Delay Differential
Equations

Functional Interaction Network

Functional Enrichment Analysis ▶ Organelle and Functional Module Resources

Jiguang Wang
Beijing Institute of Genomics, Chinese Academy of
Sciences, Beijing, Beijing, China Functional Model

▶ Process-based Model
Definition

For a given cluster of biological elements such as gene,


functional enrichment analysis is to compare the Functional Modules and Complexes
enrichment of any type of biologically relevant labels
such as gene ontology terms in these genes to that in Xianwen Ren
the background. Hypergeometric distribution or bino- Key Laboratory of Systems Biology of Pathogens,
mial distribution is usually applied to give the p-value Ministry of Health, Institute of Pathogen Biology,
for whether the enrichment is significantly different in Chinese Academy of Medical Sciences and Peking
the given gene cluster and in the background. So Union Medical College, Beijing, China
the choice of background and the cutoff value are
important for the reporting result. There have been Definition
several available tools for application, e.g., FatiGO
(Al-Shahrour et al. 2004), g:profiler (Reimand et al. Proteins act in concert in cells. Some form complexes
2007), etc. in which member proteins bind to each other stably to
Functional/Signature Network Module for Target Pathway/Gene Discovery 773 F
act as a whole functional unit, e.g., ribosomes. are represented by nodes and the interaction
Complexes are expected to be predicted from protein- between them are represented by edges. The multi-
protein interaction networks through identifying ple nodes and corresponding edges in a network
functional modules. together form modules/subnetworks that manifest
Functional module refers to a set of proteins that are high internal similarity/correlation (i.e., modular
densely connected within themselves but sparsely networks). These modules form an independent
connected with the rest in the biological molecular connected component with each other, while they
network. are sparsely connected with the rest of the network
(Erten et al. 2009). The network modules carrying
biologically meaningful information are termed as
Cross-References functional modules. A general concept on identify-
ing signature and functional module is described
F
▶ Pathway, Functional Units below.

References Characteristics

Spirin V, Mirny AL. Protein complexes and functional modules Identification of Signature and Functional
in molecular networks. Proc Natl Acad Sci USA.
Network Modules
2003;100(21):12123–8
Modular networks are analyzed to identify signa-
ture/function network modules from larger molecu-
lar networks. It involves topological analysis of the
networks to find the maximally scoring regions in
the network, followed by mapping the results to the
Functional/Signature Network Module true function categories like those present in the
for Target Pathway/Gene Discovery protein function annotation databases, e.g., the
Gene Ontology (GO) and pathway databases. Net-
Shipra Agrawal1 and M. R. Satyanarayana Rao2 work topology is the layout pattern of interconnec-
1
BioCOS Life Sciences Pvt. Limited, Institute of tions of the various elements like nodes and edges.
Bioinformatics and Applied Biotechnology, The functional annotation of signature modules is
Bangalore, Karnataka, India based on their gene/protein enrichment analysis
2
Chromatin Biology Laboratory, Molecular Biology from GO and pathway database. The gene or protein
and Genetics Unit, Jawaharlal Nehru Center for enrichment analysis of networks is carried out by
Advanced Research, Bangalore, Karnataka, India mapping network genes to known pathways and
gene ontology terms to determine which pathways/
GO terms are overrepresented in a given set of
Synonyms genes.
Functional modules are significantly enriched
Co-expression network; Expression signature; Func- in known target genes and can be used as fingerprints
tional GO and pathway-based network; Genomics to identify genes relevant to some specific biological
network functions or diseases or to discover new potential
drug targets (Wong et al. 2008). The system level expres-
sion network is processed to identify upregulated/
Definition downregulated signature and functional expression
network modules. For example, identification of
High-throughput molecular biology data is often downregulated synaptic vesicle module for brain disor-
analyzed and represented today through molecular ders that includes Alzheimer’s disease, bipolar disorder,
networks. The network of interacting proteins/genes Schizophrenia, and glioblastoma (Suthram et al. 2010)
is an undirected network, where proteins/genes (refer Expression Network Module Box for details).
F 774 Functional/Signature Network Module for Target Pathway/Gene Discovery

2. Bottleneck Nodes
Expression Network Modules Bottleneck nodes are those, which have a higher
Expression modules consist of clusters of genes betweenness (i.e., “bottleneck-ness”) and lower
whose expression profiles share a local similarity degree of connectivity with neighboring
or coordinate expression. The local similarity/ genes/proteins. Betweenness is one of the most
co-expression across genes is measured using important topological properties of a network. It
statistical correlation parameters, i.e., Pearson’s measures the number of shortest paths (the shortest
Correlation Coefficient. The genes with very distance between two nodes) between nodes going
high co-expression levels constitute a cluster/ through a certain node. Therefore, nodes with the
module, which usually indicates that they have highest betweenness control most of the informa-
a common biological function or share tion flow in the network, representing the critical
a common physiological condition. The condi- points of the network (Fig. 1). They act as important
tion-specific co-expression information provides links between modules in protein interaction net-
clues to the dynamic features of these network works, just as bridges connecting two important
modules. For example, dilated cardiomyopathy, hubs/modules. Bottlenecks control the major infor-
which is a leading cause of heart failure, has been mation flow in a network and correspond to the
well studied using expression network modules. dynamic components of the interaction network.
These networks have facilitated identification of They are observed to be significantly less well
putative biomarkers or therapeutic targets for co-expressed with their neighbors. They have been
heart failure and the underlying molecular found to be present in the yeast interactome in
mechanism of dilated cardiomyopathy (Lin abundance (Yu et al. 2007).
et al. 2010) 3. Network Motifs
Network motifs are patterns of a larger and more
complex network. They are overrepresented node
The modular structure of complex biological connectivity patterns and recur frequently in
networks also facilitates identification of biologi- a given network than expected at random. They
cally relevant gene hubs, bottleneck nodes, network may represent autoregulation or feed-forward
motifs, and biomarker nodes, which are described loops in a regulatory network and indicate func-
below. tional and evolutionary constraints in a network
1. Gene Hubs (Lee et al. 2002). The transcription networks of
In a molecular network, nodes (gene/protein) well-studied microorganisms are made up of
which are highly connected to other nodes are a small set of network motifs. These motifs are
called gene hubs/network hubs. The gene hubs also found in protein modification network and
have a high degree of connectivity with their interactions between neuronal cells and signaling
neighboring genes. The latest method for scoring networks. The specific ways (i.e., autoregulation or
a functional hub is based on the role the nodes feed-forward loops) in which the network motifs
play in providing connectivity among genes or are wired together describes the dynamics of each
proteins of interest relative to their role in the individual motif. These patterns exhibit a robust
global network. They are centrally important for dynamical stability across a complex biological
the cellular functions and tend to be essential and network (Fig. 1).
conserved in a network module. In the context of 4. Biomarker Nodes
disease pathways, hubs may represent potential Biomarkers are biomolecular signatures, which are
drug targets (Wuchty 2004; Levy and Siegal detectable and measurable and can be used to study
2008). Scientists working on common diseases the diagnosis/prognosis of disease or therapy.
have reported several important drug targets by Under normal conditions, they may be present at
identifying important functional hubs across basal levels in the cell. However, if the amount of
larger molecular networks derived from high- these molecules changes, they may indicate
throughput expression as well as protein complex response to therapies, complications or diseases.
data. Biomarker nodes can be identified by studying
Functional/Signature Network Module for Target Pathway/Gene Discovery 775 F

Group of nodes forming Overrepresented patterns


independent component in a complex network

Network Module Network Motif

Group of genes whose


Nodes with high betweenness expression profiles share a
centrality and low degree local similarity
Group of genes performing
Expression Module
Bottleneck Node distinct biological functional

Functional Module

Functional/Signature Network Module for Target Path- If the network is constructed from co- expression (expression
way/Gene Discovery, Fig. 1 A complex biological network similarity data), its statistical analysis leads to identification of
and functionally important network elements. Topological anal- expression module. The expression modules or network module
ysis of a system level biological network leads to the identifica- can be functionally annotated to have a biological function,
tion of network modules, network motifs, and bottleneck nodes. which is finally termed as a functional module

gene network models having disease-associated The statistically significant functional gene/protein net-
gene expression profiles and biofluid proteomes works can be further analyzed and annotated to map
(Schiess et al. 2009). A gene, which is coordinately biologically important pathways and gene hubs as tar-
and consistently connected to critical disease gets. This section is focused on identification of pertur-
causal candidate genes and pathways are termed bation targets, target pathways/genes, using network
“biomarker nodes.” A highly connected biomarker biology approach.
node sometimes targets multiple related diseases; 1. Perturbation Targets from Network Biology
e.g., AZGP1 gene is a biomarker node for cardiac The network biology approach can be used to com-
hypertrophy, idiopathic cardiomyopathy, and idio- pare gene expression profiles of drug-treated or
pathic thrombocytopenic purpura, (Dudley and diseased-cell population with those of the normal
Butte 2009). cells. This helps in identifying the target genes and
pathways that are directly affected by the perturba-
Mapping Pathway/Gene Target From Networks tion experiments, e.g., genes that are deregulated in
Protein–protein interaction and transcriptional regula- some disease pathways or genes whose products
tory networks can be used to identify putative can be used as potential targets of some drug
biomarkers, target genes, target pathways, etc. The tar- (Mani et al. 2008, Karlebach and Shamir 2008).
get pathways and genes are more reliably detected from Topologically important networks are processed
expression profiling–based perturbation experiments. for gene ontology and pathway annotations to
F 776 Functional/Signature Network Module for Target Pathway/Gene Discovery

Functional/Signature Target pathways


Network Module for Target
Pathway/Gene Discovery, Pathways of target genes
Fig. 2 Identification of which are biologically
significant and high-scoring
biologically significant target
pathways/genes from
a complex biological network

Target genes

Causal genes having significant


involvement in a disease or
biological process

define the effect of perturbation experiment and This involves constructing networks and studying
subsequently to identify the targets of molecular the kinetic interactions of a gene/protein with other
perturbation from global expression profile data. genes/proteins in a network (Fig. 2) (Dezso et al.
2. Target Pathways 2009; Gilchrist et al. 2006). The discovery of target
Biologically significant pathways can be identified pathways and genes is interrelated. A researcher can
by performing functional annotation and pathway always narrow down the list of target genes from the
enrichment analysis of molecular networks. The list of target pathways discovered from modular
high-scoring and conserved pathways can be used networks.
in disease-related studies, i.e., for understanding the
pathogenesis of disease, identifying important
deregulated metabolic and signaling networks and References
subsequently discovering potential drug targets for
the disease (Fig. 2) (Zheng and Christina 2009; Dezso Z, Nikolsky Y, Nikolskaya T, Miller J, Cherba D,
Zhou and Wong 2009). Further analysis of biologi- Webb C, Bugrim A (2009) Identifying disease-specific
genes based on their topological significance in protein
cally significant pathways and their cross talk across
networks. BMC Syst Biol 3:36
networks leads to identification of the most critical Dudley JT, Butte AJ (2009) Identification of discriminating
pathways in a complex disease mechanism that could biomarkers for human disease using integrative network
also serve as potential key target pathways. For biology. Pac Symp Biocomput 27–38
Erten S, Li X, Bebek G, Li J, Koyut€ urk M (2009) Phylogenetic
example, human Toll-like receptor signaling net-
analysis of modularity in protein interaction networks. BMC
work has been used to establish the important control Bioinformatics 10:333
points through pathway cross talk studies. The anal- Gilchrist M, Thorsson V, Li B, Rust AG, Korb M, Roach JC,
ysis identified potential candidates for inhibitory Kennedy K, Hai T, Bolouri H, Aderem A (2006) Systems
biology approaches identify ATF3 as a negative regulator of
mediation of TLR signaling with respect to their
toll-like receptor 4. Nature 441(7090):173–178
specificity and potency (Li et al. 2008). Karlebach G, Shamir R (2008) Modelling and analysis of gene
3. Target Genes regulatory networks. Nature Reviews Molecular Cell
A system level network biology approach can be used Biology 9:770–780
Lee TI, Rinaldi NJ, Robert F, Odom DT, Bar-Joseph Z,
to identify novel target genes of interest which are
Gerber GK, Hannett NM, Harbison CT, Thompson CM,
found to have significant involvement in some dis- Simon I, Zeitlinger J, Jennings EG, Murray HL,
ease, biological process, or an abnormality. Gordon DB, Ren B, Wyrick JJ, Tagne JB, Volkert TL,
FunctionalityType 777 F
Fraenkel E, Gifford DK, Young RA (2002) Transcriptional Definition
regulatory networks in Saccharomyces cerevisiae. Science
298(5594):799–804
Levy SF, Siegal ML (2008) Network hubs buffer environmental FunctionalityType is a concept of identification
variation in Saccharomyces cerevisiae. PLoS Biol 6(11):e264 (generated from the ▶ IDENTIFICATION Axiom)
Lin CC, Hsiang JT, Wu CY, Oyang YJ, Juan HF, Huang HC of ▶ IMGT-ONTOLOGY, the global reference in
(2010) Dynamic functional modules in co-expressed protein ▶ immunogenetics and ▶ immunoinformatics (Duroux
interaction networks of dilated cardiomyopathy. BMC Syst
Biol 4:138 et al. 2008), built by IMGT®, the international ImMuno-
Mani KM, Lefebvre C, Wang K, Lim WK, Basso K, Dalla- GeneTics information system® (http://www.imgt.org)
Favera R, Califano A (2008) A systems biology approach (▶ IMGT® Information System), that allows to
to prediction of oncogenes and molecular perturbation tar- identify, whatever the molecule type (▶ MoleculeType)
gets in B-cell lymphomas. Mol Syst Biol 4:169
Schiess R, Wollscheid B, Aebersold R (2009) Targeted proteo- (gDNA, cDNA, mRNA, or protein), the type of
mic strategy for clinical biomarker discovery. Mol Oncol functionality of a ▶ Molecule_EntityType leafconcept
3(1):33–44
F
(Giudicelli and Lefranc 1999; Lefranc et al. 2004;
Suthram S, Dudley JT, Chiang AP, Chen R, Hastie TJ, Butte AJ Duroux et al. 2008).
(2010) Network-based elucidation of human disease similar-
ities reveals common functional modules enriched for plu- The “FunctionalityType” concept comprises five
ripotent drug targets. PLoS Comput Biol 6(2):e1000662 leafconcepts, divided into two categories, according
Wong DJ, Liu H, Ridky TW, Cassarino D, Segal E, Chang HY to the configuration type (▶ Configuration Type) of
(2008) Module map of stem cell genes guides creation of the Molecule_EntityType leafconcept.
epithelial cancer stem cells. Cell Stem Cell 2(4):333–344
Wuchty S (2004) Evolution and topology in the yeast protein Three leafconcepts, ▶ functional, ORF (open read-
interaction network. Genome Res 14(7):1310–1314 ing frame), and ▶ pseudogene, identify the functional-
Yu H, Kim PM, Sprecher E, Trifonov V, Gerstein M (2007) The ity of Molecule_EntityType leafconcepts in undefined
importance of bottlenecks in protein networks: correlation configuration (conventional genes (▶ Conventional
with gene essentiality and expression dynamics. PLoS
Comput Biol 3(4):e59 Gene) and immunoglobulin (IG) and T cell receptor
Zheng L, Christina C (2009) Systems biology for identifying (TR) constant (C) genes (▶ Constant (C) Gene) or
liver toxicity pathways. BMC Proc 3(Suppl 2):S2 in germline configuration (IG and TR variable (V),
Zhou X, Wong ST (2009) Computational systems diversity (D) and joining (J) (▶ Variable (V) Gene,
bioinformatics and bioimaging for pathway analysis and
drug screening. Proc IEEE Inst Electr Electron Eng ▶ Diversity (D) Gene, ▶ Joining (J) Gene) genes
96(8):1310–1331 before DNA rearrangements).
Two leafconcepts, ▶ productive and unproductive,
identify the functionality of Molecule_EntityType
leafconcepts in rearranged or partially-rearranged
configuration (IG and TR entities after DNA
Functionality Type rearrangements, and by extension fusion entities resulting
from translocations, and hybrid entities obtained by
▶ FunctionalityType biotechnology molecular engineering).

Cross-References
FunctionalityType
▶ Configuration Type
Marie-Paule Lefranc ▶ Constant (C) Gene
Laboratoire d’ImmunoGénétique Moléculaire, Institut ▶ Conventional Gene
de Génétique Humaine UPR 1142, Université ▶ Diversity (D) Gene
Montpellier 2, Montpellier, France ▶ Functional
▶ IMGT-ONTOLOGY
▶ IMGT-ONTOLOGY, IDENTIFICATION Axiom
Synonyms ▶ IMGT-ONTOLOGY, Leafconcept
▶ IMGT-ONTOLOGY, Unproductive
Functionality type ▶ IMGT ® Information System
F 778 Functional-Structural Plant Modeling

▶ Immunogenetics organs) and geometry (orientation, inclination, and


▶ Immunoinformatics shape) of the organs and the plant. At the individual
▶ Joining (J) Gene level, this is also referred to as plant architecture. An
▶ Molecule Entity Type FSPM may consider a change in organ and plant struc-
▶ MoleculeType ture in time, thereby simulating growth, extension, and
▶ Open Reading Frame (ORF) branching processes of a given plant. This type of
▶ Productive FSPM is referred to as dynamic. A static FSPM, on
▶ Pseudogene the other hand, only considers an unchanging structure,
▶ Variable (V) Gene which is used as a model input in order to explain
spatial heterogeneity in physiological processes.
Physiological and physical processes considered
usually comprise essential, basic, and characteristic
References
functions, such as ▶ photosynthesis, growth (biomass
Duroux P, Kaas Q, Brochet X, Lane J, Ginestoux C, Lefranc M-P, accumulation and organ extension in length and diam-
Giudicelli V (2008) IMGT-Kaleidoscope, the formal eter), and branching, with physiological processes
IMGT-ONTOLOGY paradigm. Biochimie 90:570–583 depending on environmental factors such as tempera-
Giudicelli V, Lefranc M-P (1999) Ontology for immunogenet- ture, radiation, CO2 content of the air, and relative
ics: the IMGT-ONTOLOGY. Bioinformatics 12:1047–1054
Lefranc M-P, Giudicelli V, Ginestoux C, Bosc N, Folch G, humidity. These processes (or functions) are usually
Guiraudou D, Jabado-Michaloud J, Magris S, Scaviner D, implemented at the level of each organ and then
Thouvenin V, Combres K, Girod D, Jeanjean S, Protat C, require local environmental parameters. The explicit
Yousfi Monod M, Duprat E, Kaas Q, Pommié C, Chaume D, consideration of the geometry and topology of struc-
Lefranc G (2004) IMGT-ONTOLOGY for immunogenetics
and immunoinformatics. In Silico Biol 4:17–29 tural elements at the organ level distinguishes FSPM
from its predecessor, ▶ process-based models.
In an FSPM, a feedback relation exists
between the structure and certain functions: A given
structure (static or resulting from application of
Functional-Structural Plant Modeling rules) can modulate the local output of processes (e.g.,
self-shading of leaves diminishing locally intercepted
Gerhard Buck-Sorlin light for ▶ photosynthesis); on the other hand,
Institut de Recherche en Horticulture et Semences, all structures are built and maintained by processes
UMR 1345 (INRA/Agrocampus-ouest/Université (e.g., growth of new biomass fed by assimilates resulting
d’Angers) Agrocampus Ouest, Angers, France from photosynthesis).

Synonyms Characteristics

Virtual plant modeling According to the FSPM paradigm, a plant responds to its
environment by adaptive modification of processes
constituting its physiology (e.g., ▶ photosynthesis) and
Definition plant architecture (e.g., bud break or dormancy, growth,
development, morphogenesis), thereby explicitly
Functional-structural plant modeling (FSPM) refers to addressing the feedbacks between structure and func-
a paradigm for the description of a plant by creating tion. In addition, such feedbacks can be implemented
a (usually object-oriented) computer model of its struc- and verified at different levels, e.g., locally at the organ
ture and selected physiological and physical processes, scale and globally at the plant or canopy scale.
at different hierarchical levels: organ, plant individual, A typical reaction of the plant to a change in envi-
canopy (a stand of plants), and in which the processes ronment or to an intervention (cutting, bending, etc.)
are modulated by the local environment. Structure by management or in the course of an experiment
comprises the explicit topology (connection between will thus be the relatively rapid readjustment of
Functional-Structural Plant Modeling 779 F
physiological functions and a somewhat slower reac- History
tion in terms of structural adaptation by growth of new The concept of Functional-Structural Plant Modeling
structures or active shedding of existing structures arose in the 1990s from the desire to link existing
(Vos et al. 2010). ▶ process-based models of crops with spatially
Usually, the dynamics in an FSPM is simulated explicit representations of plants (usually represented
using rules (for growth, extension, and branching) in a rule-based manner as ▶ Lindenmayer systems) in
always starting from a growing tip (meristem) which order to consider, in a mathematical model, interactions
produces ▶ phytomers (consisting of a leaf, a node, an between plant structure and functioning (Siev€anen et al.
axillary bud, and an internode). The axillary bud can 1997). Early representatives of FSPMs were restricted to
itself produce a phytomer (with a meristem on top) to one aspect of plant functioning and showed how the
form a new shoot; this is referred to as branching. For selected function was affected by morphology (e.g.,
each of the organs of a phytomer, specific dynamics for light interception, assimilate allocation, xylem sap
F
extension or growth in biomass apply, which can be flow). In these models, information was frequently uni-
described using nonlinear functions of time. If directional (from structure to function). Furthermore,
such a dynamics follows a logistic function, then it is due to the lack of conventions among modelers, these
characterized by three parameters: time of onset models were very specific and difficult to reuse or to
of process, time of maximum rate, and maximum transfer to other plant systems.
dimension. Thus, using a rule-based notation (e.g., During a second wave, FSPMs were designed to be
a ▶ Lindenmayer system) to describe these processes, more modular, exhibiting submodels contributed by
two types of rules are usually sufficient: several authors and considering more than one aspect
(a) For the formation of a new ▶ phytomer from of functionality at the same time. Examples for this type
a (terminal or axillary) meristem: of model are LIGNUM (Perttunen et al. 2001) or
Greenlab (Guo et al. 2006). New features included
conditions
M ! IN ½ L½ MM in these models were, e.g., bidirectional information
flow between structure and function (Hemmerling
where M symbolizes the meristem and I, N, L the et al. 2008); processes taking place at more than one
internode, node, and leaf, respectively; ! designates hierarchical level, e.g., phytohormone biosynthesis and
a replacement of the left-hand side symbol transport (Buck-Sorlin et al. 2005); or the fluxes of
(meristem M) by the string of symbols on the right phloem and xylem within the tree (Lacointe and
hand-side, i.e., the organs constituting a phytomer, Minchin 2008). Recent simulation studies addressed
plus another meristem at the end of the string, thereby also aspects of agricultural pest management, the impact
ensuring that the rule can be applied over and over of biomechanics on tree architectural development, and
again (as long as certain conditions are fulfilled); the more refined models of light distribution in canopies,
square opening [ and closing ] brackets represent including the effects of light quality on growth regula-
the beginning and end of a branch. tion (Vos et al. 2010). FSPMs have been created for crop
(b) For the dynamics of growth or extension: plants such as the cereals maize, wheat, rice, and barley,
but also for trees (Vos et al. 2010). In most of these
conditions FSPMs, the structure was modeled dynamically, from
OðlÞ ) Oðl þ dlÞ an initial meristem, but there exist also static FSPMs in
which the structure is fixed (Vos et al. 2010).
where O is one of {I, N, L,. . .} (can also be a fruit or
flower), l is its dimension (e.g., length), and ) denotes Elements for Conception and Construction of a
a rule in which the dimension of an already Functional-Structural Plant Model
existing organ (resulting from the application of rule The establishment of an FSPM starts with the
(a)) is updated by adding a fraction dl to the dimension observation of plant morphology, i.e., topology
l (specified on the right-hand side of the rule). (arrangement of ▶ phytomers in shoots, with lateral
Expressed as a rate, dl/dt would be the derivative shoots up to a maximum branching order) and
of the function employed to describe the growth or geometry, including the biometry and morphology of
extension dynamics of the organ in question. the different plant organs at different positions within
F 780 Functional-Structural Plant Modeling

the plant. As with every model, it is wise to determine et al. 2008); in addition, some FSPMs have been
the boundaries, the scale, and the elements of devised using a standard programming language
the system to be modeled: Usually, the scope of (e.g., C, C++, Java, Simula). Platforms used for
an FSPM is not more than a dozen plant individuals; FSPM include L-Studio, GroIMP, and OpenAlea.
the scales are that of the (sub)organ, shoot, individual
and canopy level, and the elements considered are
the organs constituting the phytomer, plus collective Cross-References
entities (the “plant” as the sum of all shoots, the shoot
as the sum of its phytomers, etc.). Sometimes, other ▶ Biological System Model
elements characteristic of a crop production system are ▶ Calibration
simulated, such as a greenhouse with lamps or a patch ▶ Complex System
of soil. Key topological parameters for an FSPM are ▶ Data Integration
maximum branching order, maximum rank of ▶ Ecological Modeling
a phytomer within a shoot of a given branching order, ▶ Feedback Regulation
or rank of lowest flower. Using rank and order as ▶ Function, Biological
parameters for M in equation (a), those measured max- ▶ Functional
imum values are used to restrict the production of ▶ Graphical Model
phytomers by the meristems (see phytomer for ▶ Hierarchical Structure
a definition of rank and order). For each considered ▶ Interdisciplinarity
plant element, a number of (physical or) physiological ▶ Markov Chain
processes (growth, extension, respiration, ▶ photosyn- ▶ Markov Process
thesis, etc.) are defined. This will yield an object- ▶ Model Validation
oriented model prototype. ▶ Modularity
In order to parameterize the dimensions and ▶ Monte Carlo Simulation
orientations of organs (e.g., growth and extension models ▶ Ordinary Differential Equation (ODE)
(equation (b)), a database with lengths, diameters, areas, ▶ Paradigm
and angles (phyllotaxis, divergence) of representative ▶ Phenomenological vs Physiological Modeling
organs needs to be established. Statistical analysis ▶ Plant Systems Biology
(regression) ideally returns a relationship between ▶ Rule
an organ dimension and its topology (rank, order). The ▶ Rule-Based Methods
same relation can be established between a physiological ▶ Spatiotemporal Pattern Formation
function (e.g., leaf photosynthesis) and topology. ▶ Time-Course
Methods used to establish a biometric database are man- ▶ Topology and Toponomics
ual measurement using a ruler and caliper, digitizing
(using a stylus or laser scanning), and image analysis.
After calibration of the FSPM, it is tested using a number References
of scenarios and then validated in the usual way by
reparameterization with a calibration data set (measured Buck-Sorlin G, Kniemeyer O, Kurth W (2005) Barley morphol-
ogy, genetics and hormonal regulation of internode elonga-
at organ scale) and comparison with output variables, tion modelled by a relational growth grammar. New Phytol
usually measured at canopy or plant scale. 166:859–867
Guo Y, Ma Y, Zhan Z, Li B, Dingkuhn M, Luquet D, de Reffye P
FSPM Algorithms, Languages, and Software (2006) Parameter optimization and field validation of the
functional-structural model Greenlab for maize. Ann Bot
Languages and formalisms employed for FSPM are
97:217–230
usually following the rule-based or object-oriented Hemmerling R, Kniemeyer O, Lanwert D, Kurth W, Buck-Sorlin G
paradigm, often in combination with the procedural (2008) The rule-based language XL and the modelling envi-
paradigm. The most widespread rule-based formalisms ronment GroIMP illustrated with simulated tree competition.
Funct Plant Biol 35:739–750
are ▶ Lindenmayer systems and ▶ relational growth
Lacointe A, Minchin PEH (2008) Modelling phloem and xylem
grammars. Dedicated languages for FSPM are transport within a complex architecture. Funct Plant Biol
L + C (Prusinkiewicz et al. 2007) and XL (Hemmerling 35:772–780
Fuzzy Logic 781 F
Perttunen J, Nikinmaa E, Lechowicz MJ, Siev€anen R, Messier C A consequence of the properties of a functor
(2001) Application of the functional-structural tree model F : C ! D is that it can be applied to all parts of
LIGNUM to sugar maple saplings (Acer saccharum Marsh)
growing in forest gaps. Ann Bot 88:471–481 a diagram in category C, yielding a valid diagram in
Prusinkiewicz P, Karwowski R, Lane B (2007) The L + C plant category D. Furthermore, the resulting diagram will
modeling language. In: Vos J, Marcelis LFM, de Visser PHB, commute if the original commutes.
Struik PC, Evers JB (eds) Functional-structural plant model- Many mathematical constructions that relate enti-
ling in crop production. Springer, Dordrecht, pp 27–42
Siev€anen R, M€akel€a A, Nikinmaa E, Korpilahti E (1997) Special ties of different nature can be understood as functors
issue on functional-structural tree models, preface. Silva between suitable categories. Typical examples are
Fennica 31:237–238 (tensor) products or the mapping from Lie groups to
Vos J, Evers JB, Buck-Sorlin G, Andrieu B, Chelle M, de Visser Lie algebras. Functors also play a crucial role in uni-
PHB (2010) Functional–structural plant modelling: a new
versatile tool in crop science. J Exp Bot 61:2101–2115 versal algebra and coalgebra, where they specify
expression languages for the construction and decon-
F
struction of structured elements, respectively (Jacobs
and Rutten 1997).

Functor
References
Baltasar Trancón y Widemann
Ecological Modelling, University of Bayreuth, Adámek J, Herrlich H, Strecker GE (2004) Abstract and concrete
categories. Wiley, New York. http://katmat.math.uni-
Bayreuth, Germany
bremen.de/acc/acc.htm, online edition
Jacobs B, Rutten J (1997) A tutorial on (co)algebras and (co)
induction. EATCS Bull 62:222–259
Synonyms Lawvere W, Schanuel S (1997) Conceptual mathematics: a first
introduction to categories. Cambridge University Press,
Cambridge/New York
Category homomorphism; Category map

Definition
Fuzzy Logic
A functor is a mapping that respects the structure of
a category (Lawvere and Schanuel 1997; Adámek et Eyke H€ullermeier, Thomas Fober and
al. 2004). A functor F from category C to category D Marco Mernberger
consists of the following: Philipps-Universit€at Marburg, Marburg, Germany
• A mapping that associates with every object X 2 C
an object FðXÞ 2 D
• A mapping that associates with every arrow Synonyms
f 2 HomC ðX; YÞ an arrow Fðf Þ 2
HomD ðFðXÞ; FðYÞÞ Fuzzy set theory
Such that
• Identities are mapped to identities
Definition
FðidX Þ ¼ idFðXÞ (1)
The term “fuzzy logic” is used with different meanings
• Compositions are mapped to compositions in the literature. In a narrow sense, it refers to a branch
of mathematical logic, where it is studied as a special
Fðg  f Þ ¼ FðgÞ  Fðf Þ (2) type of multivalued logic, that is, a logic with more
than two truth degrees (Hajek 1998). In a wider (and
where the left composition is in C and the right com- more common) sense, fuzzy logic is used as an
position is in D. umbrella term for a collection of methods, tools, and
F 782 Fuzzy Logic

techniques for constructing intelligent systems that, by of real objects belonging to that concept) in the real
virtue of the very idea of partiality of truth, are capable word. For example, what is a “short DNA molecule”?
of handling, processing and exploiting uncertain, Biologists have a vague though sufficiently clear idea
imprecise, and incomplete information. These of this concept, without using a precise definition in
methods build on the key concept of a fuzzy set, as terms of an exact upper bound on the number of base
introduced by the founder of fuzzy logic, Lotfi A. pairs (bp) or the length in mm. Given such bounds,
Zadeh, in his seminal paper (Zadeh 1965). Fuzzy sets a proposition like “DNA molecule XYZ is small”
formalize the idea of graded class membership, would be either true or false; likewise, the truth
according to which an element can partially belong degree of a natural language proposition like “Einstein
to a set. In conjunction with generalized logical was born around noon” could be determined in
(set-theoretical) operators and derived notions like a unique way, given a precise definition of the
a fuzzy relation, the concept of a fuzzy set can be meaning of “around noon” in terms of a time interval
developed into a generalized set theory, which in turn (e.g., 12 o’clock 15 min) and the possibility to
provides the basis for generalizing theories in different pinpoint the exact time of a birth.
branches of (pure and applied) mathematics as well as Needless to say, this way of adapting human think-
fuzzy set-based approaches to intelligent systems ing and language to conventional logic and set theory
design, encompassing methods for information would be neither desirable nor useful. Instead, fuzzy
processing, decision making, optimization, and data logic offers an approximation in the other direction:
analysis. logic and set theory are generalized, so as so enable
a more faithful mathematical modeling of human con-
ception. The core idea in this regard is the notion of
Characteristics a fuzzy set, which allows for partial membership and
soft class boundaries (Pedrycz and Gomide 2007).
The notion of truth is commonly considered as
a bivalent concept: Logically, a proposition is either Fuzzy Sets
true or false, but nothing in-between. This conception, A fuzzy subset A of a reference set  is identified by
which pervades modern science and thinking, has a so-called membership function, often denoted mA ð Þ,
a long-standing tradition in Western philosophy, and which is a generalization of the characteristic function
manifests itself in standard mathematical theories, A ð Þ of an ordinary set A . For each element
notably logic and set theory. And admittedly, formal x 2 , this function specifies the degree of member-
systems based on bivalent logic (including theories of ship of x in the fuzzy set; it can be interpreted as the
uncertainty based on such systems, like probability truth degree of the proposition that x 2 A. Usually,
theory) have proved extremely useful in the scientific membership degrees mA ðxÞ are taken from the unit
terrain, where they paved the way for the amazing interval [0,1], i.e., a membership function is an
success of the exact and engineering sciences in the X ! ½0; 1 mapping. In principle, however, more gen-
last century. eral membership scales (such as ordinal scales or com-
In many other less exact fields of science, however, plete lattices) can be used. We denote by ðÞ the set
ranging from the biological and life sciences over legal of all fuzzy subsets of .
practice to the modeling of cognitive processes and Fuzzy sets are often used for discretizing numerical
human intelligence, the bivalence of truth can be called attributes in a “soft” manner, taking advantage of their
into question. In fact, it was already noticed by ability to model “non-sharp” boundaries between clas-
Bertrand Russell in 1923 that “All traditional logic ses. Thus, they serve as an interface between the orig-
habitually assumes that precise symbols are being inal numerical scale and a symbolic scale comprised of
employed. It is therefore not applicable to this terres- the (natural language) terms associated with the fuzzy
trial life, but only to an imagined celestial existence” sets. For example, in ▶ gene expression analysis, one
(Russell 1923). Roughly speaking, this is because of typically distinguishes between normally expressed,
the vagueness and ambivalence of the concepts dealt under-expressed, and over-expressed genes. This clas-
with in these fields: for the intension of these concepts, sification is made on the basis of the expression level of
there is rarely a precise extension (in the sense of a set the gene (a normalized numerical value), by using
Fuzzy Logic 783 F
Fuzzy Logic, Fig. 1 Fuzzy under normal over
partition of the gene
expression level with
a “smooth” transition between

membership
under-expression, normal
expression, and
over-expression; membership
functions are shown as red
lines
–2 0 2
expression level

F
corresponding thresholds. For example, a gene is often In a quite similar way, the Cartesian product of
called over-expressed if its expression level is at least fuzzy sets A 2 ðÞ and B 2 ðÞ can be defined:
twofold increased. Needless to say, a precise threshold mAB ðx; yÞ ¼ mA ðxÞ mB ðyÞ for all ðx; yÞ 2   .
of that kind is arbitrary to some extent and implies • The logical disjunction can be generalized
an unnatural sudden jump from completely over- analogously, namely by means of a t-conorm
expressed to not at all over-expressed. Figure 1 . If is a t-norm, then defined by
shows a fuzzy partition of the expression level with a b ¼ 1  ð1  aÞ ð1  bÞ is a t-conorm.
a “smooth” (and arguably less arbitrary) transition Well-known examples of t-conorms include the
between under-, normal, and over-expression, formal- maximum ða; bÞ 7! maxða; bÞ, the algebraic sum
ized in terms of fuzzy sets with trapezoidal member- ða; bÞ 7! a þ b  ab, and the Łukasiewicz
ship functions. For instance, according to this t-conorm ða; bÞ 7! minða þ b; 1Þ. A t-conorm can
formalization, a gene with an expression level of at be used for defining the union of fuzzy sets:
least 3 is definitely considered over-expressed, below 1 mA[B ðxÞ ¼ mA ðxÞ mB ðxÞ for all x 2 .
it is definitely not over-expressed, but in-between, it is • A generalized implication ⇝ is a
considered over-expressed to a certain degree. ½0; 1  ½0; 1 ! ½0; 1 mapping which is monotone
decreasing in the first and monotone increasing in
Generalized Logical Connectives the second argument, and which satisfies the bound-
To operate with fuzzy sets in a formal way, fuzzy set ary conditions a ⇝ 1 = 1, 0 ⇝ b = 1, 1 ⇝ b = b.
theory offers generalized set-theoretical resp. logical (Apart from that, additional properties are
connectives (like in the classical case, there is a close sometimes required.) Implication operators of that
correspondence between set theory and logic). Espe- kind, such as the Łukasiewicz implication ða; bÞ 7!
cially, important in this regard is a class of operators minð1  a þ b; 1Þ, are especially important in con-
called triangular norms or t-norms for short (Klement nection with the modeling of fuzzy rules.
et al. 2002).
• A t-norm is a ½0; 1  ½0; 1 ! ½0; 1 mapping The Extension Principle
which is associative, commutative, monotone Apart from basic logical connectives, fuzzy logic
increasing (in both arguments) and which satisfies offers a number of tools for generalizing and
the boundary conditions a 0 = 0 and a 1 = a for “fuzzifying” existing theories and methods. One of
all 0  a  1. Well-known examples of t-norms these tools is the so-called extension principle, which
include the minimum ða; bÞ 7! minða; bÞ, the prod- allows for extending a mapping f : n !  to a fuzzy
uct ða; bÞ 7! ab, and the Łukasiewicz t-norm mapping F : ðÞn ! ðÞ in a generic way.
ða; bÞ 7! maxða þ b  1; 0Þ. F accepts fuzzy subsets of  as input and, correspond-
A t-norm naturally qualifies as a generalized logical ingly, produces fuzzy subsets of  as output. More
conjunction. Moreover, it can be used to define the specifically, for fuzzy subsets A1 ; . . . ; An as input, the
intersection of fuzzy subsets A; B 2 ðÞ as fol- output B ¼ FðA1 ; . . . ; An Þ is a fuzzy subset of  with
lows: mA\B ðxÞ ¼ mA ðxÞ mB ðxÞ for all x 2 . membership function
F 784 Fuzzy Logic

 
mB ðyÞ ¼ sup minfmA1 ðx1 Þ; . . . ; mAn ðxn Þg
y¼f ðx1 ;...; xn Þ
mB ðyÞ ¼ sup min mA ðxÞ; min ðmAðiÞ ðxÞ⇝mBðiÞ ðyÞÞ ;
x21 ...m i¼1;...;n

The supremum operator in this expression is where mAðiÞ ðxÞ ¼ mAðiÞ ðx1 Þ mAðiÞ ðx2 Þ ... mAðiÞ ðxm Þ:
1 2 m
playing the role of a generalized existential quantifier. for a t-norm , and ⇝ is a fuzzy implication.
Thus, mB ðyÞ can be interpreted as the truth degree
of the following proposition: 9ðx1 ; . . . ; xn Þ 2 n : Applications
ð8i 2 f1; . . . ; ng : ðxi 2 Ai ÞÞ ^ ðy ¼ f ðxÞÞ. Or, stated Fuzzy methods have not only been developed
differently, y belongs to the fuzzy output F(A) insofar for ▶ knowledge representation and information
as there exist x1 ; . . . ; xn that belong, respectively, to processing, but also in many other branches of applied
A1 ; . . . ; An and are mapped to y. mathematics, including optimization, decision making,
statistics, and data analysis. Moreover, a plethora of
Fuzzy Inference concrete applications have been developed in different
Fuzzy sets can be used to formalize vague and fields, ranging from fuzzy control systems over flexible
imprecise knowledge. For example, a basic propo- querying in databases to medical expert systems.
sition of the form “X is A”, where X is a variable In bioinformatics and systems biology, fuzzy
with domain  and A a fuzzy subset of , can be methods have been used with the aim to capture the
understood as a flexible ▶ constraint on the value of intrinsic fuzziness of terms, concepts, and relation-
X: those values x 2  with mA ðxÞ ¼ 0 are excluded, ships in the biological sciences, and advantages of
while all other values x are considered as possible to fuzzy sets and fuzzy logic for analyzing biological
some degree, namely to the degree mA ðxÞ; in partic- data and modeling biological systems are becoming
ular, those x with mA ðxÞ ¼ 1 are declared as fully more and more recognized. As a concrete example, we
plausible. mention the use of fuzzy ▶ clustering as a data analysis
In principle, fuzzy ▶ inference can then be real- tool for inferring homogeneous groups of related bio-
ized by combining and propagating constraints of logical entities (like genes) in a data-driven way; oblig-
that type, using suitable logical operators as well as ing to biological reality, these groups may have soft
projection and extension operators for fuzzy rela- boundaries and are allowed to overlap (Dembélé and
tions. Fuzzy rule-based ▶ inference can be seen as Kastner 2003; M€oller-Leveta et al. 2005). Apart from
an important special case. Here, the idea is to this, fuzzy methods have been applied to many other
express knowledge about the (functional) depen- problems in this field; for an overview of recent
dency between attributes in the form of a set of advances, see (Xu et al. 2008; Jin and Wang 2009).
IF–THEN rules:

ðiÞ ðiÞ
Ri : IFðX1 is A1 Þ AND ðX1 is A2 Þ AND . . . AND Cross-References
ðXm is AðiÞ
m Þ THEN ðY is B Þ ðiÞ
▶ Clustering
▶ Constraint
As an example, consider a rule like “If gene X is ▶ Gene Expression
over-expressed and gene Y is under-expressed, ▶ Inference
then gene Z is over-expressed”. A rule of that ▶ Knowledge Representation
kind can be seen as a soft ▶ constraint that partly
excludes some value combinations
ðx1 ; x2 ; . . . ; xm ; yÞ 2 1  2  . . .  m  . Given
a fuzzy specification of the input attributes Xj in References
terms of fuzzy subsets Aj ð j ¼ 1; . . . ;mÞ, the output
Dembélé D, Kastner P (2003) Fuzzy C-means method for clus-
produced by the fuzzy rule system consisting of n
tering microarray data. Bioinformatics 19(8):973–980
rules Ri ði ¼ 1; . . . ; nÞ is a fuzzy subset B of  such Hajek P (1998) Metamathematics of fuzzy logic. Kluwer,
that Dordrecht
Fuzzy Set Theory 785 F
Jin Y, Wang L (eds) (2009) Fuzzy systems in bioinformatics and Xu D, Keller JM, Popescu M (2008) Applications of fuzzy logic
computational biology. Springer, Berlin in bioinformatics. World Scientific, Singapore
Klement EP, Mesiar R, Pap E (2002) Triangular norms. Kluwer, Zadeh LA (1965) Fuzzy sets. Inform Control 8(3):338–353
Dordrecht
M€oller-Leveta CS, Klawonn F, Choc KH, Yina H,
Wolkenhauer O (2005) Clustering of unevenly sampled
gene expression time-series data. Fuzzy Set Syst
152(1):49–66
Pedrycz W, Gomide F (2007) Fuzzy systems engineering:
Fuzzy Set Theory
toward human-centric computing. Wiley, New York
Russell B (1923) Vagueness. Aust J Psych Phil 1:84–92 ▶ Fuzzy Logic

F
G

G Domain Gcn2-dependent General Translational


Control
▶ Groove (G) Domain
▶ Translational Control of GCN4

G Type Domain

▶ Groove (G) Domain


Gene and Allele Names

▶ Gene and Allele Nomenclature

G1 Checkpoint

▶ Cell Cycle Transition, Detailed Regulation of Gene and Allele Nomenclature


Restriction Point
Marie-Paule Lefranc
Laboratoire d’ImmunoGénétique Moléculaire,
Institut de Génétique Humaine, UPR 1142, Université
G1 Phase Checkpoint Montpellier 2, Montpellier, France

▶ Cell Cycle Transition, Principles of Restriction


Point Synonyms

Gene and allele names; Gene and allele symbols

G2/M Transition
Definition
▶ Cell Cycle Transitions, G2/M
Gene and allele nomenclature for immunoglobulins
(IG) or antibodies and T cell receptors (TR) has been
set up by IMGT®, the international ImMunoGeneTics
GCN Control information system® (http://www.imgt.org) (▶ IMGT®
Information System) (Lefranc and Lefranc 2001a, b).
▶ Translational Control of GCN4 The gene and allele nomenclature is based on

W. Dubitzky et al. (eds.), Encyclopedia of Systems Biology, DOI 10.1007/978-1-4419-9863-7,


# Springer Science+Business Media LLC 2013
G 788 Gene and Allele Symbols

the concepts of classification (generated from Cross-References


the ▶ CLASSIFICATION Axiom) of ▶ IMGT-
ONTOLOGY, the global reference in ▶ immunogenetics ▶ Chain Type
and ▶ immunoinformatics. ▶ Configuration Type
The four major concepts of classification, Group, ▶ FunctionalityType
Subgroup, Gene, and Allele, have allowed the gene and ▶ Gene Type
allele nomenclature for the V, D, J, and C gene type ▶ IMGT-ONTOLOGY
(▶ Gene Type) of the IG and TR whatever the receptor ▶ IMGT-ONTOLOGY, CLASSIFICATION Axiom
type, the chain type (▶ Chain Type), and the species ▶ IMGT-ONTOLOGY, Leafconcept
from fish to human (▶ TaxonRank). ▶ IMGT ® Information System
The Group concept allows to classify a set of genes ▶ Immunogenetics
which belong to the same multigene family, within the ▶ Immunoinformatics
same species or between different species. For the IG ▶ Location Type
and TR, “Group” allows to classify a set of genes ▶ Pseudogene
which belong to the same ▶ GeneType (V, D, J or C). ▶ Structure Type
The Subgroup concept allows to classify a subset of ▶ TaxonRank
genes which belong to the same group, and which, in
a given species, share at least 75% of identity at the
nucleotide sequence level (and in the germline config- References
uration (▶ Configuration Type) for the IG and TR V,
D, and J genes). Giudicelli V, Chaume D, Lefranc M-P (2005) IMGT/GENE-DB:
a comprehensive database for human and mouse immunoglobu-
The Gene concept allows to classify, in the
lin and T cell receptor genes. Nucleic Acids Res 33:D256–D261
▶ IMGT® Information System, a unit of DNA sequence Lefranc M-P, Lefranc G (2001a) The immunoglobulin
that can be potentially transcribed and/or translated (this FactsBook. Academic Press, London, pp 1–458
definition includes the regulatory elements in 50 and 30 , Lefranc M-P, Lefranc G (2001b) The T cell receptor FactsBook.
Academic, London, pp 1–398
and the introns, if present). The leafconcepts (▶ IMGT-
ONTOLOGY, Leafconcept) of “Gene” are gene names
(Lefranc and Lefranc 2001a, b). In IMGT-ONTOL-
OGY, a gene name is composed of the name of the Gene and Allele Symbols
species (leafconcept of the ▶ TaxonRank “Species”)
and of the international Human Genome Organisation ▶ Gene and Allele Nomenclature
(HUGO) Nomenclature Committee (HGNC)/IMGT
gene symbol, for example, Homo sapiens IGHV1-
2. By extension, orphon (▶ Location Type) and
▶ pseudogene (▶ FunctionalityType) gene names are Gene Association Analysis,
also leafconcepts of “Gene.” Frequent-Pattern Mining
The Allele concept allows to classify a polymorphic
variant of a gene. The leafconcepts of “Allele” are Jesús Aguilar-Ruiz1, Domingo Rodrı́guez -Baena1 and
allele names. Alleles identified by the mutations of Ronnie Alves2
1
the nucleotide sequence are classified by reference to School of Engineering, Pablo de Olavide University,
allele *01. For immunoglobulin (IG) and T cell recep- Escuela Politécnica Superior, Seville, Spain
2
tor (TR) genes, full description of mutations and allele Instituto de Informática, Universidade Federal do Rio
name designations are recorded for the core sequences Grande do Sul, Porto Alegre, Brasil
(V-REGION, D-REGION, J-REGION, C-REGION).
They are reported in Alignment tables, in IMGT Rep-
ertoire http://www.imgt.org and in IMGT/GENE-DB Definition
(Giudicelli et al. 2005) of IMGT ®, the international
ImMunoGeneTics information system® (▶ IMGT ® In ▶ transcriptomics, Gene association analysis helps
Information System). to infer ▶ gene expression profiles from ▶ DNA
Gene Association and Linkage Analysis 789 G
microarray studies by enumerating and evaluating all statistically, overrepresentations of gene association
possible gene association patterns. patterns with gene annotations. Text mining can be
Gene association patterns are essentially ▶ asso- also applied for searching genes being already
ciation rules induced by sample frequency of a gene published in the medical literature. In fact, it is also
in an observed ▶ DNA microarray study. For exam- possible to include biological background in early
ple, an association rule between genes in the form stages of the Gene Association Analysis, being espe-
R1: geneA ! geneB, and geneC could be an indic- cially important in integrative genomics. In this case,
ative that when geneA is ‘over-expressed’, it is association patterns will present a global appreciation
likely to observe an ‘over-expression’ in geneB rather than a local perspective of a microarray study.
and geneC. For instance, a rule like Ribosome ! []T6, []T7
The starting point in the Gene Association Anal- combines information about metabolic pathways,
ysis is a N  M matrix of gene expression values, expression, and temporal data, meaning that genes
where the rows correspond to experimental condi- involved in Ribosome pathway are under-expressed
tions and columns represent genes. Most of the in that respective time points. G
techniques for extraction of gene association pat-
terns require that the matrix passes through
a discretization process. So, the input matrix is References
preprocessed in order to transform their values in
binary values. Alves R, Rodriguez-Baena D, Aguilar-Ruiz J (2010) Gene
association analysis: a survey of frequent pattern mining
Depending on the application and on the type of
from gene expression data. Brief Bioinform 11(2):
information to be extracted, values “0” and “1” will 210–224
have a specific meaning. As an example, consider that Han J, Cheng H, Xin D, Yan X (2007) Frequent pattern mining:
the microarray will be mapped to a binary gene expres- current status and future directions. Data Min Knowl Discov
15:55–86
sion matrix in such way that a gene tagged with “1” in
a particular condition stands for an “Over-expression”
and “0” stands for an “Under-expression.” Then, all
significant genes are selected and must pass through an
enumeration process. Again, genes considered as “sig- Gene Association and Linkage Analysis
nificants” depend on the specific experiment. In the
former matrix example, significant genes could be the Roger Higdon
“Over-expressed” ones. Finally, frequency is observed Seattle Children’s Research Institute, Seattle,
to detect remarkable frequent patterns. WA, USA
▶ Frequent pattern mining is the most costly task in
gene association analysis (Alves et al. 2010). Given the
high dimensionality of the gene expression matrices, Synonyms
classical techniques, like the ones based on Apriori
(see the review paper of Han et al. 2007 for further Gene mapping; Genetic association; Genetic
information), might not be suitable to high- epidemiology; Genome-wide association; Linkage
dimensional data analysis. New frequent pattern disequilibrium
mining techniques properly devised for gene associa-
tion analysis have been developed to face this new
challenge. Definition
Once all the frequent gene groups have been
obtained, they are combined in order to generate asso- Gene association is the association between a genetic
ciation rules. variation (genotype, haplotype, or single nucleotide
The significant gene association rules discovered polymorphism (SNP)) and a physical trait (phenotype),
(with remarkable frequency) are evaluated to check typically the presence or absence of a disease. Linkage
their biological relevance (Alves et al. 2010). To do analysis is the study of gene association due to their
so, biological databases could be used for checking, proximity on the same chromosome.
G 790 Gene Association and Linkage Analysis

Characteristics alleles to the number expected based upon Mendelian


inheritance using a chi-squared test. Another family-
Genetic linkage is the tendency of gene loci or alleles to based test is the transmission disequilibrium test (TDT)
be inherited together due to their physical proximity on (Spielman and Ewens 1996). The TDT utilizes the geno-
the same chromosome. This proximity causes them to types or haplotypes of parents and a disease-affected
stay together during meiosis, and they are therefore child. In the TDT the number of alleles transmitted
genetically linked. Genetic association tests are from parents to the child is compared to the number not
used to find genetic linkage between a genetic trait or transmitted. Patterns of nonrandom transmission indicate
polymorphism and a physical trait or phenotype such as association of genetic marker with a disease. The simplest
a disease (de Bakker et al. 2005). A similar term to TDT is based on McNemar’s test for matched pairs.
genetic linkage with a different meaning is linkage dis- For genome-wide association studies, a disease is
equilibrium, a term used in the study of population compared to a large number of known genetic markers,
genetics for the nonrandom association of alleles at two typically SNPs using case-control studies or family-
or more loci, not necessarily on the same chromosome based tests. Genome-wide association studies identify
(Terwilliger and Ott 1994). Genetic association studies the SNPs most closely associated with a disease enabling
are truly testing for linkage disequilibrium which may or researchers to identify the most likely location(s) of
may not be due to actual linkage on a chromosome. disease-related genes. The log-odds (LOD) score is the
A number of different types of studies are used to test most common metric for ranking association; it is simply
for genetic association or linkage. These tests may be the value of the log-likelihood test for genetic associa-
used for testing the association of a disease versus tion. LOD scores are based upon parametric tests typi-
a single genotype, but more commonly they are used cally used in case-control studies or family-based tests;
to test against a large number of SNPs in order to however, there are also nonparametric-based alterna-
determine the location of genes associated with tives. Tests for association can be carried out using single
a particular disease. The studies are known as genome- point linkage, which considers the association disease
wide association studies (Manolio et al. 2010). The most and genetic markers on an individual basis, or by using
commonly used tests are case-control studies which multipoint linkage, which utilize multivariate measures
compare the distribution of genetic polymorphism to calculate the association between disease and a set of
between a sample of disease case and control subjects. genetic markers. Multipoint linkage tests can increase
A simple chi-squared or ▶ Fisher’s exact test can be used statistical power but are much more complicated and
to test for association. If cases and controls are not well numerically intensive to carry out.
matched for ethnicity or geographic origin, then genetic Gene association and linkage studies can also be
associations may be confounded with the effect of based on quantitative outcome rather than just dichot-
population stratification. Population stratification occurs omous outcomes such as disease state. These are
where subpopulations have different frequencies of known as quantitative trait loci (QTL) models and
genetic traits due to common ancestry. studies (Sen and Churchill 2001).
Family-based tests can have much more statistical
power than case-control studies because they utilize the
genetic relationships between family members. How- Cross-References
ever, these studies are much more difficult to conduct
since they can be difficult for genetic data among family ▶ Fisher’s Test
members. The earliest forms of family-based tests are
based on pairs of siblings affected with the same disease;
this is known as the affected sib-pair (ASP) test. If References
a genetic marker is not linked to the disease, then trans-
mission of alleles should be governed by random Men- de Bakker PI, Yelensky R, Pe’er I, Gabriel SB, Daly MJ,
delian inheritance, but if the disease gene and marker Altshuler D (2005) Efficiency and power in genetic associa-
tion studies. Nat Genet 37(11):1217–1223
gene are linked, then the siblings should share alleles
Manolio TA, Guttmacher AE, Manolio TA (2010) Genomewide
more often than by chance. The simplest ASP tests are association studies and assessment of the risk of disease.
based on a comparison of the number of observed shared N Engl J Med 363(2):166–176
Gene Expression Biomarkers 791 G
Sen S, Churchill GA (2001) A statistical framework for quanti- represent a combination of several gene products,
tative trait mapping. Genetics 159:371–387 a gene expression signature.
Spielman RS, Ewens WJ (1996) The TDT and other family-
based tests for linkage disequilibrium and association. Am
J Hum Genet 59(5):983–989, PMC 1914831
Terwilliger JD, Ott J (1994) Handbook of human genetic link- Characteristics
age. Johns Hopkins Press, Baltimore
In the past, most of the studies exploring the
transcriptome (collection of all RNA molecules in
Gene Expression a cell or tissue) were done with methods targeting single
or few molecules, like Northern blotting. The last 20
Yufei Huang years has witnessed a great development in technologies
Picower Institute for Learning and Memory, that allow a large-scale screening of genes expressed in
Massachusetts Institute of Technology, Cambridge, a given cell or tissue. In this respect, the technologies that
MA, USA have contributed most for this exploratory work are G
Greehey Children’s Cancer Research Institute, Expressed Sequence Tags (ESTs), ▶ DNA microarray,
University of Texas at Texas Health Science Center at and Serial Analysis of Gene Expression. Although
San Antonio, San Antonio, TX, USA important, these technologies have all significant limita-
Department of Epidemiology and Biostatistics, tions specially related to sensitivity. It is believed that all
University of Texas at Texas Health Science Center at these technologies cover just a fraction of the whole
San Antonio, San Antonio, TX, USA transcriptome of a given sample, missing for example
most of the low abundant RNA messages.
In spite of these limitations, these technologies
Definition have made important contributions to the process of
▶ biomarkers discovery, especially for gene expres-
Gene expression is the process by which a functional sion biomarkers. A gene expression biomarker can be
product, such as protein, is synthesized according to either a product from a single gene or a signature
genetic information. It usually consists of the following comprised of many gene products. Examples of the
stages: transcription, post-transcription, translation, and first type of biomarker include the transmenbrane pro-
post-translation. tein HER2, a breast cancer biomarker that presents
a direct correlation to prognostic and if a given tumor
will be sensitive to antibody therapy with trastuzumab.
Cross-References The most known example of the second type of gene
expression biomarker is MammaPrintTM, a 70-gene
▶ Gene Regulation breast cancer signature that assess the recurrence
potential of certain types of breast tumors.
Although a gene expression biomarker can be used
Gene Expression Biomarkers for the classification of any cell or tissue state, most of
the efforts for the identification of such biomarkers have
Beatriz Stransky1 and Sandro J. de Souza2 been devoted to pathological states, especially cancer.
1
Universidade Federal do ABC, Santo Andre, Brazil Gene expression biomarkers have clinical relevance
2
Ludwig Institute for Cancer Research, Sao Paolo, (▶ Biomarkers, Clinical Relevance) since they can be
Brazil used to determine the presence of the disease, its stage,
and to analyze and monitor the responses to treatments.
They can also be classified as diagnostic biomarkers or
Definition predictive biomarkers (Blomme and Warder 2008).

Gene expression biomarkers are biological markers Discovery Process


identified through large-scale gene expression profil- The standard pathway for gene expression biomarker
ing. They may consist of a single gene product or discovery and development can be described by four
G 792 Gene Expression Biomarkers, Ranking

steps: (1) the establishment of sets of high-quality of data are protein-protein interaction networks
samples; (2) the use of a large-scale gene expression (interactome) and gene regulatory networks. The use
platform; (3) the identification of a gene expression of these networks allows the use of graph theory to
biomarker through mathematical and computational explore other quantitative parameters of the data.
strategies; and (4) validation of the discriminatory The dissection of a gene expression profiling into
power of the gene expression biomarker in an indepen- pathways and functional modules (through a system biol-
dent set of samples. ogy approach) will serve to improve the identification of
Initially, the discovery process for gene expression new gene expression biomarkers as well as to increase
biomarkers involves the purification of RNA for the opportunities for therapeutic intervention in case of
a given sample and the use of one of the above plat- diseases. Even if a drug target is not directly differen-
forms for gene expression profiling. The process is tially expressed in a disease state, the corresponding
only effective if a large number of samples are used drug may still be useful if the pathway that the target
to account for biological variability. ▶ Data mining belongs is seen differentially expressed in that disease.
strategies are then used to identify putative gene
expression biomarkers. These biomarkers, either
a single product or a signature, need to be further Cross-References
validated in a larger and independent panel of samples.
In the last few years, the field of gene expression ▶ Biomarker Discovery, Knowledge Base
biomarkers has been revolutionized by the development ▶ Biomarkers
of next-generation sequencing. These new sequencing ▶ Biomarkers, Clinical Relevance
technologies have allowed an exhaustive analysis of the ▶ Biomarkers, Protein Expression
transcriptome, covering with high sensitivity all RNA ▶ Data Mining
molecules in a sample. Moreover, they have allowed the ▶ DNA Microarrays
identification of a large collection of transcripts variants ▶ Gene Expression
generated through processes like alternative splicing and ▶ Gene Ontology
alternative polyadenylation (Caballero et al. 2001).
The last decade has also witnessed the emergence
of non-coding RNAs, especially ▶ microRNAs, as References
critical regulatory molecules in many normal and path-
ological conditions. Not surprisingly, micro-RNAs Blomme EAG, Warder SE (2008) Gene expression-based bio-
markers of drug safety. In: Wang F (ed) Biomarker methods
have been characterized as gene expression bio-
in drug discovery and development. Humana Press, Totowa
markers in many biological conditions, especially can- Caballero OL, de Souza SJ, Brentani RR, Simpson AJG
cer (Jeffrey 2008). (2001) Alternative spliced transcripts as cancer markers.
Dis Markers 17:67–75
Jeffrey SS (2008) Cancer biomarker profilling with microRNAs.
Gene Expression Biomarkers and Systems Biology
Nat Biotechnol 26:400–401
Although these large-scale gene expression profiling
technologies have generated datasets that are per se
very informative, their integrated use has proven to be
the most effective way to extract more valuable infor- Gene Expression Biomarkers, Ranking
mation. In that aspect, platforms based on systems
biology approaches have been very useful in serving Ronnie Alves
as a scaffold for integration of gene expression data. Instituto de Informática, Universidade Federal do Rio
A critical issue when integrating gene expression Grande do Sul, Porto Alegre, Brazil
data into a system biology approach is the ▶ ontology
of genes and gene products. To be effective, this inte-
gration has to obey certain classification rules that will Synonyms
allow the identification of pathways and functional
modules as gene expression biomarkers. The most Differential expression analysis; Gene ranking; Order
used networks to serve as a scaffold for the integration statistics; Top-K lists
Gene Expression Biomarkers, Ranking 793 G
Definition situation is gene expression measurements of one cell
line before and after chemical treatment. Furthermore,
Gene expression biomarkers by ranking refer to the the availability of replicates enables to rank genes
task of selection and ordering of the most significant according to their associated t-statistic for each gene:
genes from transcriptomic studies, where, usually,
p
a control versus target experimental setting is properly t ¼ m=ðst= nÞ (1)
designed for the evaluation of differentially expressed
genes (DEG) associated to a particular tissue or cell in where m is the difference of means across replicates; st,
the related study. Selection is evaluated through the within groups standard deviation; and n, the
a differentiation score (Statistics), and then, genes are number of genes considered for testing. F-scores are
ranked in ascending order having high-scored genes the straightforward generalization of t-scores in the
(Gene Markers) in the top of the gene list. multiconditional case (Ewens and Grant 2004). Prob-
lems arise when genes with small intensity differences
present almost no changes between groups. This might G
Characteristics yield high t-scores and thus, these genes will pop up in
the top of the list. To overcome this situation one could
Ranking Gene Expression Differences artificially enlarge these variances by employing
Differential expression analysis is the traditional strat- different penalizing factors in the t-statistic test. In
egy for ranking genes (Biomarkers, Gene Markers) (Lonnstedt and Speed 2002) a parametric empirical
from gene expression data. The task of finding DEG Bayes approach being equivalent to a penalized
falls into the following steps (Steinhoff and Vingron t-statistic is introduced:
2006): p   
• Ranking: genes are ranked according to their evi- t¼m f þ std2 n (2)
dence of differential expression.
• Assigning significance: a statistical significance is where f is the penalty value which is estimated from the
being assigned to each gene. mean and standard deviation of the variance across
• Cut-off value: to arrive at a limited number of DEG samples. The approach entitled “significance analysis
a cut-off value for the statistical significance needs of microarrays,” or just SAM (Tusher et al. 2001), is
to be determined. one of the most used penalized t-statistic. Another
The simplest experimental setting is the comparison similar strategy based on percentiles is proposed in
of two experimental groups CA and CB and asking for (Efron et al. 2001), although in this case it suggested
their differences in each gene. Basically, one can use the application of an additive penalizing factor in the
the empirical intensity values of each series CA and CB denominator of the t-statistic that it is the 90th percen-
and introduce an ordered list of ranked differences tile of the standard deviation across samples. If the
between them. Typically, in this “simple” approach penalization factor is zero the method is solely based
a fixed cut-off is chosen, usually this is a fold change on an ordinary t-statistic test.
of two. Thus, all genes showing a fold change of more A number of linear methods have also been pro-
than two are considered to be differential. It has been posed for ranking gene expression. In the ANOVA
shown in several studies that such approach is able to (Regression, Statistics) model it is assumed a linear
provide a decent DEG list with high reproducibility, model of specific effects (like dye, slide, treatment,
but it has also been reported having low statistical gene effects and their associated interactions) for log
power and high false discovery rate (FDR). intensities of all genes. A moderated t-statistic is
Availability of repetitions provides for a richer spec- suggested in (Smyth 2004) which is proportional to
trum of applicable statistical procedures. Either one com- the t-statistic with sample variance offset. It is also
pares two groups or multiple groups. In the two-group possible to design a robust linear model for each single
comparison, one considers either a paired or unpaired gene, estimating contrasts of all pairwise comparisons
situation. Comparing a healthy group with a diseased one of tested groups. Rather than applying t-statistics
is an example for an unpaired experiment because the approaches for ranking gene expression differences,
samples are independent. An example for a paired one could take advantages of nonparametric tests,
G 794 Gene Expression Biomarkers, Ranking

usually based on a Wilcoxon rank sum test or permu- Gene Expression Biomarkers, Ranking, Table 1 Possible
tation t-test. Rank-based strategies use rank scale outcomes from m hypothesis tests based on a significance
threshold t 2 (0, 1] to their associated P-values
information instead of the numerical ones for differen-
tiation expression analysis and it can be applied with- Not significant Significant
(P-value > t) (p-value  t) Total
out any assumptions regarding data distribution.
Null true U V m0
Alternative T S m1
Significance of the Ranked Gene Lists true
Once gene rankings are found, following statistical W R m
tests, the next step is checking its significance. Usually,
researchers use a P-value cut-off of 0.05 and genes
presenting a lower P-value are those showing signifi- As demonstrated in (Benjamini and Hochberg 1995),
cance. On the other hand, in order to obtain such the FDR offers a less strict multiple test criterion than
p-value multiples tests are conducted, and, in the end, the FWER, being more appropriated for differential
it can increase the false discovery rate. To overcome expression analysis.
this situation one alternative is finding a criterion to
limit the number of testing procedures. Thus, one can Consensus Gene-Ranking
either remove from the test genes which are not Several statistical tests have been proposed in the
expected to be relevant or ignoring those genes literature making the selection of one unique ranking
presenting quite low expression variation across all method a hard task. Indeed, there is no consensus with
experimental conditions. Therefore, when using mul- respect to one universal (unique) rank test (r). One
tiple tests one must evaluate the error rate associated suggested alternative is to evaluate empirically the
when assessing the statistical significance. Given ranked list while combining several tests before electing
a type I error rate (false positive or false discovery the final gene list, rather than pushing optimizations into
rate) controlling for multiple testing means correcting one particular test trying to find an expected gene list.
P-values such that the given error rate can be The common criterion is evaluating the level of
guaranteed for all tests. Methods can be divided into consensus among the top-100 ranked genes by
those that control the family wise error rate (FWER) or intersecting the top-k lists:
the false discovery rate (FDR). The probability of at
least one type I error within the significant genes is X
p
called FWER. The FDR is the expected proportion of sðr; r 0 ; kÞ ¼ Iðrj  k ^ r 0j  kÞ (5)
type I errors within the rejected hypotheses. Table 1 j¼1

describes the various outcomes when applying multi-


ple tests to determine which of the m hypothesis tests where I denotes the indicator function [I(A) ¼ 1 if A is
are statistically significant. Specifically, V is the num- true, I(A) ¼ 0 otherwise], or the proportion s(r, r0 , k)/k
ber of type I errors and R is the total number of of genes in the top-k list from l that, are also in the top-k
significant null hypotheses (total discoveries). The list from l0 , also denoted as percentage of overlap or
FWER is defined to be percentage of overlapping genes (POG). An overview
regarding stability measures for consensus gene-
FWER ¼ PrðV  1Þ (3) ranking is provided in (Boulesteix and Slawski 2009).
Consensus can also be achieved by measuring the
and the FDR is usually defined to be (Benjamini and levels of convergence and divergence while evaluating
Hochberg 1995) several dissimilarity matrices (rdist) based on finding
gene rankings. The motivation is to use a “voting” strat-

   egy in such way that gene rankings that are identified as
V V
FDR ¼ E ¼ E ; R > 0 PrðR > 0Þ (4) closer to each other could be an indication of agreement
R_1 R
among different tests. Thus, like employing ensemble-
learning (Baldi and Brunak 2001) strategies in supervised
The effect of “R v 1” in the denominator (Eq. 4) of problems, one could use an ensemble clustering solution
the first expectation is to set V/R ¼ 0 when R ¼ 0. by combining several candidate-ranking solutions for
Gene Network 795 G
devising a unified gene ranking. Once all distance matri- References
ces are calculated for all gene rankings, a simple consen-
sus function could be applied: Baldi P, Brunak S (2001) Bioinformatics: the machine learning
approach (adaptive computation and machine learning),
2nd edn. MIT, Cambridge
0
Consensusdistði;jÞ ¼ minðrdistði;jÞ ; rdistði;jÞ Þ (6) Benjamini Y, Hochberg Y (1995) Controlling the false discovery
rate: a practical and powerful approach to multiple testing.
J R Stat Soc B Methodol 57(1):289–300
Boulesteix ALL, Slawski M (2009) Stability and aggregation of
Gene rankings can be grouped according to ranked gene lists. Brief Bioinform 10(5):556–568
a consensus function, and a hierarchical clustering Efron B, Tibshirani R, Storey JD (2001) Empirical Bayes analysis of
approach could be used to define the meta-rankings a microarray experiment. J Am Stat Assoc 96(456):1151–1160
(Gene Modules). Given that more than two gene Ewens WJ, Grant GR (2004) Statistical methods in bioinformat-
ics: an introduction (statistics for biology and health),
rankings could be used in the unified model,
2nd edn. Springer, New York
quantiles could be used as they provide more robust Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M,
statistics. Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, G
Bloomfield CD, Lander ES (1999) Molecular classification
of cancer: class discovery and class prediction by gene
Gene Expression Biomarkers Methods and
expression monitoring. Science 286(5439):531–537
Applications Lonnstedt I, Speed TP (2002) Replicated microarray data. Stat
Most biomarkers have been discovered by molecular Sin 12:31–46
profiling studies, based on association or correlation. Smyth GK (2004) Linear models and empirical Bayes methods
for assessing differential expression in microarray experi-
One of the first molecular profiling studies was
ments. Stat Appl Genet Mol Biol 3(1):1–25
reported in (Golub et al. 1999), who showed that Steinhoff C, Vingron M (2006) Normalization and quantification
gene expression patterns could classify tumors, of differential expression in gene expression microarrays.
thereby remarking new insights like the stage, Brief Bioinform 7(2):166–177
Tusher V, Tibshirani R, Chu G (2001) Significance analysis of
grade, clinical course, and response to treatment
microarrays applied to the ionizing radiation response. PNAS
of a tumor. Recent work in several groups has 98:5116–5124
indentified unique gene expression patterns, being
strong correlated with clinical outcomes. Candidate
biomarkers solely based on gene expression data
depend on both available samples and on the rank- Gene Expression Regulation
ing algorithm, increasing the amount of data (meta-
analysis) can improve the reproducibility of the ▶ Regulation and Autoregulation
resulting gene-ranking models.
A large variety of ranking algorithms are available as
workflow applications such as GSEA, GeneTrailExpress,
Gene Mapping
g:Profiler, and Taverna; or web-based bioinfomatics
resources as omniBioMarker and Oncominer; or
▶ Gene Association and Linkage Analysis
customized software packages as the widely used
R Bioconductor.

Gene Modulation
Cross-References
▶ Gene Regulation
▶ Biomarkers, Ranking
▶ Gene Expression Biomarkers
▶ Gene Expression Biomarkers, Ranking
▶ Gene Regulation Gene Network
▶ Modular Organization of Gene Regulatory
Networks ▶ Biological Disease Mechanism Networks
▶ Regression Analysis ▶ Gene Regulatory Networks
G 796 Gene Normalization with GNAT

automaton, where each accepting state stores the data-


Gene Normalization with GNAT base identifiers for all names that end at this state.
Besides the dictionary, GNAT generates a profile for
Conrad Plake each gene comprising species information, known
Biotechnology Center (BIOTEC), Technische annotations provided by the ▶ gene ontology, associ-
Universit€at Dresden, Dresden, Germany ated diseases, sequence annotations, and other textual
descriptions separated by type.

Definition Gene Mention Recognition


GNAT parses a text at the character level using the
Gene normalization in literature (▶ Entity Mention Nor- dictionary automaton, starting from the beginning of
malization) refers to the process of assigning a gene every word. If an accepting state is reached with the
name occurring in text its corresponding entry in last character of a word, the text passage consumed so
a gene database. The ▶ text mining system GNAT has far is stored as a gene mention together with all poten-
been developed to accomplish this task (Hakenberg et al. tial gene identifiers. Shorter mentions contained within
2008). longer ones are ignored. Next, each mention and its
surrounding words are compared against predefined
word lists and regular expressions to identify and dis-
Characteristics card mentions that resemble a gene name but in the
current context refer to something else (e.g., cell line,
Gene normalization is required for data integration pur- disease, protein domain).
poses, such as the extraction of gene annotation from
scientific publications to complement data stored in gene Gene Mention Normalization
databases. Due to ambiguity of many gene names, which After recognition of gene mentions, the task of nor-
often refer to orthologous or entirely different genes, malization is to identify each mention by assigning the
named after phenotypes and other biomedical terms, or database identifier of the referenced gene. If a text also
resemble common English words, developing automated mentions one or more species, GNAT only allows for
methods to gene normalization is challenging (Fundel genes from these species, ignoring all others. Species
and Zimmer 2006). The process of gene normalization recognition is carried out by ▶ AliBaba, but can,
carried out by text mining systems such as GNAT typi- for example, also be performed using LINNAEUS
cally involves multiple steps starting from building (▶ Named Entity Recognition and Normalization of
a gene name dictionary (data acquisition), finding gene Species, LINNAEUS). If a gene mention is left with
mentions in text, and assigning potential gene identifiers a single gene identifier assigned, this identifier is taken
(gene mention recognition), to finally deciding on the as reference to the Entrez Gene database. In cases
correct identifier for each gene mention as reference to where GNAT encounters an ambiguous gene mention,
a gene database (gene mention normalization). that is, a mention with more than one gene identifier
assigned, it compares the text at hand against each
Data Acquisition candidate gene’s profile. This comparison, which
GNAT utilizes data available from the gene database takes the different types of annotation available into
Entrez Gene and the protein database UniProt. These account, yields a normalized score that reflects the
databases contain known gene symbols, synonyms, similarity between text and profile. The gene whose
and alternative designations, as well as additional profile is most similar to the text is taken as reference
annotations, which further describe genes and their for the ambiguous mention.
products. Prior to building a gene name dictionary,
each name is transformed into a regular expression to Related Tools
allow for its recognition in text even if slight spelling Other text mining tools (▶ Text Mining, Tools)
variations occur (e.g., hyphens vs. white spaces and performing gene normalization in text are available
Roman vs. Arabic numbers). All regular expressions via the biocreative metaserver (▶ BioCreative Meta-
are then compiled into a deterministic finite state Server and Text-Mining Interoperability Standard).
Gene Regulation 797 G
Cross-References
Gene Ranking
▶ AliBaba
▶ Applied Text Mining ▶ Biomarkers, Ranking
▶ BioCreative Meta-Server and Text-Mining ▶ Gene Expression Biomarkers, Ranking
Interoperability Standard
▶ Entity Mention Normalization
▶ Gene Ontology
▶ Named Entity Recognition
Gene Redundancy
▶ Named Entity Recognition and Normalization of
▶ Genetic Redundancy
Species, LINNAEUS
▶ Text Mining, Tools

Gene Regulation G
References
Jia Meng1 and Yufei Huang1,2,3
Fundel K, Zimmer R (2006) Gene and protein nomenclature in 1
Picower Institute for Learning and Memory,
public databases. BMC Bioinformatics 7:372. doi:10.1186/
1471-2105-7-372. http://dx.doi.org/10.1186/1471-2105-7-372
Massachusetts Institute of Technology, Cambridge,
Hakenberg J, Plake C, Leaman R, Schroeder M, Gonzalez G (2008) MA, USA
2
Inter-species normalization of gene mentions with GNAT. Greehey Children’s Cancer Research Institute,
Bioinformatics 24(16):i126–i132. doi:10.1093/bioinformatics/ University of Texas at Texas Health Science Center at
btn299. http://dx.doi.org/10.1093/bioinformatics/btn299
San Antonio, San Antonio, TX, USA
3
Department of Epidemiology and Biostatistics,
University of Texas at Texas Health Science Center at
San Antonio, San Antonio, TX, USA
Gene Ontology
Synonyms
Jiguang Wang
Beijing Institute of Genomics, Chinese Academy of
Gene modulation; Regulation of gene expression
Sciences, Beijing, China

Definition
Definition
Gene regulation (Watson and Roberts 1965) concerns
Gene Ontology (GO) (Ashburner et al. 2000) provides the control of the synthesis of functional gene products or
a hierarchical vocabulary of GO terms for describing expression (mainly mRNAs or proteins) in cells. Proper
functions and characteristics of gene or gene products gene regulation defines the distinct phenotype of
in different databases and different species. There are a biological system and ensures its stability, and
three structured, species-independent ontologies, i.e., misregulation is usually associated with disease. It is
biological processes, cellular components, and molec- also essential in promoting the versatility and adaptabil-
ular functions. GO project is a collaborative effect to ity of a living organism under various environmental
maintain, make cross-links for, and develop tools to conditions.
use the three ontologies.

Characteristics
References
Stages of Gene Expression Regulation
Ashburner M, et al. Gene ontology: tool for the unification of
biology. The Gene Ontology Consortium. Nat Genet. To control the synthesis of gene product (mRNA or
2000;25:25–29. proteins), gene regulation may assume different modes
G 798 Gene Regulation

Gene Regulation, Table 1 Different gene regulations and stages


Stage Targets Regulatory process High-throughput technology
Chromatin domains DNA DNA methylation Methylation array/seq, ChIP-seq, LC-MS
DNA phosphorylation
Histone deacetylation
Transcription DNA-RNA Transcription factor ChIP-chip, ChIP-seq
Repressors
Activators
Enhancers
Post-transcription RNA Capping Microarray, RNA-seq, Exon array, HITS-CLIP, RIP-seq
Splicing
Polyadenylation
RNA editing
microRNA silencing
Translation RNA-protein Translational initiation
Peptide elongation
Termination
Post-translation Protein Acylation Protein array, LC-MS
Phosphorylation
Protein degradation

at including chromatin domain, transcription, post- gene regulations, gene regulatory network is a systems
transcription, and translation. Summarized in Table 1 biology approach toward understanding overall molec-
is a list of different modes of gene regulations. ular mechanisms that control the cell response. Com-
putational reconstruction of gene regulatory network is
High-Throughput Technologies for Gene one of the current research focuses of computational
Regulation system biology.
As gene regulation occurs at multiple stages of gene
expression, different high-throughput technologies Modeling of Gene Regulatory Network
based on microarray, deep sequencing, or Liquid chro- Modeling is the key step in uncovering gene regulatory
matography-mass spectrometry (LC-MS) have been network. Various models have been proposed and we
developed for investigating different gene regulations, discuss some most popular models in the following.
which monitor the expression profiles of thousands of
genes simultaneously. A summary of these technologies Boolean Network
is shown in Table 1. Note that, although the final product A Boolean network (Shmulevich and Dougherty 2009)
of gene regulation is protein, most of the technologies models the association instead of direct regulation
are array/sequencing based and they mainly measure between sets of genes, where transition of the associ-
mRNA activities. This reality is due to the low sensitiv- ation states are modeled by Boolean functions. In
ity of existing proteomics technologies including LC- a Boolean network–based gene regulatory network,
MS and protein array in measuring protein expression. both the regulatory states and regulatory effects are
digitalized, which facilitate the related computation
Systems Biology and Gene Regulatory Network and analysis. Particularly, a Boolean node represents
Systems biology seeks to model all the individual units the binary state (on/activation or off/repression) of
of a biological system under a proper mathematical a gene, and an edge indicates the regulatory effect/
framework, such that both the overall structure and association between the two nodes, which is modeled
construction units of the system can be captured simul- by Boolean functions. When modeling the regulatory
taneously. Since response of cells to changing endog- dynamics, states of all Boolean variables can update
enous or exogenous conditions is governed by intricate with time. A simple example of a Boolean network is
Gene Regulation 799 G
the reaction kinetics, or the dynamics of regulatory
effects. Suppose that an ODEs GRN consists of G
nodes representing the expression of G genes, and let
yg ðtÞ; g 2 ½1; 2; :::; G represent the expression of the
a b g-th gene at time t. Then, the temporal regulatory
effects on a target gene by the regulators can be
modeled by the ODE:

dyg
¼ f g ðy1 ; y2 ; :::; yG Þ (1)
dt

c This equation describes quantitatively the temporal


evolution of the regulatory network. The regulatory
function f g can be derived from known biochemical G
Gene Regulation, Fig. 1 Boolean network principles. The ODE model (1) provides a more accurate
account of gene regulation than the Boolean network
model, but it requires greater knowledge regarding gene
regulation and is also computationally more complicated.
Gene Regulation, Table 2 State transition table
Previous Next Factor Analysis Model
A B C A B C Factor analysis model (Child 2006) usually aims at
0 0 0 0 0 0
modeling transcriptional regulatory network, that is,
0 0 1 1 1 0
direct regulation of mRNA expression by transcription
0 1 0 0 0 1
factors (TFs). Consider a set of genes regulated by a set
0 1 1 1 0 1
of transcription factors. Let yg ðtÞ; g 2 ½1; 2; :::; G rep-
1 0 0 0 1 0
1 0 1 1 1 0
resent the mRNA expression of the g-th gene at time t
1 1 0 0 0 1 and xl ðtÞ; r 2 ½1; 2; :::; L represent the protein-level
1 1 1 1 1 1 expression of the l-th transcription factor also at
time t. The TF regulation can be modeled by a linear
relationship as in (2):

2 3 2 32 3
y1 ðtÞ a1;1  a1;l x1 ðtÞ
6 .. 7 6 .. .. .. 76 .. 7
Gene Regulation, Table 3 Boolean network state update 4 . 5¼4 . . . 54 . 5 (2)
Time 0 1 2 3 4 5 6 7 8 ... yG ðtÞ aG;1  aG;L xL ðtÞ
a 1 0 0 1 0 1 0 1 0 ...
b 0 1 0 1 0 1 0 1 0 ... where ag;l is the regulatory coefficient of the g-th gene
c 0 0 1 0 1 0 1 0 1 ...
by the l-th transcription factor. Note that (2) can also be
written in a matrix form:

illustrated in Fig. 1, where three genes are regulating Y ¼ AX (3)


each other. Its state transition table is shown in Table 2,
and given the initial states (1, 0, 0), their states will be where, Y is the mRNA expression matrix of genes, X is
updated as shown in Table 3. the protein expression level of transcription factors,
which is usually difficult to measure, and A is the regu-
Coupled Ordinary Differential Equations latory coefficient matrix or the loading matrix. In a factor
Instead of describing the final states of regulated tar- model, both A and X are unknowns, and factor analysis
gets, Coupled Ordinary Differential Equations (ODEs) seeks to estimate both A and X simultaneously from the
(Chicone 2006) or stochastic ODE seeks to describe observation Y. Since the model is underdetermined,
G 800 Gene Regulation

Gene Regulation, Table 4 Probability of A


Gene A
Gene A Gene B
UP DOWN
0.2 0.8

Gene Regulation, Table 5 Conditional probability of


B given A
Gene C
Gene B
Gene A UP DOWN
DOWN 0.4 0.6
Gene Regulation, Fig. 2 Bayesian network
UP 0.2 0.8

Gene Regulation, Table 6 Conditional probability of C given


additional knowledge needs to be incorporated to con-
A and B
strain the solution space. Examples of such constraints
Gene C
are orthogonal factors, the sparsity of A, known regula-
Gene A Gene B UP DOWN
tions, etc.
DOWN DOWN 0.01 0.99
DOWN UP 0.8 0.2
Bayesian Network
UP DOWN 0.1 1.9
A Bayesian network (Heckerman 2008) or belief net-
UP UP 0.95 0.05
work is a probabilistic graphical model that models
a set of genes and their conditional dependencies via
a directed acyclic graph (DAG). Bayesian network
includes a wide variety of models as special cases, modeling gene regulation. The protein-level expres-
ranging from the basic deterministic model to more sion is often inappropriately approximated by the
sophisticated hierarchical probabilistic models (Huang corresponding mRNA level; such approximation can
et al. 2009). An example of the Bayesian network is lead to high modeling error. As the increase of com-
shown in Fig. 2, and the conditional probability of each putational processing power, integrating disparate data
node is summarized in Tables 4–6. sets to account for multiple aspects of gene regulation
Depending on the modeling and the availability of the is the current trend of systems biology, where proper
data, the inference goal can be to estimate the unobserved modeling of different types of data is the key.
variables (such as the regulatory coefficients between
genes) and/or to identify the network structure (such as
whether or not a gene is regulated by another gene). Cross-References

Discussion ▶ Epigenetics
Various models have been proposed for modeling sys- ▶ Epigenetics, Drug Discovery
tems-level gene regulations. However, because of the
complexity of gene regulation and the availability of
data, no existing models can cover all aspects of gene References
regulations. Most existing models only apply to
a limited subset of the mRNAs or proteins of interest, Chicone C (2006) Ordinary differential equations with applica-
and they model mostly indirect regulations/association tions. Springer, New York
Child D (2006) The essentials of factor analysis. Continuum,
between genes products.
London
Limited by the current techniques, most current Heckerman D (2008) A tutorial on learning with Bayesian net-
models utilize only mRNA expression levels for works. Innov Bayesian Netw 156:33–82
Gene Regulatory Networks 801 G
Huang Y, Tienda-Luna I et al (2009) Reverse engineering gene
regulatory networks. Signal Proc Mag IEEE 26(1):76–97
Shmulevich I, Dougherty E (2009) Probabilistic boolean
networks: the modeling and control of gene regulatory
networks. Siam, New York Gene6
Watson J, Roberts K (1965) Molecular biology of the gene.
Wa Benjamin, New York

Gene1
e1 Gene5
G

Gene Regulation Network

▶ MicroRNA-mRNA Regulation Networks


Gene4
Ge
Gene2
Gene
e2

G
Gene Regulatory Networks Gene3

Yong Wang
Academy of Mathematics and Systems Science,
Chinese Academy of Sciences, Beijing, China
Gene Regulatory Networks, Fig. 1 A toy example for
the gene regulatory network with six genes. Here, nodes
Synonyms denote genes, and edges denote their regulatory relationships.
Specifically, red arrows represent activation, and blue arcs
represent repression
Gene network; Genetic network

Definition transcription factors that are the main players in regu-


latory networks or cascades. By binding to the pro-
Cells efficiently carry out molecular synthesis, energy moter region at the start of other genes they turn them
transduction, and signal processing across a range of on, initiating the production of another protein, and so
environmental conditions by networks of genes, which on. Some transcription factors are inhibitory.
we define broadly as networks of interacting genes, Mathematically, gene regulatory network is defines
proteins, and metabolites (Chen et al. 2009). Formally as a directed graph (refer to Fig. 1 for a 6-gene network)
speaking, a gene regulatory network or genetic regu- in which nodes denote genes and edges denote their
latory network (GRN) is a collection of DNA segments regulatory relationships. And usually we use a matrix
in a cell which interact with each other (indirectly J to represent the gene regulatory relationships. The
through their RNA and protein expression products) regulatory relationships can be directed, signed, and
and with other substances in the cell, thereby weighted. For example, element Jij represents an effect
governing the rates at which genes in the network are of gene j on gene i, while Jji represents an effect of gene
transcribed into mRNA. In general, each mRNA mol- i on gene j. Thus the influence between gene i and gene
ecule goes on to make a specific protein (or set of j is directed. Furthermore, a sign associated with Jij
proteins). In some cases this protein will be structural, represents a specific role of regulation. For example, if
and will accumulate at the cell-wall or within the cell the sign of Jij is positive, gene j is the activator of gene i.
to give it particular structural properties. In other cases On the other hand, if the sign of Jij is negative, gene j is
the protein will be an enzyme; a micro-machine that the repressor of gene i. Furthermore the associated
catalyses a certain reaction, such as the breakdown of weight (the absolute value) of element Jij indicates how
a food source or toxin. Some proteins, though, serve strong the regulatory interaction is. Obviously, a zero
only to activate other genes, and these are the weight of Jij indicates no interaction between two genes.
G 802 Gene Regulatory Networks

Characteristics transcription factors and their target promoters. They


fundamentally define the transcriptional regulatory
Significance of Gene Regulatory Networks network of the cell. The recently developed ChIP-
Many cellular processes such as cell cycle, cellular chip methodology involves the chromatin immunopre-
differentiation, and apoptosis are well controlled via cipitation of an epitope-tagged transcription factor
gene regulation. On the one hand, gene regulatory (TF) bound to DNA fragments containing target
network is essential for all viruses, prokaryotes, promoters, followed by the hybridization of those
and eukaryotes. It includes the processes to turn the amplified DNA fragments to an intergenic microarray.
information in genes into gene products (proteins) to Currently large amounts of ChIP-chip data in yeast and
increase the versatility and adaptability of an organism other organisms are publicly available. For example,
by allowing the cell to express protein when needed. genome-wide location data performed in yeast by
Even more complex, gene regulatory network drives Harbison et al. (2004) and Lee et al. (2002) contain
the processes of cellular differentiation and morpho- information regarding the binding of 204 regulators to
genesis, leading to the creation of different cell types in their respective target genes in rich medium, and can
multicellular organisms where the different types of be downloaded from their websites (http://web.wi.mit.
cells may possess different gene expression profiles edu/young/regulatory_network/). ChIP-chip data have
though they all possess the same genome sequence. the advantage that they provide a direct biochemical
On the other hand, gene regulation system is extremely link between TFs and promoters and have the potential
complex to allow intuitively understanding regarding to identify targets without knowing the activating
to its nonlinear, dynamics, and robustness properties. conditions. From this viewpoint, ChIP-chip data are
To understand the complex mechanisms inside a very important source of information for analyzing
the gene regulation system, biologists usually direct transcriptional regulatory interactions.
apply various treatments to perturb cells including heat Sequencing techniques can also be used to reveal gene
shock, stress, and other techniques, and then observe the regulatory relationships by systematically analyzing gene
phenotype change of the cell such as the concentration upstream regions in the genome to identify potential
changes of interested molecules in the cell. This type of regulatory elements (also known as regulatory binding
research can produce small-scale regulatory relation- motifs). These motifs, often represented as regular
ships and it is hard to construct a mathematical model expressions, were transformed into the corresponding
for whole-genome scale. Recently, the step to understand weight matrices. We can then simply count the occur-
the gene regulation system inside a cell is greatly accel- rences of regular expression-type patterns with the goal of
erated by the invention of new biological techniques. identifying possible gene regulatory relationships. The
Specifically, DNA microarray and other high throughput weight matrices corresponding to these motifs are subse-
technologies were developed which enabled an experi- quently used to screen all intergenic sequences. The
menter to simultaneously measure the concentration of higher the score of a motif hit in a gene, the more likely
thousands of molecules from a single sample of cells or it will be a regulatory relationship (Brazma et al. 1998).
tissues. Such data offer a possibility to systematically More reliable sources for gene regulatory relation-
identify a model of a cell’s underlying control systems. ships are from the literature and curated databases.
For example, YEASTRACT (Yeast Search for
Modeling Gene Regulatory Network Transcriptional Regulators And Consensus Tracking)
Biologists developed several experimental techniques to is a curated repository of more than 12,500 regulatory
reveal the gene regulatory network. For example, gene associations between transcription factors and target
perturbation experiments (e.g., knockouts or RNA inter- genes in Saccharomyces cerevisiae (Teixeira et al.
ference) may indicate relationships between genes due to 2006), based on more than 900 bibliographic refer-
direct or indirect genetic interactions. In contrast, chro- ences. The information in YEASTRACT is updated
matin immunoprecipitation chip data may reveal direct regularly to match the recent literature on yeast
protein–DNA interactions or cofactor associations with regulatory networks. Since the regulatory relationships
bound transcription factors. from literature and databases are usually generated by
Protein–DNA interaction data concerns the interac- small-scale experiments, they are believed to be of
tions between proteins and DNA, particularly between high quality compared to large-scale experiments.
Gene Regulatory Networks 803 G
The most abundant data to model the gene regula- continuous models of neural networks and difference/
tory network is the microarray data. Microarray tech- differential equations. A common challenge for all these
nologies enable the simultaneous measurement of models is the scarcity of the data, since a typical gene
all RNA transcripts in a cell, producing tremendous expression dataset consists of relatively few time points
amounts of gene expression data from different (often less than 20) with respect to a large number of
research groups. For instance, the Stanford Microarray genes (generally over thousands). In other words, the
Database (SMD) has deposited data for 70,113 number of genes far exceeds the number of time points
experiments, from 341 labs and 56 organisms, as of for which data are available, making the problem of
2007 (Demeter et al. 2007). determining gene regulatory network structure
DNA microarray experiments are usually classified a difficult and ill-posed one (D’Haeseleer et al. 2000).
based on the type of array used in the experiment
(cDNA and oligonucleotide arrays) or according to Two General Reverse-Engineering Strategies
the organism that is profiled. From the viewpoint of The first strategy is the “physical” strategy for reverse-
gene regulatory network modeling, we distinguish engineering transcription regulation using mRNA G
between static and time series experiments. In static expression data (Gardner and Faith 2005). The physical
expression experiments, a snapshot of the expression approach seeks to identify the protein factors that regu-
of genes in different samples is measured. In time late transcription, and the DNA motifs to which the
series expression experiments, a temporal process is factors bind. In other words, it seeks to identify true
measured at various time intervals. Another important physical interactions between regulatory proteins and
difference between these two types of data is that while their promoters. An advantage of this strategy is that it
static data from a sample population (e.g., ovarian can reduce the dimensionality of the reverse-engineering
cancer patients) are assumed to be independently and problem by restricting possible regulators to TFs. It also
identically distributed, time series data exhibit a strong enables the use of genome sequence data, in combination
autocorrelation between successive points. with mRNA expression data, to enhance the sensitivity
Since many biological systems are dynamic systems, and specificity of predicted interactions. The limitation
temporal profiles of gene expression levels during a given of this approach is that it cannot describe regulatory
biological process can often provide more insights into control by mechanisms other than transcription factors.
how gene expression levels evolve in time and how genes A second strategy, which we call the “influence”
are dependent among each other during a given biological approach, seeks to identify regulatory influences between
process. One important feature of such time-course gene RNA transcripts (Yeung et al. 2002). In other words, it
expression data is the possible dependency of gene looks for transcripts that act as “inputs” whose concen-
expression levels across time points for a given gene. In tration changes can explain the changes in “output” tran-
addition, as gene expression levels evolve over time, time scripts. Each transcript may act as both an input and an
intervals can be an important factor that affects the gene output. The input transcripts can be considered the regu-
expression levels. Methods which can preserve the time lators of transcription. By construction, such a model
sequence and the time dependence of the observed data does not generally describe physical interactions between
are needed for analyzing the time-course gene expression molecules since transcription is rarely controlled directly
data (Wang et al. 2006). by RNA (and never by messenger RNA, which is the type
Collectively, these microarray data enable the analy- of RNA predominantly measured by DNA microarrays).
sis on gene expression profiles to detect dependencies Thus, in general, the regulator transcripts may exert their
among genes over different conditions, i.e., reverse engi- effect indirectly through the action of proteins, metabo-
neering the gene regulatory networks. So far, a wide lites, and effects on the cell environment. Nevertheless, in
variety of approaches have been proposed to infer gene some cases, the regulator transcripts may encode the TFs
regulatory networks from time-course data or perturba- that directly regulate transcription. In such cases, the
tion experiments (De Hoon et al. 2003; Dewey and Galas influence model may accurately reflect a physical inter-
2001; Friedman 2004; Gardner et al. 2003; Holter et al. action. An advantage of the influence strategy is that the
2001; Husmeier 2003; Nachman et al. 2004; Tegner et al. model can implicitly capture regulatory mechanisms at
2003). These approaches include discrete models of the protein and metabolite level that are not physically
Boolean networks and Bayesian networks, and measured. That is, it is not restricted to describing only
G 804 Gene Regulatory Networks

transcription factor/DNA interactions. As described in important first step toward biological function discovery
the section on differential equation models, an influence of small molecules and drug design.
model may be advantageous when trying to predict the We can rewrite Eq. 1 in a compact form for all time
global response of the cell to stimuli. The limitation of points of one dataset by matrix notation:
this approach is that the model can be difficult to interpret
in terms of the physical structure of the cell, and therefore dX
¼ JX þ PC (2)
difficult to integrate or extend with further research. dt
Moreover, the implicit description of hidden regulatory
factors may lead to prediction errors. where X ¼ (x(t1),. . .,x(tm)) and dX/dt ¼ (dx(t1)/dt,. . .,
dx(tm)/dt) are n  n matrices with the first derivative
Linear Differential Equations for Gene Regulatory of mRNA concentration dxi(tj)/dt ¼ [xi(tj + 1)xi(tj)]/
Network [tj+1  tj] for i ¼ 1,. . .,n; j ¼ 1,. . .,m. Although the
In general, a genetic network can be expressed by a set forward difference approximation here is utilized for
of nonlinear differential equations. Almost all of numerical computation of dx/dt, backward or other
the existing approaches for gene regulatory network difference approximation methods can be applied
inference use linear or additive models, primarily similarly. Suppose that there are s external perturba-
due to the complex structures of biological systems and tion compounds, then C ¼ (c(t1),. . .,c(tm)) is an s  m
the scarcity of data (Wang et al. 2006a, b, 2007). Fur- matrix representing the s perturbations. The unknowns
thermore, linear equations can capture the main features to be calculated are connectivity matrix J and P.
of the network near the steady state, and can provide
a good starting point for further modeling and analysis.
Assume that there are N microarray datasets X1, Cross-References
X2, . . ., XN with m1, m2, . . ., mN time points, respec-
tively, for one organism. These time-course datasets may ▶ Combinatorial Transcription Regulatory Network
be measured under various environments or stimuli by ▶ Continuous Model
different labs. Let us first consider one time-course ▶ Linear Model
dataset with m time points. A linear differential ▶ MicroRNA-embedding Regulation Networks,
equation can be used to represent the rate of synthesis Logical Modeling
of a transcript as a function of the concentrations of other ▶ Ordinary Differential Equation (ODE)
transcripts in a cell and the external perturbations: ▶ Regulation
▶ Reverse Engineering
dxðtÞ
¼ JxðtÞ þ PcðtÞ; t ¼ t1 ; t2 ; :::; tm (1)
dt
References
where x(t) ¼ (x1(t), . . ., xn(t)) T2Rn, xi(t) is the expres-
sion level (mRNA concentrations) of gene i at time Brazma A, Jonassen I, Vilo J, Ukkonen E (1998) Predicting gene
point t. J ¼ (Jij)n  n is an n  n connectivity matrix regulatory elements in silico on a genomic scale. Genome
with elements Jij representing the effect of gene j on gene Research 8(11):1202
Chen L, Wang RS, Zhang XS (2009) Biomolecular networks:
i with a positive, zero, or negative sign, indicating acti- methods and applications in systems biology. Wiley,
vation, no interaction, and repression, respectively. P ¼ Hoboken
(Pij)n  s is an n  s matrix representing the effect of the De Hoon MJL, Imoto S, Kobayashi K, Ogasawara N, Miyano S
s perturbations or s small molecules on x, and c(t)2Rs (2003) Inferring gene regulatory network from time-ordered
gene expression data of bacillus subtilis using differential
represents the external perturbations with s compounds equations. Pac. Symp. Biocomput 17–28
at time t. (In principle, the external perturbation can be of Demeter J, Beauheim C, Gollub J, Hernandez-Boussard T, Jin H,
virtually any type, for example, an external environmen- Maier D (2007) The Stanford Microarray Database: imple-
tal factor, a small molecule, an enzyme, a microRNA, mentation of new analysis tools and open source release of
software. Nucl. Acids Res. 35(suppl_1):D766–770
or a post-translationally modified protein.) A non-zero Dewey TG, Galas DJ (2001) Dynamic models of gene expres-
element Pij of P implies that the i-th gene is a direct target sion and classification. Functional & Integrative Genomics
of the j-th perturbation or compound. Identifying P is an 1(4):269–278
Gene Set and Protein Set Expression Analysis 805 G
D’Haeseleer P, Liang, S, Somogyi R (2000). Genetic network Definition
inference: from co-expression clustering to reverse engineer-
ing. Bioinformatics 16(8):707-726
Friedman N (2004) Inferring cellular networks using probabilis- Gene set expression analysis (GSEA) often referred
tic graphical models. Science 303(5659):799–805 to as gene set enrichment analysis is based upon
Gardner TS, Faith JJ (2005) Reverse-engineering transcription determining whether predefined sets of genes differ
control networks. Phys Life Rev 2(1):65–88 in their expression patterns in some way. This is
Gardner TS, di Bernardo D, Lorenz D, Collins JJ (2003) Infer-
ring genetic networks and identifying compound mode of opposed to conventional ▶ relative expression analysis
action via expression profiling. Science 301(5629):102–105 which examines differences on a gene-by-gene by
Harbison CT, Gordon DB, Lee TI, Rinaldi NJ, Macisaac KD, basis. Similarly, these methods can be applied to pro-
Danford TW (2004) Transcriptional regulatory code of a tein expression data from mass spectrometry (MS)
eukaryotic genome. Nature 431:99–104
Holter NS, Maritan A, Cieplak M, Fedoroff NV, Banavar JR proteomic analysis.
(2001) Dynamic modeling of gene expression data. Proc Natl
Acad Sci US A 98(4):1693
Husmeier D (2003) Sensitivity and specificity of inferring genetic Characteristics
regulatory interactions from microarray experiments with G
dynamic Bayesian networks. Bioinformatics 19(17):2271–2282
Lee TI, Rinaldi NJ, Robert F, Odom DT, Bar-Joseph Z, Gerber Utilizing information about predefined sets, typically
GK (2002) Transcriptional Regulatory Networks in Saccha- taken from biochemical pathway databases such as
romyces cerevisiae. Science 298(5594):799–804 KEGG, Panther, or Metacyc or from the Gene Ontology
Nachman I, Regev A, Friedman N (2004) Inferring quantitative
models of regulatory networks from expression data. Bioin- (GO), increases the ability to infer biological meaning
formatics 20(90001) from gene expression differences and can increase the
Tegner J, Yeung, MK, Hasty J, Collins JJ (2003) Reverse engi- power to detect differences by combining data across
neering gene networks: Integrating genetic perturbations with related genes. Originally, analyses were based on data
dynamical modeling. Proc Natl Acad Sci USA 100(10):5944
Teixeira MC, Monteiro P, Jain P, Tenreiro S, Fernandes AR, from gene expression microarrays; more recently other
Mira NP (2006) The YEASTRACT database: a tool for the technologies such next-gen sequencing are being used to
analysis of transcription regulatory associations in Saccharo- measure gene expression. A number of comparisons and
myces cerevisiae. Nucl. Acids Res. 34(suppl_1):D446–451 reviews of GSEA analyses have been published (Nam
Wang Y, Joshi T, Zhang X-S, Xu D, Chen L (2006) Inferring
gene regulatory networks from multiple microarray datasets. and Kim 2008; Emmert-Streib and Glazko 2011). Addi-
Bioinformatics 22(19):2413–2420 tionally, the methodology is being applied to protein set
Wang Y, Joshi T, Xu D, Zhang, X, Chen L (2006) Supervised expression analysis (PSEA) through the use of MS pro-
inference of gene regulatory networks by linear program- teomics data.
ming. Lecture notes in computer science 4115:551
Wang RS, Wang Y, Zhang XS, Chen L (2007) Inferring tran- The simplest approach to GSEA analysis was based
scriptional regulatory networks from high-throughput data. on chi-square or ▶ Fisher’s tests. A criteria was chosen
Bioinformatics 23:8 for differential expression (e.g., ER > 2, P-value <.05)
Yeung MKS, Tegner J, Collins JJ (2002) Reverse engineering and the proportion of differentially expressed (or over
gene networks using singular value decomposition and
robust regression. Proc Natl Acad Sci USA 99:6163–6168 or under expressed) genes was compared across gene
sets as can be seen in Table 1. ▶ Fisher’s test could
then be used to calculate a p-value for comparing the
proportion of differentially expressed genes in
Gene Set and Protein Set Expression a particular gene set to the proportion in the remaining
Analysis sets based on the hypergeometric distribution in (1).

Roger Higdon   
P NP
Seattle Children’s Research Institute, Seattle,
X DX
WA, USA P  value ¼ PðXÞ ¼   (1)
N
D
Synonyms
This approach has two major flaws: The first, Fish-
Gene set enrichment analysis er’s Exact test assumes expressions from genes within
G 806 Gene Set Enrichment Analysis

Gene Set and Protein Set Expression Analysis, randomization is not useful with very small sample
Table 1 Using Fisher’s Exact test for pathway enrichment sizes. Some argue that using self-contained methods
Differentially Not differentially with sample randomization is the only approach
expressed expressed to ensure statistically valid results (Goeman and
Genes in pathway X P-X Buhlmann 2007). Others contend it is better to use
Genes not in pathway D-X N + X-D-P multiple approaches in order to get more information
out the analyses since competitive and self-contained
methods test different hypotheses (Tian et al. 2005).
Although methods are applicable to proteomics
gene sets are uncorrelated with each other. This is data, PSEA has unique difficulties of its own.
unlikely since gene sets are chosen precisely because Typically, sample sizes are quite small, so the use
the genes are related in some biological manner. of sample permutation or randomization tests is
Hence, there is usually positive correlation among problematic. Also, unlike microarray data which
genes and therefore Fisher’s Exact test is often highly assays an entire genome, MS proteomics usually
anticonservative. In addition, Fisher’s Exact test assay only a fraction of the proteome. This results
dichotomizes a continuous variable (Expression ratio often in sparse coverage of proteins sets such as
or t-statistic) and therefore is less powerful than biochemical pathways.
methods that take advantage of the continuous data.
One of the most widely used approaches for gene
set analysis is the gene set enrichment analysis (GSEA) Cross-References
approach (Subramanian et al. 2005). In this approach,
genes are ordered by a differential expression statistic ▶ False Discovery Rate (FDR)
(typically a ▶ Student’s t-Test), then for each gene set ▶ Fisher’s Test
the distributions of expression statistics are compared ▶ Relative Expression Analysis
between those in the set and those not in the set using ▶ Student’s t-Test
the Kolmogorov-Smirnov test statistic. To calculate
p-values and ▶ false discovery rates (FDR), a permu-
tation test of samples is used in order to account for References
effects of correlation within sets. The Kolomogorov-
Smirnov test has been criticized for lacking power; Efron B, Tibshirani R (2007) On testing the significance of sets
of genes. Ann Appl Stat 1:107–129
therefore, others have proposed gene set analysis
Emmert-Streib F, Glazko GV (2011) Pathway analysis of
approaches that utilize other statistical tests, such as expression data: deciphering functional building blocks of
the GSA approach of Efron and Tibshirani (2007). complex diseases. PLoS Comput Biol 7(5):e1002053
More generally, GSEA approaches have been Goeman JJ, Buhlmann P (2007) Analyzing gene expression data
in terms of gene sets: methodological issues. Bioinformatics
broken down into self-contained and competitive
23:980–987
approaches. Self-contained approaches compare indi- Nam D, Kim S (2008) Gene-set approach for expression pattern
vidual gene sets across conditions without regard analysis. Brief Bioinform 9:189–197
for other gene sets. On the other hand, competitive Subramanian A, Tamayo P, Mootha VK et al (2005) Gene set
enrichment analysis: a knowledge-based approach for
methods measure relative differences between gene
interpreting genome-wide expression profiles. Proc Natl
sets. Examples of competitive tests are enrichment Acad Sci USA 102:15545–15550
tests such as Fisher’s exact test described above, e.g., Tian L, Greenberg SA, Kong SW et al (2005) Discovering sta-
find sets with more or fewer differentially expressed tistically significant pathways in expression profiling studies.
Proc Natl Acad Sci 102:13544–13549
genes. A number of methods are a mixture of these two
approaches. Although, there are a few methods based
on parametric tests, most GSEA approaches are based
on permutation or randomization tests of samples or
genes. Randomizing samples preserves correlation Gene Set Enrichment Analysis
structures and therefore leads to p-values that are less
biased than gene randomization. However, sample ▶ Gene Set and Protein Set Expression Analysis
Gene Sets for Pathways 807 G
products, i.e., messenger RNA (mRNA) levels, that
Gene Sets for Pathways provide a surrogate measurement for pathway activa-
tion. The fundamental assumption for any gene set is
Michael F. Ochs that changes in the measured biomolecules will be
Department of Oncology, Johns Hopkins University, coordinated.
Baltimore, MD, USA The most common gene sets presently comprise
mRNA levels expected to undergo coordinated change
and measurable on a gene “expression” microarray.
Definition The inherent assumption therefore is that these genes
are coregulated by a series of transcriptional activators.
A gene set is a grouping of genes that function in A gene set could also be created to monitor activation
a coordinated manner to provide some biological behav- of a signaling pathway through phosphoprotein levels,
ior. Many biological processes are controlled by path- as most signaling proteins change activity based on
ways, where proteins encoded by genes create phosphorylation of specific amino acid residues. This G
a directional though potentially circular flow, such as would effectively be a protein post-translational mod-
creation of a series of metabolites (metabolic pathway) ification set. A further “gene” set could be created from
or an ordered series of post-translational modifications metabolomic measurements, where the amounts of
(signaling pathway). Gene sets that serve as surrogate metabolites along a metabolic pathway could provide
indicators for activity in biological pathways must mea- a surrogate measurement for enzyme activity. This
sure the appropriate targets, which differ for each type of type of gene set could provide insight into metabolic
pathway. syndromes and diseases, including diabetes.
For a gene set to serve as a surrogate for measure-
ment of activity on a biological pathway, changes in
Characteristics the levels measured by the set must be linked to
changes in the pathway. Unfortunately, the term “path-
Biological phenotypes arise from the coordinated actions way” is used quite loosely biologically, primarily for
of numerous biomolecules and structures in an organism. historical reasons. Initially, the term pathway was used
The coordination, such as during organismal develop- quite literally, indicating nerve conduction, a viral
ment, is controlled by master regulators, often cell invasion course, or auditory path. Later, it referred to
signaling networks responding to environmental and the metabolic pathways that created biomolecules in
endocrine clues coupled to transcriptional and transla- cells (Carson and Frischer 1966), effectors for hor-
tional controllers. Often the change that drives a pheno- monal control (Samuels 1964), and gene interactions
type, such as the development of a specific tissue type, is and epistasis (Avery and Wasserman 1992). For gene
the activation of a set of genes, thus a coordinated change set enrichment analysis, we typically take a pathway to
in the transcript levels of many genes simultaneously. indicate a linked series of biomolecules: (1) a series of
Alternatively, the change could be coordinated post- proteins that are enzymes and the metabolic products
translational modifications of a set of proteins modifying produced by them (a metabolic pathway), (2) a series
their activity or localization, the increase in flux through of proteins that act together to transduce a signal when
a metabolic pathway leading to a buildup or excretion of post-translationally modified (a signaling pathway), or
a metabolite, or the breakdown of cellular structures as (3) a series of transcriptional regulators produced in
occurs during apoptosis. series (a transcriptional regulatory network). For each
Gene sets provide an approach for an analysis to of these, different gene sets are needed to serve as
gain insight into these drivers of phenotype through surrogate markers of activity.
identification of small changes in many biomolecular A gene set for a metabolic pathway could take three
levels or states rather than through identification of forms, depending on the biomolecule measured. The
a change in a single gene, protein, or metabolite. For most common measurement remains the mRNA levels
example, all the transcriptional products produced by from a microarray. In this case, a pathway, such as
the activation of the RAS-RAF-MEK-ERK signaling provided by the Kyoto Encyclopedia of Genes and
pathway could be used as a gene set of transcriptional Genomes (KEGG), would be converted to just the
G 808 Gene Sets for Pathways

enzymes and then to the genes encoding the enzymes. which is the incorrect set if the goal is a surrogate of
The gene set would comprise all the genes whose pathway activity.
protein products (i.e., enzymes) are necessary to take A gene set for a transcriptional regulatory network
the metabolic precursor chemical to its final form. The (TRN) also relies on the relationship between
assumption is that if the cell needs to produce more of a transcription factor (TF) and its target genes. A TRN
this final chemical form, it will coordinately upregulate is essentially a cascade of TFs linked through the gene of
all the genes to produce all enzymes in the pathway. a TF being a transcriptional target of an upstream TF.
This is an indirect measurement of the levels of the Then the initial activation of a single TF can lead to
proteins, where the protein levels are assumed to track activation of additional TFs, which then may lead to
the mRNA levels. It is worth noting that this is not true activation of further TFs, etc. The relationship of genes
in eurkaryotic organisms in general, although it may be sets for this series and for a signaling pathway is obvious.
valid in this limited case of metabolic enzymes. Alter- In both cases, the gene targets of a TF are the key to
natively, the gene set could be proteomic measure- constructing an appropriate gene set. In the case of
ments of the enzymes themselves, indicating the a TRN, there is an additional complication that arises
concentration of the catalysts that drive the chemical due to the timing of the measurements. If time resolution
changes. Finally, the gene set could be the set of is high, then the data can potentially separate the first
metabolites produced, measured by metabolomic tech- round (i.e., primary) of transcription from later (i.e.,
nologies such as mass spectrometry. Here, the direct secondary) transcription, and a true regulatory network
metabolic flux through the pathway would be mea- of relationships could be used, with each TF’s signature
sured. Note that this progression is from less direct to isolated. However, if a time resolution is low or, in the
more direct measurements of the pathway activity, but extreme case, only a single measurement is made, then
also from more mature to less mature technological the appropriate gene set may be all the genes regulated
platforms. by any member in the TRN. Effectively, the superset of
A gene set for a signaling pathway could take two all TF gene sets in the TRN would be the appropriate
forms. The first would be the direct measurement of the gene set to serve as a surrogate for TRN activity.
amount of phosphoproteins and unmodified proteins A gene set is typically tested for significance using
along the pathway. As with the metabolites in a rank sum test upon a statistic generated for each gene,
a metabolic pathway, this would effectively be protein, or metabolite during preprocessing. This initial
a measurement of the flux through the pathway, here individual biomolecular (e.g., gene) statistic relies on
providing the strength of the signal. The second is more comparing two or more phenotypic groups, such as
complex, involving mRNA levels. The complexity arises through a t-test or Wilcoxon rank sum test. The members
from the fact that signaling is driven not by protein levels of the set are compared to the non-set members also
but by post-translational modification of proteins and the measured in the experiment, and significance is deter-
duration of these modifications, the specific modifica- mined by permutation tests on the set labels or by the
tions made, and the number of proteins with these mod- assumption of a distribution on all sets. The final signif-
ifications. As such, the mRNA levels for genes encoding icance for the gene set must be adjusted for multiple
the proteins become an inadequate surrogate for signal- testing, using either family-wise error rate or false dis-
ing activity. However, most, though not all, signaling covery rate corrections. Family-wise error rates are more
pathways lead to changes in the activity of transcription strict, but if the goal of a study is hypothesis-generation
factors (TFs). These TFs have transcriptional gene tar- for more focused testing, then false discovery rate
gets, and the mRNA levels of these targets are directly methods provide a greater chance of finding a novel
modified by the transcription factors, either increased hypothesis.
(activated) or decreased (repressed). The coordinated The complexity of defining gene sets argues that
changes of the transcription factors downstream of sets must be chosen with an understanding of biology
a signaling pathway then provide a surrogate measure and with a determined goal in mind. The set must
of pathway activity. It is critical to note that the gene sets conform to known biological behavior, so that the set
available in databases ignore this complication, however, is an appropriate surrogate for the pathway of interest.
so that the gene set for a KEGG signaling pathway is For example, using the genes encoding the proteins in
typically just the genes encoding the signaling proteins, a signaling pathway is an incorrect set if the goal is
Gene Silencing 809 G
a measure of pathway activity. Limiting the sets tested
to those that are of interest is also useful for minimiz- Gene Silencing
ing the effect of multiple testing, since testing all
existing sets can lead to thousands of tests, reproducing Yan Zhang
the multiple testing problem of individual gene Key Laboratory of Systems Biology, Shanghai
measurements. Institutes for Biological Sciences, Chinese Academy
Future gene sets are likely to involve integration of of Sciences, Shanghai, China
different molecular species. For instance, studies have
already integrated mRNA levels, protein levels, and
metabolite levels into a coherent biological picture. Definition
Gene sets that comprise different types of measure-
ments and molecules will likely replace the present Gene silencing refers to a mechanism by which cells
gene sets as such measurements become more ubiqui- shut down large sections of chromosomal DNA. It is
tous. It will be critical when constructing such sets generally used to describe the “switching off” of a gene G
to carefully reproduce the biological relationships by a mechanism other than genetic modification. That
between molecular species, in order to construct mean- is, a gene which would be expressed (turned on) under
ingful sets. normal circumstances is switched off by machinery in
the cell. Gene silencing is done by incorporating the
DNA to be silenced into a form of DNA called
Cross-References heterochromatin that is already silent.

▶ Biological Disease Mechanism Networks


▶ Biomarkers Characteristics
▶ Co-expression
▶ Disease Classification or Discrimination Gene silencing is a general term describing epigenetic
▶ Functional Enrichment Analysis processes of gene regulation. This process is important
▶ Gene Expression Biomarkers for the differentiation of many different types of cells.
▶ Inference Genes are regulated at either the transcriptional or
▶ KEGG PATHWAY post-transcriptional level.
▶ MicroRNA-mRNA Regulation Networks Transcriptional gene silencing is the result of
▶ Network-Based Biomarkers histone modifications, creating an environment of het-
▶ Pathway, Functional Units erochromatin around a gene that makes it inaccessible
▶ Post-translational Modifications to transcriptional machinery (RNA polymerase,
▶ Protein Interaction Network transcription factors, etc.).
▶ Signal Transduction Pathway Post-transcriptional gene silencing is the result of
▶ Transcriptional Regulatory Network mRNA of a particular gene being destroyed or
▶ Transcriptional Reprogramming blocked. The destruction of the mRNA prevents trans-
lation to form an active gene product (in most cases,
a protein). A common mechanism of post-transcrip-
tional gene silencing is RNAi (Fig. 1).
References Both transcriptional and post-transcriptional
gene silencing are used to regulate endogenous genes.
Avery L, Wasserman S (1992) Ordering gene function: the
Mechanisms of gene silencing also protect the
interpretation of epistasis in regulatory hierarchies. Trends
Genet 8(9):312–316 organism’s genome from transposons and viruses.
Carson PE, Frischer H (1966) Glucose-6-phosphate dehydroge- Gene silencing thus may be part of an ancient immune
nase deficiency and related disorders of the pentose phos- system protecting from such infectious DNA elements.
phate pathway. Am J Med 41(5):744–761
Genes may be silenced by DNA methylation during
Samuels LD (1964) Actinomycin and its effects. Influence on an
effector pathway for hormonal control. N Engl J Med meiosis, as in the filamentous fungus Neurospora
271:1301–1308, CONCL crassa.
G 810 Gene Type

Gene Silencing,
Fig. 1 RNAi-mediated gene RNA Pol III prom shRNA insert T T T T T shRNA expression cassette (psiRNA)
silencing in mammals using
short hairpin RNA genes

shRNA (~50 nt stem-loop with 2 nt 3’ overhang)


UU
Nucleus

Cytoplasm

UU

Dicer

UU

P
siRNA
UU P

UU
P
FMRP

VIG
Tudor-SN

UU
P
elF2C2
RISC

Degradation of target mRNA

Cross-References Cogoni C, Macino G (2000) Post-transcriptional gene silencing


across kingdoms. Curr Opin Genet Dev 10(6):638–643
Selker EU (1999) Gene silencing: repeats that count. Cell
▶ Computational microRNA Biology 97(2):157–160

References
Gene Type
Bass BL (2000) Double-stranded RNA as a template for gene
silencing. Cell 101(3):235–238 ▶ IMGT-ONTOLOGY, GeneType
Gene-centered Information Resource, GoGene 811 G
model organisms (▶ Model Organism) to concepts of
Gene-centered Information Resource, the Gene Ontology (GO) and to diseases based on
GoGene millions of citations in PubMed. With these associa-
tions, GoGene supports interpretation of results from
Conrad Plake high-throughput experiments or simply searching the
Biotechnology Center (BIOTEC), Technische literature for discussed genes. A sequence query lets
Universit€at Dresden, Dresden, Germany users retrieve genes with similar gene or protein
sequences and explore their GO and disease annota-
tion, for example, to find functional hints for yet
Definition uncharacterized genes.

GoGene is a publicly accessible web server supporting Usage


the task of searching for genes and gene-related molec- Users of GoGene can choose between three types of
ular functions, biological processes, cellular compo- queries: (1) lists of gene identifiers or gene names as in G
nents, single amino acid substitutions, and diseases Entrez Gene, (2) keywords that are forwarded to
(Plake et al. 2009). Searching is carried out either via PubMed to find genes mentioned in the resulting list
the literature database Pubmed (▶ MEDLINE and of citations, and (3) nucleotide or amino acid
PubMed), the Blast service at the European Bioinformat- sequences that are forwarded to the EBI, where
ics Institute (EBI), or directly via the NCBI Entrez Gene a remote Blast search against all sequences in the
database. The resulting list of genes is presented together Swiss-Prot database is invoked. The query type can
with the relevant parts of the ▶ Gene Ontology and either be specified by the user or GoGene tries to
Medical Subject Headings, two controlled vocabularies automatically find the best result. First, it checks if
that cover a broad variety of biomedical research. the query resembles a nucleotide or amino acid
GoGene is located at: http://www.gopubmed.org/ sequence, in which case the BLASTX or BLAST
gogene. service is invoked, respectively. Otherwise, the query
is forwarded to both PubMed and Entrez Gene and the
larger gene list is returned. For Entrez Gene results,
Characteristics genes are ranked as in the original result list. Results
from a PubMed search are ranked by occurrence fre-
Gene Annotation by Automated Literature Mining quency in the matching abstracts in descending order.
Sequence databases are growing and many entries are A gene list resulting from a Blast search is ranked by
lacking proper annotation (Baumgartner et al. 2007). sequence similarity, with the most similar gene listed
The low coverage of gene and protein annotation first. Figure 1 shows the flow of data happening after
has significant impact on turning experimental data the user submits a query. The resulting list of genes is
into knowledge such as of mechanisms governing bio- presented together with the relevant parts of the GO
logical processes within cells and organisms, or of and MeSH (Medical Subject Headings), which support
pharmacodynamic pathways determining the action navigation similar to a hyperlinked table of contents.
mechanisms of drugs. With thousands of scientific A gene list, including all GO and disease annotations,
articles published every day, the literature constitutes can also be downloaded as a file in different standard
the main resource of biomedical knowledge. This formats.
knowledge is not easily accessible as it requires
human experts to read papers and manually add the Use Case
relevant pieces of information to a database. GoGene Please note that the following search results might not
has been developed to support literature-based gene be reproducible anymore due to updated database con-
annotation. It employs the gene mention identification tents. Consider a biologist who is interested in rat
tool GNAT (▶ Gene Normalization with GNAT) and genes related to osteoporosis and bone resorption.
the web server GoPubMed (▶ Search Engines with A keyword search for “osteoporosis bone resorption”
Faceted Search) to associate genes from various in the Rat Genome Database gives no results. The same
G
812

1 2 3
5′ 3′
5′ 3′
5′ 3′
Parkinson’s disease 5′ 3′
5′ 3′
5′ 3′
5′ 3′
5′ 3′

Gene-centered Information Resource, GoGene, Fig. 1 GoGene accepts three types store, are added such as descriptive summaries, literature references, and ontological
of queries: keywords, sequences, and gene identifiers. Queries are sent either to PubMed, concepts describing gene functions, and their role in biological processes and diseases
UniProt, or Entrez Gene (1). If the destination for a query was not specified by the user, (3). The list of genes is then displayed in GoGene (4). Concepts from the GO and MeSH
GoGene determines automatically which database gives the best result. After a list of for all genes are displayed as a tree to the left
genes has been retrieved (2), more details about each gene, taken from a local annotation
Gene-centered Information Resource, GoGene
General Transcription Factors 813 G
query in Entrez Gene results in two rat genes (Pth and Cross-References
Tnfsf11). In the literature database PubMed, the query
“rats osteoporosis bone resorption” returns 857 cita- ▶ Alibaba
tions. Reading their abstracts to identify the relevant ▶ Entity Mention Normalization
genes is cumbersome. GoGene alleviates this task by ▶ Gene Normalization with GNAT
automatically searching these abstracts for mentioned ▶ Gene Ontology
genes and displaying the resulting gene list to the user. ▶ MEDLINE and PubMed
In the tree, the biological process, bone resorption, the ▶ Model Organism
organism, rat, and the disease, osteoporosis, are listed ▶ Mutual Information
as top categories. Selecting each as mandatory aug- ▶ Retrieving and Extracting Entity Relations from
ments the query and eventually leaves five rat genes EBIMed
(Pth, Tnfsf11, Ctsk, Tnfrsf11b, and Csf1) for which ▶ Search Engines with Faceted Search
literature references state their role in the specified
disease and biological process. G
References
Methods
Prerequisite for annotating genes with concepts from Baumgartner WA, Cohen KB, Fox LM, Acquaah-Mensah G,
Hunter L (2007) Manual curation is not sufficient for anno-
biomedical ontologies by automated literature analysis
tation of genomic databases. Bioinformatics 23(13):i41–i48
is the correct identification of genes and concepts in Manning CD, Schuetze H (1999) Foundations of Statistical
text. For gene identification, GoGene employs GNAT, Natural Language Processing, 1st edn. The MIT Press, Cam-
a tool for gene mention recognition and normalization bridge, MA, USA. ISBN 0262133601
Plake C, Royer L, Winnenburg R, Hakenberg J, Schroeder M
to a gene database (▶ Named Entity Recognition,
(2009) GoGene: gene annotation in the fast lane. Nucleic
▶ Entity Mention Normalization). Having defined the Acids Res 37(Web server issue):W300–W304. doi:10.1093/
gene literature in PubMed using GNAT, GoGene then nar/gkp429. http://dx.doi.org/10.1093/nar/gkp429
associates these publications with biomedical concepts
provided by the GoPubMed web server. This also
includes all MeSH concepts assigned by the US Gene-External Type of Promoter
National Library of Medicine for indexing. For each
concept found, the corresponding citation is also asso- ▶ Type 3 Promoters
ciated to all its ascendants according to the GO or
MeSH disease hierarchy. For instance, if a PubMed
citation is associated with pancreatic neoplasms,
then it is also associated with digestive system neo- General Transcription Factors
plasms, neoplasms by site, neoplasms, and diseases.
A relationship between a gene and a GO or disease Tetsuro Kokubo
concept is established based on co-occurrences within Department of Supramolecular Biology, Graduate
PubMed citations. The confidence in such a relation- School of Nanobioscience, Yokohama City
ship is computed as the log ratio of observed University, Yokohama, Kanagawa, Japan
co-occurrence probability to the co-occurrence proba-
bility expected under independence, or pointwise Synonyms
▶ mutual information (Manning and Schuetze 1999).
Basal transcription factors
Related Tools
Tools and web servers performing tasks similar Definition
to GoGene are ▶ Alibaba, EBIMed (▶ Retrieving
and Extracting Entity Relations from EBIMed), In eukaryotes, RNA synthetic activities are performed
and iHOP. by three distinct types of RNA polymerases
G 814 Generalization

(RNAPI, II, and III) that demonstrate different sensitiv- linear predictors are replaced by smooth nonlinear
ities to a-amanitin, a toxic substance from the mush- functions.
room (Amanita phalloides). These RNAPs alone cannot
bind to the core promoter, a DNA region surrounding
the transcription initiation site of a given gene. Thus, Characteristics
other proteins are essential for transcription initiation at
the specific site of the core promoter. The minimal set Generalized additive models (GAMs) were devel-
of such proteins was purified by chromatography for oped by Hastie and Tibshirani (1990) and presented
each RNAP and designated as GTFs (general transcrip- in a similar manner to ▶ generalized linear models
tion factors) (Hampsey 1998; Orphanides et al. 1996; (GLMs) where a function of the mean (the link
Thomas and Chiang 2006). The main functions of GTFs function) is modeled as a linear combination
are to recognize core promoter structures, recruit RNAP of smooth functions of explanatory or predictor
to the core promoter, and regulate transcription in variables.
response to activators and repressors.

Cross-References
gðmÞ ¼ b þ f1 ðx1 Þ þ ::: þ fm ðxm Þ (1)
▶ Transcription in Eukaryote

References GAMs typically use the same underlying distri-


butions for the data as GLMs, most commonly
Hampsey M (1998) Molecular genetics of the RNA polymerase II normal, binomial, or Poisson. The functions fi(xi)
general transcriptional machinery. Microbiol Mol Biol Rev
62(2):465–503 are most commonly fit using non-parametric
Orphanides G, Lagrange T, Reinberg D (1996) The general curve estimating methods such as local-regression
transcription factors of RNA polymerase II. Genes Dev (lowess, Cleveland and Devlin 1988), splines, or
10(21):2657–2683 smoothing splines. These methods potentially pro-
Thomas MC, Chiang CM (2006) The general transcription
machinery and general cofactors. Crit Rev Biochem Mol vide better fits to the data than linear or polynomial
Biol 41(3):105–178 models and allow the detection of features in the
data that may be missed. The R package GAM is
one of the most popular software tools for estimat-
Generalization ing GAMs.
An example of the use of GAMs in systems
▶ Abstraction biology is modeling functional similarity as
a smooth function of sequence similarity (Louie
2009). In the example shown in Fig. 1, sequence
similarity is measured using the BLAST bit score.
Generalized Additive Models
Functional similarity is based on using the Gene
Ontology (GO, Ashburner et al. 2000) and deter-
Roger Higdon
mining the shared level for two experimentally
Seattle Children’s Research Institute, Seattle,
validated proteins.
WA, USA
One drawback of GAMs is the possibility of
over-fitting the data; therefore, cross-validation
Definition methods should be used when fitting GAMs. Other
drawbacks include difficulty interpreting the models
Generalized additive models are an extension of and the inability to add interactions between
additive ▶ generalized linear models where the variables.
Generalized Linear Models 815 G
Generalized Additive GO Level vs BLAST bit score
Models, Fig. 1 Example of
7
generalized additive models.
A comparison of shared GO
annotation level as function 6
sequence similarity for
experimentally validated 5
proteins. Sequence similarity

GO Level
is measured by the BLAST bit 4
score. The relationship is
model using generalized 3
additive model where the
curve is estimated by lowess 2
regression
1

0 G
3 4 5 6
BLAST bit score

Cross-References Characteristics

▶ Generalized Linear Models Definition


In classical Linear Regression, the data mean is
modeled as a linear combination of the covariates.
References
The error is assumed to be normally distributed with
Ashburner M, Ball CA, Blake JA, Botstein D, Butler H et al constant variance. Consequently, the mean as well as
(2000) Gene ontology: tool for the unification of biology. Nat the combination of the covariates can take any value on
Genet 25:25–29 the real line and the modeling approach is plausible.
Cleveland WS, Devlin SJ (1988) Locally-weighted regression: However, for skewed or categorical data, the model is
an approach to regression analysis by local fitting. J Am Stat
Assoc 83(403):596–610 not always appropriate. For example, count or propor-
Hastie TJ, Tibshirani RJ (1990) Generalized additive models. tion data has the mean restricted to a certain interval so
Chapman & Hall/CRC, New York/Boca Raton the additive model for the mean is inadequate.
Louie B et al (2009) A statistical model of protein sequence Generalized linear models (GLMs) describe the
similarity and function similarity reveals overly-specific
function predictions. PLoS One 4:e7546 mean as a function of the linear combination of
the covariates. The function maps the range of the
covariate combinations into the domain of the mean.
Generalized linear models are characterized by three
Generalized Linear Models components: a random component, a systematic
component, and a link function. The random compo-
Larissa Stanberry nent of the GLM model is given by independent real-
Bioinformatics and High-throughput Analysis izations ðy1 ; . . . ; yn Þ of response Y and its probability
Laboratory, Seattle Children’s Research Institute, distribution. The distribution of Y is assumed to belong
Seattle, WA, USA to the exponential family and can be written as:

f ðy; y; fÞ ¼ expðyy  bðyÞÞ=aðfÞ þ cðy; fÞ;


Synonyms
for some functions a(f), b(y), c(y,f). For example,
GLM normal, Poisson, binomial, and gamma distributions
G 816 Generalized Mass Action System

belong to the exponential family. For the distributions in References


the exponential family, the mean and variance can be
expressed as E(Y) ¼ m ¼ b0 (y) and var(Y) ¼ b00 (y)a(f). McCullagh P, Nelder JA (1989) Generalized linear models,
2nd edn. Chapman & Hall/CRC, Boca Raton
The systematic component refers to explanatory
variables and constitutes a linear predictor:
X
p
i ¼ bj xij ; i ¼ 1; . . . ; n;
j¼1 Generalized Mass Action System
where ðxi1 ; . . . ; xip Þ are the covariates for observation Eberhard O. Voit
yi and ðb1 ; . . . ; bp Þ is the vector of model parameters. The Wallace H. Coulter Department of Biomedical
The link function g() connects the systematic com- Engineering, Georgia Institute of Technology and
ponent, i , to the random component, mi ¼ EðYi Þ: Emory University, Atlanta, GA, USA
X
p
i ¼ gðmi Þ ¼ bj xij :
j¼1
Definition
The canonical link satisfies g(m) ¼ y.
▶ Biochemical Systems Theory permits several
Examples variants. The format of a Generalized Mass Action
Linear Regression is an example of the generalized (GMA) system models every process within
linear model with random components Y assumed to a system as one product of power-law functions.
be independently normally distributed with means mi As a result, every differential equation describing
and pvariance s2 ; the systematic component a dependent variable in a GMA system contains as
P
i ¼ bj xij ; i ¼ 1; . . . ; n and the identity link many power-law terms as processes producing or
g(z) ¼j¼1z so that mi ¼ i . The identity is also the canon- degrading the variable.
ical link for the normal model.
For the binomial model, the link functions include
the logit Z ¼ logm/(1m), the complementary log-log Cross-References
Z ¼ log{log(1m)}, and the probit given by the
inverse of the normal distribution function. The logit ▶ Biochemical Systems Theory (BST)
function is the canonical link for the binomial model. ▶ Metabolic Systems Modeling, Power-Law
The Poisson and gamma models are also examples Functions
of GLMs with canonical links given by the log and
inverse functions, respectively.

Model Estimation General-Purpose Computation,


The maximum likelihood estimates of the GLM parame- Graphics Processing Units
ters bi ; i ¼ 1; . . . ; p are obtained using iterated
weighted least squares. The response variable is Giuseppe Agapito
given by the first-order linear approximation of the Department of Experimental Medicine and Clinic,
link function at Y and the weights depend on the fitted University Magna Graecia of Catanzaro, Catanzaro,
values m^. The procedure iterates between computing Italy
a new estimate for Z and b. The iterations stop when
the change in model parameters is sufficiently small.
Synonyms

Cross-References Data parallel; Distributed data access; Distributed data


management; Integration technologies; Parallel
▶ Linear Regression computing
General-Purpose Computation, Graphics Processing Units 817 G
Definition perform different operations on the same data, this
architecture does not have any practical application
Graphical Processing Unit computing, or briefly GPU today.
computing, is a methodology that aims at taking 4. Multiple Instruction, Multiple Data (MIMD)
advantage from the computational power of graphics consists of multiple independent processors simul-
processor in the development of high-performance taneously executing different instructions on differ-
software for a wide range of scientific and engineering ent data.
fields. GPUs can be considered as a computer device The modern GPUs could provide the high compu-
operating as a coprocessor to the main CPU (host). tational power needed to manage these huge amounts
This cooperation between the GPU and CPU allows of data efficiently and relatively inexpensive compared
to increase the performance of CPU through the exe- to the number of multi-core CPUs needed to achieve
cution of time-consuming and computing intensive a similar number of cores (Nickolls and Dally 2010).
parts of the code on the GPU. The GPU’s power comes from the sheer number of
parallel processing cores on each chip which is typi- G
cally higher respect to that of a CPU; moreover, the
Characteristics memory of a GPU is typically faster due to a larger bus
width. This means GPUs can transfer information to
Both scientists and computer graphics designers need and from their memory more quickly than CPUs,
to process large volumes of data very quickly while a frequent operation that could create a bottleneck for
modeling sets of many interrelated equations to pro- several applications. Finally, the GPU frequency is
duce results that are as realistic and reliable as possible. currently rapidly increasing, suggesting that GPUs
Many biological phenomena are extremely difficult to will be able to process data at even faster rates in the
study experimentally and researchers must rely on future. These benefits are leading many scientists to
computational simulations, i.e., to determine the pro- choose GPUs over clusters of multi-core CPUs. The
tein 3D structure. This is one of the greatest challenges high parallelism of GPU is a feature needed to render
in computational biology. Nuclear magnetic resonance complex graphical effects in 3D in the animations and
(NMR) spectroscopy and X-ray crystallography are the games; in fact, GPU initially was developed to graph-
most popular methods for structure prediction. If all ical purposes (Dematté and Prandi 2010).
distances are known exactly, the coordinates for each Over the past few years, the GPU has evolved from
atom can be easily computed. However, only a small a fixed-function processor into a general-purpose par-
fraction of all distance bounds are obtained by exper- allel programmable processor with additional special-
iments conducted with these methodologies; thus, it is purpose functionality.
necessary to use heuristics to infer 3D structures from The recent introduction of programming environ-
data. For this reason, it is needed to use massively ments for the development of non-graphics applications
parallel computing capability in order to obtain signif- on GPU facilitated the use of GPU for high-performance
icant models. The high parallelism expressed by the computations. The most widespread programming envi-
interactions among the atoms leads to the idea of using ronment is Compute Unified Device Architecture
parallel computing techniques to tackle the complexity (CUDA) introduced by NVIDIA. CUDA is a hardware
of biological systems. Parallel computing techniques and software co-processing architecture for parallel
require dedicated architectures that can be classified computing that enables NVIDIA GPUs to execute
into the following classes: programs written in C, C++, Fortran, OpenCL,
1. Single Instruction, Single Data (SISD) is a mono- DirectCompute, and other languages, with the goal to
processor architecture that is able to execute exactly exploit the parallelism of the GPU. CUDA programs are
one instruction at a time. explicitly divided into code that runs on the CPU (host)
2. Single Instruction, Multiple Data (SIMD) is a type and code that runs on the GPU (device). The data parallel
of architecture in which many processing units exe- portions of an application are executed on the device
cute the same instruction on different data elements. as kernels (functions). This is possible as all the
3. Multiple Instruction, Single Data (MISD) is a type architectural details (threads, warps, multiprocessor,
of architecture where independent processors etc.) are hidden to the end user. The notions available
G 818 General-Purpose Computation, Graphics Processing Units

General-Purpose Computation, Graphics Processing Units, Fig. 1 Example of a CUDA program

for the user in CUDA are blocks, grids, and threads, to A program can be organized into the following
ease the decomposition of the problem domain. parts: (a) set up input data on the CPU; (b) transfer
A CUDA program is organized into a host program, the data to the GPU by cudaMemcpy(dev_a, &a,
consisting of one or more sequential threads running size, cudaMemcpyHostToDevice), a function that
on a host CPU, and one or more parallel kernels suit- allows to transfer input data from CPU to GPU by
able for execution on a parallel computing GPU. In using as last parameter the cudaMemcpyHost-
Fig. 1 some basic features of parallel programming ToDevice keyword; (c) run the kernel on the GPU
with CUDA are shown where it is possible to note by the __global__ modifier indicating that the pro-
the affinity with C language. cedure is a kernel entry point, but the execution on
Genetic Algorithms 819 G
the GPU is done by the extended function call i.e.,
add < <<4, 2>> > (dev_a, dev_b, dev_c) Genetic Algorithms
launches the kernel add() in parallel across 4 blocks
of 2 threads each; (d) finally, transfer the result Feng-Sheng Wang and Li-Hsunan Chen
back to the CPU by the cudaMemcpy(dev_a, Department of Chemical Engineering, National Chung
&a, size, cudaMemcpyDeviceToHost) defining Cheng University, Chiayi, Taiwan
the direction from GPU to CPU by using the
cudaMemcpyDeviceToHost keyword.
Synonyms
Cross-References
Evolutionary algorithm
▶ Multicore Computing
G
Definition
References
Genetic algorithms (GA) are adaptive heuristic
Dematté L, Prandi D (2010) GPU computing for systems biol- search algorithms based on the conjecture of natural
ogy. Brief Bioinform 3:323–333
Nickolls J, Dally WJ (2010) The GPU computing era. IEEE
selection and genetics (Zhilinskas and Žilinskas
2:56–69 2008). The basic concept of genetic algorithms is
designed to simulate processes in a natural system
necessary for evolution, specifically those that fol-
low the survival of the fittest inspired by Darwin’s
principle. As such they represent an intelligent
General-Purpose GPU exploitation of a random search within a defined
search space to solve a problem. Algorithm is
▶ GPU Computing started with a population of strings (represented
by chromosomes or the genotype of the genome),
which encodes candidate solutions (called individ-
uals or phenotypes) to an optimization problem, and
GENESIS: General Neural Simulation evolves toward better solutions. The typical steps
System are:
1. Choose an initial population of candidate solutions.
▶ Database of Quantitative Cellular Signaling 2. Calculate the fitness, how well the solution is, of
(DOQCS) each individual.
3. Perform crossover from the population.
The operation is to randomly choose some pair of
the individuals as parents and exchange so parts
Genes-to-Systems Breast Cancer from the parents to generate new individuals.
Database 4. Mutation is to randomly change some individuals to
create other new individuals.
▶ Data Integration, Breast Cancer Database 5. Evaluate the fitness of the offspring.
6. Select the survive individuals.
7. Proceed from 3 if the termination criteria have not
been reached.
Genetic Algorithm Example. Genetic algorithms have been applied in
the fields of bioinformatics, computational biology,
▶ Evolutionary Algorithm, Transcription Regulatory and systems biology (Larranaga et al. 2006; Handl
Network Construction et al. 2007).
G 820 Genetic Association

References have been carried based on exhaustive studies on large


cohorts of patients including pedigree analysis, genetic
Handl J, Kell DB, Knowles J (2007) Multiobjective optimization mapping, and correlation of disease symptoms. Anal-
in bioinformatics and computational biology. IEEE/ACM
ysis of these genes and their functions reveal that
Trans Comput Biol Bioinform 4(2):279–292
Larranaga P, Calvo B, Santana R, Bielza C, Galdiano J, Inza I, mutations leading to disruption of their function
Lozano JA, Armananzas R, Santafe G, Perez A, Robles cause one or more of the following cellular effects:
V (2006) Machine learning in bioinformatics. Brief (1) increased oxidative stress, (2) mitochondrial dam-
Bioinform 7(1):86–112
age, (3) disruption of protein degradation machinery,
Zhilinskas A, Žilinskas A (2008) Stochastic global optimization.
Springer, New York, pp 111–145 (4) neurotoxic protein aggregation, and (5) abnormal
DA metabolism. Consequently, mutations in two
genes might have synergistic effect in dopaminergic
neurons. Based on these familial mutations, many cel-
lular and animal models that mimic PD pathology have
Genetic Association been developed to understand the molecular mecha-
nisms and to test potential drugs. Interestingly, the
▶ Gene Association and Linkage Analysis molecular effects of PD mutations in dopaminergic
neurons are similar to the effects following exposure
to environmental toxins. Therefore, PD pathology is
an intricate interplay between genetic predisposition
Genetic Control and environmental factors that could be summa-
rized as follows: “genetic factors load the gun,
▶ Regulation environmental factors pull the trigger” (Abeliovich
and Beal 2006).
A list of all the reported genes related to PD pathol-
ogy is shown in Table 1. Among the listed genes, a-syn
Genetic Epidemiology and leucine-rich repeat kinase (LRRK2) exhibit
autosomal dominant pattern and are associated with
▶ Gene Association and Linkage Analysis intracellular protein aggregation. Whereas, Parkin,
DJ-1, and Pten-induced kinase1 (PINK1) exhibit auto-
somal recessive inheritance and are connected with
mitochondrial dysfunction and oxidative stress path-
Genetic Factor in Parkinson’s Disease ways. These gene products have also been shown to
interact with each other in different models thereby
Rajeswara Babu Mythri1, Shireen Vali2 and modulating each other’s function.
M. M. Srinivas Bharath1 These gene products have either cytoplasmic or
1
Department of Neurochemistry, National Institute of mitochondrial localization. However, they differ in
Mental Health and Neurosciences (NIMHANS), their degradation pathways which could be either
Bangalore, Karnataka, India via proteasome or autophagy or lysosome. Some pro-
2
Cell Works Group Inc., Bangalore, India teins possess distinct features and modulate unique
functions. For example, a-syn has a role in synaptic
function, chaperone activity, microtubule organization
Definition and DA metabolism. DJ-1 codes for a protein with
chaperone and protease activities with a role in tran-
It has been documented that only 5–10% of the con- scriptional regulation. PINK 1 has a protective func-
firmed cases of PD involve genetic factors. These tion against cellular autophagy. Interestingly, toxic
genetic factors include duplication or point mutations mutations in certain proteins might impinge on
in specific genes. Till date, many genes with both a variety of cellular functions. For example, a-syn
autosomal recessive and dominant inheritances linked aggregation elicits proteasomal inhibition, mitochon-
to familial PD have been discovered. Such discoveries drial damage, oxidative stress and neurotoxicity.
Genetic Marker 821 G
Genetic Factor in Parkinson’s Disease, Table 1 List of genes with known familial mutations in PD
Sl. No. Gene Park Locus Function
1. a-syn Park1 and 4 4q21–22 Presynaptic, chaperone
2. Parkin Park 2 6q25–27 E3 ubiquitin ligase
3. UCH-L1 (Ubiquitin carboxy-terminal hydrolase L1) Park 5 4p14 Ubiquitin recycling enzyme 10
4. PINK-1 Park 6 1p25–36 Mitochondrial
5. DJ-1 Park 7 1p36 Oxidative stress response
6. LRRK2 Park 8 12p11.2q13.1 Kinase

Cross-References variant, microsatellite markers involve various number


of repeats of short DNA sequences (2–6 nucleotides),
▶ Disease System, Parkinson’s Disease insertion–deletion variant and inversion variants may
cover a range of 10 bases, and copy number variants G
occur at between multiple of 103 bases (kb) and mul-
References tiple of 106 bases (Mb).
The names of genetic markers are yet to
Abeliovich A, Beal MF (2006) Parkinsonism genes: culprits and be standardized. However, copy number variants,
clues. J Neurochem 99:1062–1072
insertion–deletion variants, inversion variants, and
other genomic scale variants are called structural
variants.
The nature of detection of genetic markers evolves
Genetic Marker with the development of biotechnology. An early type
of genetic marker during 1980s, restriction fragment
Wentian Li length polymorphism (RFLP), is less often used now as
The Robert S. Boas Center for Genomics and Human there are more efficient technologies to detect other
Genetics, Feinstein Institute for Medical Research, types of genetic markers.
Manhasset, NY, USA Genetic markers are essential for genetic studies
(mapping genotype–phenotype relationships), disease
gene mapping, and the resulting medical applications
Synonyms (Strachan and Read 2010). It has also been increas-
ingly used in population genetics and evolutionary
Genetic variation; Polymorphism biology. As genetic markers become surrogates of
genes or gene products, they can also be used in studies
of gene network and system biology.
Definition

Genetic markers are detectable variations of DNA Characteristics


sequence with known chromosome locations.
A genetic marker may or may not have a biological Single Nucleotide Polymorphism
function. Each possible state among the various num- Single nucleotide polymorphism (SNP) is a prototypic
ber of genetic variants is an allele. The chromosome example of genetic marker (Fig.1). Due to the higher
location of a genetic marker is referred to as a locus. mutation rates between nucleotide adenine (A) and
A polymorphism is a genetic variant that appears in at guanine (G), A ↔ G, and between thymine (T) and
least 1% of the population. cytosine (C), T ↔ C, at a given base position on
Genetic variations can occur at many different a chromosome, there are typically only two types of
DNA length scales, such as single nucleotide base, nucleotide (e.g., A or G) present. These are the two
oligonucleotides, and long segment of bases. Single alleles of the SNP. For diploid species including
nucleotide polymorphism (SNP) is a single base human, a pair of chromosomes could exhibit at most
G 822 Genetic Marker

ATTGGCCTTAACCCCCGATTATCAGGAT
single nucleotide polymorphism
ATTGGCCTTAACCTCCGATTATCAGGAT

ATTGGCCTTAACCCGATCCGATTATCAGGAT
insertion−deletion variant
ATTGGCCTTAACCC −−− CCGATTATCAGGAT

ATTGGCCTTAACCCCCGATTATCAGGAT
inversion variant
ATTGGCCTTCGGGGGTTATTATCAGGAT

ATTGGCCTTAACACACACACAGGAT
microsatellite
ATTGGCCTTAACACACA −−−− GGAT

ATTGGCCTTA ... GGCCTTA ... ACCCCCGATA


copy number variant
ATTGGCCTTA ... −−−−−−−−−− ACCCCCGATA

Genetic Marker, Fig. 1 Examples of the main DNA variants whereas microsatellite markers have many possible states
or genetic markers (modified from (Frazer et al. 2009)). Each depending on the number of simple repeats. Copy number var-
DNA sequence segment represents one possible state (allele) iants (CNVs) involve DNA segments of much longer lengths
Single nucleotide polymorphism (SNP), insertion–deletion var- (represented by dots in the graph), and it is common to have only
iant, and inversion variant typically have only two alleles, two alleles

four ordered genotypes (e.g., A|A, A|G, G|A, G|G). method for microsatellite markers is to first identify
When the parental origin of the pair of chromosomes the location of the repeat sequence by its flanking
is ignored, there are at most three genotypes for a SNP sequence, then amplify the amount of repeats-
(e.g., A/A, A/G, G/G). containing DNA segment by polymerase chain reac-
Genotyping is an experimental determination of tion (PCR), and finally the repeat length is deter-
the genotype of a genetic marker. The genotyping mined by a gel electrophoresis. Microsatellite
technology evolves constantly. Many are based markers played an important role in the construc-
on hybridization: single-stranded probe set binds the tion of genetic map of human genome during
single-stranded DNA segment containing a specified 1980s–1990s. The roughly 30,000 known microsat-
SNP, and the binding strength differs between the two ellite markers in human genome cover on average
alleles. The chip technology allows such hybridiza- 100 kb per marker.
tion-based detection to be carried out for hundreds of
thousands of probes, making it possible to genotype Copy Number Variants
a large fraction of SNPs in human genome in one run. Copy number variants are also variable number of
SNP has been a popular choice of genetic marker repeats markers with a much longer repeating
since mid-1990s. The total number of SNPs in human sequence (1 kb or larger) (Feuk et al. 2006). The
genome that differentiate unrelated individuals is in normal number of copies of any DNA segment in
the range of several millions (Frazer et al. 2009). The a diploid genome is 2, and a deletion (duplication)
exact number of SNPs depends on a particular ethnic leads to a reduced (increased) copy number of 1 (3).
population (e.g., Caucasian, Asian, African), and Several mechanisms for the generation of copy number
depends on whether SNPs with very low minor- variants have been proposed, such as nonhomologous
allele-frequency are counted as polymorphism. An recombination.
evenly distributed set of 10 million SNPs in human More than a thousand CNVs have been observed in
genome corresponds to one SNP per 300 bases on human genome. The exact number of CNVs should
average. again depend on a particular ethnic population, and
depend on whether rare or newly created de novo
Microsatellite Markers CNVs are counted. Even though the number of CNVs
Microsatellite markers are one type of variable number might be low, more than 90% of base difference
of tandem repeat markers of which the repeat sequence between two individual’s genomes are due to CNVs
is simple (1–6 bases) (Fig.1). The common genotyping and other structural variants (Frazer et al. 2009).
Genetic Marker 823 G
Genetic Marker, Table 1 An illustration of a genetic association test. A total of 1,000 normal and 1,000 disease affected samples
are genotyped, and the sample counts for three genotypes (A/A, A/G, G/G) are listed in the table. The row–column correlation in the
2-by-3 genotype table can be tested by Pearson’s chi-square test: w2 ¼ 72.516 leading to p-value ¼ 2.22  1016. The Cochran–
Armitage trend test statistic of the genotype table is CAT ¼ 66.366, corresponding to p-value ¼ 3.33  1016. For the 2-by-2 allele
count table, w2 ¼ 69.829 leads to p-value ¼ 1.11  1016. Two more quantities can be obtained from the 2-by-2 allele count table:
The minor allele frequency difference between the diseased and the normal group is 0.095, and odds-ratio is 2.13
Genotype count Allele count
Phenotype A/A A/G G/G total A G total
Normal n0,AA ¼ 20 n0,AG ¼ 170 n0,GG ¼ 810 n0 ¼1,000 n0,A ¼ 210 n0,G ¼ 1,790 2n0 ¼ 2,000
2% 17% 81% 100% 10.5% 89.5% 100%
Diseased n1,AA ¼ 40 n1,AG ¼ 320 n1,GG ¼ 640 n1 ¼ 1,000 n1,A ¼ 400 n1,G ¼ 1,600 2n0 ¼ 2,000
4% 32% 64% 100% 20% 80% 100%

Association Between Phenotypes and Genetic near the same chromosome location. The term “inter- G
Markers action” used in genetics, statistics, epidemiology, and
The main application of genetic markers is as biochemistry has conflicting meanings, and has been
a surrogate of genes for studying genotype–phenotype a source of confusion. We may loosely distinguish two
relationship based on statistical correlation. One such types of meanings of interaction.
study is disease gene mapping, where the phenotype is Bateson’s interaction (or epistasis) (Cordell 2002;
the disease status. Investigation of genotype–pheno- Phillips 2008), after William Bateson (1861–1926),
type relationship can be carried out on family data refers to the situation when the nature of phenotype–
(linkage analysis (Ott 1999)), on a randomly selected genotype relationship for gene 1 is modified by
dataset in a population (association analysis (Balding a change of allele in gene 2. This is often discussed
2006; Li 2008)), or on an exhaustive collection of when gene 1 is the major effect gene, and gene 2 is
people in an isolated population. The latter contains a modifier gene.
elements from both linkage and association analyses. Fisher’s interaction (or statistical interaction), after
For a genetic marker to be surrogate of a gene, an Ronald Aylmer Fisher (1890–1962), refers to any devi-
allele of the marker has to be in linkage disequilibrium ation from the linear phenotype–genotype relation:
(LD) with that of a gene in proximity. LD is simply y ¼ c1 g1 + c2 g2 where y is a quantitative phenotype,
a nonrandom statistical association between alleles at g1 and g2 are numerical coding of two genes (e.g.,
two loci (Slatkin 2008). The presence or absence of LD number of minor alleles), and c1, c2 are regression
between neighboring markers partitions the human coefficients. If another model with a nonlinear term,
genome into discrete LD blocks, and all markers in e.g., y ¼ c1 g1 + c2 g2 + c12 g1 g2 fits the data better
a LD block can potentially be surrogates of genes than the linear relationship (either in the sense of
within the same block. model selection or in the sense of significant c12 coef-
Genetic association carried out on the whole genome ficient), a statistical interaction between genes 1 and 2
level with a large number of genetic markers is called is claimed.
a genome-wide association study (GWAS) (▶ Genome- The term “interaction” in biochemistry, biological
wide Association Study). GWAS has generated a long pathway, and system biology has a more intuitive
list of genetic markers associated with various human meaning of a physical contact by chemical bonds, to
diseases. Statistical analysis plays an important role in be in the same pathway, or being linked in a gene
marker-based genetic association studies (Balding 2006; network. The purpose to study Fisher’s or Bateson’s
Li 2008). Table 1 shows how a typical association test interaction in genetic marker data is to discover the
can be carried out for a single SNP. true physical interaction among gene products.
The concept of gene–gene interaction can be
Gene–Gene and Gene–Environment Interaction extended by gene–environment interaction (Thomas
Genetic markers can be used in studying gene–gene 2010). The Bateson’s interaction is particularly easy
and gene–environment interaction, based on the to be translated in this context: where certain environ-
assumption that these can be surrogates of genes at or ment factor plays the role of modifier gene.
G 824 Genetic Material

Cross-References Definition

▶ Genome-wide Association Study The stable coexistence of two or more variations in the
genotype in a population at a high frequency is called
genetic polymorphism. Based on these variations
References
populations can be divided into subgroups and
Balding DJ (2006) A tutorial on statistical methods for popula- variability is seen in the gene pool. These polymor-
tion association studies. Nat Rev Genet 7:781–791 phisms lead to multiple alleles for a particular locus,
Cordell HJ (2002) Epistasis: what it means, what it doesn’t and in disease-associated genes, selective advantage
mean, and statistical methods to detect it in humans. Hum makes one allele protective and the another susceptible
Mol Genet 11:2463–2468
Feuk L, Carson AR, Schere SW (2006) Structural variation in the to a disease. In enzyme coding genes, genetic poly-
human genome. Nat Rev Genet 7:85–97 morphism, may affect the biological activity or affinity
Frazer KA, Murray SS, Schork NJ, Topol EJ (2009) Human of the enzyme to the substrate. The fact that they
genetic variation and its contribution to complex traits. Nat coexist stably in the population indicates that most
Rev Genet 10:241–251
Li W (2008) Three lectures on case-control genetic association often they do not have drastic effects on the phenotype.
analysis. Brief Bioinfom 9:1–13
Ott J (1999) Analysis of human genetic linkage, 3rd edn. The
Johns Hopkins University Press, Baltimore Cross-References
Phillips PC (2008) Epistasis - the essential role of gene interac-
tions in the structure and evolution of genetic systems. Nat
Rev Genet 9:855–867 ▶ Epigenetics, Drug Discovery
Slatkin M (2008) Linkage disequilibrium – understanding the
evolutionary past and mapping the medical future. Nat Rev
Genet 9:477–485
Strachan T, Read A (2010) Human molecular genetics, 4th edn.
Garland Science, New York, Part 4 Genetic Redundancy
Thomas D (2010) Gene-environment-wide association studies:
emerging approaches. Nat Rev Genet 11:259–272 Diana Ascencio and Alexander DeLuna
Laboratorio Nacional de Genómica para la
Biodiversidad, CINVESTAV-IPN, Irapuato,
Genetic Material Guanajuato, Mexico

▶ Genome
Synonyms

Gene redundancy
Genetic Network

▶ Gene Regulatory Networks


Definition

Genetic redundancy is when two or more genes perform


Genetic Polymorphisms the same biochemical function. Genetic redundancy is
usually defined at the phenotypic level, that is, to
Vani Brahmachari and Shruti Jain describe situations in which mutations in one of these
Dr. B. R. Ambedkar Center for Biomedical Research, genes has little or no effect on the organism’s fitness.
University of Delhi, Delhi, India Duplicate (paralogous) genes are the most important
source of genetic redundancy. Since duplicated genes
may diverge in function, genetic redundancy does not
Synonyms necessarily provide functional redundancy or overlap.
The prevalence of genetic redundancy represents
Genetic Variations a paradox, since redundant functions should be
Genetic Redundancy 825 G
evolutionarily unstable. Yet, particular scenarios explain Moreover, duplicate genes show on average
why genetic – and even functional – redundancy is a lower number of genetic interactions with other
frequently generated and maintained in genetic systems. genes, which also reflects functional compensation
and overlap between the duplicates (VanderSluis
et al. 2010). Taken together, these observations
Characteristics suggest that duplicate genes tend to maintain
a substantial functional overlap.
Prevalence of Genetic Redundancy in Biological
Systems The Problem of Genetic Redundancy
Genetic redundancy is a salient feature of living It is challenging to account for genes with redundant
organisms. Genomes in all three domains of life – bac- functions that have been conserved for long evolution-
teria, archaebacteria, and eukaryotes – have a large pro- ary times. Natural selection, in particular purifying
portion of duplicated genes, ranging from over 15% and selection, acts to conserve genes that impact fitness of
up to 65% of their genes (Zhang 2003). Duplicate genes an organism, thus a fully redundant function should in G
are constantly originated through small-scale duplication principle not be under any kind of selective pressure.
or by whole-genome duplication. The particular mecha- This rationale makes genetic redundancy evolution-
nism of duplication and biological role are relevant for arily unstable, which apparently contradicts the
the process of fixation and conservation of duplicate prevalence of genetic redundancy across biological
genes: genes that are part of molecular complexes or species (Nowak et al. 1997).
cellular networks are more likely retained if duplicated Nowak et al. (1997) modeled several scenarios that
by whole-genome duplication. Likewise, genes with cer- could explain why genetic redundancy is common
tain functions such as metabolic enzymes, transporters, and even evolutionarily stable. Let us consider
and transcription factors are often preserved in duplicate a population of organisms that have two loci A and B
(Conant and Wolfe 2008). that perform one essential function F. If both genes
Duplicate genes have the potential for yielding novel perform the function with similar efficacy and mutation
functions; however, not all gene duplications involve rates operate equally on both genes, genetic redundancy
functional innovation. In fact, most gene duplicates do is evolutionarily unstable. In a different scenario, one of
not confer novel functions and are probably preserved in the genes, B, performs the function with slightly less
the genome by passive mechanisms such as partitioning efficacy than its paralog A; redundancy is maintained
of the ancestral function (subfunctionalization) or selec- because B has lower mutation rate than A. Nowak et al.
tion for dosage (Conant and Wolfe 2008). Phylogenetic (1997) also considered situations where genes per-
analyses of genes in model eukaryotes have shown that form more than one function, the overlap between
the functional overlap between duplicate genes is not both genes affecting only one function, as well as
only a transient state after the duplication, but is often instances where developmental errors provide
a stable state across long evolutionary scales (Vavouri a selective pressure to maintain genetic redundancy.
et al. 2008). In addition, both genes performing exactly the same
The occurrence of synergistic (negative) genetic function needed in high amounts would result in
interactions is used as a direct indication of functional deletions of any of the two genes causing fitness
overlap between duplicate genes. For any two genes, defects; hence both genes would be under selective
a synergistic interaction occurs when the phenotype of pressure (Conant and Wolfe 2008).
the double-deletion mutant is quantitatively more
severe than the expected combined effects of the single Genetic Robustness, Genetic Redundancy, and
deletions. In the particular case of duplicate genes, Functional Innovations
a synergistic interaction between the duplicates Living organisms are remarkably robust to genetic per-
indicates that they can compensate for each other’s turbations. Genome-wide surveys in model organisms
loss. Up to 55% of duplicate gene pairs in yeast have shown that complete gene inactivation typically has
metabolism interact synergistically with each other, little or no phenotypic effect. There are two main
which indicate that they carry out important and mechanisms that are responsible for such functional
overlapping functions (DeLuna et al. 2008). compensation to mutations (genetic robustness). Genetic
G 826 Genetic Redundancy

redundancy is one of these mechanisms, whereby dele-


tion of one gene has little effect due to the presence of its
duplicate and functionally overlapping paralogous gene.
The second mechanism of genetic robustness relies on
the distributed nature of genetic networks; interactions
between genes with unrelated functions also provide
functional compensation (Wagner 2008).
Early after gene duplication the sister copies may
experience a relaxed selection pressure; sequences of
duplicate genes tolerate more nucleotide changes than
their single-copy ancestral genes. Hence, genetic
redundancy provides not only functional backup and
robustness, but also the chance to accumulate sequence
changes that may diversify their functions. This inno-
vation mechanism has evolutionary potential. In fact,
Genetic Redundancy, Fig. 1 The utilization of genetic redun-
there are several examples of the important role of
dancy in a responsive backup circuit. Two duplicate genes, A and
gene duplication in vertebrate radiation, flowering B, perform the same function, but they are regulated in
plant evolution, and heart development (Wagner a different ways. Levels of protein B are negatively regulated
2008). There is thus an important link between by its duplicate A that is positively regulated by an external
signal. The response is the sum of the two products. Even though
genetic redundancy, mutational robustness, and
the signal upregulates A, this will prompt downregulation of B,
functional innovations, as their constant interplay reducing fluctuations in overall protein and activity levels
allows organisms to survive environmental pertur- (Adapted from Kafri et al. (2009))
bations and yet have the ability to generate pheno-
typic diversity.

Dosage Effects and Increase of Metabolic Flux


Gene redundancy may increase gene expression advantage during adaptation to glucose-rich environ-
level. The process of gene duplication has an ments; the appearance of angiosperms on earth opened
immediate effect on gene dosage and this feature an ecological niche for microorganisms with the abil-
can be beneficial for the organism, which in turn ity to consume glucose and produce ethanol rapidly
drives the conservation of functionally overlapping through fermentation (Conant and Wolfe 2008).
pairs of genes. Redundant genes encoding ribo- Genetic redundancy may thus provide a beneficial
somal proteins and some metabolic isoenzymes in dosage-mediated effect underlying adaptation of an
yeast have likely been retained in duplicate due to organism to a novel environment.
selection for changes in gene dosage (Conant and
Wolfe 2008). Responsive Backup Circuits
In yeast, about 25% of duplicate genes originated Genetic redundancy could provide a mechanism to
from the whole-genome duplication are metabolic reduce noise caused by the direct interaction between
enzymes. Analyses of metabolic networks and a signal and the response. From a series of studies in
flux-models indicate that genetic redundancy is not vertebrate developmental pathways, it has been noticed
more frequently associated with reactions with higher that redundant duplicates are typically cross-regulated by
flux and not to indispensable metabolic reactions, negative feedback, allowing one of the redundant pro-
suggesting that the reason for their retention is to teins, the responder, to respond to an alteration in the
increase metabolic flux rather than providing func- expression of its partner, the inducer Fig. 1. This mutual
tional compensation (Papp et al. 2004). Most genes in repression can be used to reduce stochastic fluctuations
the glycolytic and fermentation pathways of Saccha- in protein expression and to reduce the effects of noise in
romyces cerevisiae are present in more than one copy. environmental or developmental signals (Kafri et al.
Following whole-genome duplication, an increase in 2009). For example, MyoD and Myf5 are two function-
the products of these genes gave yeast a growth ally overlapping regulators of skeletal muscle
Genetic Regulation Mechanisms 827 G
development, which are expressed in separate cell line-
ages. Mutations in MyoD induce increased proliferation Genetic Regulation Mechanisms
of the Myf5-positive cell lineage thus increasing expres-
sion of the Myf5 isoform. In this case, extracellular Pramod R. Somvanshi and Kareenhalli V. Venkatesh
signals regulate a “responsive circuitry” comprising Department of Chemical Engineering, Indian Institute
MyoD and Myf5 that effectively buffers against fluctu- of Technology Bombay, Powai, Mumbai,
ations in MyoD levels (Kafri et al. 2009). Maharashtra, India
In yeast, a modest fraction of proteins show
significant upregulation upon deletion of their dupli-
cate genes. Metabolic enzymes are over-represented Definition
among such paralog-responsive proteins that match
almost exclusively duplicate pairs whose overlapping Genetic information is carried in the form of genetic
function is required for growth. Moreover, media code in the DNA of an organism and corresponding
conditions that add or remove requirements for genes need to be expressed to yield a specific response G
the function of a duplicate gene pair specifically or phenotype. Gene expression is a multistep process
eliminate or create responsiveness of duplicate genes which is precisely coordinated and controlled by
(DeLuna et al. 2010). These observations suggest that various complex regulatory mechanisms for an opti-
responsiveness between duplicate genes could indeed mal performance and is termed as genetic regulation
provide an important mechanism for compensation of (Perdew et al. 2006).
genetic, environmental, or stochastic perturbations in
protein abundance.
Characteristics

References The gene expression is a process by which the genetic


information is transferred from DNA to RNA and RNA
Conant GC, Wolfe KH (2008) Turning a hobby into a job: to proteins via ▶ transcription and translation, respec-
how duplicated genes find new functions. Nat Rev Genet
tively. The expression is triggered by the preceding
9:938–950
DeLuna A, Vetsigian K, Shoresh N, Hegreness M, signaling cascades which activate the ▶ transcription
Colón-González M, Chao S, Kishony R (2008) Exposing factors for the initiation of the transcription. RNA poly-
the fitness contribution of duplicated genes. Nat Genet merases, along with other protein complexes, facilitate
40:676–681
transcription process. The gene products of the transcrip-
DeLuna A, Springer M, Kirschner MW, Kishony R (2010)
Need-based up-regulation of protein levels in response tion are RNA molecules, which after further processing
to deletion of their duplicate genes. PLoS Biol 8: and modification (capping, splicing, termination, cleav-
e1000347 age, and polyadenylation) are transported from nucleus
Kafri R, Springer M, Pilpel Y (2009) Genetic redundancy: new
to cytoplasm for translation into functional proteins.
tricks for old genes. Cell 136:389–392
Nowak MA, Boerlijst MC, Cooke J, Smith JM (1997) Evolution of Prokaryotic and eukaryotic gene expression have sev-
genetic redundancy. Nature 388:167–171 eral regulatory mechanisms in common; however, they
Papp B, Pal C, Hurst LD (2004) Metabolic network analysis of differ in their cellular structures, wherein prokaryotic
the causes and evolution of enzyme dispensability in yeast.
Nature 429:661–664
cells lack the nuclear envelope and the chromatin struc-
VanderSluis B, Bellay J, Musso G, Costanzo M, Papp B, tures. Hence, the events of nucleocytoplasmic transport
Vizeacoumar FJ, Baryshnikova A, Andrews B, Boone C, and chromatin remodeling are very specific to eukary-
Myers CL (2010) Genetic interactions reveal the evolu- otic systems In prokaryotes, genes are clustered
tionary trajectories of duplicate genes. Mol Syst Biol
together to form an operon which encodes the proteins
6:429
Vavouri T, Semple J, Lehner B (2008) Widespread conservation required for coordinated functioning and follow certain
of genetic redundancy during a billion years of eukaryotic order of expression. Further, the multiple operons coor-
evolution. Trends Genet 24:485–488 dinate together to form a regulon, which is typically
Wagner A (2008) Gene duplications, robustness and evolution-
regulated through common regulatory mechanisms.
ary innovations. Bioessays 30:367–373
Zhang J (2003) Evolution by gene duplication: an update. Trends Gene expression is regulated at various stages
Genet 18:292–298 from transcription of RNA to the post-translational
G 828 Genetic Regulation Mechanisms

Genetic Regulation Mechanisms, Fig. 1 Gene regulatory mechanisms

modification of the proteins. Usually, the regulation is of gene expression is initiated with the activation and
achieved by altering the rates of various steps men- binding of ▶ transcription factors (TF) to the promoter
tioned above. The various mechanisms that elicit such regions. TFs also help in activating and recruiting RNA
kind of regulations are briefed below (see Fig. 1) polymerases (enzymes involved in RNA synthesis) to
(Perdew et al. 2006; White and Sharrocks 2010). the transcriptional initiation site. A single TF can act as
an activator or repressor for two different genes or mul-
Protein-DNA Interactions tiple genes at a time. There are certain enhancer and
Protein-DNA interactions are the most fundamental silencer regions on the DNA, which when bound by the
interactions in regulation of the gene expression respective transcription factors drastically increase
where various regulatory proteins bind to the DNA and decrease the rate of transcription, respectively.
(Fig. 1a) to accomplish transcription. A gene contains Moreover, binding of certain activator or repressor mol-
a core promoter region and contains sequence called as ecules to the promoter regions increases or decreases the
TATA box where the proteins complexes including dif- rate of transcription, respectively. Similarly in prokary-
ferent transcription factors (TFIID,-B,-E,-F,-H), TBPs otes, certain operons are catabolite-regulated operons
(TATA-binding proteins), and TAFs (TBP-associated (e.g., lac-operon) where catabolite acts as an activator
factors) can bind to facilitate transcription. The process for energy metabolism whereas for attenuated operons
Genetic Regulation Mechanisms 829 G
(e.g., trp-operon) the proteins act as a repressor for its modifications in TFs and proteins are ubiquitination,
own synthesis (Orphanides and Reinberg 2002; Levine phosphorylation, acetylation on lysine residues,
and Tjian 2003; Kornberg 1999). methylation on arginine and lysine residues, glycosyl-
ation, fatty acylation, disulfide bond formation, and
Protein-Protein Interactions proteolysis (Orphanides and Reinberg 2002).
Various protein molecules and protein complexes are
known to interact with the TFs and RNA polymerases Multiple Binding Sites and Cooperativity
to modulate the regulation of the gene expression Certain promoter regions possess multiple binding
(Fig. 1b). The rate of gene expression can be regulated sites for the TFs which can modulate the genetic
by modulating either the probability of the TF binding or responses (Fig. 1c). It is observed that higher promoter
the strength (affinity) of the TF binding to the promoter occupancy and higher strength of TF binding leads to
region. It is found that the increase in TF binding strength increased rate of transcription and vice versa. The avail-
is achieved by the formation of dimeric or multimeric ability of the multiple binding sites provides a room for
protein-DNA complex. Multimeric protein-DNA inter- altering the rates of transcription based upon the combi- G
actions have been shown to yield steep sensitive natorial effect of the upcoming signals. The bound
responses through the effect of stoichiometry as com- subunit has the cooperative effect on the binding of the
pared to a binding of single TF. Several proteins act as next subunit by increasing or decreasing its affinity
specificity factors, inducers, activators, repressors, core- toward the binding region exhibiting positive or negative
pressors, coactivators, and mediator complexes to regu- cooperativity, respectively. The stoichiometry of inter-
late gene expression. Specificity factors are the proteins action is drastically affected and the equilibrium is so
that alter (increase or decrease) the specificity of the shifted that a steep rise in the rate of expression is
RNA polymerases to the promoter regions. Inducers obtained with increasing number of binding sites. Occu-
are the molecules that interact with the activators and pancy of multiple binding sites has shown to yield
repressors to induce the transcription through positive ultrasensitive and subsensitive responses by the binding
and negative controls, respectively. Activators enhance of activators and repressors, respectively. Such responses
the rate of transcription by increasing the affinity of the also facilitate initial delays and are operational only by
RNA polymerase to the respective promoter regions. certain activation or repression thresholds (Chin et al.
Repressors intervene the association of the RNA poly- 1999; Sacketta and Saroff 1996).
merases and TF binding to the promoter regions and
decrease the rate of transcription. The mediator protein Auto-Regulation and Feedback Mechanisms
complex is an important interface that compounds the Auto-regulation is a phenomenon in which the rate of
activity of RNA polymerases with the activator protein gene expression is directly or indirectly regulated by
complex to facilitate the transcriptional regulation. its own gene product with certain feedback mecha-
Moreover, there are various interactions of coactivators nisms (Fig. 1d). The process by which a gene product
and corepressors with activators and repressors leading upregulates its own production by increasing the rate of
to positive and negative control mechanisms (Perdew its gene expression as a result of positive feedback is
et al. 2006; Orphanides and Reinberg 2002; Levine and called as positive auto-regulation (PAR). PAR exhibits
Tjian 2003; Kornberg 1999). slower response time due to the delay in activation and is
sensitive to the inherent noise leading to cell-to-cell
Protein Modifications and Stability variability in the gene product concentrations. It helps
The cell has to respond to changing environmental in signal amplification and pattern generation. The pro-
stimuli and correspondingly regulates the net protein cess by which the gene product downregulates its own
abundance at a given instance. This is achieved by production by the decreasing rate of its gene expression
controlling the rates of synthesis and degradation of as a result of negative feedback is called as negative auto-
the proteins and altering the stability of the mRNAs regulation (NAR). NAR fastens the response time and
and proteins as the requirement. Various cofactors is resistant to the inherent noise leading to reduced
assign the functional groups to the amino acid residues cell-to-cell variability of gene product concentrations.
and help in catalyzing the enzymatic reactions Various combinations of positive and negative feedback
for protein stabilization and degradation. Several loops have shown to elicit memory, bistability,
G 830 Genetic Variation

oscillations, and robustness in the gene expression pro- which leads to acetylation and methylation of N-terminal
files. Moreover, such feedback mechanisms allow the residues of histone tails. On the contrary, the transcrip-
cells to filter out transient input signals and aids in tional repressors recruit HDACs (Histone deactylases)
appropriate decision making by responding to the multi- that leads to deacetylation of the histone tails and subse-
ple regulatory pathways (Alon 2007). quent repression of the transcription (Orphanides and
Reinberg 2002; Kornberg 1999).
Nucleocytoplasmic Transport The above discussed mechanisms work in coordina-
In eukaryotes, a potential way of regulating gene tion with different levels to regulate gene expression.
expression is by controlling nucleocytoplasmic trans- Cells have engineered gene regulatory motifs involving
port (Fig.1e) through the nuclear pore complexes and these mechanisms in combination with feed forward and
the selective exchange of RNA (export) and protein feedback loops to elicit a system level regulation. More-
molecules (import) in the two compartments. over, the global regulatory pathways from signaling and
The nuclear and cytoplasmic environment is spatially metabolic networks also coordinate to govern the gene
separated by the nuclear envelope, which facilitates the expression profiles. With advances in research in this
separation of transcriptional and translational machin- area, newer interactions and mechanisms are being
ery. The transport of the RNA and proteins are medi- explored that elicit precise regulation.
ated by the specialized transport motifs called as
nuclear export signals (NES) and nuclear localization
Cross-References
signals (NLS), respectively. The shuttling of the
heterodimeric complex (to and fro) from nucleus to
▶ Transcription
cytoplasm serves as a quality control mechanism by
▶ Transcription Factor
selective transport of the only mature RNAs to the
translational machinery. This transport can also be
regulated by the covalent modifications of the NES References
and NLS molecules. Furthermore, the event of RNA
export is coupled with the transcription and pre-mRNA Alon U (2007) Network motifs: theory and experimental
approaches. Nat Rev Genet 8:450–461
processing by the capping and splicing mechanisms.
Chin JW, Kohler JJ, Schneider TL, Schepartz A (1999) Gene
Many proteins shuttle continuously between cyto- regulation: protein escorts to the transcription ball. Curr Biol
plasm and nuclease based on their requirement and 9:R929–R932
localization at the respective sites (Orphanides and Dimaano C, Ullman KS (2004) Nucleocytoplasmic transport:
integrating mRNA production and turnover with export
Reinberg 2002; Dimaano and Ullman 2004).
through the nuclear pore. Mol Cell Biol 24(8):3069–3076
Kornberg RD (1999) Eukaryotic transcriptional control, multi-
Chromatin Remodeling and Histone Modification layered transcription mechanisms, millenium issue. TCB
In eukaryotes, DNA is not easily accessible to the tran- 9(12):(0962–8924)
Levine M, Tjian R (2003) Transcription regulation and animal
scription interactions as it is supercoiled around the core of
diversity. Nature 424:147–151
octameric histone proteins in the nucleosomes leading to Orphanides G, Reinberg D (2002) A unified theory of gene
a highly compact structure called as chromatin. Various expression. Cell 108:439–451
coregulatory proteins are recruited along with the tran- Perdew GH, Vanden Heuvel JP, Peters JM (2006) Regulation
of gene expression-molecular mechanisms. Human Press,
scriptional factor to facilitate chromatin decompaction
Totowa, Book
and conformational changes that provide access to the Sacketta DL, Saroff HA (1996) The multiple origins of
promoter regions for recruitment of RNAPII and general cooperativity in binding to multi-site lattices. FEBS Lett
transcriptional machinery. This process is coupled with 397:1–6
White RJ, Sharrocks AD (2010) Coordinated control of the gene
the transcription elongation. The chromatin structure
expression machinery. Trends Genet 26(5):214–220
can be temporarily modified by the phosphorylases
from signaling cascades and permanently modified by
methylation of DNA, termed as gene silencing. For the
activaton of the genes, activator proteins recruit Genetic Variation
HATs (Histone acetyltransferse) and HMTs (Histone
methyltransferase) to the promoter regions of the genes, ▶ Genetic Marker
Genome Project 831 G
process consists of starting with raw DNA sequences
Genetic Variations and giving a biological meaning to its content (Stein
2001). The first step in annotating a raw sequence will
▶ Genetic Polymorphisms require the mapping of structural elements in the genome
by comparing the latter against a library of already
known sequences. This especially includes genetic
markers, genetic polymorphisms, protein-coding regions
Genome (▶ Protein Structure Comparison, High-Performance
Computing, ▶ Proteome, ▶ Proteome Analysis Pipeline,
Vani Brahmachari and Shruti Jain ▶ Proteomics), as well as other functional elements such
Dr. B. R. Ambedkar Center for Biomedical Research, as non-translated RNAs (▶ RNA-seq), repetitive ele-
University of Delhi, Delhi, India ments, duplicated genes, and regulatory regions
(▶ Gene Regulation, ▶ Regulation, ▶ Regulation and
Autoregulation, ▶ Regulation Function). The structural G
Synonyms elements are aligned using a specialized search tool,
such as BLAST (http://blast.ncbi.nlm.nih.gov/Blast.cgi)
Genetic material developed at the National Center for Biotechnology
Information Database (Altschul et al. 1990), or
GENSCAN (http://genes.mit.edu/GENSCAN.html) by
Definition Burge and Karlin (1997). The second step in annotation
involves the association of biological processes to the
The genome is defined as the entire genetic makeup of an sequences identified from the first step. These processes
organism. It can be either DNA or RNA (as is in the case include metabolic functions, regulation (▶ Gene
of some viruses). It includes the nuclear as well as Regulation), interactions (▶ Binding Affinity), and gene
organelle DNA. The complexity and the size of the expression (▶ Gene Expression). In 1999, a consortium
genome vary between organisms. The average human created the gene ontology (http://www.geneontology.
genome size is 3.2  109 base pairs and corresponds to org) to standardize the vocabulary used for describing
one copy of each of the 23 chromosomes (haploid set) as genes, gene products, and genomes.
opposed to 4.6  106 base pairs for Escherichia coli.

References
Cross-References
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ
(1990) Basic local alignment search tool. J Mol Biol
▶ Epigenetics
215:403–410
Burge C, Karlin S (1997) Prediction of complete gene structures
in human genomic DNA. J Mol Biol 268:78–94
Stein L (2001) Genome annotation: from sequence to biology.
Nat Rev Genet 2:493–503
Genome Annotation

Akos Dobay
Institute of Evolutionary Biology and Environmental Genome Browsers
Studies (IEU), University of Zurich, Zurich,
Switzerland ▶ Genomic Resources

Definition
Genome Project
Genome annotation is the process of attaching higher-
level information to primary sequences. The whole ▶ NCBI BioProject Genome Resources
G 832 Genome-Scale Metabolic Modeling

information and a huge knowledge about identity and


Genome-Scale Metabolic Modeling specificity of enzymes, first efforts have been undertaken
to reconstruct the metabolic network from genome data.
▶ Constraint-based Modeling Therefore data from several biological layers, e.g.,
genome, transcriptome, proteome, and metabolome,
can be used to infer the specific metabolic network.
The genome contains the information which proteins
Genome-Scale Metabolic Network a cell can in principle produce. Therefore, a metabolic
network can in principle be derived from the genome by
Andrzej M. Kierzek
assembling all enzymes for which a coding region in the
Division of Microbial Sciences, Faculty of Health and
genome has been identified. However, several issues
Medical Sciences, University of Surrey, Guildford,
make this task nontrivial. Firstly, to identify a function
Surrey, UK
of a particular gene relies on finding similarities, or
homologies, to known and previously annotated genes.
This is essentially a statistic process and therefore there is
Definition
no certainty whether new annotations are correct. But
even if this problem could be resolved, the next difficulty
Genome-scale metabolic network is the list of bio-
arises in defining the abstract, computer-readable
chemical reaction formulas implied by the repertoire
description of the metabolic network. Which reactions
of enzymes identified in the genome of the organism
should be included in the model and where should
under investigation.
a “boundary” of metabolism be defined? Clearly, mac-
romolecular assembly, such as protein or RNA synthesis,
cannot be included, but it is by no means trivial to define
Cross-References
a suitable threshold for the complexity of the included
metabolites. For example, small proteins such as ferre-
▶ Mycobacterium Tuberculosis
doxins or thioredoxins are important cofactors and as
such a description of a metabolic network without these
compounds is somewhat incomplete. Finally, a network
Genome-Scale Metabolic Network defined from the genome sequence represents the max-
Inference imal metabolic capabilities of a cell or organism; how-
ever, which enzymes are actually expressed and active
Oliver Ebenh€ oh1 and Stefan Kempa2 depends greatly on the environmental conditions or the
1
Institute for Complex Systems and Mathematical specific tissue into which a cell has been differentiated.
Biology, University of Aberdeen, Kings College, Old The following general strategy allows exploiting
Aberdeen, Aberdeen, UK different types of high-throughput data with heteroge-
2
Integrative Metabolomics and Proteomics, BIMSB neous origin:
(Berlin Institute for Medical Systems Biology), Berlin, The reconstruction process typically begins with
Germany the sequenced and annotated genome. With the help of
biochemical databases such as KEGG, BRENDA,
Reactome, or MetaCyc, the annotated genes are mapped
Definition to enzymes and the catalyzed biochemical reactions.
Depending on the envisaged computational analysis,
The metabolic network is the biochemical backbone for this established list of reactions has to be curated. Most
all chemical reactions in a cell, organ, or organism. importantly, stoichiometric inconsistencies have to be
Metabolism is a basic sign of life and metabolic dysfunc- detected and removed. Reactions in which the numbers
tions may cause or be the result of many diseases. The of atoms in the substrates and products are not balanced
structure of the network and the identity of the proteins are fatal for any further computational analysis because
(here enzymes) determine the metabolic behavior of the models may predict that metabolites are created from
the system. With the availability of genome-scale nothing or completely annihilated. More difficult to
Genome-Wide Association 833 G
detect are thermodynamic inconsistencies which may In multicellular organisms the different cell types
lead to energy-producing cycles which clearly violate fulfill specific metabolic functions and therefore each
fundamental laws of thermodynamics. This curation cell displays a specific expression pattern for metabolic
process leads to a draft metabolic network which can enzymes. Also single cellular organisms adapt their
be used for further computational analyses including molecular repertoire according to external or internal
constraint-based models, such as flux balance stimulation. Thus, not all genome encoded reactions
analysis (FBA). are expressed all the time in all cells. Using further
The second task is typically the identification of miss- information from transcriptome analyses or protein
ing reactions and subsequent completion of the draft expression data, active metabolic subnetworks can
network. The principle strategy is to query the model be inferred. A strategy which is exemplified on
and validate whether it agrees with experimental obser- different human tissues has been described by
vations. The most fundamental observation is that a cell Jerby et al. (2010).
is actually alive and growing, which implies that the A still unresolved challenge is to combine genome-
metabolic network must be able to produce all biomass scale metabolic models with global gene regulatory G
precursors from the available nutrients. Knowledge of models. The expressed subnetwork can be viewed as
the chemical composition of the growth medium and the a result of the genetic up- or downregulation of genes
biomass allows the definition of fluxes which the model encoding the respective enzymes. However, also the
must be able to support. Metabolome data are highly metabolic state of the cell is signaling back to the
useful to refine this analysis. Model consistency transcriptional regulatory system. An approach how
demands that, besides biomass precursors, all other the metabolic and gene regulatory subsystems can be
experimentally observed metabolites must also be pro- simultaneously described by timescale separation has
ducible by the network. After those experimentally been proposed by Baldazzi et al. (2010). However,
observed metabolic functions have been identified this method has so far only been demonstrated for
which the draft network is not able to support, candidate very small systems and its application to large-scale
reactions must be found which, when added to the draft, models seems difficult. To validate these kinds of
provide the network with the missing functionality. Sev- combined models, however they may be realized in
eral methods for this gap-filling have been proposed. detail, it will be necessary to simultaneously mon-
One approach which allows integrating genomic, prote- itor the dynamics of concentrations of metabolites
omic, and metabolomic data is described by Christian (small molecules), the level of transcripts, and the
et al. (2009). Essentially, the draft network is embedded activity of enzymes.
in a reference network derived from a biochemical
database containing enzymatic reactions from a large
number of different organisms. With a simple search References
algorithm, minimal sets of reactions are identified
which complete the draft to consistency. Subsequently, Baldazzi V, Ropers D, Markowicz Y, Kahn D, Geiselmann J,
de Jong H (2010) The carbon assimilation network in
the identified reactions are ranked according to how well Escherichia coli is densely connected and largely sign-
the genome sequence and proteomic data support the determined by directions of metabolic fluxes. PLoS Comput
existence of a catalyzing enzyme. In this way, the Biol 6:e1000812
genome annotation of Chlamydomonas reinhardtii Christian N, May P, Kempa S, Handorf T, Ebenhoh O (2009)
An integrative approach towards completing genome-scale
could be considerably improved. The caveat of this and
metabolic networks. Mol Biosyst 5:1889–1903
similar approaches is that only those reactions can be Jerby L, Shlomi T, Ruppin E (2010) Computational reconstruction
identified which have been previously reported. A future of tissue-specific metabolic models: application to human liver
challenge is to develop computational methods which metabolism. Mol Syst Biol 6:401
allow for the identification of hitherto unknown meta-
bolic reactions. In principle, such a method could be
based on chemical reaction patterns from which theoret-
ically feasible reactions can be computed that may be Genome-Wide Association
catalyzed either by known enzymes or by putative
enzymes with functional similarities to known enzymes. ▶ Gene Association and Linkage Analysis
G 834 Genome-Wide Association Study

statistical methods such as cluster analysis. Depending


Genome-Wide Association Study on the study, matching can also be done by other
criteria, such as gender and environmental exposure.
Wentian Li Lack of proper sample matching is a common cause of
The Robert S. Boas Center for Genomics and Human false positive results.
Genetics, Feinstein Institute for Medical Research,
Manhasset, NY, USA References

Hirschhorn JN, Daly MJ (2005) Genome-wide association stud-


Synonyms ies for common diseases and complex traits. Nat Rev Genet
6:95–108
Ioannidis JPA, Thomas G, Daly MJ (2009) Validating,
Genome-wide case-control studies; Genome-wide augmenting and refining genome-wide association signals.
genetic association analysis Nat Rev Genet 10:318–329
McCarthy MI, Abecasis GR, Cardon LR, Goldstein DB, Little J,
Ioannidis JPA, Hirschhorn JN (2008) Genome-wide associ-
ation studies for complex traits: consensus, uncertainty and
Definition challenges. Nat Rev Genet 9:356–369

Genome-wide association studies (GWAS) are pro-


jects to investigate the statistical association between
phenotypes and a dense set of genetic markers
Genome-Wide Case-Control Studies
(▶ Genetic Marker) that capture a substantial amount
▶ Genome-Wide Association Study
of genetic variations in the genome, using a large num-
ber of matched samples.
Phenotypes can be qualitative traits such as disease
status or quantitative traits such as blood pressure. Statis- Genome-Wide Genetic Association
tical association between disease status and alleles of Analysis
a genetic marker is carried out by categorical data analysis.
Genetic markers are usually genotyped by microarray ▶ Genome-Wide Association Study
chips. Whether a substantial genetic variation in the
genome, including common, rare, and structural varia-
tions, is captured by the set of markers depends on the Genomic
number of markers and their chromosome locations.
The typical number of single nucleotide polymor- ▶ Microbiome
phism (SNP) markers used in a current GWAS is 300 k
(300,000), 500 k (500,000), or 1 M (1,000,000). The
number of samples collected in a GWAS range from
hundreds to tens of thousands. These large number of
Genomic Databases
markers present a particular problem for statistical
Erika De Francesco2, Giuliana Di Santo1,
analysis called multiple testing.
Luigi Palopoli1 and Simona E. Rombo1,3
Stringent data quality control procedures are usu- 1
Dipartimento di Elettronica, Informatica
ally required before statistical analysis, as biased
e Sistemistica Università della Calabria, Rende, Italy
genotyping rates between case and control groups 2
Exeura, Rende (CS), Italy
could lead to false positives. 3
Istituto di Calcolo e Reti ad Alte Prestazioni,
The diseased (case) and normal (control) samples in
Consiglio Nazionale delle Ricerche, Rende (CS), Italy
a GWAS have to be appropriately matched by their
ethnic or geographic origin, to avoid true signal being
overwhelmed by genetic variations unrelated to the Synonyms
disease. Using GWAS genotyping data, it is possible
to uncover unmatched samples and outliers by Non-coding intergenic sequences; Genomic Datasets
Genomic Databases 835 G
Genomic Databases, Information type
Fig. 1 Information
typologies
Genomics Segment

Maps Gene

Genetic
Clone/contig sequence

Physical
Polymorphism

Control region
Variation and mutation

Motif
Pathway
Structural features
G
Expression

Bibliographic reference Telomere

Centromere
Other

Repeat

Definition The number of relevant biological data sources


available to date can be estimated in about 970 units
Genomic databases store datasets related to the genomic (Galperin 2007). Among them, there are about 100
sequencing of different organisms and gene annotations. genomic databases that can be classified by consider-
Differently from gene databases, containing only coding ing different characteristics.
DNA sequences, genomic databases contain also non- One way to go is by employing five different and
coding intergenic sequences. Genomic databases are orthogonal characteristics (De Francesco et al. 2009),
listed among the data resources useful in systems biology. that are:
Typologies of recoverable information (e.g., genomic
FASTA Format segments or clone/contig regions)
Each entry of a FASTA document consists of three com- Database schema types (e.g., Genolist schema or
ponents: (1) a comment line that is optional and reports Chado schema)
brief information about the sequence and the GenBank Query types (e.g., simple queries or batch queries)
entry code; (2) a sequence that is represented as a string Search methods (e.g., graphical interaction–based or
on the alphabet {A, C, G, T} of the nucleotide symbols; query language–based methods)
and (3) a character denoting the end of the sequence. Result formats (e.g., flat files or XML formats)
These characteristics are more thoroughly
discussed next.
Characteristics
Recoverable Information
Systems biology focuses on studying (complex) Genomic databases contain a large set of data types.
interactions involving various components (among Some archives report only the sequence, the function,
others, genes, proteins, enzymes, pathways, and such) and the organism corresponding to a given portion of
in biological systems. Therefore, studies in systems biol- genome, and other ones contain also detailed informa-
ogy often rely on (sometimes massive) bunches of data tion useful for biological or clinical analysis. Figure 1
that are stored nowadays in biological databases. shows a taxonomy of the main classes of information
G 836 Genomic Databases

Schemas

Database management
system (DBMS)

Other GUS GenoList schema Chado schema Pathway tools schema

Relational Object-oriented Object-relational

Genomic Databases, Fig. 2 Database schemas

Query type

Simple query Batch query Analysis query

Genomic Databases,
Sequence similarity query Pattern search query
Fig. 3 Query types

that are recoverable from genomic databases. For relational schemas, whose structure is anyways inde-
example, Genomic segments include all the nucleotide pendent from the biological nature of data (see Fig. 2).
subsequences that are meaningful from a biological For example, the Genomics Unified Schema (GUS)
point of view, such as genes, clone/clontig sequences, (GUS 2006) is a relational schema suitable for a large
polymorphisms, control regions, motifs, and structural set of biological information, including genomic data,
features of chromosomes. genic expression data, and protein data, while the
Pathway Tools Schema (Karp 2000) is an object
Database Schemas schema used in Pathway/Genome databases.
Most genomic databases are relational, even if relevant
examples exist based on the object-oriented or the Query Types
object-relational model (see e.g., (WormBase 2006; In Fig. 3, a taxonomy of query types supported by most
Twigger et al. 2002)). Four different types of database genomic databases is illustrated.
schema that are designed “specifically” to manage By simple querying, it is possible, for example, to
biological data can be distinguished from other recover data satisfying some standard search parameters
unspecific database schemas, which are mostly generic such as gene names, functional categories, and others.
Genomic Databases 837 G
Genomic Databases, Database search methods
Fig. 4 Search methods

Text based Grafical interaction based Sequence based Query language based

Free text Form

Analysis queries are more complex and, somehow, more Query result format
typical of the biological domain. They consist in retriev-
ing data based on similarities (similarity queries) and HTML
G
patterns (pattern search queries). The former ones take
XML
as input a DNA or a protein (sub)sequence and return SBML
those sequences found in the database that are the most
BioPAX
similar to the input sequence. The latter ones take as
input a pattern p and a DNA sequence s and return Other XML formats
those subsequences of s which turn out to be most FASTA
strongly related to the input pattern p.
Flat file
EMBL file
Search Methods
A further classification criterion is related to methods GenBank file
used to query available databases, as illustrated in Other flat file formats
Fig. 4 where four main classes of query methods
are distinguished: text-based methods, graphical interac- Comma/tab-separated files
tion–based methods, sequence-based methods, and TSV
query language–based methods. The most common
CSV
methods are the text-based ones, further grouped in two Attribute-value file
categories: free text and forms. In both cases, the query
can be formulated specifying some keywords. With the Genomic Databases, Fig. 5 Result formats
free text methods, the user can specify sets of words, also
combining them by logical operators. With forms, recently, in order to facilitate the spreading of infor-
searching starts by specifying the values to look for that mation in heterogeneous contexts, the XML format is
are associated to attributes of the database. sometimes supported.

Result Formats
In genomic databases, several formats are adopted to Cross-References
represent query results (see the taxonomy in Fig. 5).
Web interfaces usually provide answers encoded in ▶ Metabolic Networks, Databases
HTML, but other formats are often available as well.
For example, flat files are semistructured text files
where each information class is reported on one or References
more consecutive lines, identified by a code used to
characterize the annotated attributes. Often, special De Francesco E, Di Santo G, Palopoli L, Rombo SE
(2009) A Summary of genomic databases: overview and
formats have been explicitly conceived for biological
discussion. In: Sidhu AS, Dillon TS (eds) Biomedical data
data. An example is the FASTA format (Mount 2004), and applications, studies in computational intelligence.
commonly used to represent sequence data. More Springer, Berlin, pp 37–54
G 838 Genomic Datasets

Galperin MY (2007) The molecular biology database collection: epigenetic marking is transmitted through meiosis.
2007 update. Nucl Acids Res 35:D3–D4 Only a small subset of genes in the human genome
GUS (2006) The Gen. Unified Schema docum. http://www.
gusdb.org/documentation.php are subjected to genomic imprinting. If a gene is said to
Karp PD (2000) An ontology for biological function based on be maternally imprinted it means that the copy of the
molecular interaction. Bioinformatics 16:269–285 gene coming from the maternal side is repressed, as per
Mount DW (2004) Bioinformatics: sequence and genome the convention in the field (Ideraabdullah et al. 2008).
analysis, 2nd edn. Cold Spring Harbor Laboratory Press,
Cold Spring Harbor
Twigger S, Lu J, Shimoyama M et al (2002) Rat genome
database (RGD): mapping disease onto the genome. Nucl Cross-References
Acids Res 30(1):125–128
WormBase (2006) Db on biology and genome of C. Elegans
http://www.wormbase.org/ ▶ Epigenetics

References

Genomic Datasets Ideraabdullah FY, Vigneau S, Bartolomei MS (2008) Genomic


imprinting mechanisms in mammals. Mutat Res 647(1–2):
77–85
▶ Genomic Databases

Genomic Resources
Genomic Imprinting
Ricardo Cruz-Herrera del Rosario
Vani Brahmachari and Shruti Jain Genome Institute of Singapore, Singapore, Singapore
Dr. B. R. Ambedkar Center for Biomedical Research,
University of Delhi, Delhi, India
Synonyms

Synonyms Genome browsers

Differential expression of homologous genes;


Parental-origin-effect; Transcription repression Definition

Genome browsers are graphical interfaces that allow


Definition the viewing of the DNA of sequenced species at dif-
ferent scales, ranging from a few bases, to thousands of
Genomic imprinting is an epigenetic phenomenon due bases at the level of genes, up to whole chromosomes.
to which only one copy of the gene, inherited from To display a large amount of heterogeneous data,
the mother or the father is expressed. The term imprint- genome browsers use the genomic coordinates as the
ing is borrowed from behavioral science, implying main axis and data are displayed as different tracks that
parental-origin effect on the genes. The transcriptional run parallel to it.
activity of genes inherited from the maternal side
through the egg or oocyte could be different from
that of the same genes inherited through the sperm, Characteristics
even if they have identical DNA sequence. Thus
genomic imprinting is a special case of epigenetic The data provided by genome browsers include gene
inheritance where DNA methylation and histone annotations, transcript evidence, regulatory information,
modifications are responsible for differential regula- repeat elements, alignment with other genomes, epige-
tion of the homologous genes or chromosomes and the netic marks, genomic variants, etc. Users can select
Gillespie Stochastic Simulation 839 G
specific tracks that are only relevant to the biological
question they are trying to answer. Genome browsers Genomics
may allow the retrieval of the data that they provide.
Genome browsers can be web-based or can be ▶ Disease System, Malaria
downloaded as a stand-alone program. Since their use
involves huge amounts of raw data, most popular
genome browsers are web-based as they do not require
Genomics Network
the user to download data locally. Another advantage
of web-based genome browsers is that they can auto-
▶ Functional/Signature Network Module for Target
matically update the data that they provide. Genome
Pathway/Gene Discovery
browsers can also be classified into multi-species or
species-specific browsers. A short description of some
of the most popular multi-species web-based browsers
is given below. Geometric Networks G
The UCSC Genome Browser (www.genome.ucsc.
edu; Kent et al. 2002) is one of the most widely ▶ Stem Cell Networks
used genome browsers worldwide. It provides verte-
brate, deuterostome, insect, nematode and microbial
genomes. It also provides a large amount of annotation Gibbs Sampler
data and functional data from next generation sequenc-
ing experiments. It allows the retrieval of almost all ▶ Markov Chain Monte Carlo
data that it provides.
The Ensembl browser (www.ensembl.org; Flicek
et al. 2011) provides the genome and annotation of
vertebrate genomes, but also has sister sites that pro- Gillespie Algorithm
vide metazoan, plant, fungi, and unicellular eukaryote
and prokaryote genomes. ▶ Chemical Master Equation
The VISTA browser (www.genome.lbl.gov/vista; ▶ Stochastic Simulation Algorithm
Frazer et al. 2004) consists of a suite of programs and
databases for comparative genomics.
Gillespie Stochastic Simulation

References Ruiqi Wang


Institute of Systems Biology, Shanghai University,
Flicek P, Amode MR, Barrell D, Beal K, Brent S, Chen Y, Shanghai, China
Clapham P, Coates G, Fairley S, Fitzgerald S, Gordon L,
Hendrix M, Hourlier T, Johnson N, K€ah€ari A, Keefe D,
Keenan S, Kinsella R, Kokocinski F, Kulesha E, Larsson P,
Longden I, McLaren W, Overduin B, Pritchard B, Riat HS, Definition
Rios D, Ritchie GR, Ruffier M, Schuster M, Sobral D,
Spudich G, Tang YA, Trevanion S, Vandrovcova J, Vilella Consider the master equation
AJ, White S, Wilder SP, Zadissa A, Zamora J, Aken BL,
Birney E, Cunningham F, Dunham I, Durbin R, Fernández-
Suarez XM, Herrero J, Hubbard TJ, Parker A, Proctor G, @pðX; tÞ X M

Vogel J, Searle SM (2011) Ensembl 2011. Nucleic Acids Res ¼ fwk ðX  yk ÞpðX  yk ; tÞwk ðXÞpðX; tÞ:
@t k¼1
39(Database issue):D800–D806
Frazer KA, Pachter L, Poliakov A, Rubin EM, Dubchak I (2004) (1)
VISTA: computational tools for comparative genomics.
Nucleic Acids Res 32(Web Server issue):W273–279
Although the analytical solution of the master equa-
Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler
AM, Haussler D (2002) The human genome browser at tion is rarely available, the density function can be
UCSC. Genome Res 12:996–1006 constructed numerically using the stochastic simulation
G 840 Glazier–Graner–Hogeweg Model

algorithm (SSA). Generally, the SSA first constructs References


numerical realizations and then averages the results of
many realizations. The goal of stochastic simulation is Gillespie DT (1977) Exact stochastic simulation of coupled
chemical reactions. J Phys Chem 81:2340–2361
then to describe the evolution of the state XðtÞ from some
given initial state Xð0Þ.
The reaction probability density function
PðDt; mjX; tÞ is the joint probability density function Glazier–Graner–Hogeweg Model
of two random variables, i.e., the time to the next
reaction Dt and the index of the next reaction m, ▶ Cellular Potts Model
given X. The reaction probability density function for
the master equation takes the form

PðDt; mjX; tÞ ¼ wm expða0 ðXÞDtÞ (2) GLM

with ▶ Generalized Linear Models

X
M
a0 ðXÞ ¼ wj ðXÞ;
j¼1
Global Linearization
where Dt  0 and m ¼ 1;    M.
The reaction probability density function provides ▶ Quasilinearization
the basis for the SSA.
According to the joint density function (Eq. 2), the next
reaction and the time of its occurrence can be generated
through the direct method. Draw two random numbers r1 Global Maximum
and r2 from a uniform distribution in the unit interval
[0, 1]. The time to the next reaction Dt and the index of ▶ Global Optimum
the next reaction m, given X, can be taken as follows:

1 1
Dt ¼ lnð Þ; (3) Global Minimum
a0 ðXÞ r1
▶ Global Optimum
m ¼ the smallest integer satisfying

X
m
wj0 ðXÞ > r2 a0 ðXÞ: (4) Global Network Alignment
j0

Shihua Zhang1 and Zhenping Li2


1
The Gillespie direct method for exact simulation of National Center for Mathematics and
the master equation is as follows: Interdisciplinary Sciences, Academy of Mathematics
Step 1. Initialization: set t ¼ 0 and fix the initial num- and Systems Science, Chinese Academy of Sciences,
bers of molecules Xð0Þ. Beijing, China
2
Step 2. Calculate the propensity function wk , School of Information, Beijing Wuzi University,
k ¼ 1;    M: Beijing, China
Step 3. Generate two random numbers r1 and r2 in [0, 1].
Step 4. Determine Dt and m according to (Eqs. 3 and 4).
Step 5. Execute reaction m and advance time Dt, i.e., Synonyms
t t þ Dt. If t reaches Tmax , terminate the compu-
tation. Otherwise, go to Step 2. Local network alignment; Network alignment
Global Sensitivity Analysis 841 G
Definition References

Global network alignment problem is used to find the Jongen HT, Meer K, Triesch E (2004) Optimization theory.
Kluwer, Dordrecht
best overall alignment between the input networks. The
mapping for it should cover all of the input nodes. Each
node in an input network is either matched to one or more
nodes in the other network(s) or explicitly marked as
a gap node which has no match in another network Global Sensitivity Analysis
(Singh et al. 2008; Zaslavskiy et al. 2009).
Yunfei Chu and Juergen Hahn
Artie McFerrin Department of Chemical Engineering,
Cross-References Texas A&M University, College Station, TX, USA

▶ Local Network Alignment G


▶ Multiple Network Alignment Definition
▶ Network Alignment
Global sensitivity analysis is an alternative to the
widely used local sensitivity analysis to quantify
References parameter effects on the output. The advantage of
global sensitivity analysis over the local counterpart
Singh R, Xu J, Berger B (2008) Global alignment of multiple is that parameter uncertainty is taken into account
protein interaction networks with application to functional
orthology detection. Proc Natl Acad Sci USA 105(35):
in the calculation of the sensitivity value and the
12763–8 sensitivity is not dependent on the nominal values
Zaslavskiy M, Bach F, Vert JP (2009) Global alignment of of the parameters. Furthermore, global sensitivity
protein-protein interaction networks by graph matching analysis allows to vary several parameters simulta-
methods. Bioinformatics 25(12):i259–67
neously and over a large range to investigate their
effects on the output. Global sensitivity analysis
returns more detailed information about parameter
effects and can also capture interactions among
Global Optimum parameters.
Various techniques for global sensitivity analysis
Lin Wang have been developed (Saltelli et al. 2008); some pop-
School of Computer Science and Information ular ones include the Morris’ screening method, sam-
Engineering, Tianjin University of Science and pling-based approaches, and variance-based indices.
Technology, Tianjin, China All these techniques identify the effect of parameter
variations on the output; however, each one looks at
this problem from a different perspective.
Synonyms The main challenge of global sensitivity analysis is
the required computational effort. Many global sensi-
Global maximum; Global minimum tivity measures need to evaluate a multidimensional
integral, and computationally intensive approaches
such as the Monte Carlo method have to be applied to
Definition compute the integral.

Optimum indicates minimum and maximum, respec-


tively. Let f : Rn ! R be a real valued function and Cross-References
M Rn . A point x 2 M is called a global minimum
(respectively, global maximum) if there exists ▶ Optimal Experiment Design
f ð
xÞ  f ðxÞ (respectively, f ð
xÞ  f ðxÞ) for all x 2 M. ▶ Signal Transduction Pathway
G 842 Global Stability

References
Global State, Boolean Model
Saltelli A, Ratto M, Andres T, Campolongo F (2008) Global
sensitivity analysis: the primer. Wiley, New York
Xi Chen, Wai-Ki Ching and Nam-Kiu Tsing
Advanced Modeling and Applied Computing
Laboratory, Department of Mathematics, University of
Hong Kong, Hong Kong, China
Global Stability

Tianshou Zhou Synonyms


School of Mathematics and Computational Sciences,
Sun Yet-Sen University, Guangzhou, Guangdong, State
China

Definition
Definition
In a Boolean model, a global state is a gene state vector
Global stability means that the attracting basin of consisting of all the gene expression states. A Boolean
trajectories of a dynamical system is either the state model actually consists of a set of n nodes (where each
space or a certain region in the state space, which is the node corresponds to a gene):
defining region of the state variables of the system.
In other words, global stability means that any trajec- V ¼ fv1 ; v2 ; . . . ; vn g
tories finally tend to the attractor of the system, regard-
less of initial conditions. Most of biological systems, and a list of Boolean functions (which represent the
e.g., gene regulatory systems, are needed to be globally regulatory rules for nodes). Define vi(t) to be the state
stable. (0 or 1) of the node vi at time t. Here we let:
To help understand global stability, here we give an
example. Consider the famous Lorenz system: vðtÞ ¼ ðv1 ðtÞ; v2 ðtÞ; . . . ; vn ðtÞÞT

dx which is called the Gene Activity Profile (GAP). The


¼ að y  x Þ
dt GAP can take any possible form from the set:
dy
¼ cx  xz  y n o
dt
dz S ¼ ðv1 ; v2 ; . . . ; vn ÞT : vi 2 f0; 1g
¼ xy  bz
dt
where each element in the set S is a global state,
If a ¼ 10; b ¼ 3; c ¼ 28, then we know that it has and thus totally there are 2n possible (global) states
an attractor (called the Lorenz attractor). This attractor in the network. An example of a two-gene network
is globally stable since any trajectories beginning at is given in Table 1. From the table, there are four
initial points in the state space of the system finally global states in this network: (0, 0), (0, 1), (1, 0),
tend to the attractor of the system. and (1, 1).
Global stability is different from both the local
stability of a steady state and structural stability, Global State, Boolean Model, Table 1 The truth table
where the former describes how the trajectories
State v1 (t) v2(t) f (1) f (2)
near the steady state respond to a perturbation and
1 0 0 1 1
the latter describes how the trajectories are changes
2 0 1 1 0
when the parameters of a system are changes.
3 1 0 1 0
Global stability belongs to a kind of asymptotic
4 1 1 0 0
stability.
Goodwin Oscillator 843 G
References various biochemical and physiological factors. To pre-
vent oxidative damage, GSH is present in millimolar
Shmulevich I, Dougherty E, Kim S, Zhang W (2002) From concentrations in the brain; however, the concentra-
Boolean to probabilistic Boolean networks as models of
tions are higher in the astrocytes compared to neurons.
genetic regulatory networks. Proc IEEE 90:1778–1792
Age-dependent decline in GSH in the brain and cere-
brospinal fluid has been observed in many organisms
including humans. Further, the SN region of the mid-
Glutathione brain has lower levels of GSH compared to other ana-
tomical areas. However, during PD, there is a further
Rajeswara Babu Mythri1, Shireen Vali2 and decrease in SN GSH levels. GSH depletion is one of
M. M. Srinivas Bharath1 the first known indicators of oxidative stress and
1
Department of Neurochemistry, National Institute of neurodegeneration in PD prior to selective inhibition
Mental Health and Neurosciences (NIMHANS), of CI activity and DA loss. GSH depletion exacerbates
Bangalore, Karnataka, India the neurotoxicity of PD causing chemical toxins such G
2
Cell Works Group Inc., Bangalore, India as MPTP and 6-hydroxydopamine suggesting that
disruption of the redox homeostasis of the cells triggers
disease pathways in PD (Bharath et al. 2002).
Definition
Cross-References
Glutathione, a tripeptide (g-L-glu-L-cys-gly; GSH), is
the most abundant nonprotein thiol biomolecule in mam-
▶ Disease System, Parkinson’s Disease
malian tissues. GSH also occurs as GSSG, the oxidized
form of GSH and as GSSR representing GSH-cysteine
disulphides linked to proteins. GSH mainly functions as References
a cellular antioxidant involved in detoxification of toxic
free radicals (namely, peroxynitrite) and xenobiotics. Bharath S, Hsu M, Kaur D, Rajagopalan S, Andersen J (2002)
Glutathione, iron and PD. Biochem Pharmacol 64:
GSH acts as an electron donor in the reduction of perox-
1037–1048
ides. Apart from this, GSH is also involved in mainte-
nance of redox potential, transportation and storage of
cysteine and as a cofactor. GSH has a role in signal
transduction, cell proliferation, storage of nitric oxide, Goodwin Oscillator
and regulation of gene expression. GSH also plays an
important role in DNA metabolism, protein synthesis, Jinzhi Lei
activation of certain enzymes, and enhancement of Zhou Pei-Yuan Center for Applied Mathematics,
immune function. Tsinghua University of Beijing, Beijing, China
De novo synthesis of GSH in vivo occurs in the
cytosol from the constituent amino acids in two con-
secutive steps catalyzed by g-glutamyl cysteine ligase Definition
(GCL), the rate limiting enzyme and GSH synthase.
Cellular GSH level is also contributed by reduction of Goodwin oscillator is a quintessential example of
GSSG to GSH. Conversely, GSH levels could be a biochemical oscillator based on negative feedback
decreased when it is converted to GSH conjugates, alone that was invented by Brain Goodwin (Goodwin
GSSG, or by release from cells. Approximately 10% 1965, 1966). The oscillator consists of mRNA, protein,
of cellular GSH is transported to the mitochondria by and protein product (repressor). MRNA is the control-
an energy-dependent mechanism. There exists ling factor for protein synthesis. Protein is the
a steady-state balance between synthesis and depletion controlling factor for the production of protein prod-
of cellular GSH. uct. The repression of mRNA synthesis by protein
Compared to other organs in the human body, the product follows the same law of surface adsorption as
brain is more susceptible to oxidative damage due to does protein inhibition (Fig. 1) (Goodwin 1965, 1966).
G 844 GPGPU

Goodwin B (1965) Oscillatory behavior in enzymatic control


X processes. Adv Enzyme Regul 3:425–438
Goodwin B (1966) An entrainment model for timed enzyme
Y Z synthesis in bacteria. Nature 209:479–481

Goodwin Oscillator, Fig. 1 A schematic representation of the


Goodwin oscillator

GPGPU
The kinetic equations describing the Goodwin
oscillator are ▶ GPU Computing

dX v0
¼  k1 X
dt 1 þ ðX=KÞp
dY GPU
¼ v1 X  k2 Y (1)
dt
dZ Luca Lombardi and Piercarlo Dondi
¼ v2 Y  k3 Z Department of Computer Engineering and Systems
dt
Science, University of Pavia, Pavia, Italy
Here X, Y, Z are concentrations of mRNA, protein,
and protein product, respectively; v0, v1, and v2 deter-
mine the rates of transcription, translation, and cataly- Synonyms
sis; k1, k2, and k3 are rate constants for degradation of
each component; 1/K is the binding constant of protein Visual processing unit (VPU)
product to transcription factor; and p is a measure of
the cooperativity of the repressor.
▶ Hopf bifurcation analysis in Goodwin’s model Definition
shows that to obtain biochemical oscillation, the
cooperativity of the negative feedback must be very A Graphic Processing Unit (GPU) is a specialized
high, say p > 8, and the degradation rate constants of processor dedicated to the creation of the images visu-
the three components have to be nearly equal. alized on the screen. It implements a set of the most
Bliss, Painter, and Marr (1982) fixed these problems frequently used graphics primitive operations, so as the
by a slight modification of Goodwin’s model: CPU is not involved in the time-expensive graphical
computation. GPUs, originally designed for personal
dX v0 computers, are currently used in many other kinds of
¼  k1 X
dt 1þX devices, such as mobile phones, games consoles,
dY tablets, or embedded systems. In a PC a GPU can be
¼ v1 X  k2 Y (2)
dt included as a standalone graphic card (usually in the
dZ Z mid-high-level solutions) or embedded in the mother-
¼ v2 Y  k3 :
dt 1 þ Z=K board (in low-level models). Finally, at the end of
2010, CPUs with integrated GPU were released:
Now the feedback step is no longer cooperative, and a solution particularly efficient to reduce communica-
the uptake of protein product has a form of Michaelis- tion times among the processors and to limit battery
Menten function. consumption in mobile devices (as notebooks or
netbooks).

References
Cross-References
Bliss R, Painter P, Marr A (1982) Role of feedback inhibition in
stabilizing the classical operon. J Theor Biol 97:177–193 ▶ High-Performance Computing, Structural Biology
GPU Computing 845 G
Device Architecture (CUDA) of NVIDIA, that greatly
GPU Computing facilitate the programming of applications targeted at
GPUs. The use of GPUs for general-purpose applica-
Fransisco Vázquez, José Antonio Martı́nez and tions has exceptionally increased in the last few years
Ester M. Garzón due to the evolution of both GPU programming
Department of Computer Architecture and Electronics, resources and the semiconductors technology. In this
University of Almerı́a, Almerı́a, Spain way, GPUs have emerged as new computing platforms
that offer massive parallelism and provide incompara-
ble performance-to-cost ratio for scientific computa-
Synonyms tions (Kaeli and Leeser 2008). Currently, NVIDIA is
leading the GPU computing and its interface CUDA is
General-Purpose GPU; GPGPU the key of thousands of applications which are accel-
erated by the NVIDIA GPUs; a relevant percentage of
these applications are referred to the systems biology. G
Definition CUDA is based on the programming model SIMT
(Single Instruction, Multiple Threads), that is, every
The use of GPUs (Graphics Processing Units) to accel- instruction in the program is executed by hundreds of
erate the computation of scientific and engineering threads with different data. The mapping of the threads
applications. on the GPU cores is automatically carried out by
CUDA (Kirk and Hwu 2010).
An approach to facilitate the GPU programming is
Characteristics based on the use of basic routines or libraries which
(1) compute the most used operations in the applica-
At first the goal of the Graphics Processing Units tions and (2) are optimally accelerated by GPU. In this
(GPUs) was to accelerate the specific computation line, NVIDIA supplies a wide set of routines related to
related to the visualization in the computer; in this several kinds of applications such as CuBLAS,
way the CPU could be entirely dedicated to the com- CuFFT, and so on.
putation of general applications. So GPU hardware It should be noted that the computation related to
resources were specialized in graphic computation every application can be classified in two kinds:
and included several specific Arithmetic Units and • Computationally intensive, if it includes large
a particular memory hierarchy connected to the central sequences of arithmetic-logic operations over
process unit (CPU). Later, the role of GPUs has been large data sets. So, these operations can be com-
more relevant and the graphic software to exploit the puted by a large set of threads which are mapped on
GPUs was based on graphics-specific programming the high number of cores into the GPU and all of
languages like OpenGL and Cg with a substantial them execute the same sequence of instructions.
cost of programming. Then, the SIMT programming model is suitable
In the last decade the evolution of GPU technology for the computation and it can be accelerated by
has been based on the following: the massive parallelism of the GPU (see Fig. 1).
1. The amount of hardware resources has been signif- • Control-flow intensive, if the process is dominated
icantly increased. by control-flow operations, that is, decision points,
2. The GPU was only focused on graphic computa- in order to drive different instructions over different
tion, so it became an underutilized resource in the data. This computation is not suitable for the SIMT
computer. programming model and it is not accelerated by the
3. A wide range of applications in Science and Engi- GPU. This kind of computation can achieve better
neering share the characteristics of the graphic performance on the multicore CPU.
processes. Consequently, the GPU computing cannot substi-
Thus, the main recent advances in GPU technology tute the “CPU computing” because both can improve
have focused on the development of Application Pro- the performance of different kinds of computations
gramming Interfaces (APIs), such as Compute Unified included in the applications. So, currently, the
G 846 GPU Computing

Input data Process 1 Process 2 Process 3 Process 4 Process 5 Output data


Run-times on CPU

Processes 2 and 4
They are computationally intensive, that is, these
processes compute the same operations over
large data sets. So, they are suited for the GPU
CPU GPU programming model (Single Instruction Mutliple
Threads). Their run-time can be strongly
reduced by means of GPU computing, thought
they need some CPU - GPU communication
processes.

Run-times using GPU as accelerator Processes 1,3 and 5


They are control-flow intensive, that is, these
Input data Process 1 2 Process 3 4 Process 5 Output data
processes compute different operations over
different data. Then, GPU computing can not
reduce their run-time.
Data Communications
GPU-CPU

GPU Computing, Fig. 1 GPU computing can strongly accelerate the computationally intensive procedures in the program

high-performance computation (HPC) is based on het- preserve the specimen from radiation damages. But,
erogeneous computation (GPU-multicore computing) as a consequence, the computed images are blurred,
and the effort of programming is relevant in this con- that is, they exhibit poor signal-to-noise ratio (SNR).
text, because the programmers have to identify both Then, in ET it is necessary to apply either a simple
types of computations and develop the program com- 3D-reconstruction method which produces blurred
bining two parallel interfaces. However, if one kind of images plus a sophisticated denoising method, or
computation dominates the application, then, only one sophisticated 3D-reconstructions which are capable
parallel interface can be considered. generating images with high resolution. Both alterna-
tives need long run-times to compute the reconstruc-
GPU Computing for Systems Biology tion because both are computationally intensive. So,
Electron Tomography (ET) is taken as an illustrative HPC is paramount in this context to cope with those
example related to the systems biology in order to computational needs.
analyze the role played by GPU computing in this The GPU computing has emerged as a new HPC
field. ET has emerged as the leading technique for the technique that offers massive parallelism. It is based on
structural analysis of unique complex biological spec- the GPUs platforms which are very suitable for their
imens. It combines electron microscopy with the integration in laboratories of structural biology due to
power of 3D imaging. ET has made it possible to their incomparable performance-to-cost ratio and easy
directly visualize the molecular architecture of organ- maintenance and use.
elles, cells, and complex viruses. Furthermore, ET The specific methods in ET are computationally
allows the identification of the macromolecular assem- intensive and can be strongly accelerated on GPU
blies in their native cellular environment and the study platforms. However, it is necessary to reprogram
of their distribution in 3D as well as their interactions. them or even to propose a new approach to define
ET has been crucial for recent breakthroughs in life the ET algorithms, in order to better exploit the paral-
sciences (see Lucic et al. (2005); Frank (2006) for lelism of the GPU platforms. In this line, several ET
reviews). approaches have already been proposed (Castaño-Diez
Computer-automated data collection has been et al. 2008; Vazquez et al. 2010; Xu et al. 2010).
essential for the advent of ET as a structural technique
in cellular biology. It allows to automate specimen Tomographic Reconstruction
tilting, area tracking, focusing, and recording of Tomographic reconstruction can be modeled as a least
images under low electron-dose conditions in order to square problem that can be solved by means of
GPU Computing 847 G
algorithms based on matrix operations (Herman 1980). Afterward, that matrix is used to reconstruct all the slices
Large sparse matrices are involved in these algorithms. by Sparse Matrix-Vector products (SpMV). So, the WBP
However, matrix data structures have not been tradi- method is computationally intensive and the acceleration
tionally included in the implementations due to their of SpMV is the key to achieve a high performance.
large memory requirements. As a consequence, the This matrix formulation of WBP allows to use CUDA
algorithms usually recompute of matrix coefficients routines previously developed, which efficiently accel-
when needed. Nonetheless, modern computers have erate the operation SpMV on the GPU. The routine based
large memory units available, so it is now possible to on the format called ELLPACK-R achieves highest per-
improve the performance of reconstruction algorithms formance on GPUs (Vazquez et al. 2011) and facilitates
by storing large matrices in core. the development based on CUDA of WBP. Additionally,
Assuming the single tilt axis geometry and using the particular operation SpMV (B ∗ ps) involved in the
voxels to represent the volume to be reconstructed, matrix implementation of WBP has some specific char-
the 3D reconstruction problem can be decomposed into acteristics that can be exploited to accelerate these oper-
a set of independent 2D reconstruction subproblems ations with GPUs. Using the general implementation of G
corresponding to the slices perpendicular to the tilt SpMV on GPU based on ELLPACK-R as a starting
axis. Each of the 2D slices of the volume can then be point, the three geometry-related symmetry properties
computed from the corresponding set of 1D projections allow a significant reduction of the memory require-
(usually known as sinogram). The reconstructed 3D ments to store the matrix B. The corresponding formats
volume is obtained by simply stacking the 2D slices. to compress the matrix related to these symmetries are
The standard method to solve this problem is referred as sym1, sym2, and sym3.
Weighted Back Projection (WBP) (Frank 2006). Briefly, The matrix WBP approach has been implemented
the method uniformly distributes the object mass present with CUDA and evaluated on a NVIDIA GeForce
at the projection images over computed backprojection GTX 295 GPU. In order to evaluate the use of matrix
rays. The intersection of the backprojection rays from the WBP and GPU computing, the speedup factors against
different images reinforces the density at the points the standard recomputation-based WBP on the CPU
where the mass is in the original structure. Therefore, were computed. As it is shown in the evaluation results
the mass of the object is reconstructed. Formally, the at Fig. 2, matrix WBP on the GPU yield excellent net
backprojection can be defined by means of the matrix speedups up to 165x, especially for huge datasets as
backprojection operator B as: commonly used in the tomography field.

g ¼ B ps (1) Denoising Method


Several simple linear methods, Gaussian or kernel-based
where B is a sparse matrix related with the projection filtering, reduce the noise at reasonable time but at the
geometry and ps is the vector of sinograms for the expense of blurring the structural features; then, in ET it
different tilt angles of the slice s. When the number of is essential to apply nonlinear methods, such as the
tilt angles is large enough, the vector g is a good estima- Anisotropic Nonlinear Diffusion or Beltrami filter,
tion of the slice g*. Therefore, the 3D reconstruction which preserve or even highlight the structural features.
consists of the following set of matrix-vector products: They are computationally intensive and GPU computing
can relevantly reduce their run-times.
gs ¼ B ps with 0  s  Nslices (2) The Beltrami flow is a selective noise filtering
method that preserves structural features; its formula-
where Nslices is the total number of slices. tion is based on the iterative solution of the following
It is important to note that the matrix B is sparse differential equation (Kimmel et al. 2000):
and the location of nonzero coefficients exhibits
 
some regularity and several kinds of symmetries. These 1 HI
It ¼ pffiffiffi div pffiffiffi (3)
characteristics are key to develop efficient matrix g g
implementations of 3D WBP. At the beginning of 3D
reconstruction process, the nonzeroes of B are computed where It ¼ @I =@t denotes the derivative of the image
and stored into one sparse matrix data structure. density I with respect to the time t; ∇I is the gradient
G 848 GPU Computing

GPU Computing, Standard (CPU) vs GPU


Fig. 2 Effective speedup
derived from different 180 standard 167,46
163,52 163,78
approaches (the standard general
160 150,65
based on recomputation and sym1
the four matrix approaches) on sym2 136,72
140
sym3
the GPU compared to the
standard approach on the CPU 120 106,97

Speed-up
100

80 69,50
60

40

20

0
128 192 256 384 512 768 1024
Datasets

GPU Computing,
Fig. 3 From left to right, the
original HIV-1 reconstruction
and the results with the noise
reduced at 10, 25, 50, 100,
150, and 200 iterations are
shown. Only a representative
slice of the 3D reconstruction
is presented

GPU Computing, Table 1 Run-times(s) of Beltrami Filter on a core of CPU based on 2 Quad Core Intel Xeon 2,26 Ghz and GPU
code on one Tesla C1060 card for 3D images of dimension V  V  V
V 128 192 256 384 512 640 768
Sequential Beltrami 8,2 28,2 98,7 335,5 800,5 1552,1 2729,3
GPU Beltrami 0,5 1,0 2,1 5,7 14,6 22,9 38,7
Speedup 16,4 28,2 47,0 58,9 54,8 67,8 70,5

vector, that is ∇I ¼ (Ix, Iy, Iz), Ix ¼ ∂I/∂x being the arithmetic operations over every voxel in the 3D
derivative of I with respect to x (similar applies for image. So, GPU computing is capable of accelerating
y and z); g denotes the determinant of the first funda- its computation due to a highly computationally inten-
mental form of the surface, which is g ¼ 1 + |∇I|2; and sive nature of the algorithm. This characteristic is
div is the divergence operator. frequently shared by most of the advanced filtering
Figure 3 is intended to illustrate the performance of methods. The results in Table 1 show that the acceler-
this method in terms of noise reduction and feature ation of Beltrami filter based on GPU computing is
preservation over a representative ET dataset that was very relevant, specially for larger images.
taken from the Electron Microscopy Data Bank (http://
emdatabank.org). Conclusions
To sum it up, the Beltrami Method is translated into Currently, the GPU computing is paramount to
an iterative method where every step includes the same cope with the high computational needs in the
Granular Computing 849 G
systems biology field. It is capable of accelerating
the processes computationally intensive. However Granular Computing
its performance decreases for control-flow inten-
sive processes. There is a wide range of applica- C. Maria Keet
tions in systems biology which can be accelerated KRDB Research Centre, Free University of Bozen-
by GPU computing, but their software must be Bolzano, Bolzano, Italy
reprogrammed or even the algorithms need
rethinking in order to adapt them to the GPU
platforms. Definition

Granular computing is an emerging paradigm in


Cross-References computing and applied mathematics to process
data and information, where the data or information
▶ General-Purpose Computation, Graphics Processing are divided into so-called information granules that G
Units come about through the process of granulation. An
▶ High-Performance Computing, Structural Biology information granule is a collection of entities that,
▶ Multicore Computing in the scope of granular computing, usually origi-
nate from numerical analysis to group entities
together at a certain level of granularity, thanks to
their similarity, functional or physical adjacency, or
References indistinguishability.
The principal techniques used in granular comput-
Castaño-Diez D, Moser D, Schoenegger A, Pruggnaller S,
ing are machine learning (▶ Identification of Gene
Frangakis AS (2008) Performance evaluation of image
processing algorithms on the GPU. J Struct Biol Regulatory Networks, Machine Learning), fuzzy sets
164:153–160 and logic (▶ Fuzzy Logic), rough sets, ▶ clustering,
Frank J (2006) Electron tomography: methods for three- ▶ data mining, and, to a lesser extent, ontologies.
dimensional visualization of structures in the cell, 2nd edn.
Hence, the principal approach with granular comput-
Springer, New York, p 455
Herman GT (1980) Image reconstruction from projections: the ing is that of scale-based, quantitative, ▶ granularity
fundamentals of computerized tomography. Academic, with a focus on classifying instance data into their
New York appropriate level of granularity as well as attempts to
Kaeli DR, Leeser M (2008) Special issue on general-purpose
generate the best possible partitioning and, thereby, the
processing using graphics processing units. J Parallel Distrib
Comput 68:1305–1402 optimal hierarchy of levels of granularity.
Kimmel R, Malladi R, Sochen NA (2000) Images as embed- Meyers (2009) contains several entries for granular
ded maps and minimal surfaces: movies, color, texture, computing, and Yao (2010) provides a recent state of
and volumetric medical images. Int J Comput Vis
the art concerning theoretical foundations and applica-
39:111–129
Kirk DB and Hwu WW (2010) Programming Massively Parallel tions of granular computing.
Processors: A Hands-on Approach. MorganKaufmann.
pp 256
Lucic V, Forster F, Baumeister W (2005) Structural studies by
electron tomography: from cells to molecules. Ann Rev
Cross-References
Biochem 74:833–865
Vázquez F, Garzón EM, Fernández JJ (2010) A matrix approach ▶ Granularity
to tomographic reconstruction and its implementation on
GPUs. J Struct Biol 170:146–151
Vázquez F, Garzón EM, Fernández JJ (2011) A new approach
for sparse matrix vector product on NVIDIA GPUs. Concurr References
Comput-Pract. doi:10.1002/cpe.1658
Xu W, Xu F, Jones M, Keszthelyi B, Sedat J, Agard D, Meyers RA (2009) Encyclopedia of complexity and systems
Mueller K (2010) High-performance iterative electron science. Springer, Berlin
tomography reconstruction with long-object compensa- Yao JT (ed) (2010) Novel developments in granular computing:
tion using graphics processing units (GPUs). J Struct applications for advanced human reasoning and soft compu-
Biol 171:142–153 tation. IGI Global, Hershey
G 850 Granularity

Granularity, Table 1 Sample granular perspective for human


Granularity structural anatomy (criterion) granulated by parthood (nrG type
of granularity), with six granular levels and sample entities
residing at each level (which also could be their respective
C. Maria Keet instances)
KRDB Research Centre, Free University of Bozen- Name of level Sample contents of each level
Bolzano, Bolzano, Italy Body Male human body
Organ Liver, pancreas
Tissue Epithelium, smooth muscle
Definition Cell Erythrocyte, melanocyte
Organelle Ribosome
Granularity concerns the ability to represent and operate Molecule b-galactosidase, Insulin
on different levels of detail in data, information, and
knowledge that are located at their appropriate level.
The entities are described relative to that level, which a certain level. The first basic questions that arise,
may be more coarse-grained or concern fine-grained then, are: what are the components that have to do
details. Devising these ordered levels of granularity in with granularity, and how does one obtain them? The
a granular perspective are either determined by the laws answers to these questions may depend on the emphasis
of nature or are a resultant of human cognition to divide one takes regarding granularity, where the principal
the data, information, or knowledge. dimensions are:
1. Arbitrary scale versus non-scale-dependent
granularity, roughly fitting with quantitative versus
Characteristics qualitative granularity
2. How levels, and its contents, in a granular perspec-
Introduction tive relate to each other
Multiscale analysis of biological systems, such as in 3. Difference in emphases, being focused on either
metagenomics, requires one to traverse from the molec- the entities, their relations, or the criterion for
ular level of detail, to cells, bacterial communities, up to granulation
micro- and macro-environments and habitats. Longer 4. The (mathematical) representation, such as based
established disciplines, such as plant and animal taxon- on set theory, ▶ mereology, or an encompassing
omy, categorize specimens in the tree that contains more framework that accommodates both
(Genus-level) or less (e.g., Family-level) details. That Such different emphases have brought forward var-
biology concerns different levels of detail and hierarchi- ious proposals for representing granularity and, to
cal systems has been noted widely (Vogt 2010; Salthe a lesser extent, how one can manage the granulated
1985), and efforts have gone into the characteristics of data, information, and knowledge.
granularity, how one can model it, and how to effectively
use it to manage the large amounts of data, information, Theories of Granularity
and knowledge. Theories of granularity are predominantly informal
Granularity is relatively static: once the granular and subject domain oriented (Salthe 2001), with
levels and hierarchies in a subject domain are charac- a few exceptions that focus on the subject domain
terized, it is a static structure (a granularity frame- independent logic-based representation (Bittner and
work) either imposed on the data, information, or Smith 2003; Keet 2008). Thus far, the most compre-
knowledge being granulated or based on it being inher- hensive effort to find answers to the two aforemen-
ent in the entities in reality themselves. Entities can tioned questions is proposed by Keet (2008), which
undergo changes and thereby move from a finer- takes into account the four distinct emphases regarding
grained level to a coarser-grained one, or vice versa, granularity and it provides a formal characterization
but are not present at two levels in the same hierarchy informed by ontology. The principal “ingredients” for
at the same time; e.g., a single cell at the Cell-level of any granularity framework, are the entities that are
granularity develops into a multicellular organism at the assigned to different levels of granularity that, in
Organism-level. Entities are somehow “assigned to” turn, form a hierarchy of levels so that one obtains
Granularity 851 G
Granularity, Fig. 1 Top- cG
level taxonomy of types of
granularity, each with one or sG nG
more examples, where Lx
stands for a particular level in
the granular perspective sgG saG nrG nfG naG
Single ‘black boxes’,
granulation folding entities
relation, e.g. and relations
part-of, (L2) into one
spatial entity (L1)
containment
sgrG sgpG saoG samG nacG nasG
Cell wall Physical Map of the Aggregate Team (L1) as Set of
represented size: coin earth with by hour (L1), aggregate of organs (L1),
as line (IL1), separator, isotherms in minute (L2), its players division into
lipid bi-layer or dialysis steps of 10 and second (L2), colony Pancreas,
(L2), and tubes (L1), 5 (L1) (L3) (L1) as Liver (L2) G
3D-structure or 1 (L3) aggregate of
(L3) degrees bacteria (L2)

a granular perspective on a particular domain of inter- Granularity, Table 2 Distinguishing characteristics at the
branching points of the top-level taxonomy of types of granular-
est, provided the granulation is carried out in
ity of Fig. 1
a consistent manner using a particular criterion for
Branching
granulation and type of granularity (mechanism of
point Distinguishing feature
granulation, see below). It can be proven that to be
sG – nG Scale (quantitative) – non-scale (qualitative)
able to have an instance of a granularity framework, it sgG – saG Grain size (scale on entity) – aggregation (scale of
requires at least two adjacent levels of granularity in entity)
a granular perspective; an informal example of sgrG – sgpG Resolution – size of the entity
a granular perspective is shown in Table 1. The gran- saoG – samG Overlay aggregated – entities aggregated
ular perspectives within one granularity framework according to scale
can be linked to each other, provided the contents in naG – nrG – Semantic aggregation – one type of relation
nfG between entities in different levels – different
the levels of different perspectives overlap; e.g., Insu-
type of relation between entities in levels and
lin is located at the Molecule-level as a subtype of relations among entities in level
Peptide and as a subtype of Hormone in a functional nacG – nasG Parent-child not taxonomic and relative
perspective. independence of contents of higher/lower level –
Granulation is the act of devising the granular parent-child with taxonomic inheritance
levels and perspectives and assigning contents to
those levels, which can be done manually or computa-
tionally. For data and information granulation, the categorizes the mechanism of granulation, of which
automated approach is generally called ▶ granular the eight principle types are described in Fig. 1 and
computing (Yao 2010), whereas at the knowledge Table 2 (see Keet 2008, Chap. 2 for motivation
layer with its entities at the intensional level (classes, and explanation). The main distinction is between
concepts, relationships, etc.), one uses processes such quantitative and qualitative granularity. Concerning
as ▶ modularization, ▶ abstraction, and expansion that qualitative granularity, nrG’s granulation relations
are constrained by the criterion for granulation and include part-whole relations that are either a type of
type of granularity that together determine uniqueness true parthood (▶ Mereology) or motivated by linguis-
of a granular perspective. tics (▶ Meronymy), such as structural part-of (e.g.,
Table 1), contained in, and being a member of some-
Types of Granularity thing. Examples for nfG are the MAPK cascade and
Good granular perspectives adhere to underlying prin- the Second messenger system with its components, and
ciples regarding how the levels are identified. This can clustering in ER models. nacG’s aggregate generally is
be structured in a taxonomy of types of granularity that termed with a meaningful collective noun and such
G 852 Granularity

Granularity, Table 3 Typical classification levels in ecosystems using as criteria for granulation similar spatial scales and
subdivisions in biotic and abiotic environments, resulting in seven granular perspectives (depicted as columns) that each contains
eight granular levels or less (rows). Each named level contains instances, such as Palearctic and Afrotropic in the Ecozone-level
Biotic Abiotic
Ecosystem Biogeography Zoogeography Phytogeography Physiography Geology Pedology
Ecozone Biome Floral kingdom
Ecoprovince Zoogeographic province Floral province Geoprovince
Ecoregion Bioregion Floral region Physio-region Georegion Pedoregion
Ecodistrict
Ecosection
Ecosite
Ecotope Biotope Zootope Phytotope Physiotope Geotope Pedotope
Ecoelement Bioelement Geoelement

that the instances of the aggregate are different from Granularity, Table 4 Two abbreviated granular perspectives
instances of its members and a change in its members (table columns), one for plant anatomy with parthood and one
time granularity, which are linked at three levels in the hierarchy
does not change the meaning of the whole. sgrG’s (denoted with “,”) so that cross-granular conditions can be
grain size with respect to resolution can also be applied asserted and, e.g., information retrieved from the information
in GIS; e.g., the city of Paris is represented on carto- system
graphic maps as polygon or as point. With sgpG, one Plant sample Time granularity
focuses on the physical size of the entities themselves; Tissue , Day
within one level one can distinguish instances of, say, ↕ ↕
<5 and 1 mm, but instances <1 mm are indistin- Cell , Hour
guishable from each other (but are distinguishable in ↕ ↕
a finer-grained level), such as sieves with different Molecule , Millisecond
pore sizes or two objects touching each other, like the
wallpaper and the wall where, when zoomed in, we
also observe the glue that connects the wallpaper to the
wall. samG has an associated mathematical function that, say, biological sample analysis of a cell culture –
for measured aggregation (1960s in a minute and so be it the structural components or the processes it is
forth). An example of granulation motivated by scale is involved in – yields information at a time granularity
included in Table 3. of hours (Table 4).
The interaction of granularity with notions such
Challenges as ▶ emergence, ▶ holism, and ▶ reduction is to be
Currently, there is still a gap between the ▶ knowledge explored further. Granularity can provide a methodo-
representation of granularity with its (onto)logical logical modeling framework to enable structured
foundations and implementations (▶ Granular Com- examination of claims of emergence both from
puting), being how to relate structured thinking and a formal ontological modelling and computational
structured problems solving, respectively (Yao 2005), angle by making the complex at least less complex,
in particular with respect to dealing with attributes in and aids understanding which levels are essential for
granular computing and its ontological counterpart explanation of some property of observed behavior.
of criterion of granulation. In addition, conditional
granularity across granular perspectives is to be
addressed in applications, such as linking time granu- Cross-References
larity (Euzenat and Montanari 2005) together with
qualitative granularity that already can be modeled ▶ Abstraction
unambiguously with the theory of granularity (Keet ▶ Emergence
2008; Vogt 2010). Examples for its need are abound: ▶ Granular Computing
for instance, in a multiscale analysis, one has to know ▶ Holism
Graph Algorithms in Network Analysis 853 G
▶ Mereology A directed graph is a tuple (V,E) where V is a set of
▶ Meronymy vertices and E {(x,y)x,y2V} is a set of arcs. There-
▶ Modularity fore, (x,y) and (y,x) are different arcs, while for an
▶ Modularization undirected graph, {x,y} and {y,x} are two representa-
▶ Organism State, Lymphocyte tions of the same edge.
▶ Reduction An edge {v,w} (or arc (v,w)) is said to be incident
▶ Top-Down Decomposition of Biological Networks with vertices v and w. v and w themselves are said to be
adjacent. The degree of a vertex is the number of edges
(arcs) incident with it.
References A labeled graph is a tuple (V,E,l) where (V,E) is
a graph (either directed or undirected) and l:V[E ! S
Bittner T, Smith B (2003) A theory of granular partitions. In: is a labeling function assigning to all vertices and
Duckham M, Goodchild MF, Worboys MF (eds) Founda-
edges (or arcs) labels from some alphabet of labels S.
tions of geographic information science. Taylor & Francis,
London, pp 117–151 A hypergraph is a graph whose edges (arcs) can link G
Euzenat J, Montanari A (2005) Time granularity. In: Fisher M, more than two vertices. A multigraph is a graph where
Gabbay D, Vila L (eds) Handbook of temporal reasoning in there may be more than one edge (arc) between a given
artificial intelligence. Elsevier, Amsterdam, pp 59–118
pair of vertices.
Keet CM (2008) A Formal theory of granularity. Ph.D. Thesis,
KRDB Research Centre, Faculty of Computer Science, Free Graph theory is the study of the properties of graphs
University of Bozen-Bolzano, Italy (Diestel 2002). Algorithmic graph theory is the study
Salthe SN (1985) Evolving hierarchical systems – their structure of algorithms to compute properties of graphs (Gross
and representation. Columbia University Press, New York
and Yellen 2004). ▶ Graph mining is the study of
Salthe SN (2001) Summary of the Principles of Hierarchy
Theory. http://www.nbi.dk/natphil/salthe/hierarchy_th.html. performing data mining on and machine learning
Accessed 10 October 2005. from data represented with graphs.
Vogt L (2010) Spatio-structural granularity of biological mate-
rial entities. BMC Bioinformatics 11:289
Yao YY (2005) Perspectives of granular computing. IEEE Conf
Granul Comput (GrC2005) 1:85–90 References
Yao JT (ed) (2010) Novel developments in granular computing:
applications for advanced human reasoning and soft compu- Diestel R (2010) Graph theory. Springer, Heidelberg
tation. IGI Global, Hershey Gross JL, Yellen J (2004) Handbook of graph theory. CRC Press,
Boca Raton

Graph

Jan Ramon Graph Algorithms in Network Analysis


Declarative Languages and Artificial Intelligence
Group, Katholieke Universiteit Leuven, Seok-Hee Hong
Leuven, Belgium School of IT, University of Sydney, Sydney, NSW,
Australia

Definition
Characteristics
A graph is an abstract representation of a set of objects,
some of which are connected by links. The objects are Graph Theory
represented with vertices (also called nodes) and the We first define necessary terminologies in graph
links with edges or arcs. Several types of graphs are theory. For more details, see (Bondy and Murty 1976;
used depending on what is suitable for a particular West 2001).
application. A graph is a popular mathematical structure for
An undirected graph is a tuple (V,E) where V is a set modeling entities and the relationships between
of vertices and E {{x,y}x,y2V} is a set of edges. them. A graph G ¼ (V, E) consists of a set of
G 854 Graph Algorithms in Network Analysis

Graph Algorithms in a 1 b 1
Network Analysis,
Fig. 1 Example of (a) an
undirected graph, and (b)
a directed graph

2 3 4 2 3 4

a 1 b 1 c 1

2 3 2 3 2 3
Graph Algorithms in
Network Analysis,
Fig. 2 Example of (a)
a DAG, (b) a directed graph
with a cycle, and (c) the
directed cycle in (b) 4 4

vertices (or nodes) V and a set of edges E, where an edge A path in a graph is a sequence of vertices such that
e 2 E is defined by a pair of vertices v, w 2 V and denoted each vertex in the sequence is connected by an edge to
by e ¼ (u, v). We say that an edge e ¼ (u, v) is incident to the next vertex in the sequence. The length of a path
vertices u and v, and that u and v are adjacent to each is the number of edges in the path. A cycle is a path where
other. The vertices u and v are called the endvertices of the starting vertex is the same as the ending vertex.
e. The degree of a vertex v 2 V in a graph G is the A path or cycle is called Hamiltonian if it visits all the
number of edges incident to v. For example, vertex 3 of vertices of the graph exactly once. A graph is acyclic if it
the graph shown in Fig. 1a has degree 5. contains no cycles. A directed acyclic graph is called a
An undirected graph is a graph in which the edges DAG. For example, the directed graph in Fig. 2a is a
have no orientation. A directed graph (or digraph) is DAG, while the directed graph in Fig. 2b is not a DAG,
a graph in which the edges have an orientation, i.e., an since it contains the directed cycle shown in Fig. 2c.
edge e ¼ (u, v) is defined by an ordered pair of two A tree T is a connected simple graph with no cycles.
vertices v, w 2 V. For example, Fig. 1a shows an A vertex of degree 1 in a tree is called a leaf. A non-leaf
undirected graph, and Fig. 1b shows a directed graph. vertex is called an internal vertex. A rooted tree has
A weighted graph is a graph in which the edges have a special vertex called the root, and is often treated as
weights, such as integers or real numbers. a DAG with the edge directions originating from the
A loop is an edge e whose endvertices are the same root. A k-ary tree is a rooted tree in which every
vertex, i.e., e ¼ (v, v). An edge e ¼ (u, v) is called internal vertex has at most k children. In particular,
a multiple edge, if there is another edge e 0 ¼ (u, v) with a 2-ary tree is called a binary tree. For example, Fig. 3b
the same endvertices u and v. A multigraph is a graph shows a tree, and Fig. 3c shows a rooted binary tree.
with multiple edges. For example, the undirected graph A bipartite graph G ¼ (V, W, E) consists of two
in Fig. 1a has multiple edges between vertex 1 and disjoint vertex sets V and W (i.e., no two vertices in V
vertex 3, and the directed graph in Fig. 1b has a loop at are adjacent and no two vertices in W are adjacent), and
vertex 4. A simple graph has no multiple edges or an edge set E, where every edge e ¼ (v, w) has an
loops. In most cases, the graph refers to a simple undi- endvertex v 2 V and the other endvertex w 2 W.
rected graph. Figure. 3a shows an example of a bipartite graph.
Graph Algorithms in Network Analysis 855 G
Graph Algorithms in 2
Network Analysis,
b
a1 2 7
Fig. 3 Example of a (a)
bipartite graph, (b) a tree, and 12 6 1 3 c 1
(c) a rooted binary tree 2 3

5 8 4 5 6 7
4

3 4 5 9 10 11 8 9

a b c d
1 2 1 2 1 2 1 2

Graph Algorithms in G
Network Analysis,
Fig. 4 Example of (a)
a clique, (b) a subgraph of (a),
(c) a spanning subgraph of (a),
and (d) a spanning tree of (a) 4 3 3 4 3 4 3

a 1

b
5 8 2 3

2 3
1
7
Graph Algorithms in
Network Analysis,
Fig. 5 Example of (a) 4 6 9 4 5
a disconnected graph, and (b)
a graph with a cut vertex 10

A regular graph is a graph in which every vertex connected, if every pair of vertices is reachable; other-
has the same degree. For each n, the complete graph, wise, it is disconnected. A directed graph is weakly
denoted by Kn, is a simple graph with n vertices in connected, if it contains an undirected path for every
which every vertex is adjacent to all the other vertices. pair of vertices. It is strongly connected, if it contains
An (induced) subgraph G0 ¼ (V0 , E0 ) of a graph a directed path for every pair of vertices. A cut vertex is
G ¼ (V, E) is a graph where V0 is a subset of V, E0 is a vertex whose removal disconnects the remaining
a subset of E, and all endvertices of e 2 E0 are in V0 . subgraph. A bridge (or cut edge) is an edge whose
A subgraph G0 is a spanning subgraph of G if V ¼ V0 . removal disconnects a graph. If a graph is still
A spanning tree is a spanning subgraph that is a tree. connected after removing any k1 vertices, then the
For example, Fig. 4a shows complete graph K4, and graph is called k-connected. For example, the graph in
Fig. 4b shows a subgraph of K4. Fig. 4c shows a Fig. 5a is disconnected, and the graph in Fig. 5b is
spanning subgraph of K4, and Fig. 4d shows a spanning connected, with vertex 1 as a cut vertex.
tree of K4. The distance d(u, v) between two vertices u and v in
A pair of vertices v and w in a graph G is reachable, a graph G is the length of the shortest path between
if there is a path between v and w in G. A graph G is them. The eccentricity of a vertex v in a graph G is the
G 856 Graph Algorithms in Network Analysis

Graph Algorithms in a 1 b 1
Network Analysis,
Fig. 6 Example of (a)
Breadth-first search (BFS) and
(b) Depth-first search (DFS) 2 3 4 2 5 7

5 6 7 8 3 4 6 8

maximum distance from v to any other vertex. A center Both the DFS and BFS algorithms can be implemented
is a vertex with the minimum eccentricity. The diam- to run in O(|V| + |E|) time, i.e., linear in the size of the
eter of a graph G is the maximum eccentricity over all graph G. For details, see (Cormen et al. 2001; Goodrich
vertices in that graph. For example, the graph in Fig. 5b and Tamassia 2001). For example, Fig. 6 depicts exam-
has a diameter 2, and vertex 1 is the center of the graph. ples of BFS and DFS, where the number of a vertex
The distance between vertex 2 and vertex 5 is 2. represents the ordering performed by BFS and DFS.

Graph Algorithms Connected Components


We now briefly explain basic graph algorithms. For A connected component of an undirected graph
more details on the full description of each algorithm, G ¼ (V, E) is a maximal connected subgraph
see (Cormen et al. 2001; Goodrich and Tamassia G0 ¼ (V0 , E0 ) of G in which any two vertices u and v
2001). in G0 are reachable from each other (i.e., they are
connected by a path in G0 ). For example, the graph
Graph Traversal shown in Fig. 5a has three connected components: One
A graph traversal is a method of visiting all the vertices consists of vertices {1, 2, 3, 4}, and the others consists
of a graph G, starting from a given vertex v, where the of vertices {5, 6, 7, 8, 9} and an isolated vertex 10.
elementary move is from one vertex to another along A strongly connected component of a directed
an edge connecting them. There are two well-known graph G ¼ (V, E) is a maximal strongly connected
traversal methods: breadth-first search (BFS) and subgraph G0 ¼ (V0 , E0 ) of G in which every two vertices
depth-first search (DFS). u and v in G0 are reachable from each other in both
BFS begins at a chosen vertex v, and visits all neigh- directions (i.e., they are connected by a directed path
bor vertices u1, u2,. . .,uk adjacent to v; then it recursively from u to v and a directed path from v to u).
visits all the neighbors of ui, until it explores all the It is easy to compute the connected components of
vertices of G that are reachable from v. In this way, BFS an undirected graph G in linear time, by using either
visits all the vertices of distance k from v, before it visits BFS or DFS. One can use DFS to compute the strongly
the vertices of distance k + 1 from v. connected components of a directed graph G in linear
BFS produces a breadth-first tree with root v which time. For details, see (Cormen et al. 2001; Goodrich
contains all the reachable vertices u from v. The path and Tamassia 2001).
from v to u in the breadth-first tree corresponds to the
shortest path (i.e., the path containing the smallest Minimum Spanning Tree
number of edges) from v to u in G. A minimum spanning tree (MST) of a connected
DFS explores a graph G from a chosen vertex v by weighted undirected graph G is a spanning subtree T
searching deeper in the graph whenever possible; it of G that connects all the vertices of G with the min-
traverses the depth of G before the breadth of G. DFS imum total edge weight, among all possible spanning
explores unexplored edges from the most recently vis- trees of G. For example, Fig. 7b shows a minimum
ited vertex u, until it explores all the edges from u. It spanning tree of a graph in Fig. 7a.
then backtracks to a vertex w, that was discovered from There are two well-known algorithms to compute
v, and explores the other unexplored edges from w. a MST of a weighted undirected graph: Prim’s algo-
DFS repeats this process until it visits all the vertices rithm and Kruskal’s algorithm. Both algorithms are
that are reachable from v. greedy algorithms, i.e., they make the best possible
Graph Algorithms in Network Analysis 857 G
Graph Algorithms in a b
Network Analysis, 3
2
15 3
1 3 1 2 3
Fig. 7 (a) A weighted
connected graph, and (b) its
13 12 11 7 7
minimum spanning tree
8 2 4 2 4

4 5 6 4 5 6
5 10 5

a 2
4
4
b 2 4
2 15 3 2
3
Graph Algorithms in
Network Analysis, 7 20 1 6
1 6
Fig. 8 Example of a shortest 6 G
path: (a) a weighted graph, and
1 12 12
(b) the shortest path between 5
3 5 3 5
vertices 1 and 6

available choice at each stage of the algorithm. More shows a shortest path between vertex 1 and vertex 6 of
specifically, at each step of the algorithm, they add one the graph in Fig. 8a.
safe edge at a time (i.e., an edge that does not create There are two problems: the single-source shortest
a cycle), preserving the minimum weight spanning tree path problem and the all-pair shortest path problem.
property.
Kruskal’s algorithm first sorts the edges in Single-Source Shortest Path The aim of the single-
a nondecreasing order by their weights. It then finds source shortest path problem is to find shortest paths
a safe edge to add to the growing forest (i.e., a set of from a chosen source vertex v to all the other vertices in
disconnected trees), by finding an edge e ¼ (u, v) with the graph. For unweighted graphs, one can use breadth-
the minimum weight among all the edges that connect first search algorithm to compute a shortest path from v
any two trees in the forest. to all the other vertices. For weighted graphs, there are
Prim’s algorithm maintains a single tree T ¼ (VT, two well-known algorithms: Dijkstra’s algorithm and
ET) at each stage of the algorithm. The tree initially the Bellman-Ford algorithm.
only contains an arbitrary vertex v and grows until it Both algorithms produce a shortest path tree,
spans all the vertices in the graph. At each step, it finds starting from the source vertex v to all the other reach-
a safe edge with the minimum weight among all edges able vertices in the graph. Each vertex u has a value on
e ¼ (u, v), where u 2 V  VT and v 2 VT (i.e., an edge shortest path estimate (initially assigned with a very
connecting a vertex in T and a vertex in G  T). large value). Each step of the algorithms tries to
The running time of Prim’s algorithm and Kruskal’s decrease the estimate value, using the relaxation tech-
algorithm depends on the data structures that are used. For nique. More specifically, relaxing an edge e ¼ (u, w)
example, Prim’s algorithm can be implemented to run in tests whether one can improve the shortest path from v
O(|E|log|V|) time, using a binary heap data structure. to w by rerouting the path through u.
Kruskal’s algorithm can be implemented to run in O(|E| Dijkstra’s algorithm solves the single-source
log|V|) time, using a disjoint-set data structure. For faster shortest path problem on a weighted directed graph
implementations using more complex data structures, see G ¼ (V, E), with nonnegative edge weights. It also
(Cormen et al. 2001; Goodrich and Tamassia 2001). uses a greedy approach, and maintains a set S of ver-
tices whose final shortest-path weights from v have
Shortest Path already been determined. The algorithm repeatedly
The aim of the shortest path problem is to find a path chooses a vertex u 2 V  S with the minimum
between two vertices in a weighted graph, such that the shortest-path estimate, adds u to S, and relaxes all the
sum of edge weight is minimized. For example, Fig. 8b edges leaving u. Dijkstra’s algorithm can be
G 858 Graph Algorithms in Network Analysis

implemented to run in O(|V|log|V| + |E|) time. For 14 4 17


15 13 6 8
details, see (Cormen et al. 2001; Goodrich and
Tamassia 2001). 2 1 3
16 9
The Bellman-Ford algorithm solves the single-
source shortest path problem on a weighted directed
graph G ¼ (V, E), where the edge weights may be 17 19
5
12
11
10
18
negative. It either detects negative-weight cycles, or
progressively decreases an estimate on the weight of Graph Algorithms in Network Analysis, Fig. 9 Example of
a shortest path from v until it discovers the actual centrality analysis
shortest-path weight. The running time of the algo-
rithm is O(|V||E|). For details, see (Cormen et al. 1994). For algorithmic aspects of network analysis, see
2001; Goodrich and Tamassia 2001). (Brandes and Erlebach 2005). For network analysis of
biological networks, see (Junker and Schreiber 2008).
All-Pair Shortest Path The aim of the all-pair
shortest path problem is to find the shortest paths Individual-level Analysis
between every pair of vertices u and v in the graph. The centrality index aims to compute the importance
The problem can be solved by repeatedly using the of actors in a social network. There are many different
algorithm for the single-source shortest path problem, centrality measures available, which are based on the
for each vertex v 2 V as a source vertex. definition of importance in specific applications. For
Alternatively, the Floyd-Warshall algorithm solves definitions of various centrality measures, see (Bran-
the all-pairs shortest path problem in O(|V|3) time using des and Erlebach 2005; Wasserman and Faust 1994).
a dynamic programming approach. The algorithm For algorithms for computing each centrality measure,
operates on directed graphs that may have negative- see (Brandes and Erlebach 2005).
weight edges, but do not contain any negative-weight
cycles. For details, see (Cormen et al. 2001; Goodrich Degree Centrality For undirected graphs, the degree
and Tamassia 2001). centrality of a vertex v is defined as the degree of v. For
example, in the undirected graph in Fig. 9, vertices 2
Network Analysis and 3 have the highest degree centrality.
A network is an extension of the concept of a graph. For directed graphs, there are two variants of degree
The network sometimes contains more information, centrality: the in-degree centrality (i.e., the number of
such as vertex attributes and edge attributes. Network incoming edges) and the out-degree centrality (i.e., the
analysis spreads from social sciences to complex sys- number of outgoing edges).
tems, communication networks, bioinformatics, trans-
portation systems, and project planning. Closeness Centrality In a social network analysis,
Roughly speaking, there are three different levels of a vertex with a small total distance maybe more impor-
analysis: tant than a vertex with a high total distance. More
• Individual-level: Examples include centrality mea- formally, closeness centrality of a vertex u can be
sures such as degree centrality and betweenness defined as follows:
centrality.
• Group-level: Examples include clique analysis, 1
cC ðuÞ ¼ (1)
k-core analysis, structural equivalence, and sðuÞ
blockmodeling.
• Network-level: Examples include network statistics where s(u) denotes the sum of the distances from
(such as degree distribution, diameter, average path a vertex u 2 V to all the other vertices v in a graph
length, and clustering coefficient) and network com- G ¼ (V, E). For example, vertex 1 of the graph shown
parison (such as graph isomorphism and similarities). in Fig. 9 has the highest closeness centrality.
In the following, we describe some of the basic
concepts and methods in network analysis. For details Eccentricity Centrality Hage and Harary defined the
on social network analysis, see (Wasserman and Faust eccentricity e(u) of a vertex u as the maximum distance
Graph Algorithms in Network Analysis 859 G
from u to a vertex v in the graph (i.e., e(u) ¼ max{d(u,
v) : v 2 V }). More formally, eccentricity centrality of
a vertex u is defined as follows:

1 1
cE ðuÞ ¼ ¼ (2) k=3
eðuÞ maxfdðu; vÞ : v 2 Vg
k=2

Stress Centrality The definition of stress centrality is k=1


based on the set of shortest paths in a graph. The formal
definition of stress centrality of a vertex u is as follows:
Graph Algorithms in Network Analysis, Fig. 10 Example of
X X k-core analysis
cS ðuÞ ¼ sst ðuÞ (3)
s6¼u2V t6¼u2V G
k-Core Analysis Another way to identify dense sub-
where sst (u) denotes the number of shortest paths graphs in a graph is k-core analysis. The k-core of
between s and t containing u. For example, vertex 1 a graph G is a maximal connected subgraph of G, in
of the graph in Fig. 9 has the highest stress centrality. which all the vertices have degree at least k. Equiva-
lently, a k-core is a connected component of the sub-
Shortest Path Betweenness Centrality Let dst (u) graph of G, formed by repeatedly deleting all vertices
denote the fraction of shortest paths between s and t of degree less than k. The k-core of a graph G can be
that contain vertex u; i.e., computed efficiently in linear time by repeatedly
removing the vertex with the smallest degree.
sst ðuÞ Note that a k-core is a subgraph of a (k1)-core. For
dst ðuÞ ¼ (4)
sst example, in Fig. 10, the dark gray region indicates the
3-core of the graph, and the light gray region shows the
where sst denotes the total number of shortest paths 2-core of the graph.
between s and t.
More formally, the shortest path betweenness Structural Equivalence and Blockmodel Two ver-
centrality of a vertex u is defined as follows: tices u and v are structurally equivalent, if they are
connected to exactly the same vertices in a graph G
X X (intuitively, they hold identical positions in the net-
cB ðuÞ ¼ dst ðuÞ: (5) work). More formally, two vertices u and v are struc-
s6¼u2V t6¼u2V
turally equivalent if, for each edge (u, x) in G, there is
an edge (v, x) in G, and for each edge (v, x) in G, there is
For example, the vertex 1 of a graph in Fig. 9 has an edge (u, x) in G.
highest shortest path betweenness centrality. We can partition the vertex set V of G into structural
equivalence classes or blocks such that if u and v are
Group-level Analysis structurally equivalent, then they belong to the same
Clique Analysis A clique is a dense cohesive sub- equivalence class X. For algorithms for computing struc-
group in a network. Formally, a clique of an undirected tural equivalence, see (Brandes and Erlebach 2005).
graph G ¼ (V, E) is a subset U of V, such that every pair Given a structural equivalence relation, if a vertex u
of vertices u, v 2 U are adjacent (i.e., the induced in block X is connected to a vertex w in block Y, then
subgraph G[U] is a complete graph). A maximum every vertex in block X is connected to every vertex in
clique of a graph G is a clique with the largest size. block Y. Then, by defining an edge (x, y) between the
However, testing whether a given graph G has blocks X and Y, we can construct a reduced graph or
a clique of size k is an NP-complete problem, which blockmodel of the network.
is difficult to solve. For details, see (Brandes and Intuitively, blockmodeling represents the relation-
Erlebach 2005). ships between social positions. For example, Fig. 11b
G 860 Graph Algorithms in Network Analysis

shows a blockmodel of the graph in Fig. 11a, defined Vertices u and v are automorphically equivalent if
according to structural equivalence. Block a consists v ¼ a(u) for some automorphism a. For example,
of vertex 1, block b consists of vertex 2, block e Fig. 11c shows a blockmodel of the graph in Fig. 11a
consists of vertex 5, and block f consists of the vertex as defined by automorphic equivalence. Block a
set {6, 7}. consists of vertex 1, block b consists of the vertex
There are two relaxations of structural equivalence: set {2, 3}, block c consists of the vertex set {4, 5},
automorphic equivalence and regular equivalence. and block d consists of the vertex set {6, 7}.
Informally, automorphically equivalent vertices have Computing the automorphisms of a given graph is
the same position in a network in a more abstract sense a computationally hard problem known as isomor-
than structurally equivalent vertices: They are not phism complete. However, for special classes of graphs
connected to the exactly same vertices, but to vertices such as trees and planar graphs, an automorphism can
that play analogous roles in the network. be computed efficiently in linear time. For details, see
To make this idea more formal, we need the concept (Brandes and Erlebach 2005).
of an automorphism. A permutation a of V is an auto-
morphism of the graph G ¼ (V, E) if, for each edge Network-level Analysis
(u, x) in G, (a(u), a(x)) is an edge in G, and vice versa. Graph Isomorphism An isomorphism between two
graphs is a mapping between the vertex sets of the two
graphs, which preserves adjacency. An isomorphism
1
a b a c of graphs G and H is a bijection f: V (G) ! V (H)
a between the vertex sets of G and H, such that any two
vertices u and v of G are adjacent in G if and only if f
(u) and f (v) are adjacent in H.
2 3 For example, Fig. 12a and b are isomorphic graphs,
b c b
since we can define a mapping as follows: {(4, a),
(1, b), (5, e), (2, c), (6, f), (7, g), (3, d), (8, h), (9, i),
(10, j)}. However, Fig. 12a and c are not isomorphic
4 5
d e c graphs, since there is no mapping between the two
vertex sets that preserves adjacency.
Testing whether two graphs are isomorphic is a com-
6 7 d putationally hard problem. The time complexity of the
f problem is not known for general graphs. However,
Graph Algorithms in Network Analysis, Fig. 11 Examples
the problem can be solved efficiently in linear time, for
of blockmodels of a graph in (a), based on (b) structural equiv- special classes of graphs such as trees and planar graphs.
alence, and (c) automorphic equivalence For details, see (Brandes and Erlebach 2005).

a
5 c 16 17 18
15
b
1 2 a b 11 12
6 e
Graph Algorithms in
Network Analysis, 7 j
Fig. 12 Example of
isomorphic graphs: the graphs 4 3 d c 14 13
in (a) and (b) are isomorphic
graphs, (c) nonisomorphic
8 9 10 h i f g 19 20
graph
Graph Alignment, Protein Interaction Networks 861 G
References Similarly to sequence alignment, it provides means
for functional and phylogenetic comparison of the
Bondy JA, Murty USR, Bondy JA, Murty USR (1976) Graph proteins.
theory with applications. North Holland, New York
Brandes U, Erlebach T (2005) Network analysis: methodologi-
cal foundations. Springer, Berlin
Cormen TH, Leiserson CE, Rivest RL, Stein C (2001) Introduc- Characteristics
tion to algorithms, 2nd edn. MIT Press/McGraw-Hill,
Cambridge, MA/New York
Graph Alignment Structure
Goodrich MT, Tamassia R (2001) Algorithm design. Wiley,
Chichester A graph alignment (▶ Network Alignment) compares
Junker BH, Schreiber F (2008) Analysis of biological networks. protein–protein interaction networks (Protein–Protein
Wiley, Hoboken Interaction Network) (PIN) and groups their proteins
Wasserman S, Faust K (1994) Social network analysis: methods
into equivalence (analogy) classes. In the case of
and applications. Cambridge University Press, New York
West D (2001) Introduction to graph theory. Prentice Hall, a pair-wise comparison, the equivalence is conveniently
Upper Saddle River represented by a pairing of the analogous proteins thus G
creating a map between the nodes (proteins) of the two
networks. The alignment of the links (protein–protein
interaction) results from the aligned nodes.
Graph Alignment Figure 1 represents a pair-wise graph alignment
between two PINs representing a part of the ribosomal
▶ Scoring Function, Graph Alignment complex in two distinct species. Proteins are represented
as the nodes of the network and the protein–protein
interactions as the links. The alignment is shown by
dashed lines, which interconnect the proteins in the
Graph Alignment, Protein Interaction same equivalence class. The alignment of nodes forces
Networks the alignment of links. Several modalities exist for the
aligned links. The links may be either present in both
Michal Kolář species resulting from matching protein–protein interac-
Institute of Molecular Genetics, Academy of Sciences tions or they may be absent in one or both species.
of the Czech Republic, Prague, Czech Republic In practice, the alignment of the two networks is
represented as in Fig. 2. The proteins belonging to the
same equivalence class are overlaid. The line type indi-
Synonyms cates presence or absence of the protein–protein interac-
tions in individual PINs and thus the modality of the link.
Alignment, protein interaction networks; Network An alignment of multiple networks is represented simi-
alignment, protein interaction networks larly, for example, see Fig. 3. There, the nodes represent
proteins of the same equivalence class (e.g., MRPL19,
RPLK, and RLX1). Some equivalence classes may be
Definition absent in one or more of the networks (e.g., MRPL4 and
HP1409 do not have an analogous protein in fish).
Graph alignment of protein–protein interaction A pair-wise graph alignment of networks G1(V1, E1)
networks (▶ Protein-Protein Interaction Networks) is and G2(V2, E2) is formally defined as a mapping A from
a method of comparison of protein interaction data the vertex (node) set V1 to the vertex set V2; A: i 2 V1 !
between two or more species and is one of the methods i0 2 V2. An edge (link) (i, j) 2 E1 is told to be aligned to
of ▶ comparative analysis of molecular networks. The an edge (i0 , j0 ) 2 E2 if and only if A(i) ¼ i0 and A(j) ¼ j0 .
method represents the interaction data in a form The type of the mapping A distinguishes whether
of a graph (▶ Graph, ▶ Protein-Protein Interaction the alignment is considered global or local. The global
Networks, ▶ Interactome) and constructs a mapping graph alignment (▶ Global Network Alignment) is
between the nodes of the graph (proteins) and the defined by a (total) injective mapping A. Thus, each
corresponding links (protein–protein interactions). node in the smaller network is aligned to some node in
G 862 Graph Alignment, Protein Interaction Networks

RPLD

RPLP RPLK
YML6

MRPL16 MRPL19
HP1409 UVRA

RPLO
MRPL4 MRP7

MRPL10

bacterium yeast

Graph Alignment, Protein Interaction Networks, Fig. 1 A sparse with only several interactions described in the databases.
small part of protein–protein interaction networks The yeast network on the other side forms an almost complete
(Protein–Protein Interaction Network) of a bacterium clique. The graph alignment, which is denoted by dashed lines,
(Helicobacter pylori) and yeast (Saccharomyces cerevisiae) may be used to predict physical interactions among the bacterial
representing protein components of the ribosome. Each node genes. The figure is based on Sharan et al. (2005) and updated
represents a protein and is labeled by its symbol. The links using STRING db version 8.3 (http://string-db.org)
stand for protein–protein interactions. The bacterial network is

the larger network. No two nodes from the smaller protein–protein interaction networks. Nodes (proteins)
network may be aligned to the same node in the larger in the same equivalence class are considered function-
network. The local graph alignment (▶ Local Network ally analogous.
Alignment) is defined by a partial injective mapping A.
Thus, only elements of a subset of the node set V1 are Score of a Graph Alignment
aligned to some nodes in V2. Still, no two nodes from The alignment is scored by interaction similarity
one network may be aligned to the same node in the and protein similarity. Each equivalence class of
other network. In some definitions of the graph align- aligned proteins (or a pair of aligned proteins in case
ment, the injective property of the mapping is lifted, of a pair-wise graph alignment) contributes to a node
allowing for alignment of two or more nodes of one score (▶ Node Score, Graph Alignment) Sn, which
network to the same node in the other network. This rewards similarity of the aligned proteins (e.g., level
option allows for the representation of, for example, of their homology, say a BLAST bit score) and penal-
protein duplications. izes similarity among proteins not respected by the
A multiple graph alignment (▶ Multiple Network graph alignment and its equivalence classes. Aligned
Alignment) of N graphs Gi(Vi,Ei), i ¼ 1, . . ., N is an protein pairs contribute a positive link score (▶ Link
equivalence relation A over the nodes V ¼ V1 [ . . . [ Score, Graph Alignment) Sl if the interaction between
VN. An equivalence relation is transitive and partitions the proteins is conserved in all or some networks. Thus,
V into a set of disjoint equivalence classes. A local the interaction between equivalence classes (MRPL16,
alignment is a relation over a subset of the nodes in V; RPLP, MRPL16) and (YML6, RPLD, MRPL4) in
a global alignment is a relation over all nodes in V. Fig. 3 would contribute a positive link score (Link
Figure 3 shows an example of an alignment of three Score, Graph Alignment), as the protein–protein
Graph Alignment, Protein Interaction Networks 863 G
both yeast
bacterium bacterium
YML6 yeast fish
YML6
RPLD
RPLD
MRPL4

MRPL16 MRPL19 MRPL16 MRPL19


RPLP RPLK RPLP RPLK
MRPL16 RLX1

MRPL4 MRP7 MRPL4 MRP7


HP1409 UVRA HP1409 UVRA
— MRPL27
G
MRPL10 MRPL10
RPLO RPLO
MRPL8

Graph Alignment, Protein Interaction Networks, Fig. 2 A


more concise representation of the alignment in Fig. 1 is created Graph Alignment, Protein Interaction Networks,
by overlaying the aligned nodes and coding the modality of the Fig. 3 Illustration of an alignment of multiple networks. In
links by line types. Here, full strong lines stand for the interac- addition to the bacterium and yeast, the protein–protein interac-
tions present in both species, dashed lines for the interactions tion network of fish (Oryzias latipes) also has been aligned. The
present in yeast only, and dotted lines for the interactions present nodes correspond to the equivalence classes and the protein
in the bacterium only symbols in the three species are given (from the top: yeast,
bacterium, and fish). The protein–protein interactions present
in each of the networks are given in different line type. The
interaction is present in all three networks. On the other equivalence class consisting of MRPLl4 in yeast and HP1409
side, the interactions present in a single network only in the bacterium is not represented in fish
would be penalized by the link score (Link Score,
Graph Alignment) (e.g., interaction between the (https://www.mi.fu-berlin.de/w/LiSA/Natalie), which
classes (MRPL10, RPLO, MRPL8) and (MRP7, employs Lagrangian relaxation.
UVRA, MRPL27) in Fig. 3). The node score (Node Graph alignment heuristics have been proposed based
Score, Graph Alignment) and the link score on three main ideas: The first kind of algorithm forces
(Link Score, Graph Alignment) form the full scoring alignment of orthologous proteins (▶ Orthologs) in the
function (▶ Scoring Function, Graph Alignment) of PINs. Thus, these aligners use only the information on
the graph alignment S ¼ Sn + Sl. the nodes of the network. This approach allows to iden-
tify ancestral networks, network parts enriched in con-
Construction of a Graph Alignment served edges, or to decide between paralogous genes.
The graph alignment problem stands in finding the The second approach utilizes only the information
highest scoring graph alignment among all possible on the interaction patterns of the proteins. It allows to
alignments of the protein–protein interaction net- discover common regulatory motives in PINs and to
works. As opposed to the sequence alignment, finding study phylogeny. Similarity of the aligned proteins is
an optimal graph alignment is a computationally hard not required in this approach as common topological
problem. The underlying subgraph isomorphism prob- structures are searched for.
lem, which asks if one ▶ graph exists as an exact The third strategy relies both on aligning homolo-
subgraph of the other graph, is ▶ NP-hard. This gous proteins and on aligning topologically analogous
means that no exact and efficient algorithm can be subnetworks. This is the most complete approach,
found. All current methods use some heuristic algo- which allows direct evolutionary comparison of the
rithms (▶ Heuristic Optimization) to find the best graphs. Several algorithms have been proposed. For
graph alignment with a notable exception of Natalie excellent reviews of particular algorithms and their
G 864 Graph Alignment, Protein Interaction Networks

applications, see references Sharan and Ideker (2006), complexity of the data leads also to its weakness: the
Chen et al. (2009), Stumpf and Wiuf (2009), Cannataro computational cost of the algorithms for finding of
et al. (2010), and Pržulj (2011). the optimal graph alignment is large.

Purposes of a Graph Alignment Algorithms and Tools


Similarly to sequence alignment the graph alignment There is a large variety of graph alignment algorithms
provides a broad applicability. The local graph alignment and tools; tools with academic licensing include:
has been used for detection of conserved network • GRAAL (http://bio-nets.doc.ic.ac.uk/GRAAL_suppl_
motives (▶ Network Motif) or pathways (▶ Pathway) inf)
among different species, for detection of paralogous • Græmlin (http://graemlin.stanford.edu)
pathways in a single species or for a database • GraphAlignment (http://bioconductor.org/pack-
search in which a small query pathway is searched ages/bioc/html/GraphAlignment.html)
within a large protein–protein interaction network • IsoRank (http://groups.csail.mit.edu/cb/mna)
of a target organism. • Natalie (https://www.mi.fu-berlin.de/w/LiSA/Natalie)
Graph alignment may further help in decision on • PathBlast (http://pathblast.org)
protein orthology in cases where sequence homology
is not conclusive enough (e.g., when proteins have
a weak sequence homology only or when several Cross-References
▶ paralogs exist). The global alignment immediately
suggests functional orthologs across species. In addi- ▶ Co-expression
tion, the method of graph alignment allows detection ▶ Comparative Analysis of Molecular Networks
of a functional replacement of a protein by an ▶ Gene Regulatory Networks
unrelated nonhomologous protein (non-orthologous ▶ Global Network Alignment
gene displacement). The alignment may also be ▶ Graph
used to predict the function of a protein based on ▶ Heuristic Optimization
the functional annotation of other proteins in the same ▶ Information, Biological
equivalence class. ▶ Interactome
Global graph alignment may be used to recover ▶ Link Score, Graph Alignment
species phylogeny. Topology-only-based alignments ▶ Local Network Alignment
have the potential to provide a completely new, inde- ▶ Metabolic and Signaling Networks
pendent source of phylogenetic information. ▶ Multiple Network Alignment
While comparison of the PINs has been given the ▶ Network Alignment
major focus in the recent years, mainly because of good ▶ Network Motif
quality and availability of the data, many applications ▶ Node Score, Graph Alignment
emerged also in comparison of other Biological Network ▶ Orthologs
Models, including protein contact maps (▶ Protein ▶ Paralogs
Contact Maps), ▶ metabolic and signaling networks, ▶ Pathway
gene ▶ co-expression networks, and ▶ gene regulatory ▶ Protein Contact Maps
networks (▶ Comparative Analysis of Molecular ▶ Protein-Protein Interaction Networks
Networks). ▶ Scoring Function, Graph Alignment

Strengths and Weaknesses


As the graph alignment employs for detection of anal- References
ogy of proteins two pieces of biological information
(▶ Information, Biological) – similarity of the Cannataro M, Guzzi PH, Veltri P (2010) Protein-to-protein
aligned proteins and similarity of the topology of interactions: technologies, databases, and algorithms. ACM
Comput Surv 43(1):1–36
their interactions – it is much stronger than sequence
Chen L, Wang RS, Zhang XS (2009) Biomolecular networks:
alignment and it may detect orthologous proteins also methods and applications in systems biology. Wiley, New
in case of a weak sequence similarity. However, the Jersey
Graph Mining 865 G
Pržulj N (2011) Protein-protein interactions: making sense of net- CH4 O2 H2
works via graph-theoretic modeling. Bioessays 33(2):115–123
Sharan R, Ideker T (2006) Modeling cellular machinery
through biological network comparison. Nat Biotech O C O
24(4):427–433
Sharan R, Ideker T, Kelley B, Shamir R, Karp RM (2005)
Identification of protein complexes by comparative analysis CO2 H2O
of yeast and bacterial protein interaction data. J Comp Biol
12(6):835–846 Graph Mining, Fig. 1 A molecule (carbondioxide) represented
Stumpf MPH, Wiuf C (2009) Statistical and evolutionary analysis as a graph (left) and a chemical interaction network depicting
of biological networks. Imperial College Press, London the oxidation of methane and hydrogen (right)

Graph Clustering

▶ Modules in Networks, Algorithms and Methods First, in the transactional graph mining setting, databases G
▶ Network Clustering of separate, independent graphs are considered.
For example, in a molecule database, molecules are
commonly represented using one vertex for every atom
Graph Mining and one edge for every bond between two atoms. Large,
publicly available databases of chemical compounds
Jan Ramon include the NCI dataset (http://cactus.nci.nih.gov/) and
Declarative Languages and Artificial the ZINC dataset (http://zinc.docking.org/).
Intelligence Group, Katholieke Universiteit Leuven, Second, in the single (large) network setting, all data
Leuven, Belgium is represented in one large, connected network. Exam-
ples of such networks include the Internet, social net-
works, citation networks, concept networks, computer
Synonyms networks, chemical interaction networks, gene regula-
tory networks, socioeconomic networks, and encyclope-
Learning from graph structured data; Network analysis dias. Sample datasets are publicly available at, among
others, http://snap.stanford.edu/data/. In a chemical
interaction network, molecules are represented by verti-
Definition ces connected by chemical reactions. The level of detail
and the exact representation may be different among
Graph mining is the study of how to perform ▶ data datasets. For example, chemical reactions may be
mining and machine learning (▶ Identification of Gene represented as separate nodes in the network with arcs
Regulatory Networks, Machine Learning) on data from/to the participating compounds, or they may be
represented with graphs. One can distinguish between, implicit, in which case compounds which are involved
on the one hand, transactional graph mining, where in the same chemical reaction are just connected
a database of separate, independent graphs is considered with an undirected edge. Next to networks of chemical
(such as databases of molecules and databases of compounds, it is also common to consider higher-level
images), and, on the other hand, large network analysis, networks such as protein interaction networks and gene
where a single large network is considered (such as regulatory networks. For example, in gene regulatory
chemical interaction networks and concept networks). networks nodes represent genes and arcs between
nodes indicate that one gene codes for a transcription
factor regulating the other gene. In comparison to the
Characteristics transactional setting, an important challenge in the single
network setting is that one’s beliefs on all data may be
Graph-Structured Data dependent on one another. Most traditional machine
In many applications, it is natural to represent data learning techniques assume that examples are drawn
with ▶ graphs. One can distinguish two main settings. identically and independently (i.i.d.) (Fig. 1).
G 866 Graph Mining

Other abstractions are sometimes preferred to graphs maximum common subgraph operators. Kernels on
in order to represent similar data, such as relational data- molecular graphs such as presented in (De Grave and
bases and logic. The domains focusing on data mining Costa 2010) can be used with any kernel-based learn-
using these representations are called relational data min- ing (▶ Learning, Kernel-based) method such as sup-
ing and inductive logic programming, respectively. port vector machines and Gaussian processes. Metrics
Representing data with graphs has several advantages. and maximum common subgraph operators can be used
First, the representation language is simple and therefore, in instance-based learning approaches, or as features for
allows for the fast development of algorithms. Second, a wide range of classification algorithms (Schietgat et al.
the representation language is expressive and adequate 2010).
for the majority of applications. Finally, there is a vast
literature on efficient graph algorithms. A potential dis- Methods for Analyzing Large Networks
advantage, especially in order to use algorithms Analyzing Overall Network Regularity
implemented only for simpler graph representation, is An important starting point for many methods
that it may be necessary to transform the data into for analyzing large networks is the observation that
a simpler (but equally expressive) graph format in large real-world networks, independently of the domain,
a preprocessing step. satisfy a number of statistical regularities. For example,
many networks satisfy the small world model, which
Transactional Graph Mining Methods informally corresponds to the fact that the number of
Graph mining methods cover the whole range of highly connected nodes is much smaller than the number
methods from data mining and machine learning. We of low degree nodes. Also, many networks can be clus-
only list here a few examples of methods which tered in modules of nodes which are much better
received significant attention in the literature. connected to each other than to nodes in other modules.
As a consequence, much inspiration has come from ran-
Graph Pattern Mining dom graph theory (Bollobás 2001; Durrett 2007) and
Graph pattern mining methods perform ▶ pattern mining spectral graph theory (Chung 1997), which study the
on graph-structured data, i.e., they list all patterns which statistical properties of such graphs. Alon (2007) dis-
satisfy some interestingness criterium such as being cusses motifs in biological networks and the surprising
frequent. A frequent pattern is a pattern which is deviation of frequencies of certain motifs from what one
a subgraph of at least a certain fraction of the transaction would expect if the given network were completely
graphs in the database. Well-known graph mining systems random.
are gSpan (Yan and Han 2002) and Gaston (Nijssen and
Kok 2004). Predicting Node Properties
A popular strategy for the application of these systems Often however, in addition to network-level regulari-
and related ones to quantitative structure–property rela- ties, also a more detailed node-by-node analysis of
tionship (QSPR) modeling (i.e., the modeling of the rela- a network is necessary. Several approaches aim at
tionship between the structure of molecules and their modeling properties of nodes in a network. First, in
chemical properties) is to first generate frequent molecular the field of statistical relational learning (Getoor and
fragments, then to generate one boolean feature per pat- Taskar 2007), probabilistic models are being studied
tern (with value 1 for molecules having the pattern as which allow to reason about beliefs of the properties of
substructure and with value 0 for other molecules), and individual nodes and their connections in a Bayesian
then to apply some suitable ▶ classification algorithm network manner. Second, semi-supervised learning
(such as a support vector machine (▶ Biomedical (Zhu and Goldberg 2009) aims at learning predictive
Decision Support Systems)) on these features. models exploiting not only the information about the
training examples but also the information about the
Comparing Graphs unlabeled examples. This is especially useful in net-
In order to compare small graphs, such as molecular works where nodes and their connections are known,
graphs, one can use graph kernels, graph metrics, and but not the value of some target attribute.
Graphical Gaussian Model 867 G
Cross-References
Graphical Gaussian Model
▶ Biomedical Decision Support Systems
▶ Classification Zhong-Yuan Zhang
▶ Data Mining School of Statistics, Central University of Finance and
▶ Learning, Kernel-Based Economics, Beijing, China
▶ Learning, Relational
▶ Learning, Supervised
▶ Learning, Unsupervised Synonyms
▶ Pattern Mining
Concentration graph model; Covariance selection
model
References
G
Alon U (2007) An introduction to systems biology. Chapman & Definition
Hall/CRC, Boca Raton
Bollobás B (2001) Random graphs. Cambridge University Press,
Cambridge Graphical Gaussian model (CGM) (Crzegorxczyk et
Chung F (1997) Spectral graph theory. AMS, Providence al. 2008; Hache et al. 2009; Werhli et al. 2006) is
De Grave K, Costa F (2010) Molecular graph augmentation an undirected graph whose nodes are genes and
with rings and functional groups. J Chem Inf Model
two genes are linked by an edge if there is an
50(9):1660–1668
Durrett R (2007) Random graph dynamics. Cambridge University interaction between them. The interactions are mea-
Press, Cambridge sured by the partial correlation coefficients condi-
Getoor L, Taskar B (2007) An introduction to statistical tioned on all the other genes. After the correlation
relational learning. MIT Press, Cambridge, MA
coefficients are determined, a statistical significance
Nijssen S, Kok J (2004) A quickstart in frequent structure mining
can make a difference. In: Proceedings of the 10th ACM test is employed, and the two genes whose score is
SIGKDD international conference, pp 647–652 significantly large are considered to have an inter-
Schietgat L, Costa F, Ramon J, De Raedt L (2010) Effective feature action. Otherwise they are considered to be condi-
construction by maximum common subgraph sampling.
Machine Learning 83(2):137–161
tional independence and there is no edge between
Yan X, Han J (2002) gSpan: graph-based substructure pattern them. Under the assumption that the data are dis-
mining. In: Proceedings of the 2002 international conference tributed according to a multivariate Gaussian distri-
on data mining (ICDM’02), pp 721–724 bution, the partial correlation coefficient of gene
Zhu X, Goldberg AB (2009) Introduction to semi-supervised
x and gene y is given as:
learning. Synth Lect Artif Intell Machine Learn 3:1–130

C1 xy
scoreðx; yÞ ¼  qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
C1xx Cyy
1

Graph Partitioning
1
▶ Modules, Identification Methods and Biological where C1
xy is the element of C , inverse of the covari-
Function ance matrix C of the data.
▶ Network Clustering

Cross-References
Graph Study of Metabolic Networks
▶ Identification of Gene Regulatory Networks,
▶ Topology of Metabolic Reaction Networks Machine Learning
G 868 Graphical Model

References References

Crzegorxczyk M, Husmeier D, Werhli AV (2008) Reverse engi- Koller D, Friedman N (2009) Probabilistic graphical models.
neering gene regulatory networks with various machine MIT Press, Massachusetts. ISBN 0-262-01319-3
learning methods. In: Emmert-Streib F, Dehmer M (eds)
Analysis of microarray data: a network-based approach.
Wiley GmbH, Weinheim, KGaA
Hache H, Lehrach H, Herwig R (2009) Reverse engineering of
gene regulatory networks: a comparative study. EURASIP
J Bioinform Syst Biol 2009:617281
Graphical Representation of Biological
Werhli AV, Grzegorczyk M, Husmeier D (2006) Comparative Processes
evaluation of reverse engineering gene regulatory networks
with relevance networks, graphical Gaussian models and ▶ SBGN
Bayesian networks. Bioinformatics 22(20):2523–2531

Green Systems Biology


Graphical Model
▶ Plant Systems Biology
Xiujun Zhang
Institute of Systems Biology, Shanghai University,
Shanghai, China
Green’s Theorem

Definition ▶ Partial Differential Equations, Poisson Equation

A graphical model is a way of representing probabilistic


relationships between random variables. In a graphical
model, variables are represented by nodes, conditional Grid Computing, Parallelization
independencies (independencies) are represented by Techniques
(missing) edges (Koller and Friedman 2009). Figure 1
gives a simple graphical model in which the nodes ‘A–E’ Giuseppe Agapito
indicate variables and the arrows indicate dependencies Department of Experimental Medicine and Clinic,
between variables. In this graphical model, ‘C’ depends University Magna Graecia of Catanzaro,
on ‘A’ and ‘B’, ‘E’ depends on ‘C’ and ‘D’. The two Catanzaro, Italy
most common forms of graphical models are directed
graphical models and undirected graphical models.
One of the most popular directed graphical model is Synonyms
Bayesian network.
Data parallel; Distributed data access; Distributed
data management; Integration technologies; Parallel
A B computing

Definition
C D

A Computational Grid is a collection of heterogeneous


computers and resources spread across the network mak-
E ing a confederation of multiple administrative domains
with the intent to provide users uniform access to these
Graphical Model, Fig. 1 Simple graphical model resources to reach a common goal. To provide an
Grid Computing, Parallelization Techniques 869 G
efficient use of resources of a Computational Grid, many Grid Computing is a special category of parallel
protocols to access resources have been developed. Each computing because it relies on a network of heteroge-
access protocol has unique security requirements and neous computers spread across the network that is in
implications for both the user and the resource provider. contrast to the traditional notion of a supercomputer,
Grid Computing was developed to provide scalable which has many processors connected by local
access to wide area distributed resources. high-speed connectors. The major advantage from the
Grid Computing is the easiness and cheapness
whereby it is possible to increase the computational
Characteristics power. The increasing of the computational power is
done by combining more resources such as traditional
Research in many areas of life sciences, such as geno- computers, clusters, and different kind of resources,
mics and proteomics, are particularly computationally thus obtaining a computational power similar to that of
intensive. This generated the need for a huge amount of a multiprocessor supercomputer but at a lower cost.
computational resources for running algorithms of Developing Grid applications presents significant G
ever-increasing complexity over data of ever- challenges, considering the high heterogeneity of
increasing size. Grid provides the optimal solution to resources and the dynamic behavior of Grid environ-
meeting many of the computational needs of research ments. Grid applications need to be designed to exploit
in life sciences as described in Krishnan (2004). One of the heterogeneous capacities of the available resources
the major challenges for the bioinformatics is to pro- and overcome the problems related to fluctuations in
vide the tools needed to analyze the sequences pro- both performance and availability of these shared
vided by whole genome sequencing. The existent data resources. In such context, it seems that writing effi-
are distributed in different domains (since data exper- cient and stable programs for the Grid is more difficult
iments are conducted in different laboratories and than write applications for traditional parallel machine.
research centers), reason for which Grids represent But nevertheless, it is possible to use classical pro-
the solution to develop applications able to speed up gramming techniques based on the exchange of mes-
the analysis of the whole genome, in order to predict sages to develop Grid applications. The most common
the function of a new gene, or identify important techniques are developed starting from the Message
regions in genomic sequences. Passing Interface (MPI). For scientific applications,
The computational Grids can be viewed such as the MPI specification is the most widely used, and
persistent environments that enable the realization of there exist several versions of MPI optimized for the
software able to integrate visualizations, computing Grid. The efficient and reliable execution of real sci-
and analysis resources belonging to different domains entific applications is the reason for which applications
and geographically distributed. for the Grid are developed using MPI (Nascimento
In Grid, the sharing is not limited to data, but it is et al. 2007). The Message Passing Interface Standard
extended to applications and hardware, and it is possi- is a message passing library, and its goal is to establish
ble to share the resources among different organiza- a portable, efficient, and flexible standard for writing
tions spread in the network. The access to resources is message passing programs.
regulated by different policies enabling the use to only To simplify the development of Grid-based appli-
authorized users. Furthermore, Grid allows the users to cations, the GridMPI and MPICH-G2 standards were
access and use resources and services spread across the developed.
network in a transparent way, without requiring that GridMPI is an implementation of MPI designed to
they are aware of the physical locations of the resource. obtain high-performance computing in the Grid.
In fact, supposing that a job submitted on a node of the GridMPI can establish multiple connections among
Grid crashes due to some reasons, the Grid automatically geographically distributed computers in order to obtain
resubmits the job on another available node. This is an a high computational power in an easy way.
advantage for the users that only have to submit their MPICH-G2 is a Grid-enabled implementation of
service requests at the Grid, which then automatically the MPI standard that allows the user to couple multi-
locates the available computing resources to serve the ple machines, potentially of different architectures, to
requests. run MPI applications.
G 870 Grid Computing, Parameter Estimation for Ordinary Differential Equations

Cross-References macromolecules and biological processes represented


by the models. Just to mention a few cases, these param-
▶ Cores eters can be the constants appearing in the Michaelis-
▶ GridMPI Menten kinetics (Nelson and Cox 2005), the association
▶ Message Passing Interface (MPI) constant for two proteins interacting to produce a protein
▶ MPICH-G2 complex or the diffusion coefficient of a protein between
▶ Parallel Computing two compartments (e.g., between the cytoplasm and
nucleus).
Due to the lack of experimental measurements,
References experimental errors, and biological variability, the
value of many of these parameters is yet unknown or
Krishnan A (2004) A survey of life sciences applications on the uncertain (Gunawardena 2010). Since variations of the
grid. New Gener Comp 2:111–125
parameter values can dramatically affect the system’s
Nascimento AP, Sena AC, Boeres C, Rebello VEF (2007)
Distributed and dynamic self-scheduling of parallel MPI trajectory over the phase space, the selection of
grid applications. Concurr Computat Pract Exper a “proper” set of parameter values is a crucial task for
19:1955–1974 the usability of a model as a tool for the in silico
investigation of the properties of the modeled system.

The Parameter Estimation Problem


Grid Computing, Parameter Estimation The parameter estimation problem of nonlinear
for Ordinary Differential Equations dynamical systems is stated as the minimization of
a cost function J(Y, Y∗) that measures the goodness
Ivan Merelli, Ettore Mosca and Luciano Milanesi of the model output Y∗ with respect to a given data set
Institute for Biomedical Technologies – CNR Y (experimental data), subject to constraints that are
(Consiglio Nazionale delle Ricerche), Segrate, the model itself f(dx/dt, x, p, t) ¼ 0 (where x are the
Milan, Italy variables and p parameters), the initial conditions
x(t0) ¼ x0, other possible algebraic equalities
g(x, p) ¼ 0 and inequalities h(x, p)  0, and the bounds
Synonyms over the parameter values pL  p  pU (Moles et al.
2003). Thus, it is mathematically defined as
Fitting of continuous and deterministic models; High a nonlinear programming problem with differential-
throughput computing algebraic constraints, shortly a NLP-DAE problem.

Global Optimization
Definition The nonlinearity and the constrained nature of the
system dynamics make usually these problems non-
The parameter estimation for ordinary differential equa- convex. This means that the NLP-DAE solution must
tions with grid computing refers to the exploitation of be searched with a global optimization (GO) method,
a combination of computer resources, usually character- since it is very likely that a local method would identify
ized by loose coupling, heterogeneity, and geographical a solution of local nature. GO strategies can be classi-
dispersion, to carry out the calculations required to fied in two broad classes, deterministic and stochastic:
identify a particular set of values for the parameters generally speaking, deterministic approaches can
included in continuous and deterministic models. assure a higher degree of assurance that the global
optimum will be reached, but no algorithm can guar-
antee the identification of the global optimum with
Characteristics certainty in finite time and, moreover, the computa-
tional cost increases very quickly with the problem
Systems biology kinetic models contain parameters that size; stochastic methods have weak theoretical guar-
usually describe physical and chemical properties of antees of convergence to global optimality, but can
GridMPI 871 G
locate the vicinity of it in modest computation time, after the previous swap. If this is the case, the script flags
and, moreover, are easy to implement and do not a specific field in the database that enables the exchange
require a transformation of the original problem when each process has completed the current step.
(Moles et al. 2003).
Evolution strategies (ES), a sub-class of nature-
inspired stochastic optimization methods belonging to Cross-References
the class of evolutionary algorithms (Fogel 2006), are
a good candidate to cope with NLP-DAE problems ▶ Convex Programming
exploiting grid platform. In fact, they have shown good ▶ Dynamical Systems Theory, Asymptotics and
performance when applied to parameter estimation in Singular Perturbations
biochemical pathways (Moles et al. 2003), and it is ▶ Evolutionary Algorithms
possible to implement them in a data parallel manner, ▶ Global Optimum
which is the best solution to obtain very good perfor- ▶ Grid Computing, Parallelization Techniques
mance with grid computing. More precisely, it is possi- ▶ High-Throughput Computing, Asynchronous G
ble to run different instances of an ES algorithm Communication
simultaneously, swapping periodically the best results ▶ Mathematical programming, Constraint
among the processes. Importantly, this approach speeds ▶ Mathematics, Nonlinear Programming
up the convergence to the optimal solution since a wider ▶ Ordinary Differential Equation (ODE)
search over the solutions space is performed. ▶ Parallel Computing, Data Parallelism

Distributed Implementation
To manage the distribution of each run of the ES References
algorithm on different computational resources and to
enable the asynchronous communication among these Fogel DB (2006) Evolutionary computation: toward a new phi-
losophy of machine intelligence. IEEE Press, Piscataway
runs, a software environment is required. A possible
Gunawardena J (2010) Models in systems biology: the parameter
solution, described in Mosca et al. (2008), involves problem and the meanings of robustness. In: Lodhi HM,
a relational database that stores the results achieved Muggleton SH (eds) Elements of computational systems
during each evolution process. More precisely, each biology. Wiley, Hoboken, pp 21–47
Moles CG, Mendes P, Banga JR (2003) Parameter estimation in
run is performed on a computational resource, which
biochemical pathways: a comparison of global optimization
can be completely independent of the others. The only methods. Genome Res 13(11):2467–2474
requirement is the availability of a communication Mosca E, Merelli I, Alfieri R, Milanesi L (2008) A distributed
network to establish a connection to the database. approach for parameter estimation in systems biology
models. Il Nuovo Cimento 32:165–168
Each run starts by contacting the main database to
Nelson DL, Cox MM (2005) Lehninger principles of biochem-
inform the system of its presence, then it randomly istry. WH Freeman, New York
initializes the population and runs the optimization.
After the accomplishment of each iteration, each pro-
cess stores its solutions along with the corresponding
cost function (that evaluates the quality of the solu- GridMPI
tion). If the other processes are ready to swap their
results, the algorithm downloads from the database Giuseppe Agapito
a subset of the solutions (e.g., sorted by the fitness Department of Experimental Medicine and Clinic,
value) from another randomly selected process and University Magna Graecia of Catanzaro,
adds them to its current set of results. Catanzaro, Italy
The coordination of the swap among the processes is
delegated to a script which runs in close association
with the database. This script queries the database to Definition
check if all processes are ready to exchange their indi-
viduals, which means that they have performed GridMPI is an open source project with the aim
a minimum number of iterations from the beginning or to provide an efficient and easy use of the MPI
G 872 Groove (G) Domain

(Message Passing Interface) standard in the Grid envi- to the major histocompatibility (MH) superfamily
ronments. GridMPI unlike from MPI comes with (MhSF) (▶ MH Superfamily (MhSF)). The G domain
a collection of functions and services specific for comprises the G-DOMAIN of the MH and the G-
the Grid environments that allow the user to over- LIKE-DOMAIN of the MhSF proteins other than MH
come in an easy way problems typical of the Grid, (or related proteins of the immune system (RPI)-
related with the development of Grid applications. MH1Like).
Typical examples regarding such problems often A G domain (G-DOMAIN or G-LIKE-DOMAIN)
include a connection between different resources is usually encoded by one exon of a gene. Two
and organizations in order to improve the compu- G domains participate to the characteristic groove
tation costs associated with scientific applications structure that, in the MH, binds a peptide. Each
tipically performed by limiting the latency time, G domain is very conserved and comprises four anti-
and managing and managing security problems parallel beta strands (floor of the groove) and a helix
automatically and in a transparent way. (wall of the floor). In the MH class I (MH1) or in the
RPI-MH1Like, the two G domains belong to the same
chain (MH1-Alpha, or RPI-MH1Like-Alpha), whereas
in the MH class II (MH2), the two domains belong
Cross-References
to different chains, MH2-Alpha and MH2-Beta
(Lefranc et al. 2005). The G-domain description per
▶ Grid Computing, Parallelization Techniques
receptor type and chain type, based on the ▶ IMGT-
ONTOLOGY concepts, is shown in Table 1. IMGT ®
labels are in capital letters.
Analysis of G-domain amino acid sequences can be
Groove (G) Domain performed by tools of IMGT®, the international
ImMunoGeneTics information system ® (http://www.
Marie-Paule Lefranc imgt.org) (▶ IMGT ® Information System).
Laboratoire d’ImmunoGénétique Moléculaire, IMGT/DomainGapAlign tool (Ehrenmann et al.
Institut de Génétique Humaine UPR 1142, Université 2010) aligns the user amino acid sequences with the
Montpellier 2, Montpellier, France closest G domains of the IMGT domain reference
directory, creates gaps according to the ▶ IMGT
unique numbering for G domain (Lefranc et al.
Synonyms 2005), delimits the strands, loops and helix, highlights
differences with the closest reference(s), and generates
G domain; G type domain; Groove domain the IMGT Colliers de Perles (▶ IMGT Collier de
Perles).
For MH proteins with known three-dimensional (3D)
Definition structures, IMGT Colliers de Perles are used for compar-
ison of pMH contact analysis (Kaas et al. 2008). Contact
The Groove (G) domain is a type of structural unit analysis between V-ALPHA and V-BETA (▶ Variable
(domain) that characterizes a protein chain belonging (V) Domain of the T cell receptor TR-Alpha, and

Groove (G) Domain, Table 1 G-domain description per receptor type and chain type
G domain Receptor type G-domain description per chain type
G-DOMAIN MH class I (MH1) G-ALPHA1 On the same chain
G-ALPHA2
MH class II (MH2) G-ALPHA
G-BETA
G-LIKE-DOMAIN MhSF other than MH (RPI-MH1Like) G-ALPHA1-LIKE On the same chain
G-ALPHA2-LIKE
Grouping 873 G
TR-Beta chains, respectively) and G-ALPHA1 and G- Lefranc M-P, Duprat E, Kaas Q, Tranne M, Thiriot A, Lefranc G
ALPHA2 (G domains of the MH1-Alpha chain) or G- (2005) IMGT unique numbering for MHC groove
G-DOMAIN and MHC superfamily (MhcSF) G-LIKE-
ALPHA and G-BETA (G domains of the MH2-Alpha DOMAIN. Dev Comp Immunol 29:917–938
and MH2-Beta chains, respectively) are available in the
3D database (IMGT/3Dstructure-DB) of the ▶ IMGT®
information system (Ehrenmann et al. 2010).

Groove Domain
Cross-References
▶ Groove (G) Domain
▶ IMGT Collier de Perles
▶ IMGT Unique Numbering
▶ IMGT ® Information System
▶ IMGT-ONTOLOGY Ground Substance G
▶ MH Superfamily (MhSF)
▶ Variable (V) Domain ▶ Extracellular Matrix

References
Grounding
Ehrenmann F, Kaas Q, Lefranc M-P (2010) IMGT/3Dstructure-
DB and IMGT/DomainGapAlign: a database and a tool for ▶ Entity Mention Normalization
immunoglobulins or antibodies, T cell receptors, MHC, IgSF
and MhcSF. Nucleic Acids Res 38:D301–307
Kaas Q, Duprat E, Tourneur G, Lefranc M-P (2008) IMGT
standardization for molecular characterization of the T cell
receptor/peptide/MHC complexes. In: Schoenbach C, Grouping
Ranganathan S, Brusic V (eds) Immunoinformatics,
Immunomics reviews, series of springer science and business
media LLC. Springer, New York, pp 19–49 (Chap. 2) ▶ Abstraction
H

Hairpin Characteristics

▶ Stem-Loop Formation and Stability


The formation of a hairpin structure is dependent on
the stability of the resulting helix and loop regions. The
first prerequisite is the presence of a sequence that can
Hairpin Loop fold back on itself to form a paired double helix. The
stability of this helix is determined by its length, the
▶ Stem-Loop number of mismatches or bulges it contains (a small
number are tolerable, especially in a long helix), and
the base composition of the paired region. Pairings
between guanine and cytosine have three hydrogen
Hairpin Structure bonds and are more stable compared to adenine–uracil
pairings, which have only two. In RNA, guanine–
Yan Zhang uracil pairings featuring two hydrogen bonds are as
Key Laboratory of Systems Biology, Shanghai well common and favorable. Base-stacking interac-
Institutes for Biological Sciences, Chinese Academy tions, which align the pi orbitals of the bases’ aromatic
of Sciences, Shanghai, China rings in a favorable orientation, also promote helix
formation. An example of a simple hairpin structure
in RNA is shown in Fig. 1.
Synonyms The stability of the loop also influences the formation
of the hairpin structure. “Loops” that are less than three
Stem-loop structure bases long are sterically impossible and do not form.
Large loops with no secondary structure of their own
(such as pseudoknot pairing) are also unstable. Optimal
Definition loop length tends to be about 4–8 bases long. One
common loop with the sequence UUCG is known as
Hairpin structure is a pattern that can occur in the “tetraloop” and is particularly stable due to the base-
single-stranded DNA or, more commonly, in RNA. The stacking interactions of its component nucleotides.
structure is also known as a stem-loop structure. It occurs
when two regions of the same strand, usually comple- Structural Contexts
mentary in nucleotide sequence when read in opposite Hairpin structures occur in pre-microRNA structures
directions, base-pair to form a double helix that ends in an and most famously in transfer RNA, which contain
unpaired loop. The resulting structure is a key building three true stem-loops and one stem that meet in
block of many RNA secondary structures. a cloverleaf pattern. The anticodon that recognizes a

W. Dubitzky et al. (eds.), Encyclopedia of Systems Biology, DOI 10.1007/978-1-4419-9863-7,


# Springer Science+Business Media LLC 2013
H 876 Half-Life

Krol J, Krzyzosiak WJ (2006) Structure analysis of microRNA


U C precursors. Methods Mol Biol 342:19–32
U A Wadkins RM (2000) Targeting DNA secondary structures.
Curr Med Chem 7(1):1–15
A G
A U
G C
Half-Life
C G
▶ Life Span, Turnover, Residence Time
C G
▶ Lymphocyte Population Kinetics
A U
G C
U A Half-life Time
C G Jean Clairambault
A C A A Equipe-Projet BANG, INRIA Paris-Rocquencourt
BP 105, Le Chesnay, Paris, France
Hairpin Structure, Fig. 1 Example of a hairpin structure
in RNA

Definition
codon during the translation process is located on one
of the unpaired loops in the tRNA. Two nested hairpin Half-life time (t1/2) of a drug in a tissue is the time
structures occur in RNA pseudoknots, where the loop it takes to decrease by one half from its maximum
of one structure forms part of the second stem. concentration. If the decay law is exponential with
Many ribozymes also feature hairpin structures. exponent l, i.e., obeys first-order kinetics with rate l,
dt ¼ lX, then t1=2 ¼ l . If the decay curve
i.e., dX ln 2
The self-cleaving hammerhead ribozyme contains
three stem-loops that meet in a central unpaired region looks like a sum of two exponential curves, then in
where the cleavage site lies. The hammerhead a phenomenological perspective it can be modeled as a
ribozyme’s basic secondary structure is required for solution curve to second-order kinetics or equivalently
self-cleavage activity. a chain of two first-order kinetics with exponents l and
The mRNA hairpin structure forming at the m (l > m), and then pharmacologists define a t1/2a and
ribosome binding site may control an initiation of a t1/2b (t1/2a < t1/2b) measurable from the decay curve,
translation. which are nothing but lnl2 and lnm2 , respectively.
Hairpin structures are also important in prokaryotic A semiphysiological interpretation can be found to
rho-independent transcription termination. The hairpin such kinetics, consisting of “distribution” (i.e., fast
loop forms in an mRNA strand during transcription diffusion) followed by slow elimination (renal, or by
and causes the RNA polymerase to become dissociated binding to plasma proteins). Sometimes are also
from the DNA template strand. This process is known mentioned three half-life times, t1/2a, t1/2b, and t1/2g
as rho-independent or intrinsic termination, and the when three consecutive episodes are clearly distin-
sequences involved are called terminator sequences. guishable in the decay curve.

References HAM/TSP
Svoboda P, Di Cara A (2006) Hairpin RNA: a secondary
structure of primary importance. Cell Mol Life Sci ▶ Human T-Lymphotropic Virus Type-I-associated
63(7–8):901–908 Myelopathytropical Spastic Paraparesis
HapMap 877 H
Cross-References
Hamartoma
▶ Explanation, Evolutionary
Barbara J. Davis
Section of Pathology, Tufts Cummings School
of Veterinary Medicine Biomedical Sciences, References
North Grafton, MA, USA
Frank SA (1998) The foundations of social evolution. Princeton
University Press, Princeton
Hamilton WD (1964) The genetical evolution of social behav-
Definition iour. J Theor Biol 7:1–52
Van Veelen M (2007) Hamilton’s missing link. J Theor Biol
Hartoma is a disorganized benign mass of 246(3):551–554
tissue found in the particular site at which it
develops and, therefore, considered a developmental
malformation.
Haplotype Map H
Cross-References ▶ HapMap

▶ Cancer Pathology

HapMap

Jingky Lozano-K€uhne
Hamilton’s Rule Department of Public Health, University of Oxford,
Oxford, UK
Philippe Huneman
Institut d’Histoire et de Philosophie (IHPST), des
Sciences et des Techniques, Université Paris 1 Synonyms
Panthéon-Sorbonne, Paris, France
Haplotype map; International HapMap project

Definition
Definition
Due to Hamilton (1964), the rule concerns the fix-
ation by natural selection of an altruistic behavior: The shortened term HapMap (from Haplotype Map) is
an act with cost c for its actor and benefit b for its generally used to refer to the International HapMap
beneficiary will evolve if and only if c < br, where project that aims to find genes affecting health, disease,
r is the ▶ relatedness of the beneficiary to the focal and responses to medications and environmental
individual. C and b are measured in terms of factors. The information produced by the project is
▶ fitness. Originally supposed to underpin the pro- stored in the HapMap database and made freely
cess of kin selection, it proved to be the general available to the public (Thorisson et al. 2005).
rule for the evolution of cooperation, given that
many cases where cooperation emerges among
non-kin behave according to such rule; the main
reason is that relatedness as such measures a statis-
References
tical association between individuals rather than
Thorisson GA, Smith AV, Krishnan L, Stein LD (2005) The
a degree of kinship, even if the latter yields ipso International HapMap Project Web site. Genome Res
facto an association. 15:1591–1593
H 878 Hardy-Weinberg Law

The hazard ratio can be calculated using the hazard


Hardy-Weinberg Law rate which are defined as
Prðt  T  t þ hjT  tÞ
Philippe Huneman lðtÞ ¼ limh!0þ (1)
h
Institut d’Histoire et de Philosophie (IHPST),
des Sciences et des Techniques, Université Paris 1 The hazard rate specifies the instantaneous rate at
Panthéon-Sorbonne, Paris, France which failures occur for items that are surviving at time
t and gives the risk of failure per unit time.
The hazard ratio is simply equal to HR ¼ lðtjgroup 1Þ
lðtjgroup 2Þ .
Definition If HR ¼ 1, the risk of occurrence of the event of
interest is the same in the two groups of patients. If
In a panmictic infinite population of diploid sexual HR > 1 (HR < 1), the risk of occurrence of the event of
individuals with no selection, the frequencies p and interest is higher in group 1 (in group 2, respectively).
(1p) of two alleles A and a (recessive) at one locus The hazard ratio can be calculated at different
would reach an equilibrium given by the Hardy- points in time. The hazard ratio differs from the
Weinberg formula: p2 AA, 2p (1p) Aa, (1p)2 aa. relative risk in the sense that this latter is calculated
The infinity of the population is requested in order to over the whole period of follow-up and not at a given
avoid the effects of genetic drift. time t.
When the hazard ratio is constant over time, the
hazards are said to be proportional. This is the assump-
Cross-References tion made in proportional hazards models.
An example of proportional hazards ratio is given in
▶ Explanation, Evolutionary Fig. 1. It represents the disease-free survival curves of
patients with acute leukemia classified into three risk
categories (Copelan et al. 1991). These categories were
References defined according to the status of the patients at the
time of transplantation as follows: acute lymphoblastic
Gillespie J (2004) Population genetics – a concise guide. The leukemia (ALL) (38 patients), acute myelocytic
John Hopkins University Press, Baltimore
leukemia (AML) low risk (54 patients), and AML
high risk (45 patients). The three curves are parallel
to each other showing that the hazards are constant
over time and thus proportional. Patients in group
Hazard Ratios AML high risk have the greatest chance of failure.
Patients in group AML low risk have the best
Sigrid Rouam prognosis.
Université Paris-Sud, Villejuif, France Figure 2 displays an example of non-proportional
hazards ratio. It represents the survival curve of
gastric cancer patients receiving two different treat-
Definition ments: chemotherapy (45 patients) and chemother-
apy plus radiotherapy (45 patients) (Stablein and
The hazard ratio is a measure commonly used in Koutrouvelis 1985). Clearly, the hazards are not
survival analysis to compare the risk of occurrence proportional. The figure shows that, at the begin-
of an event of interest (e.g., death) in two groups ning, patients receiving chemotherapy have a better
(e.g., treatment group vs. control group) at a given prognosis than patients with chemotherapy plus
time. For example, the hazards ratio can be used to radiotherapy. But after 2.7 years, the survival func-
describe the outcome of therapeutic trials where the tions of the two groups intersect, and patients
question is to what extent treatment can delay the with chemotherapy plus radiotherapy have a better
apparition of a disease. prognosis.
Hazard Ratios 879 H
Hazard Ratios,
1.0
Fig. 1 Disease-free survival
for the 137 bone marrow
transplant patients

0.8

0.6

Pr(T>t)
0.4

0.2
H
ALL
AML Low Risk
0.0 AML High Risk

0 500 1000 1500 2000 2500


t (days)

1.0 chemotherapy
chemotherapy + radiotherapy

0.8

0.6
Pr(T>t)

0.4

0.2

0.0
Hazard Ratios,
Fig. 2 Survival for the 0 500 1000 1500 2000 2500 3000
90 gastric cancer patients t (days)
H 880 Health Informatics

References References

Copelan EA, Biggs JC, Thompson JM, Crilley P, Szer J, Coiera E (2003) Guide to health informatics. Hodder, London
Klein JP, Kapoor N, Avalos BR, Cunningham I,
Atkinson K, Downs K, Harmon GS, Daly MB, Brodsky I,
Bulova SI, Tutschka PJ (1991) Treatment for acute myelo-
cytic leukemia with allogeneic bone marrow transplantation
following preparation with BuCy2. Blood 3(1):838–843 Helper T Cell 1 Response
Stablein DM, Koutrouvelis IA (1985) A two-sample test sensi-
tive to crossing hazards in uncensored and singly censored
data. Biometrics 41(3):643652 ▶ Th1 Response

Health Informatics Helper T Cell 2 Response

Guy Tsafnat ▶ Th2 Response


Centre for Health Informatics, Australian Institute for
Health Innovation, University of New South Wales,
Sydney, NSW, Australia
Heterochromatin

Definition Vani Brahmachari and Shruti Jain


Dr. B. R. Ambedkar Center for Biomedical Research,
Health Informatics is a broad concept that covers University of Delhi, Delhi, India
all aspects of the use of information technology (IT)
in the provision and implementation of health care
(Coiera 2003). This includes the use of IT to: Synonyms
• Facilitate communication at any level of a health
care system, from the communication between cli- Closed chromatin; Compact chromatin
nicians and patients all the way to share clinical
records from multiple electronic health record sys-
tems in a large-scale study Definition
• Implement and maintain electronic health record
management systems, ensuring appropriate privacy The heterochromatin is the compacted, tightly packed
and access to the data chromatin which is inaccessible to the transcription
• Implement and maintain monitoring and reporting machinery and is intensely stained with DNA binding
systems to ensure health service quality and patient dyes. Two types of heterochromatin are present in
safety cells: constitutive heterochromatin in the centromeres
• Develop decision support systems for clinicians, that largely remains transcriptionally silent in all cells
clinical researchers, and translational researchers and facultative heterochromatin which is transiently
Health Informaticians come from a range of silenced. Besides, being highly compacted, hetero-
background including social sciences, psychology, chromatin is enriched in epigenetic modifications of
human-computer-interaction and ergonomics, com- ▶ histone and DNA associated with transcriptional
munication science and linguistics, ethnography inactivity.
and information technology. As electronic health
record management systems become more preva-
lent, health informatics becomes more central to Cross-References
health systems enabling better monitoring and
quality control, improve decision making, and ▶ Epigenetics
communication. ▶ Histones
Heterochromatin and Euchromatin 881 H
Constitutive heterochromatin is observed in almost
Heterochromatin and Euchromatin all eukaryotic cells and localized at repeated sequences
and transposons and also localized at ▶ centromere
Yota Murakami and telomere, where repeated sequences are enriched.
Department of Chemistry, Hokkaido University, Constitutive heterochromatin is stably maintained and
Sapporo, Japan suppresses transcription and recombination of the
embedded DNA sequences. Facultative heterochroma-
tin is observed in higher eukaryote and represses tran-
Synonyms scription of protein coding genes during many cell
generations. Formation of facultative heterochromatin
Silent chromatin and active chromatin is developmentally regulated.
The chromatin higher order structures are defined
by covalent histone modifications (▶ Histone Post-
Definition translational Modification to Nucleosome Structural
Change) (Fig. 1). Heterochromatin is generally hypo-
Heterochromatin and euchromatin are two major acetylated. Constitutive heterochromatin is defined by H
categories of chromatin higher order structure. Hetero- methylation of lysine 9 of histone H3 and its binding
chromatin has condensed chromatin structure and is proteins, ▶ heterochromatin protein 1 family proteins.
inactive for transcription, while euchromatin has loose Facultative heterochromatin is defined by methylation
chromatin structure and active for transcription. Het- of lysine 27 of histone H3 and its binding partner,
erochromatin is further divided into two subcategories: polycomb family proteins (▶ Polycomb Complexes).
constitutive and facultative heterochromatin. Hetero- In contrast, euchromatin is generally hyper-acetylated
chromatin and euchromatin are defined by specific and avoids methylation of histone H3 lysins 9 and 27
histone modifications. Since heterochromatin can and has various states of histone modifications that
spread into neighboring euchromatic region and tightly correlate with the transcription activities.
repress gene expression, it is important to regulate Note that heterochromatin of budding yeast does not
boundaries between euchromatin and heterochroma- have the methylation mark of the histones and is
tin. Generally, the balance of euchromatic and hetero- defined by hypo-acetylated state and Sir proteins.
chromatic histone-modifying enzymes determines the Heterochromatin is “inactive” chromatin, which
boundary. At particular region, sequence specific ele- prevent DNA metabolism such as transcription and
ments and their binding proteins define the boundary. recombination (Fig. 1). The basis of the inactiveness
In addition, the boundary elements determine the has been thought the tight packaging of the nucleosome
domain structure of genome that is important for array, which prevents access of enzymes promoting the
wide range of regulation of transcription through asso- DNA metabolism. However, recent study suggests that
ciation of nuclear structure or self-association. not only the condensed structure but also recruitment of
“effector” proteins to heterochromatin by heterochro-
matin proteins are responsible for repression of the
Characteristics transcription (Grewal and Jia 2007). Intriguingly, an
effector that activates transcription is also recruited
▶ Chromatin forms various higher order structure to heterochromatin, suggesting that transcription in
and the structure is classified into two categories; the heterochromatin could be actively regulated
▶ heterochromatin and ▶ euchromatin. Heterochroma- (Grewal and Jia 2007). One of the mechanisms
tin has condensed chromatin structure: ▶ nucleosomes that regulates the recruitment of the effectors are
are regularly positioned and packed tightly. In contrast, post-translational modifications of heterochromatin
euchromatin has loose nucleosomes structure and the proteins, including phosphorylation, but the precise
extent of the packaging of nucleosomes varies from mechanism is not clear yet (Shimada and Murakami
region to region and dynamically regulated. 2010). Hence, heterochromatin is a dynamic
Heterochromatin is further divided into two catego- chromatin structure rather than an “inactive” static
ries, constitutive and facultative heterochromatin. chromatin structure.
H 882 Heterochromatin and Euchromatin

Heterochromatin and Histone modifications


Euchromatin,
Acetyl group
Fig. 1 Heterochromatin and
euchromatin Euchromatic modifications
nucleosome Euchromatin Heterochromatic modifications

Proteins involved in
DNA metabolism HP1 family proteins (H3K9me)
Polycomb family proteins (H3K27me)

Heterochromatin

Spreading

Heterochromatin
Histone
modifying
protein Euchromatin

Heterochromatin and Histone


Euchromatin, modifying
Fig. 2 Spreading of protein
Heterochromatin

Similarly, various combinations of histone modifica- which convert neighboring euchromatic nucleosome to
tions in euchromatin attract various kinds of proteins that heterochromatic state (Fig. 2). To avoid the uncontrolled
regulate transcription, resulting in wide range of tran- silencing of the gene expression by spreading of hetero-
scriptional states, from silent state to active state. chromatin, the “boundaries” between euchromatin and
One characteristic of heterochromatin is “spreading”; heterochromatin should be strictly regulated.
heterochromatin can autonomously spread into neigh- Heterochromatin and euchromatin are determined
boring euchromatic region, resulting in ▶ position effect by histone modification. Therefore, primary determi-
variegation of neighboring genes. The exact mechanism nant of the boundaries would be the competition
of spreading is not clear yet, but a simple model is widely between the heterochromatic modifying enzymes
accepted that heterochromatin protein recruits histone- and the euchromatic ones. Indeed, the competitive
modifying enzymes via protein-protein interaction, determination mechanism is observed at the boundary
Heterochromatin and Euchromatin 883 H
Heterochromatin and
Euchromatin, Histone Chromatin
modifying remodeling
Fig. 3 Boundary elements/
protein factor
barrier insulators

Heterochromatin

TFIIIC
CTCF Euchromatin
BEAF

Boundary element
(Barrier insulator) H

between telomeric heterochromaitn and its neighbor- Nuclear structure


ing euchormatin in budding yeast (Kimura et al. 2002).
Since competitive determination of the boundaries
Self-association
results in fluctuation of the boundaries, at certain
region of the genome, the boundary should be tightly
regulated to prevent the stochastic silencing of genes
close to heterochromatin by spreading. Some DNA
elements have been shown to have the barrier function
against heterochromatin spreading in various organ-
isms and are called boundary elements or barrier insu-
lators (Fig. 3). The boundary elements exhibit the
barrier function through their binding proteins. In bud-
ding yeast, tRNA genes, which are transcribed by RNA
polymerase III (▶ RNA Polymerase), prevent the Euchromatin
spreading of heterochromatin (Lunyak 2008; Sun
Heterochromatin
et al. 2011). The proteins assembled onto the tRNA
(Sun et al. 2011), including RNA polymerase III and its Boundary element binding protein
regulatory proteins are required for the barrier function.
In fission yeast, tRNA genes also act as the boundary
Heterochromatin and Euchromatin, Fig. 4 Chromatin
element (Scott et al. 2006). Interestingly, TFIIIC, domains and boundary elements
a tRNA specific transcription factor, act as the boundary
protein without tRNA transcription in fission yeast
(Noma et al. 2006). In higher eukaryote, several DNA
sequence specific DNA binding proteins, such as BEAF Search for the proteins that shows barrier activity
in Drosophila and CTCF in mammals, are shown to act when tethered close to heterochromatin identified
as boundary proteins. These proteins are thought to ▶ nuclear pore proteins (Ishii et al. 2002). This suggests
recruit specific proteins including chromatin remodeling that anchoring of genomic region to nuclear structure
factors or histone-modifying enzymes, to establish the including nuclear envelope is one of the determinants to
barrier against heterochromatin spreading (Fig. 3) establish the boundary. Supporting this idea, some of
(Lunyak 2008; Sun et al. 2011). the boundary proteins associate with nuclear structure
H 884 Heterochromatin Protein 1

(Vogelmann et al. 2011). In addition, tRNA genes as Sun JQ, Hatanaka A, Oki M (2011) Boundaries of transcription-
well as the boundary proteins including TFIIIC form ally silent chromatin in Saccharomyces cerevisiae. Genes
Genet Syst 86:73–81
cluster in nucleus (Noma et al. 2006). The association Vogelmann J, Valeri A, Guillou E, Cuvier O, Nollmann M
with nuclear structure or self-clustering of the (2011) Roles of chromatin insulator proteins in higher-order
boundary elements results in the formation of chro- chromatin organization and transcription regulation. Nucleus
matin loop, and this chromatin loop defines silent 2:358–369
heterochromatic domain and active euchromatic
domain of the genome. Therefore, the boundary
elements regulate not only the boundary between Heterochromatin Protein 1
heterochromatin and euchromatin but also domain
structure of chromatin in the nucleus, which might Yota Murakami
be important for regulation of whole genome orga- Department of Chemistry, Hokkaido University,
nization (Vogelmann et al. 2011) Fig. 4. Sapporo, Japan

Synonyms
Cross-References
HP1
▶ Centromere
▶ Chromatin
▶ Euchromatin Definition
▶ Heterochromatin
▶ Heterochromatin Protein 1 Heterochromatin protein 1 (HP1) is widely
▶ Histone Post-translational Modification to conserved protein that localized at heterochromatin.
Nucleosome Structural Change HP1 is a main component of constitutive heterochroma-
▶ Nuclear Pore tin that localizes on repeated sequences including
▶ Nucleosomes centromere and telomere. HP1 contains two distinct
▶ Polycomb Complexes domains, chromo domain and chromo-shadow domain.
▶ Position Effect Variegation Chromo domain of HP1 specifically recognizes di- or tri-
▶ RNA Polymerase methylated lysine 9 of histone H3. Chromo-shadow
domain promotes dimerization of HP1, which is thought
to be responsible for the packaging of nucleosomes in
heterochromatin. In addition, many proteins interact
References with HP1 through chromo-shadow domain (Grewal and
Jia 2007). Those proteins play various roles in the
Grewal SI, Jia S (2007) Heterochromatin revisited. Nat Rev
Genet 8:35–46
regulation of heterochromatin function (Fanti and
Ishii K, Arib G, Lin C, Van Houwe G, Laemmli UK Pimpinelli 2008).
(2002) Chromatin boundaries in budding yeast: the nuclear
pore connection. Cell 109:551–562
Kimura A, Umehara T, Horikoshi M (2002) Chromosomal gradi-
ent of histone acetylation established by Sas2p and Sir2p func-
Cross-References
tions as a shield against gene silencing. Nat Genet 32:370–377
Lunyak VV (2008) Boundaries. Boundaries. . .Boundaries??? ▶ Heterochromatin and Euchromatin
Curr Opin Cell Biol 20:281–287
Noma KI, Cam HP, Maraia R, Grewal SIS (2006) A role for
TFIIIC transcription factor complex in genome organization.
Cell 125:859–872
Scott K, Merrett S, Willard H (2006) A heterochromatin barrier References
partitions the fission yeast centromere into discrete chroma-
tin domains. Curr Biol 16:119–129 Fanti L, Pimpinelli S (2008) HP1: a functionally multifaceted
Shimada A, Murakami Y (2010) Dynamic regulation of hetero- protein. Curr Opin Genet Dev 18:169–174
chromatin function via phosphorylation of HP1-family pro- Grewal SI, Jia S (2007) Heterochromatin revisited. Nat Rev
teins. Epigenetics 5:30–33 Genet 8:35–46
Heuristic Search 885 H
The common disadvantages of heuristic optimiza-
Heuristic Optimization tion algorithms are as follows:
1. Not absolute optimal solution: Heuristic algorithms
Feng-Sheng Wang and Li-Hsunan Chen cannot guarantee to find the optimal solution.
Department of Chemical Engineering, National Chung 2. Uncertainty: The time required for finding a “near-
Cheng University, Chiayi, Taiwan optimal” solution can be large in an unlucky case.

Definition Cross-References

Heuristic designates a computational procedure that ▶ Evolution Programming


determines an optimal solution by iteratively trying to ▶ Genetic Algorithms
improve a candidate solution with regard to a given ▶ Particle Swarm Optimization (PSO)
measure of quality. Heuristics make few or no assump- ▶ Simulated Annealing
tions about the problem being optimized and can search ▶ Tabu Search
large spaces of candidate solutions toward finding opti- H
mal or near-optimal solutions at a reasonable computa-
tional cost without being able to guarantee either References
feasibility or optimality, or even in many cases to state
how close to optimality a particular feasible solution is. Michalewicz Z (1996) Genetic algorithms + data
structures ¼ evolution programs. Springer, Berlin
Heuristics implement some form of stochastic search
Reeves CR (1995) Modern heuristic techniques for combinato-
optimization, such as ▶ evolution programming, evolu- rial problems. McGraw-Hill, Londres
tion strategy, ▶ genetic algorithms, genetic program- Sharda R, Voß S, Woodruff DL, Fink A (2003) Optimization
ming, and differential evolution (Michalewicz 1996; software class libraries. Springer, New York, pp 81–154
Zhilinskas A, Žilinskas A (2008) Stochastic global optimization.
Reeves 1995; Sharda et al. 2003; Zhilinskas and Springer, New York
Žilinskas 2008). Other methods having a similar mean-
ing as heuristic are derivative-free, direct search, and
black-box optimization techniques.
Heuristic Search

Characteristics Long Jason Lu1 and Minlu Zhang2


1
Division of Biomedical Informatics, Cincinnati
Heuristic optimization algorithms are developed in all Children’s Hospital Research Foundation, Cincinnati,
kinds of forms variant from simple “trial and error” to OH, USA
2
complicated algorithms as evolutionary algorithms. The Department of Computer Science, University of
methods are easy to understand and easy to implement Cincinnati, Cincinnati, OH, USA
and use. The mathematical formulation of the problem is
flexible. Heuristic optimization can be applied to itera-
tively solve continuous/integer problem. They are usually Definition
applied when there is no known algorithm that guarantees
for finding the optimal solution in efficient computational Heuristic search refers to a search strategy that
cost (i.e., time or memory space) or when a “near-opti- attempts to optimize a problem by iteratively improv-
mal” solution is good enough in practical use. ing the solution based on a given heuristic function or
The common advantages of heuristic optimization a cost measure. A heuristic search method does not
algorithms are as follows: always guarantee to find an optimal or the best solu-
1. Fast: They can find a “near-optimal” solution in tion, but may instead find a good or acceptable solution
a short time. within a reasonable amount of time and memory space.
2. Small: They can work in a relatively small memory Several commonly used heuristic search methods
space. include hill climbing methods, the best-first search,
H 886 Hfq

the A* algorithm, simulated-annealing, and genetic References


algorithms (Russell and Norvig 2003). A classic
example of applying heuristic search is the traveling Brennan RG, Link TM (2007) Hfq structure, function and ligand
binding. Curr Opin Microbiol 10:125–133
salesman problem (Russell and Norvig 2003).
Chao Y, Vogel J (2010) The role of Hfq in bacterial pathogens.
Curr Opin Microbiol 13:24–33
Jousselin A, Metzinger L, Felden B (2009) On the facultative
Cross-References requirement of the bacterial RNA chaperone. Hfq Trends
Microbiol 17:399–405
▶ Biological Applications of Network Modules
▶ Modules in Networks, Algorithms and Methods

References Hierarchic Mechanisms


Russell SJ, Norvig P (2003) Artificial intelligence: a modern ▶ Mechanism, Multilevel
approach. Prentice Hall, Upper Saddle River

Hfq Hierarchical Agglomerative Clustering

Tatsuhiko Someya1, Nobukazu Nameki2 and Marie Lisandra Zepeda-Mendoza and Osbaldo
Gota Kawai3 Resendis-Antonio
1
University of Tsukuba, Tsukuba, Japan Center for Genomics Sciences-UNAM, Universidad
2
Gunma University, Kiryu, Japan Nacional Autónoma de México, Cuernavaca,
3
Department of Life and Environmental Sciences, Morelos, Mexico
Chiba Institute of Technology, Narashino,
Chiba, Japan
Synonyms

Synonyms Agglomerative hierarchical clustering; Agglomerative


hierarchical data segmentation; Bottom-up hierarchi-
Host factor 1 cal clustering

Definition Definition

Escherichia coli Hfq is a 102 residues protein Cluster analysis consists to classify a set of
that is first identified as a host factor of the RNA objects (observations, individuals, cases) into sub-
phage Qb (Chao and Vogel 2010). Hfq orthologous sets, called clusters, such that they have similar
proteins are conserved among approximately half of characteristics or properties. There are different
all sequenced Gram-positive and Gram-negative bacte- ways to define the similarities among objects or
ria (Brennan and Link 2007). Hfq protein forms variables through the use of different metrics.
homohexameric ring and binds to poly(A) and single- Some of them are as follows:
stranded AU-rich RNAs at two distinct binding surfaces • The single-linkage clustering, or nearest neighbor
(Brennan and Link 2007). In Gram-negative bacteria, clustering, takes into account the shortest distance
Hfq promotes base-pairing between sRNA and its target of the distances between the elements of each clus-
mRNA by inducing the RNA structural changes acting ter. This is one of the simplest methods.
as an RNA chaperone (Jousselin et al. 2009). The • The complete linkage clustering, or farthest neigh-
sRNA–mRNA interactions are known to regulate the bor clustering, takes the longest distance between
expression of several genes (Jousselin et al. 2009). the elements of each cluster.
Hierarchical Structure 887 H
• The average linkage clustering takes the mean of
the distances between the elements of each cluster. Hierarchical Structure
The merged clusters are the ones with the minimum
mean distance. Marie Lisandra Zepeda-Mendoza and Osbaldo
There are a variety of clustering algorithms; one Resendis-Antonio
of them is the agglomerative hierarchical clustering. Center for Genomics Sciences-UNAM, Universidad
This clustering method helps us to represent Nacional Autónoma de México, Cuernavaca, Morelos,
graphically the results through a dendogram. The Mexico
dendogram has a tree structure that consists of the
root and the leaves; the root is the cluster that has all
the observations, and the leaves are individual obser- Synonyms
vations. The agglomerative hierarchical clustering
starts with the individual observations and succes- Hierarchical modularity; Hierarchical organization
sively fuses the clusters that are closer together
(the most similar ones).
Definition H
Cross-References A hierarchical structure in a network context is
characterized by being a topological structure in
▶ Modules, Identification Methods and Biological which nodes are placed in different layers; and in
Function these layers there can be small and highly connected
modules. The nodes in one layer collect influences
from each other and from the nodes of the superior
References layer, while also being able to influence nodes of the
inferior layer.
Jain AK, Dubes RC (1998) Algorithms for clustering data. The transcriptional regulatory network of
Prentice Hall, Englewood Cliffs
Escherichia coli shows a clear example of this view
Jain AK, Murthy MN, Flynn PJ (1999) Data clustering: a review.
ACM Comput Rev 31(3):264–323 of hierarchical structure. The nodes are organized in
layers where every node of one layer can receive
inputs from nodes in upper layers and can be linked
to nodes in lower layers (Resendis-Antonio et al.
2005).
Hierarchical Models In metabolic networks, the high size-independent
clustering coefficient (which is evidence for modu-
▶ Mixed and Multi-Level Models larity) and the power law degree distribution (which
supports the scale-free model that would rule out
modular topology) pose an apparent contradiction
that can be solved by proposing the existence
of metabolic hierarchical modules (Ravasz et al.
Hierarchical Modularity 2002). This model presents a C(k)k1, and this
property can be used as a quantitative indicator of
▶ Hierarchical Structure hierarchy.
To better understand this model, imagine a starting
point of a small cluster of four densely connected
nodes. Next generate three replicas of the starting
module and connect them to the central node of the
Hierarchical Organization first starting cluster, obtaining a large 16-node module
made of 4 smaller modules. Then generate 3 replicas of
▶ Hierarchical Structure this 16-node module and connect the peripheral nodes
H 888 Hierarchy

to the central node of the first starting cluster. These


steps of replication and connection can be repeated High Throughput Computing
indefinitely (Ravasz et al. 2002).
▶ Grid Computing, Parameter Estimation for Ordinary
Differential Equations
Cross-References

▶ Hierarchy
▶ Modules, Identification Methods and Biological
Function Highly Structured System
▶ Module Network
▶ Complex System

References

Ravasz E, Somera AL, Mongru DA, Oltvai ZN, Barabasi AL


(2002) Hierarchical organization of modularity in metabolic
networks. Science 297:1551
High-Performance Computing
Resendis-Antonio O, Freyre-Gonzalez J, Menchaca-Mendez R,
Gutierrez-Rios RM, Martinez-Antonio A, Avila-Sanchez C, ▶ Large-Scale and High-Performance Computing
Collado-Vides J (2005) Modular analysis of the transcrip- ▶ Multicore Computing
tional regulatory network of E. coli. Trends Genet 21:16

High-Performance Computing,
Hierarchy Structural Biology
Shihua Zhang
Piercarlo Dondi and Luca Lombardi
National Center for Mathematics and Interdisciplinary
Department of Computer Engineering and Systems
Sciences, Academy of Mathematics and Systems
Science, University of Pavia, Pavia, Italy
Science, Chinese Academy of Sciences, Beijing, China

Synonyms
Synonyms
Parallel computing
Hierarchical structure

Definition
Definition
High Performance Computing (HPC) refers to
Hierarchy is defined as an arrangement of items or
technologies used for implementing systems able to
simply an ordered set. In networks, hierarchy describes
execute time expensive elaborations and to manage
the hierarchical relationship among nodes (Ravasz
a huge amount of data in a small amount of time.
et al. 2002).
HPC solutions are commonly exploited in different
scientific fields that require the solution of complex
mathematical models, like climatology, physics,
References medicine, or biology. One of the most recent innova-
tions, which presents a good compromise between
Ravasz E, Somera AL, Mongru DA, Oltvai ZN, Barabási AL
(2002) Hierarchical organization of modularity in metabolic hardware cost and performances, is the use of the
networks. Science 297(5586):1551–1555 ▶ GPU for parallel computation. This technology
High-Performance Computing, Structural Biology 889 H
High-Performance Serial Execution
Computing, Structural
Biology, Fig. 1 Serial versus
parallel computation, the
efficiency of parallelization
depends on the relative portion Op1 Op2 Op3 Op4 Op5
of parallelizable operations
with respect to the total
(Eq. 3). Not parallelizable Parallelizable Operations Not parallelizable
Operation Operation

Parallel Execution

Op2

H
Op1 Op3 Op5

Op4

supplies promising results in simulation and modeling processors must operate independently and have
of biological systems and in real-time medical a workload equally distributed between them; the
analysis. computational time has the priority, time needed for
the communications; and the synchronization between
processors and for data transfer has to be reduced to
Characteristics the minimum (Culler et al. 1998). Figure 1 shows an
example of serial and parallel execution of a simple
Parallel Computing algorithm.
HPC (▶ Large-Scale and High-Performance Comput- Nearly all situations in which a parallel
ing) comes from the need of more and more great implementation is useful can be reduced to two cases: a
computational power to elaborate complex mathemat- general case in which different independent instructions
ical models and to evaluate new scientific theories. can be executed on different data, or a single operation
Technical and physical constraints limit the maximum that must be executed on a large amount of data (e.g.,
speed reachable by a single CPU, so all HPC solutions meteorological analysis – each unit examines in the same
involve the use of parallel computing resources, in way a small portion of a geographic area) (Hord 1998).
which multiple processing units cooperate to complete In most cases, biological and medical simulations fall
a task in a time as small as possible. Not all types of into the latter category.
problems can be parallelized (e.g., a computation in The performance of a parallel algorithm are usually
which each step depends on the outcomes of the measured by two related indexes: the speedup (Eq. 1),
previous one) and also if the parallelization is possible, that is, the ratio between the execution time with one
some conditions must be satisfied to work properly. processor and that with N processors, and the efficiency
It is crucial to redesign the program (the so-called (Eq. 2), that is, the ratio between the speedup and the
porting of the application) in a parallel way: the number of processors:
H 890 High-Performance Computing, Structural Biology

1500

GeForce GTX 480


1250 GPU
CPU
Theoretical GPLOP/s

1000
Geforce GTX280

750

GeForce 8800 GTX


500

GeForce 7800 GTX Core i7-980X


250
Extreme Edition
GeForce 6800 Ultra Xeon DP 5160 Xeon DP X5460
GeForce FX5800
0 Core i7 940
Pentium 4
Jan-03 Jul-03 Jan-04 Jul-04 Jan-05 Jul-05 Jan-06 Jul-06 Jan-07 Jul-07 Jan-08 Jul-08 Jan-09 Jul-09

High-Performance Computing, Structural Biology, Fig. 2 Growing rate of computational power for CPU and GPU in terms of
single precision floating point operations per second – based on data presented in Kirk and Hwu (2010)

Tð1Þ entertainment, generated a growth rate of the computing


SðNÞ ¼ (1) power of GPUs incredibly high, higher than that of the
TðNÞ
CPU (Fig. 2) – actually a GPU is a multi-core device
SðNÞ composed by hundreds of simple processor units.
EðNÞ ¼ (2) Between 2001 and 2002, a crucial enhancement in
N
the architecture of the graphic cards was achieved: the
Their theoretical best values are, respectively, introduction of programmable modules. This evolution
S(N) ¼ N and E(N) ¼ 1, but actually these indexes gave the capability of a more complex interaction with
depend on the fraction of the algorithm that is the GPU, and allowed to employ a series of realistic
effectively parallelizable. Amdahl’s law (Eq. 3) graphic effects unreachable with the precedent models
defines the maximum reachable speedup: P is the based on fixed steps and instructions (Kirk and Hwu
parallelizable part of the code, where smaller is P 2010). The high level of flexibility lets the program-
the farther the speedup from the optimal value. In the mers consider a new series of applications that are not
worst case an increase of the number of processors directly involved the graphic, but could successfully
N generates a small (or null) enhancement of the exploit the great computational power of graphic hard-
performance (Culler et al. 1998): ware (e.g., matrices calculations or convolutions). The
research field that studies the use of GPUs for parallel
1 computation is called GPGPU (General-Purpose com-
SðNÞ  (3)
ð1  PÞ þ NP puting on GPU).
Even if first experiments performed promising
The available hardware for parallel computing is results and a good speedup, they were still limited by
heterogeneous: multi-core processors, distributed architectural constraints and by the complexity of
computing, clusters, ▶ grid computing, FPGAs, and, a low-level programming model. The interest of scien-
more recently, GPUs. tific community in this area (in optimal conditions
a GPU can increase the performances hundreds of
GPGPU times with respect to a CPU) led the main developers
The need of realistic and detailed 3D models and of graphic cards (ATI and Nvidia) to the creation of
environments in very different fields, from science to new models completely programmable and enabled for
High-Performance Computing, Structural Biology 891 H
parallel computing. At the end of 2006, Nvidia intro- intense parts of their tools obtaining significant
duced CUDA (Computer Unified Device Architecture) enhancements. The visualization of molecular orbitals
the first GPU architecture designed for allowing paral- (MOs) generally required up to hundreds of seconds on
lel computing (Kirk and Hwu 2010). Instead, for its a CPU; with the CUDA implementation a high-quality
hardware, ATI supplied Stream technology that adopts rendering is achievable in less than a second (Stone et al.
the standard OpenCL (Open Computing Languages) 2009). This time reduction allowed the creation of
(Munshi et al. 2011) developed by Khronos Group the first interactive animations of quantum chemis-
(a no-profit consortium that includes up to now ATI try simulation trajectories using real-time computa-
and Nvidia). At the moment CUDA directly supports tion. Another example involves the analysis of
specific extensions for the most common programming the interaction of biological molecules and ions.
languages (e.g., C/C++, Fortran, or Python) and it is The ion placement can be very computationally
the most diffused technology for GPGPU. Nvidia also demanding: large structures, such as viruses, could
provides a particular category of graphic card, the need several days even using moderately sized
Tesla series, created expressly for HPC. With respect clusters of computers. However the independency
to a traditional GPU, a Tesla card renounces at the of data makes this problem ideal for the porting on
video output for high reliability, for a major quantity GPU, for example, the Coulomb-based ion place- H
of memory, and for the capability of a continued and ment can reach a speedup of 100 times or more.
intensive use. In November 2010, three of the top five The implementation developed by UIUC reduces
supercomputers in the world (including the first) the time to generate large ionized structures to
employ Tesla cards. a few minutes on a single desktop computer
The best performance in parallel computing with (Stone et al. 2007).
a GPU can be achieved using it as a sort of stream In modern medical imaging, one of the most
processor, which means that a GPU is most efficient if important goals is the production of highly detailed
the same operation is applied on a large amount of images in a short period of time, in particular for
independent data. Typical applications that can take human scanning, to be able to give more quick
advantages of graphic acceleration involve image elab- diagnosis.
oration (e.g., filtering or ▶ mathematical morphology Techniscan Medical Systems recently created
operators), video processing, wheatear forecasting, a new system for ultrasound scanning, the Whole
geological modeling, fluid dynamics, medical imag- Breast Ultrasound (WBU™). However, even with
ing, molecular modeling, and biological structure a cluster of 16 computers equipped with the latest
simulations. Xeon processors, the procedure takes too much time
to examine numerous patients in a day. A new
Application Examples approach, that employs four Tesla GPUs, is able to
The GPU computing allows the researchers to use run Techniscan’s algorithm in less than 20 min, less
desktop computers for complex computations that than half the time taken by the cluster (Hardwick
up to now were a prerogative of large computer 2009). This speedup allows radiologists to perform a
clusters. In consequence the integration of graphic complete ultrasound scan and to see the results during
cards in a traditional cluster can supply the com- a 30-min patient visit. Also by the economic point of
puting power needed to cope with more complex view the hardware cost of a GPU solution is cheaper
classes of problems. The state of art and the prac- than adopting a traditional cluster.
tical impact of HPC with GPU in biology and
medicine can be illustrated by the following
examples. Cross-References
NAMD (Nanoscale Molecular Dynamics) and VMD
(Visual Molecular Dynamics), developed by the Uni- ▶ GPU
versity of Illinois at Urbana-Champaign (UIUC), are ▶ Grid Computing, Parallelization Techniques
two widely used tools for simulating and visualizing ▶ Grid Computing, Parameter Estimation for Ordinary
biomolecular processes and interactions. The UIUC Differential Equations
researchers ported on TESLA the most computationally ▶ Mathematical Morphology Operators
H 892 High-Throughput Computing, Asynchronous Communication

References
Hill Equation
Culler D, Singh JP, Gupta A (1998) Parallel computer architec-
ture: a hardware/software approach. Morgan Kaufmann,
Pramod R. Somvanshi and Kareenhalli V. Venkatesh
San Francisco
Hardwick J (2009) Embedded Tesla-using CUDA and Tesla in Department of Chemical Engineering, Indian Institute
a medical device. In: GTC09, GPU technology conference of Technology Bombay, Powai, Mumbai,
2009, 30 Sept–2 Oct, San Josè Maharashtra, India
Hord RM (1998) Understanding parallel supercomputing.
Piscataway, IEEE
Kirk D, Hwu W (2010) Programming massively parallel
processors: a hands-on approach. Morgan Kaufmann, Synonyms
Burlington
Stone JE, Phillips JC, Freddolino PL, Hardy DJ, Trabuco LG,
Hill function; Hill kinetics
Schulten K (2007) Accelerating molecular modeling appli-
cations with graphics processors. J Comput Chem
28:2618–2640
Stone JE, Saam J, Hardy DJ, Vandivort KL, Hwu WW, Definition
Schulten K (2009) High performance computation and
interactive display of molecular orbitals on GPUs and
multi-core CPUs. In: 2nd workshop on general-purpose Several molecular interactions in cellular systems
processing on graphics processing units, ACM international exhibit sigmoidal response curve to variations in the
conference proceeding series, vol 383. ACM, New York, input concentrations. Such a response curve is
pp 9–18
typically represented by Hill equation as given below.
Munshi A, Gaster B, Mattson TG, Fung J, Ginsburg D (2011)
OpenCL programming guide. Addison-Wesley Profes-  
sional, 1st edition, July 2011, Boston I nH
Y¼ (1)
K0:5 þ I nH
nH

where Y is the output response and I is the input


concentration. Hill equation involves two parameters,
Hill Coefficient ðnH Þ and half-saturation constant (K0.5).
High-Throughput Computing,
While Hill coefficient characterizes the sensitivity of the
Asynchronous Communication
response, the half-saturation constant quantifies the
threshold concentration required for 50% output
Ettore Mosca, Ivan Merelli and Luciano Milanesi
response. Hill equation is typically used to quantify
Institute for Biomedical Technologies – CNR (Consiglio
cooperativity, where the initial binding of an effecter
Nazionale delle Ricerche), Segrate, Milan, Italy
molecule (ligand, activator) to the receptor enhances
the binding of the forthcoming effecter molecules
(Goldbeter and Dupont 1990). This is observed in allo-
Definition
steric regulation of enzymes and of ligands binding to
their respective macromolecules. The equation describes
Asynchronous communication is a third component
saturation of the receptor molecules as a function of the
mediated communication in which sender and receiver
effector concentration (Klipp et al. 2005).
are not concurrently engaged in communication. The
main feature of asynchronous communication is the
transmission of data without the use of an external
signal for coordination, which results in a non-blocking
Characteristics
policy of the message exchange.
In 1910, Archibald Vivian Hill was the first to introduce
this equation to describe equilibrium relationship
between oxygen tension and the saturation of hemoglo-
References
bin with oxygen. He observed a sigmoidal binding
Eiben AE, Smith JE (2003) Introduction to evolutionary com- curve of oxygen with hemoglobin, which revealed the
puting, (Natural computing series). Springer, Berlin cooperative kinetics of O2 binding to hemoglobin.
Hill Equation 893 H
To mathematically represent this phenomenon, he intro- Substituting [RLn ] from Eqs. 3 in 5, we obtain
duced a cooperativity coefficient, which was termed as
Hill coefficient (Hill 1910, 1913). ½Ln
Y¼ n (6)
Hill Equation is a rate law, which is used to model ½L þ Kd
biological interactions that demonstrate sigmoidal
response. The equation is used to capture the biomolec- This is the Hill equation for the receptor-ligand
ular interaction that exhibit cooperativity among two binding kinetics. In biomolecular reactions, for
binding molecules. The bound subunit has the coopera- any input function (I), the above equation can be
tive effect on the binding of the next subunit by increas- written in terms of (I) by replacing (L), which
ing its affinity toward the binding region, which in turn gives the generalized form of Hill equation as
increases the rate of reactions as compared to the single given in Eq. 1.
receptor-ligand reaction (Weiss 1997; Klipp et al. 2005). The rates of the reactions with allosteric regulation
Hill equation is the general formalism used to study (that exhibit cooperativity) can be represented using
emergent properties, such as ▶ ultrasensitivity, ▶ ampli- Hill equation, as given below,
fication, and ▶ bistability, in biological networks.
Further, Hill equation is also used extensively in many Vmax I nH H
v ¼ Vmax Y ¼ (7)
PK-PD models to describe the nonlinear drug-dose nH
K0:5 þ I nH
response relationship (Goutelle et al. 2008).
where v and Vmax represent the net rate and maximum
Receptor-Ligand Binding Kinetics rate of reaction, respectively. The plot of the fractional
Hill equation represents the physicochemical equilib- saturation Y versus I yield a typical input-output
rium between the reacting species and can be derived response curve. The nature of the curve varies based
based on the ▶ law of mass action. We consider upon the Hill coefficient (Fig. 1). The Hill coefficient
a reaction of n number of ligands binding to of one gives a typical Michaelis-Menten hyperbolic
a receptor which is given by response while Hill coefficient greater than one yields
a sigmoidal curve with the inflection point at K0.5. The
R þ nL $ RLn (2) analogy of the receptor-ligand interaction kinetics yield-
ing Hill equation, can be applied to several biomolecular
Appling the law of mass action, the equilibrium reactions such as binding of transcription factor to the
dissociation constant Kd is given by promoter in a gene regulatory network, enzymatic reac-
tions in metabolic network, and phosphorylation cycles
½R½Ln in signaling pathways. This equation can also represent
Kd ¼ (3)
½RLn  the kinetics of repression of a biomolecular reaction
which is given by
where [L], [R] and [RLn] are the molecular concentra-  
nH
tions of the ligand, receptor, and ligand-receptor K0:5
Y¼ (8)
complex. At equilibrium condition, the total receptor K0:5 þ RnH
nH

concentration is given as,


where Y is the fractional rate of expression, R is the
RT ¼ ½R þ ½RLn  (4) concentration of repressor molecule, K0.5 is half-
saturation constant, and nH is the Hill coefficient
where, [RT] is the total receptor concentration. (Alon 2007). In this case, it can be noted that the output
The fractional saturation of the receptor (Y) is given response Y tends to zero as the repressor concentration
by the ratio of the number of bound receptors to the increases to infinity (see Eq. 8).
total number of receptors,
Parameter Estimation for Hill Equation
Hill coefficient ðnH Þ and the half-saturation constant
½RLn  (K0.5) are the two parameters used in the Hill equation.
Y¼ (5)
½RLn  þ ½R These parameters can be obtained by linearizing the
H 894 Hill Equation

Hill Equation, response


1 Positive cooperative
Fig. 1 Typical input-output rasens itive-Sigmoidal-
response obtained using Hill hH = 2 =Ult response
0.9 on cooperative
equation for different values of Vmax
nten -Hyperbolic-n
lis-Me
Hill coefficient ichae
0.8 2 = 1-M
hH e response
cooperativ
nsitive -Negative
0.7 hH =0.5-Subse

Fractional Response
0.6

0.5

0.4

0.3

0.2
K0.5
0.1

0
0 1 2 3 4 5 6 7 8 9 10
Input Concentration in Arbitrary Units (AU)

3 Input Fractional
in mM response
2.5
I Y
0.62 0.1 Slope = ηH =1.89
2
1.65 0.4
1− Y

1.5 2.3 0.6


Y

3.25 0.75
1
ln

6.6 0.9
0.5

0
−0.5 0 0.5 1 1.5 2
−0.5 ln I

−1
Y-intercept = − ηH ln K0.5 = −1.264
−1.5 K0.5 = 1.95 mM

Hill Equation, −2
Fig. 2 Graphical evaluation
of parameters of Hill equation −2.5

Hill equation (Eq. 1). The linearized form of Hill estimate the half-saturation constant (K0.5) (Covel
equation is given by: 1970). Figure 2 shows an example to demonstrate the
graphical evaluation of these parameters.
 
Y
ln ¼ nH ln I  nH ln K0:5 (9)
1Y Hill Coefficeint and Cooperativity
Hill coefficient provides the measure of cooperativity
By plotting the LHS of the equation against ln I one that can be quantified based on the steepness of the
can obtain the Hill coefficient (nH ) as the slope of the binding curve saturation (Goldbeter and Dupont 1990).
curve and the intercept of Y-axis can be used to The measure of the steepness of the curve is captured
Histone Chaperones 895 H
by Hill coefficient, which depicts the variation in the Hill AV (1913) The combinations of haemoglobin with oxygen
response curves from hyperbolic to sigmoidal. and carbon monoxide. Biochem J 7:471–480
Klipp E, Herwig R, Kowald A, Wierling C, Lehrach H (2005)
The Hill coefficient is computed based on the fold Systems biology in practice concepts, implementation and
change in input stimuli required to take a response application. WILEY-VCH, Weinheim
from 10% activation to 90% activation. Koshland DE (1987) Switches, thresholds and ultrasensitivity.
Trends Biochem Sci 12(6):225–229
Vinod PK, Venkatesh KV (2008) Quantification of signaling
log ð81Þ networks. J Indian Inst Sci 88:1
nH ¼   (10)
Weiss JN (1997) The Hill equation revisited: uses and misuses.
log II9010 FASEB J 11:835–841

A typical Michaelis-Menten hyperbolic response has


nH ¼ 1, which requires 81-fold change in the input stim- Hill Function
ulus to bring 90% of maximum response. A fractional
Hill coefficient, that is nH < 1, indicates negative ▶ Hill Equation
cooperativity where binding of one ligand decreases the
affinity for the binding of the other. The response gener- H
ated through negative cooperativity is termed as Hill Kinetics
subsensitivity where more than 81-fold change in the
input stimulus is required to obtain 90% of maximum ▶ Hill Equation
activation (see Fig. 1). The response indicates positive
cooperativity when nH > 1 leading to a sigmoidal
response, which is also termed as ▶ ultrasensitivity. In Histone Chaperones
such a case, less than 81-fold change in input stimulus
is required to obtain 90% of the maximum activation (see Toshiya Senda1 and Naruhiko Adachi2
1
Fig. 1) (Vinod and Venkatesh 2008; Koshland 1987). Biomedicinal Information Research Center (BIRC),
National Institute of Advanced Industrial Science
and Technology (AIST), Tokyo, Japan
Cross-References 2
Structure-guided Drug Development Project,
JBIC Research Institute, Japan Biological Informatics
▶ Amplification Consortium, Tokyo, Japan
▶ Bistability
▶ Law of Mass Action Definition
▶ Pathway Modeling, Metabolic
▶ Ultrasensitivity Histone chaperones are factors that interact with
histones and mediate nucleosome assembly, disassem-
bly, or both in an ATP-independent manner. They play
References critical roles in various nuclear events. The first
identified histone chaperone, nucleoplasmin, was
Alon U (2007) An introduction to systems biology-design
isolated in 1978 by Laskey (Wolffe 1998). Since
principles of biological networks. Taylor & Francis Group/
Chapman & Hall/CRC, Boca Raton then, many histone chaperones have been isolated
Covel ML (1970) Analysis of Hill interaction coefficients of (Eitoku et al. 2008). Histone chaperones can be
the kwon and brown equation. J Biol Chem 245(23):6335–6336 categorized by the preference of histone binding,
Goldbeter A, Dupont G (1990) Allosteric regulation, cooperativity,
namely, those with a preference for histones H3–H4
and biochemical oscillations. Biophys Chem 37(1–3):341–353
Goutelle S, Maurin M, Rougier F, Barbaut X, Bourguignon L, and those with a preference for histones H2A–H2B.
Ducher M, Maire P (2008) The Hill equation: a review of its Histone chaperones are involved in histone storage,
capabilities in pharmacological modeling. Fundam Clin histone transfer, histone exchange (between old and
Pharmacol 22:633–648
newly synthesized histones, and between canonical
Hill AV (1910) The possible effects of the aggregation of the
molecules of hemoglobin on its dissociation curves. J Physiol and variant histones), and nucleosome structural
40:4–7 change, which are required for most DNA-mediated
H 896 Histone Covalent Modification

reactions in eukaryotes. Several lines of evidence change have been studied as separate research fields,
have shown that histone chaperones play critical the findings of these two fields are being integrated into
roles in transcription, replication, and DNA repair a comprehensive picture as the molecular mechanisms
(Eitoku et al. 2008). linking histone PTMs to nucleosome structural change
continue to be discovered.
In 1964, Murray reported methylation of histones in
Cross-References the cell. This was probably the first report of histone
modification. Soon after that, the Allfrey group revealed
▶ Histone Post-translational Modification to that histones are subjected to acetylation and methylation
Nucleosome Structural Change after the completion of the histone polypeptide chain,
and that there is a correlation between histone acetylation
and transcriptional activation (Allfrey et al. 1964).
References Although this discovery suggested the connection
between histone post-translational modifications
Eitoku M, Sato L, Senda T, Horikoshi M (2008) Histone (PTMs) and nuclear events, the molecular mechanism
chaperones: 30 years from isolation to elucidation of
underlying the observed correlation was at that time
the mechanism of nucleosome assembly and disassembly.
Cell Mol Life Sci 65:414–444 unknown due to lack of a sufficient knowledge of the
Wolffe AP (1998) Chromatin, structure function, 3rd edn. chromatin template and biological signaling in the
Academic, San Diego nucleus. It took more than 40 years of effort for
researchers to begin to understand the molecular mech-
anisms connecting histone PTMs and transcription acti-
Histone Covalent Modification vation; for these breakthroughs to take place, substantial
advances were required in three research fields, the
▶ Post-translational Modifications, Histone structure of the chromatin template, factors involved in
nucleosome structural change, and nuclear signaling
with histone PTMs.
In 1974, Kornberg discovered the nucleosome struc-
Histone Modification ture (Kornberg and Lorch 1999). The nucleosome core
particle was considered to be a complex of the histone
▶ Post-translational Modifications, Histone octamer, which comprises two copies of each canonical
histone (H2A, H2B, H3, and H4), and DNA. As was
easily recognized from the tertiary structure of the nucle-
osome, the nucleosome structure inhibits enzyme reac-
Histone Post-translational Modification tions with DNA such as transcription, replication, and
to Nucleosome Structural Change DNA repair due to tight interactions between DNA and
histone proteins. Therefore, the nucleosome structure
Naruhiko Adachi1 and Toshiya Senda2 must be disassembled to initiate these reactions. On the
1
Structure-guided Drug Development Project, JBIC other hand, the nucleosome structure is necessary to
Research Institute, Japan Biological Informatics stably maintain genetic information in the nucleus. To
Consortium, Tokyo, Japan resolve the functional conflicts of the nucleosome, regu-
2
Biomedicinal Information Research Centre (BIRC), lation of nucleosome assembly and disassembly is
National Institute of Advanced Industrial Science and required.
Technology (AIST), Tokyo, Japan Two types of factors involved in the
nucleosome structural change, histone chaperones
and ATP-dependent nucleosome-remodeling factors,
Characteristics have so far been identified. Nucleoplasmin, which is
categorized as a histone chaperone, was the first factor
Although histone post-translational modification found to have a nucleosome structural change activity
(PTM) and factors involved in nucleosome structural (Wolffe 1998; Eitoku et al. 2008). Although many
Histone Post-translational Modification to Nucleosome Structural Change 897 H
histone chaperones have been found since then, their identified, as summarized in Fig. 1 (Allis et al. 2006).
biological significance had remained elusive until the These findings established the notion that histone
functional identification of the histone chaperone PTMs are recognized by their corresponding recognition
CIA/Asf1 (Eitoku et al. 2008). Genetic, biological, domains, which are likely to function as adaptor domains
and biochemical studies showed that CIA/Asf1 is linking histone PTMs and chromatin factors, such as
involved in various nuclear events such as transcrip- ATP-dependent nucleosome-remodeling factors and his-
tion, replication, and DNA repair through nucleosome tone chaperones.
structural change. Indeed, ATP-dependent nucleosome-remodeling
The first-identified ATP-dependent nucleosome- factors contain histone PTM recognition domains.
remodeling factor, the SWI/SNF complex, was isolated For example, the Rsc4 subunit of RSC contains tandem
as a complex that contained the gene product of swi2/ bromodomains that physically interact with acetylated
snf2. The swi2/snf2 gene was initially identified from Lys14 of histone H3 (H3-K14) on the histone tail
mutant yeast strains that showed the SWI and SNF region, suggesting a relationship between histone acet-
phenotypes. Since these phenotypes were not observed ylation and the activity of RSC. Genetic and biological
in mutant yeast strains harboring destabilized chromatin analyses have suggested that RSC is involved in
structures, the swi2/snf2 gene was considered to be site-specific nucleosome remodeling in transcription H
functionally relevant to the chromatin structure. in response to histone PTMs including acetylated
A purified protein complex including the swi2/snf2 H3-K14. The connection between acetylation of his-
gene product showed ATP-dependent nucleosome- tone proteins and nucleosome structural change seems
remodeling activity (Turner 2001; Elgin and Workman to be mediated by the tandem bromodomains in the
2001). After this finding, several ATP-dependent nucle- Rsc4 subunit.
osome-remodeling factors such as NURF, RSC, ACF, In contrast with ATP-dependent nucleosome-
CHRAC, NuRD/NURD, and INO80 were isolated. Bio- remodeling factors, no histone chaperones have histone
chemical and biological studies have shown that ATP- PTM recognition domains in the molecule. The
dependent nucleosome-remodeling factors adjust the Horikoshi group, however, discovered physical and
position of nucleosomes and are involved in nuclear genetic interactions between histone chaperone CIA/
events such as transcription and DNA replication Asf1 and double bromodomains in the general transcrip-
(Eberharter and Becker 2004; Elgin and Workman tion initiation factor TFIID (Chimura et al. 2002),
2001). leading to the idea that histone acetylation could
The last critical advance is attributed to functional regulate the function of the histone chaperone CIA/Asf1.
and mechanistic studies of histone PTMs. The discov- A molecular model that connects histone PTMs and
ery of a histone acetyltransferase and the subsequent transcription activation through nucleosome structural
identification of several histone modification enzymes change by histone chaperone CIA/Asf1 was proposed
accelerated studies of histone PTMs and their on the basis of two crystal structures, CIA/Asf1–H3–H4
functions (Allis et al. 2006). Since some transcriptional and CIA/Asf1–double-bromodomain complexes. Struc-
coactivators contain a histone acetyltransferase domain/ ture-based biochemical, genetic and molecular biologi-
subunit, the connection between histone PTMs and cal analyses have suggested that acetylated histones
nucleosome structural change that is required for tran- recruit histone chaperone CIA/Asf1 via the double
scription activation came to be recognized. It was pro- bromodomain in the TFIID complex to a promoter
posed that histone PTMs themselves affect the structure region, resulting in histone eviction around the promoter
of chromatin through changing the electrostatic and/or site (Fig. 2) (Akai et al. 2010).
physical properties of nucleosomes. However, this The correlation between histone PTMs and transcrip-
hypothesis seemed to be insufficient to explain all nucle- tion activation (through nucleosome structural change)
osome structural changes required for transcriptional was discovered about 45 years ago. In the past 45 years,
activation. The situation was advanced by the structural structural details of chromatin, the signaling system with
and functional analysis of bromodomains. These studies histone PTMs, and the structural and functional relation-
revealed that bromodomains recognize acetylated ship of histone chaperones and ATP-dependent chroma-
lysines on histone proteins. Following this finding, tin-remodeling factors have been revealed. These
several histone PTM recognition domains have been findings have begun to be integrated such that the
H 898 Histone Post-translational Modification to Nucleosome Structural Change

Histone Post-translational Modification to Nucleosome nucleosome-remodeling factors contain a histone PTM recogni-
Structural Change, Fig. 1 Histone PTM recognition domains. tion domain in the complex. (The relationships are shown by
Histone acetylation (red), methylation (cyan), and phosphoryla- thick purple lines.) ATP-dependent nucleosome-remodeling
tion (yellow) recognition domains are shown in blue, green, and factors are shown in purple with labels. (Names of the subunit
orange, respectively. The names of the domains are labeled. containing a histone PTM recognition domain are given in
Histone chaperones interacting with these histone PTM recog- parentheses.) No recognition domains have been reported for
nition domains are shown in red. Usually ATP-dependent histone ubiquitination (green) and arginine methylation (cyan)

Histone Post-translational Modification to Nucleosome Biochemical, genetic, and structural analyses indicate that the
Structural Change, Fig. 2 A schematic representation of the double bromodomain of TFIID recruits CIA to specific promoter
hi-MOST model (linking the biological signaling from histone regions. Recruited CIA evicts histones and promotes RNA poly-
modifications to structural change of the nucleosome). merase II entry. BrD bromodomain, Ac acetylation

connection between histone PTMs and the nucleosome and DNA dissociation from the nucleosome – remain
structural change can be reasonably explained at the elusive. In addition, much is unknown about the signal-
molecular level. However, our understanding of these ing mechanism with histone PTMs. Although the histone
molecular processes is still limited. For example, the code hypothesis was proposed to explain the signaling
details of the molecular mechanism for nucleosome dis- mechanism with histone PTMs (Allis et al. 2006), the
assembly by histone chaperones – such as H2A–H2B hypothesis is too simple to explain the results of several
Histopathology 899 H
mutational analyses of the histone tail region. In order to
understand the whole picture of histone PTM signaling, Histones
a macroscopic theory that considers the whole network
of histone PTM signaling is required. A combination of Vani Brahmachari and Shruti Jain
experimental and theoretical approaches should lead to Dr. B. R. Ambedkar Center for Biomedical Research,
a comprehensive understanding of molecular processes University of Delhi, Delhi, India
between histone modification and various nuclear events
including nucleosome structural change.
Synonyms

Cross-References Chromatin proteins; Nucleosome

▶ Histones
▶ Nucleosome Structure Definition
▶ Nucleosomes
▶ Post-translational Modifications Histones are highly basic proteins with molecular H
weights ranging from 11 to 23 kDa and form the
basic unit of ▶ nucleosomes. There are four core
References histones named H2A, H2B, H3, and H4; two subunits
of each interact to form the histone octamer. Histone
Akai Y, Adachi N, Hayashi Y, Eitoku M, Sano N, Natsume R, H1 is called the linker histone as it binds to the DNA
Kudo N, Tanokura M, Senda T, Horikoshi M (2010)
Structure of the histone chaperone CIA/ASF1-double
between two nucleosomes. The histone-DNA complex
bromodomain complex linking histone modifications and brings about the first level of compaction of the DNA
site-specific histone eviction. Proc Natl Acad Sci USA in the nucleus. The H1, H2A, and H2B are rich in
107:8153–8158 lysine amino acid, whereas H3 and H4 are rich in
Allfrey VG, Faulkner R, Mirsky AE (1964) Acetylation and
arginine. The amino-terminal tails of these proteins
methylation of histones and their possible role in the
regulation of RNA synthesis. Proc Natl Acad Sci USA extend beyond the nucleosome and, hence, are acces-
51:786–794 sible for covalent modification, while the octamer is
Allis CD, Jenuwein T, Reinberg D, Caparros M-L (2006) Epige- bound to DNA and participates in the interaction with
netics, 1st edn. Cold Spring Harbor Laboratory Press,
various other gene regulatory proteins.
New York
Chimura T, Kuzuhara T, Horikoshi M (2002) Identification and
characterization of CIA/ASF1 as an interactor of
bromodomains associated with TFIID. Proc Natl Acad Sci
Cross-References
USA 99:9334–9339
Eberharter A, Becker PB (2004) ATP-dependent nucleosome ▶ Epigenetics
remodeling: factors and functions. J Cell Sci 117: ▶ Nucleosomes
3707–3711
Eitoku M, Sato L, Senda T, Horikoshi M (2008) Histone
chaperones: 30 years from isolation to elucidation of the
mechanism of nucleosome assembly and disassembly. Cell Histopathology
Mol Life Sci 65:414–444
Elgin SCR, Workman JL (2001) Chromatin structure and gene
expression (Frontiers in molecular biology), 2nd edn. Oxford Jingky Lozano-K€uhne
University Press, Oxford Department of Public Health, University of Oxford,
Kornberg RD, Lorch Y (1999) Twenty-five years of the nucleo- Oxford, UK
some, fundamental particle of the eukaryote chromosome.
Cell 98:285–294
Turner BM (2001) Chromatin and gene regulation: molecular Definition
mechanisms in epigenetics, 1st edn. Wiley-Blackwell,
Hoboken
Wolffe AP (1998) Chromatin, structure function, 3rd edn. Histopathology is the study of microscopic structures
Academic, San Diego of diseased tissues (Crissman et al. 2004).
H 900 Holism

References emphasized the impossibility to study living beings


according to the mechanistic-analytic tradition (nowa-
Crissman JW, Goodman DG, Hildebrandt PK, Maronpot RR, days sometimes called reductionism), which purports to
Prater DA, Riley JH, Seaman WJ, Thake DC (2004) Best
understand systems by dividing them into their smallest
practices guideline: toxicologic histopathology. Toxicol
Pathol 32:126–131 possible or discernible elements and describing their
elemental properties alone. In the anti-mechanistic
view, known as “organicism” (the term “organicism”
refers likely to the properties of the whole organism,
Holism whereas “holism” is a more encompassing concept,
thus applicable to any level of organization in biology),
Alvaro Moreno living processes are studied in relation to the integrated
Department of Logic and Philosophy of Science, organized whole, (namely, the entire organism), rather
University of the Basque Country, than to any of its parts. Both Goethe and Kant (1790), for
San Sebastian, Spain example, considered that, contrary to what happens in
man-made machines, in a natural organism, every part
could neither be explained nor even exist, isolated from
Definition the whole. Each part must be an organ producing the
other parts and being produced by them.
Holism is the idea that whole entities are fundamental The debate between the mechanistic-reductionistic
components of reality and have an existence, which is and the organicistic-antireductionistic views traversed
irreducible to the sum of their parts. Thus, all the prop- the history of nineteenth- and twentieth-century biology,
erties of a given system cannot be determined or taking many different forms, as, for instance, the dispute
explained in terms of the properties of its component between mechanists and vitalists in the early decades of
parts alone. Instead, the system as a whole is what the past century. Yet, the modern history of holism in
determines in an important way how the parts behave. biology begins with the work of researchers like
Holism in biology is the theory that emphasizes that Rachevsky (1938), Bertalanffy (1952), and Elsasser
living entities can only be understood as wholes, because (1966), who argued for the necessity to adopt an inte-
they show global emergent properties that cannot be grated approach to deal with living systems (and other
attributed to specific or well-distinguishable parts. Sys- complex systems as well). British emergentists of the
tems are said to be holistic when the linear aggregation of early twentieth century, like Broad (1925), Morgan
their parts cannot explain the functioning of the system (1923), and Alexander (1920), are also important refer-
as a whole: “the whole is more than the sum of the parts” ences in the modern history of holism in biology and
as the main slogan of the sciences of complexity runs. philosophy of biology. (Actually, it was in this context
There are different types of holism, depending on that Smuts (1926) proposed the first modern version of
which kind of reductionism they are opposed to the concept of holism). The common idea of these authors
(Gatherer 2010). For example, some authors defend is that as the organization of systems becomes more and
holism because they are epistemological antireduction- more complex, it gets structured in levels, and new prop-
ists (e.g., that some phenomena are so complex that their erties and causal interactions appear, in addition to those
behavior cannot be deduced from the knowledge of their of the more fundamental levels. Their emergent proper-
fundamental properties), while others go beyond the ties are systemic features of biological systems which
epistemological argument and adopt ontological could not be predicted, despite a thorough knowledge of
antireductionism (e.g., that there exist, in certain com- the features of their parts and of the laws governing them.
plex systems, true emergent properties, often considered The experimental success of the research program of
endowed with specific, “downward” causal powers) molecular biology during the second half of the twentieth
(Andersen et al. 1996). century undermined anti-reductionist arguments and
caused holistic objections to fade away. At the end of
History the last century, however, the reductionist research pro-
Holism in biological sciences has its roots in German gram faced a dead end (Morange 2003). As knowledge
eighteenth-century biologists and philosophers, who on biological systems became more detailed and
Holism 901 H
fine-grained, researchers progressively realized that com- harmonized way, coordinating themselves at different
ponents were acting in strongly holistic ways. For exam- timescales, interacting hierarchically in local networks,
ple, at the end of the last century, research in the structure which form, in turn, global networks, meta-networks, etc.
of the genome faced increasing evidences that genetic This complex organization shows that, in fact, holism
components acted in a complex web of interactions. and mechanistic de-composition can be combined for the
Thus, as E. F. Keller has pointed out (2007), research purposes of biological explanation, as Bechtel has
in biology changed from a program inscribed in DNA recently pointed out (Bechtel 2010). In biological sys-
analysis to a new distributed (namely, more holistic) tems, holistic-emergent processes (which are continu-
program in which DNA, RNA, and protein components ously taking place) produce both dissipative patterns
operate alternatively as instructions and data. and more complex structures, which, in turn, are bound
To be sure, this turn has not only been a consequence to become selective functional constraints acting on the
of the internal development of biology but also of the dynamic processes that underlie those holistic processes.
recent development of new scientific tools alternative to Moreover, these functionally diverse constraints may
the traditional analytic-reductionistic methods. The give rise (once a certain degree of variety is reached) to
combination of increasingly powerful computers along new self-organizing holistic processes, which, in turn,
with new modeling techniques (such as cellular autom- may be functionally reorganized. In this way, an increase H
ata, genetic algorithms, Boolean networks, chaos, and in organizational complexity can take the paradoxical
dynamical systems theory) has allowed a blossom of form of an apparent “simplification” of the underlying
holism in modern science. All these deep innovations complicatedness, giving rise to levels of organization in
are affecting many scientific disciplines, giving rise to which a mechanistic de-compositional strategy might be
what is currently called the new “sciences of complex- locally applicable. Interestingly, these functional con-
ity.” This impact is especially important in biology, straints can be described as localized mechanisms (and
where new approaches like artificial life, synthetic biol- therefore, to a certain degree, they are amenable to
ogy, and systems biology constitute its main expression. functional de-composition) because they act as distin-
Interestingly, it is precisely in the field of systems biol- guishable parts (or collections of parts) related to partic-
ogy that the idea of holism and the debate between ular tasks performed in the system (for example,
reductionist and emergentist views has resurfaced with catalytic regulation). These two types of processes –
renewed force (Cornish-Bowden et al. 2005; Conti et al. the holism of the global network of processes and the
2007; Boogerd et al. 2007). (It must be admitted, how- local control devices/actions – are anyhow complemen-
ever, that the blossoming of holism in systems biology is tary: Both are required to produce and maintain the high
rather a reject of reductionism than a return to the holism level of complexity that characterizes biological sys-
of the 1930s, as emphasized by Gatherer (2010)). tems. Thus, de-composition and re-composition strate-
gies turn out to be complementary.
In sum, holism is a pervasive phenomenon in the
Characteristics world of complex systems. Yet, the high degree of
complexity of biological systems relies on a specific
Holistic phenomena occur even in relatively “simple” form of organization that combines holism and locally
physical or chemical systems: an ensemble of interacting differentiated functions and mechanisms. How to
units producing a global property, or pattern of behavior, understand this complex entanglement between emer-
that cannot be ascribed to any of them but only to the gent holistic processes and functionally localized
whole. However, as Keller (2007) has pointed out, in structures is probably the most difficult challenge of
these systems, the emergent, holistic pattern lacks any future research in systems biology.
form of functional differentiation. This is not the case in
biological systems, in which the presence of holistic
properties goes together with functional diversification. Cross-References
Compared to nonliving holistic systems (like ordinary
dissipative structures), biological systems involve ▶ Complex System
a much richer internal structure: They are made of parts ▶ Complexity
with different functionalities, acting in a selective and ▶ Emergence
H 902 Holm’s Method

▶ Interlevel Causation Definition


▶ Reduction
▶ Self-Organization Designed for multiple hypothesis testing, the Holm’s
method iteratively accepts and rejects hypotheses. The
Holm’s method is a close relative to the ▶ Bonferroni
References correction with slightly different threshold levels.
Let a be the determined significance threshold for
Alexander S (1920) Space, time, and deity. Macmillan, London rejecting the null hypotheses and k be the number of
Andersen PB, Emmeche C, Finnemann NE, Christiansen PV
hypotheses. Begin by ordering the k hypotheses by their
(1996) Downward causation: minds, Bodies and Matter.
Aarhus University Press, Aarhus respective p-values. Select the lowest p-value and com-
Bechtel W (2010) The cell, locus or object of inquiry? Stud Hist pare it to a/k. If the p-value is lower, then reject that
Philos of Biol Biomed Sci. doi:10.1016/j.shpsc.2010.07.006 hypothesis and perform the same selection with the
Boogerd F, Bruggeman F, Hofmeyr J-H, Westerhoff HV
(2007) Systems biology. Philosophical foundations. Elsevier,
remaining k1 hypotheses and a threshold of a/(k1).
Amsterdam Repeat the process until the selected p-value is not
Broad CD (1925) The mind and its place in nature. Routledge & smaller, at which time all remaining hypotheses should
Kegan Paul, London be accepted.
Conti F, Valerio MC, Zbilut JP, Giuliani A (2007) Will systems
By progressively adapting the threshold values, the
biology offer new holistic paradigms to life sciences? Syst
synth biol 1:161–165 Holm’s method gains power over the ▶ Bonferroni cor-
Cornish-Bowden A, Cardenas ML (2005) Systems biology may rection. Whereas in the Bonferroni correction all values
work when we learn to understand the parts in terms of the are thresholded relative to a/k, the Holm’s method uti-
whole. Biochem Soc Trans 33:516–519
lizes a/k, a/k1,. . .,a. Therefore, the probability of
Elsasser W (1966) Atom and organism. Princeton University
Press, Princeton rejecting a hypothesis with the Bonferroni method is
Gatherer D (2010) So what do we really mean when we say that less than or equal to the same probability for the
systems biology is holistic? BMC Syst Biol. doi:10.1186/ Holm’s method.
1752-0509-4-22
Kant I (1790/1952) Critique of judgment. Oxford University
Press, Oxford References
Keller EF (2007) The disappearance of function from ‘self-
organizing systems’. In: Boogerd F, Bruggeman F, Hofmeyr
J-H, Westerhoff HV (eds) Systems biology. Philosophical Holm S (1979) A simple sequentially rejective multiple test
foundations. Elsevier, Ámsterdam, pp 303–317 procedure. Scand J Stat 6(2):65–70
Morgan CL (1923) Emergent evolution. Williams and Norgate,
London
Rachevsky N (1938) Mathematical biophysics: physico-
mathematical foundations of biology. Dover Publications,
New York
Smuts JC (1926) Holism and evolution. Macmillan, London Holm’s Procedure
von Bertalanffy L (1952) Problems of life: an evaluation of
modern biological thought. Watts, London ▶ Holm’s Method

Holm’s Method
Holm-Bonferroni Method
Winston Haynes
Seattle Children’s Research Institute, Seatlle, ▶ Holm’s Method
WA, USA

Synonyms
Holobiont
Holm’s procedure; Holm-Bonferroni method; Sequen-
tially rejective Bonferroni test ▶ Microbiome
Hopf Bifurcation 903 H
Holoenzyme Recruitment Pathway Hopf Bifurcation

▶ PIC Assembly Pathways Xiaojuan Sun


Zhou Pei-Yuan Center for Applied Mathematics,
Tsinghua University of Beijing, Beijing, China

Homeostasis
Synonyms
▶ Life Span, Turnover, Residence Time
Poincaré-Andronov-Hopf bifurcation
▶ Lymphocyte Population Kinetics

Definition

Homeostatic Proliferation Hopf bifurcation is a local bifurcation in which a steady H


state of a dynamical system changes its stability, so that
▶ Modeling, Cell Division and Proliferation the appearance or disappearance of a periodic orbit occurs.
In a dynamical system that is described by ODE
model (▶ Ordinary Differential Equation (ODE))

dX
¼ F ðXÞ: (1)
Homogeneous Structures dt

▶ Cellular Automata Let X* be a steady state such that F(X*) ¼ 0 and J


the coefficient matrix (Jacobian matrix) of the system
when it is linearized near X*:


@ F ðXÞ 
Homologous Recombination J¼  : (2)
@ X  X¼X 
Myong-Hee Sung
Laboratory of Receptor Biology and Gene Expression, Suppose that all eigenvalues of J have negative real
National Cancer Institute, National Institutes of parts except one conjugate nonzero purely imaginary
Health, Bethesda, MD, USA pair b. A Hopf bifurcation arises when these two
imaginary eigenvalues cross the imaginary axis because
of a variation of the system parameters (Hale and Kocak
Definition 1991; Strogatz Steven 1994; Kuznetsov 2004). In the
critical situation when all eigenvalues of J have negative
Homologous recombination is a process in which two real parts except one conjugate nonzero purely imaginary
similar or identical DNA segments are swapped, pair, the system is said to have critical parameter value.
giving rise to a newly combined sequence of DNA.
References

Hale J, Kocak H (1991) Dynamics and bifurcations. Springer,


New York
Kuznetsov Yuri A (2004) Elements of applied bifurcation
Homoplasy theory. Springer, New York
Strogatz Steven H (1994) Nonlinear dynamics and chaos.
▶ Convergent Evolution Addison Wesley, Reading, Mass
H 904 Horizontal Genomics

Horizontal Genomics Host–Pathogen Interactions

Maureen A. O’Malley Karyala Prashanthi1 and Nagasuma Chandra2


1
Department of Philosophy, University of Sydney, Bioinformatics Centre, Indian Institute of Science,
Sydney, NSW, Australia Bangalore, Karnataka, India
2
Department of Biochemistry, Indian Institute of
Science, Bangalore, India
Synonyms

Lateral genomics Definition

The manifestation of a disease is critically dependent


Definition upon the interactions of pathogens with their
hosts (HPIs), which are complex and dynamic in nature.
Horizontal genomics is the large-scale study of the HPIs are multifaceted, with each species having to rec-
pattern and process of gene transfer between organisms, ognize, respond, and adapt to each other. Specific inter-
whether closely or distantly related. These investiga- actions occur between macromolecules of the host and
tions have considerable implications for whether the pathogen, some beneficial to the host as in pathogen
evolutionary history of organisms, particularly microor- elimination or suppression, while others are beneficial
ganisms, should be represented by a unique tree of life. to the pathogen, such as during initiation or progress of
an infection or even immune evasion. An understanding
Cross-References of the mechanisms of such interactions is necessary for
targeted development of prevention and control mea-
▶ Metagenomics sures against infectious diseases.
Upon entry into the host, pathogens interact with their
hosts for replication or obtaining nutrition. In response,
References the host system attempts to either eliminate the pathogen
by triggering its innate and adaptive immune responses
Bapteste E, Burian RM (2010) On the need for integrative or at least overcoming the exploitation of its resources by
phylogenomics, and some steps towards its creation. Biol the pathogen, by modulating availability of nutrients and
Philos 25:711–735 suppressing pathogenic virulence factors (Fig. 1).
Doolittle WF (1999) Lateral genomics. Trends Cell Biol 9:
M5–M8
The outcome that may range from pathogen clear-
Frost S, Leplae R, Summers AO, Toussaint A (2005) Mobile ance to asymptomatic carriage or active disease is
genetic elements: the agents of open source evolution. Nat a reflection of the properties of the microbe and the
Rev Microbiol 3:722–732 host’s ability to respond to it.

Host Adaptive Immune Response to HIV Characteristics


Infection
Systems Perspective of HPIs
▶ Systems Immunology, Adaptive Immune Response Pathogens have a formidable task of surviving and
to HIV Infection infecting their target host cells, which they do, by
manipulating the host’s complex cellular networks.
Host cells are equipped with their own armory against
pathogens. Thus, it is no surprise that for every move
Host Factor 1 a pathogen makes to exploit the host, a counter move
has been observed in the host. Needless to say, the
▶ Hfq reverse is also true, which means, that every move by
Host-mycobacterial Interactions

At the entry level At the virulence level At the nutrition level

Host Mycobacterial Virlence Factors Mycobacterium Host


receptor ligands
Host–Pathogen Interactions

Cell wall Antigen 85, LAM, MmaA4, PDIM, Tfr / Tf for iron acquisition IFN-γ and TNF-α
Fc-γ Receptor Opsonized PcaA downregultes Tfr
mycobacterium Cellular FadD33, Isocitrate lyase, LipF, and ferritin
metabolism Nitrate reductase, PanC / PanD expression
Complement Ag 85C, LAM Heat-shock protein HspX Utilizes cholesterol as carbon source IFN-γ inhibits all
receptor 3 (CR3) Iron uptake Mycobactin other source of
Magnesium uptake MgtC Activate isocitrate lyase to utilize carbon by excluding
Mannose receptor ManLAM Regulation DevRS, HspR, IdeR, MprAB, WhiB3 two carbon source via the Mtb phagosome
PhoP, RelA, SigA, SigE, SigF, SigH glyoxalate shunt Pathway from recycling
Secreted proteins ESAT-6 / CFP-10 endosomes
Scavenger Unknown
Secretion system ESX-1, ESX-5 Lpd and SucB play critical roles in the
receptors
Stress protein AhpC, KatG, SodA, SodC, synthesis of acetyl-coenzyme A, which
Toxin Phospholipase C is essential for the gloxylate shunt
Unclassified Erp, HbhA, PE / PE-PGRS pathway

At the critical host cell At the immune


pathway level evasion level

Host cell Mycobacterial components inhibiting Immune Mycobacterial components inhibiting


pathways cellular pathways response immune response

Phagosome (1) PtpA dephosphorylation of VPS33B, 2) Protein kinase G inhibits RNI noxR1, noxR3, peptide methionine sulfoxide reductase catalyse
maturation maturation of the mycobacterial phagosome, (3) SapM depletes the reduction of methionine sulfoxide. AhpC, Lpd, SucB and the
PI3P mycobacterial phagosomal membranes and preventing thioredoxin-like AhpD forms an antioxidant complex with NADH
phagosome-lysosome fusion, (4) ManLAM blocks phagosomal dependent peroxidase and peroxynitrite reductase activity.
maturation. ROI Lsr2 protects mycobacteria against ROI. Mycobacterial Superoxide
Ca Signalling (1) ManLAM inhibits the increase in cytosolic Ca2+ / calmodulin dismutase and Catalase degrade ROI in the phagosome.
And PI3K concentration, by blocking the association of Ca2+ / calmodulin MHC class ll Mycobacterial 19-kDa lipoprotein inhibits activation of the MHC
pathway with Pl3K and thereby prevent the recruitment of EEA1 to antigen processing class ll expression activated by IFN-γ. Intraphagosomal production
phagosomes. (2) Mycobacterium inhibits SK1 which phosphorylates of Mtb urease is responsible for disruption of MHC class ll
Sphingosine to S1P which regulates intracellular. Ca homeostasis trafficking leading to intracellular retention.
by releasing Ca from cytoplasmic organelles. Apoptosis of Two mycobacterial proteins inhibit apoptosis of infected
p38 and ERK Pathogenic mycobacteria have evolved mechanisms to prevent a the infected macrophage: secA2 and nuoG.
MAPK signalling sustained activation of the ERK1 / 2 and p38 cascades, and this macrophage
pathway accounts, at least in part, for their survival.
Adaptive immune M. tuberculosis avoids elimination by the host immune system
IFN-γ and JAK / Cells that are infected with virulent mycobacteria show response by delaying the onset of adaptive responses until the
STAT pathway decreased levels of the IFN-γ receptor, which impairs the
bacteria have established a protected niche in which they can persist
tyrosine phosphorylation of JAK1 / 2 and reduces the
permanently
905

DNA-binding activity of STAT


NF-kB pathway ManLAM inhibits NF-kB pathway.
H

Host–Pathogen Interactions, Fig. 1 An example of the multifaceted nature of host–pathogen interaction


H
H 906 Host–Pathogen Interactions

Range of host-pathogen interactions Methods

Experimental methods: Epidemiological studies,


host population genetics
Computational methods: Agent-based modelling,
Population
differential equations, epidemiological models
level
Experimental methods: Dynamic live microscopic
techniques, fluorescence activated cell sorting
Cell-cell interaction
Computational methods: Boolean or agent-based
level
modelling, differential equations

Experimental methods: Metabolomics using mass


and NMR spectroscopy, microarray analysis,
Pathway or network level proteomics, siRNA analysis
Computational methods: Graph analysis, integrated
network topological / quantative analysis
Experimental methods: High throughput microarray
Genome wide transcriptional or protein analysis, proteomics, siRNA analysis, metabolomics
level Computational methods: Clustering and network
inference

Experimental methods: Molecular biology


Protein-protein / ligand level techniques, X-ray crystallography, FRET
Computational methods: Docking, molecular
dynamics

Host–Pathogen Interactions, Fig. 2 Experimental and computational methods to study host–pathogen interactions at different
levels of biological hierarchy

the host system in response to an infection can be (h) biofilm formation to protect themselves against
countered by the pathogen as well. Thus, it becomes antibiotics and host defense mechanisms (Brogden
a complex interplay between the two species, making it et al. 2007). All of these processes are complex sys-
often difficult to predict the winner. Appreciation of tems in themselves involving several molecules
the enormity of HPIs and how they influence the out- including toxins and extensive cross talk among
come of an infection requires the study of the individ- them, some of which can be readily obtained from
ual components as one connected system (Forst 2006). KEGG and other bio-systems’ databases.
Recent developments in “omics”-scale experimental Pathogenesis of cholera by V. cholerae occurs
and computational technologies have facilitated the through a series of spatio-temporally controlled events
study of systems biology of HPIs. Figure 2 illustrates under the control of a gene cascade termed the ToxR
different levels at which HPIs are studied. regulon that encodes the virulence factors. A systems
biology study of the temporal regulation of gene expres-
HPI in Virulence Mechanisms sion in V. cholerae using high-resolution time series
Several pathogens utilize virulence factors in them to genomic profiling (Kanjilal et al. 2010) provided insights
inhibit various host functions, to achieve colonization, not only into the temporal dynamics of the ToxR regulon
immuno-evasion, or immune-suppression, through one but also identified potential new members of the process.
or more of the following: (a) adherence to host cell
involving adhesins (e.g., ManLAM in Mycobacteria) HPI for Adherence to or Entry into Host Cells
(b) invasion through invasins (e.g., collagenase pro- Pathogens must first adhere to host cells so as to colo-
duced by Clostridium histolyticum and neuraminidase nize at appropriate sites. In its simplest form, attach-
by Vibrio cholerae and Shigella dysenteriae) that dam- ment to the host cell requires two factors: a ligand and a
age host cells to facilitate growth and spread of the host cell surface receptor. Alteration of the host’s cellu-
pathogen, (c) avoidance of phagolysosome formation, lar morphology through actin polymerization or use of
(d) autophagy, (e) preventing complement activation, specialized structures in bacteria such as pili, fimbriae,
(f) nutrient acquisition through specialized receptor or capsules is an example of mechanisms used by path-
systems (e.g., Tfr/Tf in Mycobacteria), (g) quorum ogens toward this (Pizarro-Cerdá and Cossart 2006).
sensing and communication with other bacteria, and This initial phase then leads to a complex host–pathogen
Host–Pathogen Interactions 907 H
molecular cross talk that enables them to reach their Although most microorganisms can synthesize
appropriate intracellular niches, subversion of cellular organic molecules that they need, they are critically
functions, and establishment of disease. dependent on the host for a few compounds. Due to the
Endocytosis and phagocytosis are extremely abundance of these resources in the host, microorgan-
dynamic processes and involve more than 200 proteins. isms may have lost genes required for their biosynthe-
Some pathogens trigger receptors that can activate these sis. Chlamydia, for example, is entirely dependent on
processes, while some others secrete toxins that enter the host for supply of tryptophan, an essential nutrient.
the host cell and activate these processes. Specific sets However, the host has devised a defense strategy to
of interactions not only trigger specific signaling cas- limit tryptophan through IFN-g-mediated activation of
cades but also determine factors such as tissue tropism, indoleamine 2,3 dioxygenase (IDO), to catabolize
species specificity, and genetic specificity. Specific sig- L-tryptophan to n-formylkynurenine.
naling events bring about particular cellular responses
such as stimulation of innate and adaptive immunity, HPI in Critical Host Cell Pathways
growth, proliferation, survival, and apoptosis. Pathogens often hijack one or more host intracellular
An example of a systems biology study of pathogen pathways in the host, for their survival (Bhavsar et al.
entry is reported for Chlamydia pneumonia, which 2007). Salmonella and Listeria, for example, utilize the H
illustrates the complexity of HPIs during entry (Wang host cytoskeleton to enter and move within host cells.
et al. 2010). Nine functional modules that were activated S. flexneri, Yersinia spp, and M. tuberculosis provide
upon exposure to the pathogen, consisting of 135 molec- examples of inhibition of a signaling cascade (NF-kB
ular components, involved in cell adhesion, transcription, pathway), the latter having been activated in the host in
endocytosis, and receptor systems, were incorporated response to the infections through the use of Toll-like
into a network capturing known inter-pathway cross receptors and other pattern recognition receptors.
talks. The network was observed to be significantly resil- The c-Met signaling network that is mediated by the
ient since intervention at any single point alone was hepatocyte growth factor (HGF) in normal physiology
insufficient to prevent entry. Instead manipulation of to achieve mitogenesis, motogenesis, and morphogen-
a combination of three key proteins (a chemokine recep- esis gets activated by the Helicobacter pylori virulence
tor, an integrin receptor, and a platelet-derived growth factor CagA as well. However, with the latter, it is
factor) was found to inhibit pathogen’s entry. highly correlated with gastric cancer. A systems level
study based on comparative logical modeling identi-
HPI for Nutrition fied intervention points for CagA-induced but not
The human host provides an appealing ecosystem for HGF-induced c-Met signaling, which were subse-
numerous microorganisms, with a variety of adaptations, quently validated experimentally (Franke et al. 2008).
to harness the existing nutrient resources. Bacterial path-
ogens that colonize extracellular niches often face envi- HPI for Elimination of Pathogen by Host or Immune
ronments with frequently changing physical conditions Evasion by Pathogens
and nutrients. However, intracellular pathogens replicat- Exposure to a pathogen leads to layers of defense
ing in cytoplasm or phagosomal compartments encoun- responses in the host. For each level of defense, path-
ter a more congenial environment for growth. Several ogens have designed ways to evade them (Henderson
nutrient acquisition adaptations are also known in intra- and Oyston 2003). For example, countering innate
cellular bacteria (Schaible and Kaufmann 2005). immune responses can be seen in the following cases:
Iron being an essential growth factor for both host H. pylori counteract acidic environment in the stomach
and pathogen is associated with coevolution of mutual by secreting urease which increases the pH surrounding
high affinity iron uptake and retention systems. the bacterium. Some bacteria such as Streptococcus
Mycobacteria, for example, block phagosomal matu- pneumoniae have antiphagocytic substances in their sur-
ration and are present in an early phagosome and faces to inhibit phagocytosis. Catalase and superoxide
accesses host cellular iron through the host transferrin dismutase synthesized by bacteria such as H. pylori and
receptor/transferrin (Tfr/Tf) system. The host counters Staphylococci scavenge reactive oxygen intermediates.
this by Interferon-g activation, which downregulates Examples where adaptive immunity is evaded include
Tfr and ferritin levels. cleaving and inactivation of IgA through IgA1 proteases
H 908 Host–Pathogen Interactions, Mathematical Models

by Streptococci and Neisseria spp., and over-activation Brogden KA, Minion FC, Cornick N, Stanton TB, Zhang Q (eds)
of T-cells by Staphylococci to produce toxic levels of the (2007) Virulence mechanisms of bacterial pathogens,
1st edn. ASM Press
pro-inflammatory cytokines, through the use of super- Devignot S, Sapet C, Duong V, Bergon A, Rihet P, Ong S, Lorn
antigens. There are also reports about the pathogen trig- PT, Chroeung N, Ngeav S, Tolou HJ, Buchy P, Couissinier-
gering complex interactions among the host subsystems Paris P (2006) Genome-wide expression profiling deciphers
that are detrimental to the host itself. For example, the host responses altered during dengue shock syndrome and
reveals the role of innate immunity in severe dengue. PLoS
mechanisms of systemic vascular dysfunction in Dengue ONE 5(7):e11671
Shock Syndrome were correlated with interplay of innate Forst CV (2006) Host-pathogen systems biology. Drug Discov
immunity, inflammation, and host lipid metabolism Today 11(5–6):220–227
(Devignot et al. 2006). Franke R, Muller M, Wundrack N, Gilles ED, Klamt S, Kahne T,
Naumann M (2008) Host-pathogen systems biology: logical
modelling of hepatocyte growth factor and Helicobacter pylori
Application in drug and vaccine discovery induced c-Met signal transduction. BMC Syst Biol 2(1):4
Knowledge of the molecular mechanisms involved in Hartman AL, Ling L, Nichol ST, Hibberd ML (2008) Whole-
HPIs can be utilized in disease diagnosis, treatment, or genome expression profiling reveals that inhibition of host
innate immune response pathways by Ebola virus can be
prevention, in a number of ways. For example, in reversed by a single amino acid change in the VP35 protein.
Ebola virus, substitution of a single amino acid in the J Virol 82(11):5348–5358
VP35 protein is sufficient to disrupt the viral inhibition Henderson B, Oyston PCF (eds) (2003) Bacterial evasion of host
of innate immune signaling, while maintaining the immune responses, 1st edn, Advances in molecular and cellular
microbiology (2). Cambridge University Press, Cambridge
ability to replicate to wild-type levels in cell culture Kanjilal S, Citorik R, LaRocque RC, Ramoni MF,
(Hartman et al. 2008), which has led to exploration of Calderwood SB (2010) A systems biology approach to
mutant varieties as vaccine candidates. modeling Vibrio cholerae gene expression in virulence-
Specific host mechanisms are also being explored to inducing conditions. J Bacteriol 192(17):4300–4310
Pizarro-Cerdá J, Cossart P (2006) Bacterial adhesion and entry
target drug therapy, by (a) preventing exploitation of into host cells. Cell 124(4):715–727
host proteins for pathogen’s replication or (b) enhanc- Schaible UE, Kaufmann SHE (2005) A nutritive view on the
ing hosts’ natural pathogen elimination mechanisms host-pathogen interplay. Trends Microbiol 13(8):373–380
such as production of interferons, the latter is achieved Tan SL, Ganji G, Paeper B, Proll S, Katze MG (2007) Systems
biology and the host response to viral infection. Nat
through agonists of certain toll-like receptors such as Biotechnol 25(12):1383–1389
7-thia-8 oxoguanosine (TLR7 agonist) that is used for Wang A, Johnston SC, Chou J, Dean D (2010) A systemic
HCV treatment (Tan et al. 2007). network for Chlamydia pneumoniae entry into human cells.
Although systems level studies and hence applica- J Bacteriol 192(11):2809–2815
tions in drug and vaccine discovery emanating from
them are not as yet commonplace, perhaps due to dif-
ficulties in comprehensive model building and experi-
mental design, the need for studying them as whole Host–Pathogen Interactions,
systems has become firmly established. Some examples Mathematical Models
from literature are already showing the power of these
approaches, with a very high potential to translate into Sumanta Mukherjee1 and Nagasuma Chandra2
1
clinically useful applications in the coming years. Bioinformatics Centre, Indian Institute of Science,
Bangalore, Karnataka, India
2
Department of Biochemistry, Indian Institute of
Cross-References Science, Bangalore, India
▶ Host-Pathogen Systems, Target Discovery
Definition
References Systems approaches are essential to understand the
Bhavsar AP, Guttman JA, Finlay BB (2007) Manipulation of
complex web of host–pathogen interactions (HPIs)
host-cell pathways by bacterial pathogens. Nature that determine the outcome of an infection. Mathemat-
449(7164):827–834 ical modeling helps enormously to study emergent
Host–Pathogen Interactions, Mathematical Models 909 H
properties of the system, dissect the role of individual (▶ Host–Pathogen Interactions, Fig. 2), the most well
components and their interactions, to understand studied being epidemiological level models and
systems difficult to study experimentally and to predict molecular level models.
outcomes under a range of scenarios and ultimately to
identify strategies for countering the disease.
HPIs constitute typical multicomponent interacting Characteristics
systems, making model-building a daunting task.
Given that both host and pathogen are entire systems Epidemiological Models
themselves, their interaction can be viewed as a system Models in this category provide a parametric represen-
of systems. The complexity in HPIs, as in other tation of evolution of the geographical and population-
biological systems, arises through feedback and feed wide spread of an epidemic, over a period of time,
forward controls, bistability, as well as activatory and addressing questions such as (a) which population is
inhibitory mechanisms. Evaluation of system parame- more susceptible to a given disease (b) which strain of
ters is often challenging, and it is a common practice a pathogen is more likely to spread and cause disease
to derive them from experimental observations. and (c) which are the key parameters that control the
Currently available models of HPIs span across differ- epidemic, hence leading to identifying strategies for H
ent levels of hierarchy in biological organizations disease diagnosis and control.
The SIR model and its variants are the most
commonly used models in this category, in which,
the population is split into three compartments
bI g (Anderson and May 1992), which are (S (t)), suscepti-
S I R ble to disease; (I (t)), actively infected population; and
(R (t)), population recovered or not vulnerable to
Host–Pathogen Interactions, Mathematical Models, disease (Fig. 1). The infectivity rate (b) and the
Fig. 1 Compartmental view of the SIR model recovery rate (g) are both assumed to be constants.

SIR model

1
S (Susceptible population)
S / I / R population concentration
(normalized to the scale of 1)

I (Infected population)
0.8 R (Recovered population)

0.6

0.4

0.2

20 40 60 80 100 120 140 160 180


time (days)

Host–Pathogen Interactions, Mathematical Models, populations. The population axis has been normalized between
Fig. 2 Simulation output of SIR model. The Red curve shows 0 and 1 representing (S,I,R) concentrations as a fraction of the
the time evolution of the susceptible population over time. Blue population
and Green lines show profiles of recovered and infected
H 910 Host–Pathogen Interactions, Mathematical Models

For short-period and low-fatality infections, the Host–Pathogen Interactions, Mathematical Models,
model equations are summarized in (Eq. 1), and a Table 1 Choice of modeling strategies
typical simulation output is shown in Fig. 2. System parameter Time Modeling strategy
Continuous Continuous ODE, PDE
Equation 1 Differential equations representing Continuous Discrete Difference equation
a simple SIR model. Discrete Continuous Network model, integer
programming
dS Discrete Discrete Agent-based modeling
¼ bSI
dt
insights into the system dynamics. Boolean logic
dI functions are used for describing interactions.
¼ bSI  gI When the interacting components are represented in
dt
the vector form as, x ¼ ðx1 ; x2 ; . . . ; xn Þ 2 f0; 1gn ,
the time updates defined by a set of Boolean
dR functions, time updates are represented as
¼ gI
dt xi ðt þ 1Þ ¼ fi ð
xðtÞÞ V i 2 f1 . . . ng, then the dynamics
of the network is represented by means of a transition
The extent of infectivity in the population will graph. A Boolean network with N variables has 2N
depend upon rates of infectivity, recoverability, and distinct states, each state being a set of unique
initial susceptible population. The parameter Ro combination of system variables. Formally,
(Basic Reproductive Ratio) is generally defined as ! s f0; 1gn f0; 1gn shows the synchronous
(Ro ¼ bSg ), capturing the expected number of sec- transition of the states. Boolean systems too exhibit
ondary infections arising from a single infected attractor dynamics, similar to other deterministic
individual. A number of case studies illustrate the dynamical systems. An attractor is a distinct set of
usefulness of the SIR model. Studies on the Bom- states, occurring in the time evolution of the system.
bay plague epidemic (1905–1906), the influenza If xs is an attractor point, then xs ¼ f ðxs Þ, where f is the
epidemic in an English school (1978), the Eyam Boolean time evolution rule, which implies that once
plague episode in England, all show very good attained, it remains in that state. Cyclic attractors on
correlation with predictions by their respective the other hand recur in sequence, once any of its states
SIR models (Murray 2005). is visited. A given system can have multiple attractors.
Different initial states eventually converge to one or
Modeling Interacting Networks of Biochemical the other attractors. The set of states which leads to
and Biological Species a particular attractor is collectively called the basin of
This set of mathematical models capture interactions attraction for the given attractor (Albert et al. 2008).
in and between cells at the molecular level. Different variants that are explored are the use of
These mathematical models are used to predict the synchronous evolution of states, asynchronous firing
evolution of various system parameters over time. of the state transition, and compartmentalization of
Each parameter in the model can either be continuous biological components (Kauffman 1993).
or discrete in nature. Depending on the nature of the A simplified Boolean model of phagocytosis is
parameter, different modeling strategies are employed, shown in Fig. 3, in which a resting macrophage
as illustrated in Table 1. M engulfs extracellular bacteria BE , and becomes
an infected macrophage MI , rendering extracellular
Boolean Network bacteria as intracellular bacteria BI .
Although a quantitative modeling of the system The Boolean rules are as in (Eq. 2).
dynamics provides accurate predictions, the required
data for such modeling is often not available, making Equation 2 Boolean rules representing state transi-
qualitative modeling the only feasible option. Though tion of simple phagocytosis model.
qualitative in nature, they are often sufficient to capture
the topology of the network and provide significant MI ¼ M and BE
Host–Pathogen Interactions, Mathematical Models 911 H
Host–Pathogen
Interactions, Mathematical
Models, Fig. 3 Macrophages
ingesting bacteria and Resting
becoming infected. Macrophage
Internalized bacteria
proliferate and burst out
Infected Bursting
Bacterium Macrophage Macrophage

BI ¼ ðBE and MÞ or ðBE and MI Þ ODEs in studying HPIs. The set of ordinary equations
representing the phagocytosis model (Fig. 3) are
The host immune response to a Bordetella infection, given in
studied using this approach (Thakar et al. 2007), Equation 3, where, l is the average rate of mac-
indicated that the dynamics were different for rophage production, g is the pathogen intake probabil-
the freshly infected host, reinfection of the host, ity by a resting macrophage per contact. d represents the
and fresh infection with antibody injections. A conva- death rate of a normal macrophage and d1 is the mortal- H
lescent host immune response was predicted to be ity rate of infected macrophages. gN and gI are the
much faster than in a host with antibody transfused average pathogen intake rates by normal resting
treatment, correlating well with experiments using macrophage and infected macrophages respectively. N
mice (Thakar et al. 2007). In a separate study, is the average number of pathogens thrown out into the
a 75 node host–pathogen interactome model, built system per macrophage burst, and  represents the
to study tuberculosis, indicated that the propensity ingestion rate of phagocytosed pathogen.
for bacteria to persist was the highest as compared
to those for clearance or active proliferation (Raman
Equation 3 System of differential equations
et al. 2010).
representing interactions between macrophages
and bacteria.
Ordinary Differential Equations (ODES)
Ordinary differential equations serve as powerful tools dM
to build time-continuous models of any system. ¼ l  gMBE  dM
dt
By first principle, differential equations describe the
rate of change of value of a variable with respect to dMI
¼ gMBE  d1 MI
time. The symbolic representation of a differential dt
equation is given as dBE
¼ aBE  gN MBE  gI MI BE þ d1 NMI
dt
dy
or y_
dx dBI
¼ gN MBE þ gI MI BE  d1 NMI  BI
dt
yðtþDtÞyðtÞ
In notation using limits, y ¼ dy
lim ¼ lim Dt ,
Dt0
representing the slope of the curve yðtÞ at any given Partial Differential Equations and Compartment
time t. Models (PDEs)
HPI models generally contain many parameters, Given the complexity and heterogeneity in the host
giving rise to coupled systems, as shown in the previ- system, it is often difficult to place them into a single
ous SIR model (Eq. 1). Coupled differential equations differential framework. In order to address the tempo-
are a system of ODEs, where each equation signifies ral and spatial evolution together, multi-compartment
the time evolution of one parameter. The parameters models and partial differential equation based models
are dependent on each other as specified by their are employed. In these, the entire space is subdivided
respective equations. Hence, the system of equations into well-mixed smaller compartments, each of which
needs to be solved simultaneously. A simple model has its own governing equations, while communication
from literature is described to illustrate the use of among them is allowed through interface equations.
H 912 Host–Pathogen Interactions, Mathematical Models

Agent-Based Models payoffs (Blaser and Kirschner 2007). Pure strategies


Agent-based models, also known as cellular automata, often do not exist, rendering the problem to one
are hybrid computational models capable of addressing of identifying strategies that maximize payoffs for
far more complex interaction patterns than any other both players. An example of HPIs in Salmonella as
mathematical tool can. The basic idea is to divide the well as mycobacterial infections illustrates the use of
complete system into multiple subgroups, each group game theoretical approaches (Eswarappa 2009).
represented by an individual entity on a computational
grid. Different individuals are allowed to interact in the Summary
computational framework, subject to a set of bounds and As evident from the above discussion, a system can be
predefined rules. The framework also allows introduction addressed by different mathematical models, each at
of stochastic perturbations, as observed in real systems. a different level of granularity and providing insights
Taking the same example of phagocytosis, agent from a different perspective. Multi-scale modeling,
interactions are defined in a two-dimensional grid where different models can be built at different levels
with cyclic boundary conditions. The space is initially and all are threaded together into an integrated frame-
randomly filled with M macrophages and N bacteria. work, appears to be a most plausible direction for
At time t ¼ 0; none of the macrophages are infected future pursuits (Kirschner et al. 2007).
and all bacteria are extracellular. The maximum car-
rying capacity of each macrophage C and a probability
that any bacteria and macrophage contact ends up in Cross-References
phagocytosis PN are defined. Then the contact is
defined with a distance threshold D between extracel- ▶ Host–Pathogen Interactions
lular bacteria and macrophages. The different compo-
nents which are the resting macrophage, the infected
macrophage, extracellular bacteria and intracellular References
bacteria are considered as different types of agents.
Resting macrophage has an initial bacterial count of Albert I, Thakar J, Li S, Zhang R, Albert R (2008) Boolean
network simulations for life scientists. Source Code Biol
zero. External bacteria are free to move and upon
Med 3:16
contact with macrophages, get internalized with Anderson R, May R (1992) Infectious diseases of humans:
a probability PN , after which they are marked as dynamics and control. Oxford University Press, Oxford
internal bacteria. The internal bacteria have no free Blaser MJ, Kirschner D (2007) The equilibria that allow
bacterial persistence in human hosts. Nature 449(7164):
mobility and move with the macrophage that ingested
843–849
it. When the internal bacterial count exceeds the Eswarappa SM (2009) Location of pathogenic bacteria during
infected macrophage’s carrying capacity C, the latter persistent infections: insights from an analysis using game
bursts out, resulting in reducing the count of macro- theory. PLoS ONE 4(4):e5383
Kauffman S (1993) The origins of order: self-organization
phages and all the ingested internal bacteria will be
and selection in evolution. Oxford University Press,
marked back as extracellular bacteria. Though the Oxford
model proposed here is highly simplified, the strategy Kirschner DE, Chang ST, Riggs TW, Perry N, Linderman JJ
can be used to extend the model to simulate real (2007) Toward a multiscale model of antigen presentation in
immunity. Immunol Rev 216:93–118
systems very closely. IMMSIM is an example of an Kohler B, Puzone R, Seiden PE, Celada F (2000)
agent-based model tool, that has been used to simulate A systematic approach to vaccine complexity using an
humoral and cellular responses (Kohler et al. 2000). automaton model of the cellular and humoral immune
system. I. Viral characteristics and polarized responses.
Vaccine 19(7–8):862–876
Game Theoretic Approach
Murray J (2005) Mathematical biology: I. An introduction.
Game theory provides a mathematical framework to Springer, New York
address complex issues such as adaptability. For study- Raman K, Bhat AG, Chandra N (2010) A systems perspective of
ing HPIs, host and pathogen can be considered as two host-pathogen interactions: predicting disease outcome in
tuberculosis. Mol Biosyst 6(3):516–530
players making moves based on strategies available to
Thakar J, Pilione M, Kirimanjeswara G, Harvill ET, Albert R
them. Adaptation of each strategy incurs a cost for the (2007) Modeling systems-level regulation of host immune
player, but geared for maximizing the respective responses. PLoS Comput Biol 3(6):e109
Host–Pathogen Systems, Target Discovery 913 H
extracts in complex living systems and observing
Host–Pathogen Systems, Target changes of phenotypes. With the possibility to
Discovery isolate pharmacologically active substances that
are responsible for the observed effects at the
Christian V. Forst beginning of the nineteenth century, a key step
Department of Clinical Sciences, Southwestern toward modern drug discovery was made. With
Medical Center, University of Texas, Dallas, TX, USA the advances in molecular biology and biochemis-
try, the approach of testing defined substances in
complex living systems was largely abandoned in
Synonyms the 1990s on favor of a more reductionistic target-
based approach powered by the sequencing of the
Disease marker identification; Drug targets; human genome and the spawning of a new era with
Host–pathogen interactions the potential knowledge of all potential drug targets
readily available (Overington et al. 2006).
The failed promises of postgenomic target-base
Definition drug discovery has recently yielded to revisit a more H
systems-level approach involving the screening of test
Host-pathogen systems describe the interactions of compounds under disease conditions to determine
a pathogen with a host. These interactions take place induced phenotypic changes (Sams-Dodd 2005). This
on different level of details, both in time and space, approach, which goes beyond individual genes and
from fraction of a second to the lifetime of the host proteins as it involves the investigation of biochemical
(e.g., decades), from molecules to whole organisms networks by a systems approach (Butcher 2005) is
and societies (in the case of epidemiology). referred to as chemical genetics. Such chemical
Target discovery refers to the identification of drug genetic screens typically involve the use of genetic
targets, the naturally existing cellular or molecular (gene deletion, knockdowns, overexpression) as well
structure involved in the pathology of interest that as nongenetic, environmental (infection with patho-
potential drugs are meant to act on (see ▶ Epigenetics, gens) perturbations while comparing the influence of
Drug Discovery). lead molecules with and without such permutations.
The retroactive identification of pathways and biolog-
ical functions that underlie the observed phenotypic
Characteristics responses, termed target deconvolution, provides
important insights into the biological mechanisms of
Model System disease and further facilitate drug development
Host-pathogen systems with respect to target discovery (Terstappen et al. 2007).
are predominantly modeled by interaction networks. The
host-pathogen interactome is the most recent focus of The Overall Process
genomic technologies. Together with interaction infor- Drug target/pathway deconvolution in host-pathogen
mation, genome-wide studies of host-pathogen interac- system is embedded within the modern, holistic,
tions using mRNA and microRNA transcriptomics, drug-development pipelines including (1) assay develop-
RNA interference (RNAi), proteomics, and other plat- ment, (2) screening, (3) hits and leads, and (4) target
forms are used to obtain important insights into compo- deconvolution. It utilizes chemical genomic approaches
nents and pathways which are essential for pathogen and analyzes the experimental results in the context of the
infection, proliferation, and persistence. host-pathogen interactome. The resulting deconvoluted
drug targets and key biological processes are realized as
The Discovery Process, a Historical Perspective subnetworks of the host-pathogen interactome.
Target identification has been approached using
a variety of genetic and biochemical methods The Interactome
(Terstappen et al. 2007). Historically, pharmacologi- The host-pathogen interactome typically consists
cal activities were often discovered by testing plant of nodes (vertices) and edges that connect the nodes.
H 914 Host–Pathogen Systems, Target Discovery

Host–Pathogen Systems,
Target Discovery, Interaction
Fig. 1 Types of elementary
interactions/reactions in Protein−Protein Interaction: Protein Protein
biochemical networks
Enzyme (Protein)
Metabolic Reaction: Substance Substance

Kinase (Protein)
Signal Transduction: Protein Protein

Transcr.
Poly-
Factors
merase
Gene Expression: Gene mRNA

Ribosome
Protein Expression: mRNA Protein

Co−factors
or Polymerase
Protein
Gene Activation/Inhibition: Gene
(TF)

More complex network representations include hyper- global RNA interference (RNAi), proteomics, and other
graphs (Forst et al. 2006) and rule-based descriptions platforms, are revealing important insights into pathways
(Hlavacek et al. 2006). In simple node/edge represen- and functional units that are essential for pathogen
tations, genes, RNA/DNA, proteins, and chemicals are infection, proliferation, and persistence as well as for
represented as nodes, binary interactions and reactions host defense.
are described as edges (Fig. 1). Multiple interactions Genomic screening data are analyzed in the context of
between chemicals in biochemical reactions are biochemical networks. Figure 2 describes a principle
appropriately described by a graph generalization, approach to analyze host-pathogen networks for target
a hypergraph, where an (hyper) edge can connect deconvolution. On the left side, the corresponding exper-
more than two nodes. Alternative representations iments assays are developed, the experimental data col-
require appropriate ontologies that relate biological lected and preprocessed. Data involves qualitative
concepts using a controlled dictionary (Karp 2000). sequence tags, presence/absence of proteins, as well as
Examples include network representations in KEGG quantitative values such as gene expression, Z-scores
Kanehisa et al. (2011) and Reactome (Matthews after RNAi screens, and drug concentrations. On the
et al. 2008). right side, interaction information is collected and syn-
Interactome information is readily available on thesized into a large biochemical host-pathogen network.
the Internet. The website Pathguide (http://www. The data is then analyzed in the context of biochem-
pathguide.org) provides a comprehensive list of repos- ical networks. Typical analysis protocols involve
itories and resources on protein-protein interactions, • Network biology and graph topology
metabolic pathways, signaling pathways, pathway • Network clusters, complexes, and modules
diagrams, transcription factors and gene-regulatory • Response networks
networks, protein-compound interactions, genetic • Pathway enrichment
interaction networks, as well as other interaction- Network biology was coined by Barabasi and Oltvai
based resources. 2004 and describes the graph-topological analysis of
biochemical networks (Barabasi and Oltvai 2004).
Network Analysis His research hypothesized that high connectivity of
Genome-wide studies of host-pathogen interactions proteins in protein interaction networks indicate impor-
using mRNA and microRNA (miRNA) transcriptomics, tance with respect to biological functions. Highly
Host–Pathogen Systems, Target Discovery 915 H

Genome/Literature
Experiments
information
System response
Feedback
to experiment
Network
components

Sub-network
construction Interaction in
BioSystems

Proteomics, Gene Expr.


Subcellular Locations Update
Network
(e.g. mutual H
Info - Aracne)
Functional Initial biological
Protein tags, location, Network network
Pathway scores
RNAi Z-scores

Connections?
(complete, correct ...)

Host–Pathogen Systems, Target Discovery, Fig. 2 Response network analysis. Screening data is analyzed in the context of
biochemical networks

connected protein knockouts tend to be lethal for the used to determine enrichment of gene-ontology nodes
organism as tested in the case of yeast. by genes. By including quantitative data, such gene-
Network clusters, complexes, and modules are expression profiles, the Gene Set Enrichment Analysis
used to group subsets of network components (genes, method (GSEA; Subramanian et al. 2005) is capable to
proteins, compounds) into connected subgraphs. Purpose determine whether an a priori defined set of genes
of this exercise is to cluster tightly connected compounds shows statistically significant, concordant differences
which are thought to be functionally related (Aravind between two biological states (e.g., phenotypes).
2000). Predefined gene sets (aka Molecular Signature Data-
The calculation of response networks involves of the bases) were extracted, for example, from BioCarta,
superposition of local, component-based data, such as KEGG (Kanehisa et al. 2011), and Reactome path-
gene-expression values, with network information. ways. In a next step, together with biochemical net-
Response networks of a system refer to best-scored sub- work information, enriched gene sets are used as
networks in large biochemical networks responding to scaffolds to construct enriched pathways. Thus, the
specific environmental conditions measured by the network provides additional context information.
corresponding experiment (Ideker et al. 2001; Ideker
et al. 2002; Cabusora et al. 2005). Target Deconvolution
Pathway enrichment is a multistep process. It is Target deconvolution describes the retrospective iden-
based on set enrichment analysis identifying statisti- tification of targets that underlie observed phenotypic
cally significant genes that enrich a particular set. responses. In the case of host-pathogen interactions,
For example, hypergeometric distributions have been the phenotypic responses typically involve survival or
H 916 Host–Pathogen Systems, Target Discovery

death of the host, virus replication, or host defense – References


with and without administered drugs. Recent studies
involve the identification of human host factors Aravind L (2000) Guilt by association: contextual information in
genome analysis. Genome Res 10:1074–1077
required for influenza virus replication using genome-
Barabasi A-L, Oltvai ZN (2004) Network biology:
wide RNAi screens. Shapiro et al. utilize yeast two- understanding the cell’s functional organization. Nat
hybrid analysis to identify physical associations Rev Genet 5:101–113
between host and viral proteins (Shapira et al. 2009). Butcher EC (2005) Can cell systems biology rescue drug
discovery? Nat Rev Drug Discov 4:461–467
After verification with RNAi screens, a core network
Cabusora L, Sutton E, Fulmer A, Forst CV (2005) Differential
that is enriched in RNA-binding proteins, components network expression during drug and stress response.
of WNT signaling, and viral polymerase subunits have Bioinformatics 21:2898–2905
been identified as potentially important in influenza Forst CV, Flamm C, Hofacker IL, Stadler PF (2006)
Algebraic comparison of metabolic networks, phyloge-
infections. K€ onig et al. uses multiple analysis
netic inference, and metabolic innovation. BMC Bioin-
approaches including network context to calculate formatics 7:67
consensus scores for the identification of target host Hlavacek WS, Faeder JR, Blinov ML, Poser RG, Hucka M,
proteins required for WSN virus replication (K€ onig Fontana W (2006) Rules for modeling signal-transduction
systems. Science STKE, 344:re6, doi:10.1126/stke.
et al. 2010). Two hundred and nineteen factors were
3442006re6
confirmed to be required for efficient wild-type influ- Ideker T, Ozier O, Schwikowski B, Siegel AF (2002) Discovering
enza virus growth, including those involved in kinase- regulatory and signalling circuits in molecular interaction
regulated signaling, ubiquitination, and phosphatase networks. Bioinformatics 18(Suppl 1):S233–S240
Ideker T, Thorsson V, Ranish JA, Christmas R, Buhler J, Eng JK,
activity, and 181 factors assemble into a highly signif-
Bumgarner R, Goodlett DR, Aebersold R, Hood L (2001)
icant host-pathogen interaction network. Integrated genomic and proteomic analyses of a
All of these studies employ human cell models. systematically perturbed metabolic network. Science
Thus the success of their deconvoluted targets have 292:929–934
Kanehisa M, Goto S, Sato Y, Furumichi M, Tanabe M
to be further verified in preclinical and clinical
(2011) KEGG for integration and interpretation of
scenarios. large-scale molecular data sets. Nucl Acids Res
doi:10.1093/nar/gkr988
Karp PD (2000) An ontology for biological function based on
molecular interactions. Bioinformatics 16:269–285
Cross-References K€onig R, Stertz S, Zhou Y et al (2010) Human host
factors required for influenza virus replication. Nature
463:813–817
▶ Biological Network Model
Matthews L, Gopinath G, Gillespie M, Caudy M, Croft D,
▶ Functional Enrichment Analysis de Bono B, Garapati P, Hemish J, Hermjakob H, Jassal B,
▶ Gene Expression Kanapin A, Lewis S, Mahajan S, May B, Schmidt E,
▶ Gene Ontology Vastrik I, Wu G, Birney E, Stein L, D’Eustachio P (2008)
Reactome knowledgebase of human biological pathways and
▶ Gene Set and Protein Set Expression Analysis
processes. Nucleic Acids Res 37:D619–D622
▶ Graph Overington JP, Al-Lazikani B, Hopkins AL (2006) How
▶ Host–Pathogen Interactions many drug targets are there? Nat Rev Drug Discov
▶ Hypergraph Theory 5:993–996
Sams-Dodd F (2005) Target-based drug discovery: is something
▶ Interactome
wrong? Drug Discov Today 10:139–147
▶ MicroRNA Shapira SD, Gat-Viks I, Shum BOV et al (2009) A physical and
▶ Network Clustering regulatory map of host-influenza interactions reveals
▶ Pathway pathways in H1N1 infection. Cell 139:1255–1267
Subramanian A, Tamayo P, Mootha VK, Mukherjee S,
▶ Proteome
Ebert BL, Gillette MA, Paulovich A, Pomeroy SL,
▶ Response Golub TR, Lander ES, Mesirov JP (2005) Gene set enrich-
▶ RNA Interference ment analysis: a knowledge-based approach for interpreting
▶ Signal Transduction genome-wide expression profiles. Proc Natl Acad Sci USA
102:15545–15550
▶ Topology of Metabolic Reaction Networks
Terstappen GC, Schlupen C, Raggiaschi R, Gaviraghi G (2007)
▶ Transcription Factor Target deconvolution strategies in drug discovery. Nat Rev
▶ Transcriptome Drug Discov 6:891–903
Hough Transform 917 H
In the example of Fig. 1, applying HT to the points
Hough Transform in the segment [p, q] results in the mapping of the latter
segment into a bundle of lines which will intersect into
Virginio Cantoni1,2 and Elio Mattia3 a particular point (m0, c0). This point will correspond
1
Department of Computer Engineering and Systems to the actual slope and intercept the line of the image
Science, University of Pavia, Pavia, Italy space which contains the segment [p, q]. Computation-
2
Computational Biology, KTH Royal Institute of ally, the HT is performed by voting (m, c) values in
Technology, Stockholm, Sweden a quantized parameters space; in the example of Fig. 1,
3
Centre for Systems Chemistry, Stratingh Institute for (m0, c0) will result in a higher final vote, both indicat-
Chemistry, Rijksuniversiteit Groningen, Groningen, ing success in retrieving a line and providing its actual
The Netherlands slope and intercept.
The number of points into which one single point of
the image space is mapped can be enormously
Definition reduced if edge detection operators not only provide
information about the location of meaningful points in
The Hough transform (HT) is a coordinate transforma- an image, but also about the approximate orientation of H
tion introduced by Hough (Hough 1962). It is useful in the curve on which points are possibly located.
computer vision as a method for retrieving shapes This variation is the so-called gradient method (GM).
within digital images. It was first conceived for lines, Although the GM makes line retrieval by HT a trivial
circumferences, and simple polygons (Duda and Hart process, it only facilitates retrieval of more complex
1972) and later generalized to arbitrary shapes (Ballard shapes such as polygons – for which even with the aid
1981). of the GM, segments have to be voted.
HT is best explained for the case of retrieval of In the generalized Hough transform (GHT),
lines. HT maps every meaningful point of an x–y retrieval of shapes of any nature – e.g., a wrench – is
image space into a line of an m–c parameters space. accomplished with the aid of the GM, to help establish
Meaningful features of an image for the purpose of HT relative orientations. One reference point internal to the
are identified by retrieving points that have high object to be retrieved is established, then a table
gradient values, as these points could possibly belong containing information about distance and orientation
to the contour of, e.g., a straight or curved line; com- of selected control points of the object – e.g., 10 points
putationally, they can be determined by edge detection located on peculiar features of the object – with respect to
operators. Figure 1 depicts the HT of points p and q of the reference point is built. GHT is then performed on
the image on the left into lines in the parameters space a given image by performing HT, in a parameters space
on the right. The various points of the HT line of corresponding to the image space; HT is performed
a point p give the set of slope (m) and intercept (c) multiple times per meaningful point – e.g., 10 times per
values of the bundle of lines which contain the point p. point of the image, in our example – supposing that it

y c

q q

Hough Transform,
Fig. 1 The Hough transform
duality between edge points p
and straight line parameters:
two points of a segment of the p
image space (left) and two
lines of the parameters space x m
(right)
H 918 Hough Transform, Structural Motif Retrieval

might correspond to any control point previously tool among the available heuristics that allow to esti-
established. If the object of interest is present in the mate the presence of particular structural features
image, one point in the image space will be highly within the structure of a protein.
voted; it will correspond to the reference point of the
object and will be voted once by every control point of
the object actually in the image. Characteristics
The advantages of HT are efficiency, rotation- and
translation-insensitiveness, parallelizability, noise and Systems biology benefits enormously from automatic
occlusion insensitiveness, and relative tolerance to methods aimed at sorting out meaningful information
approximate descriptions. Among its disadvantages, and overall trends from the huge amount of data about
its high computational complexity, which results in biological systems that has been, is being and will be
high time and memory requirements. Randomized collected with modern high-throughput techniques.
and deterministic HT variations help dealing with Among these methods, it is widely acknowledged
these downsides. that a substantial role in carrying out data mining
nowadays is played by protein structural comparison.
Known functional units such as structural motifs can
Cross-References be retrieved in recently discovered proteins and simi-
larities between known and new structures can be
▶ Hough Transform, Structural Motif Retrieval established. This allows inferring protein function,
explaining the role of specific sequences in biochemi-
cal pathways and biological networks, building up
References phylogenetic trees and ultimately creating new data-
base annotations, which in including the results of the
Ballard DH (1981) Generalizing the Hough transform to detect completed predictions will be helpful for the structures
arbitrary shapes. Pattern Recognit 13:111–122 that will be discovered in the future.
Duda RO, Hart PE (1972) Use of the Hough transformation
Various approaches have been developed for pro-
to detect lines and curves in pictures. Commun ACM
15:11–15 tein structure comparison, based on either distance
Hough PVC (1962) Methods and means for recognizing matrices (Taylor and Orengo 1989; Holm and Sander
complex patterns. US patent 3069654 1993), graph theory, or geometric hashing (Nussinov
and Wolfson 1991; Comin et al. 2004). The various
available algorithms are heuristics.
It is fundamental, in this context, to be acquainted
Hough Transform, Structural Motif with the notion of heuristic: An experience-based or
Retrieval intuition-guided problem-solving technique, which is
generally fast at the price of suboptimality, i.e., there is
Virginio Cantoni1,2 and Elio Mattia3 generally no theorem to fully guarantee the reliability
1
Department of Computer Engineering and of the results they provide. Heuristics are employed
Systems Science, University of Pavia, Pavia, Italy either when just no optimal approach is available at all
2
Computational Biology, KTH Royal Institute of or when, despite availability of an optimal method, the
Technology, Stockholm, Sweden heuristic approach is way faster and yields results with
3
Centre for Systems Chemistry, Rijksuniversiteit an acceptable accuracy given the inherent gain in
Groningen, Groningen, The Netherlands execution time.
The reason for the existence of multiple solutions of
the same protein structural comparison problem is that
Definition disparate inspirations lead to diverse approaches, very
often based on very different lines of reasoning.
The Hough transform can be used efficiently in the It turns out that the existence of multiple heuristics
context of protein structural comparison and motif for protein structure comparison is actually vital
retrieval, thereby constituting another fundamental and the reason for this is that it is very difficult to
Hough Transform, Structural Motif Retrieval 919 H

ρ θ θ
θ ρ

y y′
reference
point Whole circular
x x′ range of votes
with same ρ
z z′ and θ

Hough Transform, Structural Motif Retrieval, Fig. 1 The principle of applying the Hough transform to protein structure
comparison. Left: model protein; right: object protein (voting space)
H
establish quantitatively the distance from optimality of a cleaner voting space and in a better signal-to-noise
suboptimal structural comparison results. Therefore, ratio.
only when such methods yield results which are in The principle of the method is illustrated in Fig. 1.
accordance with one another, despite the fact that the On the left is the xyz space of the so-called model
nature of the calculations can be profoundly different, protein, while on the right is the x0 y0 z0 space of the
can the predictions be deemed accurate. so-called object protein. Voting is performed in the
The purpose of this entry is to illustrate a recently latter space. In order to do so, for every secondary
developed heuristic approach for assessing protein structure of the model protein (illustrated as a bold
structure similarity and for retrieving structural motifs, arrow on the left of Fig. 1), the characteristic parame-
which is inspired by a computational method imported ters rho and theta must be computed; these correspond
from the field of computer vision: the ▶ Hough to the distance of the secondary structure to a well-
transform (Mattia 2010). defined reference point, such as the geometric center of
In this method, protein similarity is determined by the protein, and the angle that the direction of the
performing a comparison between pairs of protein secondary structure forms with the segment joining
structures and calculating a comparison score. Motif the latter with the reference point. The parameters
retrieval is allowed for by letting one of the proteins in rho and theta allow to vote reference points in the
the comparison be the structural motif which is to be voting space. Knowledge of only two parameters in
searched for. The comparison score is calculated in a 3D space results in incomplete definition of the
a vote space, which generally corresponds to the coor- position of the point to vote, making voting of an entire
dinate space in which one of the proteins is described, circumference indispensable. This circumference is
by appropriately aligning couples of secondary struc- the rim of a cone centered in the object protein’s
tures of the two proteins (e.g., alpha-helices and secondary structure, as Fig. 2 illustrates. If the second-
beta-strands), one from each of the latter, and voting ary structures of the object proteins have a similar
selected reference points. Either the vote of the mostly spatial arrangement as the ones of the model protein
voted point in the voting space or other functions of the from which the rho and theta parameters are taken
votes in the voting space can be used as the final from, then, after many voting steps, i.e., when each of
comparison score. the rho and theta values from all of the secondary
The method can be easily tuned so that voting is structures of the model protein has been used to vote
performed only between secondary structures of simi- circumferences around every secondary structure of
lar type, e.g., alpha-helices with alpha-helices and the object protein, as many circumferences as the num-
beta-strands with beta-strands. Higher information ber of secondary structures in the model protein will all
content in the voting procedure always results in intersect, in the voting space, in one highly voted point,
H 920 Hough Transform, Structural Motif Retrieval

Hough Transform,
Structural Motif Retrieval,
Fig. 2 Example of a filled
voting space, in which
circumferences from different
secondary structures interfere
to create voting peaks
(two different perspectives)

which can be used as a score for the comparison. This computationally costly, to the point that it easily
will not happen whether the two proteins have different becomes too heavy to be executed in reasonable
structures, resulting in a low score. Figure 2 illustrates times for ordinary purposes. The algorithmic complex-
a sample voting space, in which successful comparison ity is in this case of O(N2), where N is the number of
results in the formation of vote peaks due to the voting votes in the vote space, which is in turn proportional to
circumferences intersecting with one another. the number of secondary structures in the model pro-
Fundamental for the implementation of the method tein and in the object protein and to the number of steps
is discretization. Both the geometric space where in which the voting circumference is divided.
voting is performed and the voting circumference are The problems associated with the high computa-
discretized, the former in “voxels” (cubic volume tional requirements of the smoothing algorithm are
elements), the latter in an integer number of steps. solved if a particular data structure is introduced in
The entity of the discretization deeply influences the implementation, i.e., the ▶ range tree. The use of
the quality of the results: A high-detail mesh produces range trees in the smoothing step allows the computa-
more reliable scores, although at the price of longer tional complexity to fall down to O(Nlog3N).
execution times, making a compromise between Balancing of the range trees guarantees that the com-
parameter settings and acceptable execution times putational complexity stick to the logarithmic order
necessary. Overwhelming memory usage is avoided above, without worst-case scenarios.
by maintaining information of only the voxels that The execution times vary strongly depending on the
contain a nonzero vote. size of the input, i.e., how many proteins to compare
The most important aspect of the implementation and how many secondary structures they contain, and
regards the way of dealing with the voting space after it the parameters settings. A typical execution time for
has been filled. Since structural similarity does not a one-to-one comparison is from a fraction of second to
mean structural identity, the circumferences in the a few seconds on a standard laptop. Noteworthy is the
voting space will not actually overlap perfectly, change in execution times that the use of range trees
although they will come to close proximity around brings about: A 33-to-33 proteins comparison took
one point. This results in the votes not forming the 8 h to complete with the standard algorithm (without
actual peaks that are wanted in order to compute range trees), while only 11 min with the optimized
a score. For this reason, smoothing of the vote space algorithm (with range trees).
by accumulation of all of the votes sufficiently near to The algorithm is embarrassingly parallel. This
every point in the vote space is needed. Performing means that, since it requires very little communication
a smoothing of the vote space by merely scrolling the between independent voting steps, it is easily split into
vote space and for every point summing every vote that parallel tasks, resulting in faster execution, which is
happens to be sufficiently close to it (thereby scanning fundamental for operation in the context of database
the vote space again for every point in it) is annotation.
HTLV, Cellular Transcription 921 H
All in all, as it has been pointed out in the introduc-
tory paragraphs that results of structural alignment from HPC
different methods which are in reciprocal agreement
help to confirm positive predictions, the Hough trans- ▶ Large-Scale and High-Performance Computing
form method, which has proven successful, is therefore
a fundamental component of the suite of protein struc-
tural comparison algorithms available to date.
The method has its own peculiar advantages, too,
HTLV
which are inherited directly by the algorithm it is based
▶ HTLV, Cellular Transcription
upon, the ▶ Hough transform. Among them, effi-
ciency, rotation- and translation-insensitiveness,
parallelizability, noise and occlusion insensitiveness,
and relative tolerance to approximate descriptions. The HTLV, Cellular Transcription
method is also highly parameterized, so that it can be
adapted to specific cases in order to obtain the best Christian Sch€onbach
results. The main downside is its suboptimality, as is Department of Bioscience and Bioinformatics, Kyushu H
the case with any heuristic. Other specific disadvan- Institute of Technology, Iizuka, Fukuoka, Japan
tages of the Hough transform, such as high time and
memory requirements, and of range trees, such as the
inherent difficulties in treating complex data struc- Synonyms
tures, have been successfully dealt with and solved.
HTLV; Human T-lymphotropic virus

Cross-References
Definition
▶ Hough Transform
▶ Range Tree Human T-lymphotropic viruses (HTLV) are members
of the ▶ Deltaretroviridae. HTLV-1, the only known
human oncogenic retrovirus is the etiological agent of
▶ adult T-cell leukemia/lymphoma (ATL) and
References ▶ Human T-Lymphotropic Virus Type-I-associated
Myelopathytropical Spastic Paraparesis (HAM/TSP).
Comin M, Guerra C, Zanotti G (2004) PROuST: a comparison
method of three-dimensional structures of proteins using HTLV-2 is associated with HAM/TSP-like illness,
indexing techniques. J Comput Biol 11–6:1061–1072 whereas the pathogenicities of HTLV-3 and HTLV-4
Holm L, Sander C (1993) Protein structure comparison by align- are unknown. HTLVs share a similar proviral genomic
ment of distance matrices. J Mol Biol 233:123–138
Mattia E (2010) Protein structure comparison and motif retrieval
organization. Long terminal repeats (LTR) flank
by the generalized Hough transform. IUSS Thesis, Pavia, the structural genes gag, pol, and env (Fig. 1). The pX
Italy region encodes the regulatory protein Tax and so-called
Nussinov R, Wolfson H (1991) Efficient detection of three- accessory proteins Rex, p21, p12, p13, and p30. The
dimensional motifs in biological macromolecules by
basic leucine zipper factor HBZ is translated from an
computer vision techniques. Proc Natl Acad Sci USA
88:10495–10499 antisense transcribed mRNA of the 30 LTR region
Taylor WR, Orengo CA (1989) Protein structure alignment. (Matsuoka and Jeang 2007; Higuchi and Fujii 2009).
J Mol Biol 208:1–22 CD4+ T-cell transformation by HTLV-1, viral per-
sistence, and immune response modulation are driven
by the highly pleiotropic oncoprotein Tax1 (Tax of
HTLV-1) in conjunction with HBZ, p12, p13, Rex1,
HP1 and p30. Env, Rex, Tax, and dendritic cells are asso-
ciated with HTLV-1 tropism for infecting CD4+
▶ Heterochromatin Protein 1 T-cells (Jones et al. 2008; Boxus and Willems 2009).
H 922 HTLV, Cellular Transcription

p30
p13
p12
Rex (p27)
p21
Tax (p40)

env
gag pol
protease
5⬘LTR gag pol env pX 3⬘LTR

HBZ

HTLV, Cellular Transcription, Fig. 1 HTLV proviral genome and transcripts

Tax1 can interact with hundreds of cellular proteins to Tax


transform CD4+ cells (Boxus et al. 2008). Protein HTLV–1 LZR PBM
complex formation of Tax1 with cellular proteins
CREB2, ATF2, EP300, and KATB2 is essential for HTLV–2 LZR
Tax1-mediated viral and cellular transcription initia-
tion. Major Tax1 targets are the non-canonical NFkB HTLV–3 LZR PBM
and AKT1 signaling pathways including transcription
factors AP1, SRF, and TP53 (Matsuoka and Jeang HTLV–4 LZR
2007). Various feedback mechanisms on viral and
cellular transcription level ensure HTLV-1 persistence HTLV, Cellular Transcription, Fig. 2 Comparison of HTVL
and/or immune escape from recognition by Tax1- Tax motif/domain organization
specific cytotoxic T-cells. For example, Tax1-
mediated T-cell immortalization is suppressed by of Tax1 with CHUK, IKBKG, and NFKB2 leads
HBZ heterodimers with CREB2 or JUN, whereas to nuclear translocation of RELB and NFKB2
HBZ transcription is activated by Tax1 (Boxus and heterodimers which induce the expression of various
Willems 2009; Matsuoka 2010). p30 binding to cell proliferation–promoting cytokines (IL1B, IL2,
EP300 can modulate transcription of both, viral and IL6, IL9, IL13, and IL15) and their corresponding
cellular genes. Depending on the level of available receptors (Boxus and Willems 2009). The LZRs of
p30, it may either stabilize Tax1-CREB2-ATF2- HTLV-2, -3, and -4 are more similar to each other than
EP300-KATB2 complex or compete for EP300 and to HTLV-1 LZR (Fig. 2) (Higuchi and Fujii 2009).
suppress transcription initiation (Bai et al. 2010). Accordingly, HTLV-1 but not HTLV-2 in vitro infec-
Upon transformation, mutations in the Tax1-coding tion of IL2-dependent cell lines results over time in IL2-
region, deletions and/or methylation of 50 LTR pre- independent growth (Boxus et al. 2008).
vents Tax1 expression. HTLV-1 complements host cell transformation
with Tax1-driven aneuploidic or improper chromo-
somal segregation effects and clastogenic or mismatch
Characteristics repair–associated DNA damage (Matsuoka and Jeang
2007; Boxus and Willems 2009). The Tax1 C-terminal
Tax-Mediated Transcriptional Changes PDZ domain–binding motif (PBM) has been associ-
Tax1 leucine zipper–like region (LZR) mediates the ated not only with cell proliferation but also genomic
activation of the non-canonical NFkB pathway, which instability–promoting properties. Tax proteins of
is essential in T-cell transformation. Complex formation HTLV-2 and -4 lack PBM (Fig. 2).
HTLV, Cellular Transcription 923 H
Modulation of the cell cycle by Tax1 and HBZ in also binds to the immature forms of the IL2Rb and gc
conjunction with epigenetic changes promotes the chains, which enhance STAT5 activation and TCR
clonal expansion of transformed CD4+ T-cells via signaling in response to IL2. To avoid immune recog-
mitotic proliferation. On the other hand, Tax1-specific nition of the infected proliferating T-cells, p12 binds in
cytotoxic CD8+ T-cells select against clonal prolifera- the ER to HLA class I molecules which reroutes HLA
tion. Overall, the relatively low infection efficacy class I-trafficking to the proteasome for degradation
of HTLV-1 and multifactorial dependence on CD4+ rather than to the cell surface (Van Prooyen et al.
T-cell transformation and proliferation capacity of 2010).
abnormal cells results in latency periods of several
decades and up to 5% ATL incidence. In case of p13
HAM/TSP, the latency period is shorter, and patients p13 has dual subcellular localization potential. Mainly,
show a vigorous HTLV-1-specific cytotoxic CD8+ p13 localizes to mitochondria where it increases reactive
immune response accompanied by pro-inflammatory oxygen species production and reduces mitochondrial
cytokine production, which contribute to the neuropatho- Ca2+ uptake, which influences the proliferation and
genicity (Matsuoka and Jeang 2007). death of T-cells. Tax1-mediated ubiquitylation facili-
tates sorting of p13 to the nucleus. Subsequent heterodi- H
Effects of Accessory Proteins merization with Tax1 inhibits CREBBP and EP300 co-
While Tax1 is sufficient and necessary to transform activator binding to Tax1 and attenuates Tax-mediated
T-cells, accessory proteins are equally important for transcription of HTLV-1 genes in a potential negative
the survival of HTLV-1 in the host cells and progression feedback loop (Silic-Benussi et al. 2010).
toward ATL or HAM/TSP as outlined in excellent
reviews of p12 (Van Prooyen et al. 2010), p13 (Silic- Rex1 (p27)
Benussi et al. 2010), p30 (Bai et al. 2010), and HBZ The RNA-binding post-transcriptional regulator Rex1
(Matsuoka 2010) in a special issue of Molecular Aspects (p27) controls the splicing and export of HTLV RNAs
of Medicine. Some of the multifunctional accessory transcribed from the pX regions, including its own
proteins have Tax1 synergistic and antagonistic func- RNA. Rex1 is essential for the production and assem-
tions to promote immune escape and modulate cell bly of viral particles which enables active infection
proliferation, viral replication, and persistence. (Boxus and Willems 2009; Bai et al. 2010).

p12 p30
p12 protein usually localizes to the endoplasmic retic- p30 antagonizes the effects of Tax1 on both, transcrip-
ulum (ER) and cis-Golgi. Yet, proteolytic cleavage of tional and post-transcriptional levels. Direct interac-
p12 at a non-canonical ER retention signal produces tion of p30 with CREBBP and EP300 suppresses the
p8, which localizes to lipid rafts of the immunological transcription of HTLV-1 from the 50 LTR and of cellu-
synapse and affects T-cell receptor (TCR) signaling. In lar genes that are dependent on CREBBP/EP300 acti-
addition, interaction of p8 with linker for activation vation. Another cellular target of p30 is transcription
of T-cells (LAT) decreases LAT phosphorylation. factor SPI1 also known as PU.1. Binding of p30 to the
As a result, suppressed T-cell receptor signaling ETS domain of SPI1 prevents its binding to DNA and
downregulates the activity of NFAT transcription fac- therefore transcriptional activation of its target genes
tor family members which decreases cytokine produc- (e.g., TLR4) including itself. Downregulation of
tion and T-cell proliferation (Van Prooyen et al. 2010). TLR4 expression interferes with the innate immune
ER-located p12 promotes in synergy with Tax1 response, induction of pro-inflammatory cytokines
T-cell proliferation and viral persistence by activating (e.g., IL8, TNFA, CCL2, etc.) and production of
both IL2 and IL2R expression. p12 interaction with anti-inflammatory IL10 in TLR4-primed dendritic
CALR and CANX increases cytoplasmic Ca2+ levels cells (Bai et al. 2010).
and dephosphorylation of NFATC1, which locates to Post-transcriptionally p30 interaction with the large
the nucleus to activate IL2 transcription. In turn, the ribosomal subunit protein L18a and binding to Tax1
increase in IL2 activates Tax1 transcription via and Rex1 mRNAs were found to increase their nuclear
CREB2 and ATF2. At the same time ER-located p12 retention, thereby suppressing viral replication and
H 924 HTML

possibly prolonging viral latency. On cellular level, Cross-References


p30 increases via an unknown mechanism the phos-
phorylation of GSK3B kinase in macrophages, which ▶ Adult T-Cell Leukemia/Lymphoma
leads to an increase in IL10 production (Bai et al. ▶ Deltaretroviridae
2010). ▶ Human T-Lymphotropic Virus Type-I-associated
Myelopathytropical Spastic Paraparesis
HBZ
HBZ gene is transcribed from the 30 LTR which is not
known to undergo methylation and/or deletions. References
Therefore, HBZ is expressed at all stages of infection.
Both HBZ mRNA and protein modulate cellular tar- Bai XT, Baydoun HH, Nicot C (2010) HTLV-I p30: a versatile
gets. HBZ protein forms heterodimers with CREB, protein modulating virus replication and pathogenesis. Mol
Aspects Med 31(5):344–349
CREB2, and p300/CBP which attenuate Tax-mediated
Boxus M, Willems L (2009) Mechanisms of HTLV-1 persis-
transcription of HTLV-1 genes from the 50 LTR. tence and transformation. Br J Cancer 101(9):1497–1501
A stem-loop structure of HBZ mRNA promotes the Boxus M, Twizere JC, Legros S, Dewulf JF, Kettmann R, Willems
proliferation of infected and leukemic cells, and L (2008) The HTLV-1 Tax interactome. Retrovirology 5:76
Higuchi M, Fujii M (2009) Distinct functions of HTLV-1 Tax1
increases the transcription of E2F1. Possibly, the
from HTLV-2 Tax2 contribute key roles to viral pathogene-
increased cell proliferation is associated with the sis. Retrovirology 6:117
induction of E2F1 target genes. In addition, HBZ Jones KS, Petrow-Sadowski C, Huang YK, Bertolette DC,
mRNA may modulate the phenotype of T-cells Ruscetti FW (2008) Cell-free HTLV-1 infects dendritic
cells leading to transmission and transformation of CD4(+)
(Matsuoka 2010).
T cells. Nat Med 14(4):429–436
Matsuoka M (2010) HTLV-1 bZIP factor gene: Its roles
HTLV and Transcription of Cellular miRNAs in HTLV-1 pathogenesis. Mol Aspects Med 31(5):359–366
HTLV-1 infection affects also the expression of small Matsuoka M, Jeang KT (2007) Human T-cell leukaemia virus
type 1 (HTLV-1) infectivity and cellular transformation. Nat
RNAs. Ruggero and co-workers (Ruggero et al. 2010)
Rev Cancer 7(4):270–280
reviewed reports of numerous miRNAs that were Ruggero K, Corradin A, Zanovello P, Amadori A, Bronte V,
found to be up- or downregulated in infected T-cells Ciminale V, D’Agostino DM (2010) Role of microRNAs in
and ATL cells. For example, in ATL cells, miR-93 and HTLV-1 infection and transformation. Mol Aspects Med
31(5):367–382
-130b downregulate TP53INP1, an apoptosis- and cell
Silic-Benussi M, Biasiotto R, Andresen V, Franchini G,
cycle arrest–promoting tumor suppressor. In HTLV- D’Agostino DM, Ciminale V (2010) HTLV-1 p13, a small
1-infected T-cell lines, Tax1 upregulates miR-146a protein with a busy agenda. Mol Aspects Med 31(5):350–358
and miR-130b via the NF-kB pathway. Other Tax1- Van Prooyen N, Andresen V, Gold H, Bialuk I, Pise-Masison C,
Franchini G (2010) Hijacking the T-cell communication
modulated small RNAs include DNA-directed RNA
network by the human T-cell leukemia/lymphoma virus
polymerase III–transcribed transfer RNAs (tRFs) and type 1 (HTLV-1) p12 and p8 proteins. Mol Aspects Med
miRNAs, which affect cell proliferation and cell cycle 31(5):333–343
progression.

Conclusions
In the past 5 years, HTLV-1 and -2 studies have con- HTML
siderably contributed to the understanding of the
molecular mechanism of T-cell transformation and Tim Clark
viral manipulation of cellular pathways at both tran- Department of Neurology, Massachusetts General
scription and protein levels. Potential therapeutic tools Hospital and Harvard Medical School, Boston,
(e.g., PI3K inhibitors, allogeneic hematopoietic stem MA, USA
cell transplantation, or mutant Tax vaccination) and
diagnostic markers such as miRNAs have emerged
(Matsuoka and Jeang 2007; Ruggero et al. 2010), but Synonyms
efficient therapies and diagnostic markers have yet to
be developed. Hypertext markup language
HTTP 925 H
Definition Hickson I, Schwartz B (eds) (2012) HTML5: a technical speci-
fication for Web developers. WHATWG. http://developers.
whatwg.org/
HTML is the principal markup language of the Raggett D, Hors AL, Jacobs I (eds) (1999) HTML 4.01 specifica-
▶ World Wide Web. It specifies the structural tion, W3C recommendation 24 December 1999. World Wide
semantics for text in Web documents. Its elements Web Consortium. http://www.w3.org/TR/REC-html40/
indicate to Web browsers how to display content on
a Web page, including text, images, video, and
audio. It also can embed programming instructions
to the browser using JavaScript, or other scripting HTTP
languages.
A separate language, CSS (Cascading Style Tim Clark
Sheets), is used to specify the presentation of the Department of Neurology, Massachusetts General
page elements, such as typefaces and styles, layout, Hospital and Harvard Medical School, Boston,
colors, etc. MA, USA
Elements of HTML consist of tags enclosed in
angle brackets, inserted into the content, like this
Synonyms H
example in HTML5:
<!DOCTYPE html>
Hypertext transfer protocol
<html>
<head>
<title>Definition of HTML Definition
</title>
</head> HTTP is a generic stateless Internet application protocol
<body> for distributed and collaborative ▶ hypertext- and hyper-
<p>HTML is the markup media-based information systems. It provides the foun-
language of the World Wide dation for data communications on the World Wide Web.
Web.</p> It is currently the dominant Internet protocol.
</body> The HTTP protocol has a circumscribed set of allowed
</html> actions to access its target, which is a distributed resource
Tags plus CSS indicate to the browser how the specified by the ▶ URL or Uniform Resource Locator
content should be rendered. string. Resources are resolved, interpreted, and acted
HTML 4.01 is the most recent fully standardized upon according to client request, by computers running
W3C version of HTML. HTML5, with a number of Web server software (e.g., Apache httpd).
attractive new features, is currently usable in “working The actions supported by HTTP are constrained
draft” state. It is under parallel and coordinated devel- to a relatively limited set. Programmatic Web services
opment by both the W3C (http://w3.org) and the that conform to this set of allowed actions
WHATWG (http://www.whatwg.org). (GET, HEAD, PUT, POST, DELETE, TRACE,
OPTIONS, and CONNECT) are called “resource-ori-
ented” services and are said to conform to the REST
Cross-References (representational state transfer) architectural pattern –
they are said to be “RESTful.”
▶ World Wide Web Secure communications on the Web for such
purposes as financial transactions are implemented
using the ▶ HTTPS protocol, which encrypts the Inter-
net transport layer data transmitted by HTTP.
References HTTP standards development has been a global
collaborative engineering effort, coordinated by the
Hickson I (ed) (2011) HTML5: a vocabulary and associated
APIs for HTML and XHTML, W3C Working Draft World Wide Web Consortium (http://w3.org) and the
25 May 2011. World Wide Web Consortium Internet Engineering Task Force (http://www.ietf.org).
H 926 HTTP Secure

Cross-References symmetric encryption technologies such as AES


or RC4.
▶ World Wide Web Before establishing a secure link, communicating
nodes must authenticate themselves – they must pre-
sent credentials that reliably establish their identities.
References In HTTPS, Web browsers and services determine
whether or not to trust servers, by authenticating an
Fielding RT, Taylor RN (2002) Principled design of the X.509 public-key cryptographic identity certificate
modern web architecture. ACM Trans Internet Technol
provided by the server. X.509 certificates are signed
2(2):115–150
Fielding R, Getty J, Mogul J, Frystyk H, Masinter L, Leach P, by a trusted Certification Authority (e.g., Verisign,
Berners-Lee T (1999) Hypertext transfer protocol – HTTP/ Microsoft), or self-signed by the server’s own
1.1, IETF RFC 2616. Internet Engineering Task Force. http:// organization.
www.ietf.org/rfc/rfc2616.txt
The Electronic Frontier Foundation (https://www.
Jacobs I, Walsh N (2004) Architecture of the World Wide Web,
vol 1. In: W3C recommendation. World Wide Web eff.org/) recommends encryption of Web traffic when-
Consortium. http://www.w3.org/TR/webarch/ ever possible. “HTTPS Everywhere” (https://www.eff.
org/https-everywhere) browser extensions are now
available to support this recommendation.

HTTP Secure
References
▶ HTTPS
Apache Software Foundation (2011) SSL / TLS strong
encryption: an introduction. Apache Software Foundation.
https://httpd.apache.org/docs/2.3/ssl/ssl_intro.html
Dierks T (2008) Rescorla E: IETF RFC 5246: the transport
layer security (TLS) protocol, version 1.2. In: IETF Requests
HTTPS for Comment, vol 5246. Internet Engineering Task Force.
https://www.ietf.org/rfc/rfc5246.txt
Tim Clark Rescorla E (2000) HTTP over TLS: RFC 2818. Internet Engi-
Department of Neurology, Massachusetts General neering Task Force. https://www.ietf.org/rfc/rfc2818.txt
Hospital and Harvard Medical School, Boston,
MA, USA

Synonyms Hub

HTTP secure; Hypertext transfer protocol secure Junhua Zhang


Academy of Mathematics and Systems Science,
Chinese Academy of Sciences, Beijing, China
Definition Key Laboratory of Random Complex Structures and
Data Science, Chinese Academy of Sciences, Beijing,
▶ HTTPS is an encrypted version of the ▶ HTTP China
application-layer Internet protocol for enacting secure National Center for Mathematics and Interdisciplinary
transactions on the ▶ World Wide Web, such as in Sciences, Chinese Academy of Sciences, Beijing,
financial transactions and for confidential corporate China
and other data. HTTPS syntax is identical in all
respects to the standard HTTP protocol except that
resource requests, as well as the returned data, Synonyms
are encrypted by the SSL (Secure Sockets Layer)
or TLS (Transport Layer Security) protocols using Date hub; Party hub
Human T-Lymphotropic Virus Type-I-associated Myelopathy/Tropical Spastic Paraparesis 927 H
Definition
Human Leukocyte Antigens (HLA)
Complex networks (including most biological networks)
are known as scale-free networks and are characterized ▶ Major Histocompatibility Complex (MHC),
by a power-law degree distribution. This means that Applications
most nodes of the network have a lower degree, whereas
a small percentage of nodes possess a large number of
links in the network. These high-degree nodes are called
hubs. In protein-protein interaction (PPI) networks hubs
represent proteins with a large number of interactions, Human Proteome Organisation
called hub proteins. In ▶ gene regulatory networks
sometimes a single ▶ transcription factor can regulate Sandra Orchard
many target genes; such a ▶ transcription factor is called EMBL Outstation, European Bioinformatics Institute,
a regulatory hub. Functional analysis showed that hubs Hinxton, Cambridge, UK
were enriched in kinase and adaptor domains acting
primarily in signal transduction (▶ Signal Transduction H
Pathway) and ▶ transcriptional regulation, whereas Synonyms
non-hubs had more DNA-binding domains and
were involved in catalytic activity. Moreover, hub HUPO
proteins were more likely to be essential than non-hub
proteins.
Based on a computational analysis of yeast microar- Definition
ray data, Han et al. (2004) proposed two types of hubs,
i.e., party hubs and date hubs. Date hubs display low The Human Proteome Organisation (www.hupo.org) is
▶ co-expression with their partners, while party hubs an international scientific organization representing
have high co-expression. These two kinds of hubs were and promoting proteomics through international
further discussed in Batada et al. (2006) and Agarwal cooperation and collaborations by fostering the
et al. (2010). development of new technologies, techniques, and
training. The organization organizes an annual con-
gress and a number of initiatives, largely aimed at
identifying the proteome content of a number of
References healthy and diseased tissues and cell types.

Agarwal S, Deane CM, Porter MA, Jones NS. Revisiting


date and party hubs: novel approaches to role assignment in
protein interaction networks. PLoS Comput Biol. 2010;6:
e1000817
Batada NN, Reguly T, Breitkreutz A, Boucher L, Breitkreutz BJ, Human T-Lymphotropic Virus
et al. Stratus not altocumulus: a new view of the yeast protein
interaction network. PLoS Biol. 2006;4:e317
Han JD, Bertin N, Hao T, Goldberg DS, Berriz GF,
▶ HTLV, Cellular Transcription
et al. Evidence for dynamically organized modularity in
the yeast protein-protein interaction network. Nature.
2004;430:88–93

Human T-Lymphotropic Virus


Type-I-associated Myelopathy/Tropical
Spastic Paraparesis
Human Clock
▶ Human T-Lymphotropic Virus Type-I-associated
▶ Circadian Rhythm Myelopathytropical Spastic Paraparesis
H 928 Human T-Lymphotropic Virus Type-I-associated Myelopathytropical Spastic Paraparesis

Cross-References
Human T-Lymphotropic Virus
Type-I-associated Myelopathytropical ▶ HTLV, Cellular Transcription
Spastic Paraparesis

Christian Sch€onbach References


Department of Bioscience and Bioinformatics, Kyushu
Institute of Technology, Iizuka, Fukuoka, Japan Culcea E, Sandbrink F (2009) Tropical Myeloneuropathies.
eMedicine. http://emedicine.medscape.com/article/1166055-
overview
Irish BP, Khan ZK, Jain P, Nonnemacher MR, Pirrone V,
Synonyms Rahman S, Rajagopalan N, Suchitra JB, Mostoller K,
Wigdahl B (2009) Molecular mechanisms of neurodegener-
ative diseases induced by human retroviruses: a review. Am
Human T-lymphotropic virus type-I-associated
J Infect Dis 5(3):231–258
myelopathy/tropical spastic paraparesis; HAM/TSP Roucoux DF, Murphy EL (2004) The epidemiology and disease
outcomes of human T-lymphotropic virus type II. AIDS
Rev 6(3):144–154
Definition

HAM/TSP is a slow-onset, chronic-progressive central


nervous system disease that develops in less than
5% of HLTV-1 infected individuals and leads to HUPO
corticospinal tract degeneration. In endemic areas,
the onset of HAM/TSP is biased toward females at ▶ Human Proteome Organisation
a 3:1 ratio and occurs between 40 and 50 years of age
(Culcea and Sandbrink 2009). The proviral load
and genetic background influence the disease onset
and progression. Yet, the molecular mechanisms
of onset and progression are not fully understood. Hybrid Simulation Strategies
Proinflammatory mediators found in lesions are asso-
ciated with the infiltration of cytotoxic CD8+ T-cell in Ruiqi Wang
the central nervous system. In addition, high proviral Institute of Systems Biology, Shanghai University,
load and the presence of extracellular Tax1 have been Shanghai, China
implicated in the initiation of TNFA-mediated destruc-
tion of neuronal cells (Irish et al. 2009). The inflam-
mation of the spinal cord causes spastic paraparesis Definition
of both legs, impaired position sense of the
feet, lower lumbar pain, hyper-reflexia of upper Hybrid simulation strategies aim to combine different
extremities, and urinary incontinence (Culcea and approaches into one calculation scheme. The essential
Sandbrink 2009). The inflammation can trigger idea is to partition reactions or species into two or more
autoimmune conditions such as uveitis and myosi- groups: e.g., a group of low-copy number species and
tis. Infections with HTLV-2 are also associated a group of high-copy number species, and then treat
with HAM/TSP, but symptoms were reported to them in different ways. For reactions, a system can be
be milder and the progression slower. Ataxic decomposed into two subsystems containing fast and
HAM is only associated with HTLV-2 (Roucoux slow reactions, respectively. Fast reactions often
and Murphy 2004). Treatment of HAM/TSP with involve high-copy number species, e.g., in metabo-
INFA2 and IFNB1 ameliorates symptoms and slows lism. Slow reactions or reactions involving low-copy
the progression of the disease during the course of number species can frequently be found in signal trans-
the therapy. Effective HAM/TSP therapies or duction or gene expression systems. The two subsys-
HTLV vaccines are not available. tems are then simulated by using different methods,
Hypergeometric Distribution 929 H
e.g., exact methods and approximate algorithms, Cross-References
respectively.
In the slow/discrete subset of reactions, the fast ▶ Langevin Equation
subset evolves due to the action of the fast reactions, ▶ Law of Mass Action
which means that, during that time, the behavior of fast ▶ Master Equation
subset of reactions can be approximated by using ordi- ▶ Stochastic Simulation Algorithms
nary differential equations (ODEs), stochastic differ-
ential equations (SDEs), or approximate stochastic
simulation methods, independent of the slow subset.
Therefore, hybrid simulation strategies can be References
roughly divided into discrete/ODE method, maximal
Alfonsi A, Cances E, Turinici G, Ventura BD, Huisinga
timestep algorithm, and discrete/Langevin method, W (2005) Exact simulation of hybrid stochastic and
depending on how the high-copy number reactions deterministic models for biochemical systems. ESAIM Proc
are approximated. 14:1–13
Chen L, Wang R, Li C, Aihara K (2010) Modeling biomo-
The discrete/ODE method begins with
lecular networks in cells: structures and dynamics.
a partitioning of species and combining discrete event Springer, London H
simulations with ODE models. This method approxi- Puchalka J, Kierzek AM (2004) Bridging the gap between
mates the high-copy number species with continuous stochastic and deterministic regimes in the kinetic simula-
tions of the biochemical reaction networks. Biophys
deterministic techniques. The basic idea is that an ODE
J 86:1357–1372
integration step will take place on the assumption Salis H, Kaznessis Y (2005) Accurate hybrid stochastic simula-
that no discrete reaction takes place. If discrete reac- tion of a system of coupled chemical or biochemical reac-
tion has occurred, the time of the event should be tions. J Chem Phys 122:054103
van Kampen NG (1981) Stochastic processes in physics and
identified, the discrete updating should take place,
chemistry. Elsiever, Amsterdam
and the continuous variables should be updated over Wilkinson DJ (2006) Stochastic modelling for systems biology.
this shorter timestep. Chapman & Hall/CRC, Boca Raton
The hybrid methods also use a combination of an
exact updating procedure for the low concentration spe-
cies with various approximate simulation methods, e.g.,
t-leap method, for the other species. There are various
exact/approximate simulation methods, e.g., maximal Hypergeometric Distribution
timestep method proposed by Puchalka and Kierzek,
which combines the next reaction method with t-leap Yong Wang
method. Academy of Mathematics and Systems Sciences,
Other methods also use a combination of stochastic Chinese Academy of Sciences, Beijing, China
simulation and numerical integration of SDEs. In these
hybrid simulation methods, slow reactions are simu-
Synonyms
lated based on discrete updating, while fast reactions
are updated based on Langevin approach. An auto-
Binomial distribution without replacement
matic partitioning and dynamic repartitioning of fast
and slow reactions has been proposed by Salis and
Kaznessis in 2005. Definition
Actually, many different hybrid simulation
strategies have been proposed. They integrate In probability theory and statistics, the hypergeometric
different partitioning policy and various exact/ distribution is a discrete probability distribution
approximate simulation techniques. Hybrid simula- that describes the number of successes in a sequence
tion strategies are important because of the exis- of a number of draws from a finite population without
tence of very complex and heterogeneous models replacement. So in essence the hypergeometric
that integrate signaling, metabolism, and gene distribution is the binomial distribution without
expression. replacement.
H 930 Hypergraph Theory

Suppose we have the following hypergeometric


experiment: Hypertext and Hypermedia
• N: The number of items in the population
• N2: The number of items in the population that are Tim Clark
classified as successes Department of Neurology, Massachusetts General
• N1: The number of items in the sample Hospital and Harvard Medical School, Boston,
Then random variable X, the number of items in the MA, USA
sample that are classified as successes, follows
a hypergeometric distribution:
   Definition
N1 N  N1
i N2  i Hypertext is a nonlinear system of digital text
PðX ¼ iÞ ¼  
N organization, conceptually similar to footnotes, in
N2 which text contains pointers, called “hyperlinks,” to
  other text segments, which may be accessed by acting
N (e.g., mouse clicking) on the pointers. These other text
Here is the number of combinations of
N2 segments may be within the same document, or not.
N things, taken N2 at a time. Hypermedia is the multi-media generalization of
hypertext. It forms the conceptual basis for naviga-
References tional user experience on the ▶ World Wide Web.
Ted Nelson coined the terms “hypertext” and “hyper-
http://en.wikipedia.org/wiki/Hypergeometric_distribution media” and was an early advocate, developing the Xan-
adu system concept. Douglas Engelbart was one of the
first to actually explore and demonstrate hypertext and
hypermedia computer systems, resulting in his famous
Hypergraph Theory online multimedia “NLS” system demonstration of
December 9, 1968, at the Stanford Research Institute.
▶ Subset Surprisology and Toponomics
(Engelbart received the 1990 ACM Software Systems
Award along with William K. English for NLS.) These
and other researchers were initially inspired by the vision
Hyperplasia of Vannevar Bush in his July 1945 Atlantic Monthly
article, “As We May Think” (Nelson 1965; Engelbart
Barbara J. Davis and English 1968; Conklin 1987).
Section of Pathology, Tufts Cummings School Hypertext and Hypermedia are the subjects of sig-
of Veterinary Medicine Biomedical Sciences, nificant ongoing research as topics in their own right,
North Grafton, MA, USA on which the Association for Computing Machinery
(ACM) sponsors annual conferences (http://ht2010.
org/, http://www.ht2011.org/, etc.). Many develop-
Definition ments in distributed hypermedia that were origi-
nally omitted from the simplified initial design of
Hyperplasia is an excessive, orderly growth in which the web, such as stand-off link services, are now
cells have a “normal increase in number,” maintain being addressed and brought back into currency
uniformity and polarity, and stop growing after on the modern Web, through ▶ semantic web
cessation of the stimulus that evoked the growth. technologies.

Cross-References Cross-References

▶ Cancer Pathology ▶ World Wide Web


Hypothesis Testing 931 H
References algorithm A takes D as an input and produces
a function (model, hypothesis) f 2 H F as an output,
Conklin J (1987) Hypertext: an introduction and survey. Com- where H is the hypothesis space. This subset is
puter 20(9):17–41
determined by the formalism used to represent models
Engelbart DC, English WK (1968) A research center for
augmenting human intellect. In: Proceedings of the (e.g., as logical formulas, linear functions, or
December 9–11, 1968, fall joint computer conference, non-linear functions implemented as artificial neural
part I, 1476645. ACM, San Francisco, pp 395–410 networks or ▶ decision trees). Thus, the choice of the
Nelson TH (1965) Complex information processing: a file struc-
hypothesis space produces a representation bias,
ture for the complex, the changing and the indeterminate. In:
ACM ’65, proceeding of the 1965 20th national conference. which is part of the algorithm’s ▶ inductive bias.
ACM Press, New York

Cross-References

Hypertext Markup Language ▶ Classification


▶ Decision Tree
▶ HTML ▶ Inductive Bias
H

Hypertext Transfer Protocol Hypothesis Testing

▶ HTTP Roger Higdon


Seattle Children’s Research Institute,
Seattle, WA, USA

Hypertext Transfer Protocol Secure


Synonyms
▶ HTTPS
Frequentist hypothesis testing; Parametric hypothesis
testing

Hypothesis Space
Definition
Eyke H€ullermeier, Thomas Fober and
Marco Mernberger Conventional statistical hypothesis testing validates
Philipps-Universit€at Marburg, Marburg, Germany whether a hypothesis about a quantity of interest
(a parameter) is true or false based on the likelihood
that the observed data could have been generated if the
Definition hypothesis were true.

In machine learning, the goal of a supervised learning


algorithm is to perform induction, i.e., to generalize Characteristics
a (finite) set of observations (the training data) into
a general model of the domain. In this regard, the Statistical hypothesis testing has a long history
hypothesis space is defined as the set of candidate (Fisher 1925 and Neyman and Pearson 1933). Hypoth-
models considered by the algorithm. esis testing begins with a question about a parameter or
More specifically, consider the problem of learning parameters of interest that describe aspects of a popula-
a mapping (model) f 2 F ¼ Y X from an input space X tion of interest. The parameters define the probability
to an output space Y, given a set of training distribution for generating data from the population. In
data D ¼ fðx1 ; y1 Þ; :::; ðxn ; yn Þg X Y. A learning order to test a hypothesis about a parameter, a sample of
H 932 Hypothesis Testing

Hypothesis Testing, Table 1 Outcomes of a statistical Given the above definitions, the basic steps of
hypothesis test a statistical hypothesis test are as follows:
Fail to reject H0 Reject H0 1. The first step is to describe the null and alternative
H0 true Correct Type I error hypotheses that relate to the research objective.
H0 false Type II error Correct This is important so that the results of the hypothe-
sis test are relevant to the research objective.
Specifically, the null hypothesis needs to be defined
data is taken from the population and a test statistic is in such a way that its rejection allows for the con-
calculated from the data (Lehmann and Romano 2005). clusion that research objective has been achieved.
Below are a number of definitions associated with For example, rejecting the null hypothesis that the
hypothesis testing. difference in means between a treatment and con-
trol is 0 allows for the conclusion that the treatment
• Simple hypothesis – A hypothesis based on a single had an effect.
parameter value (i.e., y ¼ 0) 2. The second step is to consider the statistical
• Composite hypothesis – A hypothesis based on assumptions being made about the sample in
a range of parameter values (i.e., y > 0) doing the test. Is the data continuous or discrete?
• Null hypothesis (H0) – A simple hypothesis associ- Are the data independent? Are they paired? Col-
ated with the theory one would like to disprove lected over time? Is the sample random? This will
(status quo) help determine which methods to use and whether
• Alternate hypothesis (HA) – A hypothesis (often results are valid or generalizable.
composite) associated with a theory one would 3. Determine the appropriate test statistic given the
like to prove assumptions.
• Test statistic – A function of the data with which 4. Derive the distribution of the test statistic under the
decisions about the hypothesis are made null hypothesis based on the assumptions. In most
• Decision rule – Values of the test statistic that cases, there exists a test statistic with a known
lead to rejecting or failing to reject the null distribution, such as the two-sample t-test, ANOVA
hypothesis F-tests, chi-squared tests for independence, t-tests for
• Acceptance region – The set of values for the test regression parameters, and so on.
statistic which we fail to reject the null hypothesis 5. The distribution of the test statistic and the level of
• Rejection region – The set of values for the test significance create acceptance and rejection regions.
statistic which we reject the null hypothesis 6. Compute the value of the test statistic.
• Critical value(s) – The value(s) on the border of the 7. Decide to either fail to reject the null hypothesis or
acceptance and rejection regions reject it in favor of the alternative hypothesis.
In a statistical hypothesis test, either H0 or HA will One problem with this approach is that it leads to an
be true, and either H0 will be rejected or we will fail to absolute reject or does not reject outcome and fails to
reject it. This leads to the four possible outcomes of separate uncertain conclusions from definitive ones. It
a statistical hypothesis test shown in Table 1. also leads to arbitrary thresholds of significance such
These outcomes lead to two types of errors and the as 0.05. Calculating a p-value improves upon this by
probability of those errors informs the decision rule providing a quantitative scale rather than absolute
defined by a statistical hypothesis test. Below are the decision. P-values near 0 provide definitive statistical
definitions of these errors and probabilities. proof against H0, while values near alpha imply evi-
• Type I error – Wrongly rejecting H0 dence against H0 but are uncertain. Finally p-values
• Type II error – Wrongly not rejecting H0 near 1 would give no reason to believe the null hypoth-
• Level of significance/size of the test (a) – The esis was not true. Also calculating p-values is simple
probability of a type I error since they are based on the distribution of test statistic
• Power – one minus the probability of a type II error under the null hypothesis. All this still assumes that
for a given parameter value in Ha (1-b) people properly interpret p-values rather than use them
• p-value – smallest alpha value for which H0 is as another way to threshold and that they are calculated
rejected in a valid manner (Jones and Tukey 2000).
Hypothesis Testing, Bayesian vs Frequentist 933 H
Another issue that is not addressed by this approach is testing, is about comparing the prior knowledge
practical significance versus statistical significance. If the about research hypothesis to posterior knowledge
power of a test is very high, then often statistical signif- about the hypothesis rather than accepting or
icance is achieved without any practical significance. If rejecting a very specific hypothesis based on the
the power of the test is low, then statistical significance experimental data.
may be difficult to achieve no matter how large the
observed test statistic is. Power for a specific test versus
a specific simple alternative hypothesis is generally con- Characteristics
trolled by the sample size; so it is important to choose
a large enough sample size to achieve statistical signifi- Just as with Bayesian inference, Bayesian hypothesis
cance for practically significant alternative hypotheses, testing is based on posterior distribution (Box and Tiao
but not so large that it yields statistical significance for 1973):
uninteresting results thereby wasting resources.
Other issues to consider in hypothesis testing, par- pðyjXÞ ¼ pðXjyÞpðyÞ
ticularly with respect to systems biology, are the
impact of multiple hypothesis testing and the validity A p-value-like quantity can be generated for H
of results when assumptions are violated. If violating a one-side hypothesis like y < 0 by integrating the
assumptions is a concern, then use of nonparametric posterior:
hypothesis testing should be considered. ð0
p¼ pðyjXÞ
Cross-References 1

▶ Hypothesis Testing, Parametric vs Nonparametric However, for a two-sided hypothesis testing


▶ Multiple Hypothesis Testing y ¼ 0 it is not feasible since the integral for
a continuous posterior would be 0. An ad hoc
solution would be to integrate in a region around
References 0 that would represent a region of no practical
significant difference:
Fisher RA (1925) Statistical methods for research workers.
Oliver and Boyd, Edinburgh, p 43
Jones LV, Tukey JW (2000) A sensible formulation of the ða
significance test. Psychol Methods 5(4):411–414 p¼ pðyjXÞ
Lehmann EL, Romano JP (2005) Testing statistical hypotheses,
a
3Eth edn. Springer, New York
Neyman J, Pearson ES (1933) On the problem of the most
efficient tests of statistical hypotheses. Philos Trans R Soc An alternative to direct integration of the
Ser A 231:289–337 posterior is the use of Bayes factors (Kass and
Raftery 1995). The Bayes factor is the ratio of
posterior odds of the null hypothesis (y ¼ y0)
versus an alternative (y ¼ y1) to the prior odds,
Hypothesis Testing, Bayesian vs
where the posterior odds is the ratio of posterior
Frequentist
distributions for the two hypotheses and the prior
odds is likewise defined.
Roger Higdon
The Bayes factor is the analog to the frequentist
Seattle Children’s Research Institute,
likelihood ratio test. In fact, if the prior odds is equal to
Seattle, WA, USA
1, then they are equivalent. Large Bayes factors would
favor the null hypothesis while small Bayes factors
Definition would favor the alternative. Bayes factors do not explic-
itly make probability statements and so tend to rely on
Bayesian hypothesis testing, similar to Bayesian rules of thumb for significance. Also, they can be diffi-
inference and in contrast to frequentist hypothesis cult to calculate.
H 934 Hypothesis Testing, Parametric vs Nonparametric

More formal comparisons of models can be carried Hypothesis Testing, Parametric vs Nonparametric,
out using the Bayesian Decision Theory, where cost Table 1 Statistical tests and their nonparametric analogs
functions are added to the Bayesian model (Berger Parametric test Nonparametric test
1985). One sample T-test Sign-rank test
Two-sample T-test Rank-sum or Mann-Whitney
test
One-way ANOVA Kruskal-Wallis test
Cross-References
One-way randomize complete Friedman test
block ANOVA
▶ Bayesian Inference Pearson correlation Spearman rank correlation or
▶ Hypothesis Testing Kendall’s tau

References

Berger JO (1985) Statistical decision theory and Bayesian anal- modeled by standard probability distributions
ysis, 2nd edn. Springer, New York
(Gibbons and Chakraborti 2003 and Wasserman
Box GEP, Tiao GC (1973) Bayesian inference in statistical
analysis. Wiley, New York 2007). Most commonly this occurs when there is
Kass RE, Raftery AE (1995) Bayes factors. J Am Stat Assoc doubt about the data having a normal distribution.
90(430):791 The most commonly used nonparametric tests are
based upon substituting ranks for the original data.
Table 1 gives nonparametric analogs to many com-
monly used statistical hypothesis tests.
There are many other tests such as the Kolmogorov-
Hypothesis Testing, Parametric vs Smirnov test for comparing two distributions or the
Nonparametric Wald-Wolfowitz runs test for randomness.
An alternative to rank tests is to apply randomiza-
Roger Higdon tion (Edgington 1995), permutation, or other
Seattle Children’s Research Institute, resampling approaches such as the bootstrap (Efron
Seattle, WA, USA 1982) to conventional test statistics. These approaches
utilize the random assignment of the data to treatment
groups or explanatory variables in order to estimate
Synonyms p-values. Randomization tests are based on calculating
all possible randomizations of the data without
Distribution-free tests; Nonparametric tests; Rank tests replacement. A permutation test approximates a
randomization test by taking random permutations of
the data with replacement. Bootstrapping resamples
Definition the original data with replacement. Randomization
tests are commonly used in systems biology due
Nonparametric hypothesis testing, in contrast to to the doubt about parametric assumptions in
parametric hypothesis testing, does not rely on biological data. Some common examples are the
assumptions that data come from a particular parame- significance analysis of microarrays tests in relative
terized distribution such as a normal distribution. expression analysis (Tusher 2001) and gene set
enrichment tests.
Nonparametric tests offer a number of advan-
Characteristics tages including protection from the violation of
distributional assumptions, increased power when
Nonparametric or distribution-free tests are widely assumptions are violated, and the ability to
used for statistical hypothesis testing, particularly preserve complex correlation structures. Disadvan-
when there is doubt as to whether data can be easily tages include decreased power when distributional
Hypoxia-inducible Factor-1 935 H
assumptions can be met, requirements for large embryonic development for the formation of multiple
sample sizes, and difficulty in applying tests to tissues including the central nervous system and the
complex statistical models. vasculature. Hypoxia also occurs in diseases such
as cardiac ischemia and cancer; therefore, targeting
signaling pathways initiated by hypoxia represents
Cross-References a promising therapeutic strategy for these diseases
(Cassavaugh and Lounsbury 2011).
▶ Hypothesis Testing
▶ Relative Expression Analysis
Cross-References

▶ Regulation of Tumor Angiogenesis


References

Edgington ES (1995) Randomization tests, 3rd edn.


Marcel-Dekker, New York References
Efron B (1982) The jackknife, the bootstrap, and other
resampling plans. Soc of Ind and Appl Math CBMS-NSF Cassavaugh J, Lounsbury KM (2011) Hypoxia-mediated biolog- H
Monogr 38 ical control. J Cell Biochem 112(3):735–744
Gibbons JD, Chakraborti S (2003) Nonparametric statistical Semenza GL (2009) Regulation of oxygen homeostasis by
inference, 4th edn. CRC Press, Boca Raton hypoxia-inducible factor 1. Physiology (Bethesda) 24:
Tusher V, Tibshirani R, Chu G (2001) Significance analysis of 97–106
microarrays applied to the ionizing radiation response. Proc
Natl Acad Sci 98:5116–5121
Wasserman L (2007) All of nonparametric statistics. Springer,
New York
Hypoxia-inducible Factor-1

Marsha A. Moses
Department of Surgery/Harvard Medical School,
Hypoxia Vascular Biology Program/Children’s Hospital
Boston, Boston, MA, USA
Marsha A. Moses
Department of Surgery/Harvard Medical School,
Vascular Biology Program/Children’s Hospital Definition
Boston, Boston, MA, USA
Hypoxia-inducible factor-1 (HIF-1) is a dimeric protein
complex comprises of two subunits, HIF-1a and
Definition HIF-1b, both of which are members of the bHLH-PAS
family of transcription factors. It regulates the transcrip-
Hypoxia refers to the reduced oxygen tension caused tion of many genes whose functions are critical for the
by low oxygen availability or insufficient oxygen survival of cells under hypoxic conditions. Under
delivery within an organism. Under hypoxic condi- normoxic conditions, HIF-1a is hydroxylated by prolyl
tions, the organism must exert a series of adaptive hydroxylase domain–containing (PHD) proteins loaded
responses to ensure cellular survival, such as changes with oxygen as cofactors. The hydroxylated HIF-1a
in metabolism, regulation of pH, increased red blood then interacts with the von Hippel-Lindau tumor
cell production and oxygen transport, and initiation of suppressor protein (VHL), a member of the E3
angiogenesis (Cassavaugh and Lounsbury 2011). ubiquitination complex, and is subjected to protein deg-
These processes are mediated by dimeric protein radation by the 26s proteasome. Under hypoxic condi-
complexes, the hypoxia-inducible factors, which tions, the activity of PHD is decreased due to low
regulate the transcription of many genes that are crit- availability of oxygen in the cells. Therefore, HIF-1a
ical for these cellular responses (Semenza 2009). Hyp- is relieved from protein degradation, translocates
oxia-induced cellular signaling is essential during to the nucleus, and forms a dimer with HIF-1b.
H 936 Hysteresis

There, the HIF-1 complex binds to the hypoxia response References


element (HRE) within the promoters of its target genes
and activates their gene transcription (Semenza 2007). Levy AP, Levy NS, Iliopoulos O, Jiang C, Kaplin WG Jr, Goldberg
MA (1997) Regulation of vascular endothelial growth factor by
Due to its ability to upregulate the transcription of
hypoxia and its modulation by the von Hippel-Lindau tumor
proangiogenic factors such as vascular endothelial suppressor gene. Kidney Int 51(2):575–578
growth factor (VEGF), HIF-1 is one of the key Semenza GL (2007) Hypoxia-inducible factor 1 (HIF-1)
mediators of hypoxia-induced angiogenesis (Levy pathway. Sci STKE 407:cm8
et al. 1997).

Cross-References Hysteresis

▶ Regulation of Tumor Angiogenesis ▶ Cell Cycle Dynamics, Irreversibility


I

ICS its domain of definition. A parameter that is structurally


identifiable may still be practically non-identifiable.
▶ Interacting Cell Systems This can arise due to insufficient amount and/or quality
of experimental data and inappropriately chosen mea-
surement time points. It manifests in a confidence inter-
val that is infinite, although the likelihood (▶ Maximum
Ideation Likelihood Estimate) has a unique minimum for this
parameter. It is structurally and practically identifiable,
▶ Computational Creativity if the confidence interval of its estimate has finite size.
An approach for identifiability analysis (▶ Structural and
Practical Identifiability Analysis) utilizing the profile
likelihood was proposed by Raue et al. (2009) that, at
Identifiability the same time, enables to calculate likelihood-based
confidence intervals (Murphy and van der Vaart 2000).
Andreas Raue1 and Jens Timmer1,2,3
1
Institute for Physics, University of Freiburg, Freiburg,
Germany Cross-References
2
BIOSS Centre for Biological Signalling Studies and
Freiburg Institute for Advanced Studies (FRIAS), ▶ Confidence Intervals
Freiburg, Germany ▶ Maximum Likelihood Estimate
3
Department of Clinical and Experimental Medicine, ▶ Structural and Practical Identifiability Analysis
Link€ oping University, Link€ oping, Sweden

Definition References

Murphy SA, van der Vaart AW (2000) On profile likelihood.


Identifiability indicates if a parameter of a mathematical
J Am Stat Assoc 95(450):449–485
model can be estimated from experimental data. If Raue A, Kreutz C, Maiwald T, Bachmann J, Schilling M,
a parameter is structurally non-identifiable, the structure Klingm€ uller U, Timmer J (2009) Structural and practical
of the model in combination with the available quantities identifiability analysis of partially observed dynamical
models by exploiting the profile likelihood. Bioinformatics
that are accessible by experiments prevents that its value
25(15):1923–1929
be estimated (Walter 1987). ▶ Confidence intervals of Walter E (1987) Identifiability of parametric models. Pergamon,
a structurally non-identifiable parameter are infinite on Elmsford

W. Dubitzky et al. (eds.), Encyclopedia of Systems Biology, DOI 10.1007/978-1-4419-9863-7,


# Springer Science+Business Media LLC 2013
I 938 Identification of Candidate T Cell Antigenic Peptides

correlation coefficient is used, the score between gene


Identification of Candidate T Cell x and gene y is:
Antigenic Peptides
P
ð1=mÞ m i¼1 ðxi  x Þðyi  yÞ
▶ T Cell Epitope, Prediction with Peptide Libraries scoreðx;yÞ ¼ qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
Pm qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
P ;
ð1=mÞ i¼1 ðxi  xÞ ð1=mÞ m 2
i¼1 ðyi  y Þ2

where x ¼ (x1, x2,  , xm) and y ¼ (y1, y2,  , ym) are the
Identification of Gene Regulatory m–dimensional observations (e.g., the expression level
Networks, Machine Learning of m time points or conditions) of gene x and y. The two
genes whose correlation score is larger than a predefined
Zhong-Yuan Zhang threshold are considered to have an interaction. Other-
School of Statistics, Central University of Finance and wise, they are considered to be marginal independence
Economics, Beijing, China and there is no edge between them.
Note that there may be pseudo-associations or false-
positive correlations in relevance networks. For exam-
Definition ple, the genes that are regulated by a common regulator
may have similar expression patterns and thus a high
Machine learning (ML) is a branch of artificial intelli- correlation score, but they do not necessarily have
gence (AI). ML is concerned with the design and a direct association.
development of algorithms and computer programs,
which make computers (machine) “learn” from expe- Graphical Gaussian Networks
rience by analyzing the datasets available. Graphical Gaussian model (GGM, Crzegorczyk et al.
Machine learning algorithms can be divided into 2008; Hache et al. 2009; Werhli et al. 2006) is an
four communities according to the desired outcome undirected graph whose nodes are genes and two genes
of them: (1) supervised learning, (2) semi-supervised are linked by an edge if there is an interaction between
learning, (3) unsupervised learning, and (4) reinforce- them. The interactions are measured by the partial corre-
ment learning. It has a wide variety of applications in lation coefficients conditioned on all the other genes. After
finance, text, biology, and etc. the correlation coefficients are determined, a statistical
In this entry, we particularly focus on ML approaches significance test is employed and the two genes whose
to reverse engineering of gene regulatory networks. We score is significantly large are considered to have an
describe the most widely used types of ML algorithms, interaction. Otherwise, they are considered to be condi-
including relevance network, graphical Gaussian net- tional independence and there is no edge between
work, Boolean network, Bayesian network, and network them. Under the assumption that the data are distributed
component analysis, which apply in reconstructing gene according to a multivariate Gaussian distribution, the par-
regulatory networks. tial correlation coefficient of gene x and gene y is given as:

C1 xy
Characteristics scoreðx; yÞ ¼  qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ;
Cxx C1
1
yy

Relevance Networks
1
Relevance network (Butte and Kohane 2003; where C1
xy is the element of C , inverse of the covari-
Crzegorczyk et al. 2008; Hache et al. 2009; Werhli ance matrix C of the data.
et al. 2006) is an undirected graph and allows loops.
The nodes are genes and two genes are linked by an Boolean Networks
edge if there is a regulatory interaction between them. Maybe the simplest way to model gene regulatory
The interactions are measured by some association coef- network as a directed graph is Boolean network.
ficients such as Pearson correlation, Spearman correla- A Boolean network model (Jong 2002; Kauffman
tion, or mutual information. For example, if the Pearson 1969; L€ahdesm€aki et al. 2003) is composed of
Identification of Gene Regulatory Networks, Machine Learning 939 I
X T T+1 1 1 0
Z X
0 0
1 1 0 1 1
1 0 1
1 0 0
T T+1 T T+1
X Y X Y Z
Y Z
0 0 0 0 0
0 1 1 1 1 1
1 1
1 0 0
1 1 1

Identification of Gene Regulatory Networks, Machine fz(0,1) ¼ 0; fz(1,1) ¼ 1. Right: The states trajectories of two
Learning, Fig. 1 Left: An example of Boolean network. X, Y, particular realizations of the Boolean network. The top trajectory
and Z are three genes. Their Boolean functions are given by the falls into a cyclic attractor, and the bottom one falls into a point
truth tables placed next to them. For example, the Boolean attractor
function fz of gene Z at time T + 1 is: fz(0,0) ¼ 0; fz(1,0) ¼ 0;

A I
P (A), P (B|A), P (D|A), P (C|B)

B D A P(A) A B P (B|A) A D P (D|A)


−1 0.1 −1 −1 0.1 −1 1 0.1
0 0.2 0 −1 0.1 0 1 0.2
C 1 0.7 1 −1 0.8 1 1 0.7
J N N J J
Identification of Gene Regulatory Networks, Machine means
N inhibited.
N ! means
J is activated by and ┤
Learning, Fig. 2 An example of Bayesian network. Left: means is inhibited by . Right: The conditional distri-
Graph of the Bayesian model. Each gene can take one of the bution of each gene. The tables list a subset of the conditional
following three states: 1 means activated, 0 means silent and 1 possibilities

two parts: (1) A directed and loops-allowed graph Bayesian Networks


with N nodes representing the genes where each Bayesian network model is composed of two parts:
gene is connected with only K other genes (hence, (1) a directed acyclic graph (DAG), and (2) the condi-
Boolean network is also called N-K model). Each tional distribution of each gene (Crzegorczyk et al. 2008;
gene only has two states: 1 (on) means expressed Djebbari and Quackenbush 2008; Friedman et al. 2000;
and 0 (off) means silent. (2) Boolean functions and Jong 2002; Werhli et al. 2006). In the graph, the nodes
states of genes, where the state of each gene i, represent genes and the edges represent conditional
i ¼ 1, 2,  , N, is modeled as the output of Boolean dependence of the genes. In other words, genes A and
function fi with K (or at most K ) inputs specified by B are connected by edge ! if B is activated by A and ┤ if
the graph. Once the network and the Boolean func- B is inhibited by A. Genes are conditionally independent
tions are determined, the model’s state at any dis- if they are not connected by edges (sometimes, ! is used
crete time step is uniquely determined by the initial whatever activated or inhibited it is since the Bayesian
assignment of the states of the genes. Since the model has the second part). For instance, in Fig. 2, the
number of the possible states is finite, the model probability of gene B being inhibited given that gene A is
will inevitably fall into an attractor at last (point activated is 0.8 (Fig. 2).
attractor or cyclic attractor, See Fig. 1). Note that there may be more than one DAGs satisfy-
To find the Boolean networks for real gene expres- ing the same conditionally independent assumptions
sion dataset, many algorithms have been developed. among the variables and these DAGs are said to
I 940 Identification of Gene Regulatory Networks, Machine Learning

be equivalent. All the equivalent DAGs have the identi-


cal basic undirected graph but may have some edge- initial F

directions being different from each other. Learning − − 0


a Bayesian network or equivalent class from a set of
0 0 −
training data is NP-hard and computationally intractable, Regulatory
Strengths − 0 0
and thus heuristic search strategies, such as hill-climbing
algorithm, have been proposed. 0 − −
Markov chain Monte Carlo (MCMC) simulation is 0 0 −
also employed for Bayesian network structure learn-
ing. Another point is that the DAG structure (i.e., Regulatory Layer Output Layer
acyclic structure) is sometimes inconsistent with the
fact that there may be loops or cyclic structures in the Identification of Gene Regulatory Networks, Machine
gene regulatory networks. Learning, Fig. 3 An example of completely identifiable net-
work in network component analysis. It is a bipartite network.
The expression levels of the genes in output layer and a partial
Network Component Analysis knowledge of the regulatory strengths are available, and NCA
Different from the network models mentioned seeks to identify the intensities of the regulatory factors and
above, network component analysis (NCA, Liao reconstruct the underlying network topology. The initial F sat-
isfies the assumptions of NCA. “–” can be arbitrarily replaced by
et al. 2003) tries to model a gene regulatory net- random nonzero values and finally be identified by NCA
work as a bipartite graph whose vertices can be
divided into two parts: the regulatory factors and • Each regulatory factor can regulate at most
the genes regulated by the factors (indeed, genes n  (r  1) genes, which means the graph should
interact with each other indirectly through their be sufficiently sparse.
products such as RNA or proteins). • G should have full column rank, which means the
The expression data of the genes are available by intensities of each regulatory factor cannot be formu-
experiments and can be formulated as a matrix X of lated as a linear combination of the other intensities.
size n  m in which the ith row represents the ith gene’s In other words, once the above three assumptions
expression level across the m time points (or different are satisfied, the network is identifiable. Figure 3 gives
conditions, or samples). To find the intensities of the an example of completely identifiable network.
regulatory factors and the regulatory strengths, network
component analysis factorizes X into two low rank matri- Conclusion
ces F and G such that X  FGT, where F is of size n  r Though the models mentioned above can partly reveal the
and G is of size m  r, and r  n.m. The jth column of G regulatory structures and give us many insightful conclu-
is the intensities of the jth regulatory factor, and the ith sions, how to reconstruct gene regulatory network still
row of F denotes the sensitivities of the ith gene to the remains a challenging problem. The key difficulty is that
regulatory factors. For example, Fij ¼ 0 means that the the mechanisms of gene regulatory network are very com-
jth TF does not regulate the ith gene. Different from the plex and there are not enough data available. It seems that
traditional matrix factorization models such as principal no single model can explain the problem successfully.
component analysis (PCA) and independent component Please note that there have been several variations and
analysis (ICA), NCA does not have any statistical con- extensions of the models mentioned above which can
straints on F and G, which may not necessarily find improve the reconstruction performance significantly.
biological meaningful components. It takes advantage
of partial connectivity information as a priori knowledge
which is available by experiments and can result in Cross-References
a unique decomposition without considering the scale
problem (Liao et al. 2003) if the following three assump- ▶ Bayesian Network Model
tions are satisfied: ▶ Boolean Networks
• F should have full column rank, which means the ▶ Graphical Gaussian Model
regulatory mode of each factor cannot be formulated ▶ Network Component Analysis
as a linear combination of the other regulatory modes. ▶ Relevance Networks
Identification of Gene Regulatory Networks, Neural Networks 941 I
References regression, systems control, decision making, predic-
tion, etc.
Butte AS, Kohane IS (2003) Relevance networks: a first step toward In this entry, we particularly focus on neural network
finding genetic regulatory networks within microarray data. In:
approaches to reverse engineering of gene regulatory
Parmigiani G, Garett ES, Irizarry RA, Zeger SL (eds) The anal-
ysis of gene expression data. Springer, New York, pp 428–446 networks. We describe three types of ANN models,
Crzegorczyk M, Husmeier D, Werhli AV (2008) Reverse including linear additive network model, recurrent neu-
engineering gene regulatory networks with various machine ral network model, and stochastic neural network, which
learning methods. In: Emmert-Streib F, Dehmer M (eds)
can be applied in reconstructing gene regulatory net-
Analysis of microarray data: a network-based approach.
Wiley-VCH, Weinheim works. Generally speaking, the neural networks are com-
Djebbari A, Quackenbush J (2008) Seeded Bayesian networks: posed of two components: (1) A directed and loops-
constructing genetic networks from microarray data. BMC allowed graph. The nodes represent genes or other mol-
Syst Biol 2(1):57
ecules such as transcriptional factors (TFs) that are
Friedman N, Linial M, Nachman I, Pe’er D (2000) Using Bayes-
ian networks to analyze expression data. In: Proceedings of involved in the regulatory process, and are connected
the fourth annual international conference on computational by directed edges with weight matrix W whose elements
molecular biology, Tokyo, Japan, pp 127–135 denote the connective strengths. A positive weight Wij
Hache H, Lehrach H, Herwig R (2009) Reverse engineering of
means that node i is activated by j while a negative
gene regulatory networks: a comparative study. EURASIP
J Bioinform Syst Biol 25(9):1205–12057 weight Wij means that i is inhibited by j. A zero weight
Jong HD (2002) Modeling and simulation of genetic regulatory Wij implies no interaction. Each gene responds based I
systems: a literature review. J Comput Biol 9(1):67–103 upon its received weighted signals indirectly through
Kauffman SA (1969) Metabolic stability and epigenesis in ran-
activation function. (2) The second component is the
domly constructed genetic nets. J Theor Biol 22:437–467
L€ahdesm€aki H, Shmulevich I, Yli-Harja O (2003) On learning activation function f(x). Obviously the simplest one is
gene regulatory networks under the Boolean network model. the identity f(x) ¼ x. But the most widely used functions
Mach Learn 52:147–167 are sigmoid-shaped functions such as f ðxÞ ¼ 1þe1 x .
Liao JC, Boscolo R, Yang YL, Tran LM, Sabatti C,
Once the topology of the network and the activation
Roychowdhury VP (2003) Network component analysis:
reconstruction of regulatory signals in biological systems. functions are determined, ANN can be used to capture
Proc Natl Acad Sci 100(26):15522–15527 the dynamics of the considered gene regulatory systems.
Werhli AV, Grzegorczyk M, Husmeier D (2006) Comparative This is similar to Boolean network model (BN), but BN
evaluation of reverse engineering gene regulatory networks
can only display the qualitative behaviors and cannot
with relevance networks, graphical Gaussian models and
Bayesian networks. Bioinformatics 22(20):2523–2531 give the expression levels of the genes quantitatively.

Characteristics
Identification of Gene Regulatory
Networks, Neural Networks Linear Additive Network Model
In linear additive regulatory model, the regulatory
Zhong-Yuan Zhang interactions are supposed to be linear (D’Haeseleer
School of Statistics, Central University of Finance and et al. 1999; Wessels et al. 2001). The activation
Economics, Beijing, China function in linear model is the identity function
f(x) ¼ x. In other words, given the expression state
vector u(t) where ui(t) is the expression value of gene
Definition i at time t, the model predicts the state vector u(t + Dt)
at the next time point t + Dt as follows:
Artificial neural network (ANN) model, also known as X
connectionist model or parallel distributed processing ui ðt þ DtÞ ¼ Wij uj ðtÞ þ bi ; i ¼ 1; 2;    n; (1)
model, is a branch of artificial intelligence (AI). ANN j

is a product of the study of bionics. It is inspired from or


and mimics the way that the central nervous system X
(CNS) works. There are several types of ANN that ui ðt þ DtÞ  ui ðtÞ ¼ W 0 ij uj ðtÞ þ bi ;
have been widely used for function approximation, j
I 942 Identification of Gene Regulatory Networks, Neural Networks

Identification of Gene gene 1


Regulatory Networks, W1i
Neural Networks, gene 2 gene i activation function
Fig. 1 Schematic diagram of W2i
gene i in the linear additive
network model. The activation output
•••
function is: f(x) ¼ x. And the
expression level of gene i thus
gene n−1 Wn−1i
depends entirely on the sum of
its received weighted inputs
and the bias bi Wni
gene n bias bi

where bi is the gene-specific parameter, and Wij0 ¼ Wij If the degradation rate di of gene ui is included, the
if i 6¼ j and Wii0 ¼ Wii  1. model can be reformulated as:
It can be also rewritten as difference quotient
ui ðt þ DtÞ  ui ðtÞ 1
ui ðt þ DtÞ  ui ðtÞ X 00 ¼ mi  di ui ðtÞ (4)
¼ W ij uj ðtÞ þ b0 i ; Dt 1 þ eðai ri ðtÞþbi Þ
Dt j
In summary, the regulatory interactions of
W0
where Wij00 ¼ Dtij and b0i ¼ Dt
bi
. recurrent neural network model are nonlinear, that
The parameters W and bi, i ¼ 1, 2,   , n can be is, the regulatory input ri(t) is transferred into
found by training the model using the experimental a transcriptional response by the nonlinear sigmoid func-
data available (Fig. 1). tion f ðxÞ ¼ 1þeðaxþbÞ
1
or any other squashing functions
such as the arctan function. To find the parameters, the
Recurrent Neural Network Model weight matrix W, a and b, several algorithms have been
Though the linear additive network model is expected to developed such as particle swarm optimization and
capture several important linear components of gene differential evolution (Xu et al. 2007) (Fig. 2).
regulation, still the model is oversimplified (Hache
et al. 2009; Weaver et al. 1999; Wessels et al. 2001). Stochastic Neural Network Model
The recurrent neural network model extends it by intro- Though noise is considered as the by-product of
ducing the sigmoid function f ðxÞ ¼ 1þeðaxþbÞ
1
; where a life activities, it often plays important roles in life activ-
and b are two gene-specific parameters, as the activation ities, for example, gene regulatory networks are inher-
function, which is in accordance with the common prac- ently noisy (Tian and Burrage 2003). Furthermore, noise
tice in systems biology. In other words, given the expres- contamination in experimental data is inevitable due to
sion state vector u(t) where ui(t) is the expression value of the limitations of technology. Hence, the stochastic
gene i at time t, the model can predict the state vector model conforms to reality better than the classical ones.
u(t + Dt) at the next time point t + Dt : For experimental data with expression levels,
stochastic neural network model introduces Poisson
1 random variables to describe the biochemical process
ui ðt þ DtÞ ¼ ui ðtÞ þ Dt mi ; (2)
1þ eðai ri ðtÞþbi Þ in the gene regulatory networks:

or ui ðt þ DtÞ ¼ ui ðtÞ þ PðDtmi f ðxÞÞ  PðDtdi ui ðtÞÞ;

ui ðt þ DtÞ  ui ðtÞ 1 where f(x) is the sigmoid function of the sum of u0i s
¼ mi ; (3)
Dt 1 þ eðai ri ðtÞþbi Þ received weighted inputs, P(l) is a Poisson random
P variable with mean l and
where ri ðtÞ ¼ j Wij uj ðtÞ þ bi is the regulatory input
of gene i, bi, ai, and bi are the gene-specific parameters lm l
ProbfPðlÞ ¼ mg ¼ e ; m ¼ 0; 1;    :
and mi is the maximal expression value of gene i. m!
IgSF 943 I
Identification of Gene gene 1
Regulatory Networks, W1i
Neural Networks, gene 2 gene i activation function
Fig. 2 Schematic diagram of W2i
gene i in the recurrent neural
network model. The activation output
•••
function is: f ðxÞ ¼ 1þeðaxþbÞ
1

whose value is between gene n-1 Wn-1i


0 and 1. And the expression
level of gene i responds to the
sum of its received weighted Wni
inputs through the squashing
gene n bias bi mi
function f(x)

For experimental data with normalized expression References


levels, the exponential random variables are
introduced: D’Haeseleer P, Wen X, Fuhrman S, Somogyi R (1999) Linear
modeling of mRNA expression levels during CNS develop-
ment and injury. In Proceedings of the Pacific symposium on
I

ui ðt þ DtÞ ¼ ui ðtÞ þ EðDtm 
i f ðxÞÞ  EðDtdi ui ðtÞÞ; biocomputing, vol 4, pp 41–52
Hache H, Lehrach H, Herwig R (2009) Reverse engineering of
where Ē(l) is an exponential random variable with gene regulatory networks: a comparative study. EURASIP
J Bioinform Syst Biol 2009:617281
mean l and Tian T, Burrage K (2003) Stochastic neural network models for
Z x
gene regulatory networks. In: CEC 2003: The 2003 congress
on evolutionary computation. Proceedings, Canberra, 8–12

ProbfEðlÞ<xg ¼ l1 el1 x dx; x > 0; Dec 2003. IEEE Computer Society, Piscataway, pp 162–169
0
Weaver DC, Workman CT, Stormo GD (1999) Modeling regu-
latory networks with weightmatrices. Proc Pac Symp
where l1 ¼ 1/l. Biocomput 4:112–123
Wessels LF, Van Someren EP, Reinders MJ (2001) A comparison
Conclusion of genetic network models. Proc Pacific Symp Biocomput,
vol 6, pp 508–519
Though the models mentioned above can partly Xu R, Venayagamoorthy GK, Wunsch DC II (2007) Modeling of
reveal the regulatory structures and give us many gene regulatory networks with hybrid differential evolution
insightful conclusions, how to reconstruct a gene and particle swarm optimization. Neural Netw 20(8):917–927
regulatory network still remains a challenging
problem. The key difficulty is that the mechanisms of
a gene regulatory network are very complex and there Identification of Putative miRNA
are not enough data available. It seems that no single Recognition Elements (MRE)
model can explain the problem successfully.
Please note that there have been several variations and ▶ MicroRNA Target Prediction
extensions of the models mentioned above which can
improve the reconstruction performance significantly.
Identifiers.org URL
Cross-References
▶ MIRIAM URI
▶ Differential Evolution
▶ Linear Additive Network Model
▶ Particle Swarm Optimization (PSO) IgSF
▶ Recurrent Neural Network
▶ Stochastic Neural Network ▶ Immunoglobulin Superfamily (IgSF)
I 944 Image Algebra

providers of molecular interaction data who have


Image Algebra agreed to share curation effort and:
• Develop and work to a single set of curation rules
▶ Mathematical Morphology for Protein Surface when capturing data from both directly deposited
Modeling interaction data and publications in peer-reviewed
journals
• Capture full details of an interaction in a “deep”
curation model
Image Segmentation • Perform a complete curation of all protein-protein
interactions experimentally demonstrated within
Myong-Hee Sung a publication
Laboratory of Receptor Biology and Gene Expression, • Make these interactions available in a single search
National Cancer Institute, National Institutes of interface on a common website
Health, Bethesda, MD, USA • Provide data in standards compliant download
formats
• Make all IMEx records freely accessible under the
Definition Creative Commons Attribution License

Image segmentation is an image analysis method


where objects of interest are separated from the
rest of the image content (background, irrelevant fea-
tures, noise, etc.). Various techniques such as IMGT Collier de Perles
thresholding or watershed algorithms can be used to
segment images. Marie-Paule Lefranc
Laboratoire d’ImmunoGénétique Moléculaire,
Institut de Génétique Humaine UPR 1142,
Université Montpellier 2, Montpellier, France
IMEx

▶ IMEx Consortium Definition

“IMGT Collier de Perles” (or “IMGT_Collier_de_


Perles”) concept is a major concept of numerotation
IMEx Consortium (generated from the ▶ NUMEROTATION
Axiom) of ▶ IMGT-ONTOLOGY, the global reference
Sandra Orchard in ▶ immunogenetics and ▶ immunoinformatics
EMBL Outstation, European Bioinformatics Institute, (Giudicelli and Lefranc 1999; Lefranc et al. 2004,
Hinxton, Cambridge, UK 2005, 2008; Duroux et al. 2008), built by IMGT ®, the
international ImMunoGeneTics information system ®
(http://www.imgt.org) (▶ IMGT ® Information Sys-
Synonyms tem). The “IMGT Collier de Perles” concept allows
standardized two-dimensional (2D) graphical repre-
IMEx; International molecular exchange consortium sentations of the domains (▶ Domain Type), based on
the ▶ IMGT unique numbering. Three leafconcepts
has been defined: for V domain (▶ Variable (V)
Definition Domain), for C domain (▶ Constant (C) Domain),
and for G domain (▶ Groove (G) Domain). “IMGT
The IMEx consortium (www.imexconsortium.org) is Collier de Perles for V domain” represents the
an international collaboration between major public V-DOMAIN of immunoglobulins (IG) or antibodies
IMGT Collier de Perles 945 I
and T cell receptors (TR) and the V-LIKE-DOMAIN are closely packed against each other through hydro-
of the ▶ immunoglobulin superfamily (IgSF) proteins phobic interactions giving a hydrophobic core and
other than IG and TR. “IMGT Collier de Perles for joined together by a disulfide bridge between the
C domain” represents the C-DOMAIN of IG and TR B-STRAND in the first sheet and the F-STRAND in
and the C-LIKE-DOMAIN of IgSF proteins other than the second sheet (Lefranc et al. 2003). Four strands
IG and TR. “IMGT Collier de Perles for G domain” form one sheet and five strands form a second sheet.
represents the G-DOMAIN of major histocompatibil- It is remarkable that the 3D structure fold of the
ity (MH) and the G-LIKE-DOMAIN of the ▶ MH V domain has been conserved through evolution,
superfamily (MhSF) proteins other than MH. The despite the sequence divergence between IgSF
IMGT Colliers de Perles instances are obtained, domains and, even more surprisingly, despite the
starting from V, C, or G domain amino acid sequences, particularities of the IG and TR synthesis compared
using IMGT/DomainGapAlign and IMGT/Collier to the other proteins (Lefranc and Lefranc 2001a, b)
de Perles tools (http://www.imgt.org). If three- (▶ Immunoglobulin Synthesis). Indeed, in a ▶ conven-
dimensional (3D) structures are available, they are tional gene (GeneType), the V-LIKE-DOMAIN is usu-
provided in IMGT/3Dstructure-DB, with hydrogen ally encoded by a unique exon or more rarely by two
bonds (for V and C domains, IMGT Colliers de Perles spliced exons, whereas in IG and TR, the V-DOMAIN
on two layers) or with IMGT pMH contact analysis (for results from the rearrangement of two or three types of
G domains). IMGT Colliers de Perles allow to bridge genes ▶ (GeneType): variable (V) gene, ▶ joining (J) I
the gap between amino acid sequences and 3D struc- gene and, in some loci, ▶ diversity (D) gene (Lefranc
tures, whatever the species, the IgSF or MhSF protein, and Lefranc 2001a, b) (▶ Immunoglobulin Synthesis).
or the ▶ chain type. They are particularly useful for The V-LIKE-DOMAIN is usually, as the IG and TR
antibody engineering, sequence-structure analysis, V-DOMAIN, the most N-terminal (and extracellular)
visualization and comparison of positions for domain of the protein. However, in contrast to the IG
mutations, polymorphisms, and contact analysis and TR V-DOMAIN which is always unique, the
of IG, TR, MH, and other IgSF and MhSF proteins. V-LIKE-DOMAIN may be present in several copies
in the same protein and interspersed with C-LIKE-
DOMAIN or with domains of other superfamilies.
Characteristics Based on the IMGT unique numbering, the
conserved amino acids always have the same position.
IMGT Collier de Perles for V Domain For the V domain, five positions are essential for the
“IMGT Collier de Perles for V domain” core structure: cysteine 23 (1st-CYS), tryptophan 41
(or “IMGT_Collier_de_Perles_for_V_domain”) is (CONSERVED-TRP), conserved hydrophobic amino
a ▶ leafconcept of the “IMGT Collier de Perles” concept acid 89, cysteine 104 (2nd-CYS), and “FG anchor” 118
of numerotation. It represents the V domain (▶ Variable (J-PHE or J-TRP in V-DOMAIN) (Lefranc et al.
(V) Domain) that includes the V-DOMAIN of 2003). In the IG and TR V-DOMAIN, the structurally
immunoglobulins (IG) or antibodies and T cell conserved antiparallel beta strands are designated as
receptors (TR) and the V-LIKE-DOMAIN of the immu- framework regions (FR-IMGT) (▶ Framework Region
noglobulin superfamily (IgSF) proteins other than IG (FR-IMGT)) whereas the BC, C0 C00 , and FG loops
and TR. It is based on the IMGT unique numbering for are designated as complementarity determining
V domain (Lefranc et al. 2003) (▶ IMGT unique regions (CDR-IMGT) (▶ Complementarity Determin-
numbering). ing Region (CDR-IMGT)) (Lefranc et al. 2003). The
The topology and three-dimensional (3D) structure length (number of amino acids or by extrapolation
of a V-LIKE-DOMAIN are very similar to those of an number of codons, that is number of occupied posi-
IG or TR V-DOMAIN (Fig. 1a). tions) of the strands and loops is a crucial and original
A V domain consists of about 100 amino acids in concept of IMGT-ONTOLOGY. The lengths of the
nine antiparallel beta strands (A, B, C, C0 , C00 , D, E, F, BC (CDR1-IMGT), C0 C00 (CDR2-IMGT), and FG
and G), linked by beta turns (AB, CC0 , C00 D, DE, and (CDR3-IMGT) loops characterize the V domain.
EF) and loops (BC, C0 C00 , and FG), located on two Thus, the length of the three loops BC, C0 C00 , and FG
layers, forming a sandwich of two sheets. The sheets is shown, in number of amino acids (or codons), into
I 946 IMGT Collier de Perles

a b
lgSF domain of V type V-DOMAIN (IG, TR)

A F T G G Y
1 E T N P V Y T
B V Y Y Y Y G G
Q G W I T S S 85 G G
L 26 S 39 L 55 D 66 T S T A D
T
E A G G N A R Y
D
E K 41 W I N 80 A Y A W 118
G E
S 23 C V W E T 89 M 104 C G
F G S K E K L Q E Q
D F
C P I Q G F I L Y G
10
E K R H R A S V T
L V P G G K 75 S A S
C′
46

V S L S V
R T 1a3l_H T D T
C″ 15 P G 16 S E V
1a3I_H A B C C¢ C≤ D E F G S
S 128

IGHV-D-J

c V-DOMAIN (IG, TR) d V-LIKE-DOMAIN

S
F S
F T G G R Y Q
Y P P L Q
T L Y T N P V P G Y E N
G V G Y Y Y Y Y N M N F G L
G Q G G W 85 S I S T L K V Y R 85 E Y N Q
D L A 26 S 39 L T 55 D S T 66 D I E 26 S 39 A S 55 V G V 66
T L
Y E R A G A G N N L I Y S V V Y
D K
118 W E A K 41 W Y I A N 118 E V K K 41 L T C G S
23 23
G S 104 C C V 89 M W T E K K 104 C C H 89 F V D K
Q G F S K Q E L K S Q F S K Y E C T
10 F 10 V
G P Y I Q L G I F N S Y L G L A N G
P
T E V K R S H A R G M I N L Q S F
S L A V 46 P S G 75 K G T L D V 46 N D 75

V V S S L I V T A L
T R D T T 1a3I_H I A Q N Y 1yjd_C
V 5 P E G 16 S H 5 Y N D 16 V
S V
G A F B C E C¢ D C≤ G A F B C E C¢ D C≤
128 S 128 K

IGHV-D-J CD28[D1]

IMGT Collier de Perles, Fig. 1 IMGT Collier de Perles for amino acids shown in squares (anchor positions), which belong to
V domain. (a) Ribbon representation of a V-DOMAIN taken as the neighboring strands (FR-IMGT). BC loops are represented
an example. A similar topology and 3D structure characterize online in red, C0 C00 loops in orange, and FG loops in purple.
a V-LIKE-DOMAIN. (b) and (c) V-DOMAIN on one layer and Hatched circles or squares correspond to missing positions
on two layers, respectively (Mus musculus VH [8.8.12]). (d) according to the IMGT unique numbering for V domain (Lefranc
V-LIKE-DOMAIN on two layers (Homo sapiens CD28 et al. 2003). Arrows indicate the direction of the beta strands and
[9.9.13]). Amino acids are shown in the one-letter abbreviation. their designations in 3D structures. The IMGT Colliers de Perles
Position at which hydrophobic amino acids (hydropathy index on two layers show, on the forefront, the GFCC0 C00 strands and, on
with positive value: I, V, L, F, C, M, A) and tryptophan (W) are the back, the ABED strands. IMGT Colliers de Perles on two
found in more than 50% of analyzed sequences are shown online layers with hydrogen bonds are available in IMGT/3Dstructure-
in blue. All proline (P) are shown online in yellow. The loops BC, DB (http://www.imgt.org): 1a3l_H, 1yjd_C
C0 C00 , and FG (corresponding to the CDR-IMGT) are limited by
IMGT Collier de Perles 947 I
brackets and separated by dots. In Fig. 1, the lengths numbering first described for the V-REGION and for
are [8.8.12] for the V-DOMAIN CDR-IMGT and the V domain (Lefranc et al. 2003). The five positions
[9.9.13] for the V-LIKE-DOMAIN. IMGT Colliers that are essential for the core structure have the
de Perles for V domain can be represented on one same number in the C domain: cysteine 23 (1st-
layer (Fig. 1b) or two layers (Fig. 1c, d) (Ruiz and CYS), tryptophan 41 (CONSERVED-TRP), conserved
Lefranc 2002; Lefranc et al. 2003; Kaas et al. 2007; hydrophobic amino acid 89, cysteine 104 (2nd-CYS)
Kaas and Lefranc 2007). If 3D structures are available, and “FG anchor” 118 (Lefranc et al. 2005a).
IMGT Colliers de Perles on two layers are provided IMGT Colliers de Perles for C domain can be
with hydrogen bonds, in IMGT/3Dstructure-DB represented on one layer (Fig. 2b) or two layers
(http://www.imgt.org) (Ehrenmann et al. 2010). (Fig. 2c, d) (Lefranc et al. 2005a; Kaas et al. 2007;
Kaas and Lefranc 2007). If 3D structures are available,
IMGT Collier de Perles for C Domain IMGT Colliers de Perles on two layers are provided
“IMGT Collier de Perles for C domain” with hydrogen bonds, in IMGT/3Dstructure-DB
(or “IMGT_Collier_de_Perles_for_C_domain”) is (http://www.imgt.org) (Ehrenmann et al. 2010).
a leafconcept of the “IMGT Collier de Perles” concept
of numerotation. It represents the C domain (▶ Con- IMGT Collier de Perles for G Domain
stant (C) Domain) that includes the C-DOMAIN of IG “IMGT Collier de Perles for G domain”
or antibodies and TR and the C-LIKE-DOMAIN of (or “IMGT_Collier_de_Perles_for_G_domain”) is I
IgSF proteins other than IG and TR. It is based on the a leafconcept of the “IMGT Collier de Perles” concept
IMGT unique numbering for C domain (Lefranc et al. of numerotation. It represents the G domain (▶ Groove
2005a) (IMGT unique numbering). (G) Domain) that includes the G-DOMAIN of major
The topology and 3D structure of a C-LIKE- histocompatibility (MH) and the G-LIKE-DOMAIN
DOMAIN are very similar to those of an IG or of the major histocompatibility superfamily (MhSF)
TR C-DOMAIN (Fig. 2a). proteins other than MH. It is based on the IMGT
A C domain consists of about 100 amino acids in unique numbering for G domain (Lefranc et al.
seven antiparallel beta strands (A, B, C, D, E, F, 2005b) (IMGT unique numbering).
and G), linked by beta turns (AB, DE, and EF), The topology and 3D structure of a G-LIKE-
a transversal strand (CD) and loops (BC and DOMAIN are very similar to those of a MH
FG), located in two layers, forming a sandwich of two G-DOMAIN (Fig. 3a). Two G domains are necessary
sheets. The sheets are closely packed against each other to form the characteristic groove of the MhSF proteins
through hydrophobic interactions giving a hydrophobic that comprise the MH (MH class I or MH1, and MH
core and joined together by a disulfide bridge between class II or MH2) and related proteins of the immune
the B-STRAND in the first sheet and the F-STRAND in system (RPI)-MH1Like (Lefranc et al. 2005b).
the second sheet (Lefranc et al. 2005a). Four strands A G domain consists of about 90 amino acids in four
form one sheet and three strands form a second sheet. antiparallel beta strands (A, B, C, and D), linked by
The C domain has a topology and 3D structure turns (AB, BC, CD), forming half of the groove floor,
similar to those of the V domain, but differs by the and one alpha helix, forming one wall of the groove.
number of antiparallel beta strands (seven instead of For each G domain, the positions that contribute to the
nine). By comparison with a V domain, the groove floor comprise positions 1–49, and comprise
C0 -STRAND, C0 C00 -LOOP, and C00 -STRAND are miss- the four strands linked by turns (Lefranc et al. 2005b).
ing in the C domain and are replaced by the character- The numbering of the G domain helix starts at position
istic transversal CD-STRAND that links the two 50 and ends at position 92 (Fig. 3).
sheets. Depending on the CD-STRAND length, the Interestingly, despite the high sequence divergence,
D-STRAND is in the first or in the second sheet only five additional positions (54A, 61A, 61B, 72A, and
(Lefranc et al. 2005a). Additional positions observed 92A) are necessary to align any G-DOMAIN and
in the C domain define the AB-TURN, DE-TURN, G-LIKE-DOMAIN (Lefranc et al. 2005b). It is worth-
and EF-TURN (Lefranc et al. 2005a). while to note that position 54A in G-ALPHA1-LIKE is
The IMGT unique numbering for C domain the only additional position needed to extend the IMGT
(Lefranc et al. 2005a) is derived from the IMGT unique numbering for G-DOMAIN to the G-LIKE-DOMAIN.
I 948 IMGT Collier de Perles

a lgSF domain of C type b C-DOMAIN (IG, TR)


84.7 85.7

A D
D K S
A P K S T 111 112

1 A Y D D Y T S
P F I 85.1 Q S K T
T N N 36 D M 85 H S
A 84

V 26 N 39 V T S T P
W
S L K S A I
B S
I F 41 W 80 N T E V 118
E
F 23 C K L 89 L 104 C K 119

D P V I V T T S
G
P V D G 77 L Y F
F 10 S Q N
S S 45 G S E R 45.5 45.6
45.7 T S N
C 45.3 45.4
45.1 45.2
E A K N R
Q G D H
CD L G 1axt_L E R
15 T S 16 Y E 128

15.3 96.1 96.2


15.1
1axt_L
15.2
A B C D E F G
IGKC

c C-DOMAIN (IG, TR) d C-LIKE-DOMAIN


85.7 84.7 85.7 84.7

1.5 M
1.4 R
1.3 A D 1.3 T
1.2 D 85.4 S K 84.4 1.2 E 85.4 84.4

1.1 A P K 85.3 T S 84.3 1.1 D E 85.3 84.3

S 1 A T Y D 85.2 Y D 84.2 1 L P D 85.2 84.2

T P K F I 85.1 S Q 84.1 P S N 85.1 84.1

S T H N N 36.85 M D 84 S K L Y S 36.85 84

P V T 25 N 39 V S T T A N 26 A 39 T
W
I S A L K S L V T G Q S
S A
118 V I E F 41 W T N 80 118 S V Q Q W S Q 80
23
119 K F 104 C 23 C K 89 L L 119 D F 104 C C F 89 Y S
S P T V I T V P L R K H F S
F P Y V D L G 77 V E Y L N I I 77
S Q N P
N S S S 45 G S E R 45.5 45.6 45.7 Q 10 Q E T 45 E S L 45.6 45.7
10 45.4 45.4 45.5
45.1 K 45.1
R E N A L W G V A
Q H G D E Y S S A
L R G E V S D D T
Y 1axt_L 1e4j_A
126 15 T E S 16 H 15 V N K 16 V
96.2 95.2
15.1 15.3 127 I 15.1 L 15.3

E
15.2 15.2

G A F B E G A F B E

IGKC FCGR3B [D1]

IMGT Collier de Perles, Fig. 2 (continued)


IMGT Collier de Perles 949 I
The helix (positions 50–92) seats on the beta 2. A second feature of the IMGT ® standardization
sheet and its axis forms an angle of about 40 with is the comparison of cDNA and/or amino acid
the beta strands. Two cysteines, CYS-11 (in strand A) sequences with genomic sequences, and the identi-
and CYS-74 (in the helix) are well conserved in fication of the splicing sites, to delimit precisely
the G-ALPHA2, G-BETA, and G-ALPHA2-LIKE the domains: a V-LIKE-DOMAIN, a C-DOMAIN,
domains where they participate to a disulfide bridge a C-LIKE-DOMAIN, a G-DOMAIN, or a G-LIKE-
that fastens the helix on the groove floor. The IMGT DOMAIN is usually encoded by a unique exon or
Colliers de Perles allows to describe specific features more rarely by two spliced exons (Lefranc et al.
(Lefranc et al. 2005b; Kaas et al. 2007; Kaas and 2003, 2005a, b). This IMGT ® standardization for
Lefranc 2007). As an example, the G-ALPHA1 and the domain delimitation explains the discrepancies
G-ALPHA domains have a conserved N-glycosylation observed with the generalist UniProt/Swiss-Prot
site at position 86 (N-X-S/T, where N is asparagine, database which identifies domains based on amino
X any amino acid except proline, S is serine, and T is acid sequences and does not take into account the
threonine). genomic information. The IMGT Collier de Perles
If 3D structures of peptide/MH are available, IMGT also puts the question of the leader region. Indeed,
Colliers de Perles are provided with IMGT pMH the N-terminal end of the first domain of an IgSF or
contact analysis, in IMGT/3Dstructure-DB (http:// MhSF chain depends on the proteolytic cleavage
www.imgt.org) (Ehrenmann et al. 2010). site of the leader region (peptide signal) which is I
rarely determined experimentally. When this site is
What Do We Learn from IMGT Collier de Perles? not known, the IMGT Colliers de Perles start with
1. Any domain represented by an IMGT Collier de the first amino acid resulting from the splicing (usu-
Perles is characterized by the length of its strands, ally a splicing frame 1) (“Splicing sites” in IMGT
loops and turns and, for the G domain, by the length Aide-mémoire, http://www.imgt.org). For a IG
of its helix (Lefranc et al. 2003, 2005a, b) (IMGT and TR V-DOMAIN, the leader proteolytic site
unique numbering). The strand, loop, turn, or is known (or is extrapolated) and the IMGT
helix lengths (number of amino acids or by extrap- Colliers de Perles start with the first amino
olation number of codons, that is number of occu- acid of the V-REGION (Lefranc and Lefranc
pied positions) become crucial information which 2001a, b).
characterizes the domains. This first feature of 3. The IMGT Colliers de Perles allow a precise
the IMGT ® standardization based on the IMGT visualization of the inter-species differences for
unique numbering allowed, for instance, to show the IgSF V and C domain strands and loops, and
that the distinction between the C1, C2, I1, and MhSF G domain strands and helix, even in the
I2 domain types found in the literature and in the absence of 3D structures. This has been applied
databases to describe the IgSF C domains is unnec- for example to the CD28 family members and
essary and moreover unapplicable when dealing their B7 family ligands and to the BTLA protein
with sequences for which no structural data are which belong to the IgSF by their V and/or
known (discussed in Lefranc et al. 2005a). C domains (Garapati and Lefranc 2007).

IMGT Collier de Perles, Fig. 2 IMGT Collier de Perles for in the V domain. Positions 45 and 77 which delimit the charac-
C domain. (a) Ribbon representation of a C-DOMAIN taken as teristic CD-STRAND of the C domain, and position 118 which
an example. A similar topology and 3D structure characterize corresponds structurally to J-PHE or J-TRP of the IG and TR
a C-LIKE-DOMAIN. (b) and (c) C-DOMAIN on one layer and J-REGION (Lefranc and Lefranc 2001a, b) are also shown in
on two layers, respectively (Mus musculus C-KAPPA). (d) squares. Hatched circles correspond to missing positions
C-LIKE-DOMAIN on two layers (Homo sapiens FCGR3B according to the IMGT unique numbering for C domain (Lefranc
[D1]). Amino acids are shown in the one-letter abbreviation. et al. 2005a). Arrows indicate the direction of the beta strands
Position at which position at which hydrophobic amino acids and their designations in 3D structures. The IMGT Colliers de
(hydropathy index with positive value: I, V, L, F, C, M, A) and Perles on two layers show, on the forefront, the GFC strands and,
tryptophan (W) are found in more than 50% of analyzed on the back, the ABE strands. IMGT Colliers de Perles on two
sequences are shown in gray. The positions 26, 39, and 104 are layers with hydrogen bonds are available in IMGT/3Dstructure-
shown in squares by homology with the corresponding positions DB (http://www.imgt.org): 1axt_L, 1e4j_A
I 950 IMGT Collier de Perles

a MhSF domain of G type b G-DOMAIN (MH1)


E P D R R R N A
I Q G T K K A S
E W G Q H V L G L G Y Q E
51 W E Y E V H D 92
T T Y S
CD AB 50
P 55 59 63 67 70 73 77 80 84 88
42 A A
S
G-ALPHA1 Q D 18 R G
G
[D1] G-ALPHA1 M
R S E P
D 38 P
[D1] 49 A E F R
R P R
R S 14 1
F V G
V I S
F S
A T H
31 Q V F 10 T
T G F V 28
D Y Y 7A Q Y D
1akj_A 1akj_A D V R R A G
BC 28 M 10 M Y K 31
BC S Y Q D
H G H
C Y
S Y I
G D G A
1 14 V R A A 49
G L T D
L K
G-ALPHA2 S F 38 W
E S
[D2] D W R 18 D R
L 42
G-ALPHA2
88 84 80 77 73 70 67 63 59 55 M 50
[D2] 92 T T N R W T
2A
Y
R L E E Y R L V W T A 51
G L E C G L A R Q A H E K K T A
Q K L V
AB CD [D3]
E E A A H Q
61A

MH class I

c G-DOMAIN (MH2) d G-LIKE-DOMAIN


G S Q Q N E H
F R A F A G L A I K T N I S G S Q W K Q M Q V R V F T I Q L V S D
A D A L E M K S Y P T Q F R K M P
E
51 E F E A N N 50 W L L Y S D E M K 92
V I R T N 92 K
50 E 55 59 63 67 51 P 55 59 63 67 70
70 73 77 80 84 88 73 77 80 84 88
42 K 54A 42 S
[D2]
K A 18 A D 18 R
E G T S
G-ALPHA M 38 D N N
T D
D
T
I S 38 S W
[D1] 49 L V Q P 49 K S A
V 1 R G-ALPHA1-LIKE W
R W H
S N 14 T F R
R F 14 1
G P T S Y
F E
L
R [D1] H D P
I Y T S
F F S M I
E F Q V E
M E 10 L 31 L Q 10
31 D F W 28 V L I
A 28
G D Q 7A Q D W C 7A Q F Q
1fyt_B Y N L
D F I L I Q 1cd1_C G L R A G
28 I K C 28 F S V
10 F E 31 10 A K 31
V R T H Y
E E Y G
H E S L V
E C L N C F
H V E V
E L R K S R
G-BETA K 1 14 F R A V 49
G-ALPHA2-LIKE Q 1 14 M E F
T V 49
I F F R T Q Y S Q P 49.1
V
[D1] N R 38 D Y [D2] P A 38 W W G 49.2
S E G S A
G T E 18 G N 49.3
D G 18 T P 49.4
V 42 42 49.5
84 88 80 50 88 84 80 77 73 70
92 77 73 70 67 63 59 55 E 92 Q 2A L
67 63 59 55 S 50
R F V N Y 2A A R L W D L 51 K L S
D A
E L R
G F T V T L P W 51
R Q T S G G Y H C T V A R Q L D N Y A P G G V L C D L M Q T S G N V I L
L
V E R E K L P A K
D E K S E R N Q A D
[D2] 61B [D3] 61B
61A Q 61A D
MH class II RPI-MH1Like

IMGT Collier de Perles, Fig. 3 IMGT Collier de Perles for numbering for G domain (Lefranc et al. 2005b). In IMGT
G domain. (a) Ribbon representation of two G-DOMAIN taken Colliers de Perles, position 7A is only displayed in the G-
as an example. A similar topology and 3D structure characterize ALPHA, G-ALPHA1, and G-ALPHA1-LIKE domains, and
the G-LIKE-DOMAIN. (b) G-DOMAIN of MH class I (or positions 61A and 61B in the G-BETA, G-ALPHA2, and
MH1). G-ALPHA1 [D1] and G-ALPHA2 [D2] (Homo sapiens G-ALPHA2-LIKE domains. Position 54A is only occupied in
HLA-A*0201). (c) G-DOMAIN of MH class II (or MH2). G-ALPHA1-LIKE of RPI-MH1Like proteins. Position 92A is
G-ALPHA [D1] and G-BETA [D1] (Homo sapiens only added for HLA-DMA and H2-DMA IMGT Colliers de
HLA-DRA*0101 and HLA-DRB1*0101). (d) G-LIKE- Perles. Note that the N-terminal end of a peptide in the cleft
DOMAIN of RPI-MH1Like. G-ALPHA1-LIKE [D1] and G- would be on the left-hand side. IMGT Colliers de Perles are
ALPHA2-LIKE [D2] (Mus musculus CD1D1). Amino acids available in IMGT/3Dstructure-DB (http://www.imgt.org):
are shown in the one-letter abbreviation. Hatched circles corre- 1akj_A, 1fyt_B, 1cd1_C
spond to missing positions according to the IMGT unique

The IMGT Colliers de Perles are particularly useful easily compare the amino acid sequences of the
in molecular engineering and antibody humaniza- four FR-IMGT (FR1-IMGT: positions 1–26,
tion design based on CDR grafting. Indeed they FR2-IMGT: 39–55, FR3-IMGT: 66–104, and
allow to precisely define the CDR-IMGT and to FR4-IMGT: 118–128) between the mouse
IMGT Collier de Perles 951 I
(or other species) and the closest human diseases and leukemias, pMH contact analysis, and
V-DOMAIN. Analyses performed on humanized more generally ligand–receptor interactions involving
therapeutic antibodies underline the importance of V, C, and/or G domains.
a correct delimitation of the CDR regions to be
grafted.
4. The IMGT Colliers de Perles also allow Cross-References
a comparison to the IMGT Colliers de Perles
statistical profiles for the human expressed ▶ Chain Type
IGHV, IGKV, and IGLV repertoires. These statis- ▶ Complementarity Determining Region
tical profiles are based on the definition of 11 IMGT (CDR-IMGT)
amino acid physicochemical characteristics classes ▶ Constant (C) Domain
which take into account the hydropathy, volume, ▶ Conventional Gene
and chemical characteristics of the 20 common ▶ Diversity (D) Gene
amino acids (Pommié et al. 2004) (“Amino acids” ▶ Domain Type
in IMGT Aide-mémoire, http://www.imgt.org). ▶ Framework Region (FR-IMGT)
The statistical profiles identified positions which ▶ Gene Type
are conserved for the physicochemical characteris- ▶ Groove (G) Domain
tics: 41 FR-IMGT positions for the human IGHV ▶ IMGT Unique Numbering I
and 59 FR-IMGT positions for the human IGKV ▶ IMGT-ONTOLOGY
and IGLV at >80% threshold (see Plate 3 in ▶ IMGT-ONTOLOGY, Leafconcept
Pommié et al. 2004). After assignment of the IMGT ▶ IMGT ® Information System
Collier de Perles amino acids to the IMGT amino acid ▶ Immunogenetics
physicochemical classes, comparison can be made ▶ Immunoglobulin Superfamily (IgSF)
with the statistical profiles of the human expressed ▶ Immunoglobulin Synthesis
repertoires. This comparison is useful to identify ▶ Immunoinformatics
potential immunogenic residues at given positions in ▶ Joining (J) Gene
chimeric or humanized antibodies or to evaluate ▶ MH Superfamily (MhSF)
immunogenicity of therapeutic antibodies. ▶ IMGT-ONTOLOGY, NUMEROTATION Axiom
5. IMGT Colliers de Perles are also of interest when ▶ Variable (V) Domain
3D structures are available. In IMGT/3Dstructure- ▶ Variable (V) Gene
DB (Ehrenmann et al. 2010), “IMGT Collier de
Perles on 2 layers” are displayed with hydrogen
bonds for V and C domains. Clicking on References
a residue in “IMGT Collier de Perles on one
Duroux P, Kaas Q, Brochet X, Lane J, Ginestoux C, Lefranc M-P,
layer” gives access to the corresponding IMGT
Giudicelli V (2008) IMGT-Kaleidoscope, the Formal IMGT-
Residue@Position card which provides the atom ONTOLOGY paradigm. Biochimie 90:570–583
contact types and atom contact categories for that Ehrenmann F, Kaas Q, Lefranc M-P (2010) IMGT/3Dstructure-
amino acid. IMGT Colliers de Perles display DB and IMGT/DomainGapAlign: a database and a tool for
immunoglobulins or antibodies, T cell receptors, MHC, IgSF
the IMGT pMH contact sites for 3D structures and MhcSF. Nucleic Acids Res 38:D301–D307
with peptide/MH (pMH) complexes, which can be Garapati VP, Lefranc M-P (2007) IMGT Colliers de Perles and
compared with the pMH contact sites available in IgSF domain standardization for T cell costimulatory
IMGT/3Dstructure-DB. activatory (CD28, ICOS) and inhibitory (CTLA4, PDCD1
and BTLA) receptors. Dev Comp Immunol 31:1050–1072
The IMGT Colliers de Perles for the V, C, and
Giudicelli V, Lefranc M-P (1999) Ontology for immunogenet-
G domains, based on the IMGT unique numbering, ics: the IMGT-ONTOLOGY. Bioinformatics 12:1047–1054
represent therefore a major step forward for the Kaas Q, Lefranc M-P (2007) IMGT Colliers de Perles: standard-
comparative analysis of the sequences and structures ized sequence-structure representations of the IgSF and
MhcSF superfamily domains. Curr Bioinf 2:21–30
of the IgSF and MhSF domains, for the study of their
Kaas Q, Ehrenmann F, Lefranc M-P (2007) IG, TR, MHC, IgSf
evolution and for the applications in antibody and MhcSF: what do we learn from the IMGT Colliers de
engineering, IG and TR repertoires in autoimmune Perles? Brief Funct Genomic Proteomic 6:253–264
I 952 IMGT Unique Numbering

Lefranc M-P, Lefranc G (2001a) The immunoglobulin (▶ IMGT® Information System). The “IMGT unique
factsbook. Academic, London, pp 1–458 numbering” concept allows to number domain types
Lefranc M-P, Lefranc G (2001b) The T cell receptor factsbook.
Academic, London, pp 1–398 (▶ Domain Type) that are characteristic of protein
Lefranc M-P, Pommié C, Ruiz M, Giudicelli V, Foulquier E, superfamilies, whatever the species, the molecule
Truong L, Thouvenin-Contet V, Lefranc G (2003) IMGT type (▶ MoleculeType) or the chain type (▶ Chain
unique numbering for immunoglobulin and T cell receptor Type). Three leafconcepts (▶ IMGT-ONTOLOGY,
variable domains and Ig superfamily V-like domains.
Dev Comp Immunol 27:55–77 Leafconcept) have been defined, respectively, for the
Lefranc M-P, Giudicelli V, Ginestoux C, Bosc N, Folch G, V domain (▶ Variable (V) Domain) and C domain
Guiraudou D, Jabado-Michaloud J, Magris S, Scaviner D, (▶ Constant (C) Domain) of the ▶ immunoglobulin
Thouvenin V, Combres K, Girod D, Jeanjean S, Protat C, superfamily (IgSF) and for the G domain (▶ Groove
Yousfi Monod M, Duprat E, Kaas Q, Pommié C, Chaume D,
Lefranc G (2004) IMGT-ONTOLOGY for immunogenetics (G) Domain) of the ▶ MH superfamily (MhSF).
and immunoinformatics, In Silico Biol 4:17–29 “IMGT unique numbering for V domain” numbers the
Lefranc M-P, Pommié C, Kaas Q, Duprat E, Bosc N, Guiraudou D, V-DOMAIN of immunoglobulins (IG) or antibodies and
Jean C, Ruiz M, Da Piédade I, Rouard M, Foulquier E, T cell receptors (TR) and the V-LIKE-DOMAIN of the
Thouvenin V, Lefranc G (2005a) IMGT unique numbering for
immunoglobulin and T cell receptor constant domains IgSF other than IG and TR. “IMGT unique numbering for
and Ig superfamily C-like domains. Dev Comp Immunol C domain” numbers the C-DOMAIN of IG and TR and
29:185–203 the C-LIKE-DOMAIN of IgSF other than IG and TR.
Lefranc M-P, Duprat E, Kaas Q, Tranne M, Thiriot A, Lefranc G “IMGT unique numbering for G domain” numbers the
(2005b) IMGT unique numbering for MHC groove
G-DOMAIN and MHC superfamily (MhcSF) G-LIKE- G-DOMAIN of major histocompatibility (MH) and
DOMAIN. Dev Comp Immunol 29:917–938 the G-LIKE-DOMAIN of the MhSF other than MH.
Lefranc M-P, Giudicelli V, Regnier L, Duroux P (2008) IMGT, a The IMGT unique numbering has been a great break-
system and an ontology that bridge biological and computa- through in immunogenetics and system biology in
tional spheres in bioinformatics. Brief Bioinform 9:263–275
Pommié C, Levadoux S, Sabatier R, Lefranc G, Lefranc M-P allowing, for the first time, to bridge the gap between
(2004) IMGT standardized criteria for statistical analysis of amino acid (and nucleotide) sequences of any V, C, or
immunoglobulin V-REGION amino acid properties. J Mol G domain and their two-dimensional (2D) and
Recognit 17:17–32 three-dimensional (3D) structures. The IMGT unique
Ruiz M, Lefranc M-P (2002) IMGT gene identification and
Colliers de Perles of human immunoglobulin with known numbering has been fundamental in the creation of the
3D structures. Immunogenetics 53:857–883 ▶ IMGT Collier de Perles and in the standardization of
the description of mutations, amino acid changes,
polymorphisms, and contact analysis in the IMGT®
databases, tools, and Web resources (http://www.imgt.
org) (▶ IMGT® Information System).
IMGT Unique Numbering

Marie-Paule Lefranc Characteristics


Laboratoire d’ImmunoGénétique Moléculaire,
Institut de Génétique Humaine UPR 1142, The “IMGT Unique Numbering”: Necessity and
Université Montpellier 2, Montpellier, France Birth of a Concept
IMGT-ONTOLOGY was created to manage the com-
plexity of immunogenetics knowledge and has been
Definition essential for the creation in 1989 of IMGT ®, the inter-
national ImMunoGeneTics information system ®
The “IMGT unique numbering” (or “IMGT_unique_ (http://www.imgt.org) (▶ IMGT ® Information Sys-
numbering”) concept is a major concept of numerotation tem). It has allowed to set up a standardized classifica-
(generated from the ▶ NUMEROTATION Axiom) tion of the immunoglobulin (IG) and T cell receptor
of ▶ IMGT-ONTOLOGY, the global reference in (TR) genes (Lefranc and Lefranc 2001a, b) based on
▶ immunogenetics and ▶ immunoinformatics (Duroux the ▶ CLASSIFICATION axiom (▶ Gene and Allele
et al. 2008), built by IMGT®, the international ImMuno- Nomenclature) and a standardized description of the
GeneTics information system® (http://www.imgt.org) IG and TR based on the ▶ DESCRIPTION axiom
IMGT Unique Numbering 953 I
(▶ Labels and Relations). The IG and TR variable DOMAIN of the IgSF other than IG and TR. The
domains form a huge repertoire of 2.1012 antigen recep- V domain strands and loops and their IMGT ® positions
tors per individual and, owing to the particularities and lengths, based on the IMGT unique numbering for
of their synthesis that involve DNA rearrangements V domain (V-DOMAIN and V-LIKE-DOMAIN)
(as described in ▶ Immunoglobulin Synthesis), there (Lefranc et al. 2003), are shown in Table 1.
was a need for a systematic and coherent numbering The V domain (V-DOMAIN of IG and TR
of the amino acids and codons, whatever the molecule and V-LIKE-DOMAIN of IgSF other than IG and
(▶ MoleculeType), configuration (▶ Configuration TR) is composed of the A-STRAND of 15 (or 14 if
Type) or chain (▶ Chain Type) type. It was therefore gap at position 10) amino acids (positions 1–15), the
a great breakthrough when the “IMGT unique number- B-STRAND of 11 amino acids (positions 16–26) with
ing” and its representation, the ▶ IMGT Collier de the first conserved cysteine (1st-CYS) at position 23,
Perles, were defined for the first time in 1997 for the the BC-LOOP (positions 27–38; the longest BC loops
▶ variable (V) domain (Lefranc 1997, 1999). This have 12 amino acids), the C-STRAND of eight amino
IMGT unique numbering was created by taking into acids (positions 39–46) with the tryptophan (CON-
account the high conservation of the structure of the SERVED-TRP) at position 41, the C0 -STRAND of
V domain and by integrating the knowledge acquired nine amino acids (positions 47–55), the C0 C00 -LOOP
by the analysis of multiple sources: alignment of more (positions 56–65; the longest C0 C00 loops have ten
than 5,000 sequences, literature data on the framework amino acids), the C00 -STRAND of nine (or eight if I
(FR) and complementarity determining regions (CDR), gap at position 73) amino acids (positions 66–74), the
structural data from X-ray diffraction studies and char- D-STRAND of ten (or eight if gaps at positions 81 and
acterization of the CDR hypervariable loops (Lefranc 82) amino acids (positions 75–84), the E-STRAND of
1997, 1999). The standardized delimitation of the 12 amino acids (positions 85–96) with a conserved
FR-IMGT (▶ Framework Region (FR-IMGT)) and hydrophobic amino acid at position 89, the
CDR-IMGT (▶ Complementarity Determining Region F-STRAND of eight amino acids (positions 97–104)
(CDR-IMGT)), a new and crucial feature of this num- with the second conserved cystein (2nd-CYS) at posi-
bering, was defined based on the longest CDR1-IMGT tion 104, the FG-LOOP (positions 105–117; these
and CDR2-IMGT found in the IMGT® multiple align- positions correspond to a FG loop of 13 amino acids),
ments of the germline IG and TR genes and, for the and the G-STRAND of 11 (or 10) amino acids
rearrranged CDR3-IMGT, on statistical analysis of the (positions 118–128) (Table 1, Fig. 1). In the IG and
IG and TR rearrangements (Lefranc 1997, 1999; Lefranc TR V-DOMAIN, the G-STRAND is the C-terminal
et al. 2003). The IMGT unique numbering, originally part of the J-REGION, with J-PHE or J-TRP 118 and
defined for the numerotation of the IG and TR the canonical motif F/W-G-X-G at positions 118–121
V-DOMAIN (Lefranc 1997), was rapidly extended to (Lefranc et al. 2003) (Table 1).
the V-LIKE-DOMAIN of the immunoglobulin super- In the IG and TR V-DOMAIN, the structurally
family (IgSF) other than IG and TR (Lefranc 1999; conserved antiparallel beta strands are also designated
Lefranc et al. 2003), then to the ▶ constant (C) domain as framework regions (FR-IMGT) (▶ Framework
(C-DOMAIN of IG and TR and C-LIKE-DOMAIN of Region (FR-IMGT)) whereas the loops are designated
IgSF other than IG and TR) (Lefranc et al. 2005a). Based as complementarity determining regions (CDR-
on the same concepts, and despite a very different IMGT) (▶ Complementarity Determining Region
structure of the ▶ groove (G) domain, the IMGT unique (CDR-IMGT)) (Lefranc et al. 2003). Strands A and
numbering for G domain was successfully set up for the B correspond to the FR1-IMGT (positions 1–26),
G-DOMAIN of MH and G-LIKE-DOMAIN of MhSF strands C and C0 to the FR2-IMGT (positions 39–55),
other than MH (Lefranc et al. 2005b). strands C00 , D, E, and F to the FR3-IMGT (positions
66–104) and strand G to the FR4-IMGT (positions
IMGT Unique Numbering for V Domain 118–128). The BC, C0 C00 , and FG loops correspond to
V Domain Strands and Loops the CDR1-IMGT, CDR2-IMGT, and CDR3-IMGT,
The IMGT unique numbering for V domain numbers respectively (Lefranc et al. 2003) (Table 1, Fig. 1).
the V-DOMAIN of immunoglobulins (IG) or anti- The loop length (number of amino acids (or
bodies and T cell receptors (TR) and the V-LIKE- codons), that is number of occupied positions) is
I 954 IMGT Unique Numbering

IMGT Unique Numbering, Table 1 V domain strands and loops, IMGT positions and lengths, based on the IMGT unique
numbering for V domain (V-DOMAIN and V-LIKE-DOMAIN) (Lefranc et al. 2003)
V domain strands Characteristic Residue @ V-DOMAIN
and loopsa IMGT positions Lengthsb Positionc FR-IMGT and CDR-IMGT
A-STRAND 1–15 15 (14 if gap at 10) FR1-IMGT
B-STRAND 16–26 11 1st-CYS 23
BC-LOOP 27–38 12 (or less) CDR1-IMGT
C-STRAND 39–46 8 CONSERVED-TRP 41 FR2-IMGT
C0 -STRAND 47–55 9
C0 C00 -LOOP 56–65 10 (or less) CDR2-IMGT
C00 -STRAND 66–74 9 (or 8 if gap at 73) FR3-IMGT
D-STRAND 75–84 10 (or 8 if gaps
at 81, 82)
E-STRAND 85–96 12 hydrophobic 89
F-STRAND 97–104 8 2nd-CYS 104
FG-LOOP 105–117 13 (or less, or more) CDR3-IMGT
G-STRAND 118–128 11 (or 10) V-DOMAIN J-PHE 118 FR4-IMGT
or J-TRP 118d
a ®
IMGT labels (concepts of description) are written in capital letters (Labels and relations)
b
in number of amino acids (or codons)
c ®
Residue@Position is a IMGT concept of numerotation that numbers the position of a given residue (or that of a conserved property
amino acid class), based on the IMGT unique numbering
d
In the IG and TR V-DOMAIN, the G-STRAND (or FR4-IMGT) is the C-terminal part of the J-REGION, with J-PHE or J-TRP 118
and the canonical motif F/W-G-X-G at positions 118–121

FR1-IMGT CDR1-IMGT FR2-IMGT CDR2-IMGT FR3-IMGT CDR3-IMGT FR4-IMGT


(1-26) (27-38) (39-55) (56-65) (66-104) (105-117) (118-128)
BC C C⬘ C⬘C⬙ C⬙ D FG G
A B E F
(27-38) (39-46) (47-55) (56-65) (66-74) (75-84) (105-117) (118-128)
(1-15) (16-26) (85-96) (97-104)
> > > > > > > > >
1 10 15 16 23 26 27 38 39 41 46 47 55 56 65 66 74 75 84 85 89 96 97 104 105 117 118 128
........ .... ...... .. .... ºº .... . . . . . AB . . . . . . . ...ºº ... ....... ........ A ... ...... ...... ..... º º .... .........

IMGT Unique Numbering, Fig. 1 “IMGT Protein display for V-LIKE-DOMAIN (Lefranc et al. 2003). Line 7: Pipes for the
V domain” header. The header comprises 7 lines. Line 1: Frame- start and end positions of strands and loops and highlighted
work regions FR-IMGT (e.g., FR1-IMGT) and complementarity positions, dots for the other positions, plus if needed, letters or
determining regions CDR-IMGT (e.g., CDR1-IMGT). Line 2: numbers for additional positions (e.g., AB or 123. . .). Optionally
start and end positions of FR-IMGT and CDR-IMGT, e.g., small circles above the line (instead of dots) may indicate the
(1–26). Line 3: name of strands (A, B. . .) and loops (AB, positions at the top of the BC, C0 C00 , and FG loops considering
BC. . .). Line 4: start and end positions of strands, e.g., (1–15). loop lengths of [12.10.13] (these positions are 32, 33 for BC, 60,
Line 5: arrows (for the strands). Line 6: positions according 61 for C0 C00 and 111, 112 for FG)
to the IMGT unique numbering for V-DOMAIN and

a crucial and original concept of IMGT-ONTOLOGY. IMGT Gaps and Additional Positions
The lengths of the BC (CDR1-IMGT), C0 C00 (CDR2- Dots in IMGT Protein displays (Scaviner et al. 1999;
IMGT), and FG (CDR3-IMGT) loops characterize the Folch et al. 2000; Lefranc and Lefranc 2001a, b)
V-DOMAIN and V-LIKE-DOMAIN. Thus, the length (Fig. 1) and hatched circles or squares in IMGT Col-
of the three loops BC, C0 C00 , and FG is shown, in liers de Perles for V domain (▶ IMGT Collier de
number of amino acids (or codons), into brackets and Perles) correspond to missing positions according to
separated by dots. For example [9.6.9] means that the the IMGT unique numbering for V domain (Lefranc
BC, C0 C00 , and FG loops (or CDR1-IMGT, CDR2- et al. 2003).
IMGT, and CDR3-IMGT for V-DOMAIN) have For BC, C0 C00 , or FG loops shorter than 12, 10, 13
a length of 9, 6, and 9 amino acids (or codons), amino acids, respectively, gaps are created at the apex
respectively. (missing positions, hatched in ▶ IMGT Collier de
IMGT Unique Numbering 955 I
Perles, or not shown in structural data representations). IMGT Unique Numbering, Table 2 C domain strands, turns
The gaps are placed at the apex of the loop with an and loops, IMGT positions and lengths, based on the IMGT
unique numbering for C domain (C-DOMAIN and C-LIKE-
equal number of amino acids (or codons) on both sides DOMAIN) (Lefranc et al. 2005a)
if the loop length is an even number, or with one more
amino acid (or codon) in the left part if it is an odd Characteristic
C domain strands, IMGT Residue @
number. For example, for FG loops shorter than 13 turns and loopsa positions Lengthsb Positionc
amino acids, gaps are created from the apex of the A-STRAND 1–15 15 (14 if
loop, in the following order: 111, 112, 110, 113, 109, gap at 10)
etc. For FG loops longer than 13 amino acids, addi- AB-TURN 15.1– 0–3
tional positions are created, between positions 111 and 15.3
B-STRAND 16–26 11 1st-CYS 23
112 at the top of the FG loop, in the following order:
BC-LOOP 27–31 10 (or less)
112.1, 111.1, 112.2, 111.2, 112.3, etc. (Lefranc et al.
34–38
2003).
C-STRAND 39–45 7 CONSERVED-
TRP 41
IMGT Unique Numbering for C Domain CD-STRAND 45.1– 0–9
C Domain Strands, Loops and Turns 45.9
The IMGT unique numbering for C domain numbers D-STRAND 77–84 8 (or 7 if
the C-DOMAIN of IG or antibodies and TR and the gap at 82) I
C-LIKE-DOMAIN of the IgSF other than IG and TR. DE-TURN 84.1– 0–14
84.7
The C domain strands, turns, and loops and their
85.1–
IMGT positions and lengths, based on the IMGT 85.7
unique numbering for C domain (C-DOMAIN and E-STRAND 85–96 12 hydrophobic 89
C-LIKE-DOMAIN) (Lefranc et al. 2005a), are shown EF-TURN 96.1– 0–2
in Table 2. 96.2
The C domain (C-DOMAIN of IG and TR and F-STRAND 97–104 8 2nd-CYS 104
C-LIKE-DOMAIN of IgSF other than IG and TR) is FG-LOOP 105–117 13 (or less,
composed by the A-STRAND of 15 (or 14 if gap at 10) or more)
G-STRAND 118–128 11 (or less)
amino acids (positions 1–15), the AB-TURN (addi- ®
a
tional positions 15.1, 15.2, and 15.3; the longest IMGT labels (concepts of description) are written in capital
letters (Labels and relations)
AB-TURN have three amino acids), the B-STRAND b
In number of amino acids (or codons)
®
of 11 amino acids (positions 16–26) with the 1st-CYS c
Residue@Position is a IMGT concept of numerotation that
at position 23, the BC-LOOP (positions 27–31, 34– numbers the position of a given residue (or that of a conserved
38), the C-STRAND of seven amino acids (positions property amino acid class), based on the IMGT unique
numbering
39–45) with the CONSERVED-TRP at position 41, the
CD-STRAND of one to nine amino acids (additional
positions 45.1–45.9), the D-STRAND of eight (or
seven if gap at 82) amino acids (positions 77–84), the
DE-TURN (additional positions 84.1–84.7 and 85.7– C Domain and V Domain Comparison
85.1, corresponding to 14 amino acids), the The A-STRAND and B-STRAND of the C domain are
E-STRAND of 12 amino acids (positions 85–96) with similar to those of the V domain (Lefranc et al. 2005a).
a conserved hydrophobic amino acid at position 89, the The longest BC-LOOP of the C domain have 10 amino
EF-TURN (additional positions 96.1 and 96.2, acids (missing positions 32 and 33), instead of
corresponding to two amino acids), the F-STRAND 12 amino acids in the V domain. The C-STRAND
of eight amino acids (positions 97–104) with the 2nd- and the D-STRAND of the C domain are shorter of
CYS at position 104, the FG-LOOP (positions 105– one position (46) and two positions (75, 76), respec-
117, these positions corresponding to a FG loop of 13 tively, compared to those of the V domain. The trans-
amino acids), and the G-STRAND of 11 (or less) versal CD-STRAND is a characteristic of the
amino acids (positions 118–128) (Lefranc et al. C domain (a V domain has instead two antiparallel
2005a) (Table 2, Fig. 2). beta strands C0 -STRAND and C00 -STRAND linked by
I 956 IMGT Unique Numbering

A AB B BC C CD D DE E EF F FG G
(1-15) (16-26) (27-38) (39-45) (77-84) (85-96) (97-104) (105-117) (118-128)
> > > > > > > >
1 10 15 16 23 26 27 38 39 41 45 77 84 85 89 96 97 104 105 117 118 128
87654321 . . . . . . . . . . . . 123 . . . . . . . . . . . . . . . . . . . . 1234567 . . . . . . 12345677654321 . . . . . . . . . 12 . . . . . . . . . . . . . . . . . . . . . . . . . . . .

IMGT Unique Numbering, Fig. 2 “IMGT Protein display for (Lefranc et al. 2005a). Missing positions (32 and 33 in the BC
C domain” header. The header comprises 5 lines. Line 1: name loop and 46 to 76) in the C domain by comparison with the
of strands (A, B. . .), turns and loops (AB, BC. . .). Line 2: V domain are not shown. Line 5: Pipes for the start and end
start and end positions of strands, e.g., (1–15). Line 3: arrows positions of strands and loops and highlighted positions, dots
(for the strands). Line 4: positions according to the IMGT for the other positions, plus if needed, numbers for additional
unique numbering for C-DOMAIN and C-LIKE-DOMAIN positions (e.g., 123. . .) (Lefranc et al. 2005a)

the C0 C00 -LOOP). The E-STRAND, F-STRAND, and IMGT Unique Numbering, Table 3 G domain strands, turns
G-STRAND of the C domain are similar to those of the and helix, IMGT positions and lengths, based on the IMGT
unique numbering for G domain (G-DOMAIN and G-LIKE-
V domain (Lefranc et al. 2005a) (IMGT Scientific
DOMAIN) (Lefranc et al. 2005b)
chart rules, http://www.imgt.org).
Characteristic
Residue @
IMGT Gaps and Additional Positions Positionc and
Dots in IMGT Protein displays (Fig. 2) and hatched G domain strands, IMGT additional
circles or squares in IMGT Colliers de Perles for turns and helixa positions Lengthsb positionsd
C domain (▶ IMGT Collier de Perles) correspond to A-STRAND 1–14 14 7A, CYS-11
missing positions according to the IMGT unique num- AB-TURN 15–17 3 (or 2 or
0)
bering for C domain (Lefranc et al. 2005a).
B-STRAND 18–28 11 (or 10e)
The longest BC-LOOP of the C domain have 10
BC-TURN 29–30 2
amino acids (missing positions 32 and 33, that are
C-STRAND 31–38 8
a feature of the C domain are not shown in the IMGT CD-TURN 39–41 3 (or 1f)
Colliers de Perles and IMGT Protein displays for D-STRAND 42–49 8 49.1 to 49.5
C domain). For BC loops shorter than 10 amino HELIX 50–92 43 (or less 54A, 61A, 61B,
acids, gaps are created from the apex in the following or more) 72A, CYS-74, 92A
®
order 34, 31, 35, 30, 36, etc. The FG-LOOP of the a
IMGT labels (concepts of description) are written in capital
C domain is similar to that of the V domain. Gaps for letters (Labels and relations)
b
FG loops shorter than 13 amino acids and additional In number of amino acids (or codons)
®
c
Residue@Position is a IMGT concept of numerotation that
positions for FG loops longer than 13 amino acids are numbers the position of a given residue (or that of a conserved
created following the same rules as those of the property amino acid class), based on the IMGT unique
V domain. numbering
d
Additional positions in the C domain define the AB- For details on the characteristic Residue@Position and addi-
tional positions, see text (Lefranc et al. 2005b)
TURN, DE-TURN, and EF-TURN (Table 2). For AB- e
Or 9 in some G-BETA (Lefranc et al. 2005b)
TURN shorter than 3 amino acids, gaps are created f
Or 0 in some G-ALPHA2-LIKE (Lefranc et al. 2005b)
(hatched in IMGT Colliers de Perles, or not shown in
structural data representations) in a decreasing ordinal
manner. For DE-TURN shorter than 14 amino acids,
gaps are created in the following order: 85.7, 84.7, other than MH or related proteins of the immune sys-
85.6, 84.6, 85.5, etc. For EF-TURN shorter than 2 tem (RPI)-MH1Like. The G domain strands, turns
amino acids, gaps are created in the following order: and helix and their IMGT positions and lengths,
96.2, 96.1 (Lefranc et al. 2005a). based on the IMGT unique numbering for G domain
(G-DOMAIN and G-LIKE-DOMAIN) (Lefranc et al.
IMGT Unique Numbering for G Domain 2005b), are shown in Table 3.
G Domain Strands, Turns and Helix The G domain (G-DOMAIN of MH and G-LIKE-
The IMGT unique numbering for G domain numbers DOMAIN of RPI-MH1Like) comprises a sheet of four
the G-DOMAIN of major histocompatibility (MH) and antiparallel beta strands linked by turns and a helix that
the G-LIKE-DOMAIN of ▶ MH superfamily (MhSF) seats on the beta strands, its axis forming an angle of
IMGT Unique Numbering 957 I
A B C D helix
AB BC CD
(1-14) (18-28) (31-38) (42-49) (50-92)
> > > >
1 7 14 18 28 31 38 42 49 50 54 61 72 80 90 92
0987654321 . . . . . A. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1234567 . . . A . . . . . . AB . . . . . . . . . . A . . . . . . . . . . . . . . . . .

IMGT Unique Numbering, Fig. 3 “IMGT Protein display for numbering for G-DOMAIN and G-LIKE-DOMAIN (Lefranc
G domain” header. The header comprises 5 lines. Line 1: name et al. 2005b). Line 5: Pipes for the start and end positions of
of strands (A, B, C, D), turns (AB, BC, CD) and helix. Line 2: strands and helix and highlighted positions, dots for the other
start and end positions of strands (1–14), (18–28), (31–38), (42– positions, plus if needed, letters or numbers for additional posi-
49) and helix (50–92). Line 3: arrows (for the strands) and line tions that characterize some G domains (e.g., A, AB or 123. . .)
(for the helix). Line 4: positions according to the IMGT unique (Lefranc et al. 2005b)

about 40 with the strands. Two G domains are needed The additional position 7A represents a bulge in 3D
to form the MhSF groove made of a “floor” and two structures and is present in some G-ALPHA domains,
“walls.” Each G domain contributes by its strands and for instance those of the human HLA-DQA1 and
turns to half of the groove floor and by its helix to one HLA-DOA, and mouse H2-AA and H2-DOA chains
wall of the groove (Lefranc et al. 2005b). (Lefranc et al. 2005b). Additional positions at the
The G domain is composed by the A-STRAND of N-terminus of A-STRAND or at the C-terminus of I
14 amino acids (positions 1–14), the AB-TURN of D-STRAND can be added if necessary. Thus, two addi-
three or less amino acids (positions 15–17), the tional positions (1.2 and 1.1) are added at the N-terminus
B-STRAND of 11 or less amino acids (positions 18– of the A-STRAND of the G-ALPHA domains as the
28), the BC-TURN of two amino acids (positions 29 presence of these two amino acids was demonstrated
and 30), the C-STRAND of eight amino acids (posi- by protein sequencing of the HLA-DRA; however the
tions 31–38), the CD-TURN of three or less amino proteolytic cleavage site of the leader peptide (L-
acids (positions 39–41), the D-STRAND of eight REGION) needs to be confirmed experimentally for the
amino acids (positions 42–49), and a helix of 43 (or other G-ALPHA (Lefranc et al. 2005b). In each [D1] G-
less or more) amino acids (positions 50–92) (Lefranc DOMAIN (except the G-ALPHA domain of human
et al. 2005b) (Table 3, Fig. 3). HLA-DMA and mouse H2-DMA discussed below), the
The helix is split into two parts separated by a kink, amino acid at position 1 (shown within parentheses in
positions 58 of G-ALPHA1, 61 of G-ALPHA2, 63 of IMGT Protein displays, http://www.imgt.org) is encoded
G-ALPHA, and 62 of G-BETA being the “highest” by the codon that results from the splicing between the
points on the groove floor (Lefranc et al. 2005b). first exon (EX1) that encodes the L-REGION, and the
second exon (EX2) that encodes [D1] (Lefranc et al.
IMGT Gaps and Additional Positions 2005b). Ten amino acids have been added at positions
Dots in IMGT Protein displays (Fig. 3) and hatched 1.10–1.1 of HLA-DMA and H2-DMA but it is necessary
circles or squares in IMGT Collier de Perles for to confirm experimentally if they belong, or not, to
G domain (▶ IMGT Collier de Perles) correspond to the mature protein. Four additional positions (49.1–
missing positions according to the IMGT unique num- 49.4) are also observed at the C-terminus of D-STRAND
bering for G domain (Lefranc et al. 2005b). of the G-BETA domain of HLA-DMB and H2-DMB1
The gaps are localized in the turns. The AB-TURN (Lefranc et al. 2005b).
(positions 15–17) comprises three amino acids in the Interestingly, despite the high sequence divergence,
G-ALPHA1, G-ALPHA2, and G-BETA domains but only five additional positions in the helix (54A, 61A,
these positions are unoccupied in the G-ALPHA domains 61B, 72A and 92A) are necessary to align any G domain
(as well as position 18 of B-STRAND). The BC-TURN (Lefranc et al. 2005b). Three of them (61A, 61B, 72A)
(positions 29 and 30) comprises two positions that are characterize the G-ALPHA2 and/or G-BETA domains.
occupied in all G-DOMAIN. The CD-TURN (positions Indeed, positions 61A and 72A are occupied in the G-
39–41) is occupied by three amino acids in the ALPHA2 domain, whereas positions 61A, 61B, and 72A
G-ALPHA1 domains and by one amino acid in most of are occupied in G-BETA domains (except for the mouse
the other G domains (Lefranc et al. 2005b). H2-AB chain). The position 92A is only occupied in the
I 958 IMGT Unique Numbering

HLA-DMA and H2-DMA G-ALPHA domains. It is 2003, 2005a, b), the IMGT unique numbering has been
worthwhile to note that position 54A in G-ALPHA1- fundamental:
LIKE is the only additional position needed to • In the creation of the IMGT Colliers de Perles for V,
extend the IMGT numbering for G-DOMAIN to the C and G domain (▶ IMGT Collier de Perles)
G-LIKE-DOMAIN of the RPI-MH1Like proteins • In the definition of the Residue@Position concept
(Lefranc et al. 2005b). (standardized numerotation of amino acid changes
and polymorphisms)
G Domain Characteristics Comparison • In the standardized numerotation of mutations (at
Two cysteines, CYS-11 (in strand A) and CYS-74 the codon and nucleotide level) and definition of
(in the helix), are well conserved in the G-ALPHA2 alleles
and G-BETA domains where they participate to • In the delimitation of the FR-IMGT and CDR-
a disulfide bridge that fastens the helix on the groove IMGT for antibody engineering and antibody
floor. The G-ALPHA1 and G-ALPHA domains have humanization (Lefranc 2009)
a conserved N-glycosylation site at position 86 (N-X- • In the comparison of interactions and contact anal-
S/T, where N is asparagine, X is any amino acid except ysis between domains in 3D structures (Ehrenmann
proline, S is serine, and T is threonine). An et al. 2010)
N-glycosylation site is also found at position 86 in By bridging the gap between amino acid (and nucle-
the G-ALPHA2 domain of the mouse classical MH1 otide) sequences and 3D structures, the IMGT unique
chains (H2-D1, H2-K1 and H2-L). The G-BETA numbering is essential for domain comparison in immu-
domains (except for the human HLA-DMB and nogenetics and system biology. It is used in the IMGT®
mouse H2-DMB1 chains) have a conserved N- sequence and structure databases, the IMGT online
glycosylation site at position 15 (AB-TURN) (Lefranc tools (IMGT/V-QUEST, IMGT/DomainGapAlign,
et al. 2005b) IMGT/DomainDisplay, IMGT/Collier-de-Perles, etc.)
Interestingly, the G-ALPHA domains of the and the IMGT knowledge web resources (IMGT Pro-
HLA-DMA and H2-DMA chains have specific fea- tein displays, IMGT Alignments of alleles, IMGT
tures compared to the other G-ALPHA domains and Table of alleles, IMGT Colliers de Perles, etc.) (http://
share common characteristics with the G-ALPHA2 www.imgt.org) (▶ IMGT® Information System).
and G-BETA domains: there is a conserved As a major IMGT-ONTOLOGY concept of
CYS-11_CYS-74 disulfide bridge, positions 61A and numerotation, the IMGT unique numbering is used,
61B are occupied (as in the G-BETA domains) and with the IMGT-ONTOLOGY concepts of classifica-
there is no N-glycosylation site at position 86 (Lefranc tion (▶ Gene and Allele Nomenclature) and concepts
et al. 2005b) (IMGT Protein display in IMGT Reper- of description (▶ Labels and Relations), in the
toire, http://www.imgt.org). definition of monoclonal antibodies (mAb, suffix -mab)
Practically, the IMGT unique numbering for posi- and fusion proteins for immune applications (FPIA,
tions 1–39 and 73–92 of the G-ALPHA2 domains can suffix -cept) of the World Health Organization/Interna-
be obtained very easily by subtracting 90 from the tional Nonproprietary Name (WHO/INN) programme
mature protein numbering (91–129 and 163–182). (Lefranc 2011).
Between these positions, the two gaps (at positions
40 and 41) and the two insertions (at positions 61A
and 72A) are necessary, in the IMGT unique number- Cross-References
ing, to allow meaningful sequence and structure align-
ment and comparison between the G-ALPHA1 and ▶ Chain Type
G-ALPHA2 sequences. ▶ Complementarity Determining Region (CDR-
IMGT)
IMGT Unique Numbering Applications in ▶ Configuration Type
Immunogenetics and System Biology ▶ Constant (C) Domain
By allowing to number the V and C domains of the IG, ▶ Domain Type
TR, and IgSF other than IG and TR, and the G domain ▶ Framework Region (FR-IMGT)
of the MH and MhSF other than MH (Lefranc et al. ▶ Gene and Allele Nomenclature
IMGT® Information System 959 I
▶ Groove (G) Domain domains and Ig superfamily C-like domains. Dev Comp
▶ IMGT Collier de Perles Immunol 29:185–203
Lefranc M-P, Duprat E, Kaas Q, Tranne M, Thiriot A, Lefranc G
▶ IMGT-ONTOLOGY (2005b) IMGT unique numbering for MHC groove
▶ IMGT-ONTOLOGY, CLASSIFICATION Axiom G-DOMAIN and MHC superfamily (MhcSF) G-LIKE-
▶ IMGT-ONTOLOGY, DESCRIPTION Axiom DOMAIN. Dev Comp Immunol 29:917–938
▶ IMGT-ONTOLOGY, Leafconcept Scaviner D, Barbié V, Ruiz M, Lefranc M-P (1999) Protein
displays of the human immunoglobulin heavy, kappa and
▶ IMGT-ONTOLOGY, NUMEROTATION Axiom lambda variable and joining regions. Exp Clin Immunogenet
▶ IMGT ® Information System 16:234–240
▶ Immunogenetics
▶ Immunoglobulin Superfamily (IgSF)
▶ Immunoglobulin Synthesis
▶ Immunoinformatics
▶ Labels and Relations IMGT ® Information System
▶ MH Superfamily (MhSF)
▶ MoleculeType Marie-Paule Lefranc
▶ Variable (V) Domain Laboratoire d’ImmunoGénétique Moléculaire,
Institut de Génétique Humaine UPR 1142,
Université Montpellier 2, Montpellier, France I
References

Duroux P, Kaas Q, Brochet X, Lane J, Ginestoux C, Lefranc M-P, Synonyms


Giudicelli V (2008) IMGT-Kaleidoscope, the Formal
IMGT-ONTOLOGY paradigm. Biochimie 90:570–583.
IMGT®, the international ImMunoGeneTics information
Ehrenmann F, Kaas Q, Lefranc M-P (2010) IMGT/
3Dstructure-DB and IMGT/DomainGapAlign: a database system®
and a tool for immunoglobulins or antibodies, T cell
receptors, MHC, IgSF and MhcSF. Nucleic Acids Res
38:D301–D307
Folch G, Scaviner D, Contet V, Lefranc M-P (2000) Protein
displays of the human T cell receptor alpha, beta, gamma Definition
and delta variable and joining regions. Exp Clin
Immunogenet 17:205–215 IMGT ® is an ▶ information system (http://www.imgt.
Lefranc M-P (1997) Unique database numbering system for
org) created in 1989 to standardize and to manage the
immunogenetic analysis. Immunol Today 18:509
Lefranc M-P (1999) The IMGT unique numbering for immuno- complexity of ▶ immunogenetics data. The building of
globulins, T cell receptors and Ig-like domains. Immunolo- a unique ontology, ▶ IMGT-ONTOLOGY, has made
gist 7:132–136 IMGT ® the global reference in ▶ immunogenetics
Lefranc M-P (2009) Antibody database and tools: The IMGT ®
and ▶ immunoinformatics. IMGT® is a high-quality
experience, Ch 4. In: Zhiqiang An (ed) Therapeutic mono-
clonal antibodies: from bench to clinic. John Wiley Sons, integrated knowledge resource specialized in the
Hoboken, pp 91–114 immunoglobulins (IG) or antibodies, T cell receptors
Lefranc M-P (2011) Antibody nomenclature. From IMGT- (TR) and major histocompatibility (MH) of human
ONTOLOGY to INN definition. mAbs 3:1–2
and other vertebrate species, proteins of the ▶ immu-
Lefranc M-P, Lefranc G (2001a) The immunoglobulin
FactsBook. Academic, London, pp 1–458 noglobulin superfamily (IgSF) and ▶ MH superfamily
Lefranc M-P, Lefranc G (2001b) The T cell receptor FactsBook. (MhSF), related proteins of the immune systems (RPI)
Academic, London, pp 1–398 of vertebrates and invertebrates, therapeutic monoclo-
Lefranc M-P, Pommié C, Ruiz M, Giudicelli V, Foulquier E,
nal antibodies (mAb), and fusion proteins for immune
Truong L, Thouvenin-Contet V, Lefranc G (2003) IMGT
unique numbering for immunoglobulin and T cell receptor applications (FPIA). IMGT ® provides a common
variable domains and Ig superfamily V-like domains. Dev access to standardized data from genome, proteome,
Comp Immunol 27:55–77 genetics, and three-dimensional (3D) structures.
Lefranc M-P, Pommié C, Kaas Q, Duprat E, Bosc N,
IMGT ® consists of databases, interactive online
Guiraudou D, Jean C, Ruiz M, Da Piédade I, Rouard M,
Foulquier E, Thouvenin V, Lefranc G (2005a) IMGT unique tools, and Web resources. IMGT ® is freely available
numbering for immunoglobulin and T cell receptor constant at http://www.imgt.org.
I 960 IMGT ® Information System

Characteristics IMGT® is the global reference that provides the stan-


dards in ▶ immunogenetics and ▶ immunoinformatics
The number of genomics, genetics, three-dimensional (IMGT Scientific chart). IMGT® provides a common
(3D), and functional data published in the ▶ immuno- access to standardized data from genome, proteome,
genetics field is growing exponentially and involves genetics, and 3D structures. The IMGT® ▶ information
fundamental, clinical, veterinary, and pharmaceutical system manages ▶ immunogenetics knowledge
research. The number of potential protein forms of the according three main IMGT® biological approaches:
antigen receptors, immunoglobulins (IG) or anti- genomic, genetic, and structural approaches (Fig. 1).
bodies, and T cell receptors (TR) is almost unlimited. For each approach, IMGT ® provides databases,
The potential repertoire of each individual is estimated interactive online tools, and Web resources (Table 1).
to comprise about 2.1012 different IG and TR, and the
limiting factor is only the number of B and T cells that Genomic Approach
an organism is genetically programmed to produce. The IMGT ® genomic approach is gene-centered and
This huge diversity is inherent to the particularly com- mainly orientated toward the study of the genes within
plex and unique molecular synthesis and genetics of their loci and on the chromosomes.
the antigen receptor chains. This includes biological The IMGT ® genome database (IMGT/GENE-DB)
mechanisms such as DNA molecular rearrangements is the official repository of all the IG and TR genes and
(combinatorial diversity) in multiple loci (three for alleles approved by the World Health Organisation
IG and four for TR in humans) located on different (WHO)/International Union of Immunological Socie-
chromosomes (four in humans), nucleotide deletions ties (IUIS) Nomenclature Subcommittee for IG and
and insertions at the rearrangement junctions (or TR (Lefranc 2008). Reciprocal links exist between
N-diversity), and somatic hypermutations in the IG IMGT/GENE-DB and the Human Genome Nomencla-
loci (for a review, see Lefranc and Lefranc 2001a, b). ture Committee (HGNC) database and Entrez Gene at
IMGT ®, the international ImMunoGeneTics infor- the National Center for Biotechnology Information
mation system ® (http://www.imgt.org) (Lefranc et al. (NCBI).
2009), was created in 1989 by Marie-Paule Lefranc, The IMGT® genomics tools manage the locus orga-
Laboratoire d’ImmunoGénétique Moléculaire LIGM nization and gene location and provide the display of
(Université Montpellier 2 and CNRS) at Montpellier, physical maps for the human and mouse IG, TR, and MH
France, in order to standardize and to manage the loci. They allow to view genes in a locus (IMGT/
complexity of ▶ immunogenetics data. IMGT ® has GeneView, IMGT/LocusView), to search for clones
reached that goal through the building of a unique (IMGT/CloneSearch), to search for genes in a locus
ontology, ▶ IMGT-ONTOLOGY (Giudicelli and based on IMGT® gene names, functionality or localiza-
Lefranc 1999; Duroux et al. 2008), the first ontology tion on the chromosome (IMGT/GeneSearch, IMGT/
in ▶ immunogenetics and ▶ immunoinformatics. GeneInfo), or to provide a graphical representation of
IMGT ® is a high-quality integrated knowledge the numbers of available sequences containing
resource, specialized in the IG, TR, major histocom- rearranged IG and TR genes (IMGT/GeneFrequency).
patibility (MH) of human and other vertebrates, pro- The IMGT® genomics resources include chromo-
teins of the ▶ Immunoglobulin superfamily (IgSF) and somal localizations, locus representations, locus descrip-
▶ MH superfamily (MhSF), related proteins of the tion, gene positions, gene exon/intron organization, gene
immune systems (RPI) of vertebrates and invertebrates, exon/intron splicing sites, gene tables, potential germline
therapeutic monoclonal antibodies (mAb), and fusion repertoires, lists of genes and correspondence between
proteins for immune applications (FPIA). IMGT® has nomenclatures (IMGT Repertoire, Locus and genes
set up the official nomenclature of the IG and TR genes section).
and alleles, the definition of IMGT standardized labels,
and the ▶ IMGT unique numbering that bridges the gap Genetic Approach
between sequences and 3D structures for the ▶ variable The IMGT ® genetic approach refers to the study of the
(V) domain and ▶ constant (C) domain of the IG, TR, genes in relation with their sequence polymorphisms
and IgSF, and for the ▶ groove (G) domain of the MH and mutations, their expression, their specificity, and
and MhSF (Kaas et al. 2007; Lefranc et al. 2008). their evolution.
IMGT® Information System 961 I

IMGT / LIGM-DB Sequences / oligonucleotides IMGT / PRIMER-DB

Se
qu

genes and alleles


en

Identification of
ce
s/
mu
Gen tat
e ion
iden and al s

Nu
IMGT / CLL-DB le
tific
atio le

cle
Ana n IMGT / Allele-Align

oti
ly
junc sis of

dic
tion

/p
s

ro
IMGT / V-QUEST

tei
cse
es

qu
nc

IMGT /

ns f
tio s o

en
ue

mAb-DB

ce
nc si
eq

ju aly

s
/s

Se
An
ity

q ue
ific

INN

nc
ec

sequences
Sp

IMGT/JunctionAnalysis

es
/ ph
ylo
IMGT / GeneFrequency IMGT /

ge
Genes / reference sequences

ny
2Dstructure-DB

IMGT/PhyloGene
IMGT / GeneInfo
IMGT / DomainDisplay

es
IMGT / CloneSearch

nc
ny

ue
ge

IMGT / GeneSearch
eq
IMGT / DomainGapAlign
ylo

es
IMGT / GeneView
ph

nc
ere
s/
ne

IMGT / LocusView
ref
Ge

IMGT / Collier-de-Perles
s/
ne
Ge

loc Gen

3D structures
ali e

proteins
sa
tio
n

Genes
/ 3D str
IMGT / GENE-DB uctures
Genetic approach
Genomic approach IMGT /
3Dstructure-DB
Structural approach
s
ture
s truc 3D
Sequences 3D domains
Genes
3D structures IMGT / StructuralQuery IMGT / DomainSuperimpose

Monoclonal antibodies

IMGT ® Information System, Fig. 1 IMGT®, the international throughput version of IMGT/V-QUEST for Next Generation
ImMunoGeneTics information system ® (http://www.imgt.org). Sequencing (NGS) and deep sequencing of IG and TR V
IMGT ® comprises databases (shown as cylinders), tools (shown domains. Interactions between databases and tools in the geno-
as rectangles) and Web resources (not shown, see Table 1 for mic, genetic, and structural approaches are represented with
examples). IMGT/HighV-QUEST (not shown) is the high continuous, dotted, and broken lines

The IMGT ® sequences databases manage: 2. MH sequences of human and other vertebrates
1. IG and TR nucleotide sequences from human and (IMGT/MH-DB)
other vertebrate species (IMGT/LIGM-DB, created 3. Oligonucleotide sequences used as primers for com-
in 1989, on the Web since July 1995, the first and binatorial library constructions, scFv, phage display,
the largest IMGT ® database) or microarray technologies (IMGT/PRIMER-DB)
I 962 IMGT ® Information System

IMGT ® Information System, Table 1 IMGT ® databases, tools, and Web resources for genomic, genetic, and structural
approaches
Approaches Databases Tools Web resourcesa
Genomic IMGT/GENE-DB IMGT/GeneView IMGT Repertoire, Locus and genes section:
IMGT/LocusView – Chromosomal localizations
IMGT/CloneSearch – Locus representations
IMGT/GeneSearch – Locus description, etc.
IMGT/GeneInfo – Gene tables
IMGT/GeneFrequency – Potential germline repertoires
– Lists of genes
– Correspondence between nomenclatures
Genetic IMGT/LIGM-DB IMGT/V-QUEST IMGT Repertoire, Proteins and alleles section:
IMGT/MH-DB IMGT/HighV-QUEST – Alignments of alleles
IMGT/PRIMER-DB IMGT/JunctionAnalysis – Tables of alleles
IMGT/CLL-DB IMGT/Allele-Align – Allotypes
IMGT/mAb-DB IMGT/PhyloGene – Isotypes
IMGT/2Dstructure-DB IMGT/DomainDisplay – Protein displays, etc.
Structural IMGT/3Dstructure-DB IMGT/DomainGapAlign IMGT Repertoire, 2D and 3D structures section:
IMGT/Collier-de-Perles – IMGT Colliers de Perles (2D representations on one layer
or two layers)
IMGT/DomainSuperimpose – IMGT ® classes for amino acid characteristics
IMGT/StructuralQuery – IMGT Colliers de Perles reference profiles
– 3D representations
a
Only Web resource examples from the IMGT Repertoire sections are shown

4. Amino acid sequences of therapeutical monoclonal binding characteristics in relationship with the protein
antibodies of the WHO/International Nonproprietary functions, polymorphisms, and evolution.
Name (INN) program (IMGT/2Dstructure-DB cards) The IMGT® 3D structure database comprises IG,
The IMGT ® sequence analysis tools allow the iden- TR, MH, IgSF, MhSF, RPI, mAb, and FPIA with
tification of the variable (V), diversity (D), and joining known 3D structures (IMGT/3Dstructure-DB). The
(J) genes and of their mutations in rearranged IG and IMGT/3Dstructure-DB cards provide chain details with
TR sequences (IMGT/V-QUEST, approved by the IMGT annotations (receptor, chain and domain descrip-
European Research Initiative on Chronic Lymphocytic tion, IMGT gene and allele names, domain delimitations,
Leukaemia CLL (ERIC) for the analysis of the IGHV and ▶ IMGT Collier de Perles), contact analysis, down-
gene mutational status in CLL), the analysis of the V-J loadable renumbered IMGT/3Dstructure-DB flat files,
and V-D-J junctions that confer the antigen receptor visualization tools (Jmol and QuickPDB), and external
specificity (IMGT/JunctionAnalysis), the detection of links. IMGT Residue@Position cards provide detailed
polymorphisms (IMGT/Allele-Align), gene evolution information on the inter- and intra-domain contacts at
analyses (IMGT/Phylogene), or the display of amino each residue position, based on the ▶ IMGT unique
acid sequences from the IMGT domain directory numbering.
(IMGT/DomainDisplay). The IMGT ® structural tools allow to analyze amino
The IMGT ® genetics resources include alignments acid sequences per domain (IMGT/DomainGapAlign),
of alleles, tables of alleles, allotypes, isotypes, and to make ▶ IMGT Collier de Perles (IMGT/Collier-de-
protein displays (IMGT Repertoire, Proteins and Perles), to superimpose two domain 3D structures from
alleles section). IMGT/3Dstructure-DB (IMGT/DomainSuperimpose),
or to retrieve the IMGT/3Dstructure-DB entries
Structural Approach containing a V domain, based on specific structural
The IMGT ® structural approach refers to the study of characteristics of the intramolecular interactions: phi
the 2D and 3D structures and to the antigen- or ligand- and psi angles, distance in angstrom between amino
IMGT® Information System 963 I
acids, ▶ complementarity determining region (CDR-
FROM SEQUENCES TO
3D STRUCTURES IMGT) lengths, and so on (IMGT/StructuralQuery).
>CAMPATH-1H|1bey_H|VH-CH1|219 AA|23397.2
The IMGT ® structural resources include IMGT
MW|VH+CH1
Colliers de Perles, ▶ framework region (FR-IMGT),
QVQLQESGPGLVRPSQTLSLTCTVSGFTFTDFYMNWVRQ
PPGRGLEWIGFIRDKAKGYTT
and CDR-IMGT lengths, profiles based on the 11
EYNPSVKGRVTMLVDTSKNQFSLRLSSVTAADTAVYYCAR
EGHTAAPFDYWGQGSLVTVS IMGT amino acid physicochemical classes (IMGT
SASTKGPSVFPLAPSSKSTSGGTAALGCLVKDYFPEPVTVS
WNSGALTSGVHTFPAVLQS Repertoire, 2D and 3D structures section).
SGLYSLSSVVTVPSSSLGTQTYICNVNHKPSNTKVDKKV

Other IMGT ® Databases and Web Resources


IMGT/CLL-DB contains IG sequences of CLL
patients, analyzed by IMGT/V-QUEST. IMGT/mAb-
DB provides information on therapeutic monoclonal
antibodies (mAb) and fusion proteins for immune
applications (FPIA). In addition to the IMGT Scientific
chart and IMGT Repertoire, IMGT ® Web resources
comprise IMGT Index, IMGT Bloc-notes, IMGT Edu-
cation (IMGT Lexique, Aide-Mémoire, Tutorials, etc.),
the IMGT Medical page, the IMGT Veterinary page, the I
IMGT Biotechnology page, the IMGT Immunoin-
formatics page, and IMGT other accesses.

IMGT ® Users
Since July 1995, IMGT ® has been available on the
Web at the IMGT Home page http://www.imgt.org
(Montpellier, France). The knowledge provided by the
IMGT® information system is of much value to clini-
cians and biological scientists in general. IMGT® data-
bases, tools, and Web resources are extensively queried
and used by scientists from both academic and research
laboratories from pharmaceutical companies, who are
equally distributed between the United States, Europe,
and the remaining world. IMGT® is used in very diverse

IMGT ® Information System, Fig. 2 IMGT ® approach from


sequence to three-dimensional (3D) structure in antibody engi-
neering and humanization. The VH domain of the alemtuzumab
antibody is shown as an example (IMGT/3DstructureDB and
PDB code:1bey) (http://www.imgt.org). From top to bottom:
(1) VH amino acid sequence. (2) IMGT Collier de Perles on
one layer. (3) IMGT Collier de Perles on two layers. Hydrogen
bonds between the amino acids of the C, C0 , C00 , and F and G
strands and those of the CDR-IMGT are shown. (4) Ribbon 3D
representation. (5) Spacefill 3D representation. The CDR1-
IMGT, CDR2-IMGT and CDR3-IMGT regions are colored in
red, orange and purple, respectively (IMGT Color menu). The
CDR-IMGT lengths are [8.10.12]. Anchor positions are shown
as squares in (2) and (3) (26 and 39, 55 and 66, 104 and 118), and
© Copyright 1995-2010 IMGT®, the international ImMunoGeneTics
information system® as spheres in (4). Hydrophobic amino acids (hydropathy index
© 2010 C. Ginestoux, F. Ehrenmann and M.-P. Lefranc
with positive value) and tryptophan (W) found at a given posi-
tion in more than 50% of analysed IG and TR sequences are
IMGT ® Information System, Fig. 2 (continued) shown in blue in (2) and (3)
I 964 IMGT ®, the International ImMunoGeneTics Information System ®

domains: (1) fundamental research, (2) medical research Kaas Q, Ehrenmann F, Lefranc M-P (2007) IG, TR, MHC, IgSf
(vaccine design, repertoire analysis of the IG antibody and MhcSF: what do we learn from the IMGT Colliers de
Perles? Brief Funct Genomic Proteomic 6:253–264
sites and of the TR recognition sites in normal and Lefranc M-P (2008) WHO-IUIS Nomenclature Subcommittee
pathological situations such as autoimmune diseases, for immunoglobulins and T cell receptors report August
infectious diseases, AIDS, leukemias, lymphomas, and 2007, 13th International Congress of Immunology, Rio de
myelomas), (3) veterinary research (immune repertoires Janeiro, Brazil. Dev Comp Immunol 32:461–463
Lefranc M-P (2009) Antibody database and tools: the IMGT ®
in domestic and wild life species, animal models), experience, Ch 4. In: Zhiqiang An (ed) Therapeutic mono-
(4) genomics (study of the genome diversity and evolu- clonal antibodies: from bench to clinic. John Wiley Sons,
tion of the adaptive immune response), (5) structural Hoboken, pp 91–114
biology (evolution of the domains of the IgSF and Lefranc M-P, Lefranc G (2001a) The immunoglobulin
FactsBook. Academic Press, London, UK, pp 1–458
MhSF proteins), (6) biotechnology related to antibody Lefranc M-P, Lefranc G (2001b) The T cell receptor FactsBook.
engineering and ▶ antibody humanization (construction Academic Press, London, UK, pp 1–398
and analysis of single chain Fragment variable Lefranc M-P, Giudicelli V, Regnier L, Duroux P (2008)
(scFv), phage displays, combinatorial libraries, chimeric, IMGT, a system and an ontology that bridge biological and
computational spheres in bioinformatics. Brief Bioinform
humanized, and human antibodies) (Fig. 2) (Lefranc 9:263–275
2009; Ehrenmann et al. 2010), (7) diagnostics (clonalities, Lefranc M-P, Giudicelli V, Ginestoux C, Jabado-Michaloud J,
detection, and follow-up of minimal residual diseases), Folch G, Bellahcene F, Wu Y, Gemrot E, Brochet X, Lane J,
and (8) therapeutical approaches (grafts, immunotherapy, Regnier L, Ehrenmann F, Lefranc G, Duroux P (2009)
IMGT ®, the international ImMunoGeneTics information
and vaccinology). system ®. Nucl Acids Res 37:D1006–D1012

Cross-References

▶ Complementarity Determining Region (CDR- IMGT ®, the International


IMGT) ImMunoGeneTics Information System ®
▶ Constant (C) Domain
▶ Framework Region (FR-IMGT) ▶ IMGT ® Information System
▶ Groove (G) Domain
▶ IMGT Collier de Perles
▶ IMGT Unique Numbering
▶ IMGT-ONTOLOGY IMGT-ONTOLOGY
▶ Immunogenetics
▶ Immunoglobulin Superfamily (IgSF) Véronique Giudicelli and Marie-Paule Lefranc
▶ Immunoinformatics Laboratoire d’ImmunoGénétique Moléculaire,
▶ Information System Institut de Génétique Humaine UPR 1142,
▶ MH Superfamily (MhSF) Université Montpellier 2, Montpellier, France
▶ Variable (V) Domain

Synonyms
References
International ImMunoGeneTics information system ®;
Duroux P, Kaas Q, Brochet X, Lane J, Ginestoux C, Lefranc M-P, Ontology of IMGT ®; Ontology of the IMGT ®
Giudicelli V (2008) IMGT-Kaleidoscope, the formal IMGT-
information system
ONTOLOGY paradigm. Biochimie 90:570–583
Ehrenmann F, Duroux P, Giudicelli V, Lefranc M-P (2010) -
Standardized sequence and structure analysis of antibody
using IMGT ®. In: Kontermann R, D€ ubel S (eds) Antibody Definition
engineering, Ch 2, vol 2, 2nd edn. Springer-Verlag, Berlin
Heidelberg, pp 11–31
Giudicelli V, Lefranc M-P (1999) Ontology for immunogenet- IMGT-ONTOLOGY is the ontology for ▶ immunoge-
ics: the IMGT-ONTOLOGY. Bioinformatics 12:1047–1054 netics and ▶ immunoinformatics. IMGT-ONTOLOGY,
IMGT-ONTOLOGY 965 I
created to manage the complexity of ▶ immunogenetics IDENTIFICATION OBTENTION
knowledge, is the first ontology and global reference in
the domain. IMGT-ONTOLOGY has been essential DESCRIPTION

for the creation in 1989 of IMGT®, the international Formal


IMGT-ONTOLOGY ORIENTATION
ImMunoGeneTics information system® (http://www. CLASSIFICATION
imgt.org) (▶ IMGT® Information System). IMGT-
ONTOLOGY provides a semantic specification of the NUMEROTATION LOCALIZATION
terms and relations to be used and a conceptualization of
the knowledge in immunogenetics and immunoin- IMGT-ONTOLOGY, Fig. 1 The seven axioms of the Formal
formatics, allowing IMGT® to bridge biological IMGT-ONTOLOGY or IMGT-Kaleidoscope
and computational spheres in bioinformatics. IMGT-
ONTOLOGY manages the immunogenetics knowledge molecular mechanisms in B cells (▶ Immunoglobulin
through diverse facets that rely on the seven axioms of the Synthesis) and in T cells are at the origin of this huge
Formal IMGT-ONTOLOGY or IMGT-Kaleidoscope: diversity (Lefranc and Lefranc 2001a, b). These mecha-
“IDENTIFICATION,” “DESCRIPTION,” “CLASSIFI- nisms include in particular DNA rearrangements and, for
CATION,” “NUMEROTATION,” “LOCALIZATION,” the IG, somatic hypermutations. In addition, there is
“ORIENTATION,” and “OBTENTION.” The concepts a considerable polymorphism of the major histocompat-
generated from these axioms have been essential for ibility (MH) proteins (human leukocyte antigens (HLA) I
establishing the IMGT Scientific chart rules for the in humans). These particularities of the adaptative
genome, proteome, genetics, and two-dimensional (2D) immune system of the vertebrates, and a better knowl-
and three-dimensional (3D) structures. As the same edge of the innate immune responses found in any spe-
axioms can be used to generate concepts for multi-scale cies, have made the immune system an excellent model
level approaches, the Formal IMGT-ONTOLOGY repre- for system biology.
sents a paradigm for system biology ontologies, which IMGT-ONTOLOGY (Giudicelli and Lefranc 1999;
need to identify, classify, describe, number, localize, and Lefranc et al. 2004, 2005, 2008; Duroux et al. 2008),
orientate objects, processes, and relations at the molecule, the first ontology for immunogenetics and immunoin-
cell, tissue, organ, organism, or population levels. formatics, has been built to ensure the accuracy and the
IMGT-ONTOLOGY concepts of identification are avail- consistency of the IMGT® data, as well as the coherence
able at the National Center for Biomedical Ontology between the components (databases, tools and
(NCBO) BioPortal http://bioportal.bioontology.org/ and Web resources) of IMGT®, the international ImMuno-
at IMGT® http://www.imgt.org. GeneTics information system® (http://www.imgt.org)
(▶ IMGT® Information System). IMGT-ONTOLOGY
is now acknowledged as the global reference in immu-
Characteristics nogenetics and immunoinformatics, allowing IMGT® to
bridge biological and computational spheres in bioinfor-
IMGT-ONTOLOGY Knowledge Domain matics (Lefranc et al. 2008).
An ontology is a formal representation of a knowledge
domain (Gruber 1993; Giudicelli and Lefranc 1999). The IMGT-ONTOLOGY Axioms and Concepts
knowledge domain covered by IMGT-ONTOLOGY IMGT-ONTOLOGY manages the ▶ immunogenetics
is the ▶ immunogenetics and ▶ immunoinformatics. knowledge through diverse facets that rely on the seven
Immunogenetics studies the genetics of the immune axioms of the Formal IMGT-ONTOLOGY or IMGT-
responses. In addition to the innate immune response, Kaleidoscope: “IDENTIFICATION,” “DESCRIP-
present in all living organisms, vertebrates with jaws (or TION,” “CLASSIFICATION,” “NUMEROTATION,”
gnathostomata, from fish to humans) have acquired the “LOCALIZATION,” “ORIENTATION,” and
adaptative immune response, characterized by an “OBTENTION” (Giudicelli and Lefranc 1999; Lefranc
extreme diversity of the specific antigen receptors that et al. 2004, 2005, 2008; Duroux et al. 2008). These
comprise the immunoglobulins (IG) or antibodies and axioms postulate that any object, any process and any
the T cell receptors (TR) (1012 different IG and 1012 relation, has to be identified, described, classified, num-
different TR per individual, in humans). Complex bered, localized, and orientated, and that the way it is
I 966 IMGT-ONTOLOGY

IMGT-ONTOLOGY, Table 1 Axioms of the Formal IMGT-ONTOLOGY (or IMGT-Kaleidoscope), concepts generated from them,
examples of concepts, and IMGT Scientific chart rules
Formal IMGT- Concepts generated from IMGT Scientific
ONTOLOGY axioms the axioms Examples of IMGT-ONTOLOGY concepts chart rulesa
IDENTIFICATION Concepts of ChainType (*) Standardized
axiom identification ConfigurationType keywords
DomainType
FunctionalityType
FunctionType
GeneType
LocationType
Molecule_EntityType
MoleculeType
MoleculeUnit
ReceptorType (*)
SpecificityType
StructureType
TaxonRank (*)
DESCRIPTION axiom Concepts of description Core Prototypes
Molecule_EntityPrototype Standardized labels
Recombination signal (RS) (Labels and
relations)
CLASSIFICATION Concepts of Group Gene and allele
axiom classification Subgroup nomenclature
Gene
Allele
NUMEROTATION Concepts of IMGT_unique_numbering (IMGT unique numbering) Contact analysis
axiom numerotation for V, C, and G domains Paratope/epitope
IMGT_Collier_de_Perles (IMGT Collier de Perles) for Conserved positions
V, C, and G domains
LOCALIZATION axiom Concepts of localization Cell compartment Chromosomal
Chromosome localization
Locus Locus
representation
Organ part Standardized
Organ keywords
ORIENTATION axiom Concepts of orientation DNA_strand_orientation IMGT index
Genomic_orientation (*)
OBTENTION axiom Concepts of obtention Origin Standardized
Methodology keywords
(*) indicates a highconcept, i.e., a concept with different levels of granularity
a
The corresponding controlled vocabulary and rules are available in the IMGT Scientific Chart at http://www.imgt.org

obtained can be characterized (Duroux et al. 2008; any molecule, cell, tissue, organ, organism, or popula-
Lefranc et al. 2008) (Fig. 1). tion, any process and any relation, has to be identified.
The IMGT-ONTOLOGY axioms, the concepts The IDENTIFICATION axiom has generated
generated from them, examples of concepts, and the concepts of identification which provide the
IMGT Scientific chart rules are listed in Table 1. terms and rules to identify an entity, its processes,
and its relations. The ▶ TaxonRank highconcept
IDENTIFICATION Axiom (Table 1) allows to identify the type of taxon
The ▶ IDENTIFICATION axiom of the Formal IMGT- in which an object, process, or relation is found
ONTOLOGY or IMGT-Kaleidoscope postulates that (Giudicelli and Lefranc 1999). Two major concepts
IMGT-ONTOLOGY 967 I
conventional
gDNA undefined
variable
mRNA germline
MoleculeType GeneType diversity ConfigurationType
cDNA rearranged
joining
protein partially-rearranged
constant
is_defined_by

defines is_defined_by
defines
defines is_defined_by

Molecule_EntityType

_has_ _for_ _has_ _for_

functional I
FunctionalityType StructureType regular
ORF
chimeric
pseudogene
humanized
productive
unproductive

IMGT-ONTOLOGY, Fig. 2 The “Molecule_EntityType” “is_defined_by” and “defines,” “_has_”, and “_for_”.
concept. The “Molecule_EntityType” concept is defined by the Leafconcepts are general (online: in blue) or specific of the IG
“MoleculeType,” “GeneType,” and “ConfigurationType” and TR (online: in red). The “Molecule_EntityType” concept
concepts of identification and has properties identified in the has 38 leafconcepts. Only a few examples of the
“FunctionalityType” and “StructureType” concepts (IDENTIFI- “StructureType” leafconcepts are shown
CATION axiom). Arrows indicate reciprocal relations

of identification in molecular biology, the sequence identifies, for cDNA (▶ MoleculeType),


▶ Molecule_EntityType and ReceptorType concepts are rearranged (▶ Configuration Type) entities comprising
given as examples. the core regions of a V gene (with its leader L),
J gene, and C gene (▶ Gene Type). If only the
“Molecule_EntityType” Concept “▶ MoleculeType” (“gDNA,” “mRNA,” “cDNA,” or
The “▶ Molecule_EntityType” concept allows to “protein”) is considered, the “Molecule_EntityType”
identify the type of molecule entity. The leafconcepts identify a “gene” (“MoleculeUnitGene,”
“▶ Molecule_EntityType” concept is defined by the 10 leafconcepts), a “transcript” (“MoleculeUnit-
“▶ MoleculeType” (“gDNA,” “mRNA,” “cDNA,” Transcript,” 11 leafconcepts), a “cDNA sequence”
“protein”), “▶ GeneType” (conventional, variable, (“MoleculeUnitSequence,” 11 leafconcepts), or
diversity, joining, constant), and “▶ ConfigurationType” a “chain” (“MoleculeUnitChain,” 6 leafconcepts), as
(undefined, germline, rearranged, partially-rearranged) indicated by their suffix, and corresponding to the four
concepts of identification (Duroux et al. 2008; Lefranc “MoleculeUnit” child concepts, respectively.
et al. 2008) (Fig. 2). The “▶ Molecule_EntityType” concept has proper-
The “Molecule_EntityType” comprises 38 ties identified in the “▶ FunctionalityType” (func-
leafconcepts (▶ IMGT-ONTOLOGY, Leafconcept). tional, ORF, pseudogene, productive, unproductive)
For examples, a V gene identifies, for gDNA and “▶ StructureType” concepts (regular, chimeric,
(▶ MoleculeType), entities with a germline (▶ Config- humanized, etc.) (Duroux et al. 2008; Lefranc et al.
uration Type) V gene (▶ Gene Type), a L-V-J-C- 2008) (Fig. 2).
I 968 IMGT-ONTOLOGY

DomainType Molecule_EntityType Concepts of classification


(hierarchy)
defines is_defines_by defines
is_defined_by
defines is_defined_by

ChainType

defines is_defined_by

ReceptorType

_has_ _for_ _has_ _for_ _has_ _for_

StructureType SpecificityType FunctionType

IMGT-ONTOLOGY, Fig. 3 The “ReceptorType” concept. The “DomainType” concepts and by concepts of classification orga-
“ReceptorType” concept, defined by the “ChainType” concept nized in a hierarchy (see CLASSIFICATION axiom). Arrows
of identification, has properties identified in the indicate reciprocal relations “is_defined_by” and “defines,”
“StructureType,” “SpecificityType,” and “FunctionType” con- “_has_”, and “_for_”. The “ChainType” and “ReceptorType”
cepts (IDENTIFICATION axiom). The “ChainType” concept is concepts have different levels of granularity (up to five) and are
itself defined by the “Molecule_EntityType” and highconcepts

“ReceptorType” Concept DESCRIPTION Axiom


The “ReceptorType” concept allows to identify The ▶ DESCRIPTION axiom of the Formal IMGT-
the type of receptor, based on the ChainType ONTOLOGY or IMGT-Kaleidoscope postulates that
leafconcept(s) that identify the associated chains. any molecule, cell, tissue, organ, organism, or popula-
The “ReceptorType” concept is defined by the tion, any process and any relation, has to be described.
“▶ ChainType” concept of identification and has prop- In molecular biology, the DESCRIPTION axiom has
erties identified in the “StructureType,” generated the concepts of description which provide
“▶ SpecificityType,” and “▶ FunctionType” concepts the terms and the rules to describe motifs in the
(Duroux et al. 2008; Lefranc et al. 2008) (Fig. 3). nucleotide and amino acid sequences and in two-
The “▶ ChainType” concept is itself defined by the dimensional (2D) (▶ IMGT Collier de Perles) and
“▶ Molecule_EntityType” and the “▶ DomainType” three-dimensional (3D) structures. These concepts
concepts of identification and by concepts of classifica- gave rise to a standardized terminology (▶ Labels
tion (see CLASSIFICATION axiom). These latter are and Relations) and to a precise definition of the
organized in a hierarchy which confers different levels of annotation rules (IMGT Scientific chart, http://www.
granularity (up to five) to the “ReceptorType” and imgt.org).
“ChainType” concepts that are therefore defined as
“highconcepts” (▶ IMGT-ONTOLOGY, Highconcept). “Molecule_EntityPrototype” Concept
Thus, IG can be identified as a leafconcept of The Molecule_EntityPrototype concept, generated
the “ReceptorType” concept, comprising four chains, from the DESCRIPTION axiom, allows to provide
two IG-Heavy-Chain, and two IG-Light-Chain the description of the “Molecule_EntityType” concept
(▶ Chain Type) identical two by two and covalently (IDENTIFICATION axiom). Each leafconcept of
linked (Fig. 4). At a different granularity (“GeneLevel- the “Molecule_EntityPrototype” concept is linked
ChainType”), the receptor is identified as comprising to a leafconcept of the “Molecule_EntityType”
two IG-Heavy-Gamma1-Chain and two IG-Light- concept by the reciprocal relations “describes” and
Kappa-Chain. “is_described_by.” A “Molecule_EntityPrototype”
IMGT-ONTOLOGY 969 I
a b
VH
VH VH
VL -

S
-

S
S
CL VL

S
CH1 CH1
CL
-

S
-
-

S
S
-

S
S

S
CH1

S
- VL
-

S
S
Hinge-Region

S
CL

CH2 CH2
S S
CH2

-
S S

CH3
CH3 S S
CH3

-
S S

c d

V
I
V
V V
V J
1
2
CH1 CH1 V 3 CL

C C
DJ CH
1
CH2 CH3
Hinge
CH2 CH3
CH2 CH2 DJ
CH1
V
CL

CH3 CH3

IMGT-ONTOLOGY, Fig. 4 Identification of an IG or antibody respectively. CL, CH1, CH2, and CH3 are C domains (Constant
as a leafconcept of the “ReceptorType” concept. The four (C) domain). (a) 3D structure, (b) organization in Ig-like
representations, although different, allow to identify an IG domains, (c) organization in modules, and (d) regions coded by
as a receptor of four chains, two IG-Heavy-Chain and two the V, D, J, and C genes (GeneType). The C gene codes the
IG-Light-Chain (“ChainType”), themselves organized in constant region (CL for the IG-Light-Chain and CH1, hinge,
domains (“DomainType”). VH and VL are V domains (Variable CH2, and CH3 for the IG-Heavy-Chain). This representation,
(V) domain), coded by the V-D-J-region and V-J-region, schematized as a Y shape, is frequently used to represent an IG

leafconcept describes the entity organization with ▶ Constant (C) Gene) (▶ Gene Type). These impor-
its constitutive motifs and relations (▶ Labels and tant labels allow to describe the leafconcepts of
Relations) (Fig. 5). “Molecule_EntityPrototype” and to link sequences
and structures. Moreover, they permitted the definition
“Core” Concept and standardized description of the IG and TR alleles
Among the concepts of description, the “Core” (concepts of classification), now approved at the inter-
concept allows to describe the coding region of genes national level (Lefranc and Lefranc 2001a, b).
and contains five leafconcepts which are “REGION”
(for ▶ Conventional Gene), “V-REGION,” CLASSIFICATION Axiom
“D-REGION,” “J-REGION,” and “C-REGION” (for The ▶ CLASSIFICATION axiom of the Formal
V, D, J, and C genes, respectively) (▶ Variable (V) IMGT-ONTOLOGY or IMGT-Kaleidoscope postu-
Gene, ▶ Diversity (D) Gene, ▶ Joining (J) Gene, lates that any molecule, cell, tissue, organ, organism,
I 970 IMGT-ONTOLOGY

a V-GENE
L-V-GENE-UNIT
L-INTRON-L V-REGION V-RS
3′-V-REGION V-SPACER

FR1-IMGT FR2-IMGT FR3-IMGT 3′UTR


5′UTR V-INTRON
INIT- DONOR-SPLICE ACCEPTOR-SPLICE 1st-CYS CONSERVED-TRP 2nd-CYS V-HEPTAMER V-NONAMER
CODON

L-PART1 L-PART2 CDR1-IMGT CDR2-IMGT CDR3-IMGT

V-EXON

b V-D-J-GENE
(N-D)-REGION

D-REGION
N-REGION N-REGION
L-INTRON-L V-REGION J-REGION

3′-V-REGION 5′-J-REGION

5′UTR V-INTRON FR1-IMGT FR2-IMGT FR3-IMGT FR4-IMGT


3′UTR
INIT- DONOR-SPLICE ACCEPTOR-SPLICE 1st-CYS CONSERVED-TRP 2nd-CYS J-TRP DONOR-SPLICE
CODON

L-PART1 L-PART2 CDR1-IMGT CDR2-IMGT CDR3-IMGT

JUNCTION
V-D-J-REGION

V-D-J-EXON

IMGT-ONTOLOGY, Fig. 5 Graphical representation of two GENE. Thirty-nine labels and twelve relations are necessary
leafconcepts of “Molecule_EntityPrototype” (DESCRIPTION and sufficient for a complete description of these leafconcepts
axiom of IMGT-ONTOLOGY). (a) V-GENE. (b) V-D-J-

or population, any process and any relation, has to be (Giudicelli et al. 2005), the IMGT® gene database, in
classified. In molecular biology, the concepts of clas- the Human Genome Database (GDB), in LocusLink at
sification generated from the CLASSIFICATION the National Center for Biotechnology Information
axiom allow to classify and name the genes and their (NCBI), in Entrez Gene when this database superseded
alleles (▶ Gene and Allele Nomenclature). The genes LocusLink, in Ensembl at the European Bioinformatics
which code the IG and TR belong to highly polymor- Institute (EBI), and in the Vega Genome Browser at the
phic multigenic families. A major contribution of Wellcome Trust Sanger Institute.
IMGT-ONTOLOGY was to set the principles of their
classification (Giudicelli and Lefranc 1999) (Fig. 6) NUMEROTATION Axiom
and to propose a standardized nomenclature (Lefranc The ▶ NUMEROTATION axiom of the Formal IMGT-
and Lefranc 2001a, b) based on the concepts of Group, ONTOLOGY or IMGT-Kaleidoscope postulates that any
Subgroup, Gene, and Allele (▶ Gene and Allele molecule, cell, tissue, organ, organism or population, any
Nomenclature). process and any relation, has to be numbered. In molec-
The IMGT® gene nomenclature for IG and TR genes ular biology, two major concepts of numerotation, gener-
(Lefranc and Lefranc 2001a, b) has been approved at the ated from the NUMEROTATION axiom, comprises the
international level by the Human Genome Organisation widely acknowledged ▶ IMGT unique numbering and
(HUGO) Nomenclature Committee (HGNC), in 1999, ▶ IMGT Collier de Perles concepts.
and by the World Health Organization-International
Union of Immunological Societies (WHO-IUIS) LOCALIZATION, ORIENTATION, and OBTENTION
(Lefranc 2007, 2008). The IMGT® IG and TR gene Axioms
names are the official reference for the genome projects The ▶ LOCALIZATION axiom, the ▶ ORIENTA-
and, as such, have been entered in IMGT/GENE-DB TION axiom, and the ▶ OBTENTION axiom of the
IMGT-ONTOLOGY 971 I
a b
Group Locus IGHV Human IGH
14q32.33

is_a_member_of has_member is_a_member_of has_member

in

in
d_

d_
re

ne

re
de

ne
ge

de
Subgroup IGHV1

or

ge
s_

or
is_

s_
is_
ha

ha
is_a_member_of has_member is_a_member_of has_member

Gene IGHV1-2

is_a_variant_of has_variant is_a_variant_of has_variant

Allele IGHV1-2*01

IMGT-ONTOLOGY, Fig. 6 Concepts of classification for gene They are associated to a “TaxonRank” concept and more pre-
and allele nomenclature (CLASSIFICATION axiom). (a) Hier- cisely for the “Gene” and “Allele” concepts to a leafconcept of
archy of the concepts of classification and their relations. “Species” (here, Homo sapiens). The “Locus” concept is I
(b) Examples of leafconcepts for each concept of classification. a concept of localization (LOCALIZATION axiom)

Formal IMGT-ONTOLOGY or IMGT-Kaleidoscope ▶ Constant (C) Domain


postulate that any molecule, cell, tissue, organ, organism, ▶ Constant (C) Gene
or population, any process and any relation, has to be ▶ Conventional Gene
localized, orientated and that the way it is obtained can ▶ Diversity (D) Gene
be characterized. Concepts generated from these axioms ▶ Domain Type
are still in development and enriched with advances in ▶ FunctionalityType
science. ▶ Function Type
▶ Gene and Allele Nomenclature
Conclusion ▶ Gene Type
The concepts of IMGT-ONTOLOGY are currently used ▶ IMGT Collier de Perles
for the exchange and the sharing of knowledge in very ▶ IMGT Unique Numbering
diverse fields of basic, veterinary, pharmaceutical, and ▶ IMGT-ONTOLOGY, CLASSIFICATION Axiom
medical research: immunogenetics, immunology, genet- ▶ IMGT-ONTOLOGY, DESCRIPTION Axiom
ics, molecular biology, analysis of the immune repertoires, ▶ IMGT-ONTOLOGY, Highconcept
evolution study of the ▶ immunoglobulin superfamily ▶ IMGT-ONTOLOGY, IDENTIFICATION Axiom
(IgSF) and ▶ MH superfamily (MhSF), biotechnology ▶ IMGT-ONTOLOGY, Leafconcept
related to antibody engineering, and humanization ▶ IMGT-ONTOLOGY, LOCALIZATION Axiom
(Lefranc et al. 2008). IMGT-ONTOLOGY represents ▶ IMGT-ONTOLOGY, NUMEROTATION Axiom
a key component in the elaboration and setting up of ▶ IMGT-ONTOLOGY, OPTENTION Axiom
standards for system biology and modeling of the immune ▶ IMGT-ONTOLOGY, ORIENTATION Axiom
system. ▶ IMGT-ONTOLOGY, SpecificityType
▶ IMGT ® Information System
▶ Immunogenetics
Cross-References ▶ Immunoglobulin Superfamily (IgSF)
▶ Immunoglobulin Synthesis
▶ Chain Type ▶ Immunoinformatics
▶ Configuration Type ▶ Joining (J) Gene
I 972 IMGT-ONTOLOGY, ChainType

▶ Labels and Relations


▶ Location Type IMGT-ONTOLOGY, ChainType
▶ MH Superfamily (MhSF)
▶ Molecule Entity Type Marie-Paule Lefranc
▶ MoleculeType Laboratoire d’ImmunoGénétique Moléculaire,
▶ Recombination Signal (RS) Institut de Génétique Humaine UPR 1142,
▶ Structure Type Université Montpellier 2, Montpellier, France
▶ TaxonRank
▶ Variable (V) Domain
▶ Variable (V) Gene Synonyms

Chain type

References

Duroux P, Kaas Q, Brochet X, Lane J, Ginestoux C, Definition


Lefranc M-P, Giudicelli V (2008) IMGT-Kaleidoscope,
the formal IMGT-ONTOLOGY paradigm. Biochimie
“ChainType” is a ▶ highconcept of identification
90:570–583
Giudicelli V, Lefranc M-P (1999) Ontology for immunoge- (generated from the ▶ IDENTIFICATION Axiom) of
netics: the IMGT-ONTOLOGY. Bioinformatics 12: ▶ IMGT-ONTOLOGY, the global reference in
1047–1054 ▶ immunogenetics and ▶ immunoinformatics (Giudicelli
Giudicelli V, Chaume D, Lefranc M-P (2005) IMGT/GENE-
DB: a comprehensive database for human and mouse
and Lefranc 1999; Lefranc et al. 2004, 2005, 2008;
immunoglobulin and T cell receptor genes. Nucleic Acids Duroux et al. 2008), built by IMGT®, the international
Res 33:D256–D261 ImMunoGeneTics information system® (http://www.
Gruber TR (1993) A translation approach to portable ontologies. imgt.org) (▶ IMGT® Information System), that allows
Knowl Acquisition 5:199–220
to identify the type of chain. It is one of the most important
Lefranc M-P (2007) WHO-IUIS nomenclature subcommittee
for immunoglobulins and T cell receptors report. Immuno- concepts of identification for the standardization of
genetics 59:899–902 genome, transcriptome, and proteome data in system
Lefranc M-P (2008) WHO-IUIS nomenclature subcommit- biology. Indeed, being able to identify a type of chain
tee for immunoglobulins and T cell receptors report
means that it is possible to identify the transcript and the
August 2007, 13th international congress of immunol-
ogy, Rio de Janeiro, Brazil. Dev Comp Immunol 32: encoding gene(s).
461–463 The “ChainType” highconcept contains a hierarchy
Lefranc M-P, Lefranc G (2001a) The immunoglobulin of concepts which allows to identify the chain type at
FactsBook. Academic, London, pp 1–458
different levels of granularity. The finest level of granu-
Lefranc M-P, Lefranc G (2001b) The T cell receptor FactsBook.
Academic, London, pp 1–398 larity, the “GeneLevelChainType” concept, identifies the
Lefranc M-P, Giudicelli V, Ginestoux C, Bosc N, Folch G, type of chain by reference to the gene(s) which code(s)
Guiraudou D, Jabado-Michaloud J, Magris S, Scaviner the chain (▶ Gene and Allele Nomenclature). It
D, Thouvenin V, Combres K, Girod D, Jeanjean S,
represents the main concept for a very precise identifi-
Protat C, Yousfi Monod M, Duprat E, Kaas Q, Pommié
C, Chaume D, Lefranc G (2004) IMGT-ONTOLOGY cation because it establishes a relationship with
for immunogenetics and immunoinformatics. In Silico “Gene” (concept of classification generated from the
Biol 4:17–29 ▶ NUMEROTATION Axiom) (the reciprocal relations
Lefranc M-P, Clément O, Kass Q, Duprat E, Chastellan P,
are: “is_coded_by” and “codes”).
Coelho I, Combres K, Ginestoux C, Giudicelli V,
Chaume D, Lefranc G (2005) IMGT-Choreography for Thus:
immunogenetics and immunoinformatics. In Silico Biol • An Homo sapiens IG-Heavy-Gamma1 chain is
5:45–60 identified by a V-D-J-C-chain (▶ Molecule Entity
Lefranc M-P, Giudicelli V, Regnier L, Duroux P (2008) IMGT,
Type), four domains VH, CH1, CH2, CH3
a system and an ontology that bridge biological and
computational spheres in bioinformatics. Brief Bioinform (▶ Domain Type) (with a hinge region between
9:263–275 CH1 and CH2), and a constant region encoded by
IMGT-ONTOLOGY, CLASSIFICATION Axiom 973 I
the human IGHG1 gene (▶ Gene and Allele Lefranc M-P, Clément O, Kaas Q, Duprat E, Chastellan P,
Nomenclature). Coelho I, Combres K, Ginestoux C, Giudicelli V, Chaume D,
Lefranc G (2005) IMGT-Choreography for immunogenetics
• An Homo sapiens IG-Kappa chain is identified by and immunoinformatics. In Silico Biol 5:45–60
a V-J-C-chain (▶ Molecule Entity Type), 2 domains Lefranc M-P, Giudicelli V, Regnier L, Duroux P (2008) IMGT,
V-KAPPA and C-KAPPA (▶ Domain Type), and a system and an ontology that bridge biological and
a constant region encoded by the human IGKC gene computational spheres in bioinformatics. Brief Bioinform
9:263–275
(▶ Gene and Allele Nomenclature).
The number of instances of the leafconcepts
(▶ IMGT-ONTOLOGY, Leafconcept) of the
“GeneLevelChainType” concept depends on the number
of ▶ functional genes and ▶ ORF (FunctionalityType), IMGT-ONTOLOGY, CLASSIFICATION
per haploid genome, in a given species (in the case of the Axiom
immunoglobulin (IG) and T cell receptor (TR) genes, it
is the number of functional and ORF constant (C) genes Marie-Paule Lefranc
(▶ Constant (C) Gene) which is taken into account). If Laboratoire d’ImmunoGénétique Moléculaire,
only the functional genes are considered, the instances of Institut de Génétique Humaine UPR 1142,
this ▶ Leafconcept correspond to the isotypes. Université Montpellier 2, Montpellier, France
I

Cross-References Definition

▶ Domain Type The ▶ CLASSIFICATION axiom is an axiom of the


▶ Functional Formal IMGT-ONTOLOGY or IMGT-Kaleidoscope
▶ FunctionalityType (Giudicelli and Lefranc 1999; Lefranc et al. 2004,
▶ Gene and Allele Nomenclature 2005, 2008; Duroux et al. 2008). The CLASSIFICA-
▶ IMGT-ONTOLOGY TION axiom postulates that any molecule, cell,
▶ IMGT-ONTOLOGY, CLASSIFICATION Axiom tissue, organ, organism or population, any process,
▶ IMGT-ONTOLOGY, Highconcept and any relation has to be classified. The CLASSIFI-
▶ IMGT-ONTOLOGY, IDENTIFICATION Axiom CATION axiom has generated the concepts of classi-
▶ IMGT-ONTOLOGY, Leafconcept fication of ▶ IMGT-ONTOLOGY, the global
▶ IMGT ® Information System reference in ▶ immunogenetics and ▶ immunoin-
▶ Immunogenetics formatics (Giudicelli and Lefranc 1999; Lefranc et al.
▶ Immunoinformatics 2004, 2005, 2008; Duroux et al. 2008), built by
▶ Molecule Entity Type IMGT ®, the international ImMunoGeneTics informa-
▶ Open Reading Frame (ORF) tion system ® (http://www.imgt.org) (▶ IMGT® Infor-
mation System).
IMGT-ONTOLOGY major concepts of classifica-
References tion comprise:
• The “Group”, “Subgroup”, “Gene,” and “Allele”
Duroux P, Kaas Q, Brochet X, Lane J, Ginestoux C, Lefranc M-P, concepts
Giudicelli V (2008) IMGT-Kaleidoscope, the Formal IMGT- • The standardized IMGT gene and allele names
ONTOLOGY paradigm. Biochimie 90:570–583
(▶ Gene and Allele Nomenclature)
Giudicelli V, Lefranc M-P (1999) Ontology for immunogenet-
ics: the IMGT-ONTOLOGY. Bioinformatics 12:1047–1054 The CLASSIFICATION axiom is one of seven
Lefranc M-P, Giudicelli V, Ginestoux C, Bosc N, Folch G, axioms of the Formal IMGT-ONTOLOGY, the others
Guiraudou D, Jabado-Michaloud J, Magris S, Scaviner D, being “▶ IDENTIFICATION axiom,” “▶ DESCRIP-
Thouvenin V, Combres K, Girod D, Jeanjean S, Protat C,
Yousfi Monod M, Duprat E, Kaas Q, Pommié C, Chaume D,
TION axiom,” “▶ NUMEROTATION axiom,”
Lefranc G (2004) IMGT-ONTOLOGY for immunogenetics “▶ LOCALIZATION axiom,” “▶ ORIENTATION
and immunoinformatics. In Silico Biol 4:17–29 axiom,” and “▶ OBTENTION axiom.”
I 974 IMGT-ONTOLOGY, ConfigurationType

Cross-References Definition

▶ Gene and Allele Nomenclature “ConfigurationType” is a concept of identification


▶ IMGT-ONTOLOGY (generated from the ▶ IDENTIFICATION
▶ IMGT-ONTOLOGY, DESCRIPTION Axiom Axiom) of ▶ IMGT-ONTOLOGY, the global
▶ IMGT-ONTOLOGY, IDENTIFICATION Axiom reference in ▶ immunogenetics and ▶ immunoin-
▶ IMGT-ONTOLOGY, LOCALIZATION Axiom formatics (Giudicelli and Lefranc 1999; Lefranc et al.
▶ IMGT-ONTOLOGY, NUMEROTATION Axiom 2004, 2005, 2008; Duroux et al. 2008), built by IMGT®,
▶ IMGT-ONTOLOGY, OPTENTION Axiom the international ImMunoGeneTics information sys-
▶ IMGT-ONTOLOGY, ORIENTATION Axiom tem® (http://www.imgt.org) (▶ IMGT® Information
▶ IMGT ® information system System), that allows to identify the type of configuration
▶ Immunogenetics of a gene, and by extension, the type of configuration of
▶ Immunoinformatics the ▶ Molecule_EntityType leafconcepts (▶ IMGT-
ONTOLOGY, Leafconcept) that contain it.
The “ConfigurationType” concept comprises four
References leafconcepts:
1. “Undefined” identifies, whatever the molecule
Duroux P, Kaas Q, Brochet X, Lane J, Ginestoux C, type (▶ MoleculeType), the configuration of the
Lefranc M-P, Giudicelli V (2008) IMGT-Kaleidoscope,
conventional genes (▶ Conventional Gene) and
the Formal IMGT-ONTOLOGY paradigm. Biochimie
90:570–583 that of the immunoglobulin (IG) and T cell receptor
Giudicelli V, Lefranc M-P (1999) Ontology for immunogenet- (TR) constant (C) genes (▶ Constant (C) Gene)
ics: the IMGT-ONTOLOGY. Bioinformatics 12:1047–1054 (Lefranc and Lefranc 2001a, b), and by extension,
Lefranc M-P, Giudicelli V, Ginestoux C, Bosc N, Folch G,
the configuration of the ▶ Molecule_EntityType
Guiraudou D, Jabado-Michaloud J, Magris S, Scaviner D,
Thouvenin V, Combres K, Girod D, Jeanjean S, Protat C, leafconcepts that only contain genes in undefined
Yousfi Monod M, Duprat E, Kaas Q, Pommié C, Chaume D, configuration (▶ Configuration Type).
Lefranc G (2004) IMGT-ONTOLOGY for immunogenetics 2. “Germline” identifies, whatever the molecule type
and immunoinformatics. In Silico Biol 4:17–29
(▶ MoleculeType), the configuration of the IG and
Lefranc M-P, Clément O, Kaas Q, Duprat E, Chastellan P,
Coelho I, Combres K, Ginestoux C, Giudicelli V, TR variable (V), diversity (D), and joining (J) genes
Chaume D, Lefranc G (2005) IMGT-Choreography for (▶ Variable (V) Gene, ▶ Diversity (D) Gene and
immunogenetics and immunoinformatics. In Silico Biol ▶ Joining (J) Gene) before DNA rearrangements
5:45–60
Lefranc M-P, Giudicelli V, Regnier L, Duroux P (2008) IMGT,
(Lefranc and Lefranc 2001a, b), and by extension,
a system and an ontology that bridge biological and the configuration of the ▶ Molecule_EntityType
computational spheres in bioinformatics. Brief Bioinform leafconcepts that contain germline genes (with or
9:263–275 without constant (C) genes in undefined configura-
tion) (▶ Configuration Type).
3. “Rearranged” identifies, whatever the molecule
type (▶ MoleculeType), the configuration of the
IG and TR variable (V), diversity (D), and joining
IMGT-ONTOLOGY, ConfigurationType (J) genes (▶ Variable (V) Gene, ▶ Diversity
(D) Gene and ▶ Joining (J) Gene) after DNA
Marie-Paule Lefranc rearrangements, and by extension, the configuration
Laboratoire d’ImmunoGénétique Moléculaire, of the Molecule_EntityType leafconcepts that con-
Institut de Génétique Humaine UPR 1142, tain rearranged genes with, if present, completely
Université Montpellier 2, Montpellier, France rearranged D genes (with or without constant (C)
genes in undefined configuration) (▶ Configuration
Type).
Synonyms 4. “Partially-rearranged” identifies, whatever the
molecule type (▶ MoleculeType), the configuration
Configuration type of partially-rearranged IG or TR diversity (D) genes
IMGT-ONTOLOGY, DESCRIPTION Axiom 975 I
(▶ Diversity (D) Gene), and by extension, the con-
figuration of Molecule_EntityType leafconcepts IMGT-ONTOLOGY, DESCRIPTION Axiom
that contain at least one partially-rearranged
D gene [with another D or with rearranged variable Marie-Paule Lefranc
(V) or joining (J) genes (with or without constant Laboratoire d’ImmunoGénétique Moléculaire,
(C) genes in undefined configuration)] (▶ Configu- Institut de Génétique Humaine UPR 1142,
ration Type). Université Montpellier 2, Montpellier, France

Cross-References Definition

▶ Configuration Type The ▶ DESCRIPTION axiom is an axiom of the


▶ Constant (C) Gene Formal IMGT-ONTOLOGY or IMGT-Kaleidoscope
▶ Conventional Gene (Giudicelli and Lefranc 1999; Lefranc et al. 2004,
▶ Diversity (D) Gene 2005, 2008; Duroux et al. 2008). The DESCRIPTION
▶ IMGT-ONTOLOGY axiom postulates that any molecule, cell, tissue,
▶ IMGT-ONTOLOGY, IDENTIFICATION Axiom organ, organism or population, any process, and
▶ IMGT-ONTOLOGY, Leafconcept any relation, has to be described. The DESCRIPTION I
▶ IMGT ® Information System axiom has generated the concepts of description
▶ Immunogenetics of ▶ IMGT-ONTOLOGY, the global reference in
▶ Immunoinformatics ▶ immunogenetics and ▶ immunoinformatics
▶ Joining (J) Gene (Giudicelli and Lefranc 1999; Lefranc et al. 2004,
▶ Molecule Entity Type 2005, 2008; Duroux et al. 2008), built by IMGT®, the
▶ MoleculeType international ImMunoGeneTics information system®
▶ Variable (V) Gene (http://www.imgt.org) (▶ IMGT® Information System).
IMGT-ONTOLOGY concepts of description
comprise:
• The Molecule_EntityPrototype concept
References • The Core concept with its five essential leafconcepts
“REGION,” “V-REGION,” “D-REGION,”
Duroux P, Kaas Q, Brochet X, Lane J, Ginestoux C, Lefranc
M-P, Giudicelli V (2008) IMGT-Kaleidoscope, the Formal
“J-REGION,” and “C-REGION” (▶ IMGT-
IMGT-ONTOLOGY paradigm. Biochimie 90:570–583 ONTOLOGY), that describes the core coding region
Giudicelli V, Lefranc M-P (1999) Ontology for immunoge- of each “▶ GeneType” ▶ leafconcept
netics: the IMGT-ONTOLOGY. Bioinformatics 12: • The standardized IMGT labels and their relations
1047–1054
Lefranc M-P, Lefranc G (2001a) The immunoglobulin
(▶ Labels and Relations)
FactsBook. Academic, London, pp 1–458 The DESCRIPTION axiom is one of seven axioms of
Lefranc M-P, Lefranc G (2001b) The T cell receptor FactsBook. the Formal IMGT-ONTOLOGY, the others being
Academic, London, pp 1–398 “▶ IDENTIFICATION axiom,” “▶ CLASSIFICATION
Lefranc M-P, Giudicelli V, Ginestoux C, Bosc N, Folch G,
Guiraudou D, Jabado-Michaloud J, Magris S, Scaviner D,
axiom,” “▶ NUMEROTATION axiom,” “▶ LOCALI-
Thouvenin V, Combres K, Girod D, Jeanjean S, Protat C, ZATION axiom,” “▶ ORIENTATION axiom,” and
Yousfi Monod M, Duprat E, Kaas Q, Pommié C, Chaume D, “▶ OBTENTION axiom.”
Lefranc G (2004) IMGT-ONTOLOGY for immunogenetics
and immunoinformatics. In Silico Biol 4:17–29
Lefranc M-P, Clément O, Kaas Q, Duprat E, Chastellan P,
Coelho I, Combres K, Ginestoux C, Giudicelli V, Chaume D, Cross-References
Lefranc G (2005) IMGT-Choreography for immunogenetics
and immunoinformatics. In Silico Biol 5:45–60 ▶ Gene Type
Lefranc M-P, Giudicelli V, Regnier L, Duroux P (2008)
IMGT, a system and an ontology that bridge biological
▶ IMGT-ONTOLOGY
and computational spheres in bioinformatics. Brief ▶ IMGT-ONTOLOGY, CLASSIFICATION Axiom
Bioinform 9:263–275 ▶ IMGT-ONTOLOGY, IDENTIFICATION Axiom
I 976 IMGT-ONTOLOGY, DomainType

▶ IMGT-ONTOLOGY, Leafconcept in ▶ immunogenetics and ▶ immunoinformatics


▶ IMGT-ONTOLOGY, LOCALIZATION Axiom (Giudicelli and Lefranc 1999; Lefranc et al. 2004, 2005a,
▶ IMGT-ONTOLOGY, NUMEROTATION Axiom 2008;Durouxetal.2008),builtbyIMGT®,theinternational
▶ IMGT-ONTOLOGY, OPTENTION Axiom ImMunoGeneTics information system® (http://www.
▶ IMGT-ONTOLOGY, ORIENTATION Axiom imgt.org) (▶ IMGT® Information System), that allows to
▶ IMGT ® Information System identifythetypeofdomain.
▶ Immunogenetics A domain is a chain (▶ Chain Type) subunit charac-
▶ Immunoinformatics terized by its three-dimensional (3D) structure and, by
▶ Labels and Relations extension, its amino acid sequence and the nucleotide
sequence which encodes it. This concept may theoreti-
cally comprise many leafconcepts (▶ IMGT-ONTOL-
References OGY, Leafconcept).
In IMGT-ONTOLOGY, the “DomainType” concept
Duroux P, Kaas Q, Brochet X, Lane J, Ginestoux C, Lefranc M-P, has three major leafconcepts that have been first
Giudicelli V (2008) IMGT-Kaleidoscope, the Formal IMGT-
described in the proteins of the adaptive immune
ONTOLOGY paradigm. Biochimie 90:570–583
Giudicelli V, Lefranc M-P (1999) Ontology for immunogenet- response, immunoglobulins (IG) or antibodies, T cell
ics: the IMGT-ONTOLOGY. Bioinformatics 12:1047–1054 receptors (TR) and major histocompatibility (MH), and
Lefranc M-P, Giudicelli V, Ginestoux C, Bosc N, Folch G, then extended to the proteins of the ▶ Immunoglobulin
Guiraudou D, Jabado-Michaloud J, Magris S, Scaviner D,
superfamily (IgSF) and ▶ MH superfamily (MhSF)
Thouvenin V, Combres K, Girod D, Jeanjean S, Protat C,
Yousfi Monod M, Duprat E, Kaas Q, Pommié C, Chaume D, (Lefranc et al. 2003, 2005b, c). These leafconcepts are:
Lefranc G (2004) IMGT-ONTOLOGY for immunogenetics • The “V domain” (▶ Variable (V) Domain) that
and immunoinformatics. In Silico Biol 4:17–29 includes the variable domains of the IG and TR
Lefranc M-P, Clément O, Kaas Q, Duprat E, Chastellan P,
and the V-like domains of other IgSF proteins
Coelho I, Combres K, Ginestoux C, Giudicelli V, Chaume
D, Lefranc G (2005) IMGT-Choreography for immunoge- • The “C domain” (▶ Constant (C) Domain) that
netics and immunoinformatics. In Silico Biol 5:45–60 includes the constant domains of the IG and TR
Lefranc M-P, Giudicelli V, Regnier L, Duroux P (2008) and the C-like domains of other IgSF proteins
IMGT, a system and an ontology that bridge biological and
• The “G domain” (▶ Groove (G) Domain) that
computational spheres in bioinformatics. Brief Bioinform
9:263–275 includes the groove domains of the MH and the
G-like domains of other MhSF proteins
Standardized description of the V, C, and G domains
is based on the ▶ IMGT unique numbering (Lefranc
et al. 2003, 2005a, b). Graphical two-dimensional (2D)
IMGT-ONTOLOGY, DomainType representations or ▶ IMGT Colliers de Perles can be
obtained using the IMGT/DomainGapAlign and
Marie-Paule Lefranc IMGT/Collier-de-Perles tools (http://www.imgt.org)
Laboratoire d’ImmunoGénétique Moléculaire, (▶ IMGT® Information System).
Institut de Génétique Humaine UPR 1142,
Université Montpellier 2, Montpellier, France
Cross-References
Synonyms
▶ Chain Type
▶ Constant (C) Domain
Domain type
▶ Groove (G) Domain
▶ IMGT Collier de Perles
Definition ▶ IMGT Unique Numbering
▶ IMGT-ONTOLOGY
“DomainType” is a concept of identification ▶ IMGT-ONTOLOGY, IDENTIFICATION Axiom
(generated from the ▶ IDENTIFICATION Axiom) ▶ IMGT-ONTOLOGY, Leafconcept
of ▶ IMGT-ONTOLOGY, the global reference ▶ IMGT ® Information System
IMGT-ONTOLOGY, FunctionType 977 I
▶ Immunogenetics Definition
▶ Immunoglobulin Superfamily (IgSF)
▶ Immunoinformatics “FunctionType” is a concept of identification
▶ MH Superfamily (MhSF) (generated from the ▶ IDENTIFICATION axiom)
▶ Variable (V) Domain of ▶ IMGT-ONTOLOGY, the global reference
in ▶ immunogenetics and ▶ immunoinformatics
(Giudicelli and Lefranc 1999; Lefranc et al. 2004,
References 2005, 2008; Duroux et al. 2008), built by IMGT®, the
international ImMunoGeneTics information system®
Duroux P, Kaas Q, Brochet X, Lane J, Ginestoux C, Lefranc (http://www.imgt.org) (▶ IMGT® Information System),
M-P, Giudicelli V (2008) IMGT-Kaleidoscope, The Formal
that allows to identify the type of function. In molec-
IMGT-ONTOLOGY Paradigm. Biochimie 90:570–583
Giudicelli V, Lefranc M-P (1999) Ontology for Immunogenet- ular biology, “FunctionType” allows to identify
ics: the IMGT-ONTOLOGY. Bioinformatics 12:1047–1054 the type of function of a ▶ Molecule_EntityType
Lefranc M-P, Pommié C, Ruiz M, Giudicelli V, Foulquier E, leafconcept.
Truong L, Thouvenin-Contet V, Lefranc G (2003) IMGT
Thus, for example, the “FunctionType” concept
unique numbering for immunoglobulin and T cell receptor
variable domains and Ig superfamily V-like domains. Dev allows to identify:
Comp Immunol 27:55–77 1. For the immunoglobulins (IG) or antibodies and
Lefranc M-P, Giudicelli V, Ginestoux C, Bosc N, Folch G, the T cell receptors, that are the antigen receptors of I
Guiraudou D, Jabado-Michaloud J, Magris S, Scaviner D,
Thouvenin V, Combres K, Girod D, Jeanjean S, Protat C,
the adaptive immune response, the function of
Yousfi Monod M, Duprat E, Kaas Q, Pommié C, Chaume D, “antigen binding” and “antigen recognition” by the
Lefranc G (2004) IMGT-ONTOLOGY for Immunogenetics variable (V) domains (▶ Variable (V) Domain)
and Immunoinformatics. In Silico Biol 4:17–29 of the specific antigen receptors, these types of
Lefranc M-P, Clément O, Kaas Q, Duprat E, Chastellan P,
functions being strongly related with the
Coelho I, Combres K, Ginestoux C, Giudicelli V, Chaume D,
Lefranc G (2005a) IMGT-Choreography for Immunogenet- “▶ SpecificityType” concept that allows the amino
ics and Immunoinformatics. In Silico Biol 5:45–60 acid characterization (“▶ Paratope”/“▶ Epitope”) at
Lefranc M-P, Pommié C, Kaas Q, Duprat E, Bosc N, the binding interface.
Guiraudou D, Jean C, Ruiz M, Da Piedade I, Rouard M,
2. For the IG, the “effector properties” related to the
Foulquier E, Thouvenin V, Lefranc G (2005b) IMGT unique
numbering for immunoglobulin and T cell receptor constant Fragment crystallizable (or Fc), formed by the con-
domains and Ig superfamily C-like domains. Dev Comp stant CH2 and CH3 domains (▶ Constant (C)
Immunol 29:185–203 Domain). This includes, for example, the “comple-
Lefranc M-P, Duprat E, Kaas Q, Tranne M, Thiriot A, Lefranc G
(2005c) IMGT unique numbering for MHC groove
ment dependent cytotoxicity (CDC)” and the
G-DOMAIN and MHC superfamily (MhcSF) G-LIKE- “antibody-dependent cell cytotoxicity (ADCC)”
DOMAIN. Dev Comp Immunol 29:917–938 (IMGT Lexique, http://www.imgt.org).
Lefranc M-P, Giudicelli V, Regnier L, Duroux P (2008) IMGT, a 3. For the MH, the function of presentation
system and an ontology that bridge biological and computa-
of the processed antigen as peptide in the
tional spheres in bioinformatics. Brief Bioinform 9:263–275
groove formed by the two groove domains
(G-DOMAIN) (▶ Groove (G) Domain) (“pMH
contact analysis” in IMGT/3Dstructure-DB,
http://www.imgt.org).
IMGT-ONTOLOGY, FunctionType The “FunctionType” concept is currently developed
in ▶ IMGT-ONTOLOGY for a better characteriza-
Marie-Paule Lefranc tion of the properties of the therapeutical mono-
Laboratoire d’ImmunoGénétique Moléculaire, clonal antibodies (Magdelaine-Beuzelin et al.
Institut de Génétique Humaine UPR 1142, 2007) and fusion proteins for immune applications
Université Montpellier 2, Montpellier, France (FPIA), in relation with clinical applications
(oncology, autoimmune diseases, etc.) (IMGT/
Synonyms mAb-DB, http://www.imgt.org). This concept is
also developed for bridging the gap between
Function type molecular biology and cell system biology in
I 978 IMGT-ONTOLOGY, GeneType

animal models of immunotherapy (Pappalardo


et al. 2010). IMGT-ONTOLOGY, GeneType

Marie-Paule Lefranc
Laboratoire d’ImmunoGénétique Moléculaire,
Cross-References
Institut de Génétique Humaine UPR 1142,
Université Montpellier 2, Montpellier, France
▶ Constant (C) Domain
▶ Epitope
▶ Groove (G) Domain
Synonyms
▶ IMGT-ONTOLOGY
▶ IMGT-ONTOLOGY, IDENTIFICATION Axiom
Gene type
▶ IMGT-ONTOLOGY, Leafconcept
▶ IMGT-ONTOLOGY, SpecificityType
▶ IMGT ® Information System
▶ Paratope
Definition
▶ Variable (V) Domain
“GeneType” is a concept of identification
(generated from the ▶ IDENTIFICATION axiom)
References of ▶ IMGT-ONTOLOGY, the global reference
in ▶ immunogenetics and ▶ immunoinformatics
Duroux P, Kaas Q, Brochet X, Lane J, Ginestoux C, (Giudicelli and Lefranc 1999; Lefranc et al. 2004,
Lefranc M-P, Giudicelli V (2008) IMGT-Kaleidoscope, 2005, 2008; Duroux et al. 2008), built by IMGT ®, the
the Formal IMGT-ONTOLOGY paradigm. Biochimie 90: international ImMunoGeneTics information system ®
570–583
Giudicelli V, Lefranc M-P (1999) Ontology for immunoge-
(http://www.imgt.org) (▶ IMGT ® Information Sys-
netics: the IMGT-ONTOLOGY. Bioinformatics 12: tem), that allows to identify the type of gene.
1047–1054 The “GeneType” concept comprises six
Lefranc M-P, Giudicelli V, Ginestoux C, Bose N, Folch G, leafconcepts (▶ IMGT-ONTOLOGY, Leafconcept):
Guiraudou D, Jabado-Michaloud J, Magris S,
Scaviner D, Thouvenin V, Combres K, Girod D,
• Two leafconcepts refer to “conventional”
Jeanjean S, Protat C, Yousfi Monod M, Duprat E, (▶ Conventional Gene) genes, that is, any (coding
Kaas Q, Pommié C, Chaume D, Lefranc G (2004) or not coding) gene other than the immunoglob-
IMGT-ONTOLOGY for immunogenetics and immunoin- ulin (IG) or T cell receptor (TR) genes:
formatics. In Silico Biol 4:17–29
Lefranc M-P, Clément O, Kaas Q, Duprat E, Chastellan P,
these leafconcepts “conventional-with-leader”
Coelho I, Combres K, Ginestoux C, Giudicelli V, and “conventional-without-leader” identify any
Chaume D, Lefranc G (2005) IMGT-Choreography for (coding or not coding) gene with or without
immunogenetics and immunoinformatics. In Silico Biol leader L region (or signal peptide), respectively.
5:45–60
Lefranc M-P, Giudicelli V, Regnier L, Duroux P (2008)
• Four leafconcepts refer to the IG and TR genes
IMGT, a system and an ontology that bridge biological of the vertebrate adaptive immune response and
and computational spheres in bioinformatics. Brief are specific to immunogenetics: three of these
Bioinform 9:263–275 leafconcepts, “variable” (V), “diversity” (D), and
Magdelaine-Beuzelin C, Kaas Q, Wehbi V, Ohresser M,
Jefferis R, Lefranc M-P, Watier H (2007) Structure-function
“joining” (J) (▶ Variable (V) Gene, ▶ Diversity (D)
relationships of the variable domains of monoclonal anti- Gene, ▶ Joining (J) Gene) identify the IG and TR
bodies approved for cancer treatment. Crit Rev Oncol genes that rearrange at the DNA level in the B and
Hematol 64:210–225 T cells and code the V, D, and J regions, respec-
Pappalardo F, Lefranc M-P, Lollini P-L, Motta
S (2010) A novel paradigm for cell and molecular inter-
tively, of the IG and TR variable domains
action: from the CMM model to IMGT-ONTOLOGY. (▶ Variable (V) Domain); the fourth leafconcept,
Immunome Res 6:1 “constant” (C), (▶ Constant (C) Gene) identifies the
IMGT-ONTOLOGY, Highconcept 979 I
IG and TR genes that code the C region of the IG
and TR chains (▶ Chain Type) (Lefranc and IMGT-ONTOLOGY, Highconcept
Lefranc 2001a, b).
Marie-Paule Lefranc
Laboratoire d’ImmunoGénétique Moléculaire,
Cross-References Institut de Génétique Humaine UPR 1142,
Université Montpellier 2, Montpellier, France
▶ Chain Type
▶ Configuration Type
▶ Constant (C) Gene Definition
▶ Conventional Gene
▶ Diversity (D) Gene A highconcept is a type of concept of ▶ IMGT-
▶ IMGT-ONTOLOGY ONTOLOGY, the global reference in ▶ immunoge-
▶ IMGT-ONTOLOGY, IDENTIFICATION Axiom netics and ▶ immunoinformatics (Giudicelli and
▶ IMGT-ONTOLOGY, Leafconcept Lefranc 1999; Lefranc et al. 2004, 2005, 2008; Duroux
▶ IMGT ® Information System et al. 2008), built by IMGT ®, the international ImMu-
▶ Immunogenetics noGeneTics information system ® (http://www.imgt.
▶ Immunoinformatics org) (▶ IMGT ® Information System), that I
▶ Joining (J) Gene identifies a concept with different levels of granularity
▶ Variable (V) Domain corresponding to a hierarchy of concepts. A highconcept
▶ Variable (V) Gene provides a general characterization and the level in the
hierarchy is required for a precise characterization.
“▶ TaxonRank,” “▶ ChainType,” and “ReceptorType”
References are three examples of highconcepts of identification
(IDENTIFICATION axiom). The hierarchy of
Duroux P, Kaas Q, Brochet X, Lane J, Ginestoux C, “TaxonRank” is related to taxonomy levels and that
Lefranc M-P, Giudicelli V (2008) IMGT-Kaleidoscope, of “ChainType” and “ReceptorType” in part to concepts
the Formal IMGT-ONTOLOGY paradigm. Biochimie of classification.
90:570–583
Giudicelli V, Lefranc M-P (1999) Ontology for immunoge-
netics: the IMGT-ONTOLOGY. Bioinformatics 12:
1047–1054
Lefranc M-P, Lefranc G (2001a) The immunoglobulin
Cross-References
FactsBook. Academic Press, London, pp 1–458
Lefranc M-P, Lefranc G (2001b) The T cell receptor FactsBook. ▶ Chain Type
Academic Press, London, pp 1–398 ▶ IMGT-ONTOLOGY
Lefranc M-P, Giudicelli V, Ginestoux C, Bosc N, Folch G,
Guiraudou D, Jabado-Michaloud J, Magris S, Scaviner
▶ IMGT-ONTOLOGY, IDENTIFICATION Axiom
D, Thouvenin V, Combres K, Girod D, Jeanjean S, ▶ IMGT ® Information System
Protat C, Yousfi Monod M, Duprat E, Kaas Q, Pommié ▶ Immunogenetics
C, Chaume D, Lefranc G (2004) IMGT-ONTOLOGY ▶ Immunoinformatics
for immunogenetics and immunoinformatics. In Silico
Biol 4:17–29
▶ TaxonRank
Lefranc M-P, Clément O, Kaas Q, Duprat E, Chastellan P,
Coelho I, Combres K, Ginestoux C, Giudicelli V,
Chaume D, Lefranc G (2005) IMGT-Choreography for
immunogenetics and immunoninformatics. In Silico Biol References
5:45–60
Lefranc M-P, Giudicelli V, Regnier L, Duroux P (2008) Duroux P, Kaas Q, Brochet X, Lane J, Ginestoux C,
IMGT, a system and an ontology that bridge biological Lefranc M-P, Giudicelli V (2008) IMGT-Kaleidoscope,
and computational spheres in bioinformatics. Brief the Formal IMGT-ONTOLOGY paradigm. Biochimie
Bioinform 9:263–275 90:570–583
I 980 IMGT-ONTOLOGY, IDENTIFICATION Axiom

Giudicelli V, Lefranc M-P (1999) Ontology for immunogenet- of “Molecule_EntityType,” “StructureType” of


ics: the IMGT-ONTOLOGY. Bioinformatics 12:1047–1054 “ReceptorType,” “▶ TaxonRank.”
Lefranc M-P, Giudicelli V, Ginestoux C, Bosc N, Folch G,
Guiraudou D, Jabado-Michaloud J, Magris S, Scaviner D, Three of these concepts “TaxonRank,”
Thouvenin V, Combres K, Girod D, Jeanjean S, Protat C, “ChainType” and “ReceptorType” are highconcepts
Yousfi Monod M, Duprat E, Kaas Q, Pommié C, Chaume D, (▶ IMGT-ONTOLOGY, Highconcept).
Lefranc G (2004) IMGT-ONTOLOGY for immunogenetics The IDENTIFICATION axiom is one of seven
and immunoinformatics. In Silico Biol 4:17–29
Lefranc M-P, Clément O, Kaas Q, Durprat E, Chastellan P, axioms of the Formal IMGT-ONTOLOGY, the others
Coelho I, Combres K, Ginestoux C, Giudicelli V, Chaume D, being “▶ DESCRIPTION axiom,” “▶ CLASSIFICA-
Lefranc G (2005) IMGT-Choreography for immunogenetics TION axiom,” “▶ NUMEROTATION axiom,”
and immunoinformatics. In Silico Biol 5:45–60 “▶ LOCALIZATION axiom,” “▶ ORIENTATION
Lefranc M-P, Giudicelli V, Regnier L, Duroux P (2008) IMGT,
a system and an ontology that bridge biological and axiom” and “▶ OBTENTION axiom.”
computational spheres in bioinformatics. Brief Bioinform
9:263–275
Cross-References

▶ Chain Type
▶ Configuration Type
IMGT-ONTOLOGY, IDENTIFICATION ▶ Domain Type
Axiom ▶ Function Type
▶ FunctionalityType
Marie-Paule Lefranc ▶ Gene Type
Laboratoire d’ImmunoGénétique Moléculaire, ▶ IMGT-ONTOLOGY
Institut de Génétique Humaine UPR 1142, ▶ IMGT-ONTOLOGY, CLASSIFICATION Axiom
Université Montpellier 2, Montpellier, France ▶ IMGT-ONTOLOGY, DESCRIPTION Axiom
▶ IMGT-ONTOLOGY, Highconcept
▶ IMGT-ONTOLOGY, LOCALIZATION Axiom
Definition ▶ IMGT-ONTOLOGY, NUMEROTATION Axiom
▶ IMGT-ONTOLOGY, OPTENTION Axiom
The ▶ IDENTIFICATION axiom is an axiom of the ▶ IMGT-ONTOLOGY, ORIENTATION Axiom
Formal IMGT-ONTOLOGY or IMGT-Kaleidoscope ▶ IMGT-ONTOLOGY, SpecificityType
(Giudicelli and Lefranc 1999; Lefranc et al. 2004, ▶ IMGT ® Information System
2005, 2008; Duroux et al. 2008). The IDENTIFICA- ▶ Immunogenetics
TION axiom postulates that any molecule, cell, ▶ Immunoinformatics
tissue, organ, organism, or population, any process, ▶ Molecule Entity Type
and any relation, has to be identified. The IDENTIFICA- ▶ MoleculeType
TION axiom has generated the concepts of ▶ Structure Type
identification of ▶ IMGT-ONTOLOGY, the global ref- ▶ TaxonRank
erence in ▶ immunogenetics and ▶ immunoinformatics
(Giudicelli and Lefranc 1999; Lefranc et al. 2004, 2005,
2008; Duroux et al. 2008), built by IMGT®, the interna-
tional ImMunoGeneTics information system® (http:// References
www.imgt.org) (▶ IMGT® Information System).
Duroux P, Kaas Q, Brochet X, Lane J, Ginestoux C,
IMGT-ONTOLOGY major concepts of identifica-
Lefranc M-P, Giudicelli V (2008) IMGT-Kaleidoscope,
tion comprise: the Formal IMGT-ONTOLOGY paradigm. Biochimie
“▶ Chain Type,” “▶ Configuration Type,” 90:570–583
“▶ Domain Type,” “▶ Function Type,” “▶ Functiona- Giudicelli V, Lefranc M-P (1999) Ontology for immunogenet-
ics: the IMGT-ONTOLOGY. Bioinformatics 12:1047–1054
lityType,” “▶ Gene Type,” “▶ Molecule Entity Type,”
Lefranc M-P, Giudicelli V, Ginestoux C, Bosc N, Folch G,
“▶ MoleculeType,” “ReceptorType,” “▶ IMGT- Guiraudou D, Jabado-Michaloud J, Magris S, Scaviner D,
ONTOLOGY, SpecificityType,” “▶ Structure Type” Thouvenin V, Combres K, Girod D, Jeanjean S, Protat C,
IMGT-ONTOLOGY, LOCALIZATION Axiom 981 I
Yousfi Monod M, Duprat E, Kaas Q, Pommié C, Chaume D, ▶ IMGT-ONTOLOGY, NUMEROTATION Axiom
Lefranc G (2004) IMGT-ONTOLOGY for immunogenetics ▶ IMGT ® Information System
and immunoinformatics. In Silico Biol 4:17–29
Lefranc M-P, Clément O, Kaas Q, Duprat E, Chastellan P, ▶ Immunogenetics
Coelho I, Combres K, Ginestoux C, Giudicelli V, Chaume D, ▶ Immunoinformatics
Lefranc G (2005) IMGT-Choreography for immunogenetics
and immunoinformatics. In Silico Biol 5:45–60
Lefranc M-P, Giudicelli V, Regnier L, Duroux P (2008) IMGT,
a system and an ontology that bridge biological and
computational spheres in bioinformatics. Brief Bioinform References
9:263–275
Duroux P, Kaas Q, Brochet X, Lane J, Ginestoux C, Lefranc M-P,
Giudicelli V (2008) IMGT-Kaleidoscope, the Formal
IMGT-ONTOLOGY paradigm. Biochimie 90:570–583
Giudicelli V, Lefranc M-P (1999) Ontology for immunoge-
netics: the IMGT-ONTOLOGY. Bioinformatics 12:
IMGT-ONTOLOGY, Leafconcept 1047–1054
Lefranc M-P, Giudicelli V, Ginestoux C, Bosc N, Folch G,
Guiraudou D, Jabado-Michaloud J, Magris S, Scaviner D,
Marie-Paule Lefranc Thouvenin V, Combres K, Girod D, Jeanjean S, Protat C,
Laboratoire d’ImmunoGénétique Moléculaire, Yousfi Monod M, Duprat E, Kaas Q, Pommié C, Chaume D,
Institut de Génétique Humaine UPR 1142, Lefranc G (2004) IMGT-ONTOLOGY for immunogenetics I
and immunoinformatics. In Silico Biol 4:17–29
Université Montpellier 2, Montpellier, France
Lefranc M-P, Clément O, Kaas Q, Duprat E, Chastellan P,
Coelho I, Combres K, Ginestoux C, Giudicelli V, Chaume D,
Lefranc G (2005) IMGT-Choreography for immunogenetics
Definition and immunoinformatics. In Silico Biol 5:45–60
Lefranc M-P, Giudicelli V, Regnier L, Duroux P (2008) IMGT,
a system and an ontology that bridge biological and
A leafconcept is a type of concept of ▶ IMGT- computational spheres in bioinformatics. Brief Bioinform
ONTOLOGY, the global reference in ▶ immunoge- 9:263–275
netics and ▶ immunoinformatics (Giudicelli and
Lefranc 1999; Lefranc et al. 2004, 2005, 2008; Duroux
et al. 2008), built by IMGT ®, the international ImMu-
noGeneTics information system ® (http://www.imgt.
org) (▶ IMGT ® Information System), that identifies IMGT-ONTOLOGY, LOCALIZATION Axiom
a concept that corresponds to the finest level of
granularity. A leafconcept is defined relative to Marie-Paule Lefranc
a parent concept (on a higher level). Leafconcepts Laboratoire d’ImmunoGénétique Moléculaire,
have been extensively characterized for the Institut de Génétique Humaine UPR 1142,
concepts of identification (▶ IMGT-ONTOLOGY, Université Montpellier 2, Montpellier, France
IDENTIFICATION Axiom), the concepts of
description (▶ IMGT-ONTOLOGY, DESCRIPTION
Axiom), the concepts of classification (▶ IMGT- Definition
ONTOLOGY, CLASSIFICATION axiom), and the
concepts of numerotation (▶ IMGT-ONTOLOGY, The ▶ LOCALIZATION axiom is an axiom of the
NUMEROTATION Axiom) of ▶ IMGT-ONTOLOGY. Formal IMGT-ONTOLOGY or IMGT-Kaleidoscope
(Giudicelli and Lefranc 1999; Lefranc et al. 2004,
2005, 2008; Duroux et al. 2008). The LOCALIZATION
Cross-References axiom postulates that any molecule, cell, tissue,
organ, organism, or population, any process, and any
▶ IMGT-ONTOLOGY relation, has to be localized. The LOCALIZATION
▶ IMGT-ONTOLOGY, CLASSIFICATION Axiom axiom has generated the concepts of localization
▶ IMGT-ONTOLOGY, DESCRIPTION Axiom of ▶ IMGT-ONTOLOGY, the global reference
▶ IMGT-ONTOLOGY, IDENTIFICATION Axiom in ▶ immunogenetics and ▶ immunoinformatics
I 982 IMGT-ONTOLOGY, LocationType

(Giudicelli and Lefranc 1999; Lefranc et al. 2004, 2005,


2008; Duroux et al. 2008), built by IMGT®, the interna- IMGT-ONTOLOGY, LocationType
tional ImMunoGeneTics information system® (http://
www.imgt.org) (▶ IMGT® Information System). Marie-Paule Lefranc
IMGT-ONTOLOGY concepts of localization com- Laboratoire d’ImmunoGénétique Moléculaire
prise “Chromosome,” “Locus,” cell compartments, Institut de Génétique Humaine UPR 1142,
organ parts, etc. Université Montpellier 2, Montpellier, France
The LOCALIZATION axiom is one of seven
axioms in the Formal IMGT-ONTOLOGY, the others
being “▶ IDENTIFICATION axiom,” “▶ DESCRIP- Synonyms
TION axiom,” “▶ CLASSIFICATION axiom,”
“▶ NUMEROTATION axiom,” “▶ ORIENTATION Location type; Molecule location type
axiom,” and “▶ OBTENTION axiom.”

Definition
Cross-References
“LocationType” is a concept of identification
▶ IMGT-ONTOLOGY (generated from the ▶ IDENTIFICATION Axiom)
▶ IMGT-ONTOLOGY, CLASSIFICATION Axiom of ▶ IMGT-ONTOLOGY, the global reference
▶ IMGT-ONTOLOGY, DESCRIPTION Axiom in ▶ immunogenetics and ▶ immunoinformatics
▶ IMGT-ONTOLOGY, IDENTIFICATION Axiom (Giudicelli and Lefranc 1999; Lefranc et al. 2004,
▶ IMGT-ONTOLOGY, NUMEROTATION Axiom 2005, 2008; Duroux et al. 2008), built by IMGT®, the
▶ IMGT-ONTOLOGY, OPTENTION Axiom international ImMunoGeneTics information system®
▶ IMGT-ONTOLOGY, ORIENTATION Axiom (http://www.imgt.org) (▶ IMGT® Information System),
▶ IMGT ® Information System that allows to identify, whatever the molecule type
▶ Immunogenetics (▶ MoleculeType) (gDNA, cDNA, mRNA, protein),
▶ Immunoinformatics the ▶ Molecule_entity type leafconcepts (▶ IMGT-
ONTOLOGY, Leafconcept) based on their location.
The “LocationType” concept comprises, for
instance, the following leafconcepts:
References • “Orphon” identifies, whatever the molecule type,
a gene that is found in vivo on a different locus from
Duroux P, Kaas Q, Brochet X, Lane J, Ginestoux C, the main locus (either on the same chromosome or
Lefranc M-P, Giudicelli V (2008) IMGT-Kaleidoscope, on another chromosome).
the Formal IMGT-ONTOLOGY paradigm. Biochimie
• “Transgene” identifies, whatever the molecule
90:570–583
Giudicelli V, Lefranc M-P (1999) Ontology for immunoge- type, a gene that is artificially introduced into a
netics: the IMGT-ONTOLOGY. Bioinformatics 12: multicellular organism (mouse, plant, etc.).
1047–1054 • “Translocated” identifies, whatever the molecule
Lefranc M-P, Giudicelli V, Ginestoux C, Bosc N, Folch G,
Guiraudou D, Jabado-Michaloud J, Magris S, Scaviner D,
type, a gene that results from a translocation (in vivo).
Thouvenin V, Combres K, Girod D, Jeanjean S, Protat C, • “Transposed” identifies, whatever the molecule
Yousfi Monod M, Duprat E, Kaas Q, Pommié C, Chaume D, type, a transgene or a retrotransposon that is
Lefranc G (2004) IMGT-ONTOLOGY for immunogenetics inserted in a chromosome.
and immunoinformatics. In Silico Biol 4:17–29
Lefranc M-P, Clément O, Kaas Q, Duprat E, Chastellan P,
Coelho I, Combres K, Ginestoux C, Giudicelli V, Chaume D,
Lefranc G (2005) IMGT-Choreography for immunogenetics Cross-References
and immunoinformatics. In Silico Biol 5:45–60
Lefranc M-P, Giudicelli V, Regnier L, Duroux P (2008) IMGT,
▶ IMGT-ONTOLOGY
a system and an ontology that bridge biological and compu-
tational spheres in bioinformatics. Brief Bioinform ▶ IMGT-ONTOLOGY, IDENTIFICATION Axiom
9:263–275 ▶ IMGT-ONTOLOGY, Leafconcept
IMGT-ONTOLOGY, Molecule_EntityType 983 I
▶ IMGT ® Information System 2005, 2008; Duroux et al. 2008), built by IMGT®, the
▶ Immunogenetics international ImMunoGeneTics information system®
▶ Immunoinformatics (http://www.imgt.org) (▶ IMGT® Information System),
▶ Molecule Entity Type that allows to identify the molecule entity type.
▶ MoleculeType The “Molecule_EntityType” concept is defined by the
“▶ MoleculeType,” “▶ GeneType,” and “▶ Configura-
tionType” concepts.
References The “Molecule_EntityType” concept comprises
38 leafconcepts (▶ IMGT-ONTOLOGY, Leafconcept).
Duroux P, Kaas Q, Brochet X, Lane J, Ginestoux C, The ten leafconcepts that are the most classical
Lefranc M-P, Giudicelli V (2008) IMGT-Kaleidoscope, the
representatives of IG and TR data identification in
Formal IMGT-ONTOLOGY paradigm. Biochimie 90:
570–583 the ▶ IMGT ® information system are shown as exam-
Giudicelli V, Lefranc M-P (1999) Ontology for immunogenet- ples. They are also illustrated in ▶ immunoglobulin
ics: the IMGT-ONTOLOGY. Bioinformatics 12:1047–1054 synthesis.
Lefranc M-P, Giudicelli V, Ginestoux C, Bosc N, Folch G,
• V-gene identifies, for gDNA (▶ MoleculeType),
Guiraudou D, Jabado-Michaloud J, Magris S, Scaviner D,
Thouvenin V, Combres K, Girod D, Jeanjean S, Protat C, molecule entities with a germline V gene (▶ Gene
Yousfi MM, Duprat E, Kaas Q, Pommié C, Chaume D, Type). V-gene is in germline configuration
Lefranc G (2004) IMGT-ONTOLOGY for immunogenetics (▶ Configuration Type). I
and immunoinformatics. In Silico Biol 4:17–29
• D-gene identifies, for gDNA, molecule entities with
Lefranc M-P, Clément O, Kaas Q, Duprat E, Chestellan P,
Coelho I, Combres K, Ginestoux C, Giudicelli V, Chaume D, a germline D gene (▶ Gene Type). D-gene is in
Lefranc G (2005) IMGT-Choreography for immunogenetics germline configuration.
and immunoinformatics. In Silico Biol 5:45–60 • J-gene identifies, for gDNA, molecule entities with
Lefranc M-P, Giudicelli V, Regnier L, Duroux P (2008) IMGT,
a germline J gene (▶ Gene Type). J-gene is in
a system and an ontology that bridge biological and
computational spheres in bioinformatics. Brief Bioinform germline configuration.
9:263–275 • C-gene identifies, for gDNA, molecule entities
with a C gene in undefined configuration (▶ Gene
Type). C-gene is in undefined configuration
(▶ Configuration Type).
• V-J-gene identifies, for gDNA, molecule entities
IMGT-ONTOLOGY, Molecule_EntityType with a rearranged V gene and a rearranged J gene
(▶ Gene Type). V-J-gene is in rearranged configu-
Marie-Paule Lefranc ration (▶ Configuration Type).
Laboratoire d’ImmunoGénétique Moléculaire, • V-D-J-gene identifies, for gDNA, molecule entities
Institut de Génétique Humaine UPR 1142, with a rearranged V gene, at least one rearranged
Université Montpellier 2, Montpellier, France D gene and a rearranged J gene (▶ Gene Type).
V-D-J-gene is in rearranged configuration
(▶ Configuration Type).
Synonyms • L-V-J-C-sequence identifies, for cDNA
(▶ MoleculeType), molecule entities with a leader
Molecule entity type L region, a rearranged V region, a rearranged
J region, and a C region in undefined configuration
(▶ Gene Type). L-V-J-C-sequence is in rearranged
Definition configuration (▶ Configuration Type).
• L-V-D-J-C-sequence identifies, for cDNA, molecule
Molecule_EntityType is a major concept of identifi- entities with a leader L region, a rearranged V region,
cation (generated from the ▶ IDENTIFICATION at least one rearranged D region, a rearranged
Axiom) of ▶ IMGT-ONTOLOGY, the global refer- J region, and a C region in undefined configuration
ence in ▶ immunogenetics and ▶ immunoinformatics (▶ Gene Type). L-V-D-J-C-sequence is in rearranged
(Giudicelli and Lefranc 1999; Lefranc et al. 2004, configuration (▶ Configuration Type).
I 984 IMGT-ONTOLOGY, NUMEROTATION Axiom

• V-J-C-chain identifies, for protein (▶ MoleculeType),


molecule entities without a leader L region (mature IMGT-ONTOLOGY, NUMEROTATION
form), and with a rearranged V region, a rearranged Axiom
J region and a C region in undefined configuration
(▶ Gene Type). V-J-C-chain is in rearranged config- Marie-Paule Lefranc
uration (▶ Configuration Type). Laboratoire d’ImmunoGénétique Moléculaire,
• V-D-J-C-chain identifies, for protein, molecule Institut de Génétique Humaine UPR 1142,
entities without a leader L region (mature form) Université Montpellier 2, Montpellier, France
and with a rearranged V region, at least one
rearranged D region, a rearranged J region, and a
C region in undefined configuration (▶ Gene Type).
V-D-J-C-chain is in rearranged configuration Definition
(▶ Configuration Type).
The ▶ NUMEROTATION axiom is an axiom of the
Formal IMGT-ONTOLOGY or IMGT-Kaleidoscope
Cross-References (Giudicelli and Lefranc 1999; Lefranc et al.
2004, 2005, 2008; Duroux et al. 2008). The
▶ Configuration Type NUMEROTATION axiom postulates that any molecule,
▶ Gene Type cell, tissue, organ, organism, or population, any process,
▶ IMGT-ONTOLOGY and any relation, has to be numbered. The
▶ IMGT-ONTOLOGY, IDENTIFICATION Axiom NUMEROTATION axiom has generated the concepts
▶ IMGT-ONTOLOGY, Leafconcept of numerotation of ▶ IMGT-ONTOLOGY, the global
▶ IMGT ® Information System reference in ▶ immunogenetics and ▶ immunoin-
▶ Immunogenetics formatics (Giudicelli and Lefranc 1999; Lefranc et al.
▶ Immunoglobulin Synthesis 2004, 2005, 2008; Duroux et al. 2008), built by IMGT®,
▶ Immunoinformatics the international ImMunoGeneTics information sys-
▶ MoleculeType tem® (http://www.imgt.org) (▶ IMGT® Information
System).
IMGT-ONTOLOGY major concepts of
References numerotation comprise:
1. The “IMGT_unique_numbering” concept
Duroux P, Kaas Q, Brochet X, Lane J, Ginestoux C, (▶ IMGT Unique Numbering), with three
Lefranc M-P, Giudicelli V (2008) IMGT-Kaleidoscope, the leafconcepts:
Formal IMGT-ONTOLOGY paradigm. Biochimie 90: • “IMGT_unique_numbering_for_V_domain”
570–583
Giudicelli V, Lefranc M-P (1999) Ontology for immunoge-
• “IMGT_unique_numbering_for_C_domain”
netics: the IMGT-ONTOLOGY. Bioinformatics 12: • “IMGT_unique_numbering_for_G_domain”.
1047–1054 2. The “IMGT_Collier_de_Perles” concept
Lefranc M-P, Giudicelli V, Ginestoux C, Bosc N, Folch G, (▶ IMGT Collier de Perles), with three
Guiraudou D, Jabado-Michaloud J, Magris S, Scaviner D,
Thouvenin V, Combres K, Girod D, Jeanjean S, Protat C,
leafconcepts:
Yousfi MM, Duprat E, Kaas Q, Pommié C, Chaume D, • “IMGT_Collier_de_Perles _for_V_domain”
Lefranc G (2004) IMGT-ONTOLOGY for immunogenetics • “IMGT_Collier_de_Perles _for_C_domain”
and immunoinformatics. In Silico Biol 4:17–29 • “IMGT_Collier_de_Perles _for_G_domain”
Lefranc M-P, Clément O, Kaas Q, Duprat E, Chastellan P,
Coelho I, Combres K, Ginestoux C, Giudicelli V,
The NUMEROTATION axiom is one of
Chaume D, Lefranc G (2005) IMGT-Choreography for seven axioms of the Formal IMGT-ONTOLOGY,
immunogenetics and immunoinformatics. In Silico Biol the others being “▶ IDENTIFICATION
5:45–60 axiom,” “▶ DESCRIPTION axiom,” “▶ CLASSI-
Lefranc M-P, Giudicelli V, Regnier L, Duroux P (2008)
IMGT, a system and an ontology that bridge biological
FICATION axiom,” “▶ LOCALIZATION axiom,”
and computational spheres in bioinformatics. Brief “▶ ORIENTATION axiom,” and “▶ OBTENTION
Bioinform 9:263–275 axiom.”
IMGT-ONTOLOGY, OPTENTION Axiom 985 I
Cross-References and Lefranc 1999; Lefranc et al. 2004, 2005, 2008;
Duroux et al. 2008). The OBTENTION axiom postulates
▶ IMGT-ONTOLOGY that the way any molecule, cell, tissue, organ, organism
▶ IMGT-ONTOLOGY, CLASSIFICATION Axiom or population has obtained can be characterized. The
▶ IMGT-ONTOLOGY, DESCRIPTION Axiom OBTENTION axiom has generated the concepts of
▶ IMGT-ONTOLOGY, IDENTIFICATION Axiom obtention of ▶ IMGT-ONTOLOGY, the global refer-
▶ IMGT-ONTOLOGY, LOCALIZATION Axiom ence in ▶ immunogenetics and ▶ immunoinformatics
▶ IMGT-ONTOLOGY, OPTENTION Axiom (Giudicelli and Lefranc 1999; Lefranc et al. 2004,
▶ IMGT-ONTOLOGY, ORIENTATION Axiom 2005, 2008; Duroux et al. 2008), built by IMGT®,
▶ IMGT Collier de Perles the international ImMunoGeneTics information
▶ IMGT ® Information System system® (http://www.imgt.org) (▶ IMGT® Information
▶ IMGT Unique Numbering System).
▶ Immunogenetics IMGT-ONTOLOGY concepts of obtention
▶ Immunoinformatics comprise:
• The “Origin” concept (leafconcepts characterize
the sources).
References • The “Methodology” concept (leafconcepts charac-
terize the methods). I
Duroux P, Kaas Q, Brochet X, Lane J, Ginestoux C, Lefranc M- The OBTENTION axiom is one of seven axioms
P, Giudicelli V (2008) IMGT-Kaleidoscope, the Formal
in the Formal IMGT-ONTOLOGY, the others being
IMGT-ONTOLOGY paradigm. Biochimie 90:570–583
Giudicelli V, Lefranc M-P (1999) Ontology for immunogenet- “▶ IDENTIFICATION axiom,” “▶ DESCRIPTION
ics: the IMGT-ONTOLOGY. Bioinformatics 12:1047–1054 axiom,” “▶ CLASSIFICATION axiom,”
Lefranc M-P, Giudicelli V, Ginestoux C, Bosc N, Folch G, “▶ NUMEROTATION axiom,” “▶ LOCALIZA-
Guiraudou D, Jabado-Michaloud J, Magris S, Scaviner D,
TION axiom,” and “▶ ORIENTATION axiom.”
Thouvenin V, Combres K, Girod D, Jeanjean S, Protat C,
Yousfi Monod M, Duprat E, Kaas Q, Pommié C, Chaume D,
Lefranc G (2004) IMGT-ONTOLOGY for immunogenetics
and immunoinformatics. In Silico Biol 4:17–29 Cross-References
Lefranc M-P, Clément O, Kaas Q, Duprat E, Chastellan P,
Coelho I, Combres K, Ginestoux C, Giudicelli V, Chaume D,
Lefranc G (2005) IMGT-Choreography for immunogenetics ▶ IMGT-ONTOLOGY
and immunoinformatics. In Silico Biol 5:45–60 ▶ IMGT-ONTOLOGY, CLASSIFICATION Axiom
Lefranc M-P, Giudicelli V, Regnier L, Duroux P (2008) IMGT, ▶ IMGT-ONTOLOGY, DESCRIPTION Axiom
a system and an ontology that bridge biological and
▶ IMGT-ONTOLOGY, IDENTIFICATION Axiom
computational spheres in bioinformatics. Brief Bioinform
9:263–275 ▶ IMGT-ONTOLOGY, LOCALIZATION Axiom
▶ IMGT-ONTOLOGY, NUMEROTATION Axiom
▶ IMGT-ONTOLOGY, ORIENTATION Axiom
▶ IMGT ® Information System
▶ Immunogenetics
IMGT-ONTOLOGY, OPTENTION Axiom ▶ Immunoinformatics

Marie-Paule Lefranc
Laboratoire d’ImmunoGénétique Moléculaire,
Institut de Génétique Humaine UPR 1142,
References
Université Montpellier 2, Montpellier, France Duroux P, Kaas Q, Brochet X, Lane J, Ginestoux C, Lefranc M-P,
Giudicelli V (2008) IMGT-Kaleidoscope, the Formal
IMGT-ONTOLOGY paradigm. Biochimie 90:570–583
Definition Giudicelli V, Lefranc M-P (1999) Ontology for immunogenet-
ics: the IMGT-ONTOLOGY. Bioinformatics 12:1047–1054
Lefranc M-P, Giudicelli V, Ginestoux C, Bosc N, Folch G,
The ▶ OBTENTION axiom is an axiom of the Formal Guiraudou D, Jabado-Michaloud J, Magris S, Scaviner
IMGT-ONTOLOGY or IMGT-Kaleidoscope (Giudicelli D, Thouvenin V, Combres K, Girod D, Jeanjean S,
I 986 IMGT-ONTOLOGY, ORIENTATION Axiom

Protat C, Yousfi Monod M, Duprat E, Kaas Q, • Antisense (synonyms: Minus or Not coding). The
Pommié C, Chaume D, Lefranc G (2004) IMGT- complementary 30 - > 50 strand which is the
ONTOLOGY for immunogenetics and immunoin-
formatics. In Silico Biol 4:17–29 strand transcribed by the RNA polymerase is
Lefranc M-P, Clément O, Kaas Q, Duprat E, Chastellan P, also designated as “template” DNA.
Coelho I, Combres K, Ginestoux C, Giudicelli V, Chaume D, 2. The Genomic_Orientation concept is
Lefranc G (2005) IMGT-Choreography for immunogenetics a highconcept that defines the orientation of
and immunoinformatics. In Silico Biol 5:45–60
Lefranc M-P, Giudicelli V, Regnier L, Duroux P (2008) IMGT, genomic entities (chromosome, locus, and gene)
a system and an ontology that bridge biological and and comprises:
computational spheres in bioinformatics. Brief Bioinform • The Gene_Orientation concept, with a unique
9:263–275 ▶ leafconcept: “50 ! 30 ”. It is by convention
the orientation of transcription of the gene.
• The Locus_Orientation concept, with a unique
leafconcept: “50 ! 30 ”, defined by the orientation
IMGT-ONTOLOGY, ORIENTATION Axiom of transcription of the majority of the genes in the
locus, and/or for immunoglobulin (IG) and T cell
Marie-Paule Lefranc receptor (TR) loci, by the orientation of transcrip-
Laboratoire d’ImmunoGénétique Moléculaire, tion of the constant (C) gene(s) (▶ Constant (C)
Institut de Génétique Humaine UPR 1142, gene).
Université Montpellier 2, Montpellier, France • The Gene_Orientation_in_Locus concept, for
a gene or a cluster (set of genes), with two
exclusive leafconcepts:
Definition – Direct: same orientation as the locus.
– Opposite: opposite orientation.
The ▶ ORIENTATION axiom is an axiom of the • The Orientation_on_Chromosome concept,
Formal IMGT-ONTOLOGY or IMGT-Kaleidoscope for a gene or a locus, with two exclusive
(Giudicelli and Lefranc 1999; Lefranc et al. 2004, leafconcepts:
2005, 2008; Duroux et al. 2008). The ORIENTATION – Forward (synonyms: Watson, FWD):
axiom postulates that any molecule, cell, tissue, organ, “50 ! 30 ” gene or locus orientation is from
organism, or population, any process, and any rela- pter (short arm telomeric end) to qter (long
tion, has to be orientated. The ORIENTATION arm telomeric end).
axiom has generated the concepts of orientation – Reverse (synonym: REV): “50 ! 30 ” gene or
of ▶ IMGT-ONTOLOGY, the global reference in locus orientation is from qter to pter.
▶ immunogenetics and ▶ immunoinformatics The ORIENTATION axiom is one of seven
(Giudicelli and Lefranc 1999; Lefranc et al. 2004, axioms in the Formal IMGT-ONTOLOGY, the others
2005, 2008; Duroux et al. 2008), built by IMGT ®, being “▶ IDENTIFICATION axiom,” “▶ DESCRIP-
the international ImMunoGeneTics information sys- TION axiom,” “▶ CLASSIFICATION axiom,”
tem ® (http://www.imgt.org) (▶ IMGT ® Information “▶ NUMEROTATION axiom,” “▶ LOCALIZATION
System). axiom,” and “▶ OBTENTION axiom.”
IMGT-ONTOLOGY concepts of orientation com-
prise (IMGT Index, http://www.imgt.org):
1. The DNA_strand_orientation concept with two Cross-References
exclusive leafconcepts:
• Sense (synonyms: Plus or Coding): The 50 - > 30 ▶ Constant (C) Gene
DNA strand is designated, for a given gene, as ▶ IMGT-ONTOLOGY
“sense,” “plus,” or “coding” because its ▶ IMGT-ONTOLOGY, CLASSIFICATION Axiom
sequence is identical to the sequence of the ▶ IMGT-ONTOLOGY, DESCRIPTION Axiom
premessenger (premRNA) [except for uracile ▶ IMGT-ONTOLOGY, IDENTIFICATION Axiom
(u) in RNA, instead of thymine (t) in DNA]. ▶ IMGT-ONTOLOGY, Leafconcept
This coding strand is not transcribed. ▶ IMGT-ONTOLOGY, LOCALIZATION Axiom
IMGT-ONTOLOGY, SpecificityType 987 I
▶ IMGT-ONTOLOGY, NUMEROTATION Axiom the type of specificity for the immunoglobulins (IG) or
▶ IMGT-ONTOLOGY, OPTENTION Axiom antibodies and the T cell receptors (TR).
▶ IMGT ® Information System IG and TR are the specific antigen receptors of the
▶ Immunogenetics adaptive immune response (and, therefore, major
▶ Immunoinformatics leafconcepts of the “ReceptorType” concept) character-
ized by their function of binding and recognition of the
antigen (▶ Function Type). The “SpecificityType”
References concept allows an amino acid characterization of this
binding.
Duroux P, Kaas Q, Brochet X, Lane J, Ginestoux C, Lefranc M-P, The “SpecificityType” concept includes two major
Giudicelli V (2008) IMGT-Kaleidoscope, the Formal
concepts:
IMGT-ONTOLOGY paradigm. Biochimie 90:570–583
Giudicelli V, Lefranc M-P (1999) Ontology for immunogenet- • “▶ Paratope” identifies the amino acids of the anti-
ics: the IMGT-ONTOLOGY. Bioinformatics 12:1047–1054 gen receptor variable domains (V-DOMAIN)
Lefranc M-P, Giudicelli V, Ginestoux C, Bosc N, Folch G, (▶ Variable (V) Domain) that recognize the antigenic
Guiraudou D, Jabado-Michaloud J, Magris S, Scaviner D,
determinant (antigen (Ag) for IG or peptide/major
Thouvenin V, Combres K, Girod D, Jeanjean S, Protat C,
Yousfi Monod M, Duprat E, Kaas Q, Pommié C, Chaume D, histocompatibility (pMH) for TR).
Lefranc G (2004) IMGT-ONTOLOGY for immunogenetics • “▶ Epitope” identifies the antigenic determinant
and immunoinformatics. In Silico Biol 4:17–29 (Ag or pMH) recognized by the antigen receptor I
Lefranc M-P, Clément O, Kaas Q, Duprat E, Chastellan P,
(IG or TR, respectively).
Coelho I, Combres K, Ginestoux C, Giudicelli V, Chaume D,
Lefranc G (2005) IMGT-Choreography for immunogenetics Paratope and epitope are used in IMGT/3Dstructure-
and immunoinformatics. In Silico Biol 5:45–60 DB (http://www.imgt.org) for analysis of interactions
Lefranc M-P, Giudicelli V, Regnier L, Duroux P (2008) IMGT, a in IG/Ag and TR/peptide/major histocompatibility
system and an ontology that bridge biological and computa-
(TR/pMH) complexes (Kaas et al. 2008).
tional spheres in bioinformatics. Brief Bioinform 9:263–275

Cross-References
IMGT-ONTOLOGY, SpecificityType ▶ Epitope
▶ Function Type
Marie-Paule Lefranc ▶ IMGT-ONTOLOGY
Laboratoire d’ImmunoGénétique Moléculaire, ▶ IMGT-ONTOLOGY, IDENTIFICATION Axiom
Institut de Génétique Humaine UPR 1142, ▶ IMGT ® Information System
Université Montpellier 2, Montpellier, France ▶ Immunogenetics
▶ Immunoinformatics
▶ Paratope
Synonyms ▶ Variable (V) Domain

Function type
References
Definition Duroux P, Kaas Q, Brochet X, Lane J, Ginestoux C, Lefranc M-P,
Giudicelli V (2008) IMGT-Kaleidoscope, the Formal IMGT-
“SpecificityType” is a concept of identification ONTOLOGY paradigm. Biochimie 90:570–583
Giudicelli V, Lefranc M-P (1999) Ontology for Immunogenet-
(generated from the ▶ IDENTIFICATION Axiom) of
ics: the IMGT-ONTOLOGY. Bioinformatics 12:1047–
▶ IMGT-ONTOLOGY, the global reference in ▶ immu- 1054
nogenetics and ▶ immunoinformatics (Giudicelli and Kaas Q, Duprat E, Tourneur G, Lefranc M-P (2008) IMGT
Lefranc 1999; Lefranc et al. 2004, 2005, 2008; Duroux standardization for molecular characterization of the T cell
receptor/peptide/MHC complexes, chap. 2. In: Schoenbach C,
et al. 2008), built by IMGT®, the international ImMuno-
Ranganathan S, Brusic V (eds) Immunoinformatics,
GeneTics information system® (http://www.imgt.org) Immunomics Reviews, Series of Springer Science and
(▶ IMGT® Information System) that allows to identify Business Media LLC. Springer, New York
I 988 IMGT-ONTOLOGY, StructureType

Lefranc M-P, Giudicelli V, Ginestoux C, Bosc N, Folch G, (“engineered,” “humanized,” “chimeric,” “fusion”) (“chi-
Guiraudou D, Jabado-Michaloud J, Magris S, Scaviner D, meric” and “fusion” can also be obtained in vivo)
Thouvenin V, Combers K, Girod D, Jeanjean S, Protat C,
Yousfi Monod M, Duprat E, Kaas Q, Pommié C, Chaume D, (Giudicelli and Lefranc 1999; Lefranc et al. 2004).
Lefranc G (2004) IMGT-ONTOLOGY for immunogenetics
and immunoinformatics. In Silico Biol 4:17–29
Lefranc M-P, Clément O, Kaas Q, Duprat E, Chastellan P, Cross-References
Coelho I, Combers K, Ginestoux C, Giudicelli V, Chaume D,
Lefranc G (2005) IMGT-Choreography for immunogenetics
and immunoinformatics. In Silico Biol 5:45–60 ▶ IMGT-ONTOLOGY
Lefranc M-P, Giudicelli V, Regnier L, Duroux P (2008) IMGT, a ▶ IMGT-ONTOLOGY, IDENTIFICATION Axiom
system and an ontology that bridge biological and computa- ▶ IMGT-ONTOLOGY, Leafconcept
tional spheres in bioinformatics. Brief Bioinform 9:263–275
▶ IMGT ® Information System
▶ Immunogenetics
▶ Immunoinformatics
▶ Molecule Entity Type
IMGT-ONTOLOGY, StructureType ▶ MoleculeType

Marie-Paule Lefranc
Laboratoire d’ImmunoGénétique Moléculaire, References
Institut de Génétique Humaine UPR 1142,
Université Montpellier 2, Montpellier, France Duroux P, Kaas Q, Brochet X, Lane J, Ginestoux C, Lefranc M-P,
Giudicelli V (2008) IMGT-Kaleidoscope, the Formal IMGT-
ONTOLOGY paradigm. Biochimie 90:570–583
Giudicelli V, Lefranc M-P (1999) Ontology for immunogenet-
Synonyms ics: the IMGT-ONTOLOGY. Bioinformatics 12:1047–1054
Lefranc M-P, Clément O, Kaas Q, Duprat E, Chastellan P,
Coelho I, Combres K, Ginestoux C, Giudicelli V, Chaume D,
Molecule structure type; Structure type
Lefranc G (2005) IMGT-Choreography for immunogenetics
and immunoinformatics. In Silico Biol 5:45–60
Lefranc M-P, Giudicelli V, Ginestoux C, Bosc N, Folch G,
Definition Guiraudou D, Jabado-Michaloud J, Magris S, Scaviner D,
Thouvenin V, Combres K, Girod D, Jeanjean S, Protat C,
Yousfi Monod M, Duprat E, Kaas Q, Pommié C, Chaume D,
“StructureType” is a concept of identification Lefranc G (2004) IMGT-ONTOLOGY for immunogenetics
(generated from the ▶ NUMEROTATION Axiom) and immunoinformatics. In Silico Biol 4:17–29
of ▶ IMGT-ONTOLOGY, the global reference Lefranc M-P, Giudicelli V, Regnier L, Duroux P (2008) IMGT, a
system and an ontology that bridge biological and computa-
in ▶ immunogenetics and ▶ immunoinformatics
tional spheres in bioinformatics. Brief Bioinform 9:263–275
(Giudicelli and Lefranc 1999; Lefranc et al. 2004,
2005, 2008; Duroux et al. 2008), built by IMGT ®,
the international ImMunoGeneTics information sys-
tem ® (http://www.imgt.org) (▶ IMGT ® Information
System), that allows to identify, whatever the molecule IMGT-ONTOLOGY, Unproductive
type (▶ MoleculeType) (gDNA, cDNA, mRNA,
protein), the type de structure of ▶ Molecule_ Marie-Paule Lefranc
EntityType leafconcepts (▶ IMGT-ONTOLOGY, Laboratoire d’ImmunoGénétique Moléculaire,
Leafconcept). Institut de Génétique Humaine UPR 1142,
The “StructureType” concept comprises 16 Université Montpellier 2, Montpellier, France
leafconcepts that identify structures with a classical orga-
nization (“regular”), from those which have been modi-
fied either naturally in vivo (“processed,” “unprocessed,” Definition
“partially-processed,” “sterile-transcript,” “unspliced,”
“partially-spliced,” “spliced,” “secreted,” “membrane,” “Unproductive” is a ▶ leafconcept of the
“mature-form,” “immature-form”), or artificially in vitro “▶ FunctionalityType” concept of identification
Immobilized Template Assay 989 I
(generated from the ▶ IDENTIFICATION Axiom) Giudicelli V, Lefranc M-P (1999) Ontology for immunogenet-
of ▶ IMGT-ONTOLOGY, the global reference in ics: the IMGT-ONTOLOGY. Bioinformatics 12:1047–1054
Lefranc M-P, Giudicelli V, Ginestoux C, Bosc N,
immunogenetics and immunoinformatics (Giudicelli Folch G, Guiraudou D, Jabado-Michaloud J, Magris S,
and Lefranc 1999; Lefranc et al. 2004, 2005, Scaviner D, Thouvenin V, Combres K, Girod D, Jeanjean S,
2008; Duroux et al. 2008), built by IMGT®, the inter- Protat C, Yousfi Monod M, Duprat E, Kaas Q, Pommié C,
national ImMunoGeneTics information system® (http:// Chaume D, Lefranc G (2004) IMGT-ONTOLOGY for
immunogenetics and immunoinformatics. In Silico Biol
www.imgt.org) (▶ IMGT® Information System). 4:17–29
“Unproductive” identifies, whatever the molecule Lefranc M-P, Clément O, Kaas Q, Duprat E, Chastellan P,
type (▶ MoleculeType), the functionality of Coelho I, Combres K, Ginestoux C, Giudicelli V, Chaume
▶ Molecule_EntityType leafconcepts in rearranged or D, Lefranc G (2005) IMGT-Choreography for immunoge-
netics and immunoinformatics. In Silico Biol 5:45–60
partially-rearranged configuration (▶ Configuration Lefranc M-P, Giudicelli V, Regnier L, Duroux P (2008) IMGT, a
Type), whose coding region has stop codon(s) and/or system and an ontology that bridge biological and computa-
frameshift mutation(s), and/or whose junction is out-of- tional spheres in bioinformatics. Brief Bioinform 9:263–275
frame (for immunoglobulin (IG) and T cell receptor
(TR)), and/or for which a mutation affects the initiation
codon, and/or for which there are defects in the splicing
sites and/or in the regulatory element(s), and/or there are
unusual features (translocated, gene fusion, etc.), and/or Imitation Switch I
changes of conserved amino acids demonstrated as lead-
ing to uncorrect folding. ▶ ISWI
“Unproductive” is one of the two leafconcepts
(the other one being “▶ Productive”) that identify the
functionality of Molecule_EntityType leafconcepts in
rearranged or partially-rearranged configuration Immobilized Template Assay
(▶ Configuration Type) (IG and TR entities after
DNA rearrangements, and by extension fusion entities Tetsuro Kokubo
resulting from translocations, and hybrid entities Department of Supramolecular Biology, Graduate
obtained by biotechnology molecular engineering). School of Nanobioscience, Yokohama City
University, Yokohama, Kanagawa, Japan

Cross-References
Synonyms
▶ Configuration Type
▶ FunctionalityType Immobilized template system; In vitro transcription
▶ IMGT-ONTOLOGY assay using an immobilized template
▶ IMGT-ONTOLOGY, IDENTIFICATION Axiom
▶ IMGT-ONTOLOGY, Leafconcept
▶ IMGT ® information system Definition
▶ Immunogenetics
▶ Immunoinformatics The immobilized template assay is a powerful tech-
▶ Molecule Entity Type nique for analyzing the correlation between transcrip-
▶ MoleculeType tional activity and the transcription factors involved in
▶ Productive the reaction. The assay is composed of two experimen-
tal parts: an in vitro transcription assay using an
immobilized template (promoter DNA) bound to mag-
References netic beads via a biotin-streptavidin linkage (Fig. 1a),
and immunoblot analyses that examine the presence or
Duroux P, Kaas Q, Brochet X, Lane J, Ginestoux C, Lefranc M-P,
Giudicelli V (2008) IMGT-Kaleidoscope, the Formal absence of specific transcription factor(s) in the PIC
IMGT-ONTOLOGY paradigm. Biochimie 90:570–583 (pre-initiation complex) (Fig. 1b).
I 990 Immobilized Template System

Immobilized template assay

a in vitro transcription b PstI digestion and immunoblot analyses

magnetic beads
Mediator RNAPII
DNA TFIIF
TFIIB TFIIE
UAS core promoter
PstI activator TFIID
TFIIA TFIIH
DNA
UAS core promoter
+cell extracts, activator PstI
magnetic
PIC beads

wash out
Mediator RNAPII
TFIIF
magnetic beads TFIIB TFIIE
activator TFIID
TFIIA TFIIH Mediator RNAPII
DNA TFIIF
UAS core promoter TFIIB TFIIE
PstI TFIID
activator TFIIH
TFIIA
DNA
UAS core promoter
+rNTP PstI

transcription
Mediator immunoblot analyses
RNAPII
TFIIF magnetic
magnetic beads TFIIB TFIIE
beads
activator TFIID
TFIIA TFIIH
DNA
UAS core promoter
PstI

Immobilized Template Assay, Fig. 1 Immobilized template subjected to immunoblot analyses, as described in B; (b) PstI
assay, (a) In vitro transcription assay. Template DNA containing digestion and immunoblot analyses. To minimize contamination
a core promoter and binding site (UAS) for the transcriptional by nonspecific proteins in the system, a PstI recognition site (or
activator of interest is immobilized on magnetic beads via that of another appropriate restriction enzyme) is introduced into
a biotin-streptavidin linkage. Crude cell lysates (or purified the template DNA in advance. After extensive washing, the PIC
transcription factors) and the transcription activator of interest is then released into the supernatant by mild treatment with PstI.
are added to the immobilized template and incubated for the The component(s) in the PIC is then analyzed by immunoblot
appropriate period to assemble the PIC. Once the PIC is assem- analyses
bled, transcription is initiated by the addition of rNTP or

Importantly, this assay enables researchers to dis- References


criminate between specific (i.e., important for tran-
scription) and nonspecific (i.e., junk) proteins, even Ranish JA, Yudkovsky N, Hahn S (1999) Intermediates in
formation and activity of the RNA polymerase II
when only crude cell lysates are available, since the
preinitiation complex: holoenzyme recruitment and
template and its specifically associated proteins can be a postrecruitment role for the TATA box and TFIIB.
efficiently recovered after nonspecific proteins have Genes Dev 13(1):49–63
been washed from the system. This assay has yielded Yudkovsky N, Ranish JA, Hahn S (2000) A transcription
reinitiation intermediate that is stabilized by activator.
many important findings using cell lysates prepared
Nature 408(6809):225–229
from various yeast mutants (Ranish et al. 1999;
Yudkovsky et al. 2000).

Cross-References Immobilized Template System

▶ Transcription Initiation in Eukaryote ▶ Immobilized Template Assay


Immune Pathway Dynamics, Biological Methods 991 I
their response to stimuli or perturbations (Sung 2010).
Immune Enhancer Yet these temporal dynamics can easily be lost in con-
ventional biological assay methods such as snapshot
▶ Systems Immunology, Vaccine Adjuvant measurements of different single cells or in vitro molec-
ular extractions obtained from cell populations (Sung
and McNally 2010).
Capturing the true dynamical behaviors of even
Immune Pathway Dynamics, Biological relatively simple molecular networks requires moni-
Methods toring of same living cells over real time. This differs
from taking measurements from separate samples
Myong-Hee Sung, Qizong Lao and collected at different time points after stimuli, which
Gordon L. Hager is more often done. ▶ Live cell imaging methods not
Laboratory of Receptor Biology and Gene Expression, only achieve the real-time criterion but also preserve
National Cancer Institute, National Institutes of the natural state and environment within the cell in
Health, Bethesda, MD, USA which the molecular interactions occur. In this entry,
we introduce some essential issues involved in
conducting a live cell imaging study.
Synonyms Various live cell imaging techniques have been I
developed in recent years that allow a number of
Experimental methods for studying immune signaling methods to analyze biophysical properties of the
network dynamics molecules of interest (Sung and McNally 2010).
Among these, time lapse imaging has been the most
widely employed for systems biology. In general, time
Definition lapse imaging provides a time series of a quantity that
is closely related to the concentration of the protein
Immune pathway dynamics is a field of study aimed at “X” that has been tagged. However, it can be further
understanding the molecular and cellular mechanisms subdivided into several categories depending on how
behind physiological and pathological activities of the the cells are manipulated to express the tagged protein.
immune system. This entry focuses on experimental Below we provide a few representative examples and
methods suitable for investigating immune pathway discuss their advantages and limitations.
dynamics. The simplest and common approach is to prepare
a DNA construct by fusing the gene X with the sequence
of a fluorescent protein (FP). Then this fusion DNA
Characteristics plasmid “X-FP” is expressed in cells by transient trans-
fection. Often the construct contains a strong promoter,
Important molecular pathways in immune responses nat- e.g., the cytomegalovirus (CMV) promoter, upstream of
urally arise as highly complex network of signaling the fusion DNA to ensure high expression. While con-
factors and target genes. Virtually all such pathways venient, this approach produces a heterogeneous popu-
contain multiple positive feedback and negative lation of cells where some express a massive amount of
feedback loops which may involve transcriptional induc- the fluorescent reporter and a substantial fraction of cells
tion and protein synthesis. For example, cytokine- often show no detectable fluorescence signal. This prob-
activated JAK-STAT pathways contain SOCS proteins lem could be partially overcome if one selects moder-
as negative feedback (Kageyama et al. 2007). ately expressing cells within the microscope fields of
Representing another interesting case, the antigen- or imaging for the relevant analysis, but it can drastically
cytokine-stimulated NF-kB pathway has a strong nega- reduce the efficiency of data acquisition in time lapse
tive feedback loop: the transcriptional activation and imaging. Although the fluorescence intensity of this
resynthesis of IkBa, a major inhibitor of NF-kB reporter protein is meant to serve as a proxy for the
(O’Dea and Hoffmann 2010). These complex networks concentration of the native protein X, the reporter
can bring about complicated dynamic behaviors during expression is likely to be regulated differently because
I 992 Immune Pathway Dynamics, Biological Methods

a b
A variable proportion of protein X in All of protein X in a cell is
a cell is visualized by X-FP reporter visualized by X-FP

X
X-FP

X-FP

Immune Pathway Dynamics, Biological Methods, also express protein X from the native gene X on its resident
Fig. 1 Many imaging approaches visualize a fraction of the chromosome. Shown are two cells that have different propor-
protein population, leaving the native protein “in the dark.” (a) tions of X tagged. (b)If the fusion reporter is introduced to cells
A fluorescent reporter is prepared by fusing the gene of interest that lack the endogenous gene X, or if X is replaced by X-FP at
(X) with the gene of a fluorescent protein (FP). The tagged the native genomic location, then all the cells express X-FP. This
protein X-FP is typically expressed by transient transfection or results in complete visualization of the total population
delivery into a random genomic site by viral infection. The cells

of the unnatural promoter and context of the transfected nuclear processes (Hakim et al. 2010). Furthermore, the
DNA. Another concern with transfection-based methods natural promoter-driven or the BAC-driven expression
is the cell stress induced by the transfection procedure vector can be delivered to “knockout” cells that lack the
itself. The harsh transfection administered before each expression of the endogenous gene X, when available.
experiment induces specific cellular responses, which is This has the advantage of tagging and visualizing all the
unavoidable with all common delivery methods that use protein molecules X in the cell, providing faithful mea-
lipids, electroporation, or ultrasound. surements of the total pool of protein X (Fig. 1).
Another approach is to have the reporter mimic Ideally, replacement of the native gene with the
the naturally regulated expression of protein X by fusion reporter would guarantee complete tagging
incorporating the promoter sequence of the native gene (Fig. 1b) as well as endogenous regulation of the
X upstream of the fusion reporter. Even a ▶ bacterial reporter. This can be achieved, albeit technically
artificial chromosome (BAC) containing a large genomic demanding, by ▶ homologous recombination to gener-
DNA sequence surrounding the gene X, now replaced by ate the desired cell line or a transgenic animal. In this
X-FP, can be integrated to the genome and expressed in case, the X-FP fusion gene would reside in the natural
cells. The BAC contains all the naturally occurring reg- context of the gene X, complete with the promoter, distal
ulatory DNA sequences in the neighborhood of the gene regulatory sequences, and long-range DNA elements in
X. This aspect increases the likelihood that it will support close spatial proximity that may be important for the
an expression pattern close to the endogenous gene regulation of X.
regardless of where the BAC is integrated in the genome. In all of the above approaches, the functionality of
However, it may still be affected by the different spatial the fluorescently tagged protein X-FP should be
organization around the gene which seems to play an rigorously tested especially in the case where X-FP is
increasingly important role in transcription and other the only version of protein X present in the cells.
Immune Pathway Dynamics, Biological Methods 993 I

Original image Segmentation Noise filtering

Track cells in time series

Excluding cells in partial view


and other corrections I

Immune Pathway Dynamics, Biological Methods, excluded from consideration such as those moving into/out
Fig. 2 An example analysis flow for time lapse images from of view or those where automatic identification failed.
live cell imaging. A segmentation algorithm identifies the The procedure is applied to each image in the time series, and
objects of interest (nuclei, membranes, cytoplasm, or other then a tracking algorithm identifies and links the images from
necessary regions for pixel intensity quantification). A noise the same cell over time. Finally, intensity measurements
filtering step reduces fine-grain pixel fluctuations without are collected from the relevant regions found during the
altering the major features of the image. Certain cells can be segmentation step

Sometimes the addition of a fluorescent tag to a protein during the entire course of the experiment including the
alters its conformation or interferes with molecular setup. For the same reason, photo-damage must be min-
interactions with other molecules, which disrupts its imized to ensure the natural cell physiology. Therefore,
cellular function. the excitation setting for the FP should be the minimum
Once the cells expressing the fusion reporter are pre- necessary to obtain quantifiable images (low laser power,
pared, it is critical to select the appropriate imaging setup. short exposure time, etc.), and the detection setting should
An important goal for live cell imaging is to quantify allow maximum light output (large pinhole, optimal
images and obtain measurements that are useful for sys- filters, etc.). In addition, the time interval between
tems biology. This requires a microscope that has high- image acquisitions must be just short enough to resolve
quality optical components including a light source suit- the relevant temporal dynamics, in order to minimize the
able for the particular FPs, appropriate objective lenses, extent of ▶ photobleaching (gradual loss of fluorescence
a sensitive camera, and high precision control of stage by repeated imaging).
and other devices. An autofocus mechanism is necessary Analysis of time lapse images can involve
for long-term imaging to minimize intensity fluctuations background adjustment, noise filtering, ▶ image
even from slight focus drifts. Another basic requirement segmentation (identifying the regions/objects of
is built-in microscope components that provide standard- interest, e.g., cell boundaries, nuclear boundaries, organ-
ized cell culture conditions (temperature, humidity, CO2). elles), tracking of objects over time, and the extraction of
A good rule of thumb in performing any live cell imaging the relevant intensities measured within specified objects
experiment is to minimize deviations from the conditions (Fig. 2). Certain tasks can be performed using commer-
of the tissue culture incubator that the cells experience cially available image processing software, but often
I 994 Immune Pathway Dynamics, Modeling

these tools are not flexible enough to accommodate


specific needs of individual projects. Thus, it is Immune Pathway Dynamics, Modeling
extremely useful to develop a custom analysis flow by
implementing the algorithm in a programming language. Myong-Hee Sung
Such an automated analysis also enables one to examine Laboratory of Receptor Biology and Gene Expression,
a large number of single cells and identify behaviors that National Cancer Institute, National Institutes of
are reproduced in the majority of the cells and those that Health, Bethesda, MD, USA
manifest in a subpopulation of cells.
Over the past decade, there has been a great
increase in the use of live cell imaging methods in cell Synonyms
biology and systems biology studies. The technology is
constantly evolving and maturing into an ever sophisti- Mathematical modeling of signaling in the immune
cated and reliable platform (Giepmans et al. 2006; system; Molecular network modeling in the
Wiedenmann et al. 2009). Despite the technically immune system
demanding aspects of live cell imaging, more
researchers are recognizing the unique benefits of incor-
porating these approaches. The assays are quantitative, Definition
and ideal for noninvasive monitoring of single living
cells. Systems biology studies of immune pathway Immune pathway dynamics is a field of study aimed at
dynamics cannot afford to do without understanding understanding the molecular and cellular mechanisms
real-time dynamics in single cells. behind physiological and pathological activities
of the immune system. This entry focuses on
modeling approaches for investigating immune
pathway dynamics and the rationale.
Cross-References

▶ Bacterial Artificial Chromosome (BAC)


Characteristics
▶ Homologous Recombination
▶ Image Segmentation
Physiologically important immunological processes
▶ Live Cell Imaging
are played out by a relatively small number of special-
▶ Photobleaching
ized cell types in the immune system, even though
there is a multitude of cell types resident in diverse
tissues within the body. These key immune cells
References are controlled, in turn, by a relatively few molecular
pathways that mediate specific cellular responses.
Giepmans BN, Adams SR, Ellisman MH, Tsien RY
(2006) The fluorescent toolbox for assessing protein location Molecular pathways are an attractive organizational
and function. Science 312(5771):217–224 unit of study because of their widespread significance
Hakim O, Sung MH, Hager GL (2010) 3D shortcuts to gene in multiple cell types and a manageable number of
regulation. Curr Opin Cell Biol 22(3):305–313
Kageyama R, Yoshiura S, Masamizu Y, Niwa Y (2007) Ultradian
constituent proteins and genes.
oscillators in somite segmentation and other biological events. For instance, cytokines are ubiquitous circulating
Cold Spring Harb Symp Quant Biol 72:451–457 proteins that the immune cells use to communicate
O’Dea E, Hoffmann A (2010) The regulatory logic of the with other cells about critical information such as
NF-kappaB signaling system. Cold Spring Harb Perspect
invasion by pathogens, disrupted tissue integrity, or
Biol 2(1):a000216
Sung MH (2010) Immune pathway dynamics, modeling. other danger signals. These cytokines are secreted by
Springer, ESB the source cells and they then activate specific
Sung MH, McNally JG (2010) Live cell imaging and systems responses in the receiving cells. Cytokine receptors
biology. Wiley Interdiscip Rev Syst Biol Med 3(2):167–182
on the surface of the recipient cell initiate an intracel-
Wiedenmann J, Oswald F, Nienhaus GU (2009) Fluorescent
proteins for live cell imaging: opportunities, limitations, lular signaling pathway leading to ▶ transcriptional
and challenges. IUBMB Life 61(11):1029–1042 reprogramming and induction of specific genes.
Immune Pathway Dynamics, Modeling 995 I
Some of the activated genes code for cytokines that NF-κB p53 Notch
are secreted for the next round of signaling, creating
IκBα mdm2 Hes
a complex web of cellular communication. Particular
examples that fall under this category are interferons
(IFN), interleukins (IL), and tumor necrosis factor β-catenin STAT NFAT
(TNF) family cytokines. Many IFNs and ILs
Axin2 SOCS Calcipressin
activate the janus kinase-signal transducer and activator
of transcription (JAK-STAT) pathway, while TNF-a Immune Pathway Dynamics, Modeling, Fig. 1 Most tran-
is a prototypical activator of nuclear factor kappaB scription factors in the immune system have the potential for
(NF-kB). oscillations due to a delayed negative feedback loop. Some of the
transcriptional regulators of established importance in immunity
Another type of signaling events critical for immu-
are shown. All of them have a negative feedback loop provided
nity is induced by the antigens and their derivative by a transcriptionally induced inhibitor. Top row: NF-kB acti-
products that originate from the invading pathogens. vates the transcription of several feedback genes. The strongest
Two kinds of lymphocytes, T and B cells, are impor- feedback loop is given by the induction of its own inhibitor
IkBa. p53 activation leads to induction of the inhibitor mdm2.
tant immune cells that detect foreign products in the
Notch activates the transcriptional repressor Hes proteins that
body. T cell ▶ epitopes are short fragments of the negatively regulate itself. Bottom row: b-catenin induces its
foreign antigens introduced by the pathogen. An indi- inhibitor Axin2. STAT factors activate the transcription of
vidual T cell ▶ epitope, presented on the surface of suppressor of cytokine signaling (SOCS) that inhibits I
JAK-STAT signaling. NFAT induces Calcipressin that inhibits
professional antigen-presenting cells, can be recog-
the upstream activator Calcineurin
nized by a clone of T lymphocytes with specific
antigen receptors. On the other hand, antigen receptors
on specific B lymphocytes recognize soluble foreign A major theme running through all of these signaling
antigens. The defense mechanisms elicited following pathways is the existence of a feedback inhibitor that is
antigen recognition are different between T and transcriptionally induced by the effector protein after the
B lymphocytes; yet, they share many common initiating signal (Fig. 1). They represent different exam-
intracellular pathways, most notably NF-kB. ples of auto-inhibitory modules with a delay (caused by
As with other molecular pathways, immune pathways transcription, mRNA processing and export, and trans-
consist of a complex network of signaling proteins, lation). This network structure creates the possibility of
where their behaviors can defy qualitative reasoning oscillatory activities in the pathway (Goldbeter 1997).
based on intuition. Mathematical modeling and com- Indeed, many pathways have been experimentally
puter simulations provide insights about emergent sys- shown to exhibit oscillatory behaviors in certain condi-
tems properties of such networks, thereby guiding tions, whereas others have not been examined properly.
experimental investigations. Essential components of In general, oscillations or other temporally complex
a pathway can be compiled and their interactions can behaviors are underestimated or undetectable by conven-
be represented in ordinary differential equations (ODE) tional experimental methods (Sung and McNally 2010).
to discern some dynamical features of the network. ODE Another important feature of such networks is the
models can be made more realistic and complex by nonlinear relationship between input-output of the path-
incorporating spatial features or stochastic components. way. As a consequence, genetic or chemical interven-
Agent-based models allow the flexibility to incorporate tions can produce paradoxical outcomes. These are
the entire complexity of the actual system, but the sim- reasons why mathematical modeling is an essential tool
ulations are extremely compute-heavy and often compli- for understanding immune signaling pathways.
cated to interpret. Pathways that have been quantitatively It is noteworthy that many models have been
modeled to date include IFNg/JAK-STAT, TNF-a/NF-k constructed to match observations from cell types
B, p53, extracellular signal-regulated kinase (ERK), that are quite distant from the immune system. There-
Ca2+-nuclear factor of activated T cells (NFAT), Wnt/ fore, caution is necessary: Parameters or even network
b-catenin, and Notch/Hes. The importance of these path- topology may have to be revised before applying the
ways for numerous immune cell types is widely appre- model to immunological cell contexts. For example,
ciated (Liu 2005; Staal and Sen 2008; Radtke et al. an NF-kB model derived from mouse embryonic fibro-
2010). blasts may have little relevance for functional NF-kB
I 996 Immune Repertoire

dynamics in macrophages or T cells. It is critical for


systems biology efforts to compare experimental mea- Immune Repertoire Diversity
surements on key variables of the network with
predicted behaviors from the mathematical model. Véronique Thomas-Vaslin, Adrien Six,
The quantitative models calibrated this way can reflect Bertrand Bellier and David Klatzmann
the cell type specificity of the underlying biology. Immunology-Immunopathology-Immunotherapy,
Finally, pathway modeling in immunological UPMC University Paris 06, UMR7211, CNRS,
contexts is an exciting area in systems biology with UMR7211, INSERM, U959, Paris, France
tremendous research opportunities. So far immune path-
ways have been examined mostly in isolation, whereas
in vivo, it is expected that these pathways have cross talk, Synonyms
influencing each other. In particular, when one or more
interacting pathways have the propensity for oscillation, BCR receptor diversity; Combinatorial diversity;
the consequences can be highly complex and extremely Lymphocyte repertoire; Repertoire diversity; Somatic
challenging to understand without modeling approaches. gene rearrangements; TCR receptor diversity
In another vein, existing models often depict certain
small modules in the pathway. For example, a model
may include only the receptor-proximal biochemical Definition
reactions of a larger network that consists of the receptor,
kinases, phosphatases, transcription factors, feedback Repertoire
loops, etc. Further efforts will be necessary to build T or B cell repertoires are collections of lymphocytes,
more comprehensive representation of many ubiquitous each characterized by its antigen-specific receptor.
pathways that regulate immunity. These receptors are produced by random somatic
rearrangements of V, D and J gene segments during
lymphocyte differentiation (Boudinot et al. 2008).
Cross-References
Receptor Antigen Diversity
▶ Epitope The B and T cell receptor antigen diversity is due to
▶ Transcriptional Reprogramming the combinatorial diversity of V, (D), J somatic
gene rearrangements and on the junctional diversity gen-
erated during gene rearrangements, chain combination
References leading to a theoretical potential repertoire, of the order
of 1012–1015 TCR/Ig specificities. T cell repertoire diver-
Goldbeter A (1997) Biochemical oscillations and cellular sity is maximal after gene rearrangement in the thymus.
rhythms: the molecular bases of periodic and chaotic Following positive and negative selection processes, the
behaviour. Cambridge University Press, Cambridge
Liu JO (2005) The yins of T cell activation. Sci STKE 2005:re1 available repertoire in lymphoid organs is in the order of
Radtke F, Fasnacht N, Macdonald HR (2010) Notch signaling in 2  106 in young mice and 2  107 in humans with
the immune system. Immunity 32(1):14–27 a clonal size of 50 cells/clone in mice to 1,000 cells/
Staal FJ, Sen JM (2008) The canonical Wnt signaling pathway clone in humans. It is estimated that any given clone
plays an important role in lymphopoiesis and hematopoiesis.
Eur J Immunol 38(7):1788–1794 can respond to 4,000–400,000 different antigens (or anti-
Sung MH, McNally JG (2010) Live cell imaging and systems genic peptides) with various affinity (Casrouge et al.
biology. Wiley Interdiscip Rev Syst Biol Med 3(2):167–182 2000).

Methods to Assess Repertoire Diversity


The diversity of immune repertoires can be assessed by
Immune Repertoire a number of methods both at the level of nucleic acids
(DNA/RNA) and proteins, such as:
▶ Lymphocyte Dynamics and Repertoires, Biological • Targeting the cell surface receptor by specific anti-
Methods bodies directed against variable regions. For example,
Immunogen 997 I
flow cytometry allows quantifying the percentage of tissues by using specific antibodies that are chemically
T cells expressing a particular TCR BV or AV gene linked to fluorochromes.
according to lymphocyte phenotypical and functional
characteristics.
• In situ hybridization with specific probes on histolog-
ical sections allows quantitative analysis on single Cross-References
cells by targeting the BCR or TCR variable region.
• CDR3 spectratyping (Boudinot et al. 2008) allows ▶ Fluorescence Microscopy
qualitative analysis of repertoire on cell populations
or on clones (Bonarius et al. 2006).
• Comparison of sequences of the variable region on
cell populations. References
A more comprehensive review on these methods
can be found in (Boudinot et al. 2008). Herman B (1998) Fluorescence microscopy,
Microscopy handbooks. Bios Scientific, Oxford. ISBN
9780387915517
Huang K, Murphy RF (2004) From quantitative microscopy
Cross-References to automated image understanding. J Biomed Opt 9(5):
893–912, ISSN 1083–3668 I
MedlinePlus (2011) Medical encyclopedia. http://www.nlm.nih.
▶ Immune Pathway Dynamics, Biological Methods gov/medlineplus/ency/encyclopedia_A.htm
▶ Lymphocyte Dynamics and Repertoires, Biological Molecular Expressions (2011) Fluorescence microscopy.
Methods http://micro.magnet.fsu.edu/primer/techniques/fluorescence/
fluorhome.html
Rost FWD (1991) Quantitative fluorescence
References microscopy. Cambridge University Press, Cambridge.
ISBN 9780521394222
Bonarius HP, Baas F et al (2006) Monitoring the T-cell receptor Spring KR (2003) Fluorescence microscopy. In:
repertoire at single-clone resolution. PLoS ONE 1:e55 Driggers RG (ed) Encyclopedia of optical engineering,
Boudinot P, Marriotti-Ferrandiz ME et al (2008) New perspec- Dekker encyclopedias series. Marcel Dekker,
tives for large-scale repertoire analysis of immune receptors. New York
Mol Immunol 45(9):2437–2445 Wolf DE (2007) Fundamentals of fluorescence and fluorescence
Casrouge A, Beaudoing E et al (2000) Size estimate of the alpha microscopy. In: Sluder G, Wolf DE (eds) Digital microscopy,
beta TCR repertoire of naive mouse splenocytes. J Immunol vol 81, 3rd edn, Methods in cell biology. Academic,
164(11):5782–5787 Amsterdam, pp 63–91
Wouters FS (2006) The physics and biology of fluorescence
microscopy in the life sciences. Contem Phys 47(5):
239–255. ISSN 0010-7514
Immune Response Evaluation

▶ Immunomonitoring

Immunofluorescence Immunogen

Yi Zeng Gajendra Raghava


Department of Pediatrics, University of Arizona, Bioinformatics Centre, Institute of Microbial
Tucson, AZ, USA Technology, Chandigarh, Chandigarh, India

Definition
Definition
Immunogen is a molecule which is recognized as for-
Immunofluorescence is a technique that allows visual- eign and leads to the activation of the host immune
ization of specific proteins or antigens in cells or system.
I 998 Immunogenetics

References
Immunogenetics
Duroux P, Kaas Q, Brochet X, Lane J, Ginestoux C, Lefranc M-P,
Giudicelli V (2008) IMGT-Kaleidoscope, the Formal IMGT-
Marie-Paule Lefranc
ONTOLOGY paradigm. Biochimie 90:570–583
Laboratoire d’ImmunoGénétique Moléculaire, Giudicelli V, Lefranc M-P (1999) Ontology for immunogenet-
Institut de Génétique Humaine UPR 1142, ics: the IMGT-ONTOLOGY. Bioinformatics 12:1047–1054
Université Montpellier 2, Montpellier, France Lefranc M-P (2005) IMGT, the international ImMunoGeneTics
information system ®: a standardized approach for immuno-
genetics and immunoinformatics. Immunome Res 1:3, http://
www.immunome-research.com/content/1/1/3
Lefranc M-P, Lefranc G (2001a) The immunoglobulin
Definition FactsBook. Academic Press, London, pp 1–458
Lefranc M-P, Lefranc G (2001b) The T cell receptor FactsBook.
Academic Press, London, pp 1–398
Immunogenetics is the science that studies, in relation Lefranc M-P, Giudicelli V, Ginestoux C, Bosc N, Folch G,
with immunology, the genetic data of the immune Guiraudou D, Jabado-Michaloud J, Magris S, Scaviner D,
system and immune response, at the molecule, cell, Thouvenin V, Combres K, Girod D, Jeanjean S, Protat C,
Yousfi Monod M, Duprat E, Kaas Q, Pommié C, Chaume D,
tissue, organ, organism, and/or population levels, in
Lefranc G (2004) IMGT-ONTOLOGY for immunogenetics
vertebrates and invertebrates. and immunoinformatics. In Silico Biol 4:17–29
Immunogenetics data are increasingly studied in Lefranc M-P, Duprat E, Kaas Q, Tranne M, Thiriot A, Lefranc G
▶ immunoinformatics. Immunogenetics molecules (2005a) IMGT unique numbering for MHC groove
G-DOMAIN and MHC superfamily (MhcSF) G-LIKE-
comprise the antigen receptors, immunoglobulins
DOMAIN. Dev Comp Immunol 29:917–938
(IG) or antibodies (▶ Immunoglobulin Synthesis) Lefranc M-P, Clément O, Kaas Q, Duprat E, Chastellan P,
(Lefranc and Lefranc 2001a) and T cell receptors Coelho I, Combres K, Ginestoux C, Giudicelli V, Chaume D,
(TR) (Lefranc and Lefranc 2001b), and the major Lefranc G (2005b) IMGT-Choreography for immunogenetics
and immunoinformatics. In Silico Biol 5:45–60
histocompatibility (MH) (Lefranc et al. 2005a), that
Lefranc M-P, Giudicelli V, Regnier L, Duroux P (2008) IMGT, a
characterize the adaptive immune response, and mol- system and an ontology that bridge biological and computa-
ecules of the innate immune response, that include tional spheres in bioinformatics. Brief Bioinform 9:263–275
cytokines, chemokines, integrins, related proteins of
the immune system (RPI) of the ▶ immunoglobulin
superfamily (IgSF) and of the ▶ MH superfamily Immunogenic Peptides
(MhSF).
Immunogenetics knowledge management is ▶ T Cell Epitopes
based on the concepts of ▶ IMGT-ONTOLOGY,
the global reference in ▶ immunogenetics and
▶ immunoinformatics (Giudicelli and Lefranc 1999; Immunoglobulin Superfamily
Lefranc et al. 2004, 2005b, 2008; Duroux et al.
2008), built by IMGT ®, the international ImMuno- ▶ Immunoglobulin Superfamily (IgSF)
GeneTics information system® (http://www.imgt.org)
(▶ IMGT® Information System) (Lefranc 2005).
Immunoglobulin Superfamily (IgSF)

Marie-Paule Lefranc
Cross-References Laboratoire d’ImmunoGénétique Moléculaire,
Institut de Génétique Humaine UPR 1142,
▶ IMGT ® Information System Université Montpellier 2, Montpellier, France
▶ IMGT-ONTOLOGY
▶ Immunoglobulin Superfamily (IgSF)
▶ Immunoglobulin Synthesis Synonyms
▶ Immunoinformatics
▶ MH Superfamily (MhSF) IgSF; Immunoglobulin superfamily
Immunoglobulin Synthesis 999 I
Definition Giudicelli V, Lefranc M-P (1999) Ontology for immunogenet-
ics: the IMGT-ONTOLOGY. Bioinformatics 12:1047–1054
Lefranc M-P, Lefranc G (2001a) The immunoglobulin
The immunoglobulin superfamily (IgSF) is a superfamily FactsBook. Academic Press, London, pp 1–458
of proteins that share domain structural characteristics: Lefranc M-P, Lefranc G (2001b) The T cell receptor FactsBook.
a protein belongs to the immunoglobulin superfamily Academic Press, London, pp 1–398
(IgSF) if it has at least one ▶ variable (V) domain or Lefranc M-P, Pommié C, Ruiz M, Giudicelli V, Foulquier E,
Truong L, Thouvenin-Contet V, Lefranc G (2003) IMGT
one ▶ constant (C) domain (Lefranc et al. 2003, 2005a). unique numbering for immunoglobulin and T cell receptor
A protein of the immunoglobulin superfamily (IgSF) variable domains and Ig superfamily V-like domains. Dev
may also belong to the ▶ MH superfamily (MhSF). Comp Immunol 27:55–77
Proteins of the immunoglobulin superfamily (IgSF) Lefranc M-P, Giudicelli V, Ginestoux C, Bosc N, Folch G,
Guiraudou D, Jabado-Michaloud J, Magris S, Scaviner D,
are important components of the immune response. Thouvenin V, Combres K, Girod D, Jeanjean S, Protat C,
They comprise the antigen receptors, immunoglobu- Yousfi Monod M, Duprat E, Kaas Q, Pommié C, Chaume D,
lins (IG) or antibodies (▶ Immunoglobulin Synthesis) Lefranc G (2004) IMGT-ONTOLOGY for immunogenetics
(Lefranc and Lefranc 2001a) and T cell receptors (TR) and immunoinformatics. In Silico Biol 4:17–29
Lefranc M-P, Pommié C, Kaas Q, Duprat E, Bosc N, Guiraudou D,
(Lefranc and Lefranc 2001b) of the adaptive immune Jean C, Ruiz M, Da Piedade I, Rouard M, Foulquier E,
response, characterized by V-DOMAIN (▶ Variable Thouvenin V, Lefranc G (2005a) IMGT unique numbering
(V) domain) and C-DOMAIN (▶ Constant (C) for immunoglobulin and T cell receptor constant domains
domain), and related proteins of the immune system and Ig superfamily C-like domains. Dev Comp Immunol I
29:185–203
(RPI) characterized by at least one V-LIKE-DOMAIN Lefranc M-P, Clément O, Kaas Q, Duprat E, Chastellan P,
(▶ Variable (V) domain) or one C-LIKE-DOMAIN Coelho I, Combres K, Ginestoux C, Giudicelli V, Chaume D,
(▶ Constant (C) domain). IgSF genes and proteins are Lefranc G (2005b) IMGT-Choreography for immunogenetics
extensively studied in ▶ immunogenetics and and immunoinformatics. In Silico Biol 5:45–60
Lefranc M-P, Giudicelli V, Regnier L, Duroux P (2008) IMGT, a
▶ immunoinformatics. system and an ontology that bridge biological and computa-
IgSF knowledge management is based on the tional spheres in bioinformatics. Brief Bioinform 9:263–275
concepts of ▶ IMGT-ONTOLOGY, the global
reference in ▶ immunogenetics and ▶ immunoinformatics
(Giudicelli and Lefranc 1999; Lefranc et al. 2004, 2005b,
2008; Duroux et al. 2008), built by IMGT®, the interna- Immunoglobulin Synthesis
tional ImMunoGeneTics information system®, http://
www.imgt.org (▶ IMGT® Information System). Marie-Paule Lefranc
Laboratoire d’ImmunoGénétique Moléculaire,
Institut de Génétique Humaine UPR 1142,
Cross-References Université Montpellier 2, Montpellier, France

▶ Constant (C) Domain


▶ IMGT ® Information System Definition
▶ IMGT-ONTOLOGY
▶ Immunogenetics The immunoglobulin synthesis, an example of
▶ Immunoglobulin Synthesis knowledge at the molecular level, illustrates the
▶ Immunoinformatics concepts d’▶ IMGT-ONTOLOGY, the global reference
▶ MH Superfamily (MhSF) in ▶ immunogenetics and ▶ immunoinformatics
▶ Variable (V) Domain (Giudicelli and Lefranc 1999; Lefranc et al. 2004,
2005, 2008; Duroux et al. 2008), built by IMGT®, the
international ImMunoGeneTics information system®
(http://www.imgt.org) (▶ IMGT® information system).
References The concepts of identification (▶ IMGT-ONTOLOGY,
IDENTIFICATION Axiom) identify the nucleotide and
Duroux P, Kaas Q, Brochet X, Lane J, Ginestoux C, Lefranc M-P,
Giudicelli V (2008) IMGT-Kaleidoscope, the Formal IMGT- protein sequences and the three-dimensional (3D) struc-
ONTOLOGY paradigm. Biochimie 90:570–583 tures according to a structured terminology
I 1000 Immunoglobulin Synthesis

“MoleculeType” “ConfigurationType”
V-gene D-gene J-gene C-gene V-gene J-gene C-gene concept concept
gDNA germline
Genome 5′ 3′ 5′ 3′

1 DNA rearrangements 1

V-D-J-gene C-gene V-J-gene C-gene

5′ 3′ 5′ 3′ rearranged
gDNA

2 2
Transcription
L-V-D-J-C-sequence L-V-J-C-sequence
mRNA rearranged
Transcriptome 5′ 3′ 5′ 3′
(or in vitro cDNA)
3 Translation
3
V-D-J-C-sequence
V-J-C-sequence
protein rearranged
Proteome IG-Light-Chain
IG-Heavy-Chain

IG (or antibody)

Immunoglobulin Synthesis, Fig. 1 An example of knowl- 1: DNA rearrangements (is_rearranged_into), 2: Transcription


edge at the molecular level: the synthesis of an IG or antibody in (is_transcribed_into), 3: Translation (is_translated_into). The
humans (Duroux et al. 2008). A human being may potentially configuration of C-gene is “undefined” (ConfigurationType)
synthesize 1012 different antibodies (Lefranc and Lefranc 2001). (IMGT Repertoire, http://www/imgt/org)

(standardized keywords), the concepts of description code the IG (and TR): “variable” (V), “diversity” (D),
(▶ IMGT-ONTOLOGY, DESCRIPTION Axiom) “joining” (J), and “constant” (C) (▶ Variable (V)
describe the composition of the sequences and structures Gene, ▶ Diversity (D) Gene, ▶ Joining (J) Gene,
with standardized labels (▶ Labels and Relations), the ▶ Constant (C) Gene). The configuration of the
concepts of classification (▶ IMGT-ONTOLOGY, V-gene, D-gene, and J-gene is identified as “germline”
CLASSIFICATION axiom) classify the genes and (Fig. 1), the configuration of the C-gene is “undefined”
alleles with a standardized nomenclature (▶ Gene and (▶ Configuration Type). During the differentiation of the
Allele Nomenclature), and the concepts of numerotation B lymphocytes in the bone marrow, the genomic DNA is
(▶ IMGT-ONTOLOGY, NUMEROTATION Axiom) rearranged first in the IGH locus and then in the IGK and
numerotate the nucleotide and amino acid numbering IGL loci. The rearrangements in the IGH locus lead to the
within sequences and structures (▶ IMGT Unique Num- junction of a D-gene and a J-gene to form a D-J-gene, and
bering, ▶ IMGT Collier de Perles). then to the junction of a V-gene to the D-J-gene to form
An IG or antibody is composed of two identical a V-D-J-gene. The rearrangements in the IGK or IGL loci
heavy chains associated with two identical light lead to the junction of a V-gene and a J-gene to form
chains, kappa or lambda. In humans, heavy chain a V-J-gene. The configuration of these genes is
genes (locus IGH), light chain kappa genes (locus identified as “rearranged” (▶ Configuration Type). After
IGK), and light chain lambda genes (locus IGL) are transcription and maturation of the pre-messenger by
located on the chromosomes 14 (14q32.3), 2 (2p11.2) splicing, the messenger RNA (mRNA) L-V-D-J-C-
and 22 (22q11.2), respectively (Lefranc and Lefranc sequence Molecule_EntityType (▶ IMGT-ONTOL-
2001). The synthesis of an immunoglobulin requires OGY, Molecule_EntityType) and L-V-J-C-sequence
rearrangements of the IGH, IGK, and IGL genes dur- (L for leader) are obtained and then translated into the
ing the differentiation of the B lymphocytes. heavy chain (IG-Heavy-Chain) (▶ Chain Type) and the
In the human genome (genomic DNA or gDNA) light chain (IG-Light-Chain) of an IG (or antibody)
(▶ MoleculeType), four types of genes (▶ Gene Type) (Fig. 1) (Lefranc and Lefranc 2001).
Immunoglobulin Synthesis 1001 I
Immunoglobulin Synthesis, VH
Fig. 2 The variable domains CDR2-IMGT
VL
VH and VL (Variable (V) CDR3-IMGT
domain) of the heavy and light CDR3-IMGT CDR1-IMGT
Q66
chains of an IG or antibody. CDR1-IMGT
N

The CDR-IMGT R55 S26


CDR2-IMGT
(Complementarity M39
L39

determining region (CDR- IGH Y55 R66


IGK
IMGT)) are highlighted V–D–J L89 W41 F118
C104 V–J
(online: VH CDR1-IMGT is in REGION S26 C 104 C23
W41 REGION
red, CDR2-IMGT in orange C23 N
W118 L89
and CDR3-IMGT in purple,
VL CDR1-IMGT is in blue,
CDR2-IMGT in green and
CDR3-IMGT in greenblue)
(IMGT Repertoire, http:// C
www.imgt.org)

I
The variable domains VH and VL (▶ Variable (V) ▶ Diversity (D) Gene
Domain) are coded by the V-D-J-REGION and the ▶ Epitope
V-J-REGION (Fig. 2). Each domain includes four ▶ Framework Region (FR-IMGT)
framework regions (FR) (▶ Framework Region ▶ Gene and Allele Nomenclature
(FR-IMGT)) (in pale gray in Fig. 2) and three hyper- ▶ IMGT Collier de Perles
variable loops or complementarity determining ▶ IMGT Unique Numbering
regions (CDR) (▶ Complementarity Determining ▶ IMGT-ONTOLOGY
Region (CDR-IMGT)). The CDR, and more particu- ▶ IMGT-ONTOLOGY, CLASSIFICATION Axiom
larly the CDR3 that results from the junction of the ▶ IMGT-ONTOLOGY, DESCRIPTION Axiom
V-D-J genes (in the VH domain) and V-J genes (in the ▶ IMGT-ONTOLOGY, IDENTIFICATION Axiom
VL domain), are involved in the antigen recognition. ▶ IMGT-ONTOLOGY, NUMEROTATION Axiom
The VH and VL amino acids in contact with the anti- ▶ IMGT-ONTOLOGY, SpecificityType
gen constitute the ▶ paratope (▶ IMGT-ONTOLOGY, ▶ IMGT ® Information System
SpecificityType). The part of the antigen recognized by ▶ Joining (J) Gene
the antibody is the ▶ epitope. The number of potential ▶ Labels and Relations
V-D-J and V-J rearrangements depends on the number ▶ Molecule Entity Type
of functional V, D, and J genes in the genome. Addi- ▶ MoleculeType
tional mechanisms (N diversity at the V-D-J and V-J ▶ Paratope
junctions and somatic hypermutations) allow to ▶ Variable (V) Domain
reach 1012 different antibodies per individual (Lefranc ▶ Variable (V) Gene
and Lefranc 2001a) (IMGT Repertoire, http://www.
imgt.org).
References
Cross-References Duroux P, Kaas Q, Brochet X, Lane J, Ginestoux C,
Lefranc M-P, Giudicelli V (2008) IMGT-Kaleidoscope,
▶ Chain Type the Formal IMGT-ONTOLOGY paradigm. Biochimie
▶ Complementarity Determining Region (CDR- 90:570–583
Giudicelli V, Lefranc M-P (1999) Ontology for Immunogenet-
IMGT) ics: the IMGT-ONTOLOGY. Bioinformatics 12:1047–1054
▶ Configuration Type Lefranc M-P, Lefranc G (2001) The immunoglobulin
▶ Constant (C) Gene FactsBook. Academic, London, pp 1–458
I 1002 Immunohistochemistry

Lefranc M-P, Giudicelli V, Ginestoux C, Bosc N, Folch G, Immunoinformatics is crucial for understanding
Guiraudou D, Jabado-Michaloud J, Magris S, Scaviner D, the immune response, particularly the adaptive
Thouvenin V, Combres K, Girod D, Jeanjean S, Protat C,
Yousfi Monod M, Duprat E, Kaas Q, Pommié C, Chaume D, immune response of vertebrates. Implementation of
Lefranc G (2004) IMGT-ONTOLOGY for immunogenetics specific databases and tools (Flower 2007; Lefranc
and immunoinformatics. In Silico Biol 4:17–29 2008; Schoenbach et al. 2008), creation of the
Lefranc M-P, Clément O, Kaas Q, Duprat E, Chestellan P, ▶ IMGT® information system in 1989, and development
Coelho I, Combres K, Ginestoux C, Giudicelli V, Chaume
D, Lefranc G (2005) IMGT-Choreography for immunoge- of ▶ IMGT-ONTOLOGY, the global reference in
netics and immunoinformatics. In Silico Biol 5:45–60 ▶ immunogenetics and ▶ immunoinformatics (Giudicelli
Lefranc M-P, Giudicelli V, Regnier L, Duroux P (2008) IMGT, a and Lefranc 1999; Lefranc et al. 2004, 2005, 2008; Duroux
system and an ontology that bridge biological and computa- et al. 2008), built by IMGT®, the international ImMuno-
tional spheres in bioinformatics. Brief Bioinform 9:263–275
GeneTics information system® (http://www.imgt.org)
(▶ IMGT® Information System), have contributed to the
coming of age of immunoinformatics.
Immunohistochemistry
Cross-References
Barbara J. Davis
Section of Pathology, Tufts Cummings School ▶ IMGT ® Information System
of Veterinary Medicine Biomedical Sciences, ▶ IMGT-ONTOLOGY
North Grafton, MA, USA ▶ Immunogenetics
▶ Structural Immunoinformatics
Definition

Immunohistochemistry or “IHC” is the histological pro-


cess for identifying the presence of a protein, by which
References
a specific antibody is used to attach to the protein within Duroux P, Kaas Q, Brochet X, Lane J, Ginestoux C, Lefranc M-P,
a tissue and then reacted with a coloring process for Giudicelli V (2008) IMGT-Kaleidoscope, the Formal IMGT-
visualization by light microscopy. When the color reac- ONTOLOGY paradigm. Biochimie 90:570–583
tion is a fluorescent dye, the procedure is referred to as Flower DR (ed) (2007) Immunoinformatics: predicting immu-
nogenicity in silico, vol 409, Series methods in molecular
immunofluorescence or “IF.”
biology. Humana Press, Totowa, pp 1–438
Giudicelli V, Lefranc M-P (1999) Ontology for Immunogenet-
ics: the IMGT-ONTOLOGY. Bioinformatics 12:1047–1054.
Cross-References Lefranc M-P (2008) IMGT ®, the international ImMunoGene-
Tics information system ® for immunoinformatics. Methods
for querying IMGT ® databases, tools and Web resources
▶ Cancer Pathology in the context of immunoinformatics. Mol Biotechnol
40:101–111
Lefranc M-P, Giudicelli V, Ginestoux C, Bosc N, Folch G,
Guiraudou D, Jabado-Michaloud J, Magris S, Scaviner D,
Immunoinformatics Thouvenin V, Combres K, Girod D, Jeanjean S, Protat C,
Yousfi Monod M, Duprat E, Kaas Q, Pommié C, Chaume D,
Marie-Paule Lefranc Lefranc G (2004) IMGT-ONTOLOGY for immunogenetics
Laboratoire d’ImmunoGénétique Moléculaire, and immunoinformatics, In Silico Biol 4:17–29
Lefranc M-P, Clément O, Kaas Q, Duprat E, Chastellan P,
Institut de Génétique Humaine UPR 1142, Coelho I, Combres K, Ginestoux C, Giudicelli V, Chaume D,
Université Montpellier 2, Montpellier, France Lefranc G (2005) IMGT-Choreography for immunogenetics
and immunoinformatics. In Silico Biol 5:45–60
Lefranc M-P, Giudicelli V, Regnier L, Duroux P (2008) IMGT, a
Definition system and an ontology that bridge biological and computa-
tional spheres in bioinformatics. Brief Bioinform 9:263–275
Schoenbach C, Ranganathan S, Brusic V (eds) (2008) Immunoin-
Immunoinformatics is the science that studies
formatics, vol 1, Immunomics reviews, series of springer
▶ immunogenetics and immunology data using bioin- science and business media LLC. Springer, New York,
formatics and computational approaches. pp 1–200
In Silico Design of Epitope-based Vaccines 1003 I
The rapid progress in flow cytometry implies that an
Immunological Scale increasing number of parameters can be looked at
simultaneously, which is highly relevant for a more
▶ Organism State, Lymphocyte comprehensive assessment of lymphocyte characteris-
tics, in particular their multifunctional profile.

Cross-References
Immunomonitoring
▶ Lymphocyte Dynamics and Repertoires, Biological
Bertrand Bellier, Adrien Six, Véronique
Methods
Thomas-Vaslin and David Klatzmann
▶ Systems Immunology, Novel Evaluation of Vaccine
Immunology-Immunopathology-Immunotherapy,
UPMC University Paris 06, UMR7211, CNRS,
UMR7211, INSERM, U959, Paris, France
References

Ochsenbauer C, Kappes JC (2009) New virologic reagents for


Synonyms neutralizing antibody assays. Curr Opin HIV AIDS 5:418–425
Pulendran B (2004) Modulating vaccine responses with dendritic I
Immune response evaluation; Vaccine-induced cells and Toll-like receptors. Immunol Rev 199:227–250
Seder RA, Darrah PA, Roederer M (2008) T-cell quality in
immune response memory and protection: implications for vaccine design.
Nat Rev Immunol 4:247–258
Subbarao B, Robertson DA (2006) Assays for B lymphocyte func-
Definition tion Chap 3, unit 3.8. In John E. Colegan et al. (eds) Current
Protocols in Immunology. Wiley Inter Science, Hoboken

A series of characteristic immunological parameters


used to measure vaccine-induced immune responses,
e.g., Immuno-Stimulant
1. Innate immune responses
(a) Dendritic cell activation (Pulendran 2004) ▶ Systems Immunology, Vaccine Adjuvant
(b) Inflammatory response
(c) Complement activation
2. Adaptive immune responses
(i) Antibody and B cell immune responses In Silico Design of Epitope-based
(Subbarao and Robertson 2006) Vaccines
– Neutralizing Ab (Ochsenbauer and Kappes
2009) Zhao Bing1, Kishore R. Sakharkar2 and
– Non-neutralizing Ab Meena K. Sakharkar3
1
– Antibody-dependent cell cytotoxicity Department of Biochemistry, University of Western
(ADCC) Ontario, London, ON, Canada
2
(ii) T-cell immune responses (Seder et al. 2008) Infectious Diseases Cluster, Universiti Sains
– Lysis (CTL) Malaysia, Pulau Pinang, Malaysia
3
– Tetramer Graduate School of Life and Environmental Science,
– Cytokines University of Tsukuba, Tsukuba, Ibaraki, Japan
(iii) Lymphocyte repertoire diversity
– Immunoscope/CDR3 spectratyping
– TCR/Ig rearrangement quantification Definition and Introduction
– TCR/Ig deep sequencing
See ▶ Lymphocyte Dynamics and Repertoires, In the past century, medical research has improved
Biological Methods. health and increased life expectancy largely because
I 1004 In Silico Design of Epitope-based Vaccines

of success in preventing and treating infectious dis- versus nonself discrimination (Viret and Janeway
eases. Vaccines, in particular, offer protection against 1999). The short antigenic peptides, derived from
infectious diseases. However, new threats to health parent molecules have been proposed as interesting
continually emerge as bacteria and viruses evolve, targets of vaccine design. Thus, a new generation of
are transported to new environments, or develop resis- high efficiency, low toxicity epitope-based vaccines
tance to drugs and vaccines. Also, the current threat of are in development.
the potential use of viruses and bacteria as biological Accessing subdominant specificities might be of
weapons demands efficient methodologies that ana- particular value in the case of tumor antigens, where
lyze genetic information related to potential viral and self-tolerance might have inactivated T-cells recogniz-
bacterial threats, identify potential vaccine target, and ing the most dominant specificities. This approach also
then develop diagnostic tests and vaccines for them. allows elicitation of immune responses against multi-
With the advances in genomics, understanding of ple conserved epitopes, a factor of crucial importance
pathogen variability as well as of the diversity of the in the case of rapidly mutating pathogens, such as HIV
human immune system, has greatly improved. and HCV. It is also possible to overcome limitations in
The availability of human genome data and various recombinant protein delivery by combining epitopes
microbial genome sequences provides opportunity to from multiple antigens into a single immunogen. They
establish computational methodologies to meet the can also increase potency and break tolerance. T-cell
demands. To date, genome sequences are availab

You might also like